To analyze a speech or a dialog between people, we need to know which keywords were used in the continuous flow of conversation. One option is to transcribe the entire speech with a heavy ASR (automatic speech recognition) system. Apart from being heavy, these models are very hard to tune for domain-specific or rare keywords, which are often the most important ones. We solve this by building a keyword spotting algorithm that detects keywords directly in flowing speech. Keyword spotting algorithms are what virtual assistants such as Alexa, Cortana, and Google Assistant use to trigger their heavy ASR pipelines when they hear a wake word such as "Alexa" or "Ok Google". We adapt the way these algorithms are trained so that they can find keywords inside running speech, rather than only a keyword spoken as a distinct phrase like "Hi Alexa". Another constraint was that the algorithm had to be easy and fuss-free to train: the training dataset consists of a couple of utterances of each keyword, recorded by a set of 40 people.

In the open demo presented here, our algorithm detects a large list of Samsung-related keywords within sentences. For example, if you say "Is UHD Television available?" or "UHD TV hai kya?" (Hindi for "Is there a UHD TV?"), or the equivalent in any other language, it should catch the keyword "UHD". Detecting specific words like "UHD" (and many of the other words listed in the demo) is hard for conventional ASR systems.
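The idea of spotting keywords in flowing speech, rather than transcribing it, can be illustrated with a minimal sketch. It assumes an upstream step has already turned the audio into a (frames × features) matrix, and it uses a hypothetical `embed` function (mean pooling plus L2 normalisation) as a stand-in for the fine-tuned encoder, whose architecture is not described here. Each keyword's prototype is averaged from a few enrollment utterances, mirroring the few-utterances training setup. The helper names (`build_prototypes`, `spot_keywords`), the window size, hop, and the 0.9 cosine threshold are illustrative assumptions, not the authors' values.

```python
import numpy as np


def embed(window: np.ndarray) -> np.ndarray:
    # Hypothetical stand-in for the fine-tuned encoder: mean-pool the
    # frames of a window and L2-normalise the result.
    v = window.mean(axis=0)
    return v / (np.linalg.norm(v) + 1e-8)


def build_prototypes(enrollment: dict) -> dict:
    # One prototype per keyword: the mean embedding of its few
    # enrollment utterances (each a (frames, features) array).
    return {kw: np.mean([embed(u) for u in utts], axis=0)
            for kw, utts in enrollment.items()}


def spot_keywords(features, prototypes, win=4, hop=2, threshold=0.9):
    """Slide a window over a (T, D) feature matrix and report every
    window whose cosine similarity to some keyword prototype reaches
    the threshold, as (start_frame, keyword, score) tuples."""
    names = list(prototypes)
    protos = np.stack([prototypes[n] / np.linalg.norm(prototypes[n])
                       for n in names])
    hits = []
    for start in range(0, len(features) - win + 1, hop):
        scores = protos @ embed(features[start:start + win])
        best = int(np.argmax(scores))
        if scores[best] >= threshold:
            hits.append((start, names[best], float(scores[best])))
    return hits


# Toy usage: 3-dimensional "features"; frames 4-7 look like "UHD".
enrollment = {
    "UHD": [np.tile([1.0, 0.0, 0.0], (4, 1)), np.tile([0.9, 0.1, 0.0], (4, 1))],
    "QLED": [np.tile([0.0, 1.0, 0.0], (4, 1))],
}
prototypes = build_prototypes(enrollment)
speech = np.tile([0.0, 0.0, 1.0], (12, 1))
speech[4:8] = [1.0, 0.0, 0.0]
hits = spot_keywords(speech, prototypes)
```

Here only the window starting at frame 4 matches "UHD". Because windows overlap, a real system would typically merge adjacent hits on the same keyword into a single detection.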
This technology can be used to analyze interviews, audio recordings of discussions, mystery-shopper interactions, and sales pitches, and in smart IVR (interactive voice response) systems. Our algorithm fine-tunes open models with a combination of a metric loss and a prototype loss on a synthetic dataset.
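The section does not specify which metric loss or prototype loss is used. As an assumption, the sketch below pairs a prototypical-network loss (a softmax over negative squared distances to class-mean prototypes) with a triplet margin loss, and sums them with a hypothetical weight `alpha`. It only computes the loss values in NumPy to show what each term measures; actual fine-tuning would compute the same quantities in an autodiff framework such as PyTorch or JAX so the encoder weights can be updated.

```python
import numpy as np


def class_prototypes(support, labels, n_classes):
    # Prototype of each keyword class = mean embedding of its support examples.
    return np.stack([support[labels == c].mean(axis=0) for c in range(n_classes)])


def prototypical_loss(query, query_labels, protos):
    # Negative log-likelihood of the true class under a softmax over
    # negative squared Euclidean distances to the prototypes.
    d2 = ((query[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)  # (Q, C)
    logits = -d2
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(query)), query_labels].mean())


def triplet_loss(anchor, positive, negative, margin=0.2):
    # Metric loss: require the anchor-negative squared distance to exceed
    # the anchor-positive squared distance by at least `margin`.
    d_ap = ((anchor - positive) ** 2).sum(axis=-1)
    d_an = ((anchor - negative) ** 2).sum(axis=-1)
    return float(np.maximum(d_ap - d_an + margin, 0.0).mean())


def combined_loss(query, query_labels, protos, anchor, positive, negative,
                  alpha=0.5, margin=0.2):
    # Hypothetical weighting: prototype loss + alpha * metric loss.
    return (prototypical_loss(query, query_labels, protos)
            + alpha * triplet_loss(anchor, positive, negative, margin))


# Toy usage with 2-D embeddings for two keyword classes.
support = np.array([[0.0, 0.0], [0.0, 0.2], [5.0, 5.0], [5.0, 5.2]])
labels = np.array([0, 0, 1, 1])
protos = class_prototypes(support, labels, n_classes=2)
query = np.array([[0.0, 0.1], [5.0, 5.1]])
good = prototypical_loss(query, np.array([0, 1]), protos)  # near 0
bad = prototypical_loss(query, np.array([1, 0]), protos)   # large
```

The prototype term rewards embeddings that sit close to their own keyword's mean, which is what lets a keyword be enrolled from only a few utterances; the triplet term adds an explicit margin between different keywords.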