
Subword and Phonetic Search for Detecting Out-of-Vocabulary Keywords

We compare several approaches, separately and in combination, for spotting out-of-vocabulary (OOV) keywords, in terms of their actual term-weighted value (ATWV) scores. We considered three types of recognition units (whole words, syllables, and subwords of different lengths) and two basic search strategies (whole-unit search and fuzzy phonetic search). In all cases, the search was performed by collapsing the recognition lattice into a consensus network, either in terms of the recognized whole units or after first splitting the recognized units into phonemes. We ran experiments on five languages, for each of which the language model and vocabulary were derived from only 10 hours of transcriptions (70k-100k words of text). This resulted in keyword OOV rates on new data ranging from 10% to 63%, depending on the language. Our conclusions were that: 1) in all cases, fuzzy phonetic search on phoneme-split lattices is better than searching for whole units; 2) syllables are the best of the subword units for OOV keyword detection with fuzzy phonetic search; and 3) these methods combine very well, sometimes yielding ATWV scores for OOV terms that are not far below those for in-vocabulary (IV) terms.
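Since all comparisons above are stated in ATWV, a minimal sketch of how that metric is typically computed may help. It assumes the NIST keyword-search convention: one trial per second of evaluated speech, a false-alarm weight of beta = 999.9, and keywords with no reference occurrence excluded from the average. It also assumes that detections have already been thresholded at the system's actual decision threshold and aligned to reference occurrences upstream. The names `KeywordCounts` and `atwv` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass

# Assumed NIST OpenKWS/STD weighting of false alarms relative to misses.
BETA = 999.9


@dataclass
class KeywordCounts:
    """Hypothetical per-keyword tallies at the system's actual decision threshold."""
    n_true: int     # reference occurrences of the keyword
    n_correct: int  # accepted detections that match a reference occurrence
    n_false: int    # accepted detections with no matching reference occurrence


def atwv(counts: dict, speech_seconds: float, beta: float = BETA) -> float:
    """Actual term-weighted value, averaged over keywords that occur in the reference."""
    scored = [c for c in counts.values() if c.n_true > 0]
    if not scored:
        raise ValueError("no keyword has a reference occurrence")
    total_cost = 0.0
    for c in scored:
        p_miss = 1.0 - c.n_correct / c.n_true
        # Assumption: one trial per second of speech, so non-target trials = T - N_true.
        p_fa = c.n_false / (speech_seconds - c.n_true)
        total_cost += p_miss + beta * p_fa
    return 1.0 - total_cost / len(scored)


# Illustrative (made-up) counts; "gamma" never occurs in the reference, so it is skipped.
example = {
    "alpha": KeywordCounts(n_true=4, n_correct=3, n_false=1),
    "beta": KeywordCounts(n_true=2, n_correct=2, n_false=0),
    "gamma": KeywordCounts(n_true=0, n_correct=0, n_false=2),
}
print(round(atwv(example, speech_seconds=36000.0), 4))
```

To reproduce an OOV-versus-IV comparison like the one in the abstract, the same function would simply be called twice, once on the OOV keyword subset and once on the IV subset. The large beta means that even a single false alarm per keyword costs about as much as missing a few percent of that keyword's occurrences.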
