Research Papers Library

Identifying Similarities, Periodicities and Bursts for Online Search Queries

 

We present several methods for mining knowledge from the query logs of the MSN search engine. Using the query logs, we build a time series for each query word or phrase (e.g., 'Thanksgiving’ or 'Christmas gifts’) where the elements of the time series are the number of times that a query is issued on a day. All of the methods we describe use sequences of this form and can be applied to time series data generally. Our primary goal is the discovery of semantically similar queries and we do so by identifying queries with similar demand patterns. Utilizing the best Fourier coefficients and the energy of the omitted components, we improve upon the state-of-the-art in time-series similarity matching. The extracted sequence features are then organized in an efficient metric tree index structure. We also demonstrate how to ef- ficiently and accurately discover the important periods in a time-series. Finally we propose a simple but effective method for identification of bursts (long or short-term). Using the burst information extracted from a sequence, we are able to efficiently perform ’query-by-burst’ on the database of timeseries. We conclude the presentation with the description of a tool that uses the described methods, and serves as an interactive exploratory data discovery tool for the MSN query database.

Download PDF

 

Get Exclusive Research Tips in Your Inbox

Receive Great tips via email, enter your email to Subscribe.
Please wait

airs logo

Association of Internet Research Specialists is the world's leading community for the Internet Research Specialist and provide a Unified Platform that delivers, Education, Training and Certification for Online Research.

Newsletter Subscription

Receive Great tips via email, enter your email to Subscribe.
Please wait

Follow Us on Social Media

Book Your Seat for Webinar GET FREE REGISTRATION FOR MEMBERS ONLY      Register Now