

Estimating deep web data source size by capture-recapture method

 

This paper addresses the problem of estimating the size of a deep web data source that is accessible only through queries. Since most deep web data sources are noncooperative, the size of a data source can only be estimated by sending queries and analyzing the returned results. We propose an efficient estimator based on the capture-recapture method. First, we derive an equation relating the overlapping rate to the percentage of the data examined when random samples are retrieved from a uniform distribution. This equation is conceptually simple and leads to the derivation of an estimator for samples obtained by random queries.
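To make the capture-recapture idea concrete, here is a minimal sketch in Python. The abstract does not state the paper's own estimator, so this uses the classical two-sample Lincoln-Petersen estimator instead: if two independent uniform samples of sizes n1 and n2 share m items, the population size is estimated as n1 * n2 / m. The function names, the simulated data source of integer IDs, and the sample sizes are all assumptions made for illustration.

```python
import random


def lincoln_petersen(first_sample, second_sample):
    """Classical two-sample capture-recapture estimate of population size.

    This is the textbook Lincoln-Petersen estimator, not the estimator
    derived in the paper.
    """
    first, second = set(first_sample), set(second_sample)
    overlap = len(first & second)  # items "recaptured" in the second sample
    if overlap == 0:
        raise ValueError("samples do not overlap; size cannot be estimated")
    return len(first) * len(second) / overlap


def simulate(true_size=100_000, sample_size=5_000, seed=0):
    """Draw two uniform random samples from a hypothetical data source
    of `true_size` document IDs and estimate its size from their overlap."""
    rng = random.Random(seed)
    ids = range(true_size)
    first = rng.sample(ids, sample_size)
    second = rng.sample(ids, sample_size)
    return lincoln_petersen(first, second)


if __name__ == "__main__":
    print(f"estimated size: {simulate():.0f}")
```

The sketch assumes uniform random access to documents, which matches the first case in the abstract. Documents returned by random queries are generally not drawn uniformly, which is the case the paper's estimator is designed to handle.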
