Load-balancing by applying a Bayesian Learning Automata (BLA) scheme in a non-stationary web-crawler network
Distributed Web-Crawlers, i.e., Web-Crawler Networks, have retrieved massive amounts of web data into centralized search indexes over the last decade. Companies like Google, Yahoo and, more recently, Integrasco have built their business models on this retrieval. However, load-balancing Web-Crawler Networks has proved difficult, especially for the geographically distributed Web-Crawler Networks that have recently emerged. In such networks, the load-balancing algorithm must account both for capacity that is constantly changing (non-stationary) and for location-aware retrieval. The novel approach in this thesis is to apply the machine-learning technique BLA-Kalman to dynamically load-balance Web-Crawler Networks. We apply the technique by combining the domain of load-balancing with machine-learning concepts and Web-Crawler Networks. A prototype algorithm named KALMAN-BLAWLB is designed and tested in a simulated environment. We measure how fairly KALMAN-BLAWLB balances the load, as well as its system utilization and scalability. KALMAN-BLAWLB outperforms all the algorithms that we were able to test in the simulated environment. Finally, we conclude that KALMAN-BLAWLB is able to load-balance fairly, achieves decent system utilization and is scalable, but further tests are needed to confirm its suitability for large-scale use.
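The abstract does not describe the internals of KALMAN-BLAWLB, so the following is only a minimal sketch of the general BLA-Kalman idea under stated assumptions, not the thesis's algorithm. Each crawler is treated as an arm of a learning automaton whose (hypothetical) reward is a noisy measure of its current spare capacity. One scalar Kalman filter per crawler tracks that capacity as a random walk, and the automaton picks the crawler whose randomly sampled posterior value is largest. The names `KalmanBLA`, `ArmBelief` and `simulate`, the noise variances `obs_var` and `trans_var`, and the capacity values are all illustrative assumptions.

```python
import random
from dataclasses import dataclass


@dataclass
class ArmBelief:
    """Gaussian posterior over one crawler's (hypothetical) capacity reward."""
    mean: float = 0.0
    var: float = 1.0


class KalmanBLA:
    """Sketch of a BLA whose arms are tracked by sibling Kalman filters.

    obs_var:   assumed variance of the noise on each observed reward
    trans_var: assumed per-step random-walk variance of the true capacity
    """

    def __init__(self, n_arms, obs_var, trans_var, prior_var=1.0, seed=None):
        self.arms = [ArmBelief(0.0, prior_var) for _ in range(n_arms)]
        self.obs_var = obs_var
        self.trans_var = trans_var
        self.rng = random.Random(seed)

    def select(self):
        # Draw one sample from each arm's posterior and pick the largest draw.
        draws = [self.rng.gauss(a.mean, a.var ** 0.5) for a in self.arms]
        return max(range(len(draws)), key=draws.__getitem__)

    def update(self, chosen, reward):
        for i, arm in enumerate(self.arms):
            # Predict step: every capacity may have drifted, so uncertainty grows.
            predicted_var = arm.var + self.trans_var
            if i == chosen:
                # Correct step: scalar Kalman update with the observed reward.
                gain = predicted_var / (predicted_var + self.obs_var)
                arm.mean += gain * (reward - arm.mean)
                arm.var = (1.0 - gain) * predicted_var
            else:
                arm.var = predicted_var


def simulate(rounds=2000, seed=1):
    """Three hypothetical crawlers whose capacities swap halfway through."""
    env_rng = random.Random(seed + 1)
    bla = KalmanBLA(n_arms=3, obs_var=0.01, trans_var=1e-4, seed=seed)
    choices = []
    for t in range(rounds):
        capacities = [0.2, 0.5, 0.8] if t < rounds // 2 else [0.8, 0.5, 0.2]
        arm = bla.select()
        reward = capacities[arm] + env_rng.gauss(0.0, 0.1)
        bla.update(arm, reward)
        choices.append(arm)
    return choices


if __name__ == "__main__":
    choices = simulate()
    print("first 1000:", [choices[:1000].count(i) for i in range(3)])
    print("last 500:  ", [choices[-500:].count(i) for i in range(3)])
```

Because unselected crawlers keep accumulating `trans_var`, their posteriors widen over time and they are occasionally re-sampled. This is what lets the sketch notice when capacities change, which is the non-stationary setting the abstract targets.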
Master's thesis in Information and Communication Technology, 2010, University of Agder, Grimstad