Monthly Archives: June 2008

Web Crawling with Perl

If you are looking to write a web crawler, Perl, with all its great CPAN modules, is one of the best platforms you can pick. There are CPAN modules for most of the common components of a web crawler. Here, I’ll point to some of the modules that you would want to start out with.
Read [...]

Posted in Crawling | Tagged , | 3 Comments

Introduction to Web Crawling

In the context of the World Wide Web, crawling refers to gathering web pages, by following hyperlinks, starting from a small set of web pages, for the purposes of further processing. For example, a Web search engine needs to gather as many pages as possible before it indexes and makes them available for searching.
A program [...]

Posted in Crawling | Tagged | 5 Comments
  • About grok.in

    This is a blog primarily focussed on the subjects of Information Engineering—Retrieval, Extraction & Management, Machine Learning, Scalability and Cloud Computing.