I recently came across an interesting post by Mark Johnson, senior product manager at Powerset:
While demoing Live Search at the Web 2.0 Expo, people continually asked the same questions: “What makes Live different?” or “Show me some features that will make me want to switch from my search engine” or the extremely confrontational “Why do [...]
A couple of weeks ago I participated in an interesting discussion on Cloud Computing at an unconference in Bangalore. Though the discussion was to be on “whether Cloud Computing is inevitable or not”, we hardly got past defining it! That just about demonstrates the confusion that surrounds Cloud Computing — it isn’t even [...]
Many applications need to search the full text of their content for particular words. This kind of functionality is usually called full-text search. For example, blogging software might need to provide a search feature that looks through the blog posts for the user-entered query [...]
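The usual building block for full-text search is an inverted index, which maps each word to the documents that contain it. The sketch below is only a toy version under simple assumptions. The `InvertedIndex` class and `tokenize` helper are hypothetical names. Words are lowercased and split on non-alphanumeric characters, and a query matches a post only if the post contains every query word. Real search engines also handle stemming, stop words, phrase queries and ranking.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase, then keep runs of letters/digits as words.
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    def __init__(self) -> None:
        # word -> set of document (e.g. blog post) IDs containing that word
        self.postings = defaultdict(set)

    def add(self, doc_id: int, text: str) -> None:
        for word in tokenize(text):
            self.postings[word].add(doc_id)

    def search(self, query: str) -> set[int]:
        # AND semantics: a document must contain every word in the query.
        words = tokenize(query)
        if not words:
            return set()
        result = set(self.postings.get(words[0], set()))
        for word in words[1:]:
            result &= self.postings.get(word, set())
        return result


if __name__ == "__main__":
    idx = InvertedIndex()
    idx.add(1, "Full-text search with an inverted index")
    idx.add(2, "Writing a web crawler in Perl")
    print(idx.search("inverted index"))  # {1}
```

Intersecting per-word sets works because each lookup only touches the posts that contain that word. A query therefore never scans every post.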
Over at the “Coding Horror” blog, Jeff Atwood published an interesting article titled “Maybe Normalizing Isn’t Normal”.
But more than the article itself, the debate that ensued in the comments there is very interesting. The “High Scalability” blog published a compilation of some of the interesting quotes from the debate. This compilation provides a great overview of [...]
If you are looking to write a web crawler, Perl, with all its great CPAN modules, is one of the best platforms you can pick. There are CPAN modules for most of the common components of a web crawler. Here, I’ll point to some of the modules that you would want to start out with.
Read [...]
Posted in Crawling | Tagged Crawling, perl
In the context of the World Wide Web, crawling refers to gathering web pages by following hyperlinks, starting from a small set of seed pages, so that the pages can be processed further. For example, a web search engine needs to gather as many pages as possible before it can index them and make them available for searching.
A program [...]
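The crawling process described above can be sketched as a breadth-first traversal of the link graph. This is a minimal illustration, not a working crawler. `fetch_links` is a hypothetical callback that would download a page and return the hyperlinks on it, and the made-up `TOY_WEB` dictionary stands in for the Web so the example runs offline. A real crawler would also need HTTP fetching, HTML link extraction, URL normalization and politeness rules such as robots.txt.

```python
from collections import deque
from typing import Callable, Iterable


def crawl(seeds: Iterable[str],
          fetch_links: Callable[[str], list[str]],
          max_pages: int = 100) -> list[str]:
    """Breadth-first crawl from seed URLs; returns pages in the order fetched."""
    frontier = deque()  # URLs discovered but not yet fetched
    seen = set()        # every URL ever queued, so no page is fetched twice
    for url in seeds:
        if url not in seen:
            seen.add(url)
            frontier.append(url)

    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)  # a real crawler would hand the page off for processing here
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


# A tiny, made-up link graph standing in for the Web.
TOY_WEB = {
    "a": ["b", "c"],
    "b": ["c", "d"],
    "c": ["a"],
    "d": [],
}

if __name__ == "__main__":
    print(crawl(["a"], lambda url: TOY_WEB.get(url, [])))  # ['a', 'b', 'c', 'd']
```

The `seen` set is what keeps the crawl from looping forever on cycles such as `a -> c -> a`. `max_pages` bounds how much of the graph is gathered.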
Posted in Crawling | Tagged Crawling
Statistics can be quite bewildering. Consider the following problem:
It is given that if a person who has a disease takes a diagnostic test for it, the test returns a positive result 99% of the time, that is, with a probability of 0.99. Now, if the test returns a positive result for some person picked at random, [...]
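Problems of this kind are answered with Bayes' theorem: P(disease | positive) = P(positive | disease) · P(disease) / P(positive). The excerpt only gives the 99% figure. The other two numbers in the sketch below are assumptions chosen for illustration: a 1% false-positive rate and a prevalence of 1 in 1,000. Under those assumed numbers, a positive result means only about a 9% chance of having the disease.

```python
def posterior(prior: float, sensitivity: float, false_positive_rate: float) -> float:
    """P(disease | positive test) via Bayes' theorem."""
    true_pos = sensitivity * prior                   # P(positive and disease)
    false_pos = false_positive_rate * (1 - prior)    # P(positive and no disease)
    return true_pos / (true_pos + false_pos)         # divide by P(positive)


if __name__ == "__main__":
    # 0.99 comes from the problem; prior and false-positive rate are assumed.
    p = posterior(prior=0.001, sensitivity=0.99, false_positive_rate=0.01)
    print(f"{p:.3f}")  # 0.090
```

The result is low because healthy people vastly outnumber sick ones under the assumed prevalence. A small false-positive rate applied to that large group still produces far more false positives than true ones.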
sanket asked a very interesting question in the comments to my previous post on the Monty Hall Problem:
Assume that boys and girls are equally likely to be born. Let us say that a family has two children. Given that one of them is a boy, what is the probability that the other one is a boy [...]
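One way to check an answer to this question is to enumerate the equally likely two-child families and condition on the given information. The sketch assumes the reading "at least one of the children is a boy." Under that reading, the answer is 1/3. If the statement instead identified a specific child (for example, "the elder child is a boy"), the answer would be 1/2.

```python
from fractions import Fraction
from itertools import product

# All equally likely two-child families, elder child first: BB, BG, GB, GG.
families = list(product("BG", repeat=2))

# Condition on "at least one child is a boy".
with_a_boy = [f for f in families if "B" in f]
both_boys = [f for f in with_a_boy if f == ("B", "B")]

p = Fraction(len(both_boys), len(with_a_boy))
print(p)  # 1/3
```

Enumerating the sample space makes the key point visible. Knowing that "one of them is a boy" rules out only GG, which leaves three families, and only one of those has two boys.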
I recently came across a very interesting problem known as “The Monty Hall Problem.” It is a probability puzzle named after the host of an old television show, “Let’s Make a Deal”, which featured a similar problem, albeit a little more involved than the basic version that mathematicians use. Here is a simple description of [...]
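The excerpt cuts off before the rules, so the simulation below assumes the standard version of the puzzle. A car is hidden behind one of three doors. The host knows where it is, always opens a different door that hides a goat, and always offers the player the chance to switch. The Monte Carlo sketch plays many rounds with a seeded random generator and reports how often each strategy wins.

```python
import random


def play(switch: bool, rng: random.Random) -> bool:
    """Play one round of the standard Monty Hall game; True if the player wins the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host knowingly opens a door that is neither the player's pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the single door that is still closed and not already picked.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car


def win_rate(switch: bool, trials: int = 100_000, seed: int = 42) -> float:
    rng = random.Random(seed)
    wins = sum(play(switch, rng) for _ in range(trials))
    return wins / trials


if __name__ == "__main__":
    print(f"stay:   {win_rate(False):.3f}")  # close to 1/3
    print(f"switch: {win_rate(True):.3f}")   # close to 2/3
```

Switching wins exactly when the first pick was wrong, which happens with probability 2/3. The simulated rates should land close to those values.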
Anand Rajaraman, who teaches a class on Machine Learning at Stanford, recently wrote a blog post claiming that more data usually beats better algorithms. The post makes for an interesting read, and so does the plethora of comments on it. He also wrote a follow-up post, which is equally interesting.
I do agree with [...]