Author Archives: Siddhartha Reddy

Machine Learning: Classification

Machine Learning is a branch of Computer Science concerned with designing systems that can learn from the input they are given. Usually these systems are designed to use what they have learned to better process similar input in the future. Machine Learning can be considered a subfield of Artificial Intelligence.
A very familiar example is the [...]

Posted in Databases, Machine Learning | 3 Comments

Fun with Google Sets

Google Sets is a really fun experiment from Google Labs. It basically allows you to “automatically create sets of items from a few examples.” So you can enter “Sachin Tendulkar”, “Rahul Dravid” and “Sourav Ganguly” and be presented with a much larger set of players from the Indian cricket team. Or enter “Athens”, [...]

Posted in Information Extraction, Information Retrieval, Machine Learning

OpenDNS: the Good, the Bad and the Ugly

OpenDNS is a free alternative to the DNS resolution service that your ISP provides you.
Update: OpenDNS has recently added an option to turn off the “ugly” proxying I describe below. See David Ulevitch’s comment below.
The Good:
OpenDNS is fast and reliable, more so than the service offered by any ISP I have used. In addition, it offers [...]

Posted in Tips | 9 Comments

Get lucky, navigate with Firefox and Google

A lot of the searches I do every day are navigational. In plain English, this means that I search to find a particular web page or site and then just navigate to it. This is in contrast to exploratory searches, where I usually end up visiting more than one of the search [...]

Posted in Tips | 3 Comments

BarCamp Hyderabad 5

BarCamp Hyderabad #5 is being held on the 16th of this month. Register at BarCampHyderabad5.
By sheer coincidence, I will be in Hyderabad that day and am planning to attend.

Posted in Uncategorized

MapReduce and Scale

Lately, I have become extremely interested in MapReduce, specifically its open source implementation in Hadoop.
From Wikipedia (MapReduce):
MapReduce is a software framework implemented by Google to support parallel computations over large (greater than 100 terabyte) data sets on unreliable clusters of computers. This framework is largely taken from map and reduce functions commonly used [...]
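To make the map and reduce idea a little more concrete, here is a minimal single-process sketch in Python that counts words across a few documents. It is only an illustration of the programming model, not Hadoop's or Google's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and a real framework would run the map and reduce steps in parallel across many machines and handle failures for you.

```python
from collections import defaultdict
from functools import reduce


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key (a framework does this between the phases)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Fold all values emitted for one key into a single count."""
    return key, reduce(lambda total, value: total + value, values, 0)


def word_count(documents):
    """Run map, shuffle and reduce in sequence, all inside one process."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["the quick brown fox", "the lazy dog"]))
```

Because each call to `map_phase` looks at only one document and each call to `reduce_phase` looks at only one key, the calls are independent of one another, which is what lets a framework spread them over a cluster.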

Posted in Scalability, Tools | 2 Comments

[Hack] Convert any video file for an iPod

Information should be consumable any which way.
Neither my iPod nor my phone supports the flv format in which you can download videos from YouTube and several other video sites (using the All-In-One Video Bookmarklet).
I found a very good Ruby script that can convert from flv (or any other format) to mp4 (the format that [...]

Posted in Hacks

Information R/evolution

This is a must-watch video by Michael Wesch, Assistant Professor of Cultural Anthropology at Kansas State University.
This video explores the changes in the way we find, store, create, critique, and share information. This video was created as a conversation starter, and works especially well when brainstorming with people about the near future and [...]

Posted in Information Retrieval

[Hack] “Check this out!”

Do you find yourself frequently sending out emails that broadly follow this pattern: the subject is something along the lines of ‘Check this out’ and the body contains a URL and hardly anything else?
Here is a simple bookmarklet that will make sending out those emails easier, if you use Gmail, that is. If you click [...]

Posted in Hacks

The Database Column

The Database Column is a new “multi-author blog on database technology and innovation.” What makes this a great resource is the amazing list of authors, which includes bigwigs from the age-old (sic) database industry like Michael Stonebraker, Jerry Held and Don Haderle from. (For those who do not know, Stonebraker and Held were the architects [...]

Posted in Databases
  • About grok.in

    This is a blog primarily focussed on the subjects of Information Engineering—Retrieval, Extraction & Management, Machine Learning, Scalability and Cloud Computing.