Machine Learning is a branch of Computer Science concerned with designing systems that can learn from the input they are given. Usually the systems are designed to use this learned knowledge to better process similar input in the future. Machine learning can be considered a subfield of Artificial Intelligence.
A very familiar example is the email spam-catching system: given a set of emails marked as spam and not-spam, it learns the characteristics of spam emails and is then able to process future email messages to mark them as spam or not-spam.
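To make the spam example concrete, here is a minimal sketch of one way such a system could learn from marked emails: a tiny naive Bayes classifier with add-one smoothing. The choice of naive Bayes, the function names, and the toy training messages are all my assumptions for illustration; real spam filters are considerably more involved.

```python
import math
from collections import Counter

LABELS = ("spam", "not-spam")

def tokenize(text):
    # Crude tokenizer: lowercase and split on whitespace.
    return text.lower().split()

def train(examples):
    """Count words per label from (text, label) pairs."""
    word_counts = {label: Counter() for label in LABELS}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    return word_counts, label_counts

def classify(text, word_counts, label_counts):
    """Return the label with the highest log-probability score."""
    vocab = set()
    for label in LABELS:
        vocab.update(word_counts[label])
    total_examples = sum(label_counts.values())
    scores = {}
    for label in LABELS:
        # Prior: fraction of training emails carrying this label.
        score = math.log(label_counts[label] / total_examples)
        n_words = sum(word_counts[label].values())
        for word in tokenize(text):
            # Add-one smoothing so unseen words do not zero out the score.
            score += math.log((word_counts[label][word] + 1) / (n_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical training data, made up for this sketch.
examples = [
    ("win cash prize now", "spam"),
    ("cheap pills win big", "spam"),
    ("meeting agenda for monday", "not-spam"),
    ("lunch on monday", "not-spam"),
]
word_counts, label_counts = train(examples)
print(classify("win a prize", word_counts, label_counts))
print(classify("lunch meeting monday", word_counts, label_counts))
```

The sketch assumes the training set contains at least one email of each kind; with only one label present, the prior for the other would be undefined.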
Google Sets is a really fun experiment from Google Labs. It basically allows you to “automatically create sets of items from a few examples.” So you can enter “Sachin Tendulkar”, “Rahul Dravid” and “Sourav Ganguly” and be presented with a much larger set of the players of the Indian cricket team. Or enter “Athens”, “Sydney”, “Atlanta”, “Barcelona” and “Tokyo” to get a much larger set of the cities that have hosted (or will be hosting) the Olympics. Neat.
OpenDNS is a free alternative to the DNS resolution service that your ISP provides you.
Update: OpenDNS has recently added an option to turn off the “ugly” proxying I describe below. See David Ulevitch’s comment below.
The Good:
OpenDNS is fast and reliable, more so than the service offered by any ISP I have used. In addition, it offers a host of other features: Content Filtering, Phishing Protection, Domain Blocking, Adult Site Blocking, Web Proxy Blocking, Domain Whitelisting, Statistics, Typo Correction, and Web Shortcuts.
I have been using OpenDNS for well over a year now, although I hardly use any of the advanced features they offer. It’s worked out quite well.
A lot of the searches I do every day are navigational. In plain English, this means that I search to find a particular web page or site and then just navigate to it. This is in contrast to exploratory searches, where I usually end up visiting more than one of the search results. Navigational search does not imply that I know beforehand which page to go to; it might be that I had a hunch such a page would exist, or that I discover it from the search results. Some examples of the former are when I want to navigate to the page that discusses a particular topic on Wikipedia or the page of a particular movie or actor on IMDB; the latter happens when I find such pages in the search results. Because of a couple of beautiful features in Firefox and Google, I end up not seeing the search result pages at all for most of my navigational searches.
MapReduce is a software framework implemented by Google to support parallel computations over large (greater than 100 terabyte) data sets on unreliable clusters of computers. This framework is largely taken from map and reduce functions commonly used in functional programming.
That description, though quite accurate, does not do justice to MapReduce.
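To give a feel for the map and reduce idea, here is a single-machine sketch of the classic word-count example. It only illustrates the programming model; it is not Google's implementation, and it leaves out everything that makes the real framework interesting (distribution across machines, fault tolerance, scheduling). The function names are my own.

```python
from collections import defaultdict

def map_phase(document):
    # Map: emit a (word, 1) pair for every word in one document.
    for word in document.lower().split():
        yield (word, 1)

def shuffle(pairs):
    # Shuffle: group all emitted values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: collapse the values for one key into a single result.
    return (key, sum(values))

def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    groups = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in groups.items())

print(word_count(["the cat sat", "the dog sat"]))
```

Because each call to `map_phase` looks at one document and each call to `reduce_phase` looks at one key, both steps could in principle run on many machines at once, which is the property the framework exploits.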
Neither my iPod nor my phone supports the flv format in which you can download videos from Youtube and several other video sites (using the All-In-One Video Bookmarklet).
I found a very good Ruby script that can convert from flv (or any other format) to mp4 (the format that the iPod supports): mp4ize. You can specify the target resolution that you want by editing the file (default is 320×240) and the script will convert the original video to this resolution (by padding the video if the original aspect ratio is different from what you want).
All is good, except that the script fails when the resolution of the original video is less than the resolution that you want (some videos on Youtube are in the weird resolution of 288×240). So I just fixed up the script. You can download it below.
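For the curious, the fit-and-pad arithmetic can be sketched as follows. This is not the code from mp4ize or from my fix; it is just one way to compute the scaled size and the padding, assuming the target width and height are even numbers. The function name and the returned keys are made up for illustration.

```python
def fit_with_padding(src_w, src_h, dst_w, dst_h):
    """Scale (src_w, src_h) to fit inside (dst_w, dst_h), keeping the
    aspect ratio, and return the padding needed to fill the rest."""
    # Integer cross-multiplication avoids floating-point rounding surprises.
    if dst_w * src_h <= dst_h * src_w:
        # Source is relatively wider: width is the limiting dimension.
        new_w = dst_w
        new_h = src_h * dst_w // src_w
    else:
        # Source is relatively taller: height is the limiting dimension.
        new_h = dst_h
        new_w = src_w * dst_h // src_h
    # Round down to even values, since many video codecs expect even dimensions.
    new_w -= new_w % 2
    new_h -= new_h % 2
    pad_x = dst_w - new_w
    pad_y = dst_h - new_h
    return {
        "width": new_w,
        "height": new_h,
        "pad_left": pad_x // 2,
        "pad_right": pad_x - pad_x // 2,
        "pad_top": pad_y // 2,
        "pad_bottom": pad_y - pad_y // 2,
    }

# The awkward 288x240 case, targeting the default 320x240.
print(fit_with_padding(288, 240, 320, 240))
```

For the 288×240 source, the video keeps its size and gets 16 pixels of padding on the left and on the right.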
This is a must-watch video by Michael Wesch, Assistant Professor of Cultural Anthropology at Kansas State University.
This video explores the changes in the way we find, store, create, critique, and share information. This video was created as a conversation starter, and works especially well when brainstorming with people about the near future and the skills needed in order to harness, evaluate, and create information effectively.
Do you find yourself frequently sending out emails that broadly fall under this pattern: the subject is something along the lines of ‘Check this out’ and the body contains a URL and hardly anything else?
Here is a simple bookmarklet that will make sending out those emails easier, if you use Gmail, that is. If you click on it while you are on a page, it will open a new Gmail compose window, fill in the subject line with the title of the page (prefixed with [Check this out]), and fill in the body with the URL of the current page as well as any text you might have selected.
The Database Column is a new “multi-author blog on database technology and innovation.” What makes this a great resource is the amazing list of authors, which includes bigwigs from the age-old (sic) database industry like Michael Stonebraker, Jerry Held and Don Haderle from. (For those who do not know, Stonebraker and Held were the architects of INGRES and POSTGRES, the relational database management systems (RDBMSs) that started it all; Haderle was the architect of DB2.) Well-known academics like Mitch Cherniack, David Dewitt, Samuel Madden and Stan Zdonik are also writing there.
This blog seems very promising. The first main post on the blog all but proclaimed that the end of the RDBMS is near and that the Column-Oriented DBMS is the next big thing, at least for Data Warehousing applications. (Note that Stonebraker is now the CTO of Vertica, a Column-Oriented DBMS company.) They followed it up with a couple of interesting posts on compression in Column-Oriented DBMSs.
This is a blog primarily focussed on the subjects of Information Engineering—Retrieval, Extraction & Management, Machine Learning, Scalability and Cloud Computing.