Tag Mirror

LibraryThing (an online service that helps people catalogue their books easily) recently launched a very useful feature that they call “Tag Mirror”. This is one of the more interesting things that has been done with tags. In fact, I would wager that this is one of the best things to happen to tagging since tag clouds came along.

Tag Mirror “holds a mirror” up to your books and to you. Instead of showing what you think about your books—what a regular tag cloud shows—it shows you what others think of them.

Compare my tag cloud with my tag mirror and you’ll instantly see just how useful this is. My tag cloud shows my perspective on the books that I have, while the tag mirror shows the world’s (that is, the LibraryThing community’s) perspective on the same books.

Take all the entities that you have tagged (in this case, books), pull in the tags that others have used for them and you have your tag mirror. Simple. Yet very powerful. This seems like such an obvious thing to do, that it is almost surprising that no one has ever done it! (Or am I plain ignorant?)
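Here is a minimal Python sketch of that recipe, just to make it concrete. It is not LibraryThing's implementation: the `(user, book, tag)` triples, the function names and the sample data are all made up for illustration.

```python
from collections import Counter

def tag_cloud(user, taggings):
    """Count the tags that `user` applied to their own books."""
    return Counter(tag for u, _book, tag in taggings if u == user)

def tag_mirror(user, taggings):
    """Count the tags that everyone else applied to the books `user` has tagged."""
    my_books = {book for u, book, _tag in taggings if u == user}
    return Counter(
        tag for u, book, tag in taggings
        if u != user and book in my_books
    )

# Hypothetical sample data: (user, book, tag) triples.
taggings = [
    ("me", "Dune", "favourites"),
    ("me", "Emma", "to-read"),
    ("ann", "Dune", "science fiction"),
    ("bob", "Dune", "science fiction"),
    ("bob", "Emma", "classics"),
    ("ann", "Ulysses", "modernism"),  # not one of my books, so ignored
]

print(tag_cloud("me", taggings).most_common())   # [('favourites', 1), ('to-read', 1)]
print(tag_mirror("me", taggings).most_common())  # [('science fiction', 2), ('classics', 1)]
```

The only difference between the two functions is whose tags get counted; the set of books stays the same.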

Additional notes:

  1. Check out Thingology — LibraryThing’s ideas blog, on the philosophy and methods of tags, libraries and suchnot.
  2. LibraryThing has done some other fun things with tags (and other data in general). Their “tag merge” feature allows their users to group different tags that are in fact not that different. The usefulness is obvious.
  3. LibraryThing is a rather cool site.
Posted in Information Extraction

The Dude Experiment

A friend had once shown me something very interesting: no matter how many [u]s you use in [dude], Google always has results for it! (One of those results for some twenty-odd [u]s was a blog entry of his — that’s how he came up with this). I just decided to run this experiment using the Google Search API and record the results. You can see the results (from 1 [u] to 900; Google only allows 1000 searches per day per IP :( ) at http://spreadsheets.google.com/pub?key=p0O6ZaIrydVN03TRYGjvBPA.
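For anyone who wants to repeat this, the loop looks roughly like the Python sketch below. It only builds the query strings; the actual search call is left as a function you pass in (`count_results` is a hypothetical name, not a real Google API function), so no result counts are invented here.

```python
def dude_queries(max_us):
    """Yield (n, query) pairs: 'dude', 'duude', ... with n copies of 'u'."""
    for n in range(1, max_us + 1):
        yield n, "d" + "u" * n + "de"

def run_experiment(max_us, count_results):
    """Run every query through the supplied count_results(query) callable.

    count_results is an assumption: any function that takes a query string
    and returns the number of hits reported by a search backend.
    """
    return [(n, query, count_results(query)) for n, query in dude_queries(max_us)]

print([query for _, query in dude_queries(3)])  # ['dude', 'duude', 'duuude']
```

Keeping `max_us` under the daily quota is what capped the run described above at 900 queries.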

The Dudes

But what is this supposed to show? It’s not one thing, it’s many :) . Draw your own conclusions!

Posted in Information Retrieval

Rationality and social networks

Well, let me start off my first post by taking an inclusive definition of IR and including social networks as part of IR. (In fact, Aditya’s thesis is primarily based on the assertion that IR is generally a social activity rather than a database query.)

There are these axioms in the theory of rational choice:

Rational choice theory makes two assumptions about individuals’ preferences for actions. First, is the assumption of completeness, that is that all actions can be ranked in an order of preference (indifference between two or more is possible). Second, is the transitivity, the assumption that if action a1 is preferred to a2, and action a2 is preferred to a3, then a1 is preferred to a3.

(Source: Wikipedia page http://en.wikipedia.org/wiki/Rational_choice_theory)
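The transitivity axiom can be checked mechanically against a set of observed choices. The Python sketch below is only an illustration: the representation (a set of `(chosen, rejected)` pairs) and the function name are my own assumptions, not something taken from the Wikipedia page.

```python
from itertools import permutations

def transitivity_violations(prefers):
    """Return every triple (a, b, c) where a beat b, b beat c, yet c beat a.

    prefers is a set of (chosen, rejected) pairs from observed choices.
    """
    items = sorted({x for pair in prefers for x in pair})
    return [
        (a, b, c)
        for a, b, c in permutations(items, 3)
        if (a, b) in prefers and (b, c) in prefers and (c, a) in prefers
    ]

consistent = {("a1", "a2"), ("a2", "a3"), ("a1", "a3")}
cyclic = {("a1", "a2"), ("a2", "a3"), ("a3", "a1")}

print(transitivity_violations(consistent))  # []
print(len(transitivity_violations(cyclic)))  # 3 (the same cycle, seen from each starting point)
```

A non-empty result is exactly the situation the axiom labels as irrational.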

Focus on the second axiom. If you observe me choosing a1 over a2 whenever presented a choice between a1 and a2; and choosing a2 over a3, whenever presented a choice between a2 and a3, you conclude that I would prefer a1 over a3 whenever presented a choice between a1 and a3.

If instead, I choose a3, when presented a choice between a1 and a3, you conclude that I am irrational.

I. don’t. think. so. :-)

Hint: Especially when a1, a2 and a3 are people, rather than inanimate things.

We had a long debate about it yesterday in the lab and I finally found an example. Later over dinner, my wife found many more examples — all involving people, though. Not inanimate things.

Let me not give out the examples right now, just to let your minds get tickled a bit.

Posted in Uncategorized

Tools/Libraries for IR

We’ve created a page to gather together some of the tools that a researcher or engineer working on IR problems might find useful. Hopefully this will be useful to many.

Tools/Libraries for Information Retrieval

We will be updating that page as we come across more tools.

We are enabling comments on the page; please leave a comment about any of those tools, or point out any (of the many) that are missing from there.

Posted in Tools

Interesting Papers on Web Spam at AIRWeb 2007

AIRWeb (Adversarial Information Retrieval on the Web) is a workshop on IR in the world of Web spam. From the call for papers page:

Adversarial Information Retrieval addresses tasks such as gathering, indexing, filtering, retrieving and ranking information from collections wherein a subset has been manipulated maliciously. On the Web, the predominant form of such manipulation is “search engine spamming” or spamdexing, i.e., malicious attempts to influence the outcome of ranking algorithms, aimed at getting an undeserved high ranking for some items in the collection. There is an economic incentive to rank higher in search engines, considering that a good ranking on them is strongly correlated with more traffic, which often translates to more revenue.

The list of papers to be presented has been released, and there are some very interesting ones. Search Engine Land has posted a nice listing.

Posted in Information Extraction

Hello world!

There are already so many blogs on search, so why another? (Just check out the blogroll on searchengineland.com.)

There are at least a couple of reasons.

Information Retrieval (or IR) is not exactly search. It is search and more. Much more.

Semantics aside, there is a second and probably more important reason: none of those blogs discusses the tech, at least not to the extent this blog intends to. They are more focussed on search marketing, search engine news, etc., and often present the view of, or for, a marketer, webmaster or user.

I guess the purpose of this blog will be better justified by the posts to come than my scribblings. So I will leave it at this.

Posted in Uncategorized
  • About grok.in

    This is a blog primarily focussed on the subjects of Information Engineering—Retrieval, Extraction & Management, Machine Learning, Scalability and Cloud Computing.