<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Machine Learning: Classification</title>
	<atom:link href="http://www.grok.in/blog/2008/03/27/machine-learning-classification/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.grok.in/blog/2008/03/27/machine-learning-classification/</link>
	<description>(ignorance killed the cat, curiosity was framed)</description>
	<lastBuildDate>Wed, 16 Jun 2010 18:48:41 +0000</lastBuildDate>
	
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Will Dwinnell</title>
		<link>http://www.grok.in/blog/2008/03/27/machine-learning-classification/comment-page-1/#comment-96</link>
		<dc:creator>Will Dwinnell</dc:creator>
		<pubDate>Sat, 05 Apr 2008 09:30:51 +0000</pubDate>
		<guid isPermaLink="false">http://www.grok.in/blog/2008/03/27/machine-learning-classification/#comment-96</guid>
		<description>In the particular case of spam detectors, one very common input data representation is called &quot;bag of words&quot;, which is just a list of the distinct words in the message, or an equivalent 0/1 vector with one entry per vocabulary word. Some pre-processing will likely precede the construction of a bag of words, such as removal of very common words (&quot;the&quot;, &quot;of&quot;), stemming (&quot;walking&quot;, &quot;walks&quot; and &quot;walked&quot; become &quot;walk&quot;) and selection of words with high predictive value (&quot;Viagra&quot;).</description>
		<content:encoded><![CDATA[<p>In the particular case of spam detectors, one very common input data representation is called &#8220;bag of words&#8221;, which is just a list of the distinct words in the message, or an equivalent 0/1 vector with one entry per vocabulary word. Some pre-processing will likely precede the construction of a bag of words, such as removal of very common words (&#8220;the&#8221;, &#8220;of&#8221;), stemming (&#8220;walking&#8221;, &#8220;walks&#8221; and &#8220;walked&#8221; become &#8220;walk&#8221;) and selection of words with high predictive value (&#8220;Viagra&#8221;).</p>
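<p>A minimal Python sketch of the steps above, not a production pipeline. The stop-word list, the suffix-stripping rule, and the three-word vocabulary are all assumptions made up for illustration; a real system would likely use a proper stemmer and pick its vocabulary from the training data.</p>
<pre>
```python
import re

# Assumed, deliberately tiny stop-word list; real lists are much longer.
STOP_WORDS = {"the", "of", "a", "and", "to", "in"}


def crude_stem(word):
    """Strip a few common suffixes; a rough stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def bag_of_words(message):
    """Return the set of distinct, stemmed, non-stop words in a message."""
    tokens = re.findall(r"[a-z]+", message.lower())
    return {crude_stem(t) for t in tokens if t not in STOP_WORDS}


def to_vector(bag, vocabulary):
    """Build a 0/1 vector: 1 where the vocabulary word occurs in the bag."""
    return [1 if word in bag else 0 for word in vocabulary]


# Hypothetical vocabulary of already-stemmed, "high predictive value" words.
vocabulary = ["viagra", "walk", "meet"]

bag = bag_of_words("Walking of the walks: he walked to buy Viagra")
vector = to_vector(bag, vocabulary)
```
</pre>
<p>Here "walking", "walks" and "walked" all collapse to the single entry "walk", and the vector has one position per vocabulary word, so two messages of very different length end up as vectors of the same size.</p>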
]]></content:encoded>
	</item>
	<item>
		<title>By: Siddhartha Reddy</title>
		<link>http://www.grok.in/blog/2008/03/27/machine-learning-classification/comment-page-1/#comment-61</link>
		<dc:creator>Siddhartha Reddy</dc:creator>
		<pubDate>Sun, 30 Mar 2008 04:22:05 +0000</pubDate>
		<guid isPermaLink="false">http://www.grok.in/blog/2008/03/27/machine-learning-classification/#comment-61</guid>
		<description>@sanket,

The training data is made up of labeled examples of the entities that need to be classified. So, in the case of the spam classifier, the training data is emails labeled as &#039;spam&#039; and &#039;not-spam&#039;.

Words/sets-of-words are used as &#039;features&#039; for the classification process. What this means is that the classification is made based on the occurrence of words.

Feature selection is an important and usually non-trivial step in classification. I&#039;ll be posting about this later on.</description>
		<content:encoded><![CDATA[<p>@sanket,</p>
<p>The training data is made up of labeled examples of the entities that need to be classified. So, in the case of the spam classifier, the training data is emails labeled as &#8216;spam&#8217; and &#8216;not-spam&#8217;.</p>
<p>Words/sets-of-words are used as &#8216;features&#8217; for the classification process. What this means is that the classification is made based on the occurrence of words.</p>
<p>Feature selection is an important and usually non-trivial step in classification. I&#8217;ll be posting about this later on.</p>
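<p>To make the idea of labeled emails and word-occurrence features concrete, here is a rough Python sketch. The four training emails are invented, and the scoring rule is a simplified naive-Bayes-style score that only looks at the words present in the message; it is one possible choice for illustration, not necessarily the method the post has in mind.</p>
<pre>
```python
import math
from collections import Counter

# Hypothetical training data: (email text, label) pairs.
TRAINING = [
    ("cheap viagra buy now", "spam"),
    ("buy cheap pills now", "spam"),
    ("meeting notes for monday", "not-spam"),
    ("lunch on monday", "not-spam"),
]


def words(text):
    """Features: the set of distinct lower-cased words in the text."""
    return set(text.lower().split())


def train(examples):
    """Count, per label, how many emails there are and how many contain each word."""
    doc_counts = Counter()
    word_counts = {}
    for text, label in examples:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(words(text))
    return doc_counts, word_counts


def classify(text, doc_counts, word_counts):
    """Return the label with the highest log-score for the words that occur."""
    total = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in doc_counts.items():
        score = math.log(n_docs / total)  # prior from label frequency
        for w in words(text):
            # Add-one smoothing so an unseen word does not rule a label out.
            score += math.log((word_counts[label][w] + 1) / (n_docs + 2))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


doc_counts, word_counts = train(TRAINING)
spam_guess = classify("buy viagra now", doc_counts, word_counts)
ham_guess = classify("monday meeting", doc_counts, word_counts)
```
</pre>
<p>With this toy data, "buy viagra now" scores higher under "spam" and "monday meeting" under "not-spam", purely because of which words occurred in which labeled emails.</p>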
]]></content:encoded>
	</item>
	<item>
		<title>By: sanket</title>
		<link>http://www.grok.in/blog/2008/03/27/machine-learning-classification/comment-page-1/#comment-58</link>
		<dc:creator>sanket</dc:creator>
		<pubDate>Sat, 29 Mar 2008 18:41:59 +0000</pubDate>
		<guid isPermaLink="false">http://www.grok.in/blog/2008/03/27/machine-learning-classification/#comment-58</guid>
		<description>But what is the training data in this case? Words? Sets of words?</description>
		<content:encoded><![CDATA[<p>But what is the training data in this case? Words? Sets of words?</p>
]]></content:encoded>
	</item>
</channel>
</rss>
