Internet & Search
Twitter stopped tracking URL clicks on Twitter.com
It looks like Twitter has stopped tracking clicks on URLs shared by users on its own website.
Given the recent push of URL shorteners toward analytics products, and Twitter's promises about “resonance” on its ads platform, this seems a very odd move.
Maybe they were overwhelmed with data and logs and decided to turn this feature off for now, since they were not using it. It is also possible that the traffic on the website (and thus the number of clicks on links) is so small that it was not worth keeping the service up for all URLs (maybe they will track clicks only on ad links).
Web Pages Language Classification: Bayes, Characters and n-grams
Most search engines start by focusing on a single language (e.g., English) because it is simpler, requires almost no character-encoding work, and has a wide audience. Whether you want to index only pages written in English or support all the languages of the world, fast page language classification is one of the first tasks you will have to deal with.
Simple word-based classification techniques like Naive Bayes will do the trick, but they require a very big training set, especially for foreign languages. Even though memory and processing power have become cheaper and cheaper in recent years, they are not free, and in a startup especially you may need to optimize every single function.
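To make the trade-off concrete, here is a minimal word-level multinomial Naive Bayes classifier in Python. It is a sketch, not a production implementation: the class and function names, the letters-only tokenizer and the add-one (Laplace) smoothing are my own assumptions, and the two training sentences are made up for illustration. Note how the model keeps a count for every distinct word seen in every language, which is where the memory and training-set requirements come from.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and keep runs of letters only (Unicode-aware, so "è" or "ß" count as letters).
    return re.findall(r"[^\W\d_]+", text.lower())


class NaiveBayesLanguageClassifier:
    """Multinomial Naive Bayes over words with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # language -> word -> count
        self.doc_counts = Counter()              # language -> number of training documents
        self.vocabulary = set()                  # every distinct word seen in training

    def train(self, language, text):
        words = tokenize(text)
        self.word_counts[language].update(words)
        self.doc_counts[language] += 1
        self.vocabulary.update(words)

    def classify(self, text):
        words = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_language, best_score = None, -math.inf
        for language, counts in self.word_counts.items():
            total_words = sum(counts.values())
            # log P(language) + sum of log P(word | language); log space avoids underflow.
            score = math.log(self.doc_counts[language] / total_docs)
            for word in words:
                score += math.log((counts[word] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_language, best_score = language, score
        return best_language


# Toy training data, made up for illustration; a real model needs a large corpus per language.
clf = NaiveBayesLanguageClassifier()
clf.train("en", "the cat is on the table and the dog is in the house")
clf.train("it", "il gatto è sul tavolo e il cane è nella casa")
print(clf.classify("the dog and the cat"))  # en
print(clf.classify("il cane e il gatto"))   # it
```

With a real corpus the `word_counts` tables can grow very large for every supported language, which is exactly the cost the character-based approach described next tries to avoid.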
For this reason, you may want to consider character-based classification. In its simplest form, you compute the frequency with which each letter of the alphabet appears in each language and then compute the distance (e.g., a geometric distance such as the Euclidean distance) between the letter frequencies of the text and your models. The memory requirements of this solution are very small (i.e., one float for each of the 26 letters), and CPU usage can easily be bounded as well (e.g., you can stop after N characters of the input text), trading precision for speed.
If that is not enough, character n-grams (e.g., sequences of 2 or 3 adjacent characters) will probably work even better, but they require a larger memory footprint.
The following graph shows the frequency of the letters of the alphabet across the 5 most common European languages. For some letters the difference in usage is quite large: e.g., the letter “A” is used twice as much in Spanish as in German, while “H” is frequently used in German and English but almost never in the other languages.
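A minimal Python sketch of this letter-frequency approach, assuming the Euclidean distance as the metric, only the 26 unaccented letters a-z, and a `max_chars` cap on how much of the input is read. The function names and the two one-sentence sample texts are hypothetical; real language models should be built from large corpora of each language.

```python
import math
import string

LETTERS = string.ascii_lowercase  # the 26 basic Latin letters; accented letters are ignored here


def letter_profile(text, max_chars=None):
    """Relative frequency of each of the 26 letters in text (26 floats in total)."""
    if max_chars is not None:
        text = text[:max_chars]  # bound the CPU cost, trading precision for speed
    counts = [0] * 26
    for ch in text.lower():
        index = LETTERS.find(ch)  # -1 for anything that is not a-z
        if index >= 0:
            counts[index] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * 26


def classify_by_letters(text, models, max_chars=1000):
    """Return the language whose letter profile is closest (Euclidean distance) to the text's."""
    profile = letter_profile(text, max_chars)
    return min(models, key=lambda lang: math.dist(profile, models[lang]))


# Hypothetical one-sentence "training" samples; real models should come from large corpora.
EN_SAMPLE = "the weather is nice today and we are going to walk through the park with the children"
DE_SAMPLE = "das wetter ist heute schön und wir gehen mit den kindern durch den park spazieren"
models = {"en": letter_profile(EN_SAMPLE), "de": letter_profile(DE_SAMPLE)}
print(classify_by_letters("where are the children going this weekend", models))
```

Since each model is just 26 floats, comparing a page against dozens of languages is cheap; the `max_chars` parameter is the knob that trades precision for speed.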
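The same idea extends to character n-grams. This hedged sketch (function names and sample texts are again hypothetical) builds a sparse frequency profile of overlapping n-grams, pads the text with spaces so that word beginnings and endings are captured, and compares profiles with the same Euclidean distance, treating missing n-grams as zero.

```python
import math
from collections import Counter


def ngram_profile(text, n=3, max_chars=None):
    """Relative frequency of each character n-gram (n=2 for bigrams, n=3 for trigrams)."""
    if max_chars is not None:
        text = text[:max_chars]
    # Collapse whitespace and pad with spaces so word starts and ends become n-grams too.
    text = " " + " ".join(text.lower().split()) + " "
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(grams.values())
    return {gram: count / total for gram, count in grams.items()} if total else {}


def profile_distance(p, q):
    """Euclidean distance between two sparse profiles; a missing n-gram counts as 0."""
    keys = p.keys() | q.keys()
    return math.sqrt(sum((p.get(k, 0.0) - q.get(k, 0.0)) ** 2 for k in keys))


def classify_by_ngrams(text, models, n=3, max_chars=1000):
    """Return the language whose n-gram profile is closest to the text's profile."""
    profile = ngram_profile(text, n, max_chars)
    return min(models, key=lambda lang: profile_distance(profile, models[lang]))


# Hypothetical one-sentence "training" samples; real models should come from large corpora.
EN_SAMPLE = "the weather is nice today and we are going to walk through the park with the children"
DE_SAMPLE = "das wetter ist heute schön und wir gehen mit den kindern durch den park spazieren"
models = {"en": ngram_profile(EN_SAMPLE), "de": ngram_profile(DE_SAMPLE)}
print(classify_by_ngrams("where are the children going this weekend", models))
# Number of stored entries per model: one per distinct trigram instead of a fixed 26 floats.
print(len(models["en"]), len(models["de"]))
```

The last line shows the trade-off: each model now holds one entry per distinct n-gram rather than a fixed 26 floats, and with large training corpora that number can be much larger.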
TurboTax costs $5 less without cookies

With the exception of last year (I changed jobs and states and needed professional help), I have always filed my taxes using TurboTax. It is nice, simple to use, and reasonably priced.
This year was no exception, but before finding out that Chase and Bank of America customers get a 35% discount on it, I went to the TurboTax website to check out the prices. Oddly, I discovered that visiting www.turbotax.com with cookies disabled shows lower prices ($5 less) than those offered to visitors with cookies enabled.
Glitch in the system? Naaaaa, I'd say it is an A/B test to see what customers are willing to pay.
Google integrates Profile Results and search links to Social Networks
Searching for names on Google now shows a “Profile Search” results box with the two best hits, an invitation to create your Google Profile (in case you are doing a vanity search), and quick links to the MySpace, Facebook, Classmates and LinkedIn search pages.
Google Search Suggestions: Men are More Worried about Manhood than IQ
Analysing query logs is really amusing sometimes. This is a screenshot of Google’s Search Suggestions for queries that start with the word “average”.
Here are a few extrapolations from these suggestions:
1) Men are more worried about the length of their penis than their IQ.
2) Height is more important for men, weight for women.
3) Salary becomes a concern only once you have established that you have a big penis and that you are smart and tall.




