Archive for May, 2010

Google Reader tells Google what you Like and what to Index

While most of the Facebook-generation kids who populate the Internet nowadays have no idea what RSS feeds are (but likely “follow” CNN on Twitter), there is still some percentage of tech-savvy people (me included) who take a look at their favorite feeds every morning before starting their day.

Not many RSS feed reader products exist, and the few that do are pretty much all the same. The most widely used online RSS aggregators are probably Google Reader and Bloglines.

While Bloglines is clearly supported by online advertising, why do you think Google created its own for free? Yes, you guessed it: to get your traffic information.

They probably use the number of people subscribed to each RSS feed and the frequency of their visits to Google Reader to tune the refresh rate (i.e., when and how often they should recrawl it) of that particular feed/domain. Then, they look at how many people open each post/link and use that information to decide the priority of those pages in the crawling queue, or their ranking.

I bet there are a lot of subscribers to the CNN feed and some of them log in pretty often. This probably makes the refresh rate of its RSS feed very high, and the number of clicks that each article receives indicates its priority in crawling and has some influence on its PageRank. After all, if 10,000 people looked at the title/snippet of a piece of news and followed through, it must be interesting, no? Conversely, if everyone skipped it, it must not be.

In addition to this, with every click that you make (or do not make) they learn something more about you and which kind of content you like. Since Google Reader is hosted on the same domain as all the other Google products (i.e., www.google.com), the cookies that they set up after your login will follow you everywhere there are Google ADs.

They will know even more about you and show “better and better” contextual advertisements.

NASDAQ or S&P500 Components List: How to Download it from Yahoo! Finance

Many people are trying to apply machine learning algorithms to the stock market to learn and predict the fluctuations of stocks. Yahoo! Finance is of great help because it allows you to download, as simple CSV files, the price and volume of all stocks going back a few years.

The API is easy to use and to script, but unfortunately there is often no list of symbols available to start with. Those lists exist on the sites of the markets (e.g., here is the one for NASDAQ) but they are hard to piece together.

Luckily, with a couple of lines of bash it is possible to create one from the list of components of the NASDAQ or S&P500. The following is the code necessary:

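# Fetch the NASDAQ components page by page, 50 rows at a time.
# The output file is overwritten, so re-running does not duplicate rows.
for p in $(seq 0 50 2739); do
    wget -O - "http://download.finance.yahoo.com/d/quotes.csv?s=@%5EIXIC&f=sl1d1t1c1ohgv&e=.csv&h=$p"
done > /tmp/nasdaq.symbols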

This script loops in steps of 50 from 0 to 2739 (the number of elements in the NASDAQ components list), downloads the list of symbols on each page, and concatenates the pages into the file /tmp/nasdaq.symbols.

The URL to download can be found at the bottom of pages like this one; you just need to substitute &h=50 with &h=$p, where $p is the loop variable.
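The downloaded file contains full quote rows rather than bare symbols. Here is a minimal sketch for reducing it to a sorted list of unique tickers. It assumes that each row starts with the symbol wrapped in double quotes (the "s" field requested by f=sl1d1t1c1ohgv); the function name extract_symbols is only illustrative.

```shell
# Read quote rows on stdin and print one unique ticker symbol per line.
# Assumption: the symbol is the first comma-separated field, in double quotes.
extract_symbols() {
    cut -d, -f1 |        # keep only the first CSV field
    tr -d '"\r' |        # strip the quotes and any Windows line endings
    grep -v '^$' |       # drop empty lines
    sort -u              # sort and de-duplicate
}
```

It could be used as: extract_symbols < /tmp/nasdaq.symbols > /tmp/nasdaq.tickers (the output file name is arbitrary).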

Google Maps Traffic data is crowd-sourced through cell-phones’ Google Apps

Google Maps has been able to display traffic conditions, even on small roads, for quite a while. How does Google know that? The technology behind it is very interesting.

They use GPS/time data sent back by people using the Google Maps apps on their phones (iPhone, BlackBerry, Android, …).

At regular intervals the phone sends its GPS coordinates to Google, and with simple calculations they can figure out how long it took you to move from point A to point B.

If that happens to be on a road, and your speed seems to be that of a car, they can use the data to estimate the traffic on the road, comparing the current data with historical data.
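The “simple calculations” can be sketched as follows: take two timestamped GPS fixes, compute the great-circle distance between them with the haversine formula, and divide by the elapsed time. This only illustrates the idea and is not Google's actual method; the function name speed_kmh, the argument order, and the spherical Earth radius of 6371 km are assumptions made for the example.

```shell
# speed_kmh LAT1 LON1 T1 LAT2 LON2 T2
# Coordinates are in decimal degrees, timestamps in seconds.
# Prints the average speed in km/h between the two fixes.
speed_kmh() {
    LC_ALL=C awk -v lat1="$1" -v lon1="$2" -v t1="$3" \
                 -v lat2="$4" -v lon2="$5" -v t2="$6" 'BEGIN {
        rad  = atan2(0, -1) / 180            # degrees -> radians (pi / 180)
        dlat = (lat2 - lat1) * rad
        dlon = (lon2 - lon1) * rad
        s1 = sin(dlat / 2); s2 = sin(dlon / 2)
        # Haversine formula for the great-circle distance
        a  = s1 * s1 + cos(lat1 * rad) * cos(lat2 * rad) * s2 * s2
        km = 2 * 6371 * atan2(sqrt(a), sqrt(1 - a))   # 6371 km: assumed Earth radius
        hours = (t2 - t1) / 3600
        if (hours <= 0) {
            print "timestamps must be increasing" > "/dev/stderr"
            exit 1
        }
        printf "%.1f\n", km / hours
    }'
}
```

For example, speed_kmh 37.77 -122.42 0 37.78 -122.42 60 describes a phone that moved 0.01 degrees of latitude (roughly 1.1 km) in one minute, and prints 66.7. A value like that on a highway where the historical average is much higher would suggest congestion.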

Facebook will use the Like Button to Personalize Search and Improve ADs

There has been a lot of chatter around Facebook’s new Like Button. Some people believe it will be great for SEO, others that it will increase distribution on Facebook, etc.

But Facebook is smarter than that: they want to create a great search engine, possibly a personalized one, and improve their ADs platform.

Traditional search engines spend a lot of time crawling the web, discovering new pages or updating old ones, and computing the PageRank of each one. What if you could have a graph of the web, updated in real time, with the counts of how many people have been on each page?

This is what Facebook is aiming to do. They do not care if you click that button or not. When the browser renders the page (the button is in an iframe), it sends a request to Facebook’s server telling them a lot of info about you (e.g., your browser, which page you are on, which language you understand, your IP, your screen resolution, …) and possibly even who you are (e.g., because you logged in to Facebook and you still have the cookies around).

Since plugins for the Like button are already widespread for popular content management systems (e.g., WordPress, Blogger, …), I am sure there will be wide adoption by content creators.

Facebook will know in real time about all the new pages created, how many people are visiting them, and who they are (even if you do not have a Facebook account, since your browser information is pretty unique). This will allow them to prioritize the crawling and refresh of the pages, compute the ranking based on popularity (discounting the clicks on their own search results) and also personalize your search results (e.g., ranking higher the results visited/liked by your friends, neighbors, etc.).

Augment that with their geo-location project and you also have a pretty good platform for behavioral ADs targeting. They already have a profile of you (you wrote it!), they are about to know where you are (geolocation), and with this they will know the sites that you and your friends visited. This is heaven for the ADs folks.

Google Collected Wifi Data for Geo-location Purposes

In the past few days there has been a lot of discussion about Google’s public admission that it collected Wifi data. This has been labeled a “mistake”, but don’t you wonder why Google was collecting that Wifi data in the first place? They were collecting MAC addresses and network SSIDs for geo-location purposes.

Since wireless networks are pretty popular, and the combination MAC/SSID is unique, associating those with the car’s GPS coordinates allowed Google to create a pretty detailed map. This map could then be used to figure out your coordinates given the MAC/SSIDs around you. The technology is generally called Wi-Fi Positioning System (WPS).
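One simple way to turn a list of visible networks into a position, shown here only as an illustration and not as Google's actual algorithm, is a weighted centroid: look up each visible access point in the previously collected map and average the known coordinates, giving stronger signals more weight. The file layouts (one "MAC LAT LON" entry per line for the map, one "MAC WEIGHT" entry per line for the scan) and the function name wifi_locate are assumptions made for the example.

```shell
# wifi_locate AP_MAP_FILE SCAN_FILE
# AP_MAP_FILE: lines of "MAC LAT LON" (the survey database, hypothetical format).
# SCAN_FILE:   lines of "MAC WEIGHT" (visible networks; higher weight = stronger signal).
# Prints the weighted centroid "LAT LON" of the known access points in the scan.
wifi_locate() {
    LC_ALL=C awk '
        # First file: load the map of known access points.
        NR == FNR { lat[$1] = $2; lon[$1] = $3; next }
        # Second file: accumulate only access points present in the map.
        ($1 in lat) { w += $2; slat += $2 * lat[$1]; slon += $2 * lon[$1] }
        END {
            if (w <= 0) {
                print "no known access point in scan" > "/dev/stderr"
                exit 1
            }
            printf "%.5f %.5f\n", slat / w, slon / w
        }' "$1" "$2"
}
```

Access points that are not in the map are simply ignored, so the estimate improves as more of the surrounding networks have been surveyed.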

A possible use of that is the Google Maps application. If the device does not have a GPS, it uses GSM cell triangulation (cell coordinates have probably been obtained in a similar fashion) to figure out the location. While it generally works, it cannot be very accurate and is often only good to within about 2 miles. However, if some wireless networks are detected in the surroundings, they can be used to produce a much better estimate of the location.

These data are probably also sold/used by the Automatic Geo-Tagging feature of the Eye-Fi memory card. Not surprisingly, those cards are sold in promotion with Google Picasa.
