How to Integrate Skype and MythTV
One thing I am really confident will happen in the near future is the integration of our TVs and phones. Over the past few years Skype and VoIP have improved significantly, yet we have still to see an example of seamless integration between these technologies.
In my living room I have a big flat-screen TV connected to a Linux MythTV server, which I use for recording and watching TV and DVDs.
Thanks to the long weekend I had some time to attach a webcam (VF0415 Live! Cam Vid. IM Ultra) to that computer, mount it over the TV, and make it work (nothing to do, really) with Skype.
It works great, but I still needed to leave MythTV and use the mouse and keyboard to access Skype and make calls. I am sure it would not be that hard to create a proper plugin to make MythTV work with the client-side Skype API, but it probably makes little sense now that they are about to release their SkypeKit platform, and I am sure someone will just convert the good SIP plugin to use it.
Here is how you can add an entry in the Main Menu of MythTV to start Skype:
- Find and save somewhere on your disk a reasonably sized Skype logo in PNG format
- Add to “/usr/share/mythtv/themes/<your_theme>/menu-ui.xml” an entry like the following
  <state name="SKYPE">
    <imagetype name="watermark">
      <filename>watermark/skype.png</filename>
    </imagetype>
  </state>
- Add to “/usr/share/mythtv/themes/defaultmenu/mainmenu.xml” an entry like the following
  <button>
    <type>SKYPE</type>
    <text>Skype</text>
    <description>Launch Skype</description>
    <action>EXEC /usr/bin/skype</action>
  </button>
Clearly you will have to replace <your_theme> with the name of the theme you use (I use “Retro”), “watermark/skype.png” with the real location of your Skype logo, and “/usr/bin/skype” with the location of your Skype executable (try the command “which skype” if you do not know it), but everything else should work.
Restart MythTV and your shiny new Skype entry should appear at the bottom of the main menu. Clicking on it will stop MythTV and launch Skype. At first launch, maximize the Skype window with the mouse; after that it will open maximized automatically. When you close Skype (for real: right-click on the systray icon and click close) you will go back to MythTV.
I am also fiddling with Lirc so that Skype can be controlled entirely with the remote. It should not be too hard. I will update this post when/if I manage to do it.
Google Reader tells Google what you Like and what to Index
While most of the Facebook-generation kids who populate the Internet nowadays have no idea what RSS feeds are (but likely “follow” CNN on Twitter), there is still some percentage of tech-savvy people (me included) who take a look at their favorite feeds every morning before starting their day.
Not many RSS feed readers exist, and the few that do are pretty much all the same. The most widely used online RSS aggregators are probably Google Reader and Bloglines.
While Bloglines is clearly supported by online advertising, why do you think Google created its own for free? Yes, you guessed it: to get your traffic information.
They probably use the number of people subscribed to each RSS feed and the frequency of their visits to Google Reader to optimize the frequency of refresh (i.e., when and how often they should recrawl it) for that particular feed/domain. Then, they look at how many people open each post/link and use that information to make decisions on its priority in the crawling queue or ranking of those pages.
I bet there are a lot of subscribers to the CNN feed and some of them log in pretty often. This probably makes the refresh rate of its RSS feed very high, and the number of clicks that each article receives indicates its priority in crawling and has some influence on its PageRank. After all, if 10,000 people looked at the title/snippet of a piece of news and followed through, it must be interesting, no? Conversely, if everyone skipped it, it must not be.
In addition to this, with every click that you make (or do not make) they learn something more about you and which kind of content you like. Since Google Reader is hosted on the same domain as all the other Google products (i.e., www.google.com), the cookies that they set after your login will follow you everywhere there are Google ADs.
They will know even more about you and show “better and better” contextual advertisements.
NASDAQ or S&P500 Components List: How to Download it from Yahoo! Finance
Many people are trying to apply Machine Learning algorithms to the stock market to learn and predict the fluctuations of stocks. Yahoo! Finance is of great help because it allows you to download, as simple CSV files, the price and volume of all the stocks going back a few years.
The API is easy to use and to script, but unfortunately there is often no list of symbols available to start with. Those lists exist on the sites of the markets (e.g., here is the one for NASDAQ) but they are hard to piece together.
Luckily, with a couple of lines of bash it is possible to create one from the list of components of the NASDAQ or S&P500. The following is the code necessary:
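# Fetch the NASDAQ components list one page (50 symbols) at a time.
# -q silences wget's progress output; -O - writes each page to stdout.
for p in $(seq 0 50 2739); do
  wget -q -O - "http://download.finance.yahoo.com/d/quotes.csv?s=@%5EIXIC&f=sl1d1t1c1ohgv&e=.csv&h=$p"
done >> /tmp/nasdaq.symbols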
This script loops in steps of 50 from 0 to 2739 (the number of elements in the NASDAQ components list), downloading the list of symbols on each page and concatenating them into the file /tmp/nasdaq.symbols.
The URL to download can be found at the bottom of pages like this one; you just need to replace &h=50 with &h=$p, where $p is the iterator variable.
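Once the file is downloaded you will probably want just the ticker symbols. Here is a minimal Python sketch, assuming the symbol is the first CSV column (as requested by the leading `s` in the `f=` parameter). The function name `extract_symbols` and the sample rows are made up for illustration; they are not real quotes.

```python
import csv
import io


def extract_symbols(csv_text):
    """Return the unique ticker symbols (first CSV column) in first-seen order."""
    seen = set()
    symbols = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines between concatenated pages
        symbol = row[0].strip()
        if symbol and symbol not in seen:
            seen.add(symbol)
            symbols.append(symbol)
    return symbols


# Hypothetical rows shaped like the f=sl1d1t1c1ohgv output (values are invented).
sample = (
    '"AAAA",10.50,"5/28/2010","4:00pm",+0.10,10.40,10.60,10.30,12345\n'
    '"BBBB",3.20,"5/28/2010","4:00pm",-0.05,3.25,3.30,3.10,67890\n'
    '\n'
    '"AAAA",10.50,"5/28/2010","4:00pm",+0.10,10.40,10.60,10.30,12345\n'
)

if __name__ == "__main__":
    print(extract_symbols(sample))
```

To run it on the real download, read /tmp/nasdaq.symbols into a string and pass it to `extract_symbols`. The duplicate check matters because the bash loop appends to the file, so running it twice repeats every row.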
Google Maps Traffic data is crowd-sourced through cell-phones’ Google Apps
Google Maps has been able to display traffic conditions, even on small roads, for quite a while. How does Google know that? The technology behind it is very interesting.
They use GPS/time data sent back from people using Google Maps Apps on their phones (iPhones, BlackBerry, Android, …).
At regular intervals the phone sends its GPS coordinates to Google, and with simple calculations they can figure out how long it took you to move from point A to point B.
If that happens to be on a road, and your speed seems to be that of a car, they can use the data to estimate the traffic on the road by comparing the current data with historical data.
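The "simple calculations" can be sketched in a few lines of Python. This is only an illustration of the idea, not Google's actual method: it assumes each fix is a hypothetical `(latitude, longitude, unix_seconds)` tuple, uses the haversine formula for the distance, and compares the current speed with a historical average that you would have to supply.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def speed_kmh(fix_a, fix_b):
    """Average speed between two (lat, lon, unix_seconds) fixes."""
    hours = (fix_b[2] - fix_a[2]) / 3600.0
    if hours <= 0:
        raise ValueError("fix_b must be later than fix_a")
    return haversine_km(fix_a[0], fix_a[1], fix_b[0], fix_b[1]) / hours


def congestion_ratio(current_kmh, historical_kmh):
    """Below 1.0 means traffic is moving slower than it usually does."""
    return current_kmh / historical_kmh
```

For example, `congestion_ratio(speed_kmh(fix_a, fix_b), 60.0)` well below 1.0 on a road where cars usually travel at 60 km/h would suggest a jam. A real system would also have to snap the fixes to a road and filter out pedestrians, which this sketch does not attempt.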
Facebook will use the Like Button to Personalize Search and Improve ADs
There has been a lot of chatter around Facebook’s new Like Button. Some people believe it will be great for SEO, others that it will increase distribution on Facebook, etc…
But Facebook is smarter than that: they want to create a great search engine, possibly a personalized one, and improve their ADs platform.
Traditional search engines spend a lot of time crawling the web, discovering new pages or updating old ones, and computing the PageRank of each one. What if you could have a graph of the web, updated in real time, with the counts of how many people have been on each page?
This is what Facebook is aiming to do. They do not care whether you click that button or not. When the browser renders the page (the button is in an iframe), it sends a request to Facebook’s servers that tells them a lot about you (e.g., your browser, which page you are on, which languages you understand, your IP address, your screen resolution, …) and possibly even who you are (e.g., because you logged in to Facebook and still have the cookies around).
Since plugins for the Like button are already widespread for popular content management systems (e.g., WordPress, Blogger, …) I am sure there will be wide adoption by content creators.
Facebook will know in real time about all the new pages created, how many people are visiting them, and who they are (even if you do not have a Facebook account, since your browser information is pretty unique). This will allow them to prioritize the crawling and refresh of the pages, compute the ranking based on popularity (discounting the clicks on their own search results) and also personalize your search results (e.g., ranking higher the results visited/liked by your friends, neighbors, etc…).
Augment that with their geo-location project and you also have a pretty good platform for behavioral ADs targeting. They already have a profile of you (you wrote it!), they are about to know where you are (geolocation), and with this they will know the sites that you and your friends visited. This is heaven for the ADs folks.
Google Collected Wifi Data for Geo-location Purposes
In the past few days there has been a lot of discussion about Google’s public admission that it collected Wi-Fi data. This has been labeled a “mistake”, but don’t you wonder why Google was collecting that Wi-Fi data in the first place? They were collecting MAC addresses and network SSIDs for geo-location purposes.
Since wireless networks are pretty popular, and the combination MAC/SSID is unique, associating those with the car’s GPS coordinates allowed Google to create a pretty detailed map. This map could then be used to figure out your coordinates given the MAC/SSIDs around you. The technology is generally called Wi-Fi Positioning System (WPS).
A possible use of that is the Google Maps application. If the device does not have a GPS, it uses GSM cell triangulation (the cells’ coordinates have probably been obtained in a similar fashion) to figure out the location. While it generally works, it cannot be very accurate and its estimate is often only good to within about 2 miles. However, if some wireless networks are detected in the surroundings, they can be used to produce a much better estimate of the location.
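To make the idea concrete, here is a minimal Python sketch of one simple way to turn a Wi-Fi scan into coordinates: a signal-weighted centroid of the known access points. This is an assumption chosen for illustration, not necessarily what Google does. The `known_aps` map (MAC address to latitude/longitude), the scan format, and the function name are all hypothetical.

```python
def estimate_position(scan, known_aps):
    """Estimate (lat, lon) as a signal-weighted centroid of known access points.

    scan:      list of (mac, rssi_dbm) pairs seen by the device
    known_aps: dict mapping mac -> (lat, lon) from the collected map
    Returns None when no scanned access point is in the map.
    """
    total = 0.0
    lat_sum = 0.0
    lon_sum = 0.0
    for mac, rssi_dbm in scan:
        if mac not in known_aps:
            continue  # network not in the collected map
        lat, lon = known_aps[mac]
        # Convert dBm to a linear power so stronger (closer) signals weigh more.
        weight = 10 ** (rssi_dbm / 10.0)
        total += weight
        lat_sum += weight * lat
        lon_sum += weight * lon
    if total == 0.0:
        return None
    return lat_sum / total, lon_sum / total
```

Averaging latitudes and longitudes directly is only reasonable because Wi-Fi range is tiny compared to the size of the Earth; the access points in one scan are all within a few hundred meters of each other.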
These data are probably also sold/used by the Automatic Geo-Tagging feature of the Eye-Fi memory card. Not surprisingly, those cards are sold in promotion with Google Picasa.
Twitter stopped tracking URLs clicks on Twitter.com
It looks like Twitter stopped tracking clicks on the URLs shared by the users on its own website.
Given the recent rush of URL shorteners towards analytics products and Twitter’s promises about “resonance” on its ADs platform, this seems a very odd move.
Maybe they were overwhelmed with data/logs and decided to turn this feature off for now since they were not using it. It is also possible that the traffic on the website (and thus the clicks on the links) is so small that it was not worth keeping the service up for all the URLs (maybe they will just track clicks on AD links).
Web Pages Language Classification: Bayes, Characters and n-grams
Most search engines start by focusing on only one language (e.g., English) because it is simpler, requires almost no handling of character encodings, and has a wide audience. Whether you want to index only pages written in English or support all the languages of the world, fast page language classification is one of the first tasks that you will have to deal with.
Simple word-based classification techniques like Naive Bayes will do the trick but require a very big training set, especially for the foreign languages. Even though memory and processing power have become less and less expensive over the last few years, they are not free, and especially in a startup you may need to optimize every single function.
For this reason, you may want to consider character-based classification. In its simplest form, you compute the frequency with which each letter of the alphabet appears in each language and then compute the distance (e.g., the Euclidean distance) between the frequencies of the text and each of your models, picking the closest one. The memory requirements of this solution are very small (i.e., a float for each of the 26 letters per language) and CPU usage can be easily bounded as well (e.g., you can stop after N characters of the input text), trading off precision for speed.
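Here is a minimal Python sketch of the letter-frequency approach, assuming plain Euclidean distance over the 26 unaccented letters. The function names are mine, and the two "models" are built from toy sentences purely to keep the example self-contained; a real model would be trained on a large corpus per language.

```python
import math
import string
from collections import Counter


def letter_frequencies(text, max_chars=None):
    """Relative frequency of each of the 26 letters, as a list of 26 floats.

    max_chars bounds the CPU cost by looking only at the start of the text.
    """
    if max_chars is not None:
        text = text[:max_chars]
    counts = Counter(c for c in text.lower() if c in string.ascii_lowercase)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * 26
    return [counts[c] / total for c in string.ascii_lowercase]


def euclidean(a, b):
    """Euclidean distance between two equal-length frequency vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(text, models, max_chars=None):
    """Return the language whose model is closest to the text's frequencies."""
    freqs = letter_frequencies(text, max_chars)
    return min(models, key=lambda lang: euclidean(freqs, models[lang]))


# Toy training texts (illustrative only, far too short for real use).
EN_TEXT = "the quick brown fox jumps over the lazy dog and runs through the hills"
DE_TEXT = "der schnelle braune fuchs springt ueber den faulen hund und rennt durch die huegel"
MODELS = {
    "en": letter_frequencies(EN_TEXT),
    "de": letter_frequencies(DE_TEXT),
}
```

Calling `classify(page_text, MODELS, max_chars=2000)` returns the key of the closest model. Accented characters are simply ignored here, which throws away useful signal for many European languages; adding them to the alphabet is a small change.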
If that is not enough, character n-grams (e.g., sequences of 2 or 3 adjacent characters) will probably work even better but require a higher memory footprint.
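The n-gram variant can be sketched the same way in Python. The profile is now a sparse dictionary instead of a fixed list of 26 floats, which is where the extra memory goes. As before, the function names and the choice of Euclidean distance are assumptions made for illustration.

```python
import math
from collections import Counter


def ngram_profile(text, n=3):
    """Relative frequency of each character n-gram in the text.

    Whitespace runs are collapsed to a single space so that word
    boundaries are kept as part of the n-grams.
    """
    text = " ".join(text.lower().split())
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: count / total for gram, count in counts.items()}


def profile_distance(p, q):
    """Euclidean distance between two sparse n-gram profiles."""
    grams = set(p) | set(q)
    return math.sqrt(sum((p.get(g, 0.0) - q.get(g, 0.0)) ** 2 for g in grams))


def classify_ngrams(text, models, n=3):
    """Return the language whose n-gram profile is closest to the text's."""
    profile = ngram_profile(text, n)
    return min(models, key=lambda lang: profile_distance(profile, models[lang]))
```

Here `models` maps a language code to a profile built with `ngram_profile` from training text, using the same `n` that is later passed to `classify_ngrams`. With letters plus the space character there can be up to 27 × 27 = 729 distinct 2-grams and 27³ = 19,683 distinct 3-grams per language, compared with 26 numbers for single letters.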
The following graph shows the frequency of the alphabet letters across the 5 most common European languages. For some letters the difference in usage is pretty high, e.g., the letter “A” is used twice as much in Spanish as in German, while “H” is frequently used in German and English but almost never used in the other languages.
TurboTax costs $5 less without Cookies

With the exception of last year (changing job and state, I needed professional help) I always filed my taxes using TurboTax. It is nice, simple to use, and reasonably priced.
This year was no exception, but before finding out that Chase and Bank of America customers have a 35% discount on it, I went to their website to check out the prices. Oddly, I discovered that visiting www.turbotax.com without accepting cookies shows lower prices ($5 less) than the ones offered to those who visit it with cookies enabled.
Glitch in the system? Naaaaa, I say it is an A/B comparison to see what customers are willing to pay.
Google integrates Profile Results and search links to Social Networks
Looking for names on Google now shows a “Profile Search” results box with the two best hits, an invite to create your Google Profile (in case you are doing a vanity search) and quick links to MySpace, Facebook, Classmates and LinkedIn search pages.



