Archive for February, 2010
Most Used URL Shorteners on Twitter, January 2010
URL shorteners are definitely a hot business right now: Twitter made them popular by restricting tweets to only 140 characters, and while developing a URL shortener is pretty simple, the amount and quality of data that they can collect (e.g., the number of times a URL has been clicked) is amazing.
It is easy to imagine why big search engines like Google, Bing or Ask.com would be interested in the click streams of these companies. Traditional search engines generally discover pages through crawling, which is getting increasingly difficult due to the ever-growing size of the web. With the expansion of Twitter and the data of Bit.ly, users will “report” the hot pages directly to them, and clicks will indicate their importance.
According to my studies, in January 2010 the Twitter crowd produced about 41 million tweets per day, and about 38% of those contained a URL. Pretty impressive, considering that a few months ago there were only 26 million tweets per day and 22% contained URLs.
The table below shows the 100 most used URL shorteners and their share of all URLs in the Twitter stream for January 2010.
| Percentage of URLs | Service Name |
| 69.63% | bit.ly |
| 7.17% | tinyurl.com |
| 6.50% | ow.ly |
| 2.55% | url4.eu |
| 1.83% | is.gd |
| 1.82% | cli.gs |
| 1.42% | goo.gl |
| 1.05% | tl.gd |
| 0.74% | ff.im |
| 0.72% | 4sq.com |
| 0.51% | su.pr |
| 0.51% | j.mp |
| 0.44% | s1z.us |
| 0.42% | lnk.ms |
| 0.42% | wp.me |
| 0.36% | shar.es |
| 0.31% | tiny.cc |
| 0.25% | ping.fm |
| 0.23% | fb.me |
| 0.22% | digg.com |
| 0.21% | fwix.com |
| 0.20% | r2u.at |
| 0.19% | dlvr.it |
| 0.16% | tr.im |
| 0.13% | siga.st |
| 0.13% | post.ly |
| 0.13% | nxy.in |
| 0.12% | mnt.to |
| 0.11% | nyti.ms |
| 0.09% | ur1.ca |
| 0.07% | u.nu |
| 0.07% | 3.ly |
| 0.06% | fxn.ws |
| 0.06% | uol.com |
| 0.05% | kele.es |
| 0.05% | sbne.ws |
| 0.05% | flic.kr |
| 0.05% | p.gs |
| 0.05% | kl.am |
| 0.05% | ad.vu |
| 0.04% | blip.fm |
| 0.04% | idek.net |
| 0.04% | ur.ly |
| 0.04% | trim.su |
| 0.03% | eca.sh |
| 0.03% | url.ie |
| 0.03% | digs.by |
| 0.03% | tcrn.ch |
| 0.03% | fk.cm |
| 0.03% | htxt.it |
| 0.02% | moby.to |
| 0.02% | om.ly |
| 0.02% | minu.me |
| 0.02% | tgam.ca |
| 0.02% | icio.us |
| 0.02% | vur.me |
| 0.02% | uurl.in |
| 0.02% | bub.bz |
| 0.02% | ning.it |
| 0.02% | mltp.ly |
| 0.02% | que.es |
| 0.02% | awe.sm |
| 0.02% | trim.li |
| 0.01% | flne.ws |
| 0.01% | vf.cx |
| 0.01% | 76k.com |
| 0.01% | askp.me |
| 0.01% | olha.biz |
| 0.01% | rp.pe |
| 0.01% | job.bs |
| 0.01% | znl.me |
| 0.01% | twa.lk |
| 0.01% | zz.gd |
| 0.01% | twib.es |
| 0.01% | rago.ca |
| 0.01% | sp2.ro |
| 0.01% | twlv.net |
| 0.01% | tynt.com |
| 0.01% | pk.gd |
| 0.01% | doms.bz |
| 0.01% | xr.com |
| 0.01% | hyux.com |
| 0.01% | bit2.ca |
| 0.01% | bz9.cc |
| 0.01% | tol.bz |
| 0.01% | act.ly |
| 0.01% | blip.tv |
| 0.01% | 9mp.com |
| 0.01% | dw.am |
| 0.01% | f1a.me |
| 0.01% | fwd4.me |
| 0.01% | amzn.com |
| 0.01% | bte.tc |
| 0.01% | gmed.net |
| 0.01% | r.im |
| 0.01% | sn.im |
| 0.01% | vai.la |
| 0.01% | boo.fm |
| 0.01% | elmo.st |
| 0.01% | im.ly |
(Disclaimer: some of these may not in fact be URL shorteners. The list was too long, and my life too short, to go through each of them and verify what their business is. If you find any errors, please feel free to let me know.)
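For readers who want to compute a similar breakdown themselves, here is a minimal sketch of the counting step. It is not the code used to produce the table above: the sample tweets, the host_shares() helper and the URL regular expression are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample input; a real run would read tweets from a file or stream.
my @tweets = (
    'Reading this http://bit.ly/abc123 right now',
    'Two links: http://bit.ly/xyz and http://tinyurl.com/q1w2e3',
    'No link in this tweet',
    'Check in at http://4sq.com/9zZ',
);

# Hypothetical helper: count URLs per host name and return a hash
# reference mapping each host to its percentage of all URLs seen.
sub host_shares {
    my @texts = @_;
    my %count;
    my $total = 0;
    for my $text (@texts) {
        # Capture the host part of every http(s) URL in the tweet.
        while ($text =~ m{https?://([^/\s:]+)}gi) {
            $count{ lc $1 }++;
            $total++;
        }
    }
    return {} unless $total;
    return { map { $_ => 100 * $count{$_} / $total } keys %count };
}

my $shares = host_shares(@tweets);

# Print in the same layout as the table: highest share first.
for my $host (sort { $shares->{$b} <=> $shares->{$a} || $a cmp $b } keys %$shares) {
    printf "| %.2f%% | %s |\n", $shares->{$host}, $host;
}
```

Note that this counts every host, shortener or not, which is exactly why a list produced this way needs the manual check mentioned in the disclaimer.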
Use shared_clone() to Share Variables among Perl Threads
Sharing variables across threads is generally very annoying in Perl. You have to declare the variable as shared before using it, and pay attention to the values you put in it.
Things get especially messy with multi-level hashes, since you are obliged to pre-declare each level as shared.
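To make the problem concrete, here is a minimal sketch of the manual approach, assuming a Perl built with thread support. The %stats hash and its keys are hypothetical; the point is that the inner hash has to be shared explicitly before it can be stored in the outer one.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

# Hypothetical two-level structure: host => { clicks => N }.
my %stats :shared;

# Each nested level must itself be shared before it is stored.
# The &share(...) call form bypasses share()'s prototype so that
# it accepts an anonymous hash.
$stats{'bit.ly'} = &share({});
$stats{'bit.ly'}{clicks} = 0;

my @workers;
for my $n (1 .. 4) {
    push @workers, threads->create(sub {
        for (1 .. 100) {
            lock(%stats);    # released at the end of each iteration
            $stats{'bit.ly'}{clicks}++;
        }
    });
}
$_->join for @workers;

my $clicks = $stats{'bit.ly'}{clicks};
print "clicks: $clicks\n";
```

Storing a plain, unshared {} in %stats would make Perl die with an "Invalid value for shared scalar" error, which is what makes deep structures tedious to build by hand.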
Luckily, there is a way to make things easier. If you upgrade threads::shared to version 1.32 using CPAN and can afford to waste some memory for a little while, you can create your objects normally and then create shared copies of them using shared_clone().
This function recursively traverses the object, creates a shared clone of each element in it, and returns a reference that you can pass around to your threads.
At that point, to save memory, you can undef() the original object and keep only the clone.
This works flawlessly for read-only objects, but it still requires some caution when you want to modify them or add data to them, since any new nested values need to be declared as shared first.
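The workflow described above can be sketched as follows, again assuming a Perl built with thread support. The data and variable names are made up for the example.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

# Build the structure normally (hypothetical sample data).
my $plain = {
    'bit.ly'      => { share => 69.63, tags => ['short', 'popular'] },
    'tinyurl.com' => { share => 7.17,  tags => ['short'] },
};

# Make a deep shared copy, then drop the original to free its memory.
my $shared = shared_clone($plain);
undef $plain;

# Read-only access from a thread needs no extra declarations.
my $reader = threads->create(sub { $shared->{'bit.ly'}{share} });
my $read   = $reader->join;

# Adding a nested value later still requires it to be shared first,
# so the new entry is cloned before it is stored.
{
    lock(%$shared);
    $shared->{'ow.ly'} = shared_clone({ share => 6.50, tags => ['short'] });
}

my $counter = threads->create(sub { scalar keys %$shared });
my $count   = $counter->join;

print "bit.ly share: $read, services: $count\n";
```

Wrapping every later addition in shared_clone() is one way to handle the caveat about modifying the structure; it costs an extra copy each time, so it suits occasional updates better than hot loops.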
Perl: if you chomp() to split(), skip the first
In Perl it is common to write a while loop around readline() to read a file's content into memory.
When the file contains tab-separated data, many use chomp() to remove the newline from each input line and then split(/\t/) to separate the values into an array.
Today, while trying to improve the performance of a program I wrote, I discovered that it was spending the same amount of time in both functions. Eliminating one of them would roughly double the speed of that loop.
If your input data does not contain spaces, you can skip the chomp() and use split(/\s+/); if it does, use split(/[\t\n]/) instead.
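Here is a minimal sketch of the single-call form. The input is a made-up in-memory string rather than a real file, and its fields contain spaces, so the example uses the /[\t\n]/ pattern.

```perl
use strict;
use warnings;

# Hypothetical tab-separated input, opened as an in-memory file.
my $data = "alice\t30\tNew York\nbob\t25\tSan Jose\n";
open my $in, '<', \$data or die "cannot open in-memory file: $!";

my @rows;
while (my $line = <$in>) {
    # Classic form: chomp($line); my @fields = split /\t/, $line;
    # Single-call form: treat the newline as one more separator.
    # split drops trailing empty fields by default, so the final
    # newline does not leave an empty element behind.
    my @fields = split /[\t\n]/, $line;
    push @rows, \@fields;
}
close $in;

print join('|', @$_), "\n" for @rows;
```

Both forms drop trailing empty columns, because split discards trailing empty fields unless it is given a negative limit. If empty columns at the end of a line matter to you, neither version preserves them as written.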

