What would the Internet be without search engines? You have a humongous amount of data, but without a tool to find what you need, you would be lost.
Today you probably take it for granted that Google is the most popular search engine. But that wasn't always the case. If you've been around on the Internet long enough (i.e. a decade or more), you probably remember the days before Google became a household name. Before the turn of the new millennium, the most popular search engine was Altavista.
Back then Altavista was considered cutting-edge and the pioneer of a truly usable Internet search engine. It was one of the top web destinations. But by the early 2000s, its popularity quickly dropped due to the arrival of a new kid on the block: Google.
When Google entered the search engine market, many people thought the market was already saturated. There were already a bunch of search engines (you may recall names such as Lycos, Magellan, InfoSeek, Excite, etc.). But Google proved them wrong. It rose steadily in prominence, and soon enough it grabbed the top spot and has remained there ever since.
So how does Google do it? And where did Altavista and those other search engines fail?
All search engines begin with a web crawler. It is a piece of software which automatically crawls the web, collecting information about every page it encounters. This data is then stored and indexed in the search engine's database, making it available for search queries from users.
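To make that a bit more concrete, here is a toy sketch in Java of what a crawler's core loop looks like (it uses the java.net.http client available since Java 11; the starting URL, the regex-based link extraction, and the in-memory "index" are purely illustrative, and a real crawler is of course far more sophisticated):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.*;
import java.util.regex.*;

// Toy crawler: fetch a page, "index" its text, queue the links it contains.
public class ToyCrawler {
    private static final Pattern LINK = Pattern.compile("href=\"(http[^\"]+)\"");

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        Deque<String> frontier = new ArrayDeque<>(List.of("http://example.com/"));
        Set<String> visited = new HashSet<>();
        Map<String, String> index = new HashMap<>();    // url -> page content

        while (!frontier.isEmpty() && visited.size() < 10) {
            String url = frontier.poll();
            if (!visited.add(url)) continue;            // skip pages we've already seen

            HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
            String body = client.send(request, HttpResponse.BodyHandlers.ofString()).body();

            index.put(url, body);                       // "store and index" the page
            Matcher m = LINK.matcher(body);
            while (m.find()) frontier.add(m.group(1));  // follow links to new pages
        }
        System.out.println("Indexed " + index.size() + " pages");
    }
}

A real crawler would also respect robots.txt, throttle its requests, and store the index on disk rather than in memory, but the fetch-index-follow cycle is the same.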
The quality of a search engine, from the user's point of view, depends on the relevance of the result set it returns for a given search term. The primary difference between old-generation search engines (Altavista et al.) and newer ones is the method used to determine the most relevant web pages to put in the result set.
The old search engines' method is based on the textual content of the web pages: a page's relevance to a search term is calculated from how many times that term occurs in the page. For example, if you search for "Nuclear Weapon", the search engine would probably put a page where this term occurs many times at the top of the result set.
That sounds reasonable, right? But it turns out that this simple method has serious flaws. Suppose you create a page which contains nothing but the term "Nuclear Weapon" repeated a dozen times. This page would rank high in the result set, despite being useless. This is why results returned using this method typically have low quality. How do you improve the quality then?
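Just to illustrate, here is a toy Java sketch of that occurrence-counting ranking, with a made-up spam page that beats a genuinely useful page (the page names and contents are invented for the example):

import java.util.*;

// Naive "old generation" ranking: score a page purely by how often the query term appears.
public class TermCountRanker {

    static int score(String pageText, String term) {
        String text = pageText.toLowerCase();
        String needle = term.toLowerCase();
        int count = 0;
        for (int i = text.indexOf(needle); i >= 0; i = text.indexOf(needle, i + 1)) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        Map<String, String> pages = Map.of(
            "useful-article", "A long analysis of nuclear weapon treaties and nuclear weapon policy ...",
            "spam-page", "nuclear weapon nuclear weapon nuclear weapon nuclear weapon nuclear weapon");

        // Rank pages by descending occurrence count.
        pages.entrySet().stream()
             .sorted((a, b) -> score(b.getValue(), "nuclear weapon") - score(a.getValue(), "nuclear weapon"))
             .forEach(e -> System.out.println(e.getKey() + ": " + score(e.getValue(), "nuclear weapon")));
    }
}

Run it and the spam page comes out on top, which is exactly the weakness described above.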
to be continued...
A while ago, for one reason or another, I remembered a good old friend from my college years in Illinois, US, whom I hadn't heard from in more than five years. We now live at opposite ends of the earth. He stayed in the US after graduating, while I went back to my home country, Indonesia.
Time passed, life kept me busy, and I no longer kept in touch with him. His old email was no longer active and we didn't have many mutual friends. So when I wanted to find out news about him, I was at a loss. What do you do when you're in a situation like this?
Well, if you use Facebook (or any other social networking site), that would most likely be the first place to go. That I did. But nope, he wasn't there. The next logical thing was to go to uncle Google. I did that too. Voila, after a few searches I found some info about him. Now I know that instead of Facebook he uses LinkedIn. I found out where he works, his Gmail address, and the things he did for college project assignments. Searching a little more, I found out that last year he had married his longtime girlfriend (they had been together since college. Dude, what took you so long...). That's not all; I even found that just recently they had a baby daughter. Had I searched deeper, I could probably have found his home address, phone number, and his shoe size (ok, probably not that one).
Pretty scary, huh?! His personal information is all exposed on the Internet. In the old days, you probably needed to be a cop or a private investigator to obtain that kind of information. Now, some Googling skills are all you need.
What about yourself? If you have used the Internet for a while and have been active (using a lot of web services, joining social networks, forum discussions, etc.), chances are your personal information is all over the web. You may think that some of the websites you use have somehow violated your privacy. But the fact is, in most cases it is you who exposes yourself to the world. Just think of what others can see in your profile on your favorite social network.
I hear it's now not uncommon for employers to find out as much as they can about prospective employees via the Internet. The Internet can tell them what you don't mention in your CV. It's easier for an employer to Google for your information than to ask your references. If you have been behaving "nicely", then you have nothing to worry about. But if you've been "bad", well, your Internet profile is like a criminal record which is difficult to erase. Perhaps now you should be more careful about the kind of footprint you leave on the Internet. In my case, it's already too late; my footprints are all over the place....
Mobile TV, especially DVB-H, is starting to gain traction nowadays. Dozens of commercial and trial services have been launched all over the world. Nevertheless, I still have my doubts about whether it will actually take off. There are several issues which I find not very appealing, and there are also a couple of things which I would like to have on my mobile TV. Thus, here is my wish list regarding this mobile TV thing:
1. More flexibility on supported subscription model
In the subscription model supported by Nokia's solution (which in turn is based on the OMA BCAST DRM Profile), the subscription process is fully controlled by subscribers, i.e. a subscriber can start, cancel, or change a subscription at (almost) any time. This is good from the subscriber's point of view, since they have freedom over their use of the service. However, in some cases the service provider (especially one that is more like a traditional cable/satellite TV broadcaster than a mobile operator) may prefer a typical subscription model where the service provider can "push" and "cancel" subscriptions centrally. They may also want to limit subscribers' ability to buy, cancel, or change subscriptions on their own.
2. Widespread adoption of SPP standards
Currently, the available open standard for Service Protection & Purchase (SPP) is the OMA BCAST DRM Profile. The issue is that I still don't see widespread adoption of the standard, resulting in a lack of support from terminal vendors. To date it seems that only Nokia, Samsung, and LG(?) produce compatible handsets. Even fewer vendors provide compliant head-end equipment. Most CAS vendors still provide proprietary OSF-based systems.
3. Variety of Terminal types
Only a few varieties of terminals, especially ones that support encrypted channels, are available in the market. More diverse terminal types, not just mobile phones, are definitely needed. They have to be affordable too.
4. Less limitation on user experience
In all the devices that I have seen, the user cannot record a TV broadcast or send the output to another device. This limit is not due to technical issues; it is intentional. I think users should have the same freedom as with typical broadcast/cable TV. I have a feeling this lack of recording capability is actually a restriction demanded by content rights owners.
5. Free mobile TV services
One way to guarantee quick market adoption is to offer free mobile TV services. Then we can forget about SPP standards and forget the lack of terminal availability. Service providers would have to think of other ways to get revenue (advertising, interactivity). You wish!!!
One of the key enabling technologies for Mobile TV is IP multicasting. IP multicast allows the distribution of streaming multimedia content with efficient bandwidth utilization.
So it was that, during a DVB-H implementation project, one of the software tools I had to develop was a "Multicast Proxy". What the tool basically does is listen on a multicast address and forward all the received packets to a different multicast address. Java came to the rescue: J2SE comes with the java.net.MulticastSocket class, which makes receiving and sending multicast packets very simple.
Typically, multicast and broadcast packets are treated with caution by the OS, because both carry the risk of flooding the network. Hence the OS usually sets the TTL (Time To Live) attribute of those packets to 1. This attribute defines the maximum number of hops the packets can go through (1 means the packet can only be received in the originating LAN). However, in my network I needed a larger TTL, since the packets had to go through several hops. Thankfully the MulticastSocket class already provides a method called setTimeToLive. On my notebook (under Windows), it worked fine and everything was hunky-dory.
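For illustration, here is a stripped-down sketch of such a proxy (the group addresses and ports are made up, and the real tool did more than this, but it shows the MulticastSocket and setTimeToLive usage):

import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.MulticastSocket;

// Minimal multicast proxy: join one group, forward every datagram to another group.
public class MulticastProxy {
    public static void main(String[] args) throws Exception {
        InetAddress sourceGroup = InetAddress.getByName("239.1.1.1");   // illustrative addresses
        InetAddress targetGroup = InetAddress.getByName("239.1.1.2");
        int sourcePort = 5000, targetPort = 5001;

        MulticastSocket receiver = new MulticastSocket(sourcePort);
        receiver.joinGroup(sourceGroup);                 // start listening on the source group

        MulticastSocket sender = new MulticastSocket();
        sender.setTimeToLive(32);                        // allow packets to cross several hops

        byte[] buffer = new byte[65535];
        while (true) {
            DatagramPacket packet = new DatagramPacket(buffer, buffer.length);
            receiver.receive(packet);                    // blocks until a datagram arrives
            sender.send(new DatagramPacket(packet.getData(), packet.getLength(),
                                           targetGroup, targetPort));
        }
    }
}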
Yet, as is always the case, problems occurred when the code was deployed to the production machine, which was running Ubuntu. Everything seemed to be working, except that no packets reached the receiving hosts. After banging my head against the wall for a while, I found that the packets could only be received in the same LAN segment. Apparently the TTL of those packets was still 1; somehow the OS ignored the TTL setting from the Java application. I had to bang my head for another while until I found the solution by Googling.
Luckily, some Linux systems provide a way to configure the TTL of IP packets using the iptables command (works on Linux kernels 2.4 - 2.6, CMIIW). In my case I used the following command:
iptables -t mangle -A OUTPUT -j TTL --ttl-set 32
The above line sets the TTL for all outgoing IP packets to 32. I know I should've set the TTL only for outgoing multicast packets, but I was too lazy to dig deeper into the iptables manual. Besides, my problem is now solved. Here's a tutorial on iptables if you're interested in learning (a lot!) more.
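(For the record, iptables can also match on the destination address, so limiting the rule to the multicast address range should look something like the line below; I haven't tried this variant myself, so take it as a sketch.)
iptables -t mangle -A OUTPUT -d 224.0.0.0/4 -j TTL --ttl-set 32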
These last few months I've been involved in a Mobile TV implementation project using DVB-H technology. It is quite an interesting assignment, as this is a rather new technology and also new to me. In fact this is also the first such implementation in Indonesia for my company, NSN. Hence, it is in my company's interest to quickly build local competency in this field. And fate had it that I had just been released from my previous project (I make it sound as if my previous project was like a prison. Well, it was). Thus, here I am, involved in this mobile TV thing.
The project itself is quite challenging. It covers end-to-end implementation from the head-end systems to the deployment of the physical DVB-H network with several transmitters. I myself was involved in the head-end implementation which includes subsystems such as the audio/video head-end, conditional access, interactivity, network management, and some business support systems.
Over the course of the project I have learned several interesting things (some of them were not so interesting back then, since I had to learn them the hard way). For reasons unknown even to myself, I somehow have the urge to write about the things I learn. Given that by now most of the work in the project is done, I find myself having the luxury of spare time. Blogging is something I've been wanting to try for a while, so I'll take this opportunity to start writing and sharing my humble experiences in this project, and perhaps also some other rambling thoughts. I have never been a big fan of writing, hence I'm hoping I can also learn something by blogging.