wnd's weblog

phx.gbl and Microsoft's search

3 Oct 2007 13:38:39 misc

Nobody uses Microsoft’s Live Search. Why does it appear in my webserver logs much more often than Google? Is Microsoft just trying to boost their visibility through statistics?

I often have tail printing Apache’s access log on one part of my virtual desktop. This doesn’t serve any particular purpose, but when I get really bored, I may have a look at it, usually to see how people have landed on my web site. Most of the time there’s nothing worthwhile going on, just Google, MSN, and Yahoo crawling my site, over and over again. Sometimes I can spot botnet zombies trying to find a vulnerable server, usually through well-known PHP applications, but that’s all – most of the time.

Like so many times before, I was bored, and took a look at the log. Against all the odds there were traces of a real human being accessing my site, using IE 7! Wow. To make this occasion even more special, he used Microsoft’s Live Search to get there. Funny, I didn’t know people actually used Microsoft’s search engine. And how did he get here? Using just “forbidden” as the keyword? I quickly pasted the referer URL into a web browser to see what kind of results the search would return. 57 million generic hits, none on the first page referring to my page. I suppose some people are as bored as I am. Oh well.

I continued to read the logs, and spotted another hit from Microsoft’s search, using “kyoto”. And another, this one using “conditioning”. The IPs do not match, but they’re similar. Are these IPs for a proxy of some sort?

- - [03/Oct/2007:09:17:10 +0000] "GET /photos/china_2005/00000029.html HTTP/1.0" 200 2934 "http://search.live.com/results.aspx?q=forbidden&mrt=en-us&FORM=LIVSOP" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322)"
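Picking the search term out of a hit like this does not have to be done by eye. Here is a rough Python sketch, assuming the log uses Apache's standard "combined" format. The regular expression, the `search_query` helper and the `192.0.2.1` placeholder address are mine, not something taken from the actual log.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Apache "combined" format:
# host ident user [time] "request" status bytes "referer" "agent"
LOG_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def search_query(line):
    """Return the q= value from the referer of one log line, or None."""
    match = LOG_RE.match(line)
    if match is None:
        return None
    params = parse_qs(urlsplit(match.group("referer")).query)
    values = params.get("q")
    return values[0] if values else None


# 192.0.2.1 is a documentation placeholder, not the visitor's real address.
sample = (
    '192.0.2.1 - - [03/Oct/2007:09:17:10 +0000] '
    '"GET /photos/china_2005/00000029.html HTTP/1.0" 200 2934 '
    '"http://search.live.com/results.aspx?q=forbidden&mrt=en-us&FORM=LIVSOP" '
    '"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322)"'
)
print(search_query(sample))
```

Run over the whole access log, something like this would list every keyword these visitors supposedly typed.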

% dig -x
;; ANSWER SECTION:
646  IN      PTR     bl2sch1081905.phx.gbl.
55.65.in-addr.arpa.     2380    IN      NS      NS1.MSFT.NET.
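
For reference, `dig -x` reverses the octets of the address and asks for a PTR record under in-addr.arpa. A small Python sketch of the same thing follows. The helper names and the `192.0.2.1` example address are placeholders of mine, and `ptr_lookup` goes through the system resolver, so it may not give exactly what dig shows.

```python
import ipaddress
import socket


def reverse_name(ip):
    """Build the in-addr.arpa (or ip6.arpa) name that a reverse lookup queries."""
    return ipaddress.ip_address(ip).reverse_pointer


def ptr_lookup(ip):
    """Ask the system resolver for the reverse name; None if there is none."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except (socket.herror, socket.gaierror):
        return None


# Placeholder documentation address, not one of the addresses from the log.
print(reverse_name("192.0.2.1"))  # 1.2.0.192.in-addr.arpa
```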

gbl? MSFT.NET? Microsoft?

% whois
OrgName:    Microsoft Corp
OrgID:      MSFT
Address:    One Microsoft Way
City:       Redmond
StateProv:  WA
PostalCode: 98052
Country:    US

NetRange: -

I took a closer look at the logs and realised that MSNBot had crawled most of the target pages just moments before this Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322) found them through the search. (Log file.) Smells fishy. And what is this gbl TLD anyway? It’s not a valid TLD, that’s for sure. Is Microsoft trying to distort web browser user agent statistics, or what is this?
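
The "crawled just moments before" pattern can be checked mechanically. Below is a sketch of one way to do it, assuming the log lines have already been split into (timestamp, path, user agent) tuples. The ten-minute window, the msnbot user agent string and the first sample entry are my assumptions for illustration; only the 09:17:10 request comes from the log above.

```python
from datetime import datetime, timedelta

TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"  # Apache's default timestamp layout


def crawled_just_before(entries, window=timedelta(minutes=10)):
    """Yield (path, gap) for non-bot hits that follow an msnbot hit on the same path.

    entries: iterable of (timestamp string, path, user agent), in log order.
    """
    last_crawl = {}  # path -> time of the most recent msnbot request
    for stamp, path, agent in entries:
        when = datetime.strptime(stamp, TIME_FORMAT)
        if "msnbot" in agent.lower():
            last_crawl[path] = when
        elif path in last_crawl and when - last_crawl[path] <= window:
            yield path, when - last_crawl[path]


# The first entry is made up for illustration; the second is the hit from the post.
entries = [
    ("03/Oct/2007:09:15:02 +0000", "/photos/china_2005/00000029.html",
     "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"),
    ("03/Oct/2007:09:17:10 +0000", "/photos/china_2005/00000029.html",
     "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322)"),
]
for path, gap in crawled_just_before(entries):
    print(path, gap)
```

A short gap on nearly every such hit would support the suspicion; scattered gaps would point to coincidence.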

[Searching Google for .phx.gbl][google] returns links to a number of websites, but none gives a definite answer. It seems that phx.gbl is tightly related to Microsoft, as it also appears with other Microsoft services such as MSN Chat and Hotmail. The wildest theories go as far as to suggest that phx.gbl, or [Phoenix Global Information Systems][int1], is used to monitor traffic that goes through Microsoft’s public servers.

I don’t really know what’s going on, but I wonder why Microsoft can’t make the IPs reverse-resolve properly. Also, since it’s quite obvious that there’s no ordinary web browser at the other end of the connection, why does it pretend to be one? Put your tinfoil hats on and start watching your neighbours.