IP Address to Geolocation

Background


A few months ago I found an interesting website, http://ipinfodb.com/, which provides an API that can "translate" any IP address into a geographic location, including city/region/country, latitude/longitude and time zone information. Invoking the API requires a registered API key (which is free). Since I was already storing visitors' IP addresses in my own database, I decided to use the IPInfoDB API to store visitors' geolocations as well.

Just a few days ago, an idea came to me: summarize those geolocation records and display them on Google Maps. Hmm, it is feasible :)

So the process is: track visitors' IP addresses -> "translate" them into geographic locations -> show them on Google Maps!

(PS: I have been using Google Analytics for my Geek Place, http://WayneYe.com, for more than two years. It is no doubt extremely powerful, and it already has a "Map Overlay" feature. However, due to its privacy policy, Google Analytics does NOT display visitors' IP addresses, see http://www.google.com/support/analytics/bin/answer.py?hl=en&answer=86214.)

Implementation


The first task is to track the visitor's IP address. Most of the time, a user visiting a website in a browser submits an HTTP GET request, usually carried over the Transmission Control Protocol (TCP). The browser first asks a DNS server to resolve the host name into an IP address, and then sends the request to the destination web server. Along the way, the original HTTP request may pass through a number of routers, proxies and other intermediaries, and proxies may have updated its headers. Via (a standard HTTP header) or X-Forwarded-For (non-standard but widely used) may carry the original client's or ISP's IP address, or the IP address of one of the proxies.
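To make the header format concrete, here is a small JavaScript sketch. It uses JavaScript only because the page script later in this post does; the site itself runs ASP.NET. The function name pickClientIp is hypothetical. It assumes the de facto X-Forwarded-For format: a comma-separated list in which each proxy appends the address it received the request from, so the leftmost entry is the claimed original client. When the header is absent, it falls back to the connection's remote address.

```javascript
// Hypothetical helper (not part of ASP.NET): choose the most likely client IP.
// forwardedFor: raw X-Forwarded-For value, e.g. "client, proxy1, proxy2" (may be undefined).
// remoteAddress: the address of the peer that actually opened the TCP connection.
function pickClientIp(forwardedFor, remoteAddress) {
  if (typeof forwardedFor === "string" && forwardedFor.trim() !== "") {
    // The leftmost entry is the address claimed for the original client.
    const first = forwardedFor.split(",")[0].trim();
    if (first !== "") {
      return first;
    }
  }
  // No usable header: use the connection's own peer address.
  return remoteAddress;
}

console.log(pickClientIp("117.136.8.14, 10.0.0.1", "10.0.0.2")); // 117.136.8.14
console.log(pickClientIp(undefined, "10.0.0.2")); // 10.0.0.2
```

Because any client can send this header, the returned value is only a hint. It still has to be validated before it goes anywhere near a database.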

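So when the server receives the request, it can learn the visitor's IP address from the connection itself or from the Via/X-Forwarded-For headers. This is NOT always the real visitor's address: sometimes it is only the ISP's or a proxy's. In ASP.NET, the simplest way is to call Request.UserHostAddress, which returns the address of the immediate peer; forwarded headers can be read from Request.Headers. However, we can never simply trust this value, for two major reasons: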

  1. A malicious client can forge an HTTP request with a modified X-Forwarded-For header (for example, X-Forwarded-For: <dangerous code>). If you are unlucky enough to trust it and insert it into the database as-is, the malicious client can exploit the resulting SQL injection hole.

  2. Not all visitors are human beings; some of them are search engine spiders. I must distinguish human visitors from spiders. Otherwise I will be "happy" to see a lot of "visitors" coming from "Mountain View, CA" ^_^.


For #1, I use a regular expression to validate the string I get from Request.UserHostAddress:

using System;
using System.Text.RegularExpressions;

public static Boolean IsValidIP(string ip)
{
    if (String.IsNullOrEmpty(ip))
        return false;

    // Anchor the pattern (^ at the start, \z at the very end) so that extra text around
    // the address, e.g. "1.2.3.4'; DROP TABLE ...", is rejected instead of matched.
    if (!Regex.IsMatch(ip, @"^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\z"))
        return false;

    // The pattern guarantees exactly four numeric octets; each must be in the range 0-255.
    foreach (string octet in ip.Split('.'))
        if (Int32.Parse(octet) > 255)
            return false;

    return true;
}

If the address is "0.0.0.0", I ignore it.
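The anchoring in the pattern is what actually blocks a forged value. The JavaScript sketch below mirrors the idea so the difference can be run and seen. The name isUsableIPv4 and the sample table name Visits are hypothetical, and the final check reflects the "ignore 0.0.0.0" rule above.

```javascript
// Without ^ and $, any string that merely *contains* an IPv4-looking substring passes.
const unanchored = /[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}/;
const anchored = /^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$/;

// Hypothetical helper: true only for a well-formed, non-zero IPv4 address.
function isUsableIPv4(ip) {
  if (typeof ip !== "string" || !anchored.test(ip)) {
    return false;
  }
  // Each of the four octets must be in the range 0..255.
  if (!ip.split(".").every((octet) => Number(octet) <= 255)) {
    return false;
  }
  // "0.0.0.0" is syntactically valid but is not a real client address, so skip it.
  return ip !== "0.0.0.0";
}

const forged = "1.2.3.4'; DROP TABLE Visits;--";
console.log(unanchored.test(forged)); // true (the forged value slips through)
console.log(isUsableIPv4(forged)); // false
console.log(isUsableIPv4("117.136.8.14")); // true
console.log(isUsableIPv4("256.1.1.1")); // false
```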
For #2, so far I haven't found a perfect way to solve this problem (and I suspect there is no perfect way to identify every search engine in the world; please correct me if I am wrong). However, I have defined two rules that do their best to identify crawlers in general, normal situations:

Rule #1:


If a request contains a "Cookie" header with "ASP.NET_SessionId" AND its value equals the server-side session ID, then it should come from a normal user who has already visited my website within the current session.

Note: there are two exceptions to rule #1:

  1. If the user's browser has cookies disabled, this rule will NOT be effective, since the request will never contain a Cookie header :).

  2. If a crawler crawls my website and accepts cookies, rule #1 will not be effective either. However, I don't think a crawler will first request a session ID and then request again with that session ID.
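Rule #1 as stated compares the cookie's value, not just its presence. Here is a hedged JavaScript sketch of that comparison. The function names parseCookieHeader and looksLikeReturningBrowser are hypothetical, and serverSessionId stands for whatever session ID the server holds (Session.SessionID in ASP.NET). It splits the raw header ("name1=value1; name2=value2") into pairs instead of doing a substring search.

```javascript
// Hypothetical helper: parse a raw Cookie header into a Map of name -> value.
function parseCookieHeader(header) {
  const cookies = new Map();
  if (typeof header !== "string") {
    return cookies;
  }
  for (const part of header.split(";")) {
    const eq = part.indexOf("=");
    if (eq === -1) {
      continue; // not a name=value pair
    }
    const name = part.slice(0, eq).trim();
    if (name === "" || cookies.has(name)) {
      continue; // skip nameless pairs; keep the first occurrence of a name
    }
    cookies.set(name, part.slice(eq + 1).trim());
  }
  return cookies;
}

// Rule #1: the session cookie must be present AND match the server-side session ID.
function looksLikeReturningBrowser(cookieHeader, serverSessionId) {
  const sessionId = parseCookieHeader(cookieHeader).get("ASP.NET_SessionId");
  return sessionId !== undefined && sessionId === serverSessionId;
}

console.log(looksLikeReturningBrowser("theme=dark; ASP.NET_SessionId=abc123", "abc123")); // true
console.log(looksLikeReturningBrowser("note=ASP.NET_SessionId", "abc123")); // false
```

Note the second call: a plain "header contains ASP.NET_SessionId" test would accept that header, while parsing the pairs does not.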


Rule #2:


Define a list of crawler names and check whether the "User-Agent" header contains any of them; the list should be configurable. For more crawler examples, see: http://en.wikipedia.org/wiki/Web_crawler#Examples_of_Web_crawlers
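One way to make the list configurable is to keep it as a single comma-separated setting and parse it at startup. The JavaScript sketch below assumes that setting format. The function names parseCrawlerList and isCrawlerUserAgent are hypothetical, and the names are the same ones used in the C# code in this post. Matching is a case-insensitive substring test on the User-Agent.

```javascript
// Hypothetical: turn a setting like "google, bing, msn" into lowercase names.
function parseCrawlerList(setting) {
  return setting
    .split(",")
    .map((name) => name.trim().toLowerCase())
    .filter((name) => name !== "");
}

// Rule #2: a request is treated as a crawler if its User-Agent contains any listed name.
function isCrawlerUserAgent(userAgent, crawlerNames) {
  if (!userAgent) {
    return false;
  }
  const ua = userAgent.toLowerCase();
  return crawlerNames.some((name) => ua.includes(name));
}

const crawlers = parseCrawlerList("google, bing, msn, yahoo, baidu, sosospider, sogou, youdao");
console.log(isCrawlerUserAgent("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)", crawlers)); // true
console.log(isCrawlerUserAgent("Mozilla/5.0 (Windows NT 6.1; rv:2.0) Gecko/20100101 Firefox/4.0", crawlers)); // false
```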

Talk is cheap, show me the code. I wrote a method that identifies crawlers by applying the two rules above:

using System;
using System.Globalization;
using System.Web;

public static Boolean IsCrawlerRequest()
{
    HttpRequest request = HttpContext.Current.Request;

    // Rule 1: a request whose "Cookie" header contains "ASP.NET_SessionId" (ideally with a value equal to the
    // server-side session ID) should come from a normal user. Only malicious forging breaks this; I don't think
    // a crawler will first request a session ID and then request again with that session ID.
    //if (HttpContext.Current.Request.Cookies["ASP.NET_SessionId"] != null
    //    && HttpContext.Current.Request.Cookies["ASP.NET_SessionId"].Value == HttpContext.Current.Session.SessionID)
    string cookieHeader = request.Headers["Cookie"];
    if (cookieHeader != null && cookieHeader.Contains("ASP.NET_SessionId"))
        return false; // Should be a normal user browsing my website with a browser.

    // Rule 2: check whether the "User-Agent" header contains a known crawler name; this list should be configurable.
    // For more crawler examples, see: http://en.wikipedia.org/wiki/Web_crawler#Examples_of_Web_crawlers
    var crawlerList = new String[] { "google", "bing", "msn", "yahoo", "baidu", "sosospider", "sogou", "youdao" };

    if (!String.IsNullOrEmpty(request.UserAgent))
    {
        // Lower-case the User-Agent once, outside the loop.
        string userAgent = request.UserAgent.ToLower(CultureInfo.InvariantCulture);
        foreach (String bot in crawlerList)
            if (userAgent.Contains(bot))
                return true; // It is a crawler.
    }

    return false;
}

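Please be aware that I commented out the HttpContext.Current.Request.Cookies["ASP.NET_SessionId"] != null check, because I found that Request.Cookies ALWAYS contains "ASP.NET_SessionId", EVEN IF the browser has disabled cookies. I will investigate further and double-check later! As a result, the active check only tests whether the raw Cookie header mentions "ASP.NET_SessionId"; it does not compare the value with the server-side session ID.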

OK, now we have normal users' IP addresses, with search engine crawlers filtered out. The next step is to invoke the IPInfoDB API to "translate" each IP address into a geolocation. Register for an API key on http://ipinfodb.com/, and then submit an HTTP GET request to:
http://api.ipinfodb.com/v2/ip_query.php?key=[API KEY]&ip=[IP Address]&timezone=false
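When building that URL in code, it is safer to let a URL API encode the query values than to concatenate strings. Below is a JavaScript sketch using the standard URL and URLSearchParams classes. The function name buildIpQueryUrl and the placeholder key YOUR_API_KEY are hypothetical, and the endpoint and parameters are exactly the ones quoted above.

```javascript
// Hypothetical helper: build the v2 ip_query URL from the article.
// searchParams.set() percent-encodes each value, so unexpected characters
// in the key or the IP cannot break the query string.
function buildIpQueryUrl(apiKey, ip) {
  const url = new URL("http://api.ipinfodb.com/v2/ip_query.php");
  url.searchParams.set("key", apiKey);
  url.searchParams.set("ip", ip);
  url.searchParams.set("timezone", "false");
  return url.toString();
}

console.log(buildIpQueryUrl("YOUR_API_KEY", "117.136.8.14"));
// http://api.ipinfodb.com/v2/ip_query.php?key=YOUR_API_KEY&ip=117.136.8.14&timezone=false
```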

It returns the XML below; taking IP 117.136.8.14 as an example:
<?xml version="1.0" encoding="UTF-8"?>
<Response>
<Status>OK</Status>
<CountryCode>CN</CountryCode>
<CountryName>China</CountryName>
<RegionCode>23</RegionCode>
<RegionName>Shanghai</RegionName>
<City>Shanghai</City>
<ZipPostalCode></ZipPostalCode>
<Latitude>31.005</Latitude>
<Longitude>121.409</Longitude>
<Timezone>0</Timezone>
<Gmtoffset>0</Gmtoffset>
<Dstoffset>0</Dstoffset>
<TimezoneName></TimezoneName>
<Isdst></Isdst>
<Ip>117.136.8.14</Ip>
</Response>
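To store the fields, the response has to be read back out of that XML. Here is a deliberately minimal JavaScript sketch that works only for the flat, one-level layout shown above: no attributes, no nesting, no CDATA. The function names readTag and parseIpQueryResponse are hypothetical, and a real XML parser is the safer choice for anything beyond this simple shape.

```javascript
// Return the text of the first <tag>...</tag> element, or "" if it is absent.
function readTag(xml, tag) {
  const match = xml.match(new RegExp("<" + tag + ">([^<]*)</" + tag + ">"));
  return match ? match[1] : "";
}

// Hypothetical: pick out the fields needed to place a visitor on the map.
function parseIpQueryResponse(xml) {
  return {
    status: readTag(xml, "Status"),
    countryName: readTag(xml, "CountryName"),
    regionName: readTag(xml, "RegionName"),
    city: readTag(xml, "City"),
    latitude: parseFloat(readTag(xml, "Latitude")),
    longitude: parseFloat(readTag(xml, "Longitude")),
  };
}

const sample =
  "<Response><Status>OK</Status><CountryName>China</CountryName>" +
  "<RegionName>Shanghai</RegionName><City>Shanghai</City><ZipPostalCode></ZipPostalCode>" +
  "<Latitude>31.005</Latitude><Longitude>121.409</Longitude></Response>";
console.log(parseIpQueryResponse(sample));
```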

Wow, looks precise :) Next, I am going to show visitors' geolocations on Google Maps. (I know this compromises visitors' privacy, but my personal blog http://WayneYe.com is not a company and I will NEVER earn a cent from doing this :))

Anyway, I use the latest Google Maps JavaScript API V3, and there are two major features:

1. Display the visitor's own geolocation, as long as the user's browser supports the "navigator.geolocation" property. Google Chrome and Mozilla Firefox support it; IE does not, in which case I default the location to New York City. A sample is below:

(Screenshot: VisitorInfo)

2. Display the geolocations of a specific blog post's visitors on Google Maps. The screenshot below shows the geolocations of visitors to my post "My new Dev box - HP Z800 Workstation"; clicking each geolocation shows it on the map.

(Screenshot: Visitors of "My new Dev box - HP Z800 Workstation")

The JavaScript code is below:
<script type="text/javascript">
    var newyork = new google.maps.LatLng(40.69847032728747, -73.9514422416687); // Default location
    var browserSupportFlag = false;
    var map;
    var myOptions;
    var infowindow = new google.maps.InfoWindow();

    function initialize() {
        myOptions = {
            zoom: 6,
            center: newyork, // Show a map right away while waiting for the geolocation result
            mapTypeId: google.maps.MapTypeId.ROADMAP
        };
        map = new google.maps.Map(document.getElementById("googleMapContainer"), myOptions);

        // Try W3C Geolocation (preferred)
        if (navigator.geolocation) {
            browserSupportFlag = true;
            navigator.geolocation.getCurrentPosition(function (position) {
                var visitorLocation = new google.maps.LatLng(position.coords.latitude, position.coords.longitude);
                map.setCenter(visitorLocation);
                infowindow.setContent('Hi, dear WayneYe.com visitor! You are here:)');
                infowindow.setPosition(visitorLocation);
                infowindow.open(map);
            }, function () {
                // The user denied the request or the position could not be determined.
                handleNoGeolocation(browserSupportFlag);
            });
        } else {
            // The browser does not support geolocation (e.g. older versions of IE).
            browserSupportFlag = false;
            handleNoGeolocation(browserSupportFlag);
        }

        function handleNoGeolocation(errorFlag) {
            // Fall back to New York City. The info window has no content or position yet,
            // so it is not opened here.
            map.setCenter(newyork);
            //infowindow.setContent('Cannot track your location, default to New York City.');
            //infowindow.setPosition(newyork);
            //infowindow.open(map);
        }
    }

    // Centers the map on a visitor's geolocation and shows its label there.
    function setGoogleMapLocation(geoLocation, latitude, longitude) {
        var visitorLocation = new google.maps.LatLng(latitude, longitude);

        map.setCenter(visitorLocation);
        infowindow.setContent(geoLocation);
        infowindow.setPosition(visitorLocation);
        infowindow.open(map);
    }
</script>

OK, all done! Eventually I built a visit record page that shows the visitors' geolocations for every blog post: http://wayneye.com/VisitRecord.
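A page like that needs the stored records summarized per post. The JavaScript sketch below shows one way to do it, grouping by post and counting visits per location. The record shape { post, city, country, latitude, longitude } and the function name summarizeVisits are assumptions for illustration, not the site's actual schema. Each resulting entry carries a label plus coordinates, which is exactly what setGoogleMapLocation(geoLocation, latitude, longitude) above expects.

```javascript
// Hypothetical: records look like { post, city, country, latitude, longitude }.
// Returns { [post]: [{ label, latitude, longitude, visits }, ...] }, most-visited first.
function summarizeVisits(records) {
  const byPost = new Map();
  for (const r of records) {
    if (!byPost.has(r.post)) {
      byPost.set(r.post, new Map());
    }
    const locations = byPost.get(r.post);
    const key = r.city + ", " + r.country;
    // The first record seen for a location supplies its coordinates.
    const entry = locations.get(key) || { label: key, latitude: r.latitude, longitude: r.longitude, visits: 0 };
    entry.visits += 1;
    locations.set(key, entry);
  }
  const summary = {};
  for (const [post, locations] of byPost) {
    summary[post] = [...locations.values()].sort((a, b) => b.visits - a.visits);
  }
  return summary;
}

const visits = [
  { post: "My new Dev box - HP Z800 Workstation", city: "New York", country: "United States", latitude: 40.7, longitude: -74.0 },
  { post: "My new Dev box - HP Z800 Workstation", city: "Shanghai", country: "China", latitude: 31.005, longitude: 121.409 },
  { post: "My new Dev box - HP Z800 Workstation", city: "Shanghai", country: "China", latitude: 31.005, longitude: 121.409 },
];
console.log(summarizeVisits(visits)["My new Dev box - HP Z800 Workstation"][0].label); // Shanghai, China
```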
