Why does your data differ from the WHOIS data?
The purpose of the WHOIS data is to identify the entity (person or company) to which a block of addresses has been delegated. It is essentially an ISP Map, rather than an IP Address Map. The Geobytes map has the granularity of a single subnet, placing each subnet individually within the geographic area that it services. The WHOIS data has a granularity of “company” (or ISP), placing all subnets “allocated” to a given ISP in the same geographic location. Compounding the problem, the large and successful ISPs that carry the bulk of the Internet’s traffic tend to service wide geographic areas. Given that the WHOIS data locates all of an ISP’s address space in the same city, it is difficult to see how the WHOIS data, even if it were kept up to date, could be geographically accurate.
On the other hand, the purpose of the Geobytes map is to map IP Addresses to geographical locations. To achieve this we acquire seed data from a number of source websites. Each of these sites asks the web surfer to provide their geographic location, and this location, along with the user’s IP Address, is forwarded to us as seed data. We then run this data through a series of algorithms which identify and extract corroborating seed points.
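One simple way to picture seed points that agree with one another is a vote per subnet. The sketch below is an illustration only and is not the Geobytes algorithm, which this FAQ does not describe. The function names, the /24 grouping, and the min_points and min_share thresholds are all assumptions made for the example.

```python
from collections import Counter, defaultdict
import ipaddress


def subnet_of(ip: str) -> str:
    """Return the /24 subnet that contains an IPv4 address."""
    return str(ipaddress.ip_network(f"{ip}/24", strict=False))


def corroborated_locations(seed_points, min_points=3, min_share=0.6):
    """Keep a subnet's location only when enough seed points agree on it.

    seed_points: iterable of (ip, reported_location) pairs.
    min_points and min_share are hypothetical thresholds for illustration.
    """
    votes = defaultdict(Counter)
    for ip, location in seed_points:
        votes[subnet_of(ip)][location] += 1

    result = {}
    for subnet, counter in votes.items():
        location, count = counter.most_common(1)[0]
        total = sum(counter.values())
        # Accept the leading location only if it has enough support.
        if count >= min_points and count / total >= min_share:
            result[subnet] = location
    return result


# Made-up seed data using documentation address ranges.
seeds = [
    ("203.0.113.5", "Sydney"),
    ("203.0.113.77", "Sydney"),
    ("203.0.113.200", "Sydney"),
    ("203.0.113.9", "Perth"),
    ("198.51.100.14", "Toronto"),
]
print(corroborated_locations(seeds))  # {'203.0.113.0/24': 'Sydney'}
```

In this toy data the first subnet has three of four seed points agreeing, so it is kept, while the second has a single seed point and is dropped.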
Apart from the accuracy issue, a further problem with using WHOIS data is that it contains phantom addresses: addresses that have been allocated but are not used (that is, addresses that are not configured in any ISP’s BGP routing tables). Only about 10% of the theoretical 4 billion IP addresses are actually routable across the net. Traffic for the remaining 90% will make it as far as your ISP’s BGP router and then go nowhere. What this means is that a WHOIS-based database will have to be 10 times larger than it would otherwise need to be.
I entered a random IP Address and it could not be located. Why not?
Because our map is built from real traffic, it does not contain the locations of Phantom IP Addresses. A Phantom IP Address is one that does not appear in any ISP’s BGP (Border Gateway Protocol) tables and accordingly cannot carry traffic. Our research shows that 9 out of 10 of the theoretical total of 256*256*256 = 16,777,216 subnets are either “Phantom” subnets or don’t carry any traffic at all. Apart from Phantom addresses, our map may not contain addresses of infrequently used, very low traffic subnets. Where it does contain very low traffic subnets, its resolutions may not be as dependable, due to the proportionately low number of seed points that we have for these subnets. This is one of the weaknesses of our mapping method, but its impact on real-world performance is limited because it is confined to the large number of subnets which carry practically no traffic.
On the other hand, because our map’s accuracy is largely based on the available seed points, and because the number of seed points we will have for a given subnet is proportional to the traffic that it carries, our map’s accuracy will be very high where it matters most: where its resolutions will affect the most traffic.
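The behaviour described above, where a lookup simply finds nothing for a subnet that is not in the map, can be sketched as follows. This is a minimal illustration, assuming that a “subnet” means a /24 block (which is what the 256*256*256 figure implies); the SUBNET_MAP dictionary, its sample row, and the locate function are hypothetical stand-ins, not the actual Geobytes data or API.

```python
import ipaddress

# Hypothetical in-memory stand-in for the map: /24 subnet -> location record.
SUBNET_MAP = {
    "203.0.113.0/24": {"country": "AU", "city": "Sydney"},
}


def locate(ip: str):
    """Return the location record for an IPv4 address, or None if its
    /24 subnet is not present in the map."""
    subnet = str(ipaddress.ip_network(f"{ip}/24", strict=False))
    return SUBNET_MAP.get(subnet)


print(256 ** 3)                # 16777216 theoretical /24 subnets
print(locate("203.0.113.42"))  # {'country': 'AU', 'city': 'Sydney'}
print(locate("192.0.2.1"))     # None: subnet is not in the sample map
```

A caller should therefore treat a missing result as “unknown” rather than as an error.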
How does this work?
The technology developed by Geobytes is unique. It allows for the non-intrusive geographic location of most Internet users to their town, city or region in real time. The technology locates Internet users anywhere on the globe, without profiling, prompting or tracking the user in any way. It does NOT use the InterNIC’s WHOIS database or DNS lookups. In particular, cookies are not used and the privacy of Internet users is not invaded. No information about any Internet user is requested, collected or used in any way whatsoever.
How does it work with AOL addresses?
We can only resolve AOL and MSN TV traffic to country level. AOL and MSN TV (formerly Web TV) are special cases, and this limitation does not apply to other ISPs.
They are special cases because they act as non-transparent proxies; as such, their addresses cannot be resolved to any granularity finer than country. As far as we know, no one can resolve these addresses to anything finer than country level.
To accommodate this, we include the AOL and MSN TV proxy addresses in the subnets table. These addresses have been allocated to the country they service; however, the city and region will show AOL or MSN TV. This should reduce or eliminate the special handling required to cater for the special case that AOL presents.
For countries where AOL has a “proxy network” we create a phantom city called “AOL” within the country, and then show this as the “city” for the AOL subnets that service this country.
The countries in which AOL has proxy networks are:
We also provide a list of ‘Proxy Network’ addresses so that these AOL addresses can be easily identified. The ProxyNetworks table also includes the relevant cityid for each subnet.
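A client application might use the two tables together roughly as shown below. This is only a sketch: the FAQ names a subnets table and a ProxyNetworks table but does not give their layouts, so the dictionary structure, field names, sample rows, and the resolve function are all assumptions.

```python
import ipaddress

# Hypothetical rows; the real table layouts and values are not given here.
SUBNETS = {
    "192.0.2.0/24": {"country": "US", "region": "AOL", "city": "AOL"},
    "203.0.113.0/24": {"country": "AU", "region": "New South Wales", "city": "Sydney"},
}
# Hypothetical stand-in for the ProxyNetworks table: the set of proxy subnets.
PROXY_NETWORKS = {"192.0.2.0/24"}


def resolve(ip: str):
    """Look up an address and report city/region only when they are meaningful."""
    subnet = str(ipaddress.ip_network(f"{ip}/24", strict=False))
    record = SUBNETS.get(subnet)
    if record is None:
        return None
    is_proxy = subnet in PROXY_NETWORKS
    return {
        "country": record["country"],
        # Proxy traffic resolves to country only, so hide the placeholder values.
        "region": None if is_proxy else record["region"],
        "city": None if is_proxy else record["city"],
        "is_proxy": is_proxy,
    }


print(resolve("192.0.2.10"))    # country only, flagged as proxy
print(resolve("203.0.113.10"))  # full country/region/city record
```

Because the proxy subnets already carry a valid country, code that only needs country-level results can ignore the proxy flag entirely.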
How are ISPs that operate in multiple regions handled?
Our map is not ISP-based, so such ISPs have no impact. Our map is based on the service areas of the Internet’s routers, gateways, firewalls, etc.
With so many subnets carrying little or no traffic, and so few subnets carrying almost all of the traffic, how do you fairly work out your map’s accuracy?
The distribution of traffic across the Internet’s subnets is extremely uneven, so when calculating the accuracy of an IP Address Map it is important to test against a ‘Typical’ traffic profile. 87% of the Internet’s traffic originates from only 10% of its active subnets (with approximately 94% of the theoretical total of 256*256*256 = 16,777,216 subnets not carrying traffic at all). 5% of the active subnets carry 74% of the net’s traffic. If an analysis is made without considering the proportion of traffic carried by each subnet tested, then the conclusions drawn are likely to be meaningless.
Let’s say that there are 1 million active subnets of which 10% (100,000) carry 90% of the Internet’s traffic. Let’s also say that our accuracy on these 100,000 subnets is 99% and that our accuracy on the remaining 900,000 subnets is only 50%.
In this scenario, if we were to randomly test our accuracy against the entire set of active subnets, without regard to the proportion of traffic carried by each subnet, then we would conclude an accuracy rate of only about 55% (99% accuracy on 10% of the subnets, and 50% accuracy on the other 90%). Clearly, these results would be flawed, and would not be representative of the accuracy obtained under a normal operating environment.
The proportion of traffic carried by each subnet needs to be incorporated into the overall accuracy calculation. If this methodology is applied to the above example, then the overall accuracy would be about 94% (99% accuracy on 90% of the net’s traffic, and 50% accuracy on 10%).
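The worked example can be checked with a few lines of Python. This is a minimal sketch: the GROUPS layout and the function names are illustrative assumptions, and the figures are the hypothetical ones from the example, not measured data. With those figures, the unweighted result comes to 54.9% and the traffic-weighted result to 94.1%.

```python
# Hypothetical figures taken from the worked example, not measured data.
GROUPS = [
    {"subnets": 100_000, "traffic_share": 0.90, "accuracy": 0.99},
    {"subnets": 900_000, "traffic_share": 0.10, "accuracy": 0.50},
]


def unweighted_accuracy(groups):
    """Accuracy when every subnet counts equally, regardless of its traffic."""
    total_subnets = sum(g["subnets"] for g in groups)
    return sum(g["subnets"] * g["accuracy"] for g in groups) / total_subnets


def traffic_weighted_accuracy(groups):
    """Accuracy when each group is weighted by its share of total traffic."""
    total_share = sum(g["traffic_share"] for g in groups)
    return sum(g["traffic_share"] * g["accuracy"] for g in groups) / total_share


print(f"unweighted:       {unweighted_accuracy(GROUPS):.1%}")        # 54.9%
print(f"traffic-weighted: {traffic_weighted_accuracy(GROUPS):.1%}")  # 94.1%
```

The gap between the two results comes entirely from the weighting: the same per-group accuracies are used in both calculations.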
What percent of the time can it correctly identify the country of an IP address?
We are 97% accurate to country level.
And what percent of the time can it correctly identify the city?
On average you can expect our map to be accurate to within 100 km 80.25% of the time, and to within 50 km 75% of the time, with even greater accuracy in high-traffic areas.
What addresses can you resolve?
We can resolve 98% of addresses. The addresses which we cannot resolve are mainly infrequently used or very low traffic addresses.