Unit Predict – RL’s Demographics API

During the Cold War, the US Army ran a program called “remote viewing” that brought together civilians who purportedly could sense and describe objects and events at a distant location (read: USSR silos) given nothing but the location. Fascinating work. Now imagine doing the same in software. Given only a single location data point, your task is to write an algorithm that solves a simpler problem than our Army colleagues faced: what are the characteristics of the individual at that location?

Why is this an interesting problem to solve in the first place? Because the inventory of location data is growing fast. It is generated at every layer, from the physical layer to the network, application and content layers (e.g. EXIF metadata in images, or location data attached to tweets). But what does it really mean to have access to someone’s location? Location-based services (LBS) today typically use location as a key to retrieve or match data relevant to the user’s current position, for example recommending nearby bars or restaurants.

But can we do more than simple retrieval and matching keyed on location? Can we infer something else from location data, something close to, but not as hard as, what our Army colleagues attempted during the Cold War? Our mission at RobustLinks is to turn data into knowledge, and this problem seemed like a good fit. The history behind it is long, convoluted and informative, but the short of it is that after much head-scratching we came up with a suite of algorithms that, given a single geolocation data point for a user, predict that user’s demographics along several variables (age, gender, education, profession, income, ethnicity, number of children, home ownership, etc.). It is important to note that the algorithm is given only one data point and asked to predict the profile of the person who generated it.

We are keeping the details of how it works under wraps for now, but we have recently opened up APIs to these algorithms to the public. This article gives an overview of the first API, unit predict. The next article will describe batch predict, which improves its predictions by taking a time series of geocodes along with a radius, aggregation rules and other settings.

Unit predict is simple to use. After you’ve registered for an API key, you simply provide:

  • your API key, and
  • a single location (latitude and longitude).

There are also some advanced optional parameters that let you constrain the algorithm; I’ll cover those in another post.
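For illustration, here is a minimal Python sketch that assembles a unit predict request URL from those two inputs. The endpoint and the parameter names apikey, lat and lng are copied from the example call further down; the helper name build_unit_predict_url and its range checks are hypothetical additions of ours, not part of the documented API, and the optional advanced parameters are left out.

```python
from urllib.parse import urlencode

# Endpoint copied from the example call in this post.
UNIT_PREDICT_URL = "http://robustlinks.dyndns.org/api/globeskimmer/unitPredict/"


def build_unit_predict_url(api_key: str, lat: float, lng: float) -> str:
    """Build a unit predict request URL (hypothetical helper).

    Only the two required inputs are supported: the API key and a single
    latitude/longitude pair. The range checks are a client-side sanity
    check we added; they are not taken from the API documentation.
    """
    if not -90 <= lat <= 90:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180 <= lng <= 180:
        raise ValueError(f"longitude out of range: {lng}")
    # urlencode percent-escapes the key and formats the numbers via str().
    query = urlencode({"apikey": api_key, "lat": lat, "lng": lng})
    return f"{UNIT_PREDICT_URL}?{query}"


if __name__ == "__main__":
    print(build_unit_predict_url("YOUR_API_KEY", 31.7, -78))
```

With those arguments the sketch reproduces the example URL shown further down, with YOUR_API_KEY standing in for a real key.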

The API returns the most likely demographic profile for a person at that location, along the dimensions mentioned above. The API documentation is at:

http://robustlinks.dyndns.org/api/globeskimmer/docs/

You can call it via:

http://robustlinks.dyndns.org/api/globeskimmer/unitPredict/?apikey=<your API Key>&lat=31.7&lng=-78

which returns:

{
  "confidence": {
    "homeownership": {"own": 0, "rent": 100},
    "gender": {"male": 49, "female": 50},
    "age": {"35-44": 4, "18-24": 3, "25-34": 2, "45-64": 64, "65+": 25},
    "numberofchildren": {"5 or more": 0, "1": 4, "0": 93, "3": 0, "2": 1, "4": 0},
    "income": {"45,000 to 49,999": 1, "40,000 to 44,999": 2, "75,000 to 99,999": 7, "15,000 to 19,999": 1, "200,000 or more": 26, "10,000 to 14,999": 1, "50,000 to 59,999": 2, "30,000 to 34,999": 1, "60,000 to 74,999": 8, "100,000 to 124,999": 11, "20,000 to 24,999": 9, "150,000 to 199,999": 2, "25,000 to 29,999": 1, "Less than 10,000": 15, "125,000 to 149,999": 3, "35,000 to 39,999": 2},
    "education": {"BD": 26, "AD": 1, "GP": 15, "HS": 43, "SC": 9, "L9": 3},
    "employment": {"professional": 20, "media and entertainment": 0, "clerical and labor": 71, "service": 8},
    "ethnicity": {"hispanic": 0, "white": 61, "black": 38, "asian": 0}
  },
  "prediction": {
    "homeownership": "rent",
    "gender": "female",
    "age": "45-64",
    "numberofchildren": "0",
    "income": "200,000 or more",
    "education": "HS",
    "employment": "clerical and labor",
    "ethnicity": "white"
  },
  "weighting": "confidence",
  "version": "0.1",
  "radius": 5,
  "calls_remaining_24h": 89,
  "filterset": 5
}
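In the sample above, every field under "prediction" is the category with the highest score under "confidence" (for instance "rent" at 100 and "45-64" at 64), and the scores look like rounded percentages. The Python sketch below checks that relationship on a trimmed copy of the sample response (income omitted for brevity). Treating prediction as the argmax of confidence is our reading of this one example, not documented API behavior, and the helper names are hypothetical.

```python
import json

# Trimmed copy of the sample response above (income and metadata omitted).
SAMPLE_RESPONSE = """
{
  "confidence": {
    "homeownership": {"own": 0, "rent": 100},
    "gender": {"male": 49, "female": 50},
    "age": {"35-44": 4, "18-24": 3, "25-34": 2, "45-64": 64, "65+": 25},
    "numberofchildren": {"5 or more": 0, "1": 4, "0": 93, "3": 0, "2": 1, "4": 0},
    "education": {"BD": 26, "AD": 1, "GP": 15, "HS": 43, "SC": 9, "L9": 3},
    "employment": {"professional": 20, "media and entertainment": 0,
                   "clerical and labor": 71, "service": 8},
    "ethnicity": {"hispanic": 0, "white": 61, "black": 38, "asian": 0}
  },
  "prediction": {
    "homeownership": "rent",
    "gender": "female",
    "age": "45-64",
    "numberofchildren": "0",
    "education": "HS",
    "employment": "clerical and labor",
    "ethnicity": "white"
  }
}
"""


def top_category(scores: dict) -> str:
    """Return the category with the highest confidence score."""
    return max(scores, key=scores.get)


def check_predictions(response: dict) -> dict:
    """Map each field to (argmax of its confidence scores, reported prediction)."""
    return {
        field: (top_category(scores), response["prediction"][field])
        for field, scores in response["confidence"].items()
    }


if __name__ == "__main__":
    response = json.loads(SAMPLE_RESPONSE)
    for field, (top, reported) in check_predictions(response).items():
        print(f"{field:17} argmax={top!r:22} prediction={reported!r}")
```

If two categories ever tied for the top score, Python's max() would return whichever comes first in dictionary order, so this check should not be relied on to reproduce the server's own tie-breaking.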


Use Cases

This work started while we were trying to build an encryption service for mobile devices, which increasingly “leak” a lot of personal data. Talking to people in the marketing industry, we discovered that a lot of media provisioning and allocation is done by Nielsen-style DMA (Designated Market Area): “Buy and distribute this type of media in the Ohio region because its demographics are X, Y, Z.” When we started designing the demographic APIs, we were also thinking about application developers who have no authentication logic in their apps. Could they send us their users’ locations and get back a profile of those users? And what about ad networks? They are data aggregators, and transforming that data into knowledge about users would be valuable to them.

Join the Partner Network

We designed these APIs with some potential use cases in mind, but as the saying goes, “Man plans, God laughs.” We’ve already seen users apply the service in unanticipated ways, so if you see a use for it, feel free to give it a go. Due to resource limitations (and the potentially large volume of incoming data), we’ve had to cap the free service at 100 calls per day. Joining our partner network gives you greater access.
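The daily cap shows up in each response as calls_remaining_24h (89 in the sample above). As a minimal sketch, assuming you already have some function that performs a single unit predict call and returns the parsed JSON, a Python client could stop issuing calls once that counter runs out. The names predict_many and fetch and the min_remaining threshold are hypothetical; only the calls_remaining_24h field comes from the API's sample output.

```python
from typing import Callable, Iterable, List, Tuple

# A location is a (latitude, longitude) pair.
Location = Tuple[float, float]
# Hypothetical fetch function: takes (lat, lng), performs one unit predict
# call however you like, and returns the parsed JSON response as a dict.
Fetch = Callable[[float, float], dict]


def predict_many(locations: Iterable[Location], fetch: Fetch,
                 min_remaining: int = 0) -> List[dict]:
    """Call unit predict once per location, stopping when the quota runs low.

    After each call we read calls_remaining_24h from the response and stop
    once it is at or below min_remaining. The length of the returned list
    tells the caller how many locations were processed, so the rest can be
    retried later. A response without the field is treated as 0 remaining,
    which errs on the side of stopping.
    """
    results: List[dict] = []
    for lat, lng in locations:
        response = fetch(lat, lng)
        results.append(response)
        if response.get("calls_remaining_24h", 0) <= min_remaining:
            break
    return results


if __name__ == "__main__":
    # Offline demo: a fake fetch whose reported quota drops 2, 1, 0.
    quota = iter([2, 1, 0, 5, 5])

    def fake_fetch(lat: float, lng: float) -> dict:
        return {"prediction": {}, "calls_remaining_24h": next(quota)}

    done = predict_many([(31.7, -78.0)] * 5, fake_fetch)
    print(len(done))  # 3: the third response reports 0 calls remaining
```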

Enjoy
