Batch Predict Demographics API

You may have read about or even used our Unit Predict API, which allows one to predict the likely demographic makeup of an individual standing at some geographic coordinates.

While the usefulness of inferring user data from a single data point should not be underestimated, we realized there was room for improvement, as well as for a wider range of use cases!

For example, suppose you collected a set of data points from an app user’s mobile device. Maybe something like this:


2:13 AM EST 06/01/2013 (lower east side)

2:03 PM EST 06/01/2013 (rockaway beach)

7:45 AM EST 06/01/2013 (murray hill)

10:45 PM EST 06/06/2013 (bushwick)

7:15 AM EST 06/07/2013 (murray hill)

8:01 PM EST 06/09/2013 (atlantic city NJ)

7:23 PM EST 06/11/2013 (midtown)

3:30 PM EST 06/13/2013 (midtown)

11:45 AM EST 06/14/2013 (midtown)

7:15 PM EST 06/15/2013 (miami FL)

From the times, dates, and locations, one might at a glance assume something like:

“User X lives in Murray Hill, works in midtown, and likes beaches.”

High-level inferences of this nature are left up to the API user.  You know your data and your user base better than we do!

And note further that although you may have guessed something about the user’s behavior, you still don’t know whether said user is black or white, old or young, and so on. (I didn’t say “he” or “she” because you don’t know that either!)

So we’ve built new services on top of those available in UnitPredict to allow API users to programmatically deal with aggregate data sets like the above in order to more easily draw intelligent conclusions or filter data based on their domain knowledge.  The basic flow is:

  • Give us your batch of data points
  • Give us some guidelines about what you’re looking for
  • We’ll send back a predicted demographics profile, based only on those data points you told us were relevant.  No more “eyeballing” of data – you don’t need to look at a map.  So simple, even a computer could do it.

Let’s see how it works, using data from the above example.

The basic batchPredict query

Let’s see what the query looks like without filtering.  In effect, we are aggregating all points in our data set as being “equally valid” in our attempt to build a likely demographic profile for a user.


Our unfiltered prediction indicates a user who is professionally employed, rents a home, is white and has no children.

 

Getting ready: compiling your data series

We’re guessing you collect your data programmatically, in which case you should be able to generate an API call programmatically from your stored data. To send us a data series, generate a string of data points from your data set, each in the format

series=lat,lng,DATE_UTC

A GET request allows you to string data points together, e.g.:
series=lat1,lng1,DATE_UTC&series=lat2,lng2,DATE_UTC2&series=lat3,lng3,DATE_UTC3…
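
As a rough sketch of how that string might be assembled from stored data, here is one way to do it in Python. The coordinates are approximate and purely illustrative, and the ISO 8601 timestamp format is an assumption on our part; check the API documentation for the exact DATE_UTC format.

```python
from urllib.parse import urlencode

# Illustrative points: (latitude, longitude, UTC timestamp).
# Coordinates are approximate; the ISO 8601 timestamp format is an
# assumption, not something confirmed by the API documentation.
points = [
    (40.715, -73.9843, "2013-06-01T06:13:00Z"),
    (40.7479, -73.9757, "2013-06-01T11:45:00Z"),
]

def build_series_query(points):
    """Return a query string with one repeated 'series' parameter per point."""
    pairs = [("series", f"{lat},{lng},{when}") for lat, lng, when in points]
    # Keep commas and colons readable instead of percent-encoding them.
    return urlencode(pairs, safe=",:")

query = build_series_query(points)
print(query)
```

The result can be appended to the request URL after the endpoint and your API key.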

Why UTC?

UTC (Coordinated Universal Time), which for practical purposes matches GMT (Greenwich Mean Time), is a globally accepted standard. Given the number of time zones out there, some offset by whole hours and some by half hours, and given that daylight saving time can vary even from county to county in a given state, there has to be some kind of standard.
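
If your stored timestamps are in local time, they need converting before you send them. The sketch below assumes a fixed GMT-04:00 offset (New York during daylight saving time) and an ISO 8601 output format; both the helper name and the output format are illustrative assumptions rather than part of the API.

```python
from datetime import datetime, timedelta, timezone

# Fixed offset for New York during daylight saving time (GMT-04:00).
NYC_OFFSET = timezone(timedelta(hours=-4))

def to_utc_string(local_text, tz=NYC_OFFSET):
    """Parse 'H:MM AM/PM MM/DD/YYYY' as local time; return an ISO 8601 UTC string."""
    local = datetime.strptime(local_text, "%I:%M %p %m/%d/%Y").replace(tzinfo=tz)
    return local.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

early = to_utc_string("2:13 AM 06/01/2013")
late = to_utc_string("10:45 PM 06/06/2013")
print(early)  # 2013-06-01T06:13:00Z
print(late)   # 2013-06-07T02:45:00Z (rolls over to the next UTC day)
```

Note how the late-evening point lands on the following day in UTC, which is exactly the kind of ambiguity a single standard avoids.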

You’ll get to specify time zone where appropriate, as described immediately following.

 

Filtering by time of day

Perhaps you’ve decided to assume that your app user is near home if it’s night or early morning. Along with the data set above, you can specify a relevance parameter (“timeOfDay”) and a timeOfDay parameter, with choices of MORNING (5-9 AM), WORKDAY (9 AM-5 PM), and so on (see our API documentation for the complete list). You’ll also need to tell us the time zone as it relates to GMT (in May, and also as of this writing, New York is GMT-04:00 thanks to daylight saving time).
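
To get a feel for which points a time-of-day filter would keep, here is a small client-side sketch. The service does the filtering for you; this only mimics the idea. The two windows are the ones quoted above, while the boundary handling (start inclusive, end exclusive), the helper name, and the ISO 8601 timestamps are assumptions.

```python
from datetime import datetime, timedelta, timezone

# Windows quoted in the post, as local hours. Treating the start as
# inclusive and the end as exclusive is an assumption.
WINDOWS = {"MORNING": (5, 9), "WORKDAY": (9, 17)}

def in_window(utc_text, offset_hours, window):
    """True if the UTC timestamp falls inside the named window in local time."""
    utc = datetime.strptime(utc_text, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    local = utc.astimezone(timezone(timedelta(hours=offset_hours)))
    start, end = WINDOWS[window]
    return start <= local.hour < end

# A few points from the sample series, converted to UTC (GMT-04:00 local).
series = [
    ("lower east side", "2013-06-01T06:13:00Z"),  # 2:13 AM local
    ("murray hill", "2013-06-01T11:45:00Z"),      # 7:45 AM local
    ("murray hill", "2013-06-07T11:15:00Z"),      # 7:15 AM local
    ("midtown", "2013-06-13T19:30:00Z"),          # 3:30 PM local
]

morning = [place for place, ts in series if in_window(ts, -4, "MORNING")]
print(morning)  # ['murray hill', 'murray hill']
```

Only the two Murray Hill points survive a MORNING filter, which fits the guess that the user lives there.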

Here we take the above query and see what a user’s demographic profile might look like if we only consider location data collected during the MORNING, local NYC time:


Given this filtered data set, the Globeskimmer algorithms now have enough confidence to predict additional demographic dimensions: the user’s likely education level (Bachelor’s degree), gender (female), and age (25-34).

 

Filtering by location

You might also be wondering where your app user is generally based.  As is clear from the above data set, people can get around like never before.  Is your user an airline pilot, a secret bigamist, or simply taking a lot of vacations?

Whatever your take on user behavior, we make it easy to filter your data set by location. Continuing with the above example, perhaps you’ve decided the user probably lives in the New York Metropolitan Area, and you want to remove any outliers from the prediction data. To accomplish this you need to specify a bounding box (bbox for short) and a relevance parameter (“space”).
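
As a client-side illustration of what a bounding box filter does, the sketch below checks which points fall inside a box. The coordinates and box corners are approximate and chosen only for illustration, and the BBox type is hypothetical; the field order the API expects for bbox is not shown here, so consult the API documentation for that.

```python
from typing import NamedTuple

class BBox(NamedTuple):
    """Hypothetical bounding box helper; not part of the API."""
    south: float
    west: float
    north: float
    east: float

    def contains(self, lat, lng):
        """True if the point lies inside the box (edges included)."""
        return self.south <= lat <= self.north and self.west <= lng <= self.east

# A generous box around the New York area (approximate corners).
nyc = BBox(south=40.4, west=-74.3, north=41.0, east=-73.6)

# Approximate coordinates for a few places in the sample series.
points = [
    ("lower east side", 40.715, -73.984),
    ("bushwick", 40.694, -73.921),
    ("atlantic city NJ", 39.364, -74.423),
    ("miami FL", 25.762, -80.192),
]

kept = [name for name, lat, lng in points if nyc.contains(lat, lng)]
print(kept)  # ['lower east side', 'bushwick']
```

Atlantic City and Miami drop out as outliers, while the New York points remain.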

Here we take the original query and see what a user’s demographic profile might look like if we only consider location data collected within a generous bounding box around the New York area:


Not much information there, perhaps – our locus has expanded to the point that we are considering almost all of the points in our original, unfiltered query.

But suppose you’ve determined by other means, such as your mobile app’s internal data, that your user lives somewhere in Brooklyn.   We can set the bounding box accordingly:

…and find that your user may be in a different line of work than previously predicted (clerical and labor).

 

These are a few of the interesting features of batchPredict. Sign up for a free API key to learn more!
