Demographic prediction for unregistered users at a leading Danish media outlet.
CLIENT
- Ekstra Bladet
SERVICES USED
Data Engineering
Data Science
Data Modeling
People in the company act on data insights within 1-2 minutes and that wouldn’t be possible without the infrastructure Flowtale built for us.
- Lars Gundersen, Head of Data & Insights, Ekstra Bladet
Challenge
Ekstra Bladet, the fourth most visited website in Denmark after Google, Facebook and YouTube, holds large data sets on its user base, which is split between registered and unregistered users. For registered users, both behavioural and demographic data were available; for unregistered users, only behavioural data. Ekstra Bladet wanted to predict demographics for the unregistered users as well. The agreed approach was to build a model on the registered users, so that the demographics of unregistered users could be accurately predicted from their behaviour.
Solution
A model was created to assess potential explanatory variables in terms of their ability to predict age and gender. It also served as preliminary training of key machine learning models, in order to assess their potential predictive performance for age and gender separately. Gender, in this context, refers to a behavioural pattern that can apply to more than one sex.
Predictive Models
Using machine learning algorithms, two predictive models were created: (i) a classification model to predict age and (ii) a binary classification model to predict gender.
Performance Assessment
The models were optimised for maximum accuracy, defined as the percentage of correct classifications. Three model families were used: (i) gradient boosting, (ii) random forest, and (iii) support vector machine. Each family was considered for both age and gender. The model families were trained and benchmarked using the caret package for R.
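The accuracy definition above can be illustrated with a minimal Python sketch. It is not the caret workflow used in the project; the function names and the toy labels are made up for illustration. It also shows one possible reading of a per-class accuracy: the share of users of a given true class that the model classified correctly.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_class_accuracy(y_true, y_pred):
    """Accuracy computed separately for each true class."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    return {label: hits[label] / totals[label] for label in totals}


# Toy labels, invented for this example (not Ekstra Bladet data).
y_true = ["man", "man", "man", "man", "woman", "woman"]
y_pred = ["man", "man", "man", "woman", "woman", "man"]

overall = accuracy(y_true, y_pred)             # 4 of 6 correct
by_class = per_class_accuracy(y_true, y_pred)  # man: 3 of 4, woman: 1 of 2
print(overall, by_class)
```

A single overall figure can hide large differences between classes, which is why it can be useful to report accuracy per class as well.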
Solid Process
The task was separated into three overall steps: (i) Data research - collection and structuring of data for modelling and prediction purposes, (ii) Model building - building the appropriate model for subsequent prediction, and (iii) Prediction - prediction of gender and age.
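The three steps above can be sketched in a few lines of Python. This is a toy illustration only: it uses a simple nearest-centroid classifier as a stand-in for the gradient boosting, random forest and support vector machine models named earlier, and the two behavioural features, the age-group labels and all values are hypothetical.

```python
from math import dist


def fit_centroids(rows, labels):
    """Model building: average the feature vectors of each labelled group."""
    sums, counts = {}, {}
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, row):
    """Prediction: pick the group whose centroid is closest to the row."""
    return min(centroids, key=lambda label: dist(centroids[label], row))


# Data research: behaviour of registered users with known demographics.
# Hypothetical features: share of page views in two content sections.
registered_behaviour = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]
registered_age_group = [">44", ">44", "<=44", "<=44"]

# Unregistered users have behaviour only, no demographics.
unregistered_behaviour = [[0.7, 0.3], [0.3, 0.7]]

centroids = fit_centroids(registered_behaviour, registered_age_group)
predictions = [predict(centroids, row) for row in unregistered_behaviour]
print(predictions)
```

The key design point is that the model only ever sees behavioural features, so whatever is learned from registered users can be applied unchanged to unregistered users.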
Results
The breakdown of Ekstra Bladet's user base by gender and age cannot be disclosed under Flowtale's Secrecy and Protection policies. However, the model accuracies were as follows:
92%
“Man” prediction accuracy
48%
“Woman” prediction accuracy
83%
Age prediction, e.g. >44 years