The dangers of storytelling with feature importance
11-02, 16:05–16:45 (America/New_York), Radio City (Room 6604)

It's common for machine learning practitioners to train a supervised learning model, generate feature importance metrics, and then use those values to tell a data story suggesting which interventions would drive the outcome variable in a favorable direction (e.g. "X was an important feature in our churn prediction model, so we should consider doing more X to reduce churn"). This simply does not work, and the idea that standard feature importance measures can be interpreted causally is one of data science's more enduring myths. In this session we'll talk through why that interpretation fails, what feature importance is actually good for, and give a brief overview of a simple causal feature importance approach: Meta Learners. This talk should be relevant to machine learning practitioners of any skill level who want to gain actionable, causal insights from their predictive models.


Slides for this talk can be found here: https://docs.google.com/presentation/d/1F_cDWQvF1uXWqYIXlC5TTGM3ApPI92k4/edit?usp=sharing&ouid=106648805068606228158&rtpof=true&sd=true

Code used to generate toy datasets and plots can be found here: https://colab.research.google.com/drive/1UF9HGssZ105BeB3-iosnR-OJFUz-GqWU?usp=sharing

This talk is intended to be informative and a bit playful (since I'll be pushing back on a commonly taught data science workflow). The only prior knowledge that would be helpful is at least modest experience with machine learning modeling in the usual open-source data stack.

I will explain to the audience that standard feature importance metrics have a very specific use and should never be interpreted causally (even though in practice they are interpreted that way all the time). Feature importance metrics are very widely used in data science, so I expect that everyone at the conference who has ever built a supervised machine learning model will have used them at some point.
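To ground what I mean by "standard feature importance metrics," here is a minimal sketch of the usual workflow, using scikit-learn's impurity-based importances. The churn framing, feature names, and data-generating process are all hypothetical, invented purely for illustration:

    # Hypothetical churn example: train a model, then read off importances.
    import numpy as np
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(42)
    n = 5_000
    X = pd.DataFrame({
        "tenure_months": rng.integers(1, 60, n),
        "support_tickets": rng.poisson(2, n),
        "discount_used": rng.integers(0, 2, n),
    })
    # Simulated churn outcome, loosely tied to the features
    logits = -0.05 * X["tenure_months"] + 0.4 * X["support_tickets"]
    y = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(int)

    model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

    # A purely predictive ranking of the features -- not a causal one
    for name, imp in zip(X.columns, model.feature_importances_):
        print(f"{name}: {imp:.3f}")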

In this talk I'll explain the two things feature importance measures can legitimately do: 1) show you which features contribute little predictive ability with regard to your outcome, which you can use to prune and simplify your ETL, and 2) explain in a purely mathematical way why your model made the prediction it did. I'll demonstrate that beyond this they tell us very little, and certainly little about causality in the real world. I'll explain how confounding makes standard feature importance measures problematic. I'll then give a gentle overview of a class of methods known as "Meta Learners", which are very simple to implement and give the modeler clear causal interpretations of the impact of their features. Modelers can then use these estimates to help direct real-world action (e.g. "based on these results, if we want to reduce churn we should consider modifying X and Z"). Attendees should be able to use these Meta Learning methods immediately after this session.
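As a toy illustration of the confounding problem (my own construction here, echoing the ice-cream-and-crime example from the outline): suppose temperature drives both ice cream sales and crime. Ice cream sales have no causal effect on crime, yet a model that omits the confounder will hand them a large importance score:

    # Confounder "temperature" drives both the feature and the outcome.
    import numpy as np
    import pandas as pd
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(0)
    n = 5_000
    temperature = rng.normal(20, 8, n)                       # confounder
    ice_cream_sales = 3 * temperature + rng.normal(0, 5, n)  # caused by temperature
    crime_rate = 2 * temperature + rng.normal(0, 5, n)       # also caused by temperature

    # The confounder is left out of the model, as often happens in practice
    X = pd.DataFrame({"ice_cream_sales": ice_cream_sales,
                      "noise": rng.normal(0, 1, n)})
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(X, crime_rate)
    print(dict(zip(X.columns, model.feature_importances_.round(3))))
    # ice_cream_sales dominates, but banning ice cream would not reduce crime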

Talk outline:

  • What good is feature importance anyways? (10 minutes)
      • Feature importance in a minute
      • Confounders
      • Correlation vs causation: Ice cream and crime
      • Model pruning and prediction explainability
  • A causal alternative: Meta Learners (20 minutes)
      • S-Learner, T-Learner, X-Learner
      • The causalml package API (see the sketch after this outline)
      • Average and individual treatment effects
  • Closing remarks and Q&A (10 minutes)
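For the causalml portion, a sketch along the lines of the package's own quick-start (https://github.com/uber/causalml) looks like the following; exact signatures can vary across versions, so treat this as illustrative rather than definitive:

    # Estimating average and individual treatment effects with causalml
    from causalml.dataset import synthetic_data
    from causalml.inference.meta import LRSRegressor, XGBTRegressor, BaseXRegressor
    from xgboost import XGBRegressor

    # Synthetic data with a known treatment effect (and propensity scores e)
    y, X, treatment, tau, b, e = synthetic_data(mode=1, n=1000, p=5, sigma=1.0)

    # S-Learner: one model, with the treatment indicator as just another feature
    s_learner = LRSRegressor()
    ate, lb, ub = s_learner.estimate_ate(X=X, treatment=treatment, y=y)
    print(f"S-Learner ATE: {ate[0]:.2f} ({lb[0]:.2f}, {ub[0]:.2f})")

    # T-Learner: separate models for the treated and control groups
    t_learner = XGBTRegressor(random_state=42)
    ate, lb, ub = t_learner.estimate_ate(X=X, treatment=treatment, y=y)

    # X-Learner: cross-imputed effects, weighted by the propensity score
    x_learner = BaseXRegressor(learner=XGBRegressor(random_state=42))
    ate, lb, ub = x_learner.estimate_ate(X=X, treatment=treatment, y=y, p=e)

    # Individual (conditional) treatment effects, one estimate per row of X
    cate = x_learner.fit_predict(X=X, treatment=treatment, y=y, p=e)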

Prior Knowledge Expected: Previous knowledge expected

I spent nearly a decade employing causal modeling and inference in academia as an epidemiologist, and since 2015 I've been applying these approaches as an industry data scientist / ML engineer. I'm also a member of the open-source community as the author and maintainer of the causal-curve python package (https://github.com/ronikobrosly/causal-curve). I am currently a Director of Data Science at Capital One.