Lightweight, low-overhead, high-performance: machine learning directly in C++
11-02, 11:40–12:20 (America/New_York), Central Park West (Room 6501)

How big should a machine learning deployment be? What is a reasonable size for a microservice container that performs logistic regression? Although Python is the dominant ecosystem for data science work, it doesn't provide satisfactory answers to the size question. Size matters: when deploying to the edge or to low-resource devices, there's a strict upper bound on how large the deployment can be; when deploying to the cloud, size and compute overhead correspond directly to cost. In this talk, I will show how typical data science pipelines can be rewritten directly in C++ in a straightforward way, and that this provides both performance improvements and massive size reductions, with total deployment sizes sometimes in the single-digit megabytes.


The focus here is on efficiency in machine learning deployments. Efficiency has two angles: size and speed. Here we are primarily concerned with size, but speed comes into the picture as well.

In this talk I will start by introducing core parts of the C++ data science ecosystem, centered around the mlpack machine learning library. This includes Armadillo (a linear algebra library), ensmallen (an optimization library), Bandicoot (a new GPU linear algebra library), xtensor/xframe (n-dimensional arrays and dataframes), xeus interactive C++ notebooks, and more. I will show a few examples of how these tools can be used to solve common data science problems, and share some thoughts on adapting Python data science code to C++.

Then, in the heart of the talk, I will turn to compiling C++ code for deployment: both cross-compiling for use on edge devices such as MPUs, and compiling for local use while keeping the binary small (and the code optimized). I will also discuss strategies for further code size reduction and dependency trimming, as well as future directions for projects within the C++ data science ecosystem.


Prior Knowledge Expected

No previous knowledge expected

Ryan earned a Ph.D. studying the acceleration of statistical algorithms at Georgia Tech in 2015. During his time there, he became the maintainer of the mlpack C++ machine learning library (in 2009), and has been contributing to the C++ data science ecosystem ever since. Ryan also maintains the ensmallen optimization library and the Bandicoot GPU linear algebra library, and contributes to the Armadillo linear algebra library. His interest is in making machine learning fast: both through high-quality, efficient implementations and through the choice of asymptotically efficient algorithms.