Ryan Curtin
Ryan earned a Ph.D. from Georgia Tech in 2015, studying the acceleration of statistical algorithms. During his time there, he became the maintainer of the mlpack C++ machine learning library in 2009, and he has been contributing to the C++ data science ecosystem ever since. Ryan also maintains the ensmallen optimization library and the Bandicoot GPU linear algebra library, and contributes to the Armadillo linear algebra library. His interest is in making machine learning fast, both through high-quality, efficient implementations and through the choice of asymptotically efficient algorithms.

Sessions
How big should a machine learning deployment be? What is a reasonable size for a microservice container that performs logistic regression? Although Python is the dominant ecosystem for data science work, it doesn't provide satisfactory answers to the size question. Size matters: if deploying to the edge or to low-resource devices, there is a strict upper bound on how large the deployment can be; if deploying to the cloud, size (and compute overhead) corresponds directly to cost. In this talk, I will show how typical data science pipelines can be rewritten directly in C++ in a straightforward way, and that this can provide both performance improvements and massive size reductions, with total deployment sizes sometimes in the single-digit megabytes.
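As a rough illustration of what such a rewritten pipeline might look like, here is a minimal sketch of training and running logistic regression with mlpack in C++. It assumes mlpack 4's single-header include and flattened namespace, uses Armadillo matrix types, and the CSV filenames are placeholders rather than anything from the talk itself.

// Minimal sketch: load data, train logistic regression, classify new points.
// Assumes mlpack 4 (single header <mlpack.hpp>); filenames are placeholders.
#include <mlpack.hpp>

int main()
{
  // mlpack stores data column-major: each column is one data point.
  arma::mat trainData, testData;
  arma::Row<size_t> trainLabels;
  mlpack::data::Load("train_data.csv", trainData, true /* fatal on failure */);
  mlpack::data::Load("train_labels.csv", trainLabels, true);
  mlpack::data::Load("test_data.csv", testData, true);

  // Train the model; the constructor runs the optimization.
  mlpack::LogisticRegression<> lr(trainData, trainLabels);

  // Classify the held-out points.
  arma::Row<size_t> predictions;
  lr.Classify(testData, predictions);

  predictions.print("predictions");
}

Compiled with optimizations and linked only against a BLAS/LAPACK implementation, a program like this stays far smaller than a typical Python container image, which is the kind of size difference the talk discusses.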