The Arrow revolution in pandas and Dask
11-03, 10:15–10:55 (America/New_York), Winter Garden (Room 5412)

The pandas library for data manipulation and data analysis is the most widely used open source data science software library. Dask is the natural extension for scaling pandas workloads to more than a single machine. The continuing integration and adoption of Apache Arrow accelerates historical bottlenecks in both libraries.


Pandas reached an important milestone earlier this year with the 2.0 release, which brought DataFrames backed by Arrow instead of the historical NumPy backend. We will learn how the 2.0 release shapes pandas as a tool for Data Analysis and what future releases will bring, including a more efficient string implementation, general purpose Arrow-backed DataFrames, Copy-on-Write and better integration with the ecosystem.

Dask is a library for distributed computing with Python and integrates tightly with pandas and other libraries from the PyData stack. Dask underwent a number of changes over the last year, which are targeted at more efficient and faster computations. We'll discuss some of these changes in more detail, including:

  • A more efficient shuffling algorithm based on Arrow
  • Arrow-backed string implementations by default
  • A logical query optimization layer for Dask DataFrames

We'll look ahead and discuss what the future holds and these changes and ongoing efforts position Dask in the Big Data ecosystem among competitors like Spark.

Whether you are a new or advanced users, you'll love this if you want to learn about recent or upcoming changes that are targeted towards running workflows with more efficiency or faster


Prior Knowledge Expected

No previous knowledge expected

Patrick Hoefler is a member of the pandas core team and a Dask maintainer. He is currently working at Coiled where he focuses on Dask development and the integration of a logical query planning layer into Dask. He holds a Msc degree in Mathematics and works towards a Msc in Software engineering at the University of Oxford.

This speaker also appears in: