Speaker

Ian Ozsvald

Material

Note

pandas

saving RAM

  • Strings are expensive and slow → use Categorical
    • e.g., df.CompanyCategory.astype('category')
    • cheaper and faster
  • float64 is the default and a bit expensive
    • float32 is "half-price" (half the memory per value) and a bit faster
  • dtype_diet: a tool that suggests smaller dtypes for each column
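
A minimal sketch of the two savings above, assuming a made-up DataFrame. The column names `CompanyCategory` and `age_years` come from these notes; the labels, row count, and values are invented for illustration. `memory_usage(deep=True)` is used so that the string payload itself is counted.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 100,000 rows with a handful of repeated string labels.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "CompanyCategory": rng.choice(["retail", "finance", "energy"], size=100_000),
    "age_years": rng.uniform(0, 100, size=100_000),  # float64 by default
})

# deep=True includes the memory held by the strings themselves.
str_bytes = df["CompanyCategory"].memory_usage(deep=True, index=False)
cat_bytes = df["CompanyCategory"].astype("category").memory_usage(deep=True, index=False)

# Numeric columns: compare the default float64 with a float32 copy.
f64_bytes = df["age_years"].memory_usage(index=False)
f32_bytes = df["age_years"].astype("float32").memory_usage(index=False)

print(f"strings: {str_bytes:,} bytes -> category: {cat_bytes:,} bytes")
print(f"float64: {f64_bytes:,} bytes -> float32: {f32_bytes:,} bytes")
```

The saving from Categorical depends on how often labels repeat; a column of mostly unique strings will gain little. float32 also carries less precision, so it is worth checking that results still match closely enough for the task.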

drop to NumPy if you know you can

  • e.g., df['age_years'].sum() is much slower than df['age_years'].values.sum()
    • bypasses a lot of method lookups inside pandas
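
A rough way to check this on your own machine, assuming a small made-up column (the name `age_years` is from the note, the data is not). Timings vary by pandas version and column size, so the sketch prints them rather than promising a ratio. It also shows one reason for the "if you know you can" caveat: the two calls treat NaN differently.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical column of 1,000 floats.
df = pd.DataFrame({"age_years": np.arange(1_000, dtype="float64")})

pandas_total = df["age_years"].sum()
numpy_total = df["age_years"].values.sum()  # sum on the underlying ndarray

# Time 1,000 repetitions of each call.
pandas_time = timeit.timeit(lambda: df["age_years"].sum(), number=1_000)
numpy_time = timeit.timeit(lambda: df["age_years"].values.sum(), number=1_000)
print(f"pandas: {pandas_time:.4f}s  numpy: {numpy_time:.4f}s")

# Caveat: the results differ when missing values are present.
with_nan = pd.Series([1.0, np.nan, 2.0])
print(with_nan.sum())         # pandas skips NaN by default
print(with_nan.values.sum())  # NumPy propagates NaN
```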

install optional pandas dependencies (e.g., bottleneck and numexpr, which pandas uses automatically when present)

mistakes slow us down

Other than pandas

  • compile with Numba
  • Dask for multi-core
    • makes plain Python code run on multiple cores
  • Vaex
  • Modin



Published

Category

EuroPython 2020