Speaker
Eyal Trabelsi
Material
Note
- use what you need
- keep needed columns and rows only
- avoid loops
- use vectorized operations
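As an illustration of the two points above, here is a minimal sketch contrasting a row-by-row loop with a vectorized expression. The column names `price` and `qty` and the sample values are made up for the example; they are not from the talk.

```python
import pandas as pd

# Hypothetical sample data, only for illustration.
df = pd.DataFrame({"price": [10.0, 20.0, 30.0], "qty": [1, 2, 3]})

# Loop version: each row is materialized as a Series and handled in Python.
totals_loop = []
for _, row in df.iterrows():
    totals_loop.append(row["price"] * row["qty"])

# Vectorized version: one expression over whole columns,
# evaluated by pandas/NumPy in compiled code.
totals_vec = df["price"] * df["qty"]
```

Both versions compute the same values; the vectorized one avoids the per-row Python overhead, which is typically where the time goes on large frames.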
- type matters
- supported types: int64, float64, bool, object, datetime64, timedelta64
- Category
- Sparse Types
- Nullable Integer / Nullable boolean
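To make the "type matters" point concrete, the sketch below compares the memory footprint of a repetitive string column stored as `object` versus `category`, and shows a nullable integer column. The data is invented for the example, and actual savings depend on how many distinct values a column has.

```python
import pandas as pd

# Hypothetical column with only two distinct values, stored as Python objects.
cities = pd.Series(["Tel Aviv", "Haifa", "Tel Aviv", "Haifa"] * 1000, dtype=object)

# deep=True also counts the memory held by the Python string objects.
as_object = cities.memory_usage(deep=True)
as_category = cities.astype("category").memory_usage(deep=True)

# Nullable integer: the column stays integer-typed despite a missing value,
# instead of being upcast to float64.
ids = pd.Series([1, 2, None], dtype="Int64")
```

Category stores each distinct value once plus small integer codes, so it tends to pay off when a column has few unique values relative to its length.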
- pandas usage
- chunks
- query
- use numexpr
- e.g., df[df.col == "val"] → df.query("col == 'val'")
- use concat instead of append
- groupby
- filter early
- custom functions are slow
- merge
- filter / aggregate early
- join on index
- compiled code
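A few of the pandas usage tips above can be sketched together as follows. All frames, column names, and values here are hypothetical; the snippet only illustrates `query`, a single `concat` over a list of pieces, and aggregating before joining on the index.

```python
import pandas as pd

df = pd.DataFrame({"col": ["val", "other", "val"], "n": [1, 2, 3]})

# Boolean mask and query select the same rows.
by_mask = df[df.col == "val"]
by_query = df.query("col == 'val'")

# Collect the pieces in a list and concatenate once,
# rather than growing a frame piece by piece.
parts = [df.iloc[[i]] for i in range(len(df))]
combined = pd.concat(parts, ignore_index=True)

# Aggregate early, then join on the index.
orders = pd.DataFrame({"user_id": [1, 2, 1], "amount": [5, 7, 9]})
users = pd.DataFrame({"name": ["a", "b"]}, index=pd.Index([1, 2], name="user_id"))
per_user = orders.groupby("user_id")["amount"].sum()  # one row per user
joined = users.join(per_user)  # aligns on the shared user_id index
```

Reducing `orders` to one row per user before the join keeps the joined result small, which is the idea behind "filter / aggregate early".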
- General Python techniques
- cache
- use intermediate variables
- concurrency and parallelism
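The notes do not say which caching approach the talk used; one common option in plain Python is `functools.lru_cache`, sketched here with a hypothetical `slow_square` function standing in for an expensive computation.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def slow_square(x: int) -> int:
    # Stand-in for an expensive, pure computation.
    return x * x


# Repeated arguments are served from the cache instead of being recomputed.
results = [slow_square(n) for n in [3, 3, 4, 3]]
cache_stats = slow_square.cache_info()
```

`cache_info()` reports hits and misses, which is a quick way to check whether the cache is actually being used. This only makes sense for functions whose result depends solely on their (hashable) arguments.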