Speaker
李泓旻 (Andrew)
Material
Note
Problem to solve
Ray
- core concept
- tasks
- stateless
- return a future (an object reference to the result of the task)
- idempotence (a stateless task can simply be re-executed to recover its result)
- actors
- stateful
- actor handles can be passed to other actors or tasks
- tasks
import ray
# initialize a Ray cluster (by default on your local machine)
ray.init()
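The task/actor model above can be illustrated without a cluster. The sketch below is a standard-library analogy, not Ray's API: `ThreadPoolExecutor.submit()` plays the role of `f.remote()`, and `Future.result()` plays the role of `ray.get()`. A task is a pure function, so re-running it gives the same value; an actor is modeled as an object whose method calls run one at a time on its own single-thread executor, and its handle is passed into tasks. The names `square`, `CounterActor` and `bump` are hypothetical.

```python
from concurrent.futures import Future, ThreadPoolExecutor


# --- Task: a stateless function whose calls return futures ---
def square(x: int) -> int:
    # Pure function: no hidden state, so a retry yields the same value (idempotent).
    return x * x


# --- Actor: a stateful object whose method calls are serialized ---
class CounterActor:
    def __init__(self) -> None:
        self._count = 0
        # One worker thread processes method calls one at a time, in order,
        # so the internal state is never mutated concurrently.
        self._mailbox = ThreadPoolExecutor(max_workers=1)

    def _incr(self) -> int:
        self._count += 1
        return self._count

    def incr(self) -> Future:
        # Analogous to counter.incr.remote(): returns a future, not the value.
        return self._mailbox.submit(self._incr)

    def shutdown(self) -> None:
        self._mailbox.shutdown(wait=True)


def bump(counter: CounterActor, times: int) -> int:
    # A "task" that receives an actor handle and calls methods on it.
    last = 0
    for _ in range(times):
        last = counter.incr().result()
    return last


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(square, i) for i in range(4)]  # ~ square.remote(i)
        print([f.result() for f in futures])                   # [0, 1, 4, 9]

        counter = CounterActor()
        jobs = [pool.submit(bump, counter, 5) for _ in range(3)]
        for job in jobs:
            job.result()
        print(counter.incr().result())  # 16: the 15 concurrent increments were serialized
        counter.shutdown()
```

In Ray itself the same shape would be written with `@ray.remote` on the function and on the class, `.remote()` to call them, and `ray.get()` to resolve the returned object references.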
- components
- global control store
- maintains the cluster's control state
- key-value store with pub-sub functionality
- benefits
- fault tolerance
- low latency
- global scheduler
- local scheduler
- in-memory object store
- Plasma
- stores the inputs and outputs of stateless computation (tasks)
- on each node, Ray implements the object store in shared memory
- external storage is also supported (objects can be spilled out of memory, e.g. to disk)
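To make the global control store bullets concrete, here is a toy, single-process key-value store with publish/subscribe. It is only an illustration of the idea, not Ray's GCS, which is distributed and fault tolerant. One reading of the listed benefits: because control state lives in the store, other components can restart and reload it (fault tolerance), and subscribers are pushed updates instead of polling (low latency). `ControlStore` and the key format are hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, List

Callback = Callable[[str, Any], None]


class ControlStore:
    """Toy in-process key-value store with pub-sub (hypothetical, not Ray's GCS)."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}
        self._subscribers: DefaultDict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, key: str, callback: Callback) -> None:
        # Register interest in a key; the callback fires on every later write.
        self._subscribers[key].append(callback)

    def put(self, key: str, value: Any) -> None:
        # Write the value, then push it to subscribers (the "pub" half).
        self._data[key] = value
        for callback in self._subscribers[key]:
            callback(key, value)

    def get(self, key: str, default: Any = None) -> Any:
        return self._data.get(key, default)


if __name__ == "__main__":
    gcs = ControlStore()
    events = []
    # e.g. a scheduler waiting to learn where an object was created
    gcs.subscribe("object:42:location", lambda k, v: events.append((k, v)))
    gcs.put("object:42:location", "node-B")
    print(gcs.get("object:42:location"), events)
```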
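The global and local schedulers work together bottom-up, as described in the Ray paper: a task goes first to its node's local scheduler, which runs it locally unless the node is overloaded, and only then forwards it to the global scheduler, which picks another node. The sketch below assumes a simplified placement rule (least-loaded node by queue length); the paper's global scheduler also accounts for factors such as where input objects live. All class names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    capacity: int  # queued tasks at which the node counts as overloaded (assumption)
    queue: List[str] = field(default_factory=list)

    def overloaded(self) -> bool:
        return len(self.queue) >= self.capacity


class GlobalScheduler:
    def __init__(self, nodes: List[Node]) -> None:
        self.nodes = nodes

    def place(self, task: str) -> Node:
        # Simplified policy: the least-loaded node that still has room.
        candidates = [n for n in self.nodes if not n.overloaded()]
        if not candidates:
            raise RuntimeError("cluster is full")
        target = min(candidates, key=lambda n: len(n.queue))
        target.queue.append(task)
        return target


class LocalScheduler:
    def __init__(self, node: Node, global_scheduler: GlobalScheduler) -> None:
        self.node = node
        self.global_scheduler = global_scheduler

    def submit(self, task: str) -> Node:
        # Bottom-up: keep the task local when possible (cheap, low latency),
        # escalate to the global scheduler only when this node is overloaded.
        if not self.node.overloaded():
            self.node.queue.append(task)
            return self.node
        return self.global_scheduler.place(task)


if __name__ == "__main__":
    a, b = Node("A", capacity=2), Node("B", capacity=2)
    local_a = LocalScheduler(a, GlobalScheduler([a, b]))
    print([local_a.submit(f"t{i}").name for i in range(4)])  # ['A', 'A', 'B', 'B']
```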
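The point of a per-node object store in shared memory is that a worker on the same node can read a task's output by attaching to the memory segment instead of receiving a copy over a socket. Plasma objects are also immutable once written. The toy below shows both ideas with Python's `multiprocessing.shared_memory` (Python 3.8+); it is not Plasma's API, and `NodeObjectStore` and `read_object` are hypothetical names.

```python
from multiprocessing import shared_memory
from typing import Dict, Tuple


class NodeObjectStore:
    """Toy per-node object store: one shared-memory segment per object (hypothetical)."""

    def __init__(self) -> None:
        # object_id -> (segment, payload length); the OS may round segment sizes up
        self._objects: Dict[str, Tuple[shared_memory.SharedMemory, int]] = {}

    def put(self, object_id: str, payload: bytes) -> Tuple[str, int]:
        # Write-once: an object cannot be overwritten after it is stored.
        if object_id in self._objects:
            raise KeyError(f"object {object_id!r} already stored")
        shm = shared_memory.SharedMemory(create=True, size=max(len(payload), 1))
        shm.buf[: len(payload)] = payload
        self._objects[object_id] = (shm, len(payload))
        # A reader on the same node only needs the segment name and length.
        return shm.name, len(payload)

    def free(self, object_id: str) -> None:
        shm, _ = self._objects.pop(object_id)
        shm.close()
        shm.unlink()  # release the segment for every process on the node


def read_object(segment_name: str, length: int) -> bytes:
    # What a worker process would do: attach to the segment by name.
    shm = shared_memory.SharedMemory(name=segment_name)
    try:
        with shm.buf[:length] as view:
            # Copied here only so the handle can be closed; a real reader
            # could work on the view directly without copying.
            return bytes(view)
    finally:
        shm.close()


if __name__ == "__main__":
    store = NodeObjectStore()
    name, size = store.put("obj-1", b"task output")
    print(read_object(name, size))  # b'task output'
    store.free("obj-1")
```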
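- How to handle Python dependencies?
- pass a runtime environment to ray.init()
runtime_env = {"pip": ["requests"]}  # illustrative: packages to install for the job's workers
ray.init(runtime_env=runtime_env)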
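A fuller runtime environment is just a dictionary. The sketch below uses three fields that Ray's runtime environment documentation describes (`working_dir`, `pip`, `env_vars`); the values are placeholders for a hypothetical project, and the `ray.init` call is left as a comment so the snippet runs without Ray installed.

```python
# Placeholder values for a hypothetical project; the keys are runtime_env fields.
runtime_env = {
    "working_dir": ".",                    # local code directory shipped to workers
    "pip": ["requests"],                   # pip packages installed for the workers
    "env_vars": {"OMP_NUM_THREADS": "1"},  # environment variables set in workers
}

# With Ray installed:
#   import ray
#   ray.init(runtime_env=runtime_env)
```

Passing the environment at `ray.init()` applies it to the whole job, so every worker sees the same dependencies without installing them on each node by hand.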
Modin
import modin.pandas as pd  # drop-in replacement for: import pandas as pd
- Why Modin?
- high pandas API coverage (over 90%)
- What if some pandas API is not supported?
- fall back to "default to pandas" mode
- fallback to
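The "default to pandas" fallback can be pictured as a dispatch rule: operations with a native (parallel) implementation run natively, and anything else is converted to the reference implementation, run there with a warning, and converted back. Modin does roughly this with pandas and a `UserWarning`; the sketch below reproduces only the dispatch idea with the standard library, and `FastFrame` / `ReferenceFrame` are hypothetical stand-ins for a Modin DataFrame and pandas.

```python
import statistics
import warnings
from typing import Any, List


class ReferenceFrame:
    """Stands in for pandas: the complete, single-threaded implementation."""

    def __init__(self, values: List[float]) -> None:
        self.values = list(values)

    def sum(self) -> float:
        return sum(self.values)

    def median(self) -> float:
        return statistics.median(self.values)

    def sort_values(self) -> "ReferenceFrame":
        return ReferenceFrame(sorted(self.values))


class FastFrame:
    """Stands in for a Modin DataFrame: only some operations are native."""

    def __init__(self, values: List[float]) -> None:
        self._values = list(values)

    def sum(self) -> float:
        # Native path (in Modin this would run in parallel over partitions).
        return sum(self._values)

    def __getattr__(self, name: str) -> Any:
        # Only reached when normal lookup fails, i.e. there is no native version.
        if name.startswith("_"):
            raise AttributeError(name)
        # Convert to the reference implementation; unknown names still raise.
        fallback = getattr(ReferenceFrame(self._values), name)
        if not callable(fallback):
            return fallback

        def run_with_fallback(*args: Any, **kwargs: Any) -> Any:
            warnings.warn(f"FastFrame.{name} defaulting to reference implementation", UserWarning)
            result = fallback(*args, **kwargs)
            # Convert frame results back so callers keep a FastFrame.
            return FastFrame(result.values) if isinstance(result, ReferenceFrame) else result

        return run_with_fallback
```

The trade-off matches the note above: unsupported calls still work and return the right type, but they pay a conversion cost and lose parallelism, which is why the warning is worth surfacing.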