Speaker
Tania Allard
Material
Note
- common pain points in DS and ML
- complex setup / deps
- reliance on data / database
- fast evolving projects
- are containers secure enough?
- how is it different from web apps?
- not every deliverable is an app or a model
- relies on data
- Mixture of wheels and compiled packages
- Security access levels - for data and software
- Mixture of stakeholders:
- data scientists
- software engineers
- ML engineers
- best practices
- Split complex `RUN` statements and sort them
- Prefer `COPY` to add files
- install only necessary packages
- explicitly ignore files
- documentation
- never add data
- secrets
- cookiecutter template
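
A minimal sketch of the best practices above, not taken from the talk. The base image tag, the system packages, `requirements.txt` and `src/` are all hypothetical placeholders for a typical project layout.

```dockerfile
# Hypothetical base image; swap in whatever your project actually uses.
FROM python:3.8-slim-buster

# One complex RUN statement split across lines, one package per line,
# sorted alphabetically so diffs stay readable. The packages are placeholders:
# install only what the project really needs.
RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        build-essential \
        curl \
        git \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app

# COPY only copies local files; ADD can also fetch URLs and unpack archives,
# which is rarely what you want.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy only the source tree. A .dockerignore file next to this Dockerfile
# would list docs, data and secret files so they never reach the build context.
COPY src/ ./src/
```

Ordering the dependency install before the source copy means a code-only change does not invalidate the layer that installs packages.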
Top Tips
- Rebuild your images frequently - get security updates for system packages
- Never work as root / minimize the privileges
- run as non-root user
- minimize capabilities
- You do not want to use Alpine Linux (go for buster, stretch or the Jupyter stack)
- pin / version EVERYTHING (use pip-tools, conda, poetry or pipenv)
- Leverage build cache
- Use one Dockerfile per project
- Use multi-stage builds
- fetch and manage secrets in an intermediate layer
- creates smaller image
- Make your images identifiable (test, production, R&D) - also be careful when accessing databases and using ENV variables / build variables
- Provide context with `LABELS`
- Do not reinvent the wheel! Use repo2docker
- Automate - no need to build and push manually
- Use a linter
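
Several of the tips above can be combined in one Dockerfile. The following is a sketch under stated assumptions rather than the speaker's own example: the image tag, the `appuser` name, the label values, `requirements.txt` (assumed to hold pinned versions, for instance generated by pip-tools) and the `src.main` entry point are all hypothetical.

```dockerfile
# Stage 1: build stage. Compilers live only here, and this is also where
# secrets needed at build time would be fetched and used.
FROM python:3.8-slim-buster AS builder

RUN apt-get update \
    && apt-get install -y --no-install-recommends build-essential \
    && rm -rf /var/lib/apt/lists/*

# Install dependencies into a virtual environment that can be copied as a unit.
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

# Copy the pinned requirements before anything else so the install layer
# is served from the build cache until the requirements change.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Stage 2: runtime stage. Same base image, so the virtual environment's
# interpreter path stays valid; no build tools are carried over.
FROM python:3.8-slim-buster

# Labels give the image context. The values are placeholders, and the
# "environment" key is a made-up convention for telling images apart.
LABEL org.opencontainers.image.title="example-project" \
      org.opencontainers.image.version="0.1.0" \
      environment="test"

COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

# Create an unprivileged user and switch to it before running anything.
RUN useradd --create-home appuser
USER appuser
WORKDIR /home/appuser

COPY --chown=appuser:appuser src/ ./src/

CMD ["python", "-m", "src.main"]
```

Only the final stage ends up in the shipped image, which is why anything fetched or installed in the builder stage, including build tools, does not add to its size.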