Speaker
Tania Allard
Material
Note
- common pain points in DS and ML
  - complex setup / deps
  - reliance on data / database
  - fast evolving projects
  - are containers secure enough?
- how is it different from web apps?
  - not every deliverable is an app or a model
  - relies on data
  - mixture of wheels and compiled packages
  - security access levels - for data and software
  - mixture of stakeholders:
    - data scientists
    - software engineers
    - ML engineers
- best practices
  - Split complex `RUN` statements and sort them
  - Prefer `COPY` to add files
  - install only necessary packages
  - explicitly ignore files
    - documentation
    - never add data
    - secrets
  - cookiecutter template
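
The best practices above could look roughly like this in a Dockerfile. This is a minimal sketch, not taken from the talk: the base image tag, the package list, `requirements.txt` and the `src/` directory are all assumptions chosen for illustration.

```dockerfile
# Base image is an assumption; any Debian-based Python image would work similarly.
FROM python:3.8-slim-buster

# A complex RUN statement split over several lines, with the packages sorted
# alphabetically so that later changes produce small, readable diffs.
# --no-install-recommends keeps the install to the necessary packages only.
RUN apt-get update \
    && apt-get install --yes --no-install-recommends \
        build-essential \
        curl \
        git \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app

# COPY instead of ADD: it only copies local files, with no implicit
# archive extraction or URL fetching.
COPY requirements.txt .
RUN pip install --no-cache-dir --requirement requirements.txt

# Hypothetical project layout. Documentation, data and secrets are assumed
# to be excluded from the build context by an ignore file.
COPY src/ ./src/
```

Keeping `apt-get update` and `apt-get install` in the same `RUN` statement avoids installing from a stale, cached package index.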
 
Top Tips
- Rebuild your images frequently - get security updates for system packages
- Never work as root / minimize the privileges
  - run as non-root user
  - minimize capabilities
- Avoid Alpine Linux as a base image (go for Debian buster or stretch, or the Jupyter stack)
 - pin / version EVERYTHING (use pip-tools, conda, poetry or pipenv)
 - Leverage build cache
 - Use one Dockerfile per project
- Use multi-stage builds
  - fetch and manage secrets in an intermediate layer
  - creates smaller image
- Make your images identifiable (test, production, R&D) - also be careful when accessing databases and using ENV variables / build variables
  - provide context with `LABEL`s
 - Do not reinvent the wheel! Use repo2docker
 - Automate - no need to build and push manually
 - Use a linter
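
Several of these tips can be combined in one multi-stage Dockerfile. The sketch below is an illustration under assumptions, not the speaker's own file: the pinned base tag, the `appuser` account, the label values, `requirements.txt` and the `src.main` module are hypothetical names.

```dockerfile
# ---- build stage: compilers and build-time tooling stay here ----
# Pin the base image to an explicit version instead of "latest".
FROM python:3.8.5-slim-buster AS builder

RUN apt-get update \
    && apt-get install --yes --no-install-recommends build-essential \
    && rm -rf /var/lib/apt/lists/*

# Copy only the pinned requirements first, so this layer and the install
# below are served from the build cache until requirements.txt changes.
COPY requirements.txt .
RUN pip install --no-cache-dir --prefix=/install --requirement requirements.txt

# ---- runtime stage: only what is needed to run ----
FROM python:3.8.5-slim-buster

# Labels make the image identifiable; keys follow the OCI annotation names,
# values are placeholders.
LABEL org.opencontainers.image.title="example-project" \
      org.opencontainers.image.description="R&D image, not for production"

# Create an unprivileged user (hypothetical name).
RUN useradd --create-home --shell /bin/bash appuser

# Bring in the installed packages from the build stage, without the compilers.
COPY --from=builder /install /usr/local

WORKDIR /home/appuser/app
COPY --chown=appuser:appuser src/ ./src/

# Everything from here on, including the running container, is non-root.
USER appuser
CMD ["python", "-m", "src.main"]
```

Anything fetched or generated in the `builder` stage, such as credentials needed to reach a private package index, is left out of the final image unless it is explicitly copied across with `COPY --from`.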