Senior (5+ years of experience)
Summary of the Role
The Senior MLOps Engineer will lead the design, development, and management of machine learning (ML) operations and infrastructure. This seasoned professional will work closely with data scientists and engineers to deploy, monitor, and maintain ML models in production environments, ensuring scalable, secure, and efficient operations. The ideal candidate will have a deep understanding of ML models, data pipeline workflows, and cloud-based technologies, combined with a strong operational mindset.
Required Skills
Proficiency in scripting languages such as Python or Bash.
Strong understanding of DevOps principles and methodologies.
Familiarity with ML frameworks (TensorFlow, PyTorch, etc.) and data warehousing.
Expertise in automated deployment, scaling, and management of containerized applications.
Ability to implement robust security measures for sensitive data.
Strong problem-solving skills and ability to work cross-functionally.
Excellent communication and project management capabilities.
Qualifications
Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
5+ years of relevant experience in a DevOps, MLOps, or similar role within a data-driven environment.
Proven track record of managing ML infrastructure and pipelines in a production setting.
Experience with cloud services (AWS, GCP, Azure) and containerization technologies (Docker, Kubernetes).
Understanding of the machine learning lifecycle, including data management, model development, and deployment.
Knowledge of best practices for maintaining high levels of ML model performance.
Experience with monitoring tools and technologies for ML systems.
Responsibilities
Develop and maintain reliable, scalable, and secure ML infrastructure and pipelines.
Collaborate with data scientists to operationalize machine learning models and accelerate the ML lifecycle from concept to production.
Implement and manage continuous integration/continuous deployment (CI/CD) pipelines for ML systems.
Monitor ML model performance and ensure models remain up to date and deliver accurate predictions.
Identify and act on opportunities to improve and streamline operational practices.
Create and maintain documentation for ML operations processes and best practices.
Ensure compliance with data privacy and protection policies.