Role: MLOps Developer
Position: Remote
Duration: Long-term
Notes: Candidate must be willing to be part of the on-call rotation.

Your future duties and responsibilities
• Monitoring - Actively monitor ML jobs through logging, metrics, and alerts.
• Incident Management - Address job failures. Rerun jobs and maintain model/job versions in production.
• Infrastructure Scalability - Scale infrastructure to handle production workloads. Tuning, capacity planning, cost optimization.
• Compliance and Security - Apply security best practices. Encrypt data, manage secrets, enforce access controls.
• Technical Support - Troubleshoot integration and infrastructure issues.
• Operations Review - Monthly incident management summary, change management summary, and other support metrics.
• Model Re-training - Schedule regular retraining of models on new data. Manage the code and config changes needed for retraining.
• Model Evaluation - Continuously evaluate model performance on test data. Watch for statistical validation issues.
• Data Validation - Monitor production data inputs. Check for statistical changes, missing values, outliers, errors.
• Dependency Management - Manage libraries, frameworks, ML pipelines. Keep versions aligned across dev, test, and prod environments.
• Update Python package versions for Airflow and Lambda when libraries become deprecated, and update other components and services as needed to customize the environment.
• DevOps - Use version control, Infrastructure as Code, and Configuration Management tools to develop automation that deploys the code base.
• Automation - Programmatically define repeatable and reliable tasks that reduce operational overhead.
• Testing - Programmatically enforce quality standards for the application, automation, and underlying ML models.

Required qualifications to be successful in this role
Requirements: 3-7 years of proven experience in the following areas:

Cloud:
• AWS CLI
• AWS SDK (boto3)
• Parameter Store & Secrets Manager
• Lambda

Data Engineering:
• SQL (ability to write, debug, and optimize queries)
• DynamoDB
• Redshift

Programming:
• IDE (VS Code / PyCharm)
• Testing (unittest/pytest/mock)
• Python

Machine Learning:
• ML Development (SageMaker)
• ML Programming (Pandas, NumPy, scikit-learn, Matplotlib, LightGBM)
• Modeling

DevOps:
• General Linux Terminal (ps, awk, grep, netstat, etc.)
• Shell Scripting (Bash)
• YAML, IaC (Terraform, CloudFormation)
• CI/CD (CodeBuild, CodeDeploy)
• Orchestration (Apache Airflow)
• Containerization (Docker, Kubernetes)