Senior MLOps Engineer, AWS-Focused ML Infrastructure
Keysight Technologies
We are expanding our engineering team with a dedicated MLOps Engineer specializing in AWS to support the deployment, scaling, and operationalization of machine learning solutions across our manufacturing and semiconductor analytics platforms. This role will serve as a critical bridge between our Machine Learning Engineers—focused on Generative AI and classical ML—and production environments, ensuring seamless, reliable, and efficient ML workflows.
You will collaborate closely with the Senior Machine Learning Engineer (GenAI Platform) and the Machine Learning Engineer (Classical ML and Predictive Analytics) to automate pipelines, monitor model performance, and manage infrastructure for high-stakes applications like test plan generation, anomaly detection, predictive maintenance, and market intelligence. In our AWS-centric ecosystem, you will leverage best-in-class tools to enable rapid iteration while maintaining compliance, security, and cost efficiency in regulated industrial settings.
This position is perfect for a mid-level professional with a passion for DevOps in ML contexts, who excels at turning complex models into robust, production-ready systems.
Key Responsibilities
- Design, implement, and maintain end-to-end MLOps pipelines on AWS, including CI/CD automation for model training, validation, deployment, and retraining, using services like SageMaker, CodePipeline, CodeBuild, and Step Functions.
- Support the Generative AI platform by operationalizing Amazon Bedrock workflows, including RAG pipelines, vector databases (e.g., via OpenSearch or Pinecone integrations), Lambda functions, and agentic systems, ensuring scalability for large-scale data processing such as historical test plans and news article summarization.
- Enable classical ML initiatives by deploying and monitoring models built with XGBoost, Scikit-learn, and NLP architectures (e.g., RNNs/LSTMs) on AWS infrastructure, incorporating drift detection for anomaly tracking in sensor data and competitor pricing monitoring.
- Manage infrastructure as code (IaC) using Terraform or CloudFormation to provision and optimize AWS resources, such as EC2 instances, S3 buckets, EMR for Apache Spark-based processing (supporting our PMA product), and ECS/EKS for containerized deployments.
- Implement comprehensive monitoring, logging, and alerting systems with CloudWatch, X-Ray, and third-party tools (e.g., Prometheus/Grafana integrations) to track model performance, detect anomalies, handle concept drift, and ensure high availability for customer-facing tools like Q&A chatbots and predictive maintenance advisors.
- Collaborate in an Agile environment with ML engineers, data scientists, and SRE teams to conduct A/B testing, version models, automate rollbacks, and optimize costs through auto-scaling and spot instances.
- Enforce security and compliance best practices, including IAM roles, VPC configurations, data encryption, and audit logging, to safeguard sensitive manufacturing data and meet industry standards.
- Troubleshoot production issues, perform root-cause analysis, and drive continuous improvements in ML operations, staying ahead of AWS innovations to enhance platform reliability and efficiency.
Job Qualifications
Must-have qualifications
- Bachelor's or Master's degree in Computer Science, Engineering, Information Systems, or a related technical field.
- 3–5 years of experience in MLOps, DevOps, or cloud engineering roles, with a proven track record of deploying and managing ML models in production environments.
- Deep expertise in AWS services for ML and data workflows, including SageMaker (real-time endpoints, inference components, multi-instance/multi-variant deployments), Bedrock (provisioned throughput, cross-Region inference profiles for scaling & resilience), EMR (for Spark-based PMA workloads), Lambda, S3, ECR, and orchestration tools like Step Functions or Airflow.
- Proven experience with Amazon Elastic Container Registry (ECR): building, scanning for vulnerabilities, tagging, versioning, and pushing custom Docker images for inference containers (including Bring-Your-Own-Container patterns for custom ML frameworks, vLLM, or deep learning environments); managing ECR lifecycle policies, replication across regions, and secure access via IAM roles.
- Strong proficiency in EC2-based ML deployments and infrastructure: selecting optimal instance types (e.g., ml.g family for GPU-heavy GenAI inference, g5/g6 for newer accelerators), configuring Auto Scaling Groups, managing spot instances for cost optimization, and handling EC2 fleets for custom hosting when SageMaker/Bedrock abstractions are insufficient.
- Expertise in load balancing & scaling for ML inference: configuring and troubleshooting Application Load Balancers (ALB) or Network Load Balancers (NLB) integrated with SageMaker endpoints or ECS/EKS tasks; implementing SageMaker's built-in routing strategies (e.g., least outstanding requests for latency optimization); setting up auto-scaling policies (target tracking on CPU utilization, invocations per instance, or custom CloudWatch metrics); using cross-Region inference profiles in Bedrock for burst handling and global resilience; and ensuring high availability through multi-AZ deployments with a minimum instance count of at least 2.
- Demonstrated ability to resolve common deployment issues in production ML environments, including: cold-start latency in serverless/containerized inference, container pull failures from ECR, IAM permission misconfigurations causing access denied errors, model artifact corruption or version mismatches post-deployment, endpoint update failures without downtime (using blue/green or canary strategies), drift/throttling in high-concurrency scenarios (e.g., 429 errors in Bedrock), unhealthy instance recovery, and debugging via CloudWatch Logs, X-Ray traces, and SageMaker Model Monitor alerts.
- Proficiency in IaC tools such as Terraform or CloudFormation to provision and optimize AWS resources (e.g., ECR repositories, EC2 fleets, ALBs, SageMaker endpoints, and auto-scaling configurations) in a repeatable, auditable manner.
- Strong scripting and programming skills in Python (with libraries like Boto3), along with experience in CI/CD pipelines using Jenkins, GitHub Actions, or AWS CodePipeline — with specific focus on automated ECR image builds, model artifact promotion, and safe endpoint updates.
- Familiarity with monitoring and observability stacks (e.g., CloudWatch, ELK Stack) and ML-specific tools for versioning (e.g., MLflow) and experiment tracking.
- Experience in Agile methodologies, with hands-on participation in sprints, code reviews, and cross-functional problem-solving.
- Solid understanding of ML concepts, including model drift, bias detection, and serving patterns, to effectively support both GenAI and classical ML teams.
Strongly preferred
- Fluency in English.
- Prior exposure to manufacturing, semiconductor, or industrial IoT domains, where data reliability and low-latency inference are critical.
- Certifications such as AWS Certified Machine Learning – Specialty, AWS Certified DevOps Engineer, or equivalent.
- Experience with hybrid ML setups, integrating on-premises data with cloud services, or handling large-scale NLP/numerical data pipelines.
- Knowledge of security frameworks like SOC 2 or ISO 27001, and tools for automated testing of ML infrastructure.
- Prior experience troubleshooting and optimizing SageMaker multi-instance/multi-variant endpoints (including traffic shifting, shadow testing, and A/B deployments) and Bedrock inference profiles (Priority/Flex tiers, cross-Region routing for throughput and cost balancing).
- Hands-on work with EC2 Auto Scaling in ML contexts, including handling GPU instance availability constraints, spot interruption recovery, and cost-effective scaling for bursty inference workloads.
- Familiarity with advanced deployment patterns such as blue/green deployments, canary rollouts, and rollback automation to minimize production impact during model updates.
If you are a pragmatic, AWS-savvy engineer excited about operationalizing cutting-edge ML in mission-critical industries, this role offers the opportunity to build resilient systems that directly impact our company's innovation and customer outcomes. Join a dynamic team committed to excellence, with ample room for growth and technical leadership.