AI Engineer (ML Systems & Infrastructure)
16000 - 25000 SGD
SWAPETECH PTE. LTD.
About the Role
We are looking for exceptional AI Engineers to build the next generation of AI infrastructure and Machine Learning Systems (MLSys).
This role focuses on large-scale system infrastructure rather than model research. You will work on the core foundations that power large-scale AI training and inference systems, including Kubernetes cluster management, RDMA networking, unified KV Cache architecture, observability platforms, distributed systems, GPU orchestration, and CUDA kernel optimisation.
You will collaborate closely with AI researchers, infrastructure architects, networking engineers, and platform teams to maximize the efficiency, scalability, and reliability of AI systems.
Key Responsibilities
AI Infrastructure & Kubernetes
- Design, deploy, and operate large-scale Kubernetes-based AI infrastructure.
- Develop cluster governance frameworks, scheduling policies, resource isolation, and multi-tenancy capabilities.
- Build and optimize GPU orchestration platforms using Kubernetes, Slurm, Volcano, Kueue, Ray, and related technologies.
- Improve cluster utilization, reliability, elasticity, and operational efficiency.
RDMA & High-Performance Networking
- Design and optimize RDMA, InfiniBand, RoCE, and high-speed Ethernet fabrics for distributed AI workloads.
- Optimize GPU-to-GPU and GPU-to-NIC communication paths.
- Improve distributed communication efficiency for large-scale training and inference.
- Analyze and eliminate networking bottlenecks across AI clusters.
Unified KV Cache & Distributed Memory Systems
- Design and implement unified KV Cache architecture across:
  - GPU HBM
  - CPU Memory
  - RDMA-accessible Memory
  - NVMe SSD
  - Distributed Storage
- Develop efficient KV Cache sharing, migration, offloading, and scheduling mechanisms.
- Optimize latency and throughput for large-scale inference systems.
CUDA & System Performance Optimisation
- Develop and optimize CUDA kernels for training and inference workloads.
- Profile and optimize GPU compute, memory, communication, and scheduling efficiency.
- Contribute to low-level optimization of AI frameworks and inference engines.
- Work on technologies such as FlashAttention, TensorRT, Triton, NCCL, CUTLASS, and custom operators.
Observability & Reliability
- Build end-to-end observability platforms for AI infrastructure.
- Design monitoring, logging, tracing, alerting, and troubleshooting frameworks.
- Develop performance dashboards and SLO-driven operational systems.
- Improve maintainability, debuggability, and operational excellence of AI platforms.
Automation & Platform Engineering
- Build automation tools for deployment, provisioning, monitoring, and operations.
- Develop Infrastructure-as-Code (IaC) solutions using Terraform, Ansible, and related tools.
- Build CI/CD pipelines and engineering productivity platforms.
- Improve platform scalability and operational efficiency.
Required Qualifications
Education
- Bachelor's degree or above in Computer Science, Software Engineering, Electrical Engineering, or related fields.
Technical Skills
- Strong software engineering and programming skills.
- Excellent system design capability and strong engineering craftsmanship.
- Strong coding standards and code quality awareness.
- Strong sense of ownership, accountability, and execution.
System Fundamentals
Strong understanding of:
- Operating Systems
- Computer Networks
- Distributed Systems
- Data Structures and Algorithms
- Linux Internals
Programming Languages
Proficiency in one or more of:
- C++
- Go
- Python
- Rust
AI Infrastructure Experience
Hands-on experience in one or more of:
- Kubernetes
- GPU Infrastructure
- Distributed Systems
- AI Infrastructure
- HPC (High Performance Computing)
- Cloud-Native Platforms
Networking Experience
Experience with:
- RDMA
- InfiniBand
- RoCE/RoCEv2
- GPUDirect
- NCCL
- UCX
- High-Speed Ethernet
GPU & Performance Engineering
Experience with:
- CUDA
- GPU Performance Optimization
- Multi-GPU Systems
- Distributed Training
- Distributed Inference
Preferred Qualifications
- Experience building large-scale AI training or inference clusters.
- Experience with vLLM, SGLang, TensorRT-LLM, Triton, DeepSpeed, Megatron-LM, Ray, or similar frameworks.
- Experience with unified KV Cache systems, memory hierarchy optimisation, or distributed storage systems.
- Experience with Kubernetes GPU Operator and NVIDIA Network Operator.
- Experience with Prometheus, Grafana, Loki, OpenTelemetry, and observability platforms.
- Experience contributing to open-source projects such as vLLM, FlashAttention, CUTLASS, TVM, MLIR, Triton, Kubernetes, and NCCL.
- Experience working across AI Infrastructure, HPC, Networking, and Silicon Systems is highly desirable.

