Design, implement, and optimize distributed systems and infrastructure components to support large-scale machine learning workflows, including data ingestion, feature engineering, model training, and serving.
Develop and maintain frameworks, libraries, and tools that streamline the end-to-end machine learning lifecycle, from data preparation and experimentation to model deployment and monitoring.
Architect and implement highly available, fault-tolerant, and secure systems that meet the performance and scalability requirements of production machine learning workloads.
Collaborate with machine learning researchers and data scientists to understand their requirements and translate them into scalable and efficient software solutions.
Stay current with advancements in machine learning infrastructure, distributed computing, and cloud technologies, integrating them into our platform to drive innovation.
Mentor junior engineers, conduct code reviews, and uphold engineering best practices to ensure the delivery of high-quality software solutions.
What it takes to catch our eye:
Strong technical expertise in designing and building scalable ML infrastructure.
Experience with distributed systems and cloud-based ML platforms.
Proficiency in programming languages such as Python, Java, or Scala.
Deep understanding of ML workflows, including data pipelines, model training, and deployment.
Passion for innovation and eagerness to implement the latest advancements in ML infrastructure.
Strong problem-solving skills and ability to optimize complex systems for performance and reliability.
Collaborative mindset with excellent communication skills to work across teams.
Ability to thrive in a fast-paced, dynamic environment with evolving technical challenges.
Job Classification
Industry: Internet
Functional Area / Department: Data Science & Analytics
Role Category: Data Science & Machine Learning
Role: Data Science & Machine Learning - Other
Employment Type: Full time