We are looking for a highly skilled and motivated Senior Software Engineer to join our Machine Learning Platforms team. As a Senior Software Engineer, you will design, develop, and scale ML infrastructure that supports the full ML lifecycle, from data ingestion to model deployment and monitoring. This role requires a deep understanding of software engineering best practices, cloud-based services, and modern ML frameworks. You will collaborate closely with data scientists, DevOps engineers, and other stakeholders to build reliable, scalable platforms that facilitate rapid experimentation and deployment.
Key Responsibilities:
Platform Development: Design, build, and maintain scalable and efficient ML infrastructure, including feature stores, model training pipelines, model deployment frameworks, and monitoring systems.
Collaboration: Work closely with data scientists and ML engineers to understand their needs, define requirements, and translate them into platform capabilities that support end-to-end ML workflows.
System Architecture: Contribute to the architecture and design of complex, high-performance ML systems that can handle large-scale data processing and model deployment in production.
Tooling & Automation: Develop and enhance tooling to support the ML lifecycle, including data preprocessing, feature engineering, model versioning, and automated testing of models and pipelines.
Performance Optimization: Identify and implement improvements in system efficiency, performance, and reliability across ML workflows and infrastructure.
Documentation & Best Practices: Create detailed documentation for platform tools and components, and promote best practices in ML model development and deployment.
Mentorship: Provide technical mentorship and guidance to junior engineers, fostering a culture of learning and continuous improvement.
Qualifications:
Educational Background: Bachelor's or Master's degree in Computer Science, Engineering, or a related field, or equivalent experience.
Experience: 5+ years of professional experience in software engineering, including at least 2 years focused on machine learning infrastructure or platforms.
Technical Skills:
Programming Languages: Strong proficiency in Python and experience with another language (e.g., Java, C++, or Go).
Machine Learning Frameworks: Familiarity with ML frameworks such as TensorFlow, PyTorch, or scikit-learn.
Data Pipelines & ETL: Experience in building and maintaining data pipelines, ETL processes, and data lakes.
Cloud Platforms: Hands-on experience with cloud services (e.g., AWS, GCP, or Azure) and with containerization and orchestration tools (e.g., Docker, Kubernetes).
MLOps Tools: Knowledge of MLOps tools and libraries, such as MLflow, TFX, Airflow, or Kubeflow, for pipeline orchestration, model tracking, deployment, and monitoring.
Databases: Proficiency in SQL and experience with NoSQL databases for large-scale data management.
Soft Skills:
Excellent problem-solving and analytical skills.
Strong communication skills and the ability to collaborate with cross-functional teams.
Ability to work autonomously and manage multiple projects in a fast-paced environment.