Job Description: Client is seeking a team member to join our Microbiology R&D Development Science functional team. In this role you will develop and maintain end-to-end data and machine learning pipelines for clinical and verification studies. We're looking for associates who thrive in a team-oriented, goal-focused environment.
The Data Engineer is responsible for the development and implementation of end-to-end Ops pipelines to support ML model deployment throughout the entire ML lifecycle. This position is part of the data science team located in Sacramento, California, and will be a hybrid role. The data engineer will be part of the development science functional group and report to the data science manager. If you thrive in a cross-functional team and want to help build a world-class biotechnology organization, read on.
Responsibilities
Collaborate with stakeholders to understand data requirements for ML, Data Science and Analytics projects.
Assemble large, complex data sets from disparate sources, writing code, scripts, and queries as appropriate to efficiently extract, QC, clean, harmonize, and visualize big data sets.
Write pipelines for optimal extraction, transformation, and loading of data from a wide variety of data sources using Python, SQL, Spark, and AWS big-data technologies.
Design and develop data schemas to support Data Science team development needs.
Identify, design, and implement continuous process improvements such as automating manual processes and optimizing data delivery.
Design, develop, and maintain a dedicated ML inference pipeline on the AWS platform (SageMaker, EC2, etc.).
Deploy inference on a dedicated EC2 instance or Amazon SageMaker.
Establish a data pipeline to store inference outputs and track model performance against KPI benchmarks.
Document data processes, write recommended data-management procedures, and create training materials on data-management best practices.
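To give candidates a concrete sense of the inference-tracking responsibility above, here is a minimal, hedged Python sketch of logging inference outputs so model performance can be reported against KPI benchmarks. All names here (InferenceLog, record, accuracy) are illustrative assumptions, not part of the client's stack; a production pipeline would persist these records to a data store rather than an in-memory list.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from statistics import mean

@dataclass
class InferenceLog:
    """Illustrative store for inference results, keyed by model version."""
    records: list = field(default_factory=list)

    def record(self, model_version, prediction, label=None):
        # Capture each prediction with a UTC timestamp; the ground-truth
        # label may arrive later and can be left as None.
        self.records.append({
            "ts": datetime.now(timezone.utc).isoformat(),
            "model_version": model_version,
            "prediction": prediction,
            "label": label,
        })

    def accuracy(self, model_version):
        # Example KPI: accuracy over records that have a ground-truth label.
        scored = [r for r in self.records
                  if r["model_version"] == model_version and r["label"] is not None]
        if not scored:
            return None
        return mean(1.0 if r["prediction"] == r["label"] else 0.0 for r in scored)
```

For example, after logging two labeled predictions for model "v1" where one matches its label, `accuracy("v1")` returns 0.5.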
Required Qualifications
BS or MS in Computer Science, Computer Engineering, or equivalent experience.
5-7 years of Data and MLOps experience developing and deploying Data and ML pipelines.
5 years of experience deploying ML models via AWS SageMaker and AWS Bedrock.
5 years of programming and scripting experience using Python, SQL, and Spark.
Deep knowledge of AWS core services such as RDS, S3, API Gateway, EC2/ECS, Lambda, etc.
Hands-on experience with model monitoring, drift detection, and automated retraining processes
Hands-on experience with CI/CD pipeline implementation using tools like GitHub (Workflows and Actions), Docker, Kubernetes, Jenkins, Blue Ocean
Experience working in an Agile/Scrum based software development structure
5 years of experience with data visualization and/or API development for data science users.
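The drift-detection qualification above can be made concrete with a minimal sketch. This is an assumption-laden illustration, not the client's method: it flags drift when the current batch mean moves more than a threshold number of baseline standard deviations from the baseline mean (production systems more often use PSI or KS tests).

```python
from statistics import mean, pstdev

def mean_shift_drift(baseline, current, threshold=3.0):
    """Illustrative drift check: True when the current batch mean deviates
    from the baseline mean by more than `threshold` baseline standard
    deviations."""
    mu, sigma = mean(baseline), pstdev(baseline)
    if sigma == 0:
        # Constant baseline: any deviation at all counts as drift.
        return mean(current) != mu
    z = abs(mean(current) - mu) / sigma
    return z > threshold
```

A monitoring job would run a check like this on each inference batch and trigger the automated retraining process when it fires.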
*** If you may be interested in this position, please email me at somp767@kellyservices.com with your most up-to-date resume (in Word format) and advise the best time and number at which you can be reached. ***