Role: Data Engineer - Clinical Decision Support Solutions
Duration: 6+ Months
Location: Hybrid
Job Description: The client is seeking a team member to join its Microbiology R&D Development Science functional team. In this role you will develop and maintain end-to-end data and machine learning pipelines for clinical and verification studies. We're looking for associates who thrive in a team-oriented, goal-focused environment.
The client's Data Engineer is responsible for developing and implementing end-to-end MLOps pipelines to support ML model deployment throughout the entire ML lifecycle. This position is part of the data science team located in Sacramento, California, and is a hybrid role. The data engineer will be part of the Development Science functional group and report to the data science manager. If you thrive on a cross-functional team and want to help build a world-class biotechnology organization, read on.
Responsibilities
• Collaborate with stakeholders to understand data requirements for ML, data science, and analytics projects.
• Assemble large, complex data sets from disparate sources, writing code, scripts, and queries as appropriate to efficiently extract, QC, clean, harmonize, and visualize big data sets.
• Write pipelines for optimal extraction, transformation, and loading of data from a wide variety of data sources using Python, SQL, Spark, and AWS "big data" technologies.
• Develop and design data schemas to support the Data Science team's development needs.
• Identify, design, and implement continuous process improvements, such as automating manual processes and optimizing data delivery.
• Design, develop, and maintain a dedicated ML inference pipeline on the AWS platform (SageMaker, EC2, etc.), deploying inference on a dedicated EC2 instance or Amazon SageMaker.
• Establish a data pipeline to store and maintain inference output results, tracking model performance against KPI benchmarks.
• Document data processes, write recommended data management procedures, and create training materials on data management best practices.
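For illustration only (this sketch is not part of the client's posting): the "extract, QC, clean, harmonize" responsibility above might, in a minimal Python form, look like the following. All field names, source shapes, and QC rules here are invented for the example.

```python
# Hypothetical sketch: merge records from two disparate sources into one
# shared schema, dropping rows that fail a simple QC rule. Field names
# ('id'/'value', 'sampleId'/'reading') and the QC rule are invented.

def harmonize(source_a, source_b):
    """Map both sources onto a shared schema: {sample_id, result, instrument}."""
    rows = []
    for rec in source_a:  # source A uses 'id' / 'value'
        rows.append({"sample_id": rec["id"],
                     "result": rec["value"],
                     "instrument": "A"})
    for rec in source_b:  # source B uses 'sampleId' / 'reading'
        rows.append({"sample_id": rec["sampleId"],
                     "result": rec["reading"],
                     "instrument": "B"})
    # QC step: keep only rows with a present, non-negative numeric result
    return [r for r in rows
            if isinstance(r["result"], (int, float)) and r["result"] >= 0]

clean = harmonize([{"id": "S1", "value": 0.42}, {"id": "S2", "value": -1}],
                  [{"sampleId": "S3", "reading": 0.91}])
```

In practice this kind of step would run inside a Spark or AWS pipeline rather than plain Python, but the shape (extract from heterogeneous schemas, QC, emit one harmonized table) is the same.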
Required Qualifications
• BS or MS in Computer Science, Computer Engineering, or equivalent experience.
• 5-7 years of data and MLOps experience developing and deploying data and ML pipelines.
• 5 years of experience deploying ML models via AWS SageMaker and AWS Bedrock.
• 5 years of programming and scripting experience using Python, SQL, and Spark.
• Deep knowledge of AWS core services such as RDS, S3, API Gateway, EC2/ECS, Lambda, etc.
• Hands-on experience with model monitoring, drift detection, and automated retraining processes.
• Hands-on experience implementing CI/CD pipelines with tools such as GitHub (Workflows and Actions), Docker, Kubernetes, Jenkins, and Blue Ocean.
• Experience working in an Agile/Scrum software development structure.
• 5 years of experience with data visualization and/or API development for data science users.