The Data Scientist will be responsible for deriving insights from the vast amounts of patient and environmental data available within our data warehouse.
Experience with machine learning and statistical analysis is required.
Work closely with research teams to design analysis specifications, including input data specifications, data cleaning, algorithms, and interpretation of results.
Develop and implement algorithms on existing data warehouse records, and identify new external data sources to ingest into the data warehouse to strengthen analyses.
Analyses will address a wide variety of clinical and research outcomes.
Research and implement AI algorithms, apply off-the-shelf AI and data-centric tools, and collect, store, and maintain data.
The successful candidate will have demonstrated competence in developing highly scalable artificial intelligence systems with multiple dependencies across teams.
What gets you the job:
Programming Languages
Python (for preprocessing, data analysis, machine learning, scripting)
SQL (for database querying and management)
SAS (common in healthcare data analysis)
MATLAB (for algorithm development, though less common in healthcare)
R (for statistical computing and bioinformatics)
Data Science & Machine Learning Frameworks
TensorFlow, PyTorch, Keras (for deep learning and complex machine learning, including neural networks and advanced AI)
Scikit-Learn (for classical machine learning)
XGBoost or LightGBM (for gradient boosting in structured data)
Large Language Models (LLMs) (for text generation, summarization, etc.)
AWS SageMaker (for end-to-end machine learning development, training, and scalable machine learning in a managed cloud environment)
Natural Language Processing (NLP) Tools and Frameworks (e.g., Hugging Face, AWS Comprehend Medical for extracting insights from clinical text data)
AWS Bedrock (for accessing pre-trained LLMs and foundation models without managing infrastructure)
Healthcare-Specific Knowledge
HL7 (Health Level Seven International standards for electronic health information exchange)
FHIR (Fast Healthcare Interoperability Resources standard for exchanging healthcare information electronically)
ICD-10 Coding (for medical diagnosis and procedure classification)
HIPAA Compliance (handling sensitive patient data securely)
Clinical Terminologies (e.g., SNOMED CT, LOINC)
Data Tools & Platforms
SQL-based Databases (e.g., PostgreSQL, MySQL, Microsoft SQL Server)
Data Warehousing (e.g., AWS Redshift, Google BigQuery)
Data Visualization Tools (e.g., Tableau, Power BI, Plotly)
NoSQL Databases (e.g., MongoDB, Cassandra)
Apache Hadoop or Apache Spark (for big data processing)