-Design, develop, and manage data pipelines and workflows that process data efficiently and accurately, using Trino SQL/Spark SQL on datasets warehoused in HDFS.
-Design code effectively, and review and approve test cases.
-Implement data quality checks and audits to maintain high data accuracy and integrity.
-Produce elegant, efficient designs and high-performance, scalable code that is easy to extend for future needs.
-Collaborate with cross-functional teams, especially data engineering, to understand data requirements and implement robust data solutions.
-Work closely with data domain experts to gather data requirements, translate business needs into technical specifications, and communicate data insights effectively to improve sales representative workflow efficiency.
-Optimize data storage for performance and scalability, ensuring efficient Extract, Transform, Load (ETL) processing.
-Develop and maintain documentation for data pipelines, QA, metrics, and data policy as it relates to best practices, compliance, and GDPR.
-Stay up to date with industry best practices and emerging trends in data engineering and analytics, including Generative AI as it impacts our data operations.
Qualifications:
-2+ years of experience using SQL and optimizing SQL databases for performance (Trino SQL or Spark SQL).
-Demonstrated experience managing data pipelines (such as on HDFS), repositories (such as GitHub), workflows (such as Apache Airflow), and ETL (following coding best practices).
-Ability to communicate complex technical concepts to both technical and non-technical individuals.
-Experience working with multiple stakeholders, setting project priorities and delivering on Objectives and Key Results (OKRs).
-Experience automating script changes in Python.
Preferred Qualifications:
-BA/BS in engineering, computer science, or a related technical field (such as statistics or data science).
-Excellent analytical skills, such as designing data workflows, analyzing data for anomalies, or setting data quality thresholds via automated solutions.
-Familiarity with data governance principles.
-Program Manager experience.
-Demonstrated experience in managing data pipelines in HDFS.