Job Responsibilities
• Understand complex business requirements
• Design and develop ETL pipelines for collecting, validating, and transforming data according to specifications
• Develop automated unit tests, functional tests, and performance tests
• Maintain optimal data pipeline architecture
• Design ETL jobs for optimal execution in the AWS cloud environment
• Reduce processing time and cost of ETL workloads
• Lead peer reviews and design/code review meetings
• Provide support to the production operations team
• Implement data quality checks
• Identify areas where machine learning can be used to detect data anomalies
Experience & Qualifications
• 7+ years of experience programming in Java or Scala
• 7+ years of experience in ETL projects
• 5+ years of experience in big data projects
• 3+ years of experience with API development (REST APIs)
• Believes in Scrum/Agile and has deep experience delivering software on teams that use Scrum/Agile methodology
• Strong, creative analytical and problem-solving skills
Required Technical Skills & Knowledge
• Strong experience in Java or Scala
• Strong experience with big data technologies such as AWS EMR, AWS EKS, and Apache Spark
• Strong experience with serverless technologies such as AWS DynamoDB and AWS Lambda
• Strong experience processing JSON and CSV files
• Must be able to write complex SQL queries
• Experience in performance tuning and optimization
• Familiarity with columnar storage formats (ORC, Parquet) and various compression techniques