Job Description: As a Senior Data Engineer, you will be responsible for designing, implementing, and optimizing our data architecture to support data lakes, data warehouses, and ETL processes. You will work in a cloud environment, primarily using AWS services, to manage large datasets and ensure data quality and integrity.
Key Responsibilities:
Design, build, and maintain scalable data pipelines using Cloudera CDP and other big data technologies.
Collaborate with cross-functional teams to gather requirements and translate them into technical solutions.
Implement and optimize ETL and ELT processes to support data integration and migration tasks.
Monitor, troubleshoot, and resolve scheduling issues in YARN Queue Manager.
Analyze and optimize Apache Solr search performance, including writing and running cURL requests against Solr for data retrieval.
Develop and manage Hive databases and tables, including processes for data backup and recovery.
Perform data auditing and governance, and ensure adherence to data quality standards across platforms.
Use AWS services such as S3 and EMR for data storage and processing.
Implement DevOps practices such as CI/CD to automate data workflows and improve efficiency.
Stay updated on industry best practices and emerging technologies in the big data landscape.
Qualifications:
Proven experience in data engineering, with a strong focus on Cloudera CDP, Hadoop, and AWS.
Proficiency in SQL and experience with relational databases (MySQL, PostgreSQL) and NoSQL databases (HBase, MongoDB).
Solid understanding of Data Lake and Data Warehouse architectures.
Experience with Apache Solr or similar search technologies, with a focus on performance optimization.
Familiarity with common data integration tools and ETL/ELT processes.
Experience implementing DevOps practices and using automation tools (e.g., Ansible, Terraform).
Strong analytical skills with the ability to troubleshoot complex data-related issues.
Excellent communication skills and ability to work collaboratively in a team environment.