We are seeking a skilled and motivated Software Developer with a focus on Big Data Engineering to join our dynamic team. In this role, you will work with cutting-edge technologies in the Hadoop Ecosystem to develop, maintain, and optimize large-scale data processing systems. As part of our engineering team, you will be responsible for building robust data pipelines, ensuring data quality, and integrating big data solutions into our systems.
Key Responsibilities:
Data Pipeline Development: Design, develop, and maintain scalable data pipelines using Hadoop, Spark, and related technologies to process and analyze large datasets.
Big Data Systems: Work with Hadoop Ecosystem tools such as HDFS, Hive, HBase, and Pig to manage and optimize data storage, retrieval, and processing.
ETL Design and Implementation: Build and optimize Extract, Transform, Load (ETL) workflows for high-volume data integration from various sources.
Performance Optimization: Analyze and tune the performance of large-scale data processing workflows and jobs in a distributed computing environment.
Collaboration: Collaborate with cross-functional teams, including Data Scientists, Analysts, and Product Managers, to gather requirements and deliver optimal data solutions.
Data Quality & Security: Ensure data integrity, quality, and security by implementing robust validation checks, monitoring, and logging mechanisms.
Innovation: Stay updated with the latest trends in Big Data technologies and continuously improve processes and systems with innovative solutions.
Required Skills & Qualifications:
Experience: Proven experience working with Big Data technologies, particularly within the Hadoop Ecosystem (HDFS, Hive, Spark, HBase, Pig, Flume, etc.).
Programming Skills: Strong programming experience in Java, Scala, or Python for data processing.
Big Data Tools: Hands-on experience with Apache Spark, Hive, HBase, YARN, MapReduce, Kafka, and Flume.
ETL Frameworks: Experience building ETL pipelines and working with tools like Apache NiFi, Talend, or similar.
Data Modeling & Warehousing: Knowledge of data warehousing, data modeling, and SQL (experience with NoSQL databases is a plus).
Cloud Experience: Familiarity with cloud platforms such as AWS, Azure, or Google Cloud Platform and their managed big data services (e.g., Amazon EMR, Google Cloud Dataproc).
Data Integration: Experience integrating data from multiple sources and working with structured and unstructured data.
Version Control & CI/CD: Familiarity with version control tools like Git and continuous integration/continuous deployment practices.
Problem-Solving & Analytical Skills: Strong troubleshooting skills and ability to work on complex data problems in a fast-paced environment.
Preferred Skills & Qualifications:
Familiarity with containerization technologies like Docker and orchestration tools like Kubernetes.
Experience working with Apache Kafka for real-time data streaming.
Understanding of machine learning concepts and experience integrating them into data pipelines.
Education:
Bachelor's degree in Computer Science, Engineering, Information Technology, or a related field; a Master's degree is a plus.