Role: Cloud Data Engineer (Python, Spark, Synapse, Azure storage) for ETL and Data Processing in AI/ML
Location: Minneapolis, MN or Irving, TX (3+ days in office per week)
Years of experience: 10+
Job Description:
Position Overview
We are seeking a Lead Developer with advanced skills in Python, Apache Spark, Azure Synapse, and Azure Data Engineering services to develop and
manage ETL pipelines and data processing solutions that support AI/ML initiatives. The ideal candidate will have strong expertise in Azure Cloud,
including SQL, Data Factory, SQL Pools, Spark Pools, and Data Warehousing. A background in CI/CD processes, cross-functional collaboration, and
Agile methodologies is essential. In addition, knowledge of Data Science, Java, Kubernetes, Azure Data Lake Storage (ADLS) Gen2, and API development is required.
Key Responsibilities
1. ETL and Data Pipeline Development
- Design, develop, and optimize scalable ETL processes using Python, Apache Spark, and Azure Synapse.
- Build and manage Azure Data Factory pipelines to orchestrate complex data workflows.
- Use SQL Pools and Spark Pools within Synapse to manage and process large datasets efficiently.
- Implement Data Warehousing solutions using Azure Synapse Analytics to provide structured and queryable data layers.
- Ensure the data platform supports real-time and batch AI/ML data requirements.
2. Azure Cloud Development & CI/CD Deployment
- Build, configure, and manage CI/CD pipelines on Azure DevOps for ETL and data processing tasks.
- Automate infrastructure provisioning, testing, and deployment using Infrastructure-as-Code (IaC) tools such as ARM templates or Terraform.
- Optimize Azure Data Lake Storage (ADLS Gen2) to store and manage raw and processed data efficiently, ensuring proper access control and data security.
3. Cross-Functional Collaboration
- Collaborate with Data Scientists, Data Engineers, ML Engineers, and Business Analysts to translate business requirements into data solutions.
- Work with the DevOps and Security teams to ensure smooth and secure deployment of applications and pipelines.
- Act as the technical lead in designing, developing, and implementing data solutions, and mentor junior team members.
4. Data Engineering and API Development
- Develop and integrate with external and internal APIs for data ingestion and data exchange.
- Build, test, and deploy RESTful APIs for secure data access.
- Use Kubernetes to deploy and orchestrate containerized data processing applications.
- Manage data storage and transformation to support advanced Data Science and AI/ML models.
5. Agile Project Management
- Participate in and lead Agile ceremonies, such as sprint planning, daily stand-ups, and retrospectives.
- Collaborate with cross-functional teams in iterative development to ensure high-quality and timely feature delivery.
- Adapt to changing project priorities and business needs in an Agile environment.
Required Skills and Qualifications
1. Technical Skills:
- Expertise in Python and Apache Spark for large-scale data processing.
- Strong experience in Azure Synapse Analytics, including SQL Pools and Spark Pools.
- Advanced proficiency in Azure Data Factory for ETL pipeline orchestration and management.
- Knowledge of Data Warehousing principles, with hands-on experience building solutions on Azure.
- Experience with SQL, including complex queries, optimization, and performance tuning.
- Familiarity with CI/CD tools like Azure DevOps and managing infrastructure in Azure Cloud.
- Experience in Java for API integration and microservices architecture.
- Hands-on knowledge of Kubernetes for containerized data processing environments.
- Proficiency in working with Azure Data Lake Storage (ADLS) Gen2 for data storage and management.
- Experience working with APIs (REST, SOAP) and building API-based data integrations.
2. Agile and Cross-Functional Skills:
- Experience working in an Agile environment, using Scrum or Kanban.
- Ability to lead, mentor, and coach junior developers in the team.
- Strong collaboration skills to work with data scientists, analysts, and cross-functional teams to deliver end-to-end data solutions.
3. Behavioral Skills:
- Strong analytical and problem-solving skills with a passion for data-driven solutions.
- Excellent communication and presentation skills, able to explain complex technical concepts to non-technical stakeholders.
- Ability to work in a fast-paced, dynamic environment with changing priorities.
- Self-motivated and results-oriented with attention to detail.
Preferred Qualifications
- Azure certifications in data engineering or cloud architecture.
- Experience deploying AI/ML models on cloud platforms.
- Familiarity with Data Governance best practices, including compliance with data privacy regulations.