Terms of Employment: • W2 Contract-to-Hire, 6 Months • This position is predominantly remote. Candidates should be comfortable traveling to Northern Virginia roughly once per month. Travel can be expensed. • Candidates must be based in Maryland, Washington, DC, Virginia, West Virginia, Pennsylvania, Delaware, New Jersey, New York, North Carolina, Florida, or Texas.
Overview & Responsibilities • This role requires expertise in building and managing Cloudera clusters, with a focus on administration rather than development. The ideal candidate will have experience with Cloudera CDP Public Cloud v7.2.17 or higher and a strong understanding of big data services and ecosystem tools.
Key responsibilities include:
Cluster Setup and Management • Build and configure Cloudera clusters in the cloud, including services such as NiFi, SOLR, HBase, Kafka, and Knox. • Set up High Availability for critical services such as Hue, Hive, HBase REST, SOLR, and Impala on the BDPaaS Platform. • Monitor and optimize cluster performance using Cloudera Manager. • Perform incremental updates, upgrades, and expansions to the Cloudera environment, keeping it aligned with recommended specifications.
Automation and Monitoring • Develop and implement shell scripts for health checks and automated responses to service warnings or failures. • Design and implement big data pipelines and automated data flows using Python/R and NiFi. • Automate the project lifecycle, including data ingestion and processing workflows.
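The health-check automation described above can be sketched in Python. This is a minimal, illustrative example assuming the JSON shape returned by the Cloudera Manager API's services endpoint (an "items" list with a per-service "healthSummary" of GOOD, CONCERNING, or BAD); the sample payload is hypothetical, not real cluster output.

```python
import json

def unhealthy_services(payload: str) -> list[str]:
    """Return names of services whose healthSummary is not GOOD."""
    services = json.loads(payload).get("items", [])
    return [s["name"] for s in services if s.get("healthSummary") != "GOOD"]

# Illustrative sample payload; in practice this would come from a call to
# the Cloudera Manager API (e.g. /api/vNN/clusters/{cluster}/services).
sample = json.dumps({
    "items": [
        {"name": "hbase", "healthSummary": "GOOD"},
        {"name": "solr", "healthSummary": "CONCERNING"},
        {"name": "kafka", "healthSummary": "BAD"},
    ]
})

if __name__ == "__main__":
    for svc in unhealthy_services(sample):
        # An automated response (restart, page on-call, etc.) would hook in here.
        print(f"ALERT: {svc} needs attention")
```

A cron-scheduled wrapper around a check like this is one common way to turn Cloudera Manager health summaries into automated responses.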
Collaboration and Troubleshooting • Work with teams such as Application Development, Security, and Platform Support to implement configuration changes for improved cluster performance. • Troubleshoot and resolve issues with Kerberos, TLS/SSL, and other workload-related challenges. • Provide expertise for use cases like analytics/ML, data science, cluster migration, and disaster recovery.
Security and Governance • Implement and manage comprehensive security policies across the Hadoop cluster using Ranger. • Support governance, data quality, and documentation efforts.
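To make the Ranger policy work concrete, here is a hedged sketch of composing an HDFS access policy payload in the shape used by Ranger's public v2 REST API ("resources", "policyItems"). The service name, path, and group below are illustrative placeholders, not values from any real cluster.

```python
import json

def hdfs_read_policy(service: str, name: str, path: str, group: str) -> str:
    """Return a Ranger policy payload granting a group read/execute on a path."""
    policy = {
        "service": service,
        "name": name,
        "resources": {"path": {"values": [path], "isRecursive": True}},
        "policyItems": [{
            "groups": [group],
            "accesses": [
                {"type": "read", "isAllowed": True},
                {"type": "execute", "isAllowed": True},
            ],
        }],
    }
    return json.dumps(policy)

if __name__ == "__main__":
    # In practice an admin tool would POST this payload to Ranger's
    # /service/public/v2/api/policy endpoint.
    print(hdfs_read_policy("cm_hdfs", "raw-zone-read", "/data/raw", "analysts"))
```

Managing policies as generated payloads like this, rather than hand-edited in the Ranger UI, also supports the governance and documentation efforts noted above.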
Database and Workflow Management • Access databases and metastore tables, and write queries in Hive and Impala using Hue. • Manage job workflows, monitor resource allocation with YARN, and handle data movement. • Support the Big Data/Hadoop databases throughout their lifecycle, including query optimization, performance tuning, and resolving data integrity issues.
Required Skills & Experience • Cloudera CDP Public Cloud: Administration and optimization of services such as Hive, Spark, NiFi, and CDSW. • AWS Services: Proficiency in managing core AWS services (EC2, S3, EBS, EFS). • Apache Kafka: Strong administration and troubleshooting skills, including broker management and integration with IBM MQ; proficiency with the Kafka Streams API, stream processing with KStreams and KTables, and topic/offset management; production experience with the Kafka ecosystem (brokers, Kafka Connect, ZooKeeper). • Apache NiFi: Administration of flow management, the registry server, controller services, and integrations with Kafka, HBase, and SOLR. • HBase: Administration, database management, and troubleshooting. • SOLR: Managing logging levels, shards, and collections, and troubleshooting resource-intensive queries.
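Topic/offset management in Kafka frequently comes down to tracking consumer lag per partition. The sketch below shows the arithmetic only; the offsets are hardcoded for illustration, whereas in practice they would come from a Kafka admin client (listing log-end offsets and a consumer group's committed offsets).

```python
def consumer_lag(end_offsets: dict[int, int],
                 committed: dict[int, int]) -> dict[int, int]:
    """Per-partition lag = log-end offset minus last committed offset."""
    return {p: end_offsets[p] - committed.get(p, 0) for p in end_offsets}

# Illustrative per-partition offsets (not from a real cluster).
end = {0: 1500, 1: 980, 2: 2100}    # log-end offsets
done = {0: 1500, 1: 950, 2: 2000}   # committed offsets for a consumer group

if __name__ == "__main__":
    print(consumer_lag(end, done))
```

A monitoring check that alerts when any partition's lag exceeds a threshold is a typical use of this calculation.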