Collaboration: Collaborate with cross-functional teams, including data scientists, machine learning engineers, and domain experts, to understand data requirements and objectives.
Data Strategy: Design and develop data collection strategies for large-scale image, video, and audio datasets, considering factors such as diversity, quality, and representativeness.
Pipeline Development: Design, develop, and maintain robust data pipelines to collect, store, and process large volumes of data efficiently and reliably.
Automation: Develop and implement automation tools to streamline data collection, processing, and curation tasks.
Infrastructure Management: Oversee the infrastructure required for data storage and processing, ensuring scalability and performance.
Data Curation: Curate and manage datasets, ensuring they are clean, well-organized, and suitable for training and testing ML models.
Data Quality Assurance: Implement data validation and quality assurance processes to ensure the integrity and accuracy of datasets.
Documentation: Document processes, methodologies, and best practices related to data collection and management.
Innovation: Stay up to date with the latest advancements in data collection, machine learning, and related fields, and contribute insights and ideas to the team.
Qualifications:
Education: Bachelor's or Master's degree in Computer Science or a related field.
Experience: 3+ years of experience designing and implementing data collection pipelines for image, video, or audio datasets.
Skills:
Strong programming skills in languages such as Python or Java.
Familiarity with machine learning concepts and frameworks (e.g., TensorFlow, PyTorch).
Experience with data preprocessing, cleaning, and transformation techniques.
Proficiency with databases and data storage solutions (e.g., SQL, NoSQL, Hadoop).
Knowledge of cloud computing platforms (e.g., AWS, Azure, Google Cloud) and their services.
Excellent problem-solving skills and attention to detail.
Effective communication skills and the ability to work collaboratively in a team environment.
Experience with distributed computing and big data processing is a plus.
Background in computer vision, natural language processing, or audio processing is a plus.
Compensation and Benefits:
Competitive salary and stock options
Comprehensive health, dental, and vision insurance
Generous paid time off and company holidays
Professional development opportunities
Hybrid work environment, with 3 days per week expected in the office in Atlanta, GA, or Austin, TX