- Write Extract, Transform, and Load (ETL) logic to automate data collection and reporting processes/pipelines, including data quality and monitoring.
- Build complex and reliable data pipelines in SQL to serve as the backbone for input and output to several ML models. Additionally, deploy SQL workflows in conjunction with Python code.
- Work with human data leads and modeling leads to define instructions and rubrics for human data creation, then evaluate the coding-related data that humans produce.
- Perform data analysis, meet with rater pool leads and modeling leads, and report status.
- Define data requirements based on a deep understanding of modeling team goals and analysis of model loss patterns.
- Benchmark vendor data quality against competitor models.
- Use quantitative techniques to analyze vendor-produced data, ensuring high quality and driving rater pool optimization.
- Audit datasets for quality issues and develop tools to accelerate qualitative analysis.
- Apply modeling and experimentation techniques to demonstrate data impact and identify blind spots.
- Use your knowledge of data processing, technical systems, and project management to enhance our existing data and machine learning platforms for internal use cases.
- Collaborate with data scientists to drive operational efficiency and make our machine learning data workflows more reliable.
- Experience coding in SQL and Python.
- Understanding of LLM capabilities and limitations.
- Understanding of LLM processes such as pre-training, RLHF, SFT, and evals.
- Experience with training/tuning models, prompt engineering, and evaluating LLM outputs is a plus.
- Experience writing, maintaining, and monitoring both streaming and batch ETLs operating on a variety of structured and unstructured sources.
- Familiarity with machine learning libraries (such as TensorFlow, Scikit-learn, Keras) or exploratory/statistical analysis using Python or R.
- Experience with the software development life cycle.
- Experience with ML / AI is a plus.
- Experience with prompt engineering and writing prompts for GenAI is a plus.
- Advanced ability to write English prose.
- Bachelor's degree or higher in CS or a related field.