- Constant collaboration: Work closely with data scientists, engineers, and other stakeholders to understand requirements, provide technical guidance, and communicate model performance and limitations.
- Data management and curation: Manage and curate large datasets for model training, evaluation, and testing, ensuring data quality, integrity, and compliance.
- Model deployment and integration: Deploy trained models to on-premises infrastructure.
- Model serving and inference: Optimize and maintain model serving systems to support real-time inference and batch processing.
Within your first 6 months:
- Understand the current LLMOps practices, techniques, and tools used to deploy, monitor, and maintain large language models efficiently.
- Collaborate with data scientists, DevOps engineers, and IT professionals to streamline LLM operations.
After 6 months:
- Oversee data preparation and prompt engineering efforts to improve model performance.
- Develop and maintain model review and governance processes to ensure compliance and quality.
- Set up and manage model monitoring systems with human feedback loops.
- Optimize LLM pipelines for efficiency, scalability, and risk reduction.
- Bachelor's degree in Computer Science, Data Science, or a related field and 12+ years of experience in full-stack software development, with at least two years focused on GenAI and LLM applications, or a Master's degree and 10+ years of experience
- Strong programming skills in Python and experience with ML frameworks such as PyTorch or TensorFlow
- Experience with LLM post-training for instruction tuning and preference alignment
- Familiarity with MLOps platforms (e.g., MLflow, Weights & Biases)
- Knowledge of vector databases and similarity search techniques
- Strong problem-solving and analytical skills
- Excellent communication and collaboration abilities
- Ability to work in a fast-paced, cross-functional environment
- Passion for staying current with LLM research and industry trends
Preferred Qualifications:
- 3-5 years of experience in machine learning operations, with at least 2 years focused on LLMs
- Knowledge of LangChain, LlamaIndex, or similar frameworks for building LLM applications
- Experience with model serving (vLLM, Triton, llama.cpp, etc.) and API integration for LLMs
- Understanding of human-in-the-loop feedback systems for LLM evaluation
- Experience with agentic systems development frameworks such as AutoGen and CrewAI
- Medical, dental and vision plans
- 401(k) participation, including company matching
- Employee Stock Purchase Program (ESPP)
- Employee Assistance Program (EAP)
- Company paid holidays
- Paid sick leave and vacation time
- The company follows all applicable laws for Paid Family Leave and other leaves of absence.