
C2C Requirements
Role: AIOps MLOps Architect
Location: Fremont, CA
Duration: Long Term
Job Description:
Strong background in Python, API development, Large Language Model (LLM) concepts, ML Ops, Azure Cloud, and AI operations, with 10-15 years of experience working on advanced AI/ML systems, cloud infrastructure, and API integrations, and a focus on operationalizing AI models and maintaining robust systems for AI-driven applications. This role requires a combination of technical expertise in cloud computing, machine learning, and software engineering. Collaborate with IT operations and business teams to support business user issues, requests, production support, and deployments; advocate best practices; and recommend technical solutions that improve application usability and system performance.
Key Responsibilities:
• Technical Operations: Review, implement, and support enterprise-level AI platforms and services to drive IT operational excellence. Ensure that new use cases are onboarded smoothly and operationalized.
• Optimization: Analyze business processes to identify areas for automation, and work with business stakeholders and IT teams to determine requirements and design software bots that reduce operational toil.
• AI Ops & Model Deployment: Lead the operationalization and deployment of AI/ML models into production environments, ensuring they are highly available, scalable, and performant. Implement and monitor Continuous Integration (CI) and Continuous Deployment (CD) pipelines.
• Python Development: Design and develop Python-based solutions for automating and managing the lifecycle of AI/ML models, including data ingestion, model training, and real-time prediction workflows.
• API Integration: Build and maintain robust APIs for model serving and integration with other systems. Ensure seamless communication between models, data pipelines, and consumer applications.
• LLM Concepts and Implementation: Apply knowledge of Large Language Models (LLMs) to develop AI-driven applications and services, ensuring models are optimized and performing efficiently in production.
• ML Ops: Implement and maintain Machine Learning Operations (ML Ops) practices for version control, monitoring, logging, and debugging of AI/ML models in production. Support model retraining, versioning, and A/B testing.
• Cloud Infrastructure: Leverage Azure Cloud services for hosting and scaling AI applications, ensuring security, compliance, and performance. Implement infrastructure as code (IaC) using tools like Azure DevOps.
• Collaboration: Work closely with backend engineers, data engineers/developers, infrastructure engineers, operational SMEs, and business stakeholders to tackle evolving challenges in AI/ML and ensure AI solutions meet business requirements and performance benchmarks.
• Monitoring & Optimization: Continuously monitor the performance of deployed AI models and optimize them for efficiency, cost-effectiveness, and accuracy. Implement alerting and logging mechanisms through scripts or an observability solution.
• Documentation & Best Practices: Document AI Ops processes, use cases, tools, and workflows. Establish and enforce best practices for managing AI models in production environments.
To apply for this job, email your details to suresh.g@nextogen.com