Krish – Data Scientist/Machine Learning – 6+ Years – USC – Local to Bay Area
Krish
San Jose, CA
Consultant's Details | Employer Details |
Consultant Name: Krish S | Employer Name: Nextgen Technologies Inc |
Work Visa: USC | Contact Person: Arjun Sharma |
Location: Bay Area, CA | Email: arjun@nextgentechinc.com |
Relocation: Yes | Phone: +1 (408) 913-3046 (Note: please call after 09:00 AM PST) |
PROFESSIONAL SUMMARY
Experienced Machine Learning Engineer/Data Scientist with over 6 years in the industry, including 5+ years specializing in AI/ML solutions on GCP, Azure, and AWS, and over one year with Generative AI. Proven track record of deploying, consuming, and fine-tuning NLP models and LLMs such as Azure OpenAI, Llama 2/3, and Hugging Face models. Extensive expertise in designing and implementing scalable AI solutions, with strong proficiency in Python, TensorFlow, PyTorch, and scikit-learn, and hands-on experience deploying AI solutions on cloud platforms. Exceptional communication and leadership skills with a history of successful collaboration across cross-functional teams.
- Experience using various packages in R and Python, including pandas, NumPy, Seaborn, SciPy, Matplotlib, scikit-learn, Beautiful Soup, NLTK, Keras, PyTorch, and TensorFlow.
- Experience with data storage and management on AWS, including creating and maintaining data pipelines and ETL processes.
- Extensive ETL testing experience using Informatica PowerCenter/PowerMart (Designer, Workflow Manager, Workflow Monitor, and Server Manager).
- Working experience with advanced Microsoft Excel functions, ETL (Extract, Transform and Load) of data into a data mart and Business Intelligence (BI) tools like Microsoft Power BI and Tableau (Data visualization and Analytics).
- Worked on Web Applications in Azure and Azure functions to pull data from API to blob Storage and SQL.
- Worked on deploying websites in Azure using the MVC framework on the backend. Experienced in developing PL/SQL procedures, functions, SQL scripts, and database triggers to populate data by applying business logic, with extensive knowledge of PL/SQL development on Oracle 11g.
- Worked on Azure SQL Data Warehouse and database development. Experienced user of PL/SQL for developing server-side program units and reusable code using procedures and functions; worked on ad hoc change tickets.
- Experienced in creating data-driven dashboards using Tableau and Superset to provide valuable insights to management.
- Experienced in using Vertex AI's integration with TensorFlow Extended (TFX) for creating robust and scalable ML pipelines.
TECHNICAL SKILLS
AI/ML Solutions | Azure OpenAI, Llama 2/3, Mixtral |
Cloud Platforms | AWS, GCP, Azure AI services (Azure AI Search, Azure OpenAI) |
Cloud Resources | Azure Databricks, AWS Glue, GCP BigQuery, Dataflow |
Programming Languages | Python, PySpark, C++, SQL, JavaScript, HTML, CSS |
ML Framework | TensorFlow, Keras, PyTorch, LangChain & LlamaIndex, scikit-learn, NLTK, spaCy, Pandas, NumPy, Hugging Face Transformers, OpenCV |
ML Algorithms | Regression (Linear, Polynomial, Ridge, Lasso, Decision Tree Regressor, MLP, ANN), Classification (Logistic Regression, SVM, Decision Tree, Random Forest, Naïve Bayes, KNN, ANN, Ensembling techniques), Clustering (K-means, K-medians, K-modes, Agglomerative Clustering) |
Frameworks | Flask, Django, Express, EJS |
Data Management | Vector Databases, embeddings, SQL, NoSQL |
AI/ML Ops Practices | Model Monitoring, optimization & deployment, fine-tuning |
Software Version Control & Documentation | Git, JIRA, Confluence |
Containerization & Orchestration Tools | Docker, Kubernetes, Airflow & MLflow |
Monitoring Tools | Power BI & Tableau |
Soft Skills | Excellent communication, leadership, collaboration, project management |
PROFESSIONAL EXPERIENCE
NRG, Houston, TX (Remote) October 2024 – Present
GenAI Engineer
Responsibilities:
- Led the end-to-end development and fine-tuning of generative AI models for a customer care chatbot, transitioning from a reactive chatbot to a proactive agent application. Utilized Azure OpenAI and Google Vertex AI to customize and optimize LLMs such as GPT-4o and Gemini, specifically to handle domain-specific customer inquiries and assist human agents.
- Improved Agent Efficiency by 35%: The agent application was designed as a “copilot” for human agents. It provided real-time response suggestions, automated summary generation for complex cases, and intelligent routing, which enabled agents to handle more inquiries per hour and focus on high-value, complex customer issues.
- Architected and managed a comprehensive MLOps pipeline with Azure Databricks and MLflow for robust experiment tracking, model versioning, and automated deployment. This included containerizing models with Docker and orchestrating them on Azure Kubernetes Service (AKS) to ensure high availability and reliability for the critical customer-facing application.
- Implemented advanced model training techniques, including Supervised Fine-Tuning (SFT) on a proprietary dataset of historical customer interactions. This, combined with Reinforcement Learning with Human Feedback (RLHF), ensured the models' conversational style and responses were not only accurate but also aligned with specific brand guidelines and emotional intelligence requirements, leading to a significant increase in the First Contact Resolution (FCR) rate.
- Developed and integrated custom tools for the agent application using frameworks like LangChain, enabling the models to perform complex actions such as querying internal databases, fetching customer order history from SAP and Salesforce, and interacting with external APIs to provide comprehensive and immediate solutions.
- Conducted rigorous model evaluation using both automated metrics (Containment Rate, FCR, Average Handling Time) and human-in-the-loop validation. Monitored model performance, detected and mitigated concept drift, and provided actionable insights to guide iterative improvements that directly impacted agent performance metrics.
- Enforced ML system security and compliance by implementing role-based access control (RBAC) and audit logging, all within a robust GitOps and DevSecOps framework, ensuring the sensitive customer data handled by the agent application remained secure and compliant with regulatory standards.
Environment: Python, PyTorch, LangChain, Azure OpenAI, SAP, Salesforce, Pandas, ReactJS, Azure Redis, Azure Kubernetes, Google Vertex AI, Siebel, Azure Databricks, Databricks MLflow, Databricks Jobs, Databricks Workflows, Delta Lake, Databricks Model Serving, Unity Catalog, PySpark
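The automated chatbot metrics cited in this role (containment rate, FCR, average handling time) can be sketched as simple aggregations over conversation records. This is an illustrative sketch only: the record fields (`escalated`, `resolved_first_contact`, `handle_seconds`) are hypothetical names, not the actual NRG schema.

```python
# Sketch: computing chatbot KPIs from conversation records.
# Field names below are illustrative assumptions, not a real schema.

def containment_rate(convos):
    """Share of conversations the bot handled without escalating to a human."""
    contained = sum(1 for c in convos if not c["escalated"])
    return contained / len(convos)

def first_contact_resolution(convos):
    """Share of conversations resolved on the first contact (FCR)."""
    resolved = sum(1 for c in convos if c["resolved_first_contact"])
    return resolved / len(convos)

def average_handling_time(convos):
    """Mean handling time in seconds across conversations (AHT)."""
    return sum(c["handle_seconds"] for c in convos) / len(convos)

convos = [
    {"escalated": False, "resolved_first_contact": True, "handle_seconds": 120},
    {"escalated": True, "resolved_first_contact": False, "handle_seconds": 300},
    {"escalated": False, "resolved_first_contact": True, "handle_seconds": 180},
    {"escalated": False, "resolved_first_contact": False, "handle_seconds": 240},
]

print(containment_rate(convos))          # 0.75
print(first_contact_resolution(convos))  # 0.5
print(average_handling_time(convos))     # 210.0
```

In production these aggregates would be computed over the conversation logs in Delta Lake and tracked alongside model versions in MLflow.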
Cymer, San Diego, CA April 2022 – May 2024
AI/ML Engineer (GenAI)
Responsibilities
- Led the design and implementation of AI models and algorithms to address complex business challenges, resulting in a 20% improvement in operational efficiency.
- Collaborated with product managers to identify AI opportunities and integrated AI solutions into existing products, enhancing overall user experience.
- Developed a chatbot to summarize various documents within Cymer by applying the RAG technique using LangChain, LlamaIndex, and a vector database (Cosmos DB).
- Performed Natural Language Processing (NLP) tasks with LLMs, including sentiment analysis, entity recognition, topic modeling, and text summarization, using advanced Python libraries such as NLTK, TextBlob, spaCy, and Gensim.
- Demonstrated expertise in AI-specific utilities, including proficiency in ChatGPT, Hugging Face Transformers, and associated data analysis methods, highlighting a comprehensive understanding of advanced artificial intelligence tools and techniques.
- Expert knowledge of AI/ML application lifecycles and workflows, from data ingestion to model deployment, in cloud environments such as Azure, AWS, and GCP.
- Managed data pipelines using RDF graphs and primitives from Apache Beam and Apache NiFi to build a transparent, manageable data flow on GCP Dataflow and Google BigQuery, yielding a nearly fully automated solution that reduced routine manual work.
- Performed data cleaning and feature selection using MLlib package in PySpark, working with deep learning frameworks such as Caffe with considerations for MLOps.
- Configured GitLab CI/CD pipelines to automate the building, testing, and deployment of applications to AKS, improving efficiency and reducing manual intervention.
- Deployed and fine-tuned LLMs, including Azure OpenAI models and Llama 2/3, to create a chatbot that finds relevant content in organizational process documents. The chatbot improved processes and reduced content search time by producing better summaries.
- Integrated BERT (Bidirectional Encoder Representations from Transformers) models into natural language processing workflows to leverage contextualized word embeddings and capture semantic relationships in text data.
- Mentored Data Scientists/ML Engineers to get up to speed and begin contributing to the client project.
- Participated in customer communications to discuss challenges and provide development status updates.
- Collaborated with cross-functional teams to gather requirements and explore AI solutions for their problems by developing proofs of concept (PoCs) using the latest technologies, such as Azure OpenAI and Azure AI Search.
Environment: Python, Tableau, Power BI, Machine Learning (Keras, PyTorch), Generative AI, Deep Learning, Natural Language Processing, Cognitive Search, Data Analysis (Pandas, NumPy), Vertex AI, Agile Methodologies, SCRUM Process, GCP, GitLab, Databricks, PySpark, BigQuery, Dataflow, Apache Beam, Apache NiFi, Informatica, Snowflake.
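The core of the RAG pattern described in this role is retrieving the document chunks most similar to a question, then passing them to the LLM. A minimal sketch of that retrieval step follows; a toy bag-of-words "embedding" stands in for the real embedding model and vector database (Cosmos DB), and the sample chunks are invented for illustration.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding' standing in for a real embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks, k=1):
    """Return the k chunks most similar to the query (the 'R' in RAG)."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

# Hypothetical process-document chunks, stood up for the example.
chunks = [
    "The vacation policy grants 15 days of paid leave per year.",
    "Laser calibration must be performed before each production run.",
    "Expense reports are due by the fifth business day of the month.",
]
context = retrieve("how many paid vacation days do I get", chunks)[0]
# In the real system the retrieved context and the question would be sent
# to the LLM (e.g. Azure OpenAI) to generate a grounded answer.
print(context)
```

Frameworks like LangChain and LlamaIndex wrap exactly this loop: chunking, embedding, similarity search, and prompt assembly.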
Morgan Stanley, Menlo Park, CA Jun 2020 – Mar 2022
Sr. ML Engineer
Responsibilities:
- Utilized Pandas and NumPy for data cleaning, feature engineering, and normalization to prepare datasets for modeling.
- Applied supervised machine learning algorithms for predictive modeling to tackle various business problems, such as risk assessment and investment forecasting.
- Designed and implemented predictive models using TensorFlow and PyTorch, experimenting with various deep learning architectures, including Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), to handle sequential data effectively.
- Used Python to create statistical algorithms involving Multivariate Regression, Linear Regression, Logistic Regression, PCA, Random Forest models, Decision Trees, and Support Vector Machines for estimating the risks of welfare dependency.
- Built a data warehouse utilizing ETL processes with Databricks and Dataiku, gathering all the business data to envision AI solutions for the collected data.
- Derived data from relational databases to perform complex data manipulations and conducted extensive data checks to ensure data quality. Performed Data wrangling to clean, transform and reshape the data utilizing NumPy and Pandas library.
- Implemented model versioning and A/B testing strategies on Databricks for evaluating model performance and conducting experiments to improve model accuracy and effectiveness.
- Led the design and implementation of a customer segmentation project using AWS S3 for data storage, Python, and Pandas for data manipulation, applying K-means clustering in Scikit-learn to segment customers, enhancing marketing strategies.
- Developed a GAN-based model to generate high-quality synthetic images for training a computer vision system, significantly improving its accuracy and robustness.
- Applied advanced natural language processing (NLP) methodologies to extract insights from unstructured data sources.
- Integrated AI/ML models and APIs into production on AWS SageMaker.
- Designed and developed natural language processing (NLP) pipelines to enhance search relevance and user experience by integrating semantic search capabilities.
- Worked with cross-functional teams (including the data engineering team) to extract data from MongoDB and rapidly execute jobs through the MongoDB Connector for Hadoop.
- Conducted performance testing and benchmarking of cognitive search systems to identify bottlenecks and optimize system scalability and response times.
- Worked with large amounts of cloud-stored image data to identify faces of the same person and faces with similar features using the NumPy, Seaborn, PIL, Matplotlib, Pandas, OpenCV, and scikit-learn libraries.
- Created a Flask API to process input failure log files, generate summarized content, and integrate with a Large Language Model (LLM) to produce concise text summaries.
- Developed the different Python workflows triggered by events from other systems. Collected, analyzed, and interpreted the raw data from various clients’ REST APIs.
- Created interactive dashboards in Tableau providing a high-level overview of transaction activities and fraud detection metrics. Used Tableau's built-in statistical tools to perform analyses such as correlation studies, regression analysis, and time-series forecasting.
Environment: Python, R, Tableau, Power BI, Machine Learning (Scikit-Learn, Keras, PyTorch), Generative AI, Deep Learning, Natural Language Processing, Cognitive Search, Data Analysis (Pandas, NumPy), Vertex AI, SQL (MySQL, PostgreSQL), NoSQL, Django Web Framework, HTML, XHTML, AJAX, CSS, JavaScript, XML, JSON, Flask, Agile Methodologies, SCRUM Process
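The customer-segmentation step described in this role can be sketched with scikit-learn's K-means on a small synthetic feature matrix. The features (annual spend, visits per month) and the generated clusters are illustrative assumptions; the real data lived in AWS S3.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical customer features: columns are (annual_spend, visits_per_month).
# Three synthetic groups stand in for the real S3-sourced dataset.
rng = np.random.default_rng(0)
low  = rng.normal([200, 2],   [30, 0.5],  size=(50, 2))   # occasional shoppers
mid  = rng.normal([800, 6],   [60, 1.0],  size=(50, 2))   # regulars
high = rng.normal([2000, 12], [100, 1.5], size=(50, 2))   # heavy spenders
X = np.vstack([low, mid, high])

# Cluster customers into 3 segments, as in the resume bullet.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

print(len(set(labels.tolist())))  # 3 distinct customer segments
```

In practice the cluster count would be chosen with the elbow method or silhouette scores rather than fixed in advance, and the resulting segments fed to the marketing team.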
Arocom Solution, India Jun 2018 – May 2020
Data Analyst/ML Engineer
Responsibilities:
- Architected and Implemented deep learning workflows to predict protein tertiary and quaternary structures, a critical step for understanding protein function. The models utilize amino acid sequences and evolutionary signals (multiple sequence alignments) as inputs to model the spatial relationships between amino acids.
- Engineered custom stacked neural network architectures that employ affine transformations and non-linear activation functions to simulate the complex, non-linear energy landscape governing protein folding behaviors. This approach helps the models learn the physical and chemical constraints that determine a protein's final 3D shape.
- Integrated and curated data from diverse biological repositories, including the Protein Data Bank (PDB) for known structures and UniProt for sequence information. Utilized HHblits to generate multiple sequence alignments, which provide crucial evolutionary context. Applied state-of-the-art models like Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) based models to learn from this data and enhance prediction accuracy.
- Collaborated with biologists to perform model interpretability analysis, a key step in ensuring the predictions are biologically plausible. This feedback loop allowed for iterative refinement of protein designs, improving their functionality under real-world biological pressures.
- Automated the end-to-end machine learning pipeline, from data preprocessing (e.g., featurization of sequences) to model training and evaluation (e.g., using metrics like RMSD or GDT_TS), and final structure visualization to communicate results effectively.
Environment: Power BI, XGBoost, ETL, Tableau, NumPy, SciPy, NLTK, PL/SQL, MLlib, MLflow, TensorFlow, Python.
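The RMSD metric mentioned for evaluating predicted protein structures measures the root-mean-square distance between corresponding atoms of two structures. A minimal NumPy sketch follows; it assumes the structures are already superposed (a full implementation would first align them, e.g. with the Kabsch algorithm), and the toy coordinates are invented for illustration.

```python
import numpy as np

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two pre-aligned (N, 3) coordinate sets."""
    diff = coords_a - coords_b
    # Per-atom squared distance, averaged over atoms, then square-rooted.
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))

# Toy example: a 3-atom structure vs. a copy shifted 1 Å along x.
ref = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [3.0, 0.0, 0.0]])
pred = ref + np.array([1.0, 0.0, 0.0])
print(rmsd(ref, pred))  # 1.0
```

Lower is better; sub-2 Å RMSD over backbone atoms is commonly treated as a near-native prediction, which is why it pairs well with GDT_TS in evaluation pipelines.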
EDUCATION
Bachelor of Technology in Computer Science and Engineering with Specialization in AI/ML, Vellore Institute of Technology, India