EXPERIENCE
In this role, I lead the MLOps team in a project to create a unified RAG platform for the entire bank. My work combines technical leadership, model optimization, and interaction with business stakeholders to integrate new solutions.
Key Achievements:
- Led a team (Data Scientists, Data Engineers, ML Engineers) as Tech Lead/Team Lead, successfully designing and implementing a unified RAG (Retrieval-Augmented Generation) platform across the entire bank
- Optimized LLM inference for a 40% performance improvement, halving the response time of the entire RAG service
- Ensured high performance and reliability of the service, maintaining a 5-second SLA under loads of up to 250,000 requests per day
- Developed and implemented production-ready MLOps pipelines for LLM deployment using KServe and vLLM
- Resolved infrastructure constraints by building vLLM from source with flash-attention support for legacy CUDA (11.8)
- Implemented a unified gateway (Higress) for all LLMs and MCP (Model Context Protocol), centralizing management and access
Core Responsibilities:
- Designing architecture and participating in RAG system implementation
- Deploying and maintaining LLM inference infrastructure in new clusters based on KServe, including troubleshooting Knative and Istio components
- Client interaction: conducting meetings, designing connection schemes for onboarding new clients to the RAG service, and estimating effort
- Creating unified pipelines for deploying various non-model services across multiple environments (clusters), improving release speed and consistency
- Researching and implementing best practices for optimizing and accelerating LLM inference
Tech Stack: Kubernetes, KServe, vLLM, RAG, ArgoCD, Istio, Python, Jenkins
- Developed and maintained a machine learning model deployment platform, managing 100+ ML models as part of a specialized ML team
- Orchestrated database operations, including table creation and structure optimization for enhanced performance
- Led critical aspects of a large-scale infrastructure migration, including server relocation and system upgrades
- Authored and implemented Lua scripts for a Tarantool Cartridge cluster during application migration
- Enhanced a Golang-based database emulator for ClickHouse, improving integration testing capabilities
- Streamlined Python environment migration through RPM packaging and GitLab CI pipeline development
- Developed and deployed a chatbot application using the OpenAI API, LangChain, and RAG for custom report generation
- Deployed applications in Kubernetes (k8s) environments, ensuring scalability and efficient container orchestration
- Utilized Puppet for automated server deployment and configuration management
Tech Stack: Python, RAG, Lua, Golang, ClickHouse, RPM, GitLab CI, OpenAI API, LangChain, Kubernetes, Puppet
- Maintaining the data warehouse (DWH)
- Modeling new database objects, converting data from non-relational to relational form
- Implementing Grafana and Prometheus to track DAG execution metrics
- Creating and maintaining ETL pipelines to automate CRM interactions with customers through various communication channels (email, SMS, push notifications, etc.)
- Using asynchronous execution to speed up queries
- Integrating with external systems via API
Tech Stack: Python, DWH, Apache Airflow, Apache Kafka, PostgreSQL
- Developed data pipelines in GCP for financial data processing, including encryption and anonymization in a PCI environment
- Built backend services using FastAPI and deployed them to Cloud Run and Cloud Functions
- Created and maintained data analytics protocols, standards and documentation
- Developed a web application using Django and Plotly Dash for IT job market trend analysis
- Implemented ETL pipelines using Apache Airflow for data processing
- Worked with GKE, Cloud Pub/Sub, BigQuery, Cloud Build, PostgreSQL, Docker, and Redis
Tech Stack: GCP, FastAPI, Django, Plotly Dash, Apache Airflow, GKE, Cloud PubSub, BigQuery, PostgreSQL, Docker, Redis