Alexey Fateev
Technical Lead / AI Platform & LLMOps
Summary
Technical lead with 6+ years in IT and 3+ years in MLOps/LLMOps, focused on taking AI/LLM solutions from requirements and technical design to production rollout and operations. Combines hands-on AI infrastructure expertise with delivery leadership across Data Science, MLOps, Data Engineering, analytics, and business stakeholders. Specializes in building and optimizing RAG platforms, AI agents, and LLM inference infrastructure.
- Production experience with RAG platforms, AI agents, LLM inference, model serving, release processes, observability, and incident handling.
- Able to translate business/product needs into technical plans, clarify ambiguous requirements, surface delivery risks, and unblock cross-functional teams.
- Active independent researcher in LLM inference and public speaker on RAG, AI platforms, and LLM infrastructure.
Experience
Technical Lead / AI Platform & LLMOps
In this role, I lead the MLOps team in a project to create a unified RAG platform for the entire bank. My work combines technical leadership, model optimization, and interaction with business stakeholders to integrate new solutions.
Key Achievements:
- Lead a cross-functional team of 15+ professionals (Data Scientists, ML Engineers, Data Engineers, System Analysts) as Tech Lead, driving technical strategy and execution across multiple AI initiatives
- Successfully delivered 5 production-ready RAG-based products and AI Agent solutions, serving the entire bank's AI infrastructure needs
- Architected and implemented from scratch an A/B testing platform for RAG products leveraging Istio Service Mesh and Argo Rollouts, enabling data-driven product optimization
- Designed and deployed a canary deployment strategy from the ground up, significantly reducing production deployment risks and enabling safer rollouts
- Established a unified technical and infrastructure layer across all AI products, ensuring consistency, scalability, and maintainability
- Optimized LLM inference, achieving a 40% performance improvement that halved the response time of the entire RAG service
- Ensured high performance and reliability of the service, maintaining a 5-second SLA under a load of up to 250,000 requests per day
- Developed and implemented production-ready MLOps pipelines for LLM deployment using KServe and vLLM
- Resolved infrastructure constraints by building vLLM from source with flash-attention support for legacy CUDA (11.8)
- Implemented a unified gateway (Higress) for all LLMs and MCP (Model Context Protocol), centralizing management and access
Core Responsibilities:
- Designing architecture and participating in RAG system implementation
- Deploying and maintaining LLM inference infrastructure in new clusters based on KServe, including troubleshooting Knative and Istio components
- Client interaction: conducting meetings, designing integration schemes for connecting new clients to the RAG service, and estimating effort
- Creating unified pipelines for deploying various non-model services across multiple environments (clusters), improving release speed and consistency
- Researching and implementing best practices for optimizing and accelerating LLM inference
MLOps Engineer
- Developed and maintained a machine learning model deployment platform, managing 100+ ML models as part of a specialized ML team
- Orchestrated database operations, including table creation and structure optimization for enhanced performance
- Led critical aspects of a large-scale infrastructure migration, including server relocation and system upgrades
- Authored and implemented Lua scripts for Tarantool Cartridge cluster during application migration
- Enhanced a Golang-based database emulator for ClickHouse, improving integration testing capabilities
- Streamlined Python environment migration through RPM packaging and GitLab CI pipeline development
- Developed and deployed a chatbot application utilizing OpenAI API, LangChain, and RAG for custom report generation
- Deployed applications in Kubernetes (k8s) environments, ensuring scalability and efficient container orchestration
- Utilized Puppet for automated server deployment and configuration management
Data Engineer
- Maintained the data warehouse (DWH)
- Modeled new database objects, converting them from non-relational to relational form
- Implemented Grafana and Prometheus to track DAG execution metrics
- Created and maintained ETL pipelines to automate CRM interactions with customers through various communication channels (email, SMS, push notifications, etc.)
- Used asynchronous execution to speed up queries
- Integrated APIs with external systems
Data Engineer
- Developed data pipelines in GCP for financial data processing, including encryption and anonymization in a PCI environment
- Built backend services using FastAPI and deployed them to Cloud Run and Cloud Functions
- Created and maintained data analytics protocols, standards and documentation
- Developed a web application using Django and Plotly Dash for IT job market trend analysis
- Implemented ETL pipelines using Apache Airflow for data processing
- Worked with technologies: GKE, Cloud Pub/Sub, BigQuery, Cloud Build, PostgreSQL, Docker, Redis
Independent Research
LLM Inference Optimization
Building and maintaining a personal 4×RTX 3090 inference server (96GB VRAM). Experimenting with emerging inference optimization techniques, tracking industry trends, and publishing benchmarks and findings on Telegram.
- Benchmarked Prefill/Decode disaggregation (SGLang + Mooncake), achieving 5× lower P99 inter-token latency vs unified serving
- Tested DFlash speculative decoding with Qwen3.5-27B, reaching 90 tok/s single-user (1.6× over baseline)
- Evaluated ultra-low-bit dynamic quantization (Unsloth IQ2_XXS) for running 256B+ parameter models on consumer hardware
- Exploring Tensor/Pipeline Parallelism tradeoffs on PCIe-connected multi-GPU setups without NVLink
Public Speaking
AI agents in a major bank: experience, impact, costs, mistakes
From RAG for operators to a RAG platform for a major bank
Alfa-Bank case — piloting RAG for 10,000 employees
Certifications
Hard&Soft Skills · Certificate of completion for the Technical Leadership program