EXPERIENCE
- Led a RAG (Retrieval-Augmented Generation) team of Data Scientists, Data Engineers, and ML Engineers
- Participated in RAG architecture design and implementation
- Researched and implemented best practices for optimizing and accelerating LLM inference, achieving a 40% inference speedup (see the vLLM sketch below)
- Built vLLM from source for a legacy CUDA 11.8 environment with FlashAttention support, working around infrastructure constraints
- Deployed and troubleshot KServe across multiple clusters, resolving issues in Knative and Istio components
- Developed production pipelines for LLM deployment using KServe and vLLM (see the KServe sketch below)
- Created a unified pipeline for deploying various non-model services across multiple environments (clusters)
Tech Stack: Kubernetes, KServe, vLLM, RAG, ArgoCD, Istio, Python, Jenkins
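
Illustrative of the inference-acceleration work above, a minimal vLLM batched-generation sketch; the model name and sampling settings are placeholders, not the production configuration.

```python
# Minimal vLLM batched-inference sketch (illustrative model and settings).
# vLLM's PagedAttention and continuous batching provide the throughput gains.
from vllm import LLM, SamplingParams

# Hypothetical model; the production model is not named here.
llm = LLM(model="meta-llama/Llama-2-7b-hf", dtype="float16")

sampling = SamplingParams(temperature=0.7, max_tokens=256)

prompts = [
    "Summarize the quarterly report in two sentences.",
    "List three risks mentioned in the document.",
]

# generate() batches prompts together, which is where most of the speedup comes from.
for output in llm.generate(prompts, sampling):
    print(output.outputs[0].text)
```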
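
A hedged sketch of the deployment pipeline's core step, using the kserve Python SDK to create an InferenceService that runs a vLLM container; the image, model, and namespace are placeholders.

```python
# Sketch: create a KServe InferenceService backed by a vLLM container.
# Image, model, and namespace are placeholders, not the production values.
from kserve import (
    KServeClient,
    V1beta1InferenceService,
    V1beta1InferenceServiceSpec,
    V1beta1PredictorSpec,
)
from kubernetes import client as k8s

isvc = V1beta1InferenceService(
    api_version="serving.kserve.io/v1beta1",
    kind="InferenceService",
    metadata=k8s.V1ObjectMeta(name="llm-demo", namespace="llm-serving"),
    spec=V1beta1InferenceServiceSpec(
        predictor=V1beta1PredictorSpec(
            containers=[
                k8s.V1Container(
                    name="kserve-container",
                    image="vllm/vllm-openai:latest",  # hypothetical image tag
                    args=["--model", "meta-llama/Llama-2-7b-hf"],
                )
            ]
        )
    ),
)

KServeClient().create(isvc)  # applies the CR; Knative and Istio handle routing
```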
- Developed and maintained a machine learning model deployment platform, managing 100+ ML models as part of a specialized ML team
- Managed database operations, including table creation and table structure optimization for improved performance
- Led critical aspects of a large-scale infrastructure migration, including server relocation and system upgrades
- Authored and implemented Lua scripts for a Tarantool Cartridge cluster during an application migration (see the Tarantool sketch below)
- Enhanced a Golang-based database emulator for ClickHouse, improving integration-testing capabilities
- Streamlined Python environment migration through RPM packaging and GitLab CI pipeline development
- Developed and deployed a chatbot application using the OpenAI API, LangChain, and RAG for custom report generation (see the RAG sketch below)
- Deployed applications in Kubernetes (k8s) environments, ensuring scalability and efficient container orchestration
- Utilized Puppet for automated server deployment and configuration management
Tech Stack: Python, RAG, Lua, Golang, ClickHouse, RPM, GitLab CI, OpenAI API, LangChain, Kubernetes, Puppet
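
The migration scripting above was done in Lua; as a sketch only, this shows how such a cluster-side Lua function could be invoked from Python via the tarantool connector, with host, credentials, and function name all hypothetical.

```python
# Sketch: call a Lua stored procedure on a Tarantool Cartridge node from Python.
# Host, credentials, and function name are hypothetical.
import tarantool

conn = tarantool.connect("tarantool.internal", 3301, user="app", password="secret")

# 'migrate_user' stands in for a Lua function exposed on the cluster during migration.
result = conn.call("migrate_user", 42)
print(result)
```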
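
A minimal sketch of the RAG flow behind the report-generation chatbot, assuming the langchain-openai and FAISS integrations; the documents, model name, and prompt are illustrative.

```python
# Sketch of the RAG flow: embed docs, retrieve the relevant ones, ground the answer.
# Documents, model name, and prompt are illustrative; assumes OPENAI_API_KEY is set.
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

docs = [
    "Q3 revenue grew 12% year over year.",
    "Churn decreased after the loyalty program launch.",
]

# Index the source documents once.
store = FAISS.from_texts(docs, OpenAIEmbeddings())

question = "How did revenue change in Q3?"
context = "\n".join(d.page_content for d in store.similarity_search(question, k=2))

answer = ChatOpenAI(model="gpt-4o-mini").invoke(
    f"Answer using only this context:\n{context}\n\nQuestion: {question}"
)
print(answer.content)
```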
- Maintained the data warehouse (DWH)
- Modeled new database objects, converting non-relational data into relational form
- Implemented Grafana and Prometheus monitoring to track DAG execution metrics
- Created and maintained ETL pipelines automating CRM interactions with customers across communication channels (email, SMS, push notifications, etc.) (see the Airflow sketch below)
- Used asynchronous I/O to speed up query execution (see the asyncio sketch below)
- Integrated with external systems via APIs
Tech Stack: Python, DWH, Apache Airflow, Apache Kafka, PostgreSQL
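
A minimal Airflow 2.x sketch of the CRM ETL pattern described above; the DAG name, schedule, and task bodies are illustrative.

```python
# Sketch of an ETL DAG automating CRM outreach; names and schedule are illustrative.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract_customers(**_):
    ...  # e.g., pull the target customer segment from PostgreSQL

def send_notifications(**_):
    ...  # e.g., push email/SMS/push jobs to Kafka for delivery

with DAG(
    dag_id="crm_outreach",  # hypothetical DAG name
    start_date=datetime(2023, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract_customers", python_callable=extract_customers)
    notify = PythonOperator(task_id="send_notifications", python_callable=send_notifications)
    extract >> notify
```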
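
A sketch of the asynchronous query pattern mentioned above; asyncpg and the DSN are assumptions, since the stack names PostgreSQL but not a specific driver.

```python
# Sketch: run independent PostgreSQL queries concurrently with asyncio.
# asyncpg and the DSN are assumptions, not the production setup.
import asyncio
import asyncpg

QUERIES = [
    "SELECT count(*) FROM orders",
    "SELECT count(*) FROM customers",
]

async def run_query(pool, sql):
    async with pool.acquire() as conn:
        return await conn.fetchval(sql)

async def main():
    pool = await asyncpg.create_pool(dsn="postgresql://app@db/crm")  # hypothetical DSN
    try:
        # gather() overlaps the queries' I/O waits instead of running them serially.
        results = await asyncio.gather(*(run_query(pool, q) for q in QUERIES))
        print(results)
    finally:
        await pool.close()

asyncio.run(main())
```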
- Developed data pipelines in GCP for financial data processing, including encryption and anonymization in a PCI environment (see the anonymization sketch below)
- Built backend services with FastAPI and deployed them to Cloud Run and Cloud Functions (see the FastAPI sketch below)
- Created and maintained data analytics protocols, standards, and documentation
- Developed a web application using Django and Plotly Dash for IT job market trend analysis
- Implemented ETL pipelines using Apache Airflow for data processing
Tech Stack: GCP, FastAPI, Django, Plotly Dash, Apache Airflow, GKE, Cloud Pub/Sub, BigQuery, Cloud Build, PostgreSQL, Docker, Redis
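
A sketch of one common anonymization approach consistent with the PCI pipeline work above, salted hashing of PII fields; the field names and salt source are hypothetical, not the production scheme.

```python
# Sketch: salted-hash pseudonymization of PII fields before loading into BigQuery.
# Field names and the env-var salt are illustrative, not the production scheme.
import hashlib
import os

PII_FIELDS = {"pan", "email"}

def anonymize(record: dict) -> dict:
    salt = os.environ["ANON_SALT"]  # hypothetical secret, e.g. from Secret Manager
    out = dict(record)
    for field in PII_FIELDS & record.keys():
        digest = hashlib.sha256((salt + str(record[field])).encode()).hexdigest()
        out[field] = digest[:16]  # stable pseudonym, irreversible without the salt
    return out

print(anonymize({"pan": "4111111111111111", "email": "a@b.com", "amount": 12.5}))
```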
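
A minimal FastAPI service sketch of the kind deployed to Cloud Run; the routes and payload model are illustrative.

```python
# Sketch of a Cloud Run-style FastAPI service; route and payload model are illustrative.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Transaction(BaseModel):
    account_id: str
    amount: float

@app.get("/healthz")
def health() -> dict:
    return {"status": "ok"}

@app.post("/transactions")
def ingest(tx: Transaction) -> dict:
    # In the real service this step would publish to Cloud Pub/Sub or write to BigQuery.
    return {"received": tx.account_id}

# Cloud Run runs the container's web server, e.g.:
#   uvicorn main:app --host 0.0.0.0 --port 8080
```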