Overview
At a stealth LLM venture, I was a core AI engineer on a global B2C product that scaled to 1M+ users, shaping and implementing real-time, LLM-powered companion features.
Developed scalable infrastructure to support 2,000+ concurrent users and 20 LLM calls/sec, building custom workarounds for the limitations of open-source LLMs during the performance-constrained Llama 2 era (2023).
Key Responsibilities & Achievements
- Product Collaboration: Shaped product features, identified opportunities, and drove their implementation.
- LLM Pipelines: Developed complex LLM pipelines for interactive, real-time chat features and companion applications. This included creating innovative workarounds for the limitations of available open-source LLMs to meet product requirements.
- LLM Deployment: Evolved the serving stack from serverless, to self-hosting with custom optimizations, to an external inference provider, reducing cost by 10x.
- Scalability: Designed systems to support 2,000+ concurrent users.
Impact
Contributed to the successful launch and scaling of a global B2C product with innovative LLM-powered features, serving more than 1M users and 2,000+ concurrent users, while reducing LLM serving costs by 10x.