Overview
Typhoon is a leading initiative advancing AI and large language models for Thai. As a founding leader at SCB 10X, I helped establish it as Thailand’s top AI research lab today.
The project spans the full AI development cycle, from research on multimodal and reasoning adaptation to applications, aiming to position Thailand as a technology creator, not just a user.
Key Achievements & Impact
- Pioneering Thai LLMs: Authored and led the development of Typhoon, the first comprehensive family of Thai LLMs and multimodal models, which became the most competitive open-source Thai LLM.
- Widespread Adoption: Typhoon models have achieved over 100,000 downloads on Hugging Face and processed more than 30 million API requests at opentyphoon.ai.
- Production Use: Typhoon was the first LLM chosen by SCBx for production use, reflecting its performance and cost-effectiveness.
- Industry Recognition: The Typhoon Lab received the Techsauce Innovation Award 2024.
- Open Research: All major Typhoon models and research papers are open-sourced, fostering collaboration and advancing the field within Thailand, Southeast Asia, and globally.
Core Typhoon Lab Works
1. Foundational Models (Typhoon)
- Developed the initial family of Thai Large Language Models, establishing a strong baseline for Thai NLP.
- Designed a Thai knowledge evaluation system for LLMs from scratch.
- Implemented a continual pretraining pipeline, covering crawling, filtering, and dataset creation, to effectively adapt LLMs focused on high-resource languages to Thai.
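As an illustration of the filtering stage in such a pipeline, here is a minimal sketch. It is not the actual Typhoon pipeline: the function names `thai_ratio` and `filter_documents`, the thresholds, and the exact-hash deduplication are all assumptions chosen for the example.

```python
import hashlib
from typing import Iterable, Iterator

# Unicode Thai block (U+0E00 to U+0E7F).
THAI_START, THAI_END = 0x0E00, 0x0E7F


def thai_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that fall in the Thai Unicode block."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    thai = sum(1 for c in chars if THAI_START <= ord(c) <= THAI_END)
    return thai / len(chars)


def filter_documents(
    docs: Iterable[str],
    min_chars: int = 20,          # hypothetical threshold
    min_thai_ratio: float = 0.5,  # hypothetical threshold
) -> Iterator[str]:
    """Keep documents that are long enough, mostly Thai, and not exact duplicates."""
    seen = set()
    for doc in docs:
        text = doc.strip()
        if len(text) < min_chars:
            continue
        if thai_ratio(text) < min_thai_ratio:
            continue
        # Exact-match deduplication on a content hash.
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        yield text
```

A real pipeline would typically add near-duplicate detection and quality scoring on top of simple rules like these.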
2. Multimodal Capabilities (Typhoon2)
- Typhoon2: Extended foundational models to handle text, vision, and audio inputs/outputs, creating one of the first multimodal LLMs in Southeast Asia.
3. Reasoning Models (Typhoon T1 & Typhoon R1)
- Typhoon R1: Developed the most advanced reasoning LLM tailored specifically for Thai by leveraging a strong English-centric LLM and combining it with
- Leveraged novel model merging techniques to efficiently adapt language-specific LLMs into reasoning models.
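To make the idea of model merging concrete, here is a minimal sketch of the simplest variant, linear interpolation of two checkpoints with identical architectures. The text above does not specify which merging technique Typhoon uses, so this is a stand-in for illustration only. The name `merge_linear`, the plain-list weight format, and the single global `alpha` are assumptions.

```python
from typing import Dict, List

# Toy weight format: parameter name -> flat list of floats (assumption for illustration).
Weights = Dict[str, List[float]]


def merge_linear(a: Weights, b: Weights, alpha: float) -> Weights:
    """Return alpha * a + (1 - alpha) * b, computed parameter by parameter."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if a.keys() != b.keys():
        raise ValueError("models must have the same parameter names")
    merged: Weights = {}
    for name in a:
        if len(a[name]) != len(b[name]):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        merged[name] = [alpha * x + (1.0 - alpha) * y for x, y in zip(a[name], b[name])]
    return merged
```

For example, `merge_linear(thai_model, reasoning_model, 0.5)` would average the two sets of weights equally, where `thai_model` and `reasoning_model` are hypothetical names. Practical recipes often vary the mixing ratio per layer rather than using one global value.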
4. Leadership as Lead AI Scientist and Founding Member of the Typhoon Team
I also led, encouraged, and advised the team, which built:
- Typhoon Audio2: Developed one of the first end-to-end speech LLMs in SEA, enhancing capabilities in audio processing and understanding for low-resource languages.
- Typhoon T1: Created Southeast Asia’s first dedicated reasoning model for the Thai language, leveraging the test-time scaling paradigm to address complex logical tasks.
- Typhoon OCR: An OCR-focused model, competitive with top proprietary VLMs such as OpenAI GPT-4o and Gemini Flash on OCR tasks.
- Collaboration: Collaborated with partners in the SEA region, such as SEA AI LAB on Sealion2 and AI-SG on Sealion and Project Aquarium, and with Stanford on ThaiHelm, Talk-Arena, and several other works.
Key Publications:
- Typhoon: Thai Large Language Models
- Typhoon 2: A Family of Open Text and Multimodal Thai Large Language Models (arXiv:2412.13702)
- Typhoon T1: An Open Thai Reasoning Model
- An Open Recipe: Adapting Language-Specific LLMs to a Reasoning Model in One Day via Model Merging
- Enhancing Low-Resource Language and Instruction Following Capabilities of Audio Language Models