Typhoon: Open-Source Language Technologies for Thai Language, Knowledge, and Culture

Overview

Typhoon is a leading initiative advancing AI and large language models for Thai. As a founding leader at SCB 10X, I helped establish it as what is now Thailand’s top AI research lab.

The project spans the full AI development cycle, from research on multimodal and reasoning adaptation through to applications, with the aim of positioning Thailand as a technology creator, not just a user.

Key Achievements & Impact

  • Pioneering Thai LLMs: Authored and led the development of the first comprehensive Thai LLM & multimodal family, Typhoon, which became the most competitive open-source Thai LLM.
  • Widespread Adoption: Typhoon models have achieved over 100,000 downloads on Hugging Face and served more than 30 million API requests at opentyphoon.ai.
  • Production Use: Typhoon was the first LLM chosen by SCBx for production use, a reflection of its performance and cost.
  • Industry Recognition: The Typhoon Lab received the Techsauce Innovation Award 2024.
  • Open Research: All major Typhoon models and research papers are open-sourced, fostering collaboration and advancing the field within Thailand, Southeast Asia, and globally.

Core Typhoon Lab Works

1. Foundational Models (Typhoon)

  • Developed the initial family of Thai Large Language Models, establishing a strong baseline for Thai NLP.
  • Designed a Thai knowledge evaluation system from scratch for LLMs.
  • Implemented a continual pretraining pipeline, spanning crawling, filtering, and dataset creation, to adapt LLMs focused on high-resource languages to Thai effectively.

2. Multimodal Capabilities (Typhoon2)

  • Typhoon2: Extended foundational models to handle text, vision, and audio inputs/outputs, creating one of the first multimodal LLMs in Southeast Asia.

3. Reasoning Models (Typhoon T1 & Typhoon R1)

  • Typhoon R1: Developed the most advanced reasoning LLM tailored specifically for Thai, by leveraging a strong English-centric LLM and combining it with
  • Leveraged novel model merging techniques to efficiently adapt language-specific LLMs into reasoning models.

4. As a Lead AI Scientist and Founding Member of the Typhoon Team

I also led, encouraged, advised, and shaped the team that built:

  • Typhoon Audio2: Developed one of the first end-to-end speech LLMs in SEA, enhancing capabilities in audio processing and understanding for low-resource languages.
  • Typhoon T1: Created Southeast Asia’s first dedicated reasoning model for the Thai language, leveraging the test-time scaling paradigm to address complex logical tasks.
  • Typhoon OCR: An OCR-focused model, competitive with top proprietary VLMs such as OpenAI GPT-4o and Gemini Flash on OCR tasks.
  • Collaboration: Worked with partners across Southeast Asia, such as SEA AI LAB on Sealion2 and AI-SG on Sealion and Project Aquarium, and with Stanford on ThaiHelm, Talk-Arena, and several other projects.

Key Publications: