
How did Intercom use $100 million and GPT-4 to build the hit AI customer service agent Fin in 4 months?
In late 2022, when ChatGPT had just been released and most companies were still busy discussing the news, Intercom quietly moved into action. Within hours, the customer service software company began hands-on testing; only four months later, it launched the AI agent Fin, which now handles millions of complex customer inquiries every month.
This first-mover advantage was no accident. Faced with the rapid evolution of large language models (LLMs), Intercom’s leadership made a decisive bet on AI. They quickly assembled a cross-functional team, shut down all non-AI projects, invested $100 million to rebuild the business architecture, and fully migrated to an AI platform.
This decision triggered a company-wide transformation from top to bottom: reshaping product teams, establishing an “AI-first” customer service strategy, and building a technical platform capable of powering Fin’s high-speed operations.
What follows are the three key lessons they distilled from this AI transformation, lessons any team can apply immediately, whatever its current stage.
“AI must be embedded into product design from the start, not crammed in as an afterthought.” —Paul Adams, Chief Product Officer, Intercom
Lesson 1: Start early and experiment continuously to improve model fluency
Intercom began experimenting with generative models early and often, gaining valuable real-world experience—identifying the limitations of models and finding opportunities for optimization. When GPT-4 launched in early 2023, they were fully prepared, releasing the AI customer service agent Fin in just four months and rapidly expanding its use.
“With GPT-3.5, we achieved smooth conversational experiences, even some ‘magic,’ but its reliability wasn’t high enough for customer service. Because we had laid the groundwork early, when GPT-4 arrived, we knew the time was right and moved quickly to launch Fin.” —Jordan Neill, SVP of Engineering, Intercom
This grasp of model fluency enabled Intercom to design Fin Tasks—a system that can automatically handle complex processes like refunds and technical support. While the team initially planned to use a retrieval-based architecture, evaluation showed that GPT-4.1 could complete tasks independently and efficiently, with higher reliability and lower latency.
Today, GPT-4.1 remains the core engine of Intercom’s AI systems, including the key logic of Fin Tasks. The team also found that adding chain-of-thought prompting to queries served by non-reasoning models improved performance without building a full RAG pipeline.
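The idea behind that finding can be sketched as prepending a reasoning instruction to the system prompt before the request is sent. This is a minimal illustration, not Intercom's actual prompts; the function name and wording are assumptions.

```python
# Hypothetical sketch: wrap a customer query with a chain-of-thought
# instruction before sending it to a non-reasoning model. The preamble
# text and helper name are illustrative, not Intercom's real prompts.

COT_PREAMBLE = (
    "Before answering, think step by step: restate the customer's goal, "
    "list the facts you know, then decide on the action."
)

def build_cot_prompt(system_prompt: str, customer_query: str) -> list[dict]:
    """Return a chat-message list with a chain-of-thought instruction added."""
    return [
        {"role": "system", "content": f"{system_prompt}\n\n{COT_PREAMBLE}"},
        {"role": "user", "content": customer_query},
    ]

messages = build_cot_prompt(
    "You are Fin, a helpful support agent.",
    "My refund hasn't arrived after 10 days.",
)
```

The appeal of this approach is that it is a prompt-level change: no retrieval index or extra infrastructure is needed to nudge the model toward more deliberate multi-step answers.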
The conclusion is clear: the earlier and deeper you understand the model, the faster you can seize opportunities as the technology evolves.
Evaluations showed GPT-4.1 delivered the highest reliability in task execution while reducing costs by 20% compared to GPT-4o.
Lesson 2: Use rigorous evaluation to drive rapid iteration and complete upgrades in days
To speed up technical upgrades, you must measure precisely what works and understand why.
Intercom’s ability to quickly switch to new models, modes, and architectures hinges on a structured and rigorous evaluation process. Whether it’s Fin Voice (based on the Realtime API) or Fin Tasks (based on GPT-4.1), every deployment undergoes offline testing and live A/B experiments, focusing on three key capabilities:
- Instruction adherence: Can it accurately understand and execute complex multi-step tasks (e.g., refund processes)?
- Tool call accuracy: Can it reliably invoke system functions?
- Brand tone consistency: Can it consistently maintain communication in Fin’s style?
For example, the team uses real customer service records as benchmarks to test task execution and uses evaluation results to guide A/B tests comparing different model versions (e.g., GPT-4 vs. GPT-4.1) in resolution rates and customer satisfaction.
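An offline harness for the three capabilities above might look like the following sketch. This is an assumption about structure, not Intercom's evaluation code: the field names, the crude tone check, and the pass criteria are all illustrative.

```python
# Illustrative offline-eval sketch (not Intercom's harness): score a model
# reply against a benchmark case built from a past support transcript on
# the three capabilities named above. All names here are assumptions.
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class EvalCase:
    query: str
    expected_tool: str | None    # tool the agent should invoke, if any
    required_phrases: list[str]  # content a correct reply must contain

@dataclass
class ModelReply:
    text: str
    tool_called: str | None

def score(case: EvalCase, reply: ModelReply) -> dict[str, bool]:
    return {
        # Instruction adherence: did the reply cover the required steps?
        "instruction_adherence": all(
            p.lower() in reply.text.lower() for p in case.required_phrases
        ),
        # Tool-call accuracy: was the right system function invoked?
        "tool_call_accuracy": reply.tool_called == case.expected_tool,
        # Brand tone: crude stand-in check (non-empty, no all-caps shouting);
        # a real harness would likely use a model-graded rubric instead.
        "brand_tone": bool(reply.text) and not reply.text.isupper(),
    }

case = EvalCase(
    query="I want a refund for order 1042.",
    expected_tool="issue_refund",
    required_phrases=["refund"],
)
reply = ModelReply(
    text="I've started your refund for order 1042.",
    tool_called="issue_refund",
)
results = score(case, reply)
```

Aggregating such per-case scores across a benchmark set is what makes it possible to compare two model versions side by side before committing to a live A/B test.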
Thanks to this approach, Intercom completed the migration from GPT-4 to GPT-4.1 in just a few days. Once they confirmed GPT-4.1’s significant improvements in instruction handling and function execution, they immediately deployed it to Fin Tasks, resulting in notable gains in performance and user satisfaction.
“Within 48 hours of GPT-4.1’s release, we had evaluation results and a deployment plan. It struck the perfect balance between intelligence and latency.” —Jordan Neill, SVP of Engineering, Intercom
Lesson 3: Build flexible architectures for long-term competitiveness
From its inception, Intercom has designed its product architecture with change in mind, ensuring the system can evolve in step with the AI models it depends on.
The Fin system uses a modular design, supporting multi-modal interactions across chat, email, and voice—each with its own trade-offs in latency and complexity. This architecture allows Intercom to route each customer request to the most suitable model and swap or upgrade models without overhauling the underlying system.
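One way to picture that routing is a per-channel configuration table consulted at request time, so swapping a model is a config change rather than a rewrite of calling code. The table below is a minimal sketch under that assumption; the channel names, latency budgets, and model identifiers are illustrative, not Intercom's actual configuration.

```python
# Minimal routing sketch: map each interaction channel to the model
# configured for it. Values here are illustrative assumptions.
ROUTES = {
    "chat":  {"model": "gpt-4.1", "max_latency_ms": 2_000},
    "email": {"model": "gpt-4.1", "max_latency_ms": 30_000},
    # Voice has the tightest latency budget; model name is hypothetical.
    "voice": {"model": "realtime-voice-model", "max_latency_ms": 500},
}

def pick_model(channel: str) -> str:
    """Route a request to the model configured for its channel."""
    try:
        return ROUTES[channel]["model"]
    except KeyError:
        raise ValueError(f"no route for channel {channel!r}")
```

Because the calling code only ever asks `pick_model(channel)`, upgrading a single channel (say, moving chat to a newer model) touches one table entry and leaves the rest of the system untouched.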
This flexibility is intentional and constantly refined. The Fin architecture is now in its third major iteration, with the next version already in development. The team adjusts dynamically with model capabilities: adding complexity when needed to unlock new functions, and simplifying when possible to reduce maintenance costs.
The benefits of this flexibility were especially clear in Fin Tasks development. Initially, the team planned to build a custom retrieval-based architecture to support multi-step tasks (like refunds, account changes, and troubleshooting). But testing showed GPT-4.1’s instruction adherence exceeded expectations, maintaining equal reliability at lower latency and cost.
“Honestly, I don’t think GPT-4.1 has been talked about enough. Its performance in latency and cost really surprised us, giving us the opportunity to simplify the architecture and remove a lot of unnecessary complexity.” —Pratik Bothra, Principal Machine Learning Engineer, Intercom
Unifying data and workflows for a connected customer experience
This is only the beginning. Intercom is using its advanced AI models and flexible modular architecture to extend AI’s reach from customer support to the entire enterprise—accelerating problem resolution and enhancing customer experiences across the board.
- Support teams: The Fin AI Agent can handle the majority of customer inquiries from chat, email, and voice channels.
- Operations teams: Fin Tasks automates complex ticket workflows, such as processing refunds, account changes, and subscription updates.
- Product teams: Through Intercom’s MCP server, AI tools like ChatGPT can access customer conversations, tickets, and user data to help teams identify issues faster, plan product roadmaps, optimize communication strategies, and efficiently prepare quarterly business reviews.
With its rigorous evaluation standards, performance-based design, and flexible architecture, Intercom has built a highly scalable AI platform. This not only redefines customer support but also provides valuable lessons for other companies looking to leverage AI for business growth.