The Bitter Lessons
Building Sana AI

April 4, 2024 · 5 min read — Joel Hellermark · Founder and CEO

In the 1980s, AI and robotics pioneer Hans Moravec observed that tasks that are difficult for humans can be easy for AI, and vice versa.

Nearly forty years later, as we build Sana AI, we are rediscovering this paradox. Our goal is to create an AI agent for work with infinite capabilities: one that can process any amount of information, answer any question, assist with all knowledge work, and ultimately solve your most difficult problems autonomously.

Along the journey, we've kept coming back to Richard Sutton's Bitter Lesson on computation. We've also learned plenty of lessons of our own, bitter and sweet in equal measure.

Multi-step planning and reasoning are essential

Techniques like chain-of-thought, tree-of-thought, and reflection significantly improve query response accuracy. If we want the assistant to correct mistakes autonomously, access external tools, and provide end-to-end solutions on our behalf, these techniques have to be built in. We've trained our custom agent solution R-4 to lay out a step-by-step plan to solve the query, execute the plan, and self-reflect on its output.
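
To make the plan, execute, and self-reflect loop concrete, here is a minimal Python sketch. It is not how R-4 is implemented. The function names (`run_agent`, `toy_plan`, `toy_execute`, `toy_reflect`) and the retry limit are assumptions made purely for illustration, and the toy functions stand in for real model and tool calls.

```python
from typing import Callable, List


def run_agent(
    query: str,
    plan: Callable[[str], List[str]],
    execute: Callable[[str], str],
    reflect: Callable[[str, List[str]], bool],
    max_rounds: int = 3,
) -> List[str]:
    """Plan the steps, execute each one, then self-check the output.

    Repeats until the reflection step accepts the results or the
    (hypothetical) round limit is reached.
    """
    results: List[str] = []
    for _ in range(max_rounds):
        steps = plan(query)                          # 1. lay out a step-by-step plan
        results = [execute(step) for step in steps]  # 2. execute the plan
        if reflect(query, results):                  # 3. self-reflect on the output
            break
    return results


# Toy stand-ins for what would be model and tool calls in a real system.
def toy_plan(query: str) -> List[str]:
    return [f"search: {query}", f"summarize: {query}"]


def toy_execute(step: str) -> str:
    return step.upper()


def toy_reflect(query: str, results: List[str]) -> bool:
    return len(results) == 2


answer = run_agent("q3 pipeline", toy_plan, toy_execute, toy_reflect)
```

Keeping the three stages as separate callables means each one can be swapped or tested on its own, which is the main reason for structuring the sketch this way.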

Meetings are a vital source of truth

So much invaluable company knowledge is exchanged verbally in meetings rather than captured in written documentation. It's an untapped source of net-new company data. With R-4, Sana AI can transcribe, summarize, index, retrieve, and analyze meetings like any other knowledge asset, enabled by seamless integrations with platforms like Google Meet, Teams, and Zoom.

Verified data is critical

Company knowledge goes stale fast. Here, an AI agent is uniquely positioned to succeed where the traditional knowledge management system has failed, but only if it knows which knowledge is most up to date. Along with constant content re-indexing and testing, we've designed verification, deprecation, and Q&A workflows to ensure the assistant always has access to the most current, trusted information.
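
One way to picture how verification and deprecation could feed into retrieval is the small Python sketch below. It is an assumption-laden illustration, not Sana AI's actual workflow. The `Snippet` fields and the ranking rule (drop deprecated content, prefer verified content, then prefer the most recently updated) are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class Snippet:
    """A piece of company knowledge with hypothetical freshness metadata."""
    text: str
    updated: date
    verified: bool = False
    deprecated: bool = False


def rank_for_retrieval(snippets: List[Snippet]) -> List[Snippet]:
    """Drop deprecated snippets, then rank verified first and newest first."""
    live = [s for s in snippets if not s.deprecated]
    return sorted(live, key=lambda s: (s.verified, s.updated), reverse=True)


snippets = [
    Snippet("old policy", date(2022, 1, 1), deprecated=True),
    Snippet("draft policy", date(2024, 3, 1)),
    Snippet("approved policy", date(2023, 11, 1), verified=True),
]
ranked = [s.text for s in rank_for_retrieval(snippets)]
```

In this toy ranking, a verified snippet outranks a newer unverified one. That is a design choice for the example, and a real system might weigh the two signals differently.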

Vector search alone is insufficient

We need a rich knowledge graph to handle queries based on multiple data variables. Vector search can't predictably solve a request like "List all companies with more than $5M in ARR we've met in Europe over the last 12 months". Instead, we first have to derive a knowledge graph and then search structured data.
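
The example request above boils down to a filter over structured fields, something a similarity search over text embeddings cannot guarantee. Here is a hedged Python sketch of that idea. The `Company` record, its field names, and the sample data are all invented for illustration, and "the last 12 months" is approximated as 365 days.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List


@dataclass(frozen=True)
class Company:
    """A hypothetical structured record derived from meetings and CRM data."""
    name: str
    region: str
    arr_usd: float
    last_met: date


def companies_met(
    companies: List[Company], region: str, min_arr_usd: float, since: date
) -> List[str]:
    """Return names of companies matching every structured condition."""
    return sorted(
        c.name
        for c in companies
        if c.region == region and c.arr_usd > min_arr_usd and c.last_met >= since
    )


today = date(2024, 4, 4)  # fixed date so the example is reproducible
records = [
    Company("Acme", "Europe", 7_000_000, date(2024, 1, 15)),
    Company("Birch", "Europe", 3_000_000, date(2024, 2, 1)),
    Company("Cobalt", "North America", 9_000_000, date(2024, 3, 1)),
    Company("Dune", "Europe", 12_000_000, date(2022, 6, 1)),
]
matches = companies_met(records, "Europe", 5_000_000, today - timedelta(days=365))
```

Only "Acme" satisfies all three conditions here. Every other record fails exactly one of them, which is the kind of precision a multi-variable query needs.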

Models need a unified interface

The AI ecosystem is moving so fast that committing to a single provider or model feels risky. Equally, maintaining provider-agnostic systems adds huge complexity for organizations. We've built an underlying architecture that enables Enterprise customers to choose and switch between a range of state-of-the-art models.
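
As a rough picture of what a unified model interface can look like, here is a minimal Python sketch. It does not describe Sana AI's architecture. `ChatModel`, `ModelRouter`, and the two toy models are hypothetical names, and the toy models stand in for real provider clients.

```python
from typing import Dict, Protocol


class ChatModel(Protocol):
    """The one interface every provider adapter is assumed to implement."""

    def complete(self, prompt: str) -> str: ...


class EchoModel:
    """Toy stand-in for one provider's model."""

    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"


class ShoutModel:
    """Toy stand-in for a second provider's model."""

    def complete(self, prompt: str) -> str:
        return prompt.upper()


class ModelRouter:
    """Routes every call to the currently selected model."""

    def __init__(self, models: Dict[str, ChatModel], default: str) -> None:
        if default not in models:
            raise KeyError(f"unknown model: {default}")
        self._models = models
        self._active = default

    def switch(self, name: str) -> None:
        if name not in self._models:
            raise KeyError(f"unknown model: {name}")
        self._active = name

    def complete(self, prompt: str) -> str:
        return self._models[self._active].complete(prompt)


router = ModelRouter({"echo": EchoModel(), "shout": ShoutModel()}, default="echo")
first = router.complete("hi")
router.switch("shout")
second = router.complete("hi")
```

Because callers only ever talk to the router, switching models is a one-line change rather than a rewrite of every call site.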

Integrations, permissions, and deployments are a maze

Most company data and context live across a slew of tools, not just within an operating suite. We've done the hard work to ensure we can enable granular access controls, integrations, and permissions for 100+ enterprise tools. It's been a massive undertaking. We've had to build a system that automatically handles permissions through out-of-the-box integrations that mirror users' existing access rights. This system also supports deployment in private clouds, offering flexibility for Enterprise customers.
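
Mirroring existing access rights essentially means the assistant should only ever see what the user asking could already see in the source tool. The Python sketch below illustrates that idea under simplifying assumptions. The `Document` record, the group-based access model, and the sample data are hypothetical, and real integrations would sync far richer permission models.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Document:
    """A retrieved item plus the groups allowed to read it in its source tool."""
    doc_id: str
    source: str
    allowed_groups: FrozenSet[str]


def visible_documents(
    user_groups: FrozenSet[str], candidates: List[Document]
) -> List[Document]:
    """Keep only documents that share at least one group with the user."""
    return [d for d in candidates if d.allowed_groups & user_groups]


docs = [
    Document("d1", "Salesforce", frozenset({"sales"})),
    Document("d2", "Drive", frozenset({"finance"})),
    Document("d3", "Notion", frozenset({"sales", "eng"})),
]
visible = [d.doc_id for d in visible_documents(frozenset({"sales"}), docs)]
```

Filtering before anything reaches the model, rather than after an answer is generated, keeps restricted content out of the prompt entirely.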

Customization is expected but cumbersome

Every team wants its own unique AI agent. But fine-tuning models for each team's needs is difficult without an internal AI team. Our no-code UI setup lets users build custom assistants tailored to each use case in minutes.

The long tail is really long

80% of user queries can be handled well, but that last 20% spans a huge range of edge cases. Reaching human-level robustness has required us to use multimodal models and reasoning that can give the assistant a better understanding of images, slides, and tables.

Generative interfaces are in their infancy

Chat and autocomplete-style UIs are just the beginning. Inventing new interaction paradigms that fully leverage LLMs is a vast design space. We're building out components the assistant can use to dynamically generate interfaces ranging from meeting summaries and knowledge snippets to actions in apps like Salesforce and Gmail.

Memory will be non-negotiable

People's comfort and experience with AI vary greatly. To serve everyone equally well, we have to set an unreasonably high bar on user experience. We also need our architecture to support a future where agents will simply adapt to your individual needs from the first login and keep understanding you better over time.

Assistants should be proactive

In a world where most people don't know what an AI agent is capable of, Sana AI can't just respond to queries. It also needs to anticipate needs across an entire workflow. For sales, this could look like automatically taking notes according to the company's sales methodology, drafting the customer follow-up email, and updating Salesforce.

Humans are the benchmark

Users expect the assistant to match what a human expert could do at every turn. But human-level breadth, robustness, and adaptability remain elusive. We've found that giving the user a step-by-step view, citing sources, and allowing them to correct the assistant's mistakes has helped us build trust.

It's tempting to seek clever shortcuts and hacks when faced with challenges of this scale and complexity. But as computer scientist Richard Sutton has argued, the biggest lesson from 70 years of AI research is that leveraging computation to solve problems in general ways is ultimately the most effective approach. Even if it requires daunting amounts of engineering.

So that is what we are doing. Step by step, we are doing the hard engineering work to build an infinitely capable system that makes human and AI collaboration seamless.

It's a long road, with bitter lessons at every turn. But at the end lies the ultimate prize—useful AGI grounded in our knowledge that can transform how we live and work.

Join us on the journey. Try Sana AI for free at sana.ai
