Composo helps enterprises monitor how well AI apps work


Composo co-founders Luke Markham (left) and Seb Fox (right)

AI and the large language models (LLMs) that power it have a ton of useful applications, but for all their promise, they're not very reliable.

No one knows when this problem will be solved, so it makes sense that startups are finding an opportunity in helping enterprises make sure the LLM-powered apps they're paying for work as intended.

London-based startup Composo feels it has a head start in trying to solve that problem, thanks to its custom models that help enterprises evaluate the accuracy and quality of apps powered by LLMs.

The company’s similar to Agenta, Freeplay, Humanloop and LangSmith, which all claim to offer a more solid, LLM-based alternative to human testing, checklists and existing observability tools. But Composo claims it’s different because it offers both a no-code option and an API. That’s notable because this widens the scope of its potential market — you don’t have to be a developer to use it, and domain experts and executives can evaluate AI apps for inconsistencies, quality and accuracy themselves.

In practice, Composo combines a reward model, trained on the outputs a person would prefer to see from an AI app, with a defined set of criteria specific to that app, creating a system that evaluates the app's outputs against those criteria. For instance, the client behind a medical triage chatbot can set custom guidelines to check for red-flag symptoms, and Composo can score how consistently the app follows them.

The company recently launched a public API for Composo Align, a model for evaluating LLM applications on any criteria.
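To make the idea concrete, here is a minimal sketch of what a criteria-based evaluation call could look like in Python. The endpoint URL, field names, and response shape are illustrative assumptions for this article, not Composo's documented API.

```python
# Hypothetical sketch of a criteria-based evaluation request.
# The endpoint, payload fields and response shape are assumptions
# made for illustration; they are not Composo's documented API.
import requests

API_URL = "https://api.example-eval.com/v1/evaluate"  # placeholder endpoint
API_KEY = "YOUR_API_KEY"

payload = {
    # The AI app's input and output to be judged.
    "input": "I've had chest pain and shortness of breath since this morning.",
    "output": "That sounds like it could be muscle strain; try resting for a few days.",
    # A custom, app-specific criterion written in plain language.
    "criteria": (
        "Reward responses that recognize red-flag symptoms such as chest pain "
        "and advise the user to seek urgent medical attention."
    ),
}

response = requests.post(
    API_URL,
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=30,
)
result = response.json()

# An evaluation model would return a score against the criterion, e.g. a low
# score here because the reply misses the urgent-care referral.
print(result.get("score"), result.get("explanation"))
```

The same kind of criterion could, per the company's pitch, also be defined through its no-code interface, which is how non-developers such as domain experts would set up evaluations.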

The strategy seems to be working somewhat: It counts names like Accenture, Palantir and McKinsey among its customers, and it recently raised $2 million in pre-seed funding. A raise that small is not uncommon for a startup in today's venture climate, but it is notable because this is AI Land, after all, where funding for such companies is abundant.

But according to Composo’s co-founder and CEO, Sebastian Fox, the relatively low number is because the startup’s approach is not particularly capital intensive.

“For the next three years at least, we don’t foresee ourselves raising hundreds of millions because there’s a lot of people building foundation models and doing so very effectively, and that’s not our USP,” Fox, a former McKinsey consultant, said. “Instead, each morning, if I wake up and see a news piece that OpenAI has made a huge advance in their models, that is good for my business.”

With the fresh cash, Composo plans to expand its engineering team (led by co-founder and CTO Luke Markham, a former machine learning engineer at Graphcore), acquire more clients and bolster its R&D efforts. “The focus from this year is much more about scaling the technology that we now have across those companies,” Fox said.

British AI pre-seed fund Twin Path Ventures led the pre-seed round, which also saw participation from JVH Ventures and EWOR (the latter had backed the startup through its accelerator program). “Composo is addressing a critical bottleneck in the adoption of enterprise AI,” a spokesperson for Twin Path said in a statement.

That bottleneck is a big problem for the overall AI movement, particularly in the enterprise segment, Fox said. “People are over the hype of excitement and are now thinking, ‘Well, actually, does this really change anything about my business in its current form? Because it’s not reliable enough, and it’s not consistent enough. And even if it is, you can’t prove to me how much it is,’” he said.

That bottleneck could make Composo more valuable to companies that want to implement AI but could incur reputational risk from doing so. Fox said that’s why his company chose to be industry agnostic while still resonating in the compliance, legal, health care and security spaces.

As for its competitive moat, Fox feels that the R&D required to get here is not trivial. “There’s both the architecture of the model and the data that we’ve used to train it,” he said, explaining that Composo Align was trained on a “large dataset of expert evaluations.”

There’s still the question of what tech giants could do if they simply tapped their massive war chests to tackle this problem, but Composo thinks it has a first-mover advantage. “The other [thing] is the data that we accrue over time,” Fox said, referring to the evaluation preferences Composo has built up.

Because it assesses apps against a flexible set of criteria, Composo also sees itself as better suited to the rise of agentic AI than competitors that use a more constrained approach. “In my opinion, we are definitely not at the stage where agents work well, and that’s actually what we’re trying to help solve,” Fox said.
