XetHub raises $7.5M for its Git-based data collaboration platform

Seattle-based XetHub, a startup that makes it easy for businesses to use Git for data management, today announced that it has raised a $7.5 million seed financing round led by Madrona. The basic idea here is to allow developers to work with data the same way they work with code, including all of the collaboration features a tool like Git enables. The team describes XetHub as a "collaborative storage platform for data management."

The company was co-founded by Yucheng Low (CEO), Ajit Banerjee and Rajat Arya, a team with years of experience working with large data platforms. Low previously co-founded ML startup Turi, where Arya was the first employee. Apple acquired Turi in 2016, after which Low and Arya worked on various parts of Apple's ML platform stack; Arya, for example, led Apple's data platform team. It was also at Apple that the two met Banerjee, who previously worked at Inktomi, Amazon and Facebook and had founded two startups.

The XetHub repository view is designed for navigating and visualizing data repositories while keeping GitHub sensibilities. XetHub automatically summarizes common file formats (CSV) and supports custom visualizations. Image Credits: XetHub

During their time working on the data platform at Apple, the team realized there was still a lot of room for improvement in the data management realm.

"It really shouldn't come as a surprise, but data is far more important than everything else. More important than the model -- than anything else," Low told me. "Managing where you store this data, how you collaborate on this data is really fundamental. However, what we see is that the way we manage data today really feels like how source code was done 30 years ago -- which means version control or collaboration is done by copy-and-paste -- sometimes there's a more elaborate version of it, but it's still ultimately copy-and-paste if I want to make sure no one else is touching what I'm doing."

Just as developers have moved to tools like Git for collaborating on their source code, XetHub wants to let them use the same familiar primitives for working with data.

"The way we think about it is that for the first time, we truly enable developers to work on data in exactly the same way as code," Low said. He noted that the team aimed to create a tool that doesn't just mimic a Git-like experience but preserves the core Git user experience -- including all of the integrations that developers are familiar with.

XetHub extends Git to support large files, offering efficient storage and transfer with data deduplication while maintaining full Git compatibility. Image Credits: XetHub

Currently, the service can handle repositories with up to 1TB of data, with plans to expand this to 100TB soon. Few developers will want to clone a repository that large, so one nifty feature is that they can instead mount these repositories and have them behave like a local file system, whether on a laptop or a large GPU cluster. It's also worth noting that the tool is agnostic to file formats.

From a marketing perspective, the team is focusing its efforts on AI/ML teams, but XetHub can be used to manage any kind of data.

XetHub is now publicly available with a free community edition that you can use to manage up to 20GB of deduplicated storage. Low tells me the company is already talking to some enterprise customers, but the team isn't quite ready to name names yet.

"Yucheng and the exceptional XetHub team have been innovating with machine learning for well over a decade, and then applying their skills at the most iconic consumer technology company -- Apple. XetHub enables developers to work with large datasets, in collaboration with others, to build intelligent and generative applications," said Matt McIlwain, managing director, Madrona. "Developing and deploying these applications is constrained by legacy infrastructure and complex data workflows, and XetHub addresses these pain points from the developer point of view."