Data architectures increasingly encompass a wide variety of data sources, including distributed databases, SaaS applications, IoT, and edge devices, that are spread across hybrid and multicloud environments spanning geographic regions and availability zones. New analytics tools are now emerging for these dispersed data environments, designed to run queries wherever data is stored.
This new generation of analytics products can potentially generate insights faster than the decades-old central data warehouse model. The term “just-in-time analytics” is sometimes used in reference to the speed and flexibility they offer.
And, because these tools are able to access data wherever it’s stored, they both reduce data movement and provide business agility by running queries against data that may be here, there, or anywhere.
Many database providers, including AWS, Google Cloud, IBM, and Microsoft, already support so-called “federated” database queries, in which a single SQL query can access data from multiple places.
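To make the idea of a federated query concrete, here is a minimal sketch in Python. It is a toy illustration only: it uses SQLite's ATTACH DATABASE to let one SQL statement join tables stored in two separate database files, which is not how any of the vendors named above implement federation. The file names, table names, and sample rows are all assumptions invented for the example.

```python
import os
import sqlite3
import tempfile


def build_demo_sources(directory):
    """Create two independent SQLite files standing in for separate data sources."""
    orders_path = os.path.join(directory, "orders.db")        # hypothetical source 1
    customers_path = os.path.join(directory, "customers.db")  # hypothetical source 2

    conn = sqlite3.connect(orders_path)
    conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(1, 120.0), (2, 80.0), (1, 30.0)],
    )
    conn.commit()
    conn.close()

    conn = sqlite3.connect(customers_path)
    conn.execute("CREATE TABLE customers (id INTEGER, region TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(1, "EMEA"), (2, "APAC")],
    )
    conn.commit()
    conn.close()

    return orders_path, customers_path


def federated_revenue_by_region(orders_path, customers_path):
    """Run one SQL query that joins tables living in two different database files."""
    conn = sqlite3.connect(orders_path)
    try:
        # ATTACH makes the second file addressable under the alias "crm".
        conn.execute("ATTACH DATABASE ? AS crm", (customers_path,))
        return conn.execute(
            """
            SELECT c.region, SUM(o.amount) AS revenue
            FROM main.orders AS o
            JOIN crm.customers AS c ON c.id = o.customer_id
            GROUP BY c.region
            ORDER BY c.region
            """
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        orders_db, customers_db = build_demo_sources(tmp)
        for region, revenue in federated_revenue_by_region(orders_db, customers_db):
            print(region, revenue)
```

The point of the sketch is that the caller writes a single query and never copies one dataset into the other; a production federated engine applies the same idea across networked systems rather than local files.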
In the past few weeks, two startups, Starburst and Promethium, have introduced new capabilities for decentralized data that do something similar, albeit in new ways. Starburst emphasizes a “data mesh” concept in its approach to corralling decentralized data. Promethium, meanwhile, employs a data fabric, a related but distinct approach.
How Starburst Enables Data Products
Boston-based Starburst describes itself as the “analytics anywhere” company. Coinciding with the company’s second annual conference, Starburst announced that it has secured $250 million in Series D funding, bringing its market valuation to $3.35 billion. Salesforce Ventures is among the investors.
Founded in 2017, Starburst offers a flagship product called Starburst Enterprise, a SQL query engine with built-in encryption, authentication, and access control. Starburst’s toolset is available on the Big 3 public cloud platforms (AWS, Google Cloud, and Microsoft Azure) as well as Alibaba, HPE, and Red Hat. Last year, the company also launched its analytics software as a service, Starburst Galaxy.
Starburst’s solutions are based on the open-source Trino, a distributed query engine originally created at Facebook, where it was known as Presto. Starburst customers include Carrefour Brazil, Marks & Spencer, Société Générale, and Zillow.
Starburst’s approach is based on data mesh, a relatively new architecture that assigns data ownership and governance to business stakeholders. At its conference, Starburst introduced access controls for establishing a data mesh and, in particular, for creating “data products.”
What are data products? Starburst describes them as data assets that are strategically conceived rather than a mere by-product of business activities.
“We continue to see a rapid rise in demand from companies that want to build and share data products,” said Starburst Co-Founder and CEO Justin Borgman. “We’ve worked hard to build a solution that enables companies to get the most out of their data by turning it into data products that can drive new revenue growth and cost savings opportunities.”
Other newly introduced Starburst capabilities include the ability to define metadata for use in creating, publishing, and managing curated data products. Also new is the ability to find, rate, and share data products.
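As a rough illustration of what defining metadata for a data product and then finding, rating, and sharing it might involve, here is a small in-memory sketch in Python. It is not Starburst's API or data model; the class names, fields, and methods below are hypothetical and exist only to make the workflow tangible.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class DataProduct:
    """Hypothetical metadata record for a curated data product."""
    name: str
    owner: str          # business domain accountable for the product
    description: str
    tags: set = field(default_factory=set)
    ratings: list = field(default_factory=list)
    published: bool = False

    def average_rating(self):
        """Return the mean star rating, or None if nobody has rated it yet."""
        return mean(self.ratings) if self.ratings else None


class DataProductCatalog:
    """Hypothetical catalog supporting create, publish, find, and rate."""

    def __init__(self):
        self._products = {}

    def create(self, product):
        if product.name in self._products:
            raise ValueError(f"data product already exists: {product.name}")
        self._products[product.name] = product

    def publish(self, name):
        # Only published products are discoverable by other teams.
        self._products[name].published = True

    def find(self, tag):
        """Return the names of published products carrying the given tag."""
        return sorted(
            p.name
            for p in self._products.values()
            if p.published and tag in p.tags
        )

    def rate(self, name, stars):
        if not 1 <= stars <= 5:
            raise ValueError("stars must be between 1 and 5")
        self._products[name].ratings.append(stars)

    def get(self, name):
        return self._products[name]


if __name__ == "__main__":
    catalog = DataProductCatalog()
    catalog.create(DataProduct("store_sales_daily", "retail", "Daily sales by store", {"sales"}))
    catalog.create(DataProduct("draft_forecast", "finance", "Unreleased forecast", {"sales"}))
    catalog.publish("store_sales_daily")
    catalog.rate("store_sales_daily", 4)
    catalog.rate("store_sales_daily", 5)
    print(catalog.find("sales"))
    print(catalog.get("store_sales_daily").average_rating())
```

The sketch treats a data product as a named, owned, described asset that becomes discoverable only once it is published, which mirrors the article's distinction between deliberately curated data products and incidental by-products.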
In some ways, the concept of a data product isn’t new. FedEx, Garmin, and other companies have been packaging and sharing unique datasets for years. But no doubt, other businesses are eager to innovate with their own data deliverables, and framing the discussion around data products may be a helpful way to do that.
Promethium Simplifies Analytics Kitchen Sink
Promethium, based in Silicon Valley, announced that it has raised $26 million in Series A funding, bringing total investments to $34.5 million since the company was founded in 2018.
Promethium describes its technology as a “data fabric solution,” incorporating a data catalog and more than 200 data connectors to sources including AWS S3, Salesforce, SAP, Snowflake, and other database environments. In this respect, Promethium is akin to Starburst; both allow for access to distributed data.
Use cases for Promethium include data discovery, migration, data lakes and pipelines, and self-service analytics.
Promethium utilizes patented natural language processing to enable queries by non-technical users. “We allow you to take something as easy as a question and then figure out all the steps that data engineers and data analysts would have to do—global data discovery, complex data modeling, ETL, validate the query, apply the filter,” Promethium Founder and CEO Kaycee Lai told me during a briefing on the company’s technology. “We do all that, all the validation steps.”
While there have been many advances in analytics and business intelligence over the years, Lai contends that the myriad products now available often lack integration and have resulted in complexity. “It’s actually harder now to get that single source of the truth,” he said.
The Promethium environment aims to simplify this kitchen sink of analytics complexity by facilitating collaboration between an organization’s data scientists and business users. It does so with capabilities such as request tracking, chat, and feedback, allowing for back and forth among the teams involved.
Promethium touts the relative simplicity of its self-service analytics. For example, data does not need to be centralized for ETL transformation or shoe-horned into pre-built models.
Ahana Throws Its Weight Behind Presto
As you might expect given the nature of the opportunity, other vendors, both established players and newcomers, are jumping into the action as well.
Ahana, another venture-funded startup, offers a managed cloud service for Presto that is available on AWS. Ahana is focused on analytics for widely used S3 data lakes.
As mentioned earlier, Starburst is built on Trino, which is derived from Presto. Trino and Presto are distributed SQL engines that share a common code base but have since diverged. Ahana emphasizes that PrestoDB is “the one and only, original Presto.”
It’s not unusual for vendors to align themselves with different standards in this way, but potential users will want to understand the fine points. Check out Ahana’s Presto vs. Trino explainer for a side-by-side comparison.
Ahana notes that Presto, while powerful, is also complicated when it comes to configuration and management. Its managed service, with built-in integration and data catalogs, aims to simplify deployment and use.
Ultimately, that’s the name of the game with this new breed of analytics tools. Distributed data offers many advantages, including resilience, reduced latency, and data-residency compliance. But data access and analysis are essential to an overall strategy.