#37 About The Data Marketplaces - What's going to happen next?

I’ve had plenty of discussions about marketplaces with customers entering the data economy. Most of them are not yet ready for that and are still practicing monetization internally. Still, the interest is there.

The trend towards distributed solutions may fragment the marketplaces as well, not only the source data. If you just follow what the blogs tell us about the data markets, you easily get the impression that we are riding the wave. In reality, the data markets are taking baby steps. But how far along are we with the data economy?

One way to gauge how far the data economy has progressed is to look at the existing data marketplaces and their offerings. It’s not yet clear whether there will be a “ProgrammableWeb” for data products, the kind of directory that gave us a good view of how the API Economy progressed.

Instead, I reviewed the offerings of the Amazon Data Marketplace, the Snowflake Marketplace, and the Ocean Protocol Market. The selected marketplaces are, of course, just a few of all the markets out there, but they represent the typical and most discussed ones. Analyzing their offerings gives a good idea of the status of data marketplaces.

For the sake of clarity, a data product is understood here as a package of data intended to create value for its user; it is meant to be shared or sold internally, in data ecosystems, or publicly. In short, it is treated as a product. I was also intrigued to find out how much of the offering provides API access to the data flow. API or native platform access, under certain circumstances, converts the product into a service, since you are no longer buying the data (owning it and integrating it into your data lake) but paying for the access and the value it offers.



Datasets and platforms dominate the markets

As mentioned, I selected three marketplaces for review. Here are the results. The Amazon Data Marketplace has 4,051 data commodities. Of these, 43 (about 1%) are API access-based (a service). The biggest share of the offering consists of datasets (3,943) delivered via S3 buckets. The data warehouse (Redshift) based offering counts 65 items. It is clear that the Amazon Data Marketplace is mostly about selling datasets, i.e. data products in the legacy format. This is what we saw at mass scale a decade ago in the open data movement.


When I looked at the Snowflake Marketplace, I found that it contains 1,079 data commodities. Of these, 43 (6%) were listed under API access. It’s worth mentioning that the dataset-based offering (94%) is comparable to the Redshift type of data warehouse based offering. This kind of native, in-platform access to data has clear benefits over APIs: it is often faster, more reliable, and easier to purchase, to mention a few.


This kind of native offering resembles B2C streaming platforms like Netflix: instead of offering movies as a service, the platform offers data as a service. Access is often not API-based but SQL or similar, mostly because that is the most common (and preferred) protocol among data consumers. Of course, data can be sold as products in this case as well, so this part of the offering is hard to categorize. Despite this nuance, I categorize datasets that have an SQL-like access layer as services. In the table below these cases are listed under “Platform”.
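The categorization used in this review can be sketched as a simple mapping from delivery mechanism to commodity category. The mechanism names below are illustrative assumptions, not taken from any marketplace’s actual metadata schema.

```python
# Hypothetical sketch of the categorization used in this review:
# delivery mechanism -> commodity category.
def categorize(delivery: str) -> str:
    if delivery in {"rest_api", "streaming_api"}:
        return "service"    # paying for access and value, not for the data itself
    if delivery in {"sql", "native_share"}:
        return "platform"   # SQL-like native access, counted here as a service
    return "dataset"        # files to own and integrate, e.g. S3 buckets

print(categorize("sql"))        # platform
print(categorize("s3_bucket"))  # dataset
```

The point of the "platform" bucket is precisely the nuance above: SQL-style native access behaves like a service even though the listing may still carry a dataset label.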

The third marketplace, Ocean Protocol Marketplace v3, contains 627 data commodities. It is impossible to tell clearly which of those might offer API access, since all of the offerings carry the dataset label. I searched for “API” among the offerings and got 46 hits, but none of the items actually offered API access to data; the hits came from words like “scrAPIng” that merely contain the string “API”. The 627 data commodities include 96 algorithms.
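As a quick sanity check, the Amazon Data Marketplace shares quoted above can be recomputed from the counts observed in the review (a minimal sketch; the counts are as reported at the time of writing):

```python
# Counts from the Amazon Data Marketplace review above.
amazon = {"api": 43, "s3_dataset": 3943, "redshift": 65}

total = sum(amazon.values())                    # 4051 commodities in total
api_share = 100 * amazon["api"] / total
dataset_share = 100 * amazon["s3_dataset"] / total

print(f"total: {total}")
print(f"API-based: {api_share:.0f}%")           # ~1% -> services
print(f"datasets:  {dataset_share:.0f}%")       # ~97% -> legacy data products
```

The three access-type counts add up exactly to the 4,051 commodities reported, which is what makes the dataset dominance so clear.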

Tip of the iceberg

What we see in the data marketplaces is just the tip of the iceberg. When I was deeply involved in the API Economy, it was a known fact that the majority of APIs are hidden from the public.

More APIs were used at the partner level and in private or internal contexts. The number of “hidden” APIs runs into the tens of millions, while ProgrammableWeb currently lists only around 24,000 public APIs. That is just the tip of the iceberg, and the data economy is likely to follow the same pattern.

Even if some claim that similar layers could not exist in the data economy, I have found evidence in practice, as a consultant and data platform CDO, that similar publicity layers are emerging.

Anything above that level going public, whether in closed ecosystems or beyond, is controlled by business design and management: they are the gatekeepers for anything crossing the company borders. Take a look at the illustration, give it a thought, and contact us to engage in the discussion.


I approached the classification from a value stream perspective and added layers for private, closed data ecosystem, and public use. Just as companies were afraid to publish APIs without practicing internally first and then carefully crossing the border to partners, I see the same pattern in refining data first for internal and then for closed data ecosystem usage. And just as that was not yet API monetization, this is not yet data monetization: it is data sharing in a trusted, bounded context, the extranet of data products. In the closed data ecosystem, data commodities cross the company border and the company exposes its data to the outside. This is the point from which the business logic layer leads the development.

It is nevertheless important to stress that you should treat your internal data as if it would someday be published as products and services on public marketplaces. In short, approach all data with the same process model, requirements, and product mindset regardless of its initial publicity level.
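The publicity layers and the gatekeeping idea can be sketched as a small model. All names here (`Publicity`, `REQUIRED_METADATA`, `can_promote`) are illustrative assumptions, not part of any real marketplace or governance API:

```python
from enum import IntEnum

class Publicity(IntEnum):
    """The publicity layers discussed above, from least to most exposed."""
    PRIVATE = 1            # internal use only
    CLOSED_ECOSYSTEM = 2   # trusted partners: the "extranet of data products"
    PUBLIC = 3             # open data marketplaces

# The same product-level requirements apply at every publicity level,
# so a product is marketplace-ready from the start.
REQUIRED_METADATA = {"owner", "description", "license", "update_frequency"}

def can_promote(product_metadata: dict) -> bool:
    """Gatekeeper check: a data product may only move towards a more
    public level if it already meets the full product requirements."""
    return REQUIRED_METADATA <= product_metadata.keys()

internal_product = {"owner": "data-team", "description": "sales events"}
print(can_promote(internal_product))   # False: license and cadence missing
```

The design choice mirrors the argument above: promotion to a more public layer becomes a business decision by the gatekeepers, not a last-minute clean-up of the data product itself.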

Near Future

In the future, companies will enter the data marketplaces with a plethora of products and services, once their processes, skills, and confidence are mature enough. But that is not going to happen in 2022.

This year is the year of sharing data in trusted data ecosystems – products and services at the partner level.