#58 Why do you need a Data Product Specification?

Traditionally, the purpose of a product specification is to describe and explain the product's requirements, components, features, manufacturing, performance and origin. Product specification is a critical early stage in product development and guides critical thinking when a new idea is first being implemented. It helps you communicate effectively what you are building, for whom, and what the end result should be.

The same applies to data product specification. Most of the issues raised here should already be resolved in the business-oriented design phase of the data product, for which we have developed the Data Product Toolkit®. If desired, the Data Product Toolkit can, for example, prefill the information required by the specification, which can then be refined as the work progresses.

How do you implement a data product specification?

To implement a sufficiently accurate data product specification (DPS), you first need to understand what it actually is. A DPS is a detailed, structured plan that outlines the product to be built, its specific requirements and its key functions. It usually includes information about the person or user, most often the customer, for whom the product is made. It no longer includes business objectives; it is explicitly a specification.

One of the key purposes of a DPS is, of course, to determine the necessary datasets and data sources: in other words, what "raw data" or other data products lie behind the data product, i.e. what the product is derived from. Because the specification is structured and contains many encoded elements, most of this matching can often be done automatically as part of data productization and processing.
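To illustrate how encoded elements make automatic matching possible, here is a minimal sketch. It assumes that each source requirement in the DPS is encoded as a topic code, a format and an optional coordinate reference system, and that available datasets are listed in an internal catalog. The classes `SourceRequirement` and `CatalogEntry`, the function `match_sources` and all dataset IDs are hypothetical and not part of any particular tool.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class SourceRequirement:
    """A source requirement as it might be encoded in a DPS (hypothetical fields)."""
    topic: str                 # pre-defined theme or topic category code
    fmt: str                   # format the data product needs from the source
    crs: Optional[str] = None  # required coordinate reference system, if any


@dataclass(frozen=True)
class CatalogEntry:
    """A dataset or data product listed in a hypothetical internal catalog."""
    dataset_id: str
    topic: str
    formats: FrozenSet[str]
    crs: Optional[str] = None


def match_sources(requirements: List[SourceRequirement],
                  catalog: List[CatalogEntry]) -> Dict[SourceRequirement, List[str]]:
    """Map each requirement to the IDs of catalog entries that satisfy all its encoded elements."""
    return {
        req: [
            entry.dataset_id
            for entry in catalog
            if entry.topic == req.topic
            and req.fmt in entry.formats
            and (req.crs is None or entry.crs == req.crs)
        ]
        for req in requirements
    }


# Hypothetical catalog and requirements for illustration only.
catalog = [
    CatalogEntry("roads-2023", "transport", frozenset({"geojson", "csv"}), "EPSG:4326"),
    CatalogEntry("traffic-counts", "transport", frozenset({"csv"})),
    CatalogEntry("weather-obs", "weather", frozenset({"json"})),
]
requirements = [
    SourceRequirement("transport", "geojson", "EPSG:4326"),
    SourceRequirement("transport", "csv"),
    SourceRequirement("weather", "csv"),
]
result = match_sources(requirements, catalog)
# Requirements with no matching source are left for manual review.
unmatched = [req for req, ids in result.items() if not ids]
```

In this sketch the first requirement matches only `roads-2023`, the second matches both transport datasets, and the weather requirement finds no CSV source, so it ends up in `unmatched`. That last case is where the automatic matching stops and people take over.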

The DPS must be very clear and easy to read and use. It should contain all the necessary information, for example for the needs of application developers and the product team. In principle, it should include as much information as possible so that the specification does not remain too vague.

The aim of a DPS is to promote the use of data and to enable, for example, the commercialization (external monetization) of data. The specification also improves the experience of internal and external developers, reduces deployment risks such as vendor lock-in and other dependencies, and, of course, accelerates the data product's entry to market. The DPS should be in machine-readable form.

When the data product is delivered through REST APIs, the OpenAPI Specification (formerly the Swagger Specification) is the optimal choice for specifying the DPS and making data product utilization as efficient and automated as possible. As part of the DPS, data ownership and terms of use can be determined through a data product licence (DPL). We'll cover the DPL in later blogs.
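As a rough illustration of a machine-readable DPS built on OpenAPI, the sketch below assembles an OpenAPI 3.0 document as a Python dictionary and serializes it to JSON. It relies on two things OpenAPI does provide: the standard `license` object in `info`, used here to point to the DPL, and specification extensions whose names start with `x-`. The particular extension names (`x-update-frequency`, `x-data-sources`), the `/items` path, the function `build_openapi_dps` and the example.com URL are assumptions made for this example, not an established DPS convention.

```python
import json
from typing import Any, Dict, List


def build_openapi_dps(title: str, version: str, description: str,
                      licence_name: str, licence_url: str,
                      update_frequency: str, sources: List[str],
                      item_schema: Dict[str, Any]) -> Dict[str, Any]:
    """Return an OpenAPI 3.0 document (as a dict) describing a data product endpoint.

    DPS-specific details go into `x-` specification extensions; the extension
    names used here are hypothetical, not part of any standard.
    """
    return {
        "openapi": "3.0.3",
        "info": {
            "title": title,
            "version": version,
            "description": description,
            # Standard OpenAPI license object, used here to reference the DPL text.
            "license": {"name": licence_name, "url": licence_url},
            # Hypothetical extensions for maintenance and data capture details.
            "x-update-frequency": update_frequency,
            "x-data-sources": sources,
        },
        "paths": {
            "/items": {
                "get": {
                    "summary": "Retrieve the data product content",
                    "responses": {
                        "200": {
                            "description": "Data product items",
                            "content": {
                                "application/json": {
                                    "schema": {"type": "array", "items": item_schema}
                                }
                            },
                        }
                    },
                }
            }
        },
    }


spec = build_openapi_dps(
    title="Road network data product",
    version="1.0.0",
    description="Hypothetical example data product.",
    licence_name="Example DPL v1",
    licence_url="https://example.com/dpl/v1",
    update_frequency="daily",
    sources=["roads-2023", "traffic-counts"],
    item_schema={"type": "object", "properties": {"road_id": {"type": "string"}}},
)
print(json.dumps(spec, indent=2))
```

Because the output is plain OpenAPI, standard tooling for documentation, validation and client generation can read it. Tools that do not recognize the `x-` fields should simply ignore them.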

Examples of key entities and guidelines for DPS implementation

  1. Overview: General description, purpose and background.

  2. Specification scopes: Meaning, scope and uses.

  3. Detailed data product identification: Title, abstract, topic category, pre-defined themes or portfolios and representation types.

  4. Data content and structure: Description of data content and data structures.

  5. Reference systems: When using, e.g., geographic or temporal data, reference systems (e.g. a coordinate reference system or time zone) are needed.

  6. Data quality: Data quality requirements, acceptable conformance quality levels, corresponding data quality measures and, e.g., harmonization levels.

  7. Data capture: Specification of the sources and processes that shall or may be used for the data capture.

  8. Data maintenance: Specification of the principles and criteria to be applied in maintaining the product, such as maintenance and update frequency.

  9. Portrayal guidelines: Specification of the portrayal rules and a set of portrayal specifications that define how the data may be represented graphically. This could be particularly important for end products such as web services, reports and analytics.

  10. Data product delivery: Delivery channels, formats (e.g. a transfer standard such as a REST API) or delivery medium.

  11. Data product licence: A data product licence (DPL) is often needed to account for data ownership and the required level of confirmation for the use agreement. The degree of contractuality varies a lot, ranging from open data or accepting terms and conditions to more rigid, digitally signed contracts. For example, sometimes separate legal documentation is needed as part of the DPL, and sometimes simply clicking to accept the T&C as a data platform or marketplace user is enough.

  12. Additional information: Additional catchall for anything else to be specified for the product, such as constraint information regarding access and use.

  13. Metadata: Metadata is data that provides information about the data, and it shall be provided with the data product.

  14. Pricing: A recurring plan based on a time period (day, week, month or year), a one-time payment plan or a pay-as-you-go plan.
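The entities above map naturally onto a structured, machine-readable record. Here is a minimal sketch of one possible shape, assuming one field per entity, free-form dictionaries or strings for most sections, and a small validated model for the three pricing plans in entity 14. The class and field names are hypothetical. Treating reference systems, portrayal guidelines and additional information as optional is likewise an assumption, loosely based on how those entities are described above.

```python
from dataclasses import dataclass, field, fields
from enum import Enum
from typing import List, Optional

PERIODS = {"day", "week", "month", "year"}


class PlanKind(Enum):
    RECURRING = "recurring"          # billed per day, week, month or year
    ONE_TIME = "one_time"            # single payment
    PAY_AS_YOU_GO = "pay_as_you_go"  # billed per unit of use


@dataclass
class PricingPlan:
    kind: PlanKind
    price: float
    period: Optional[str] = None  # required for recurring plans only

    def __post_init__(self):
        if self.kind is PlanKind.RECURRING:
            if self.period not in PERIODS:
                raise ValueError("recurring plans need a period: day, week, month or year")
        elif self.period is not None:
            raise ValueError("only recurring plans have a billing period")


@dataclass
class DataProductSpecification:
    # Entities 1-14 from the list above (hypothetical field names).
    overview: str = ""
    scope: str = ""
    identification: dict = field(default_factory=dict)       # title, abstract, topic category...
    content_and_structure: dict = field(default_factory=dict)
    reference_systems: dict = field(default_factory=dict)    # e.g. {"crs": "EPSG:4326"}
    data_quality: dict = field(default_factory=dict)
    data_capture: dict = field(default_factory=dict)
    maintenance: dict = field(default_factory=dict)          # e.g. {"update_frequency": "daily"}
    portrayal: dict = field(default_factory=dict)
    delivery: dict = field(default_factory=dict)             # channels, formats, medium
    licence: str = ""                                        # reference to the DPL
    additional_info: dict = field(default_factory=dict)
    metadata: dict = field(default_factory=dict)
    pricing: List[PricingPlan] = field(default_factory=list)


# Sections treated as optional in this sketch (an assumption, not a rule from the text).
OPTIONAL_SECTIONS = {"reference_systems", "portrayal", "additional_info"}


def missing_sections(dps: DataProductSpecification) -> List[str]:
    """Return the names of required sections that are still empty."""
    return [f.name for f in fields(dps)
            if f.name not in OPTIONAL_SECTIONS and not getattr(dps, f.name)]


dps = DataProductSpecification(
    overview="Daily road network snapshot",
    identification={"title": "Road network", "topic_category": "transportation"},
    pricing=[PricingPlan(PlanKind.RECURRING, 49.0, "month"),
             PricingPlan(PlanKind.PAY_AS_YOU_GO, 0.01)],
)
print(missing_sections(dps))
```

A completeness check like `missing_sections` fits the incremental workflow described earlier. A partially prefilled specification can be saved early, and the check lists which sections still need work before the data product is published.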