Data Scientists as the customer
Currently, data economy commodities such as data products and data services are consumed by data scientists and data engineers. They are rather technical and use tools similar to those of more traditional developers.
Same basic tools
What clearly separates data scientists from traditional developers is the popularity of the R language, which is designed for statistical computing and graphics. R is used at least frequently by 27% of data scientists, which is probably a much higher share than among traditional developers.
Currently, data scientists are also expected to have DevOps capabilities so that they can develop and implement pipelines all the way to production. In short, it seems that the majority of data scientists' work requires a full-stack approach and capabilities.
That breadth does not come from formal training; it requires experience. It is no surprise that DataOps has emerged to standardize and support the development and launch of data products and services, as well as their maintenance, including lifecycle management.
According to the State of Data Science 2020 report, data scientists spend 45 percent of their time preparing data for actual use. It could be worse. In some past surveys, data prep tasks occupied upwards of 70% to 80% of a data scientist’s time.
Data preparation and cleansing take valuable time away from real data science work and have a negative impact on business and overall job satisfaction. This efficiency gap presents an opportunity for the industry to work on solutions to the problem, as none has yet emerged.
Better experience - more sales and satisfaction
The solution is captured under the umbrella term Data Developer eXperience. It is similar to what we have witnessed in the API Economy. In general, the developer experience has been ranked as the second most important factor in evaluating the success of API programs.
Furthermore, focusing on the Developer eXperience has proven to boost sales and create robust, industry-shaping companies like Stripe and Twilio. Both are highly appreciated by developers and have focused on the developer experience. The result is revenue in the billions.
Since the data economy requires APIs to act as digital plumbing, the developer experience of data products and services is expected to be close to the API developer experience, with a little twist. The author is conducting research on the differences and similarities together with two data economy professionals. Here’s what we can say about the data developer experience so far.
Getting Data Developer eXperience in shape
The data developer experience aims to enable the customer’s flow state, the moment of creativity and productivity that is the holy grail for all developers. It should add minimal cognitive burden and increase the size of the technical tool stack only if absolutely necessary. A great data developer experience offers self-service tools for getting started whenever and wherever. Here are four selected approaches to data developer experience. Get these in shape and you are tackling the first major issues.
1. Automate digital janitorial work
Throwing money at manual data preparation yields a lower-than-expected return on investment. As discussed at the beginning, this is the biggest bottleneck: we spend a lot of time on something that should be automated and standardized. The solution is to harmonize the data flows and find robust ways to evaluate data quality and even fill in the gaps, for example with interpolation or with advanced AI and machine learning-driven techniques.
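As a minimal sketch of what automating such janitorial work could look like, the Python snippet below scores a series for completeness and fills interior gaps by linear interpolation. The function names and the sample readings are hypothetical and chosen only for illustration; a real pipeline would likely use a dedicated library and more careful gap-handling rules.

```python
from typing import List, Optional


def completeness(values: List[Optional[float]]) -> float:
    """Share of non-missing readings: one simple data quality indicator."""
    if not values:
        return 0.0
    return sum(v is not None for v in values) / len(values)


def fill_gaps_linear(values: List[Optional[float]]) -> List[Optional[float]]:
    """Fill interior gaps by linear interpolation between known neighbours.

    Leading and trailing gaps are left as None, because there is no
    second known point to interpolate towards.
    """
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            weight = (i - left) / span
            filled[i] = values[left] + weight * (values[right] - values[left])
    return filled


# Hypothetical sensor readings with missing values.
readings = [20.0, None, None, 23.0, None, 25.0, None]
print(f"completeness before filling: {completeness(readings):.2f}")
print(fill_gaps_linear(readings))
```

Publishing the completeness score alongside the filled series lets the consumer see how much of the data was measured and how much was estimated.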
2. Offer first results in minutes, preferably in seconds
Consumers expect to see first results in minutes rather than hours or days, so having them clean the data before use is not an option. Instead, the data as a service (or product) must have clear, preferably standardized, data models and offer quality attributes.
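One way to picture a clear data model with quality attributes is a small machine-readable descriptor that travels with the data. The sketch below is an assumption for illustration, not an existing standard: the class names, the field list, and the quality keys are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class FieldSpec:
    """One column of the (hypothetical) data model."""
    name: str
    type: type
    unit: str = ""


@dataclass
class DataProductDescriptor:
    """Describes what a consumer can expect before reading any data."""
    name: str
    fields: List[FieldSpec]
    quality: Dict[str, Any] = field(default_factory=dict)

    def violations(self, record: Dict[str, Any]) -> List[str]:
        """Return human-readable problems found in a single record."""
        problems = []
        for spec in self.fields:
            if spec.name not in record:
                problems.append(f"missing field: {spec.name}")
            elif not isinstance(record[spec.name], spec.type):
                problems.append(
                    f"wrong type for {spec.name}: expected {spec.type.__name__}"
                )
        return problems


# Hypothetical descriptor for a weather data product.
weather = DataProductDescriptor(
    name="city-weather",
    fields=[
        FieldSpec("timestamp", str),
        FieldSpec("temperature", float, unit="celsius"),
    ],
    quality={"completeness": 0.98, "update_interval_minutes": 10},
)

print(weather.violations({"timestamp": "2020-06-01T12:00:00Z", "temperature": 21.5}))
print(weather.violations({"timestamp": "2020-06-01T12:10:00Z"}))
```

With a descriptor like this, a consumer can check incoming records in seconds instead of reverse-engineering the structure from samples.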
3. Transparency in performance creates trust
The consumer most likely also expects you to offer status information about the data flow. This resembles the current practice with APIs. You need to be transparent and real-time in your operations to gain and keep the trust of your customers. This might be partially, if not totally, solved by your API management. Depending on your offering, API management analytics can provide the needed statistics in real time. Keep in mind that a data flow, also known as a data supply chain, often consists of multiple systems and APIs. You must be able to verify the performance of the whole chain. Preferably, you can visualize and offer statistics about the performance of each step in the chain.
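The idea of per-step statistics across the whole chain can be sketched as follows. This is an illustrative assumption rather than a description of any particular API management product: the step names, latency samples, and error counts are made up, and a real setup would pull them from monitoring or analytics tooling.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class StepStats:
    mean_latency_ms: float
    error_rate: float


def summarize_chain(
    latencies_ms: Dict[str, List[float]], errors: Dict[str, int]
) -> Dict[str, StepStats]:
    """Compute mean latency and error rate for every step in the chain."""
    report = {}
    for step, samples in latencies_ms.items():
        report[step] = StepStats(
            mean_latency_ms=mean(samples),
            error_rate=errors.get(step, 0) / len(samples),
        )
    return report


def slowest_step(report: Dict[str, StepStats]) -> str:
    """Name of the step with the highest mean latency."""
    return max(report, key=lambda step: report[step].mean_latency_ms)


def end_to_end_latency_ms(report: Dict[str, StepStats]) -> float:
    """Mean latency of the whole chain, assuming steps run one after another."""
    return sum(stats.mean_latency_ms for stats in report.values())


# Hypothetical measurements for a three-step data supply chain.
latencies = {
    "ingest": [100.0, 120.0, 110.0],
    "clean": [400.0, 380.0, 420.0],
    "api": [40.0, 50.0, 60.0],
}
report = summarize_chain(latencies, errors={"clean": 1})

for step, stats in report.items():
    print(step, stats)
print("slowest step:", slowest_step(report))
print("end-to-end mean latency (ms):", end_to_end_latency_ms(report))
```

Exposing a report like this on a status page shows consumers not only that the flow is up, but also where in the chain delays or errors originate.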
4. Support common tools, extend with open source
It is also very likely that data scientists solve similar problems in everyday data cleansing and preparation. Make sure your solution is easy to use in commonly used stacks. Build solutions that utilize proven programming environments. This lowers the barrier to trying the solution out and reduces the cognitive burden.
Talented data scientists build ad hoc tools to remove repetitive tasks, but the tools are not distributed to a wider audience. Refine internally developed tools and offer them as open source to your clients and prospects. Developers love open source. Encourage contributions to the tools you have published.