04/08/2021 | News release | Distributed by Public on 04/08/2021 08:46
Are the phrases 'data is the new gold' or 'data is the new oil' just marketing slogans to entice businesses to spend more money on technology? Or are these metaphors describing a new paradigm that revolutionizes the world?
Let's try to find out, starting with a few facts. Both oil and gold lay buried deep under the surface of the earth for millions of years. And both only became valuable to humans once we learned how to dig them out and process them: into jewelry, or into fuel for light, heating, transportation and energy. Similarly, data has been around for ages, starting with early Sumerian and Chinese writing systems, continuing through centuries of data in printed books and, since the 20th century, in digital form on our computer systems.
Also, data only has value when it is served in the right format, so that it can be read or otherwise consumed, and, more importantly, made accessible to the people who need it, when they need it. That is why libraries were built to store books, allowing scholars to easily access the information they needed. For centuries, the use of data was at a stage comparable to the use of oil for lighting an oil lamp.
The pace picked up in the digital era. From the first days of computers, methods were developed to make information readable and accessible to people. The birth of the world wide web, with its hyperlinking capability, was another big step forward in creating value from data; it is also what enables you to read this article.
There is one big difference between data and gold or oil, though. While the amounts of newly available gold and oil are shrinking, the amount of available data is growing exponentially. As the graph below shows, the volume of digital data worldwide doubles approximately every three years. The 74 zettabytes we have today equal 74 billion terabytes, which is almost 10 terabytes for each living human on earth!
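The per-capita figure above is simple arithmetic. A quick sketch (the world population of roughly 7.8 billion is an assumption for 2021, not a figure from this article):

```python
# Convert the global data volume to terabytes per living person.
ZETTABYTES = 74                      # estimated global data volume, 2021
TB_PER_ZB = 1_000_000_000            # 1 zettabyte = 10^9 terabytes
WORLD_POPULATION = 7_800_000_000     # rough 2021 estimate (an assumption)

total_tb = ZETTABYTES * TB_PER_ZB    # 74 billion terabytes
tb_per_person = total_tb / WORLD_POPULATION
print(f"{total_tb / 1e9:.0f} billion TB, {tb_per_person:.1f} TB per person")
```

Whatever population estimate you plug in, the result lands just under 10 terabytes per person.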
Another big difference is that this growing amount of data comes in more and more variations. While gold and oil look the same in 2021 as they did centuries ago, data has become so varied that it is impossible for humans to comprehend it all. At the same time, we all know that insight into the facts is key to decision-making. A few decades ago, the information for decision-making came from a small set of fact tables that had to be neatly summarized and presented in a report (in those days, Excel was a revolution!), and in many businesses that is still the core of the decision-support system.
Now here comes the great opportunity. In the growing pile of data, and thanks to the connected world, there is also information about global economic trends and markets, geopolitical factors, supply chain events, consumer and customer behavior, and so on. And within the walls of the enterprise, the Industrial Internet of Things (IIoT), robotic process automation (RPA) and business process/workflow management (BPM) make a whole host of new data streams available.
It should be clear that enterprises that manage to capture a rich variety of data, and are able to transform it into a format suitable for decision-making, have a huge advantage over companies that rely on traditional reporting alone. The first category of companies will be looking at the facts over a much wider horizon and with many more colors in the picture.
On January 28, 2021, the following headline appeared in the Washington Post:
'Israel moves to head of vaccine queue, offering Pfizer access to country's health-care database'
Not only did the price paid for the vaccine lead Pfizer to prioritize deliveries to Israel; the company probably valued direct access to the country's healthcare data even more than the money itself. There can be no better illustration of how data is indeed becoming the new gold. The article also pointed to the potential infringement of privacy in sharing people's health data, so the headline equally illustrates that in a world of big data, concerns about data privacy and security are becoming a hot topic.
In business, decision-making is not exclusively a management responsibility; every person in an organization who has access to rich data in the context of their job will make better decisions and therefore do a better job. While in the old days management was reluctant to share too much information with workers, today's organizations realize the benefit of democratizing information, just as Google, Wikipedia and many other internet sources of information have brought benefits to society. Leaders with an authoritarian mindset who don't acknowledge the value of this democratization are now a minority.
The insight is growing that, where old-style reporting was considered an unavoidable cost, a modern data strategy acknowledges that data, when managed well, creates real value for the business. In the coming years, this will lead to entirely new businesses that specialize in buying and selling data (monetizing data), and at that point data almost literally becomes like oil or gold (a traded commodity). But even without thinking that far ahead, every enterprise must realize that it is crucial to protect, and maximally exploit, the value of the data created within its walls.
Looking at the evolution of the technology, the new data flows are also becoming a feedback loop in processes that are increasingly automated; for example, the data from an automated workflow can be used to detect bottlenecks and automatically adjust the flow itself to balance the work.
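Such a feedback loop can be pictured in a few lines. This toy sketch (stage names, queue sizes and the one-worker-at-a-time rule are all invented for illustration, not taken from any real workflow product) shifts capacity toward the stage with the longest backlog:

```python
# Toy work-balancing loop: move one worker from the least-loaded
# stage to the most-loaded one, based on items waiting per worker.
queues = {"intake": 40, "review": 5, "approval": 12}   # items waiting
workers = {"intake": 2, "review": 4, "approval": 2}    # assigned staff

def rebalance(queues, workers):
    """Shift one worker from the idlest stage to the busiest stage."""
    load = {s: queues[s] / workers[s] for s in queues}  # items per worker
    busiest = max(load, key=load.get)
    idlest = min(load, key=load.get)
    if busiest != idlest and workers[idlest] > 1:
        workers[idlest] -= 1
        workers[busiest] += 1
    return workers

print(rebalance(queues, workers))
```

Run repeatedly on live queue data, a rule like this keeps adjusting the flow as bottlenecks move, which is exactly the feedback loop described above.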
That brings us to another important technological aspect of modern data management. As mentioned, the huge volume and variety of data make it impossible for humans to find the information that is needed and transform it into a useful format. That is where algorithms come into the game: smart pieces of program logic that crawl through terabytes of data and detect the patterns that are relevant for the decision process they support. We call the logic smart because it can 'learn' from previous results in a feedback loop and get better at its task with each iteration. Other algorithms specialize in correcting errors and gaps in the information (cleansing), and still others handle data harmonization and aggregation. All these techniques fall under the umbrella of what is called 'data preparation': making the data available to its consumers.
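As a minimal illustration of those three preparation steps, here is a sketch in plain Python (the records, field names and exchange rate are invented for the example) that cleanses a gap, harmonizes units and aggregates the result:

```python
# Minimal data-preparation sketch: cleansing, harmonization, aggregation.
# All records and field names are invented for illustration.
raw = [
    {"region": "EU", "revenue": "1200", "currency": "EUR"},
    {"region": "US", "revenue": None,   "currency": "USD"},  # gap to cleanse
    {"region": "EU", "revenue": "800",  "currency": "EUR"},
]

# Cleansing: drop records with missing values (one possible policy).
clean = [r for r in raw if r["revenue"] is not None]

# Harmonization: convert every amount to a single currency (assumed rate).
EUR_TO_USD = 1.1
harmonized = [
    {"region": r["region"],
     "revenue_usd": float(r["revenue"]) * (EUR_TO_USD if r["currency"] == "EUR" else 1.0)}
    for r in clean
]

# Aggregation: total revenue per region.
totals = {}
for r in harmonized:
    totals[r["region"]] = totals.get(r["region"], 0.0) + r["revenue_usd"]
print(totals)
```

Real preparation pipelines do the same three things at terabyte scale, with learned rather than hard-coded rules, but the shape of the work is the same.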
After preparation, the large amounts of new gold must also be stored in a new Fort Knox. The data lake is the technological answer: a data storage concept designed to absorb large amounts of data of any format and automatically replicate it over a distributed network of servers. Even without backups, the data in the data lake remains available, without interruption, for decision support and other purposes (e.g. compliance).
All these bright opportunities of the new golden data do not mean there are no challenges.
A first concern is data governance: policies must be defined for managing data quality, security and the data life cycle. How trustworthy is the data? What is its lineage? Is it confidential? Who can access it? How long must we keep it available for compliance reasons? Tools are needed to apply those policies, and in a world of big data, data governance can only be effective if the tools have some form of intelligence, so that the policies are applied automatically, without administrators having to manage each data source by hand.
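One way to picture such automatic policy application is rule-driven classification. In this sketch (the policy names, field names and classification rule are entirely hypothetical), each data source is classified once from its field names, and access and retention settings follow from the classification instead of being configured by hand:

```python
# Hypothetical rule-driven governance: classify a data source
# automatically, then derive its access list and retention period.
POLICIES = {
    "confidential": {"access": ["compliance", "management"], "retention_years": 10},
    "internal":     {"access": ["all_employees"],            "retention_years": 5},
}

def classify(source):
    """Naive auto-classification based on field names (illustrative only)."""
    sensitive = {"salary", "health", "ssn"}
    kind = "confidential" if sensitive & set(source["fields"]) else "internal"
    return {**source, "policy": kind, **POLICIES[kind]}

src = {"name": "hr_payroll", "fields": ["employee_id", "salary"]}
print(classify(src)["policy"])   # prints: confidential
```

A production governance tool would use far richer signals than field names, but the principle is the same: the policy follows the data automatically.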
The main challenge, given the wealth of opportunities, is determining the best use cases for a business. This starts by asking: what are the problems we are trying to solve; what are the questions we are trying to answer? The answer will vary a lot from business to business.
A pharmaceutical company like Pfizer might find the most value in correlations between people's socio-demographic data and the effects of its vaccines. An automotive manufacturer will probably be looking for the causes and impacts of supply chain disruptions caused by the pandemic or geopolitical disruptors. How is climate change affecting car sales? Obviously, sales of electric vehicles are increasing, but how fast is the change, and what are the regional differences? Which types of electric vehicles lead the race in each region? That can be analyzed at a global level, but it is equally relevant to analyze it within the company's own ecosphere. It will greatly help with decisions on when and where to invest in production capacity and logistics.
Increasingly powerful analytics tools have become available to exploit the value hidden in the data lakes, the vaults of the new gold.
As a leading provider of cloud solutions for global manufacturers, QAD is continuously investing in its embedded analytics solutions. Starting in 2019, QAD Adaptive ERP includes a Cassandra data lake to capture both ERP and production execution data, a technology that is ready for big data. Apache Spark algorithms take care of processing large datasets at lightning speed. Recent investments have made analytics metrics available in the context of the application's transaction screens, with self-service configuration, fully enabling the democratization of information sharing in the organization. With Global Analytics, QAD is also broadening the analytics scope to span multiple ERP systems in a single Cassandra cluster.
For more information, please leave a comment or visit www.qad.com.