“Data is the new oil”: it was in 2006 that the British data scientist Clive Humby first used this phrase. When we look around us, we can see the truth of the statement in our daily lives. We have long used data for decision making, not only in our professional lives but in our personal lives as well. However, it is only in recent times that we have realized the real value of data.
If someone does not know how to store and use oil properly, the result can be havoc. The same goes for data. “Lies, damned lies, and statistics”, a phrase coined long before 2006, captures this mess. An old joke points out that 40% of sick people die when cared for at a hospital, in contrast to 10% of sick people who die when cared for at home. An earnest analyst inferred from these numbers that a sick person should never be sent to the hospital. The analyst overlooked that the sickest patients are the ones most likely to be sent to a hospital in the first place, so the two groups are not comparable. This is a ‘selection bias’ error, a very common mistake among analysts.
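The joke can be made concrete with a small worked example. The sketch below uses made-up patient counts, chosen only so that the overall death rates match the 40% and 10% quoted above; the split into "severe" and "mild" cases and every individual count are assumptions for illustration, not real data.

```python
# Hypothetical counts, chosen only so the totals match the joke's 40% and 10%.
# Each entry is (patients, deaths) for one care setting and severity level.
cases = {
    ("hospital", "severe"): (700, 391),
    ("hospital", "mild"): (300, 9),
    ("home", "severe"): (100, 64),
    ("home", "mild"): (900, 36),
}


def death_rate(setting, severity=None):
    """Death rate for a care setting, optionally restricted to one severity level."""
    rows = [
        counts
        for (s, sev), counts in cases.items()
        if s == setting and (severity is None or sev == severity)
    ]
    patients = sum(p for p, _ in rows)
    deaths = sum(d for _, d in rows)
    return deaths / patients


for setting in ("hospital", "home"):
    print(f"{setting:8s} overall: {death_rate(setting):.1%}")
    for severity in ("severe", "mild"):
        print(f"{setting:8s} {severity:6s}: {death_rate(setting, severity):.1%}")
```

With these assumed counts, the hospital looks worse overall (40% against 10%), yet it does better within each group: roughly 56% against 64% for severe cases and 3% against 4% for mild ones. The overall gap comes from who ends up in the hospital, not from the care given there.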
However, there is one big difference between data and oil that matters for this discussion of value (there are, of course, many others). Oil is a nonrenewable source of energy, while data can be renewed, or rather reused, in an unlimited number of ways and an unlimited number of times.
Thirty years ago, one of the main challenges for most businesses was a paucity of data. Today, one of the main challenges for most businesses is a plethora of data, and a large part of this data is digital. According to a study published by Statista Research Department in September 2022, the total volume of digital data in the world is estimated at 97 zettabytes. By 2025, this number is expected to rise to 181 zettabytes. Most of this digital data is unstructured. Thanks to the unprecedented advancement of technology, collecting data is a cakewalk nowadays. Data gets collected on the fly while the user is exploring a digital platform. It is important to collect the data and feed it into the system in real time, since many of its use cases are short-lived. And with the advent of AI and ML techniques, there has been a genuine revolution in the use of data. It can aptly be called a Data Renaissance.
The fintech industry, for example, has been completely transformed by the Data Renaissance. Innovative products have been introduced that would have been impossible without it. Consider lending, where hitherto unthinkable credit underwriting strategies have emerged. Traditionally, credit bureau reports, bank statements and pay slips were the standard documents used to underwrite unsecured retail credit. Today, fintech companies have launched lending apps that can capture far more data through the devices of prospective borrowers, commonly referred to as alternate data. Using such data, lending firms are able to underwrite customer segments that were previously unserved or underserved for unsecured retail credit. Whenever a prospective customer lands on a website, every user action (called clickstream data) can be tracked easily, and this data can reveal a lot about the user's characteristics.
With such voluminous data, businesses struggle to decide how, where and when to use it and, most importantly, where to store it. A survey conducted by Capital One and published in Forbes in August 2022 revealed that 76% of organizations found it difficult to understand their data. Traditional relational database management systems are ill-suited to this new type of data. The world is fast moving towards cloud storage, which is not only more efficient at handling such data types but is also, most of the time, cost effective.
The other challenge that organizations face is how to democratize data. It is important that everyone in the organization gets access to data so that they can contribute to overall growth. At the same time, it is important to ensure that people have the relevant skills to draw correct inferences from the data, or else it becomes “Lies, damned lies, and statistics” all over again. It is very common for people to draw wrong inferences that lead to wrong strategic business decisions. Hence, it is important for management to hire people with the right skills for the task and to channel all data requests from across the organization through this core team.
Last but not least, an extremely important aspect is data security and privacy. Companies today deal with extremely sensitive personally identifiable data. It is imperative to collect all such data with proper consent, and then to store it under the strictest data security governance. Time and again, data leaks have occurred even at highly reputed firms. A data leak not only causes business loss for the company; more importantly, a compromise of data is a failure of the organization's moral and ethical responsibility. The government is also taking a strict stance on data security. The draft of the “Digital Personal Data Protection Bill 2022” is out for public feedback.
To conclude, in this new era of Data Renaissance, organizations need to build an effective system for data collection, storage and analysis, and to invest in the relevant resources for execution. Achieving such goals requires strong synergy across different teams: technology, data engineering, data science, product and others. This usually does not come at a low cost. However, focused investment yields manifold returns over time. It is important to remain patient, since data engines take time to learn and evolve.
Soumyajit Ghosh, Seasoned Data Scientist and Chief Business Officer, IndiaLends