Let us look into a fascinating day of a data analyst. The persona of a report analyst can start the day by analyzing data requirements drawn up by, or received from, a business owner. The report owner often conveys those reporting needs through a "just-in-time", isolated conversation with the report analyst.
An empirical scenario in publishing a report or an artificial intelligence model
First, more often than not, the turnaround time for analyzing reporting requirements gets extended while the analyst examines the key performance indicators, the business logic, and the semantic meaning. The analyst also has to inspect systems and databases to find the correct coverage of data, which often starts with an unplanned phone call to the system owners.
Next, with all this unplanned collaboration and the lack of standard processes and tools, the time taken to provision a report stretches to a couple of weeks. The scenario is the same for an artificial intelligence analyst who models an insight that augments a customer journey, such as a mortgage.
An ideal scenario in managing, engineering, and governing data
As an organization puts in place a formal process for eliciting and managing data requirements, it becomes easier to discover reporting needs: collaborating with stakeholders, researching the business meaning, and experimenting all become planned activities. Identifying the business domains in scope for the data can then be followed by engaging the data stewards to assist with the analysis.
Why data planning can be crucial for efficient data consumption
Most often, system engineers implementing systems do not define structures such as tables with simple names that resonate with their contents. In the absence of plain descriptions and semantic names, the reporting analyst is left unsure of the right provisioning source.
A catalog is changing the way sources are discovered, enabled by crowd-sourced definitions contributed by data providers and consumers directly in the collaboration tool. This can be an ongoing activity that builds a culture of trust in sharing and using data obtained or created by someone else in the organization.
Intelligent search is a crucial feature of data provisioning
During data analysis, an analyst can use an intelligent search capability in the catalog that accepts a semantic name and surfaces the right source to provision. This increases the availability of intelligence about data, which in turn brings context to the analysis.
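As a minimal sketch of how such a search could behave, assume an in-memory catalog mapping cryptic physical table names to crowd-sourced business definitions; a simple token-overlap score then ranks candidate sources for a semantic query. All table names and definitions here are hypothetical examples, not a real catalog API:

```python
# Minimal sketch of semantic search over a data catalog.
# All dataset names and definitions below are hypothetical.

def tokenize(text):
    """Lower-case a phrase and split it into a set of word tokens."""
    return set(text.lower().split())

# A tiny catalog: physical table name -> crowd-sourced business definition.
CATALOG = {
    "cust_mstr_01": "customer master record with name address and contact",
    "txn_hist_raw": "raw transaction history for all customer accounts",
    "mrtg_appl_fct": "mortgage application facts including status and amounts",
}

def search(query):
    """Return catalog entries ranked by token overlap with the query."""
    q = tokenize(query)
    scored = [
        (len(q & tokenize(desc)), table)
        for table, desc in CATALOG.items()
    ]
    return [table for score, table in sorted(scored, reverse=True) if score > 0]

print(search("customer mortgage application"))
```

A production catalog would use synonym rings, embeddings, or lineage signals rather than raw token overlap, but the shape of the interaction is the same: a business phrase in, a ranked list of provisioning sources out.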
Thus, democratizing data across an organization needs executive sponsorship from the owners of both the source and consumption landscapes, as it requires extensive training of personnel, awareness around curating business metadata, and the use of tools for self-service data sourcing.
Data democratization across an organization can also drive the enablement of integrated and interoperable data across divisions, thus breaking silos. Moreover, native processes can be digitized with ease as data becomes discoverable and available for use.
Moreover, given the vast replication of the same data across sources, the means to glance at a profile of the data to be sourced provides intelligence for choosing the right source. A data quality profile also helps the analyst gauge the extent of cleaning required for bad information.
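A data quality profile of the kind described above can be as simple as completeness and distinctness statistics per field. The sketch below, with hypothetical sample records and field names, shows the minimal shape of such a profile:

```python
# Minimal sketch of a data quality profile for a sourced dataset.
# The sample records and field names are hypothetical.

def profile(records, field):
    """Report completeness and distinctness for one field of a record set."""
    values = [r.get(field) for r in records]
    non_null = [v for v in values if v not in (None, "")]
    return {
        "rows": len(values),
        "completeness": len(non_null) / len(values) if values else 0.0,
        "distinct": len(set(non_null)),
    }

records = [
    {"customer_id": "C1", "email": "a@example.com"},
    {"customer_id": "C2", "email": ""},
    {"customer_id": "C3", "email": "a@example.com"},
]

print(profile(records, "email"))
```

Even these two numbers let an analyst compare replicated copies of the same data and pick the source that needs the least cleaning.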
Across the multitude of native and digital channels, harmonizing data quality rules for commonly used data brings consistency in sourcing correct data. Surfacing the quality of data at the point of discovery enhances trust in using it for an artificial intelligence model or in a new digital application.
A cultural shift in data democratization can improve report and analytics outcomes
Data democratization can be a well-desired culture change in an organization and is based on the concept of enabling easy access to data for anyone. The ease of availability and access to data enables direct and indirect data monetization, thus improving revenue streams. The concept can be realized through an internal marketplace used as a tool in conjunction with a catalog.
A marketplace cannot give away free access to all data; there are risk-based controls that need to be actively managed. These controls include data privacy, security, authentication, encryption, entitlements, user access management, device management, and data rights management.
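The entitlement control mentioned above can be illustrated with a small sketch: a mapping from data classification to the roles allowed to request access, consulted before the marketplace grants anything. The roles, classifications, and rules are hypothetical examples of a risk-based policy, not a prescribed model:

```python
# Sketch of a risk-based entitlement check in an internal data marketplace.
# Roles, classifications, and the policy itself are hypothetical examples.

ENTITLEMENTS = {
    # data classification -> roles allowed to request access
    "public":       {"analyst", "engineer", "steward"},
    "internal":     {"analyst", "engineer", "steward"},
    "confidential": {"engineer", "steward"},
    "restricted":   {"steward"},
}

def can_access(role, classification):
    """Return True if the role is entitled to data of this classification."""
    return role in ENTITLEMENTS.get(classification, set())

print(can_access("analyst", "internal"))    # broad access to low-risk data
print(can_access("analyst", "restricted"))  # high-risk data stays gated
```

In practice such checks sit behind authentication and user access management, but the principle is the same: access widens as the risk classification of the data falls.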
The second aspect where much time is spent in churning an insight out of a model is bringing the required, integrated data into storage, such as a warehouse, from where it can be modeled further into a report or insight. This scope of activities, popularly known as data engineering, is now a formalized discipline in the consumption landscape. Certain integration services in data engineering tool-sets make it possible to take a searchable data element and make it available in the storage of choice, such as a data lake or a feature store.
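To make the provisioning step concrete, the sketch below stands in for such an integration service: a cataloged element is extracted from its source system and landed in a target store. The source lookup and the in-memory "lake" are hypothetical stand-ins, not a real tool-set API:

```python
# Sketch of an integration step: provision a cataloged element into a target store.
# The source registry and the in-memory "lake" are hypothetical stand-ins.

SOURCES = {
    # physical source name -> extractor returning rows
    "mrtg_appl_fct": lambda: [{"appl_id": 1, "status": "approved"}],
}

lake = {}  # stands in for a data lake or feature store

def provision(element, target_name):
    """Extract a cataloged element from its source and land it in the lake."""
    rows = SOURCES[element]()  # extract from the system of record
    lake[target_name] = rows   # load into the storage of choice
    return len(rows)

provision("mrtg_appl_fct", "mortgage_features")
```

Real integration services add scheduling, schema mapping, and incremental loads, but the search-to-storage translation the text describes reduces to this extract-and-land step.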
Data Protection on the heels of data democratization enables responsible data consumption
Data protection comes into play in most cases of data engineering as well, when the coverage of data includes customers' personal data. A data engineer can think through questions such as "Who is authorized to process this data?", "Is consent required to process it?", and "Will the data need to be anonymized?".
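When the answer to the anonymization question is yes, a common technique is to pseudonymize the personal fields before the data lands in a consumption store. The sketch below, with hypothetical field names and masking policy, shows one way this is often done, using a one-way hash so the same value always masks to the same token:

```python
# Sketch of masking personal data before it reaches a consumption store.
# Field names and the masking policy are hypothetical examples.
import hashlib

PII_FIELDS = {"name", "email"}  # fields requiring anonymization

def pseudonymize(value):
    """Replace a value with a stable, irreversible token."""
    return hashlib.sha256(value.encode()).hexdigest()[:12]

def mask_record(record):
    """Return a copy of the record with PII fields pseudonymized."""
    return {
        k: pseudonymize(v) if k in PII_FIELDS else v
        for k, v in record.items()
    }

row = {"name": "Jane Doe", "email": "jane@example.com", "balance": 1200}
masked = mask_record(row)
print(masked["balance"])  # non-PII fields pass through unchanged
```

Because the token is stable, joins and aggregations still work on the masked data; because it is a truncated hash, the original values cannot be read back from the lake.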
In addition, by looking into a catalog, information including the privacy classification, data owner, data and privacy stewards, and associated entitlements can be absorbed into the engineering analysis. The need for a workflow tool cannot be stressed enough; it can instantiate a pull-based service for data authorization and for the enrichment of information by stewards.
Furthermore, a toll gate is preferable during this phase of data engineering if these activities are otherwise governed manually through a push-based mechanism. The toll gate can be run by the consuming data steward for that business data domain, who routes requests based on needs such as enriching data quality, masking, or checking consents. Moreover, the data steward, as a domain expert, can help ascertain the right data coverage to be picked up for a reporting attribute or model feature.
Governing data through formalized processes and people collaborating on tool-sets
Furthermore, to complete a data journey, personnel including data platform analysts, data engineers, data analysts, business data stewards, and privacy stewards have to collaborate actively. This takes direct interaction among stakeholders and relies on their endorsements, based on experience, expertise, and judgment, within a data governance tool-set.
Data governance guides personnel in managing data better. The guidance is ensured through policy and ownership of data in an organization, and the emphasis is on formalizing the data management function along with the collaboration.
About the author: Tejasvi Addagada is a data strategist who solves customer-centric challenges through digital technology, data engineering, and data governance, monetizing data for new revenue streams.