Building Towards a Next-Generation Data Pipeline: A Conversation with Fidelity’s Mihir Shah
In a recent interview by CIO, Mihir Shah, Head of Data Architecture for Fidelity Investments, and Jason Davis, CEO and Founder of Simon Data, sat down to discuss the evolution of Fidelity’s data strategy. Central to this discussion is a vision known as “the next-generation data pipeline.” Foundational to this vision is expanding the scope and impact of data beyond traditional lines of engineering ownership and reporting applications. Fidelity’s next generation of data applications involves fully unlocking the power of data across the enterprise. This starts with extending ownership beyond core data and engineering functions so that business stakeholders, including marketing, can drive core processes forward with measurable results.
In moving towards this vision, Fidelity embarked on a journey consisting of three core steps. The first step was centralizing their data into a scalable cloud data environment. Next, Fidelity aimed to democratize data to permit access across the organization. Finally, they sought to enable business functions with the right applications to transform processes to be fully data-driven.
While Fidelity’s steps are simple to state, the vision is ambitious and requires complex execution. Yet the difficulty did not deter the team from the goal of transforming the data pipeline so that data is available to, and used by, everyone.
Step One: Data Centralization
Historically, Fidelity’s data existed in disparate systems and databases that were developed over many years as their capabilities matured. The business had a myriad of data across the organization but no centralized way of maintaining, cataloging, and joining disparate sources.
Before Fidelity invested in a cloud data warehouse, data lived in silos across various departments, and Shah clearly understood the limitations of this fragmented architecture. He also had a strong vision of how a centralized architecture would solve the problem and drive business outcomes.
Step Two: Data Democratization
With a centralized data model, the next obstacle towards Fidelity’s core vision was one of access. While accessibility starts with data in a centralized place, it also involves active data management and providing views into data that are purposeful and domain-specific.
To illustrate this point, let’s start with an abbreviated quote from Mihir demonstrating the value of centralized data:
“…All data sets are in one place and are managed by the data owners. There is a full catalog, everybody knows where the data is <and> it’s easily accessible. And then there are products <that> are working on leveraging the datasets.”
Jason adds to this point by describing the value of combining datasets:
“<The> data becomes richer as you pull in third-party, supplier, and partner sources. 15 years ago you had an on-prem data warehouse with a star schema, and you’d never think about any transformations outside… We’ve fully outgrown that model today. When you look at the richness of data, there’s a notion that we call the last mile data transformations…you take two datasets and you can have a third. When you start to have access to a whole new world of data…<then you can> really look at the complexity of the applications and the opportunities that now exist from a marketing perspective.”
The idea here is twofold: there is value in combining datasets to produce a third, more encompassing dataset, and there is further value in getting these datasets into the hands of the data owners, which creates opportunities to drive desired outcomes.
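As a rough illustration of "take two datasets and you can have a third," the sketch below joins two small in-memory datasets on a shared key. The dataset names, fields, and the combine helper are hypothetical and invented for this example; they do not describe Fidelity's or Simon Data's actual systems.

```python
# Hypothetical first-party and partner datasets, both keyed by customer_id.
# All names and values here are made up for illustration only.
accounts = [
    {"customer_id": 1, "segment": "retirement"},
    {"customer_id": 2, "segment": "brokerage"},
    {"customer_id": 3, "segment": "brokerage"},
]
partner_signals = [
    {"customer_id": 1, "life_event": "new_job"},
    {"customer_id": 3, "life_event": "home_purchase"},
]


def combine(left, right, key):
    """Inner-join two lists of dicts on `key`, returning a third dataset."""
    # Index the right-hand dataset by key for constant-time lookups.
    right_by_key = {row[key]: row for row in right}
    # Keep only rows present in both datasets, merging their fields.
    return [
        {**row, **right_by_key[row[key]]}
        for row in left
        if row[key] in right_by_key
    ]


enriched = combine(accounts, partner_signals, "customer_id")
for row in enriched:
    print(row)
```

In practice a join like this would more likely run as SQL inside the cloud data warehouse; the point is only that the combined dataset carries fields neither source had on its own.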
Step Three: Data Enablement
Ultimately, the power of data is how you use it. Shah and Davis go on to discuss “the next-generation data pipeline” in more detail. The first tenet of data usage is the level of effort required to build, modify, and iterate on the data pipes that drive business processes.
When data operations become easier, capabilities can transcend to a more strategic level:
“I think so many of the existing use cases for either are getting migrated off of on-prem data warehouses or data marts…And those are important use cases…<that> can certainly be enabled with the cloud and centralization…But really, the vision we’re driving towards is this next tranche of how data is used outside of just the analytics, modeling, and insights.”
Circling back to our opening clip of Shah, he discusses wanting to target every aspect of the business to make Fidelity more data-driven as a whole. The power of an enhanced data pipeline is that it touches every aspect of the business.
“Whether it’s procurement or somebody who’s looking at expense reports for fraud. So, if you look around your enterprise, there are so many different processes that can just be simple… It may not be a complex model, just the availability of data. They can actually enhance… The job becomes much easier or better.”
While a successful data strategy starts with centralizing data from across an organization, fully effecting the strategy requires enabling teams outside of data & technology. As Fidelity looks forward, developing the people, processes, and systems required to effect this is critical. This “next-generation data pipeline” is about rethinking how data is used & deployed through the organization, not just for reporting or insights but to drive forward core business processes within marketing and beyond.