How to prepare your customer data for composability

A how-to guide by Brooklyn Data Co.

The CDP (customer data platform) is a critical component of any MarTech stack. A CDP’s main function is to aggregate, unify, and centralize customer data from various sources into a single, comprehensive view. But traditional packaged CDPs have challenges with connecting data from all platforms, and many organizations are making the move to composable CDPs. What should you know about preparing your CDP for composability?

Understanding the difference between packaged vs composable CDPs

Traditional CDPs were created to help marketers collate the myriad ways they interact with customers. These packaged CDPs were built specifically for MarTech applications, which makes them rigid about what information can be accessed.

Unfortunately, this tends to lead to siloed customer data that can’t be integrated into the organization’s overall enterprise data strategy, including customer data that sits outside the organization’s privacy protections.

On the other hand, composable CDPs have a modular and flexible architecture. This allows you to aggregate and unify data from multiple sources, not just predefined marketing applications. Composable CDPs let an organization create a customized data ecosystem that meets its specific needs, provide a single unified view of every customer, and perform real-time data processing and analysis. The tools are built around the data warehouse, which remains the central source of truth.

Ask Simon Data: What’s a connected CDP?

One of the biggest problems created by packaged CDPs is that customer data is replicated multiple times as it moves across marketing platforms. This movement often falls outside the governance and security standards set by the cloud data warehouse (CDW), and it also creates inefficiencies for both tech and marketing teams.

Simon Data’s Connected CDP is a hybrid approach that offers all the benefits of composable CDPs within a packaged interface. With Simon Data, marketers can access customer data directly within the cloud data warehouse, build complex customer segments, and easily activate them in end channels – all while the resulting data is transmitted back on a continuous loop between marketing channels and the data warehouse.

We call this a “Connected CDP” because companies can capitalize on the benefits of composability – placing the CDW at the heart of their marketing and data infrastructure and maintaining data governance and security standards. Marketers benefit because they can use workflows and features that make it easy to access and deploy this data in marketing experiences.

Before you switch to a composable or connected CDP, there is some important pre-work to do. To take advantage of the benefits of these deployments, start with a readiness check on your organization’s data maturity level.

Readiness check! Data maturity level evaluation

Step 1: Before you start – Ensure data accuracy

Your data will only be as good as the time you’ve spent cleaning it. So it’s important to plan what data will be used, define data quality metrics, and choose the right tools for the job.

Identify high-quality source datasets. Choose a dataset to serve as a reference point, a “golden record”. For example, when considering customer data, choose the dataset with customer information that has been maintained and updated over time. This may be Shopify customer data or CRM data.

Understand your data. You can’t expect to implement checks and balances for data accuracy if you don’t understand your existing data sets. Going through this exercise will help you identify areas of concern about your data quality and accuracy. 

Here are 5 questions to ask about your current datasets:

  1. What is the source system? (e.g., Shopify, Salesforce, etc.)
  2. How is the data collected? (e.g., web events, user-provided, employee-entered, etc.)
  3. How often is the data collected?
  4. How is the data updated? For example, does new data overwrite existing data, or does it create a new row while the old row is flagged as no longer valid?
  5. How is this data currently leveraged?

Define quality metrics. It is important to get the entire organization on the same page about what data quality means. Clear metrics make that shared definition enforceable. Metrics to consider are completeness, consistency, accuracy, validity, timeliness, and relevance.
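
To make a few of these metrics concrete, here is a minimal Python sketch that scores completeness, validity, and timeliness on a handful of made-up customer rows. The field names, the one-year freshness threshold, and the simple email rule are all assumptions for illustration, not a recommended standard.

```python
from datetime import date

# Hypothetical customer rows; the field names are assumptions for illustration.
customers = [
    {"id": 1, "email": "ana@example.com", "updated": date(2023, 9, 1)},
    {"id": 2, "email": "", "updated": date(2023, 8, 15)},
    {"id": 3, "email": "not-an-email", "updated": date(2021, 1, 10)},
    {"id": 4, "email": "lee@example.com", "updated": date(2023, 9, 10)},
]


def completeness(rows, field):
    """Share of rows where the field is present and non-empty."""
    return sum(1 for r in rows if r.get(field)) / len(rows)


def validity(rows, field, is_valid):
    """Share of non-empty values that pass a validation rule."""
    values = [r[field] for r in rows if r.get(field)]
    if not values:
        return 0.0
    return sum(1 for v in values if is_valid(v)) / len(values)


def timeliness(rows, field, as_of, max_age_days):
    """Share of rows updated within max_age_days of the as_of date."""
    fresh = sum(1 for r in rows if (as_of - r[field]).days <= max_age_days)
    return fresh / len(rows)


def looks_like_email(value):
    # Deliberately simple rule; a production check would be stricter.
    local, sep, domain = value.partition("@")
    return bool(local) and sep == "@" and "." in domain


report = {
    "email_completeness": completeness(customers, "email"),
    "email_validity": validity(customers, "email", looks_like_email),
    "timeliness_365d": timeliness(customers, "updated", date(2023, 9, 28), 365),
}
print(report)
```

Agreeing on a target for each score (for example, a minimum email completeness) is what turns these numbers into metrics the whole organization can track.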

Choose the right tools for the job. There are tools available to help you with the process of data validation, cleansing, profiling, and auditing. For example:

  • Ingestion tools: Fivetran, Meltano
  • Storage and querying: Snowflake
  • Transformation: dbt, Coalesce


Step 2: Maintain data accuracy

A successful composable or connected CDP deployment requires that the data stored in the data warehouse is accurate. But how can you ensure data accuracy once data is in the CDP?

Identify the highest quality datasets. Identify a reference table with good data quality. For example, the highest quality customer dataset could be CRM data or customer data in an eCommerce platform.

Data cleansing. Once you have determined which dataset will be the golden record, use it as a reference. You can leverage tools like dbt and Coalesce to define your data model. In this step, you’ll perform data standardization, data normalization, and data deduplication.
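
In practice this logic would usually live in your transformation tool, but the idea can be sketched in plain Python. The example below covers standardization and deduplication only. It assumes email is the matching key and that the most recently updated row wins; both are illustrative choices, and the records are made up.

```python
def standardize(record):
    """Trim whitespace and apply consistent casing to each field."""
    return {
        "email": record["email"].strip().lower(),
        "name": " ".join(record["name"].split()).title(),
        "updated": record["updated"],
    }


def deduplicate(records, key="email"):
    """Keep the most recently updated record for each key value."""
    latest = {}
    for rec in records:
        current = latest.get(rec[key])
        # ISO-8601 date strings sort chronologically, so string comparison works.
        if current is None or rec["updated"] > current["updated"]:
            latest[rec[key]] = rec
    return list(latest.values())


# Hypothetical raw rows with inconsistent casing, spacing, and a duplicate.
raw = [
    {"email": " Ana@Example.com ", "name": "ana  diaz", "updated": "2023-05-01"},
    {"email": "ana@example.com", "name": "Ana Diaz", "updated": "2023-08-20"},
    {"email": "LEE@example.com", "name": "lee wong", "updated": "2023-07-04"},
]

golden = deduplicate([standardize(r) for r in raw])
for row in golden:
    print(row)
```

Standardizing before deduplicating matters here: the two rows for Ana only match once their emails share the same casing and spacing.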

Enrich the golden record dataset. Once the golden record is identified, enrich it with other datasets such as registration data and sales transaction data. The data team will expose this dataset to the marketing team for QA and UAT to ensure that the golden record of customer data is sound.

Make sure the data team works with the marketing team to confirm that deduping was done properly, that householding logic is sound, and that granular segmentation will be possible with the data.

Step 3: Maintain compliance

Once the highest quality dataset has been identified and enabled, there will probably be data governance issues that need to be resolved.

PII data handling. You will need to identify PII standards for the data set. You’ll need to answer questions such as: 

  • How is PII data collected?
  • What data can be collected and stored? 
  • Who can access PII data?

Let’s look at an example. Users provide PII data such as name and email address via a webform. This data is stored in the CRM system, and then is loaded into the CDP. Data analysts with access to the CDP should not have access to PII data due to security and regulatory compliance reasons. This means that the PII data needs to be filtered or masked in some way.
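
One possible way to mask PII is to replace each value with a salted hash, so analysts can still count and join on a field without seeing the raw value. The sketch below assumes `name` and `email` are the PII fields and uses a hypothetical `mask_record` helper. Note that hashing is pseudonymization rather than full anonymization, so whether it is sufficient depends on your own compliance requirements.

```python
import hashlib

# Assumption for illustration: these are the PII fields in the dataset.
PII_FIELDS = {"name", "email"}


def mask_record(record, salt):
    """Return a copy of the record with PII values replaced by a salted hash."""
    masked = {}
    for field, value in record.items():
        if field in PII_FIELDS:
            digest = hashlib.sha256((salt + str(value)).encode("utf-8")).hexdigest()
            masked[field] = digest[:16]  # shortened digest for readability
        else:
            masked[field] = value
    return masked


# Hypothetical CRM row and salt; a real salt would be stored as a secret.
crm_row = {"name": "Ana Diaz", "email": "ana@example.com", "plan": "pro"}
analyst_view = mask_record(crm_row, salt="example-salt")
print(analyst_view)
```

Because the same input and salt always produce the same token, masked emails can still be used to match records across tables.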

Data access. It is important to know who has access to PII data, and to expose data based on roles to stay compliant with PII requirements.

To continue the above example, the data analysts would all be assigned a role within the CDP. This role governs their access to PII. Access is granted to the role rather than to individual users, which makes it easier to control access to sensitive data.

If a user is required to access PII data, they would be assigned a different role that enables them to see names and email addresses.
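
The role-based approach can be sketched as a simple lookup: columns are granted to roles, and users only ever receive a role. The role names, user names, and columns below are hypothetical, and a real deployment would rely on the warehouse’s own access controls rather than application code like this.

```python
# Hypothetical roles and the columns each one may see.
ROLE_COLUMNS = {
    "analyst": {"customer_id", "plan", "lifetime_value"},
    "pii_reader": {"customer_id", "plan", "lifetime_value", "name", "email"},
}

# Users are mapped to a role; they never receive column grants directly.
USER_ROLES = {"dana": "analyst", "sam": "pii_reader"}


def visible_row(user, row):
    """Return only the columns that the user's role is allowed to see."""
    allowed = ROLE_COLUMNS[USER_ROLES[user]]
    return {column: value for column, value in row.items() if column in allowed}


row = {
    "customer_id": 42,
    "name": "Ana Diaz",
    "email": "ana@example.com",
    "plan": "pro",
    "lifetime_value": 310.0,
}

print(visible_row("dana", row))  # no name or email
print(visible_row("sam", row))   # includes name and email
```

Granting a user access to PII then means changing one entry in the user-to-role mapping, rather than editing permissions column by column.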

CCPA/CPRA, GDPR compliance. As data privacy legislation continues to grow, it’s important to make sure your data warehouse is set up properly for compliance. This includes an easy way to download customer data and to remove customer data if a customer requests it.
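
As a rough illustration of what “download” and “remove” requests involve, the sketch below treats the warehouse as a dictionary of tables keyed by a shared `customer_id`. The table names, the key, and both helper functions are assumptions for illustration. A real implementation would run against the warehouse itself and would also need to cover backups and downstream copies.

```python
import json


def export_customer(tables, customer_id):
    """Collect every row for one customer across all tables, as JSON."""
    found = {
        name: [r for r in rows if r["customer_id"] == customer_id]
        for name, rows in tables.items()
    }
    return json.dumps(found, indent=2, sort_keys=True)


def erase_customer(tables, customer_id):
    """Remove the customer's rows in place and return how many were removed."""
    removed = 0
    for rows in tables.values():
        kept = [r for r in rows if r["customer_id"] != customer_id]
        removed += len(rows) - len(kept)
        rows[:] = kept
    return removed


# Hypothetical tables that share a customer_id key.
tables = {
    "customers": [
        {"customer_id": 1, "email": "ana@example.com"},
        {"customer_id": 2, "email": "lee@example.com"},
    ],
    "orders": [
        {"customer_id": 1, "order_id": "A-100"},
        {"customer_id": 1, "order_id": "A-101"},
        {"customer_id": 2, "order_id": "B-200"},
    ],
}

export = export_customer(tables, customer_id=1)
removed_count = erase_customer(tables, customer_id=1)
print(export)
print(removed_count)
```

Both requests depend on every table carrying a consistent customer identifier, which is one more reason to settle the golden record first.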

Use the best platform for composable CDP

With a little preparation, it is possible to take advantage of the flexibility and modularity of a composable CDP. Connected segmentation from Simon Data transforms Snowflake into a customer segmentation engine for your marketing team, all without data leaving your cloud warehouse.

By natively connecting to Snowflake, your Data and Engineering teams enjoy the security, governance, and data-sharing benefits of their cloud data warehouse. And, with a no-code, intuitive UI, marketers also get access to Snowflake data without writing a single line of code.

Want to dive deeper? Join Simon Data & Brooklyn Data Company on September 28 at 11 am PT / 2 pm ET for a webinar where we’ll explain the tools and methods we mentioned in this blog post.
