7 questions to ask before bringing new data into Pigment

  • 3 October 2023

Whether you’re implementing a new use case, connecting a new data source, or simply bringing new data into your Pigment environment, it’s important to carefully consider the state and lifecycle of that data. This will help you identify what impact that data might have on your models and your existing processes.

 

 

If you’re implementing Pigment for the first time, your Solution Architect can help you ask these questions. Even if you’re bringing in this data on your own, though, here are 7 questions to ask before you start:

 

1. What is my organization’s data strategy?

Most businesses have some sort of data strategy, whether that’s a list of loose best practices or a rigorously mapped and enforced process. This should include guidelines for integrations, source of truth, and data cleansing protocols. 

If you don’t have one, or your organization’s strategy doesn’t cover all of these things, that’s okay. You have the opportunity to start the conversation and guide the process. But if you do have one, it should guide your decision-making throughout the rest of these questions.

 

2. Where did this data come from (and is it from the right place)?

Often, requests for new data in Pigment will come from teams used to working with a particular solution. This could be Salesforce, NetSuite, or any number of other platforms. However, these platforms aren’t always the source of truth. Take a close look at where the data lived before it reached the system in question – could that upstream system be a better source for your data sync? Is there a data lake, warehouse, or other official source of truth in place that you should connect to instead?

 

 

3. What is the full lifecycle of this data?

Of course, we’re not just looking at the data source to make sure it’s the right one. We also want to map the full flow of the data. Even if we’ve found the ideal source of truth, understanding where the data has originated and where it might have passed through will help us understand how fit for purpose it is (more on accuracy in a moment).

It’s also important to think about the post-Pigment journey this data will take. Will you be feeding it into a BI tool? Pushing it through the Google Sheets connector? Back into your data lake? It’s important to define that step as well so that you can determine the best way to push the data, and anything that needs to happen within Pigment to facilitate that.

 

4. How accurate and complete is this data?

Next, it’s important to understand how accurate and complete the dataset is. Inaccurate or incomplete data could have a massive impact on your models, which in turn can compromise confidence in the data or lead to decision-making based on false assumptions.

And the assessment doesn’t end with the sample you’ve pulled. It’s also important to understand if your data is consistent in its level of accuracy and completeness. Again, not accurately measuring this could mean you’re blindsided by inaccurate outputs further down the line.

If your data is inaccurate or incomplete, you may wish to employ some data cleansing methods pre-Pigment. You can also build Boards in Pigment that can surface anomalous or incomplete data, but these will only be as helpful as the resource you have available to resolve the flagged items.
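As a rough illustration of checking whether completeness is consistent over time, rather than only in one sample, here is a minimal Python sketch you could run on an extract before loading it. The field names (`month`, `account`, `amount`) and the sample rows are hypothetical and not part of Pigment; swap in whatever your source system exports.

```python
from collections import defaultdict


def completeness_by_period(rows, period_key, required_fields):
    """Return {period: share of rows in which every required field is populated}."""
    totals = defaultdict(int)
    complete = defaultdict(int)
    for row in rows:
        period = row[period_key]
        totals[period] += 1
        # A field counts as missing if it is absent, None, or an empty string.
        # A numeric zero is treated as a populated value.
        if all(row.get(field) not in (None, "") for field in required_fields):
            complete[period] += 1
    return {period: complete[period] / totals[period] for period in totals}


# Hypothetical extract: one row per account per month.
sample = [
    {"month": "2023-08", "account": "4000", "amount": 120.0},
    {"month": "2023-08", "account": "4010", "amount": None},
    {"month": "2023-09", "account": "4000", "amount": 95.5},
    {"month": "2023-09", "account": "", "amount": 40.0},
    {"month": "2023-09", "account": "4020", "amount": 10.0},
]

report = completeness_by_period(sample, "month", ["account", "amount"])
for period, share in sorted(report.items()):
    print(f"{period}: {share:.0%} of rows complete")
```

If the share varies a lot from one period to the next, that is a sign the source is not consistently complete, and it may be worth cleansing the data before it reaches Pigment.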

 

5. How big is the dataset?

Pigment can handle vast amounts of data, but that doesn’t mean you should just pull as much data as possible. The best practice for model efficiency is to bring in only the data you need, at the granularity you need (e.g. aggregated by month rather than by day). Even then, though, it’s important to consider the size of the dataset. The recommended model structure for a small pool (e.g. 200 cells) will be vastly different from that for a set with millions of cells.
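To make the granularity point concrete, here is a small Python sketch of rolling daily transactions up to month level before import. It assumes a hypothetical extract of `(date, account, amount)` tuples; the account codes and amounts are invented for illustration, and in practice this step might be done in your warehouse or ETL tool instead.

```python
from collections import defaultdict
from datetime import date


def aggregate_to_month(transactions):
    """Sum (day, account, amount) records into {(YYYY-MM, account): total}."""
    monthly = defaultdict(float)
    for day, account, amount in transactions:
        monthly[(day.strftime("%Y-%m"), account)] += amount
    return dict(monthly)


# Hypothetical daily transactions.
daily = [
    (date(2023, 9, 1), "4000", 100.0),
    (date(2023, 9, 15), "4000", 50.0),
    (date(2023, 9, 30), "4010", 25.0),
    (date(2023, 10, 2), "4000", 75.0),
]

monthly = aggregate_to_month(daily)
print(f"{len(daily)} daily rows reduced to {len(monthly)} monthly rows")
```

On a real ledger with many transactions per account per month, the reduction in volume is usually far larger than in this toy example.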

Here are some questions to help you assess the size of your dataset:

  • What is the number of cells? (Rows and/or columns can help, but cells will be the most helpful figure.)
  • Do I need calculations or operations on this data to run sequentially, concurrently, or both?
  • Are there any pre-calculated data points I can remove or include to help with efficiency?
  • How much is this data set expected to grow over time? Will it stay static, grow linearly, or potentially double in size every month?
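The first and last questions can be answered with some quick arithmetic. The sketch below is one way to do it in Python, assuming you know the row and column counts of your extract; the figures and the two growth parameters (`monthly_growth_rate`, `monthly_added_cells`) are illustrative assumptions, not Pigment settings.

```python
def cell_count(rows, columns):
    """Number of cells in a rectangular extract."""
    return rows * columns


def project_cells(current_cells, months, monthly_growth_rate=0.0, monthly_added_cells=0):
    """Project the cell count forward month by month.

    monthly_growth_rate: proportional growth per month (1.0 means doubling).
    monthly_added_cells: fixed number of cells added per month (linear growth).
    """
    cells = current_cells
    for _ in range(months):
        cells = cells * (1 + monthly_growth_rate) + monthly_added_cells
    return round(cells)


# Hypothetical extract: 50,000 rows by 12 columns.
today = cell_count(50_000, 12)

static = project_cells(today, months=12)
linear = project_cells(today, months=12, monthly_added_cells=50_000)
doubling = project_cells(today, months=12, monthly_growth_rate=1.0)

print(f"today:    {today:,}")
print(f"static:   {static:,}")
print(f"linear:   {linear:,}")
print(f"doubling: {doubling:,}")
```

The gap between the linear and doubling scenarios after a year shows why the expected growth pattern matters as much as today's size when choosing a model structure.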

 

6. How & when will I bring this data into Pigment?

There are many ways to bring data into Pigment, ranging from fully manual to fully automated (via native connectors and our API). The important thing is to understand how the push and pull of data might impact your processes, and therefore when it should occur. This includes any calculations you’re running, anyone who may be using the platform, and other integrations that may be active. For example, if you import a new set of figures into your model in the middle of the day, will it negatively impact your modelers? Or will it help them with accurate reporting for later on in the day? Pigment’s native connectors allow for scheduling, so you can ensure your data enters Pigment at a cadence that suits your working environment.

 

7. How will I test the impact of this data in a controlled way?

As discussed above, inaccurate or incomplete data could have a negative impact on your metrics and insights in Pigment. However, all data will have some kind of impact – otherwise, why are you bringing it in to begin with? So it’s important to test in a controlled and limited way. You may choose to leverage sandbox environments in your other platforms, test applications in Pigment, etc. This will help you identify and analyze the impact without creating hard-to-reverse changes in your live environment – the environment you and your leaders are using to make decisions and drive the business forward.
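One simple way to analyze the impact is to export the same key metrics from your live environment and from the test environment after the import, then compare them. The Python sketch below assumes both exports have been reduced to plain name-to-value dictionaries; the metric names, values, and the 5% tolerance are hypothetical choices for illustration.

```python
def compare_metrics(baseline, candidate, tolerance=0.05):
    """Flag metrics that changed by more than `tolerance`, or exist on one side only.

    Returns {name: (before, after, relative_change)}; relative_change is None
    when the metric is missing from one of the two sets.
    """
    flagged = {}
    for name in sorted(set(baseline) | set(candidate)):
        before = baseline.get(name)
        after = candidate.get(name)
        if before is None or after is None:
            flagged[name] = (before, after, None)
            continue
        if before == 0:
            # Avoid dividing by zero: any move away from zero counts as unbounded.
            change = 0.0 if after == 0 else float("inf")
        else:
            change = (after - before) / abs(before)
        if abs(change) > tolerance:
            flagged[name] = (before, after, change)
    return flagged


# Hypothetical metric exports: live environment vs. test environment after import.
baseline = {"revenue": 1_000_000.0, "headcount": 250, "opex": 400_000.0}
candidate = {"revenue": 1_020_000.0, "headcount": 250, "opex": 460_000.0, "churn": 0.03}

flagged = compare_metrics(baseline, candidate)
for name, (before, after, change) in flagged.items():
    detail = "present on one side only" if change is None else f"{change:+.1%}"
    print(f"{name}: {before} -> {after} ({detail})")
```

Anything flagged is a prompt to investigate rather than proof of a problem: a large change may be exactly the effect the new data was meant to have.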

 

Get started!

Once you know how big your dataset is, we’d love for you to submit a support ticket to let us know! We may have best practices to share based on the size of your dataset, or there may even be additional workspace configuration we can do to help with efficiency.


2 replies


Re:

5. How big is the dataset?

Pigment can handle vast amounts of data, but that doesn’t mean you should just pull as much data as possible. The best practice for model efficiency is to bring in only the data you need.

→ I suggest modifying the article: “(..) bring in only the data you need and on the required granularity.”
Since month is a common planning granularity, it might be sufficient to aggregate daily transactions to the month level, which makes a huge difference in volume.


@David Mandl You're absolutely right that the correct level of granularity will be different for each model. Glad to see you're already thinking about this for your own use case! :) 
I have added your suggestion to the article, thanks for your feedback! 
