How Pigment handles sparsity in formulas

  • 14 April 2022
  • 0 replies
  • 176 views

Userlevel 2

There are times when your data might have sparsity or no data for certain items.  This article covers how Pigment formulas within metrics handle these values, which are referred to as NULL or Blank values.  It will be broken down by the different operators, or functions, and also by the different dimensionality of the metrics.

 

Table of Contents 

 

 

What is sparsity and why Pigment’s engine is sparse?

 

A sparse engine means that empty values are not filled by default with a 0 or a FALSE: your datasets might include blanks. Pigment’s engine is sparse, notably because it enables the computations to be much faster than in a non sparse engine. 

The information below is related to the number data type. It is important to know there is also sparsity in booleans. A boolean that has not been used, or is sparse, will be considered blank.  

 

It however creates some challenges: how do simple operators behave in a sparse engine? Mathematics didn’t define such behaviors and Pigment had to build its own heuristic.

The goal of this article is thus to explain the behavior of simple operators with sparse datasets, by taking into account that it won’t behave the same way if the datasets on both sides of the operators have the same dimensionality or not. There are two main different types of behavior: one for addition, subtraction and OR operators, and one for multiplication, division and AND operators. 

Formulas across Metrics with the same Dimensions

 

This section will discuss the behavior when the dimensionality for all the metrics is the same. In the examples below all three metrics have the dimensions of Product and Country.

  • Addition/subtraction/OR:

    • The resulted metric will have the same dimensions as the input metrics. 
    • In addition, subtraction, and OR formulas, when a blank value is combined with a numeric value, the result will always be the numeric value.  
    • In the example below, blank cells are highlighted using conditional formatting 
  • Multiplication/Division/AND:

    • The resulted metric will have the same dimensions as the input metrics. 
    • In multiplication, division, and AND formulas when a blank value is combined with a numeric value, the result will always be the blank result.
    • In the example below, blank cells are highlighted using conditional formatting 

Formulas with at least one common dimension

 

These examples will show how blank values are handled when there is a common dimension in the metrics but the dimensionality is not a complete match.  In the examples below, the first metric has both Country and Product, while the second metric only has Product.  Finally, the resulting metric has both Country and Product.

Note: This use case includes an operation between a metric and a scalar. A scalar is a number without any associated dimensionality.  It can be written within a formula or referenced by a metric without any dimensionality applied to it.  

 

  • Addition/subtraction/OR:

    • Let’s consider that the Metrics in the example below both share Product as a common dimension 
    • The resulted metric will have all the dimensions of the initial metric
    • The blank values of the metric with the most dimensions will remain blank after the formula is executed. If an item is blank on the dimensions of that metric, the resulted item after the operation will be blank on the same dimensions.  Look at the example below, all the blank values from Metric A are still blank in Metric B.

Note:  If this is not the desired behavior for your use case, the use of modifiers, such as ADD, can align the dimensionality of the metrics.  In that case, you would get the behavior outlined in Formulas Across Metrics with the Same Dimensions

  • Multiplication/Division/AND:

    • The Metrics in the example below both share Product as a common dimension 
    • The resulted metric will have all the dimensions of the initial metric
    • When a numeric value is multiplied by a blank value, a blank value will be the result.  In the example below, Metric B, Product 2 is blank, in Metric C all values associated with Item 2 are also blank.  

Formula across Metrics with no common dimensions

  • Addition/subtraction/OR:

    • The resulted metric will have all the dimensions of both metrics
    • If a given item is blank on the dimensions of a metric, the resulted item after the operation will be blank on the same Item across all dimensions.  In the example below, Metric A, Product 2 contains a blank value.  In the resulting Metric, all of item 2 is blank.  The same for Metric B, France, the item is blank and all resulting France items are also blank.

 

  • Multiplication/Division/AND:

    • The resulted metric will have all the dimensions of both metrics
    • The sparsity persists after the operation: if a given item is blank on the dimensions of a metric, the resulted item after the operation will be blank on the same dimensions  If an item is multiplied or divided by a blank value, the result will be a blank value. 

Note:  If this is not the desired behavior for your use case, the use of modifiers, such as ADD, can align the dimensionality of the metrics.  In that case, you would get the behavior outlined in Formulas Across Metrics with the Same Dimensions

 


0 replies

Be the first to reply!

Reply