# FORECAST_LINEAR function

• 0 replies
• 88 views

Userlevel 3 • Carole • Community Manager
• 2 replies

## Description

Forecasts a variable by fitting a straight line to the data. It is a model that relates a response variable Y to an input variable x by the equation

`Y=a+bx`

The quantities a (slope) and b (intercept) are parameters of the regression model. The fitting is done using the ordinary least squares method.

## Syntax

``FORECAST_LINEAR(Source_metric [, Ranking_dimension, Alternate_metric])``
• `Source_metric` is the data source on which the linear regression is computed, and must be a metric with data points as an expression of Integer or number type.  This metric must include the same dimension that is used in the `Ranking_Dimension` parameter.
• `Ranking_Dimension` is the dimension by which the regression is computed. If left undefined, the ranking dimension defaults to a  Calendar Dimension from `Source_metric`. If `Source_metric` is defined on multiple Calendar Dimensions, you must define which dimension to use.  If you want to use a dimension outside of time, you must define it here.
• `Alternate_metric` is an optional parameter that allows you to forecast `Source_metric` based on another metric. `Alternate_metric` must be another metric with the exact same dimensionality as `Source_metric`

The last 2 parameters, `Ranking_Dimension` and `Alternate_metric` are optional.

## Return type

All the time series cells will be filled by an integer or decimal value starting from the first empty cell until the last value of the `Ranking_dimension` (as it is sorted).

Note: If the regression is against an `Alternate_metric`, the forecast will only compute a value on non-empty X values.

## How the slope is calculated across dimensions

The quantities a (slope) and b (intercept) are parameters of the regression model. The fitting is done using the ordinary least squares method. The slope a and intercept b are computed on all the dimensions that are not designed as the `Ranking_Dimension`. This calculation will be performed on all items within the dimensions outside of the `Ranking_Dimension`.

It means that when performing a linear regression on time on a metric based on Month and Country, the resulting metric will have a different equation on all country items.

For example, let's say you have a metric with Month, Country, and Product, and you use Month as the `Ranking_Dimension` the linear regression would be performed for each item in the Country and Product dimensions.

Note: If `Source_Metric` has empty values, they won't be taken into account to compute a and b.

Note: If the `Source_Metric` has only one data point, the linear regression will return a constant function equal to the only available data point

## Examples

Metric `Sales` defined on 1 Dimension

Month Nov Dec Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
Sales 1 3 5 4 9 13 16 17

Forecasted Sales =`FORECAST_LINEAR('Sales','Month’)`

Metric `Sales` defined on 2 Dimensions

Month Nov Dec Jan Feb Mar Apr May Jun Nov Dec Jan Feb Mar Apr May Jun
Country FR FR FR FR FR FR FR FR US US US US US US US US
Sales 1 3 5 4 9 13 16 17 1 -1 -3 -5 -4 -9 -13 -16

Forecasted Sales =`FORECAST_LINEAR('Sales','Month’)` aggregated on Countries

Forecasted Sales =`FORECAST_LINEAR('Sales','Month’)` not aggregated on Countries

Metric `Cost of sales` against Metric `Sales`.

Sales 1 3 5 4 9 13 16 17 20 4 5 6 7 8 9 10
Cost of sales -2.5 -1.5 -0.5 -1 1.5 3.5 5 5.5 7

Forecasted Salary =`FORECAST_LINEAR('Cost of sales actuals’,'Month','Sales per month')`

https://www.sciencedirect.com/topics/mathematics/simple-linear-regression