Overcoming Data Collection Lag
Collecting data for SensibleAI Forecast projects is not necessarily 1) instant or 2) equally timely across all of your data sources. Collection lag is the time between a given moment and when you receive the data for that moment. This article will help you understand and resolve data collection lag problems that may occur.
Collection Lag
Time series models are most accurate when your project accounts for when each piece of data was actually collected. After evaluating the sourcing and quality of your data, you may find that a particular source of data (such as your target dimension) is lagged.
Take, for example, the image below. The dates between 10/9/2021 and 10/17/2021 are laid out on a daily basis. The value column next to the dates has values through the 15th, but no data for the 16th or 17th. In this situation, we would say that there are two days of collection lag.
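The two-day lag in this example can be checked with a short calculation. The following is a minimal sketch in plain Python, not part of SensibleAI Forecast: the function name `collection_lag_days` is hypothetical, the numeric values are made up for illustration, and only the dates follow the example above.

```python
from datetime import date, timedelta


def collection_lag_days(series, as_of):
    """Return the number of days between `as_of` and the latest date
    that actually has a value (None means "not collected yet")."""
    observed = [day for day, value in series.items() if value is not None]
    if not observed:
        raise ValueError("series has no observed values")
    return (as_of - max(observed)).days


start = date(2021, 10, 9)
# Daily rows for 10/9 through 10/17. Values exist through 10/15 only;
# the numbers themselves are placeholders, not data from the article.
series = {
    start + timedelta(days=i): (100.0 + i if i <= 6 else None)
    for i in range(9)
}

lag = collection_lag_days(series, as_of=date(2021, 10, 17))
print(lag)  # 2
```

Here the latest observed date is 10/15 and the reference date is 10/17, so the lag is two days.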
SensibleAI Forecast autonomously handles three cases of data lag on a target-by-target basis. First, let’s understand Target Collection Lag and Configured Collection Lag:
- Target Collection Lag: The period of time between the given moment and the most recent data collected for a specific target.
- Configured Collection Lag: The designated lag by which the “majority” of the targets have had their data collected.
The three cases are:
- Target Collection Lag > Configured Collection Lag: The Target Collection Lag is greater than the Configured Collection Lag, and the gap between the two is filled in by interpolation techniques. In effect, SensibleAI Forecast estimates what those values would be.
- Target Collection Lag < Configured Collection Lag: The Target Collection Lag is less than the Configured Collection Lag, and the extra values that are ahead of the Configured Collection Lag are removed. In effect, SensibleAI Forecast discards these values to stay consistent.
- Target Collection Lag = Configured Collection Lag: The Target Collection Lag exactly matches the Configured Collection Lag. No special treatment is necessary.
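The three cases can be illustrated with a small sketch. This is an assumption-laden illustration, not SensibleAI Forecast's implementation: the article does not say which interpolation technique is used, so the sketch simply carries the last observed value forward as a stand-in, and the function name `align_to_configured_lag` is hypothetical.

```python
from datetime import date, timedelta


def align_to_configured_lag(series, as_of, configured_lag_days):
    """Align one target's daily series so that it ends exactly at
    `as_of - configured_lag_days` (the cutoff date)."""
    cutoff = as_of - timedelta(days=configured_lag_days)
    observed = {d: v for d, v in series.items() if v is not None}
    if not observed:
        raise ValueError("series has no observed values")
    last_seen = max(observed)

    # Target lag < configured lag: drop values that are past the cutoff.
    aligned = {d: v for d, v in observed.items() if d <= cutoff}

    # Target lag > configured lag: fill the gap up to the cutoff.
    # ASSUMPTION: carry-forward stands in for the unspecified interpolation.
    day = last_seen + timedelta(days=1)
    while day <= cutoff:
        aligned[day] = observed[last_seen]
        day += timedelta(days=1)

    # Target lag == configured lag: neither branch changes anything.
    return aligned


as_of = date(2021, 10, 17)
# Placeholder values for 10/9 through 10/15 (a target lag of two days).
target = {date(2021, 10, 9) + timedelta(days=i): float(i) for i in range(7)}

filled = align_to_configured_lag(target, as_of, configured_lag_days=1)
trimmed = align_to_configured_lag(target, as_of, configured_lag_days=3)
unchanged = align_to_configured_lag(target, as_of, configured_lag_days=2)
```

With a configured lag of one day, 10/16 is filled in; with three days, 10/15 is dropped; with two days, the series is returned as-is. The sketch only handles the trailing gap and does not fill gaps in the middle of a series.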
Resolving and Setting the Target Collection Lag
Setting your configured collection lag can be a tedious process. What happens when a configured collection lag is too high or too low?
- Configured Collection Lag too Large: When the configured collection lag is large, you can lose accuracy. The drop-off occurs because the features used in your model are lagged along with the data. The larger the lag, the less impactful the features will be, and the less accurate your modeling will be overall.
- Configured Collection Lag too Small: A smaller configured collection lag is great for accuracy, provided you limit imputation. Lowering the effective forecast time allows more relevant or timely features to be used in a model, increasing their influence. However, if the trade-off is a lot of imputation to fill missing values in your dataset, you will end up losing accuracy as a result.
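One way to reason about this trade-off is to count, for each candidate configured lag, how many daily points would need to be imputed and how many would be dropped. The sketch below is only a counting aid under stated assumptions: it is not how SensibleAI Forecast selects a lag, the function name `lag_tradeoff` is hypothetical, and the target names and lags are made-up examples.

```python
def lag_tradeoff(target_lags, candidate_lags):
    """For each candidate configured lag (in days), return a tuple of
    (configured_lag, points_to_impute, points_to_drop) across all targets."""
    rows = []
    for configured in candidate_lags:
        # Targets lagging more than the configured lag need filling.
        imputed = sum(max(0, lag - configured) for lag in target_lags.values())
        # Targets lagging less than the configured lag lose recent values.
        dropped = sum(max(0, configured - lag) for lag in target_lags.values())
        rows.append((configured, imputed, dropped))
    return rows


# Hypothetical target collection lags, in days.
target_lags = {"store_a": 1, "store_b": 2, "store_c": 2, "store_d": 5}

table = lag_tradeoff(target_lags, range(0, 6))
for configured, imputed, dropped in table:
    print(f"lag={configured}  imputed={imputed}  dropped={dropped}")
```

In this made-up example, a configured lag of 2 days needs 3 imputed points and drops 1, while a lag of 5 needs no imputation but drops 10 recent points. A small lag that matches the majority of targets keeps both numbers low.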
As with any SensibleAI Forecast project, the quality of the data you put into the model directly determines the quality of the insights generated. Generally, it is best to use data that is consistently filled and minimizes target lags, but in almost every use case there will be a lag between the collection of the data and the time it can be deployed into a SensibleAI Forecast model.