Forecasting timeseries is a common problem in data science/machine learning. It asks, given a set of observations of the past, what the future will look like.
Some real world applications of timeseries forecasting include:
This API implements Prophet, the open-source timeseries forecasting library developed by Facebook's data science team. It's an additive regression model that combines a piecewise linear trend component with yearly and weekly seasonal components, as well as a user-provided list of holidays. It works particularly well on timeseries that are seasonal, and is robust to missing data and outliers.
Prophet is already available as a Python and R package, but with Metered, you can access a production-ready version of it via a simple API call.
The benefits of Metered over running the packages yourself include:
The code snippets in this guide are intended to be run interactively as you follow along. The snippets are also interrelated, so outputs are automatically propagated from snippet to snippet, as needed. This is an experimental format that we're eager to receive feedback on. Please send us a note with any thoughts you have.
For ease of use, the API keys in this guide are real keys. You can copy-and-paste them right into your code. The keys have a dynamic rate limit. To access a higher rate limit, create an account and provide your credit card information.
Below, we show how to accomplish a common forecasting flow using the Metered Prophet API. Recall that GraphQL allows you to create services by defining types and fields on those types, then providing functions for each field on each type. The Metered Prophet API is running a GraphQL service that looks like this:
All GraphQL API queries are made on a single endpoint, which only accepts POST requests:
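As a rough sketch of what such a request might look like from Python, the snippet below builds a POST request that carries a GraphQL query as a JSON body. The endpoint URL, the Authorization header scheme, and the helper names are placeholders assumed for illustration; they are not the documented Metered values.

```python
import json
import urllib.request

# Hypothetical endpoint, for illustration only; substitute the real one.
ENDPOINT = "https://example.invalid/graphql"


def build_graphql_request(query, variables=None, api_key=None, endpoint=ENDPOINT):
    """Build (but do not send) a POST request carrying a GraphQL query as JSON."""
    payload = {"query": query, "variables": variables or {}}
    headers = {"Content-Type": "application/json"}
    if api_key is not None:
        # Assumed auth scheme; check the API reference for the real header.
        headers["Authorization"] = "Bearer " + api_key
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def run_query(request):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping request construction separate from sending makes it easy to inspect the body before anything goes over the network.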
The first step is to load your raw data into memory.
Your raw data should be a csv file with two columns, ds and y, containing the date and numeric value respectively. The ds column should be formatted as YYYY-MM-DD for a date, or YYYY-MM-DD HH:MM:SS for a timestamp.
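Before uploading anything, it can help to check locally that a file matches this layout. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, and the strict "exactly two columns" check is an assumption rather than a documented API requirement.

```python
import csv
import io
from datetime import datetime

# The two accepted ds layouts: timestamp first, then plain date.
DATE_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d")


def parse_ds(value):
    """Parse a ds value as a timestamp or a date; raise ValueError otherwise."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError("unrecognized ds value: " + repr(value))


def load_rows(csv_text):
    """Return a list of (datetime, float) pairs from csv text with ds,y columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or set(reader.fieldnames) != {"ds", "y"}:
        raise ValueError("expected exactly two columns: ds and y")
    return [(parse_ds(row["ds"]), float(row["y"])) for row in reader]
```

For example, `load_rows("ds,y\n2020-01-01,10\n")` yields one row dated 1 January 2020 with value 10.0 (the sample values here are made up).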
As our case study in this guide, we would like to forecast the number of Wikimedia pageviews for observability purposes: if the number of pageviews is ever much lower than expected, this might indicate a software outage.
We were able to collect, through the Wikimedia API, three years' worth of Wikimedia data at a daily granularity.
This is what the first couple of rows of the csv look like:
to load the hosted csv file into memory.
As you can see, this metric has a weekly seasonality that makes Prophet a good fit.
The above process loads data for a single timeseries. In practice, you'll often want to forecast multiple timeseries in parallel (even a single metric often needs to be forecasted multiple times, for instance for different cities or countries). For instructions on how to do that, you can refer to the guide for the Generic Batch Job API. The instructions there will import the raw data from a csv file, perform some aggregation and cleaning, then save it in a sqlite database.
The next step is to use the trained model to forecast the data into the future. This can be done by querying
parameter, which is the number of units into the future you would like to forecast, and a
parameter, which denotes the unit. It returns a
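To make the two parameters concrete, the sketch below shows how a count of units plus a unit name determines the timestamps a forecast would cover. It runs locally and does not call the API; the names `periods` and `unit`, and the set of unit strings, are hypothetical and may not match the API's actual parameter names or accepted values.

```python
from datetime import datetime, timedelta

# Hypothetical unit names; the API's accepted values may differ.
UNIT_STEPS = {
    "hour": timedelta(hours=1),
    "day": timedelta(days=1),
    "week": timedelta(weeks=1),
}


def future_timestamps(last_observed, periods, unit):
    """Return `periods` timestamps after `last_observed`, one `unit` apart."""
    if periods < 0:
        raise ValueError("periods must be non-negative")
    step = UNIT_STEPS[unit]
    return [last_observed + step * i for i in range(1, periods + 1)]
```

With a last observation on 2020-01-01, three periods, and a daily unit, this covers 2020-01-02 through 2020-01-04.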
The final step is to retrieve summary statistics and the forecasted data, which can be done starting at the
fields on the
type. Recall that GraphQL returns only the leaves of the graph, and you have to specify the entire path down to any particular leaf that you want.
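Because a response mirrors the shape of the query, reading a value means walking the same path you requested. Here is a small sketch of that walk over an already-decoded JSON response. The field names `forecast` and `points` are hypothetical stand-ins, not the real schema; the top-level `data` key is standard GraphQL response structure.

```python
def get_leaf(response, path):
    """Follow `path` (a sequence of keys) down a nested response dict."""
    node = response
    for key in path:
        if not isinstance(node, dict) or key not in node:
            raise KeyError("missing field along path: " + str(key))
        node = node[key]
    return node


# Made-up response for illustration; real field names will differ.
sample_response = {
    "data": {
        "forecast": {
            "points": [{"ds": "2020-01-01", "yHatUpper": 12.0}],
        }
    }
}

points = get_leaf(sample_response, ["data", "forecast", "points"])
```

A field that was not requested in the query is simply absent from the response, so the helper raises KeyError instead of returning a default.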
What happens if you run Prophet and you don't like the results?
You can easily tune the model by changing up parameters like the number of changepoints in the piece-wise linear model, which determines how flexible the curve is. Refer to
for a full reference on the parameters of a Prophet model.
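To see why the number of changepoints controls flexibility, consider a continuous piecewise linear trend in which the slope is allowed to change at each changepoint. The sketch below evaluates such a trend; it illustrates the general idea under simplified assumptions and is not Prophet's actual implementation. All parameter names are chosen for illustration.

```python
def piecewise_linear_trend(t, k, m, changepoints, deltas):
    """Evaluate a continuous piecewise linear trend at time t.

    k and m are the base slope and offset. At each changepoint s the slope
    changes by delta, and the offset shifts by -s * delta so that the two
    line segments meet at s (no jump in the curve).
    """
    slope = k
    offset = m
    for s, delta in zip(changepoints, deltas):
        if t >= s:
            slope += delta
            offset -= s * delta
    return slope * t + offset
```

With no changepoints the trend is a single straight line. Each additional changepoint adds one more place where the slope can bend, so more changepoints (or larger allowed slope changes) let the curve follow the data more closely, at the risk of overfitting.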
When the observed number of Wikimedia pageviews is anomalously low compared to the forecast, we would like to alert our on-call engineers so they can check for a possible outage.
To do this, we can use the confidence intervals on the forecast (
yHatUpper) to create a set of thresholds. When the observed timeseries crosses the thresholds in either direction, we alert.
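The thresholding step can be sketched in a few lines. The function below compares each observed value against a lower and an upper bound and reports which points fall outside. The function and argument names are hypothetical; the upper bound corresponds to yHatUpper, and the lower bound is assumed to come from the matching lower interval field.

```python
def find_anomalies(observed, lower, upper):
    """Return the indices where the observed value falls outside [lower, upper]."""
    if not (len(observed) == len(lower) == len(upper)):
        raise ValueError("observed, lower, and upper must have the same length")
    return [
        i
        for i, (y, lo, hi) in enumerate(zip(observed, lower, upper))
        if y < lo or y > hi
    ]
```

If only drops matter for outage detection, the `y > hi` check can be removed so that unusually high traffic does not page anyone.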