季节性、假日效应和额外回归

为假日和特殊事件建模

如果你想对假期或其他重复发生的事件进行建模，你必须为它们创建一个dataframe 。它有两列（holiday和 ds），并为每一个假期的发生建立一行。它必须包括过去（从历史数据上看）和未来（从预测的角度看）的所有节日的发生。如果它们在未来不会重复出现，Prophet将对它们进行建模，然后不将它们包括在预测中。

你也可以包含列lower_window和upper_window，它们将假期扩展到[lower_window, upper_window]日期周围的日子。例如，如果你想在圣诞节之外加入平安夜，你就会加入lower_window=-1,upper_window=0。如果你想在感恩节之外使用黑色星期五，你就会包含lower_window=0,upper_window=1。你也可以包含一列prior_scale来分别设置每个节日的先期规模，如下所述。

在这里，我们创建了一个dataframe ，其中包括佩顿-曼宁的所有季后赛出场日期。

# Rlibrary(dplyr)playoffs <- data_frame(  holiday = 'playoff',  ds = as.Date(c('2008-01-13', '2009-01-03', '2010-01-16',                 '2010-01-24', '2010-02-07', '2011-01-08',                 '2013-01-12', '2014-01-12', '2014-01-19',                 '2014-02-02', '2015-01-11', '2016-01-17',                 '2016-01-24', '2016-02-07')),  lower_window = 0,  upper_window = 1)superbowls <- data_frame(  holiday = 'superbowl',  ds = as.Date(c('2010-02-07', '2014-02-02', '2016-02-07')),  lower_window = 0,  upper_window = 1)holidays <- bind_rows(playoffs, superbowls)

# Pythonplayoffs = pd.DataFrame({  'holiday': 'playoff',  'ds': pd.to_datetime(['2008-01-13', '2009-01-03', '2010-01-16',                        '2010-01-24', '2010-02-07', '2011-01-08',                        '2013-01-12', '2014-01-12', '2014-01-19',                        '2014-02-02', '2015-01-11', '2016-01-17',                        '2016-01-24', '2016-02-07']),  'lower_window': 0,  'upper_window': 1,})superbowls = pd.DataFrame({  'holiday': 'superbowl',  'ds': pd.to_datetime(['2010-02-07', '2014-02-02', '2016-02-07']),  'lower_window': 0,  'upper_window': 1,})holidays = pd.concat((playoffs, superbowls))

上面我们将超级碗的日子同时包含了季后赛和超级碗比赛。这意味着超级碗的效果将在季后赛效果的基础上有额外的加成。

一旦创建了这个表，假日效应就会被包含在预测中，通过用holidays参数传递它们。这里我们用快速开始中的Peyton Manning数据来做。

# Rm <- prophet(df, holidays = holidays)forecast <- predict(m, future)

# Pythonm = Prophet(holidays=holidays)forecast = m.fit(df).predict(future)

假日效应可以在 forecastdataframe中看到。

# Rforecast %>%  select(ds, playoff, superbowl) %>%  filter(abs(playoff + superbowl) > 0) %>%  tail(10)

# Pythonforecast[(forecast['playoff'] + forecast['superbowl']).abs() > 0][        ['ds', 'playoff', 'superbowl']][-10:]

Untitled

节日效应也会体现在成分图中，我们看到在季后赛出场前后的日子里，会出现一个高峰，其中超级碗的高峰特别大。

# Rprophet_plot_components(m, forecast)

# Pythonfig = m.plot_components(forecast)

png

可以使用plot_forecast_component函数(从Python中的fbprophet.plot中导入)绘制单个假期，比如plot_forecast_component(m, forecast, 'superbowl')只绘制超级碗部分。

内置的国家法定假日

你可以使用add_country_holidays方法(Python)或函数(R)来使用一个内置的特定国家假期集合。指定国家的名称，然后除了通过上面描述的holidays参数指定的任何假期之外，这个国家的主要假期也会被包含在内。

# Rm <- prophet(holidays = holidays)m <- add_country_holidays(m, country_name = 'US')m <- fit.prophet(m, df)

# Pythonm = Prophet(holidays=holidays)m.add_country_holidays(country_name='US')m.fit(df)

你可以通过查看模型的train_holiday_names(Python)或train.holiday.names(R)来查看哪些假期被包括在内。

# Rm$train.holiday.names [1] "playoff"                     "superbowl" [3] "New Year's Day"              "Martin Luther King, Jr. Day" [5] "Washington's Birthday"       "Memorial Day" [7] "Independence Day"            "Labor Day" [9] "Columbus Day"                "Veterans Day"[11] "Veterans Day (Observed)"     "Thanksgiving"[13] "Christmas Day"               "Independence Day (Observed)"[15] "Christmas Day (Observed)"    "New Year's Day (Observed)"

# Pythonm.train_holiday_names0                         playoff1                       superbowl2                  New Year's Day3     Martin Luther King, Jr. Day4           Washington's Birthday5                    Memorial Day6                Independence Day7                       Labor Day8                    Columbus Day9                    Veterans Day10                   Thanksgiving11                  Christmas Day12       Christmas Day (Observed)13        Veterans Day (Observed)14    Independence Day (Observed)15      New Year's Day (Observed)dtype: objec

每个国家的假期由Python中的holidays包提供。可用的国家列表，以及要使用的国家名称，可以在他们的网页上找到：https://github.com/dr-prodigy/python-holidays。除了这些国家之外，Prophet还包括这些国家的假期。巴西(BR)、印度尼西亚(ID)、印度(IN)、马来西亚(MY)、越南(VN)、泰国(TH)、菲律宾(PH)、巴基斯坦(PK)、孟加拉国(BD)、埃及(EG)、中国(CN)和俄罗斯(RU)、韩国(KR)、白俄罗斯(BY)和阿拉伯联合酋长国(AE)。

在Python中，大多数假期都是确定计算的，因此可用于任何日期范围；如果日期在该国支持的范围之外，就会发出警告。在R中，1995年到2044年的假期日期被计算出来，并以data-raw/generated_holidays.csv的形式存储在软件包中。如果需要更宽的日期范围，可以用这个脚本将该文件替换为不同的日期范围：https://github.com/facebook/prophet/blob/master/python/scripts/generate_holidays_file.py。

如上所述，国家层面的假期就会在组件图中显示出来。

# Rforecast <- predict(m, future)prophet_plot_components(m, forecast)

# Pythonforecast = m.predict(future)fig = m.plot_components(forecast)

png

季节性的傅立叶阶数

季节性是用部分傅里叶和来估计的。完整的细节见论文，维基百科上的这张图说明了部分傅里叶和如何近似于无周期信号。部分和中的项数（阶数）是决定季节性变化速度的参数。为了说明这一点，请考虑快速入门中的Peyton Manning数据。年季节性的默认傅里叶阶数是10，产生了这个拟合。

# Rm <- prophet(df)prophet:::plot_yearly(m)

# Pythonfrom fbprophet.plot import plot_yearlym = Prophet().fit(df)a = plot_yearly(m)

png

默认值通常是合适的，但当季节性需要适应较高频率的变化时，可以增加默认值，一般来说，季节性的平稳性较差。在实例化模型时，可以指定每个内置季节性的傅里叶阶数，这里将其增加到20。

# Rm <- prophet(df, yearly.seasonality = 20)prophet:::plot_yearly(m)

# Pythonfrom fbprophet.plot import plot_yearlym = Prophet(yearly_seasonality=20).fit(df)a = plot_yearly(m)

png

增加傅里叶项的数量，可以使季节性适应较快变化的周期，但也会导致过度拟合。N个傅里叶项对应2N个用于模拟周期的变量。

自定义季节性

如果时间序列长度超过两个周期，Prophet默认将适合每周和每年的季节性。它也会适应次日时间序列的日季节性。你可以使用 add_seasonality方法(Python)或函数(R)添加其他季节性因素(月度、季度、小时)。

这个函数的输入是一个名称，季节性的周期（以天为单位）和季节性的傅里叶阶数。作为参考，默认情况下，Prophet对每周季节性使用3的傅里叶阶，对每年季节性使用10的傅里叶阶。add_seasonality的一个可选输入是该季节性成分的先验尺度–这将在下面讨论。

举个例子，这里我们拟合快速开始中的佩顿-曼宁数据，但将周季节性替换为月季节性。这样月度季节性就会出现在成分图中。

# Rm <- prophet(weekly.seasonality=FALSE)m <- add_seasonality(m, name='monthly', period=30.5, fourier.order=5)m <- fit.prophet(m, df)forecast <- predict(m, future)prophet_plot_components(m, forecast)

# Pythonm = Prophet(weekly_seasonality=False)m.add_seasonality(name='monthly', period=30.5, fourier_order=5)forecast = m.fit(df).predict(future)fig = m.plot_components(forecast)

png

取决于其他因素的季节性因素

在某些情况下，季节性可能取决于其他因素，例如夏季的每周季节性模式与一年中其他时间不同，或者周末与平日的每日季节性模式不同。这些类型的季节性可以使用条件季节性来建模。

考虑快速开始中的佩顿-曼宁的例子。默认的周季节性假设周季节性的模式在全年都是一样的，但是我们希望周季节性的模式在on-season（每周日都有比赛）和off-season期间是不同的。我们可以使用条件季节性来构建单独的on-season和off-season周季节性。

首先，我们在dataframe 中添加一列boolean ，表示每个日期是在on-season还是 off-season。

# Ris_nfl_season <- function(ds) {  dates <- as.Date(ds)  month <- as.numeric(format(dates, '%m'))  return(month > 8 | month < 2)}df$on_season <- is_nfl_season(df$ds)df$off_season <- !is_nfl_season(df$ds)

# Pythondef is_nfl_season(ds):    date = pd.to_datetime(ds)    return (date.month > 8 or date.month < 2)df['on_season'] = df['ds'].apply(is_nfl_season)df['off_season'] = ~df['ds'].apply(is_nfl_season)

然后，我们禁用内置的每周季节性，用两个每周季节性代替，并将这些列指定为条件。这意味着季节性只适用于condition_name列为True的日期。我们还必须将该列添加到我们正在进行预测的futuredataframe 中。

# Rm <- prophet(weekly.seasonality=FALSE)m <- add_seasonality(m, name='weekly_on_season', period=7, fourier.order=3, condition.name='on_season')m <- add_seasonality(m, name='weekly_off_season', period=7, fourier.order=3, condition.name='off_season')m <- fit.prophet(m, df)future$on_season <- is_nfl_season(future$ds)future$off_season <- !is_nfl_season(future$ds)forecast <- predict(m, future)prophet_plot_components(m, forecast)

# Pythonm = Prophet(weekly_seasonality=False)m.add_seasonality(name='weekly_on_season', period=7, fourier_order=3, condition_name='on_season')m.add_seasonality(name='weekly_off_season', period=7, fourier_order=3, condition_name='off_season')future['on_season'] = future['ds'].apply(is_nfl_season)future['off_season'] = ~future['ds'].apply(is_nfl_season)forecast = m.fit(df).predict(future)fig = m.plot_components(forecast)

png

现在这两个季节性的因素都显示在上面的成分图中。我们可以看到，在on-season的时候，每周日都有比赛，周日和周一的比赛会有很大的增长，而在off-season则完全没有。

节假日和季节性的灵活性

如果你发现假期过度拟合，你可以使用参数holidays_prior_scale调整它们的前值比例来平滑它们。默认情况下，这个参数是10，它提供的正则化非常少。减少这个参数可以减弱假期的影响。

# Rm <- prophet(df, holidays = holidays, holidays.prior.scale = 0.05)forecast <- predict(m, future)forecast %>%  select(ds, playoff, superbowl) %>%  filter(abs(playoff + superbowl) > 0) %>%  tail(10)

# Pythonm = Prophet(holidays=holidays, holidays_prior_scale=0.05).fit(df)forecast = m.predict(future)forecast[(forecast['playoff'] + forecast['superbowl']).abs() > 0][    ['ds', 'playoff', 'superbowl']][-10:]

Untitled

与以前相比，假日效应的幅度有所降低，特别是对于观察次数最少的超级碗而言。有一个参数seasonality_prior_scale，它同样可以调整季节性模型对数据的适应程度。

通过在假期dataframe中加入一列prior_scale，可以为各个假期单独设置Prior scales。单个季节性的Prior scales可以作为add_seasonality的参数传递。例如，可以使用以下方法设置每周季节性的 prior scale。

# Rm <- prophet()m <- add_seasonality(  m, name='weekly', period=7, fourier.order=3, prior.scale=0.1)

# Pythonm = Prophet()m.add_seasonality(    name='weekly', period=7, fourier_order=3, prior_scale=0.1)

额外回归

可以使用add_regressor方法或函数将额外的回归器添加到模型的线性部分。拟合和预测dataframes中都需要有一列回归值。例如，我们可以在NFL赛季的周日添加一个附加效应。在成分图中，这个效应将显示在extra_regressors图中。

# Rnfl_sunday <- function(ds) {  dates <- as.Date(ds)  month <- as.numeric(format(dates, '%m'))  as.numeric((weekdays(dates) == "Sunday") & (month > 8 | month < 2))}df$nfl_sunday <- nfl_sunday(df$ds)m <- prophet()m <- add_regressor(m, 'nfl_sunday')m <- fit.prophet(m, df)future$nfl_sunday <- nfl_sunday(future$ds)forecast <- predict(m, future)prophet_plot_components(m, forecast)

# Pythondef nfl_sunday(ds):    date = pd.to_datetime(ds)    if date.weekday() == 6 and (date.month > 8 or date.month < 2):        return 1    else:        return 0df['nfl_sunday'] = df['ds'].apply(nfl_sunday)m = Prophet()m.add_regressor('nfl_sunday')m.fit(df)future['nfl_sunday'] = future['ds'].apply(nfl_sunday)forecast = m.predict(future)fig = m.plot_components(forecast)

png

NFL周日也可以通过创建一个过去和未来NFL周日的列表，使用上述 “假期”接口来处理。add_regressor函数为定义额外的线性回归器提供了一个更通用的接口，特别是不要求回归器是一个二进制指标。另一个时间序列也可以用作回归器，尽管它的未来值必须是已知的。

这个 jupyter notebook展示了一个在自行车使用量预测中使用天气因素作为额外回归因子的例子，并很好地说明了如何将其他时间序列作为额外回归因子。

The add_regressor function has optional arguments for specifying the prior scale (holiday prior scale is used by default) and whether or not the regressor is standardized - see the docstring with help(Prophet.add_regressor) in Python and ?add_regressor in R. Note that regressors must be added prior to model fitting. Prophet will also raise an error if the regressor is constant throughout the history, since there is nothing to fit from it.

add_regressor函数有可选的参数，用于指定 prior scale(默认使用假日先验尺度)和回归器是否标准化–参见Python中的help(Prophet.add_regressor) 和R中的?add_regressor`的。注意回归器必须在模型拟合前添加。如果回归因子在整个历史中是恒定的，Prophet也会提出一个错误，因为没有什么可以拟合。

额外的回归因子必须是历史和未来日期的已知值。因此，它必须是已知未来值的东西(如nfl_sunday)，或者是在其他地方单独预测过的东西。上面链接的jupyter notebook中使用的天气回归器就是一个很好的例子，它是一个额外的回归器，它有可以用于未来值的预测。也可以用另一个已经用时间序列模型预测过的时间序列作为回归器，比如Prophet。例如，如果将r(t)作为y(t)的回归器，则可以用Prophet预测r(t)，然后在预测y(t)时将该预测值作为未来值插入。围绕这个方法需要注意的是。除非r(t)比y(t)更容易预测，否则这种方法可能不会有用。这是因为对r(t)的预测错误会对y(t)的预测产生错误。在分层时间序列中，有一种情况是有用的，即有一个顶层预测，它的信噪比更高，因此更容易预测。它的预测可以包括在每个低级序列的预测中。

额外的回归因子被放在模型的线性成分中，所以基本模型是时间序列作为一个加法或乘法因子取决于额外的回归因子（乘法性见下一节）。