Stock Market Predictions with LSTM in Python

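Contents

Why Do You Need Time Series Models?
Downloading the Data
Splitting Data into a Training Set and a Test Set
One-Step Ahead Prediction via Averaging
Introduction to LSTMs: Making Stock Movement Predictions Far into the Future
Visualizing the Predictions
Final Remarks
References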

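In this tutorial, you will see how you can use a time-series model known as Long Short-Term Memory. LSTM models are powerful, especially because they are designed to retain long-term memory, as you will see later. You'll tackle the following topics in this tutorial: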

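Understand why you would need to be able to predict stock price movements;
Download the data: you will use stock market data gathered from Alpha Vantage or Kaggle;
Split the data into training and test sets and perform some data normalization;
Go over and apply a few averaging techniques that can be used for one-step ahead predictions;
Motivate and briefly discuss an LSTM model, as it allows you to predict more than one step ahead;
Predict and visualize the future stock market with current data.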

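If you're not familiar with deep learning or neural networks, you should take a look at our Deep Learning in Python course. It covers the basics, as well as how to build a neural network on your own in Keras. Keras is a different package from TensorFlow, which is used in this tutorial, but the idea is the same.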

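Why Do You Need Time Series Models?

You would like to model stock prices correctly, so that as a stock buyer you can reasonably decide when to buy stocks and when to sell them to make a profit. This is where time series modelling comes in. You need good machine learning models that can look at the history of a sequence of data and correctly predict what the future elements of the sequence are going to be.

Warning: Stock market prices are highly unpredictable and volatile. This means that there are no consistent patterns in the data that allow you to model stock prices over time near-perfectly. Don't take it from me, take it from Princeton University economist Burton Malkiel, who argues in his 1973 book, "A Random Walk Down Wall Street," that if the market is truly efficient and a share price reflects all factors immediately as soon as they're made public, a blindfolded monkey throwing darts at a newspaper stock listing should do as well as any investment professional.

However, let's not go all the way to believing that this is just a stochastic or random process and that there is no hope for machine learning. Let's see if you can at least model the data, so that the predictions you make correlate with the actual behavior of the data. In other words, you don't need the exact stock values of the future, but the stock price movements (that is, whether the price is going to rise or fall in the near future).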

# Make sure that you have all these libraries available to run the code successfully
from pandas_datareader import data
import matplotlib.pyplot as plt
import pandas as pd
import datetime as dt
import urllib.request, json
import os
import numpy as np
import tensorflow as tf # This code has been tested with TensorFlow 1.6
from sklearn.preprocessing import MinMaxScaler

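Downloading the Data

You will be using data from the following sources:

Alpha Vantage. Before you start, however, you will first need an API key, which you can obtain for free here. After that, you can assign that key to the api_key variable.
Use the data from this page (the Kaggle dataset described further below). You will need to copy the Stocks folder in the zip file to your project home folder.

Stock prices come in several different flavors. They are,

Open: Opening stock price of the day
Close: Closing stock price of the day
High: Highest stock price of the day
Low: Lowest stock price of the day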

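Getting Data from Alpha Vantage

You will first load in the data from Alpha Vantage. Since you're going to make use of the American Airlines stock market prices to make your predictions, you set the ticker to "AAL". Additionally, you define a url_string, which will return a JSON file with all the stock market data for American Airlines within the last 20 years, and a file_to_save, which will be the file to which you save the data. You'll use the ticker variable that you defined beforehand to help name this file.

Next, you specify a condition: if you haven't already saved the data, you go ahead and grab the data from the URL that you set in url_string; you store the date, low, high, close and open values in a pandas DataFrame df and save it to file_to_save. However, if the data is already there, you just load it from the CSV.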
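
To make the parsing step concrete, here is a minimal sketch that turns a response of the shape described above into date-sorted rows. The payload is a made-up two-day sample (the prices are not real data) and parse_daily_series is a hypothetical helper name; only the 'Time Series (Daily)' key and the '1. open' to '4. close' field names are taken from the tutorial's code.

```python
import datetime as dt

# Made-up two-day sample shaped like the 'Time Series (Daily)' response.
sample_payload = {
    "Time Series (Daily)": {
        "2018-01-03": {"1. open": "52.86", "2. high": "52.90",
                       "3. low": "52.06", "4. close": "52.34"},
        "2018-01-02": {"1. open": "52.33", "2. high": "53.10",
                       "3. low": "51.90", "4. close": "52.99"},
    }
}

def parse_daily_series(payload):
    """Convert the nested JSON dict into a list of rows sorted by date."""
    rows = []
    for day, values in payload["Time Series (Daily)"].items():
        rows.append({
            "Date": dt.datetime.strptime(day, "%Y-%m-%d").date(),
            "Low": float(values["3. low"]),
            "High": float(values["2. high"]),
            "Close": float(values["4. close"]),
            "Open": float(values["1. open"]),
        })
    # Time series models depend on order, so sort oldest first.
    return sorted(rows, key=lambda row: row["Date"])

rows = parse_daily_series(sample_payload)
for row in rows:
    print(row)
```

A list of dicts like this can be passed straight to pd.DataFrame(rows), which avoids growing the DataFrame one row at a time.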

Getting Data from Kaggle

The data found on Kaggle is a collection of csv files that need no preprocessing, so you can load the data directly into a pandas DataFrame.

data_source = 'kaggle' # alphavantage or kaggle

if data_source == 'alphavantage':
    # ====================== Loading Data from Alpha Vantage ==================================

    api_key = '<your API key>'

    # American Airlines stock market prices
    ticker = "AAL"

    # JSON file with all the stock market data for AAL from the last 20 years
    url_string = "https://www.alphavantage.co/query?function=TIME_SERIES_DAILY&symbol=%s&outputsize=full&apikey=%s"%(ticker, api_key)

    # Save data to this file
    file_to_save = 'stock_market_data-%s.csv'%ticker

    # If you haven't already saved data,
    # go ahead and grab the data from the URL
    # and store date, low, high, close, open values in a pandas DataFrame
    if not os.path.exists(file_to_save):
        with urllib.request.urlopen(url_string) as url:
            data = json.loads(url.read().decode())
            # extract stock market data
            data = data['Time Series (Daily)']
            df = pd.DataFrame(columns=['Date', 'Low', 'High', 'Close', 'Open'])
            for k, v in data.items():
                date = dt.datetime.strptime(k, '%Y-%m-%d')
                data_row = [date.date(), float(v['3. low']), float(v['2. high']), float(v['4. close']), float(v['1. open'])]
                df.loc[-1, :] = data_row
                df.index = df.index + 1
        df.to_csv(file_to_save)
        print('Data saved to : %s'%file_to_save)

    # If the data is already there, just load it from the CSV
    else:
        print('File already exists. Loading data from CSV')
        df = pd.read_csv(file_to_save)

else:

    # ====================== Loading Data from Kaggle ==================================
    # You will be using HP's data. Feel free to experiment with other data.
    # But while doing so, be careful to have a large enough dataset and also pay attention to the data normalization
    df = pd.read_csv(os.path.join('Stocks', 'hpq.us.txt'), delimiter=',', usecols=['Date', 'Open', 'High', 'Low', 'Close'])
    print('Loaded data from the Kaggle repository')

Data saved to : stock_market_data-AAL.csv

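Data Exploration

Here you will print the data you collected into the DataFrame. You should also make sure that the data is sorted by date, because the order of the data is crucial in time series modelling.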

# Sort DataFrame by date
df = df.sort_values('Date')

# Double check the result
df.head()

	Date	Open	High	Low	Close
0	1970-01-02	0.30627	0.30627	0.30627	0.30627
1	1970-01-05	0.30627	0.31768	0.30627	0.31385
2	1970-01-06	0.31385	0.31385	0.30996	0.30996
3	1970-01-07	0.31385	0.31385	0.31385	0.31385
4	1970-01-08	0.31385	0.31768	0.31385	0.31385

Data Visualization

Now let's see what sort of data you have. You want data with various patterns occurring over time.

plt.figure(figsize = (18, 9))
plt.plot(range(df.shape[0]), (df['Low']+df['High'])/2.0)
plt.xticks(range(0, df.shape[0], 500), df['Date'].loc[::500], rotation=45)
plt.xlabel('Date', fontsize=18)
plt.ylabel('Mid Price', fontsize=18)
plt.show()

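This graph already says a lot. The specific reason I picked this company over others is that this graph is bursting with different behaviors of stock prices over time. This will make the learning more robust and give you a chance to test how good the predictions are in a variety of situations.

Another thing to notice is that the values close to 2017 are much higher and fluctuate more than the values close to the 1970s. Therefore you need to make sure that the data stays in similar value ranges throughout the time frame. You will take care of this during the data normalization phase.

Splitting Data into a Training Set and a Test Set

You will use the mid price, calculated by taking the average of the highest and lowest recorded prices on a day.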

# First calculate the mid prices from the highest and lowest
# .values returns the column as a NumPy array (as_matrix() was removed from newer pandas versions)
high_prices = df.loc[:, 'High'].values
low_prices = df.loc[:, 'Low'].values
mid_prices = (high_prices+low_prices)/2.0

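Now you can split the training data and test data. The training data will be the first 11,000 data points of the time series and the rest will be test data.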

train_data = mid_prices[:11000]
test_data = mid_prices[11000:]

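Now you need to define a scaler to normalize the data. MinMaxScaler scales all the data to be in the range of 0 to 1. You also reshape the training and test data to the shape [data_size, num_features].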
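
As a rough illustration of what min-max scaling does, the sketch below fits the minimum and maximum on the training values only and reuses them for the test values. It is written in plain Python with hypothetical helper names (fit_min_max, transform_min_max) and toy numbers; it is not the scikit-learn implementation.

```python
def fit_min_max(values):
    """Remember the minimum and maximum of the training values."""
    return min(values), max(values)

def transform_min_max(values, lo, hi):
    """Map values linearly so that lo becomes 0 and hi becomes 1."""
    span = (hi - lo) or 1.0  # guard against a constant series
    return [(v - lo) / span for v in values]

train = [10.0, 12.0, 14.0, 20.0]
test = [15.0, 25.0]

# Fit on the training data only, then apply the same mapping to both sets.
lo, hi = fit_min_max(train)
scaled_train = transform_min_max(train, lo, hi)
scaled_test = transform_min_max(test, lo, hi)

print(scaled_train)
print(scaled_test)
```

Because the mapping is fitted on the training data, test values outside the training range can land outside [0, 1], as the second test value does here.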

# Scale the data to be between 0 and 1
# When scaling remember! You normalize both test and train data with respect to training data
# Because you are not supposed to have access to test data
scaler = MinMaxScaler()
train_data = train_data.reshape(-1, 1)
test_data = test_data.reshape(-1, 1)

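Because of the observation you made earlier, that different time periods of the data have different value ranges, you normalize the data by splitting the full series into windows. If you don't do this, the earlier data will be close to 0 and will not add much value to the learning process. Here you choose a window size of 2500.

Tip: when choosing the window size, make sure it's not too small, because windowed normalization can introduce a break at the very end of each window, as each window is normalized independently.

In this example, 4 data points will be affected by this. But given that you have 11,000 data points, 4 points will not cause any issue.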
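
The effect of windowed normalization can be seen on a toy series. The following sketch, with hypothetical helper names and invented numbers, compares scaling the whole series at once with scaling each window independently.

```python
def min_max(values):
    """Scale a list of values to the range 0 to 1."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # guard against a constant window
    return [(v - lo) / span for v in values]

def windowed_min_max(values, window):
    """Scale each consecutive window independently of the others."""
    scaled = []
    for start in range(0, len(values), window):
        scaled.extend(min_max(values[start:start + window]))
    return scaled

# Toy series: small early prices followed by much larger recent prices.
prices = [1.0, 2.0, 3.0, 4.0, 100.0, 200.0, 300.0, 400.0]

global_scaled = min_max(prices)
window_scaled = windowed_min_max(prices, window=4)

print(global_scaled[:4])   # early values are squashed close to 0
print(window_scaled[:4])   # early values now span the full 0 to 1 range
print(window_scaled[3:5])  # the jump at the window boundary
```

With global scaling the first four values all fall below 0.01, whereas the windowed version spreads them over the full range. The price for this is the discontinuity at each window boundary, which is the break mentioned in the tip above.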

# Train the Scaler with training data and smooth data
smoothing_window_size = 2500
for di in range(0, 10000, smoothing_window_size):
    scaler.fit(train_data[di:di+smoothing_window_size, :])
    train_data[di:di+smoothing_window_size, :] = scaler.transform(train_data[di:di+smoothing_window_size, :])

# You normalize the last bit of remaining data
scaler.fit(train_data[di+smoothing_window_size:, :])
train_data[di+smoothing_window_size:, :] = scaler.transform(train_data[di+smoothing_window_size:, :])

Reshape the data back to the shape [data_size]

# Reshape both train and test data
train_data = train_data.reshape(-1)

# Normalize test data
test_data = scaler.transform(test_data).reshape(-1)

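You can now smooth the data using an exponential moving average. This helps you get rid of the inherent raggedness of stock price data and produces a smoother curve.

Note that you should only smooth the training data.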

# Now perform exponential moving average smoothing
# So the data will have a smoother curve than the original ragged data
EMA = 0.0
gamma = 0.1
for ti in range(11000):
  EMA = gamma*train_data[ti] + (1-gamma)*EMA
  train_data[ti] = EMA

# Used for visualization and test purposes
all_mid_data = np.concatenate([train_data, test_data], axis=0)

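One-Step Ahead Prediction via Averaging

Averaging mechanisms allow you to predict (often one time step ahead) by representing the future stock price as an average of the previously observed stock prices. Doing this for more than one time step can produce quite bad results. You will look at two averaging techniques below: standard averaging and the exponential moving average. You will evaluate the results produced by the two algorithms both qualitatively (visual inspection) and quantitatively (Mean Squared Error).

The Mean Squared Error (MSE) can be calculated by taking the squared error between the true value one step ahead and the predicted value, and averaging it over all the predictions.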
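
As a small worked example of this definition, the sketch below computes the MSE for three made-up predictions. The function name mean_squared_error and the numbers are illustrative only.

```python
def mean_squared_error(true_values, predictions):
    """Average of the squared differences between predictions and truth."""
    squared_errors = [(p - t) ** 2 for t, p in zip(true_values, predictions)]
    return sum(squared_errors) / len(squared_errors)

true_values = [0.50, 0.52, 0.51]
predictions = [0.49, 0.50, 0.53]

mse = mean_squared_error(true_values, predictions)
print('MSE: %.5f' % mse)
```

Note that the tutorial's code reports 0.5 times this mean, so the printed figures there are half of the plain MSE. This does not affect comparisons between methods as long as the same convention is used throughout.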

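Standard Average

You can understand the difficulty of this problem by first trying to model it as an average calculation problem. First you will try to predict the future stock market price (for example, $x_{t+1}$) as an average of the previously observed stock market prices within a fixed-size window (for example, $x_{t-N}, \ldots, x_t$, say the previous 100 days). Thereafter you will try a slightly fancier "exponential moving average" method and see how well that does. Then you will move on to the "holy grail" of time-series prediction: Long Short-Term Memory models.

First you will see how normal averaging works. That is, you say

$x_{t+1} = \frac{1}{N} \sum_{i=t-N}^{t} x_i$

In other words, you say the prediction at $t+1$ is the average value of all the stock prices you observed within a window of $t$ to $t-N$.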

window_size = 100
N = train_data.size
std_avg_predictions = []
std_avg_x = []
mse_errors = []

for pred_idx in range(window_size, N):

    # pred_idx is always a valid row of df inside this range, so read the date directly
    date = df.loc[pred_idx, 'Date']

    std_avg_predictions.append(np.mean(train_data[pred_idx-window_size:pred_idx]))
    mse_errors.append((std_avg_predictions[-1]-train_data[pred_idx])**2)
    std_avg_x.append(date)

print('MSE error for standard averaging: %.5f'%(0.5*np.mean(mse_errors)))

MSE error for standard averaging: 0.00418

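Take a look at the averaged results below. They follow the actual behavior of the stock quite closely. Next, you will look at a more accurate one-step prediction method.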


plt.figure(figsize = (18, 9))
plt.plot(range(df.shape[0]), all_mid_data, color='b', label='True')
plt.plot(range(window_size, N), std_avg_predictions, color='orange', label='Prediction')
#plt.xticks(range(0, df.shape[0], 50), df['Date'].loc[::50], rotation=45)
plt.xlabel('Date')
plt.ylabel('Mid Price')
plt.legend(fontsize=18)
plt.show()

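So what do the above graph (and the MSE) say?

It seems that this is not too bad a model for very short predictions (one day ahead). Given that stock prices don't change from 0 to 100 overnight, this behavior is sensible. Next, you will look at a fancier averaging technique known as the exponential moving average.

Exponential Moving Average

You might have seen some articles on the internet that use very complex models and predict almost the exact behavior of the stock market. But beware! These are just optical illusions and are not due to learning something useful. You will see below how you can replicate that behavior with a simple averaging method.

In the exponential moving average method, you calculate $x_{t+1}$ as,

$x_{t+1} = EMA_t = \gamma \times EMA_{t-1} + (1-\gamma) x_t$, where $EMA_0 = 0$ and $EMA$ is the exponential moving average value you maintain over time.

The above equation calculates the exponential moving average up to time step $t$ and uses it as the one-step ahead prediction for $t+1$. $\gamma$ decides how much of the previous average is kept, so the most recent value contributes a fraction $1-\gamma$ to the EMA. For example, $\gamma = 0.9$ puts only 10% of the current value into the EMA. Because you take only a small fraction of the most recent value, the average preserves much older values that you saw very early on. See how well this works when used to predict one step ahead below.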
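
The recursion is easy to trace by hand on a toy series. The following is a minimal sketch in plain Python, assuming the convention of the equation above ($\gamma$ weights the previous average); the function name and the constant input series are made up for illustration.

```python
def ema_one_step_predictions(series, gamma):
    """Return the EMA after each observation; each value is the prediction for the next step."""
    ema = 0.0  # EMA_0 = 0
    predictions = []
    for x in series:
        ema = gamma * ema + (1.0 - gamma) * x
        predictions.append(ema)
    return predictions

# A constant series makes the behavior easy to follow.
predictions = ema_one_step_predictions([1.0, 1.0, 1.0], gamma=0.5)
print(predictions)  # [0.5, 0.75, 0.875]
```

Starting from zero, the average closes half of the remaining gap to the true value at every step, which is why the first few predictions of an EMA lag behind the data.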

window_size = 100
N = train_data.size

run_avg_predictions = []
run_avg_x = []

mse_errors = []

running_mean = 0.0
run_avg_predictions.append(running_mean)

decay = 0.5

for pred_idx in range(1, N):

    running_mean = running_mean*decay + (1.0-decay)*train_data[pred_idx-1]
    run_avg_predictions.append(running_mean)
    mse_errors.append((run_avg_predictions[-1]-train_data[pred_idx])**2)
    # Record the date of the predicted point
    run_avg_x.append(df.loc[pred_idx, 'Date'])

print('MSE error for EMA averaging: %.5f'%(0.5*np.mean(mse_errors)))

MSE error for EMA averaging: 0.00003


plt.figure(figsize = (18, 9))
plt.plot(range(df.shape[0]), all_mid_data, color='b', label='True')
plt.plot(range(0, N), run_avg_predictions, color='orange', label='Prediction')
#plt.xticks(range(0, df.shape[0], 50), df['Date'].loc[::50], rotation=45)
plt.xlabel('Date')
plt.ylabel('Mid Price')
plt.legend(fontsize=18)
plt.show()

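If Exponential Moving Average is this Good, Why do You Need Better Models?

You see that it fits a near-perfect line that follows the True distribution (as confirmed by the very low MSE). Practically speaking, you can't do much with just the stock market value of the next day. Personally, what I'd like to know is not the exact stock market price for the next day, but whether stock market prices will go up or down in the next 30 days. Try to do this, and you will expose the limits of the EMA method.

You will now try to make predictions in windows (say you predict a window of the next 2 days, instead of just the next day). Then you will realize how wrong EMA can go. Here is an example: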

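Predict More Than One Step into the Future

To make things concrete, let's assume some values, say $x_t = 0.4$, $EMA = 0.5$ and $\gamma = 0.5$

Say you get the output with the following equation: $x_{t+1} = EMA_t = \gamma \times EMA_{t-1} + (1-\gamma) x_t$. So you have $x_{t+1} = 0.5 \times 0.5 + (1-0.5) \times 0.4 = 0.45$, and therefore $x_{t+1} = EMA_t = 0.45$
The next prediction $x_{t+2}$ then becomes $x_{t+2} = \gamma \times EMA_t + (1-\gamma) x_{t+1}$, which is $x_{t+2} = \gamma \times EMA_t + (1-\gamma) EMA_t = EMA_t$, or in this example, $x_{t+2} = x_{t+1} = 0.45$

So no matter how many steps you predict into the future, you'll keep getting the same answer for all the future prediction steps.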
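
The collapse to a constant can be checked numerically. This is a small sketch using the same example values (0.4, 0.5 and 0.5); ema_multi_step is a hypothetical helper that feeds each prediction back in as the next input.

```python
def ema_multi_step(last_value, ema_prev, gamma, n_steps):
    """Predict n_steps ahead by feeding each EMA prediction back in as the next input."""
    predictions = []
    x, ema = last_value, ema_prev
    for _ in range(n_steps):
        ema = gamma * ema + (1.0 - gamma) * x
        predictions.append(ema)
        x = ema  # the prediction replaces the unknown future observation
    return predictions

predictions = ema_multi_step(last_value=0.4, ema_prev=0.5, gamma=0.5, n_steps=5)
print(predictions)
```

After the first step the input and the running average are identical, so every later update returns the same value and the forecast becomes a flat line.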

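One solution that does output useful information is to look at momentum-based algorithms. They make predictions based on whether the recent past values were going up or going down (not the exact values). For example, they will say the next day's price is likely to be lower if the prices have been dropping for the past few days, which sounds reasonable. However, you will use a more complex model: an LSTM model.

These models have taken over the realm of time series prediction, because they are so good at modelling time series data. You will see whether there actually are patterns hidden in the data that you can exploit.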
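
As an aside, the momentum-based idea mentioned above can be sketched in a few lines. This is only one simple, hypothetical variant (the tutorial does not specify an algorithm): it looks at the net price change over the last few steps and predicts that the direction continues.

```python
def momentum_direction(prices, lookback=3):
    """Return +1 (up), -1 (down) or 0 (flat) from the net change over the last `lookback` steps."""
    if len(prices) <= lookback:
        raise ValueError('need more than `lookback` prices')
    change = prices[-1] - prices[-1 - lookback]
    if change > 0:
        return 1
    if change < 0:
        return -1
    return 0

falling = [5.0, 4.0, 3.0, 2.0]
rising = [1.0, 1.1, 1.2, 1.3]

print(momentum_direction(falling))  # -1: prices have been dropping
print(momentum_direction(rising))   # 1: prices have been rising
```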

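Introduction to LSTMs: Making Stock Movement Predictions Far into the Future

Long Short-Term Memory models are extremely powerful time-series models. They can predict an arbitrary number of steps into the future. An LSTM module (or cell) has 5 essential components which allow it to model both long-term and short-term data.

Cell state ($c_t$) - This represents the internal memory of the cell, which stores both short-term and long-term memories
Hidden state ($h_t$) - This is the output state information calculated w.r.t. the current input, the previous hidden state and the current cell input, which you eventually use to predict the future stock market prices. Additionally, the hidden state can decide to retrieve only the short-term memory, only the long-term memory, or both types of memory stored in the cell state to make the next prediction.
Input gate ($i_t$) - Decides how much information from the current input flows to the cell state
Forget gate ($f_t$) - Decides how much information from the previous cell state flows into the current cell state
Output gate ($o_t$) - Decides how much information from the current cell state flows into the hidden state, so that if needed the LSTM can pick only the long-term memories or both short-term and long-term memories

A cell is pictured below.

And the equations for calculating each of these entities are as follows.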

$i_t = \sigma(W_{ix} x_t + W_{ih} h_{t-1} + b_i)$
$\tilde{c}_t = \tanh(W_{cx} x_t + W_{ch} h_{t-1} + b_c)$
$f_t = \sigma(W_{fx} x_t + W_{fh} h_{t-1} + b_f)$
$c_t = f_t c_{t-1} + i_t \tilde{c}_t$
$o_t = \sigma(W_{ox} x_t + W_{oh} h_{t-1} + b_o)$
$h_t = o_t \tanh(c_t)$
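
To see how these quantities interact, here is a minimal sketch of one LSTM step with scalar inputs and states. It assumes the common formulation in which the candidate cell state uses tanh; the parameter values are arbitrary placeholders, not trained weights, and lstm_step is a hypothetical name rather than a TensorFlow function.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step for scalar input, hidden state and cell state."""
    i_t = sigmoid(p['W_ix'] * x_t + p['W_ih'] * h_prev + p['b_i'])        # input gate
    c_tilde = math.tanh(p['W_cx'] * x_t + p['W_ch'] * h_prev + p['b_c'])  # candidate cell state
    f_t = sigmoid(p['W_fx'] * x_t + p['W_fh'] * h_prev + p['b_f'])        # forget gate
    c_t = f_t * c_prev + i_t * c_tilde                                    # new cell state
    o_t = sigmoid(p['W_ox'] * x_t + p['W_oh'] * h_prev + p['b_o'])        # output gate
    h_t = o_t * math.tanh(c_t)                                            # new hidden state
    return h_t, c_t

# Arbitrary placeholder parameters, chosen only for illustration.
params = {'W_ix': 0.5, 'W_ih': 0.1, 'b_i': 0.0,
          'W_cx': 0.8, 'W_ch': 0.2, 'b_c': 0.0,
          'W_fx': 0.3, 'W_fh': 0.1, 'b_f': 1.0,
          'W_ox': 0.6, 'W_oh': 0.1, 'b_o': 0.0}

h, c = 0.0, 0.0
for x in [0.40, 0.42, 0.45]:
    h, c = lstm_step(x, h, c, params)
    print('h = %.4f, c = %.4f' % (h, c))
```

In a real layer the weights are matrices and the states are vectors, but the flow is the same: the gates decide how much of the old cell state to keep, how much new information to write, and how much of the cell state to expose as the hidden state.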

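For a better (more technical) understanding of LSTMs, you can refer to this article.

TensorFlow provides a nice sub API (called the RNN API) for implementing time series models. You will be using that for your implementations.

Data Generator

You are first going to implement a data generator to train your model. This data generator will have a method called .unroll_batches(...) which will output a set of num_unrollings batches of input data obtained sequentially, where a batch of data is of size [batch_size, 1]. Each batch of input data will then have a corresponding output batch of data.

For example, if num_unrollings = 3 and batch_size = 4, a set of unrolled batches might look like,

input data: $[x_0, x_{10}, x_{20}, x_{30}], [x_1, x_{11}, x_{21}, x_{31}], [x_2, x_{12}, x_{22}, x_{32}]$
output data: $[x_1, x_{11}, x_{21}, x_{31}], [x_2, x_{12}, x_{22}, x_{32}], [x_3, x_{13}, x_{23}, x_{33}]$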
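
The index pattern in this example can be reproduced with a short sketch. It assumes a series of 40 points so that each of the 4 batch positions starts 10 steps apart; unrolled_indices is a hypothetical helper that returns indices instead of prices and always uses the next point as the output.

```python
def unrolled_indices(series_length, batch_size, num_unrollings):
    """Return the input and output indices for one set of unrolled batches."""
    segment = series_length // batch_size
    # Each position in the batch starts at the beginning of its own segment.
    cursors = [b * segment for b in range(batch_size)]
    inputs, outputs = [], []
    for _ in range(num_unrollings):
        inputs.append(list(cursors))
        outputs.append([c + 1 for c in cursors])  # the next point is the label
        cursors = [c + 1 for c in cursors]        # every cursor advances by one step
    return inputs, outputs

inputs, outputs = unrolled_indices(series_length=40, batch_size=4, num_unrollings=3)
print(inputs)   # [[0, 10, 20, 30], [1, 11, 21, 31], [2, 12, 22, 32]]
print(outputs)  # [[1, 11, 21, 31], [2, 12, 22, 32], [3, 13, 23, 33]]
```

Spreading the batch positions across the series means that one optimization step sees several distant parts of the history at once, while each position still moves through time in order.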

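Data Augmentation

Also, to make your model robust, you will not always make the output for $x_t$ equal to $x_{t+1}$. Rather, you will randomly sample an output from the set $x_{t+1}, x_{t+2}, \ldots, x_{t+N}$, where $N$ is a small window size.

Here you are making the following assumption:

$x_{t+1}, x_{t+2}, \ldots, x_{t+N}$ will not be very far from each other

I personally think this is a reasonable assumption for stock movement predictions.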
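
The sampling rule described above can be sketched as follows. sample_label_index is a hypothetical helper that follows the prose and draws the label position from $t+1$ to $t+N$; note that the tutorial's generator draws an offset from 0 to 4 instead, so there the label can occasionally coincide with the input.

```python
import random

def sample_label_index(t, window, series_length, rng):
    """Pick a label position uniformly from t+1 .. t+window, clipped to the series end."""
    upper = min(t + window, series_length - 1)
    if upper <= t:
        raise ValueError('no future point available after index %d' % t)
    return rng.randint(t + 1, upper)  # randint includes both end points

rng = random.Random(0)  # fixed seed so the sketch is repeatable
picks = [sample_label_index(t=10, window=5, series_length=100, rng=rng) for _ in range(200)]

print(min(picks), max(picks))
```

Since neighboring prices are assumed to be close, any of these positions is a plausible target, and varying the target keeps the model from fitting the exact next value too tightly.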

Below, you can see how a batch of data is created.


class DataGeneratorSeq(object):

    def __init__(self, prices, batch_size, num_unroll):
        self._prices = prices
        self._prices_length = len(self._prices) - num_unroll
        self._batch_size = batch_size
        self._num_unroll = num_unroll
        self._segments = self._prices_length //self._batch_size
        self._cursor = [offset * self._segments for offset in range(self._batch_size)]

    def next_batch(self):

        batch_data = np.zeros((self._batch_size), dtype=np.float32)
        batch_labels = np.zeros((self._batch_size), dtype=np.float32)

        for b in range(self._batch_size):
            if self._cursor[b]+1>=self._prices_length:
                #self._cursor[b] = b * self._segments
                self._cursor[b] = np.random.randint(0, (b+1)*self._segments)

            batch_data[b] = self._prices[self._cursor[b]]
            # The label is drawn at a random offset of 0 to 4 steps ahead of the input
            batch_labels[b]= self._prices[self._cursor[b]+np.random.randint(0, 5)]

            self._cursor[b] = (self._cursor[b]+1)%self._prices_length

        return batch_data, batch_labels

    def unroll_batches(self):

        unroll_data, unroll_labels = [], []
        init_data, init_label = None, None
        for ui in range(self._num_unroll):

            data, labels = self.next_batch()    

            unroll_data.append(data)
            unroll_labels.append(labels)

        return unroll_data, unroll_labels

    def reset_indices(self):
        for b in range(self._batch_size):
            self._cursor[b] = np.random.randint(0, min((b+1)*self._segments, self._prices_length-1))



dg = DataGeneratorSeq(train_data, 5, 5)
u_data, u_labels = dg.unroll_batches()

for ui, (dat, lbl) in enumerate(zip(u_data, u_labels)):   
    print('\n\nUnrolled index %d'%ui)
    dat_ind = dat
    lbl_ind = lbl
    print('\tInputs: ', dat )
    print('\n\tOutput:', lbl)

Unrolled index 0
    Inputs:  [0.03143791 0.6904868  0.82829314 0.32585657 0.11600105]

    Output: [0.08698314 0.68685144 0.8329321  0.33355275 0.11785509]


Unrolled index 1
    Inputs:  [0.06067836 0.6890754  0.8325337  0.32857886 0.11785509]

    Output: [0.15261841 0.68685144 0.8325337  0.33421066 0.12106793]


Unrolled index 2
    Inputs:  [0.08698314 0.68685144 0.8329321  0.33078218 0.11946969]

    Output: [0.11098009 0.6848606  0.83387965 0.33421066 0.12106793]


Unrolled index 3
    Inputs:  [0.11098009 0.6858036  0.83294916 0.33219692 0.12106793]

    Output: [0.132895   0.6836884  0.83294916 0.33219692 0.12288672]


Unrolled index 4
    Inputs:  [0.132895   0.6848606  0.833369   0.33355275 0.12158521]

    Output: [0.15261841 0.6836884  0.83383167 0.33355275 0.12230608]

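Defining Hyperparameters

In this section, you'll define several hyperparameters. D is the dimensionality of the input. It's straightforward: you take the previous stock price as the input and predict the next one, so D should be 1.

Then you have num_unrollings, a hyperparameter related to backpropagation through time (BPTT), which is used to optimize the LSTM model. It denotes how many continuous time steps you consider for a single optimization step. You can think of it this way: instead of optimizing the model by looking at a single time step, you optimize the network by looking at num_unrollings time steps. Larger values generally help, at the cost of more memory and computation per step.

Then you have batch_size. The batch size is how many data samples you consider in a single time step.

Next you define num_nodes, which represents the number of hidden neurons in each cell. You can see that there are three layers of LSTMs in this example.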

D = 1 # Dimensionality of the data. Since your data is 1-D this would be 1
num_unrollings = 50 # Number of time steps you look into the future.
batch_size = 500 # Number of samples in a batch
num_nodes = [200, 200, 150] # Number of hidden nodes in each layer of the deep LSTM stack we're using
n_layers = len(num_nodes) # number of layers
dropout = 0.2 # dropout amount

tf.reset_default_graph() # This is important in case you run this multiple times

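Defining Inputs and Outputs

Next you define placeholders for the training inputs and labels. This is very straightforward, as you have a list of input placeholders, where each placeholder contains a single batch of data. The list has num_unrollings placeholders, which are all used at once for a single optimization step.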

# Input data.
train_inputs, train_outputs = [], []

# You unroll the input over time defining placeholders for each time step
for ui in range(num_unrollings):
    train_inputs.append(tf.placeholder(tf.float32, shape=[batch_size, D], name='train_inputs_%d'%ui))
    train_outputs.append(tf.placeholder(tf.float32, shape=[batch_size, 1], name = 'train_outputs_%d'%ui))

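Defining Parameters of the LSTM and Regression Layer

You will have three layers of LSTMs and a linear regression layer, denoted by w and b, which takes the output of the last Long Short-Term Memory cell and outputs the prediction for the next time step. You can use MultiRNNCell in TensorFlow to encapsulate the three LSTMCell objects you created. Additionally, you can wrap the LSTM cells with dropout, as this improves performance and reduces overfitting.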

lstm_cells = [
    tf.contrib.rnn.LSTMCell(num_units=num_nodes[li], state_is_tuple=True, initializer= tf.contrib.layers.xavier_initializer()
                           )
 for li in range(n_layers)]

drop_lstm_cells = [tf.contrib.rnn.DropoutWrapper(
    lstm, input_keep_prob=1.0, output_keep_prob=1.0-dropout, state_keep_prob=1.0-dropout
) for lstm in lstm_cells]
drop_multi_cell = tf.contrib.rnn.MultiRNNCell(drop_lstm_cells)
multi_cell = tf.contrib.rnn.MultiRNNCell(lstm_cells)

w = tf.get_variable('w', shape=[num_nodes[-1], 1], initializer=tf.contrib.layers.xavier_initializer())
b = tf.get_variable('b', initializer=tf.random_uniform([1], -0.1, 0.1))

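Calculating LSTM Output and Feeding It to the Regression Layer to Get the Final Prediction

In this section, you first create TensorFlow variables (c and h) that will hold the cell state and the hidden state of the Long Short-Term Memory cell. Then you transform the list of train_inputs to have the shape [num_unrollings, batch_size, D], which is needed for calculating the outputs with the tf.nn.dynamic_rnn function. You then calculate the LSTM outputs with tf.nn.dynamic_rnn and split the output back into a list of num_unrollings tensors, which are used to compute the loss between the predictions and the true stock prices.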

# Create cell state and hidden state variables to maintain the state of the LSTM
c, h = [], []
initial_state = []
for li in range(n_layers):
  c.append(tf.Variable(tf.zeros([batch_size, num_nodes[li]]), trainable=False))
  h.append(tf.Variable(tf.zeros([batch_size, num_nodes[li]]), trainable=False))
  initial_state.append(tf.contrib.rnn.LSTMStateTuple(c[li], h[li]))

# Do several tensor transformations, because the function dynamic_rnn requires the input to be of
# a specific format. Read more at: https://www.tensorflow.org/api_docs/python/tf/nn/dynamic_rnn
all_inputs = tf.concat([tf.expand_dims(t, 0) for t in train_inputs], axis=0)

# all_outputs is [seq_length, batch_size, num_nodes]
all_lstm_outputs, state = tf.nn.dynamic_rnn(
    drop_multi_cell, all_inputs, initial_state=tuple(initial_state), time_major = True, dtype=tf.float32)

all_lstm_outputs = tf.reshape(all_lstm_outputs, [batch_size*num_unrollings, num_nodes[-1]])

all_outputs = tf.nn.xw_plus_b(all_lstm_outputs, w, b)

split_outputs = tf.split(all_outputs, num_unrollings, axis=0)

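Loss Calculation and Optimizer

Now, you'll calculate the loss. However, you should note that there is a unique characteristic when calculating the loss. For each batch of predictions and true outputs, you calculate the mean squared error. You then sum (not average) all these mean squared losses together. Finally, you define the optimizer you're going to use to optimize the neural network. In this case, you can use Adam, which is a widely used and well-performing optimizer.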
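
The "mean per batch, sum over steps" rule can be made concrete with plain numbers. The sketch below mirrors the structure of the loss, including the factor of 0.5 used in the tutorial's code; unrolled_loss is a hypothetical helper and the values are invented.

```python
def unrolled_loss(predictions, targets):
    """Sum over unrolled steps of the batch-mean of 0.5 * squared error."""
    total = 0.0
    for step_preds, step_targets in zip(predictions, targets):
        step_errors = [0.5 * (p - t) ** 2 for p, t in zip(step_preds, step_targets)]
        total += sum(step_errors) / len(step_errors)  # mean within the batch
    return total  # summed, not averaged, across the unrolled steps

# Two unrolled steps, each with a batch of two samples.
predictions = [[0.5, 0.7], [0.6, 0.8]]
targets = [[0.4, 0.7], [0.6, 0.6]]

loss = unrolled_loss(predictions, targets)
print('%.4f' % loss)  # 0.0125
```

Because the per-step means are summed, the loss grows with num_unrollings, which is worth keeping in mind when comparing training losses across different settings.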

# When calculating the loss you need to be careful about the exact form, because you calculate
# loss of all the unrolled steps at the same time
# Therefore, take the mean error of each batch and get the sum of that over all the unrolled steps

print('Defining training Loss')
loss = 0.0
with tf.control_dependencies([tf.assign(c[li], state[li][0]) for li in range(n_layers)]+
                             [tf.assign(h[li], state[li][1]) for li in range(n_layers)]):
  for ui in range(num_unrollings):
    loss += tf.reduce_mean(0.5*(split_outputs[ui]-train_outputs[ui])**2)

print('Learning rate decay operations')
global_step = tf.Variable(0, trainable=False)
inc_gstep = tf.assign(global_step, global_step + 1)
tf_learning_rate = tf.placeholder(shape=None, dtype=tf.float32)
tf_min_learning_rate = tf.placeholder(shape=None, dtype=tf.float32)

learning_rate = tf.maximum(
    tf.train.exponential_decay(tf_learning_rate, global_step, decay_steps=1, decay_rate=0.5, staircase=True), tf_min_learning_rate)

# Optimizer.
print('TF Optimization operations')
optimizer = tf.train.AdamOptimizer(learning_rate)
gradients, v = zip(*optimizer.compute_gradients(loss))
gradients, _ = tf.clip_by_global_norm(gradients, 5.0)
optimizer = optimizer.apply_gradients(
    zip(gradients, v))

print('\tAll done')

Defining training Loss
Learning rate decay operations
TF Optimization operations
    All done

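Prediction Related Calculations

Here you define the prediction-related TensorFlow operations. First, define a placeholder for feeding in the input (sample_inputs); then, similar to the training stage, you define state variables for prediction (sample_c and sample_h). Finally, you calculate the prediction with the tf.nn.dynamic_rnn function and then send the output through the regression layer (w and b). You should also define the reset_sample_states operation, which resets the cell state and the hidden state. You should execute this operation at the start, every time you make a sequence of predictions.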

print('Defining prediction related TF functions')

sample_inputs = tf.placeholder(tf.float32, shape=[1, D])

# Maintaining LSTM state for prediction stage
sample_c, sample_h, initial_sample_state = [], [], []
for li in range(n_layers):
  sample_c.append(tf.Variable(tf.zeros([1, num_nodes[li]]), trainable=False))
  sample_h.append(tf.Variable(tf.zeros([1, num_nodes[li]]), trainable=False))
  initial_sample_state.append(tf.contrib.rnn.LSTMStateTuple(sample_c[li], sample_h[li]))

reset_sample_states = tf.group(*[tf.assign(sample_c[li], tf.zeros([1, num_nodes[li]])) for li in range(n_layers)], *[tf.assign(sample_h[li], tf.zeros([1, num_nodes[li]])) for li in range(n_layers)])

sample_outputs, sample_state = tf.nn.dynamic_rnn(multi_cell, tf.expand_dims(sample_inputs, 0), initial_state=tuple(initial_sample_state), time_major = True, dtype=tf.float32)

with tf.control_dependencies([tf.assign(sample_c[li], sample_state[li][0]) for li in range(n_layers)]+
                              [tf.assign(sample_h[li], sample_state[li][1]) for li in range(n_layers)]):  
  sample_prediction = tf.nn.xw_plus_b(tf.reshape(sample_outputs, [1, -1]), w, b)

print('\tAll done')

Defining prediction related TF functions
    All done

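Running the LSTM

Here you will train and predict stock price movements for several epochs and see whether the predictions get better or worse over time. You follow this procedure.

Define a set of starting points (test_points_seq) on the time series to evaluate the model on
For each epoch
    For the full sequence length of the training data
        Unroll a set of num_unrollings batches
        Train the neural network with the unrolled batches
    Calculate the average training loss
    For each starting point in the test set
        Update the LSTM state by iterating through the previous num_unrollings data points found before the test point
        Make predictions for n_predict_once steps continuously, using the previous prediction as the current input
        Calculate the MSE loss between the n_predict_once points predicted and the true stock prices at those time stamps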
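
The prediction part of this procedure (warm up the state on recent history, then feed each prediction back in as the next input) can be sketched without TensorFlow. In the sketch below, toy_model is a hypothetical stand-in for the trained network: it simply continues the last change with damping, and its "state" is just the previous input.

```python
def rollout(predict_next, history, n_predict_once):
    """Warm up the state on `history`, then predict n_predict_once steps autoregressively."""
    state = None
    # Warm-up: run through the recent past only to update the state.
    for price in history[:-1]:
        _, state = predict_next(price, state)
    predictions = []
    current = history[-1]
    for _ in range(n_predict_once):
        current, state = predict_next(current, state)  # the prediction becomes the next input
        predictions.append(current)
    return predictions

def toy_model(price, state):
    """Stand-in model: continue half of the last observed change."""
    previous = price if state is None else state
    return price + 0.5 * (price - previous), price

predictions = rollout(toy_model, history=[1.0, 2.0, 3.0], n_predict_once=3)
print(predictions)  # [3.5, 3.75, 3.875]
```

The state has to be reset before each new starting point, otherwise information from one test window would carry over into the next.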

epochs = 30
valid_summary = 1 # Interval you make test predictions

n_predict_once = 50 # Number of steps you continuously predict for

train_seq_length = train_data.size # Full length of the training data

train_mse_ot = [] # Accumulate Train losses
test_mse_ot = [] # Accumulate Test loss
predictions_over_time = [] # Accumulate predictions

session = tf.InteractiveSession()

tf.global_variables_initializer().run()

# Used for decaying learning rate
loss_nondecrease_count = 0
loss_nondecrease_threshold = 2 # If the test error hasn't decreased in this many steps, decrease learning rate

print('Initialized')
average_loss = 0

# Define data generator
data_gen = DataGeneratorSeq(train_data, batch_size, num_unrollings)

x_axis_seq = []

# Points you start your test predictions from
test_points_seq = np.arange(11000, 12000, 50).tolist()

for ep in range(epochs):       

    # ========================= Training =====================================
    for step in range(train_seq_length//batch_size):

        u_data, u_labels = data_gen.unroll_batches()

        feed_dict = {}
        for ui, (dat, lbl) in enumerate(zip(u_data, u_labels)):            
            feed_dict[train_inputs[ui]] = dat.reshape(-1, 1)
            feed_dict[train_outputs[ui]] = lbl.reshape(-1, 1)

        feed_dict.update({tf_learning_rate: 0.0001, tf_min_learning_rate:0.000001})

        _, l = session.run([optimizer, loss], feed_dict=feed_dict)

        average_loss += l

    # ============================ Validation ==============================
    if (ep+1) % valid_summary == 0:

      average_loss = average_loss/(valid_summary*(train_seq_length//batch_size))

      # The average loss
      if (ep+1)%valid_summary==0:
        print('Average loss at step %d: %f' % (ep+1, average_loss))

      train_mse_ot.append(average_loss)

      average_loss = 0 # reset loss

      predictions_seq = []

      mse_test_loss_seq = []

      # ===================== Updating State and Making Predictions ========================
      for w_i in test_points_seq:
        mse_test_loss = 0.0
        our_predictions = []

        if (ep+1)-valid_summary==0:
          # Only calculate x_axis values in the first validation epoch
          x_axis=[]

        # Feed in the recent past behavior of stock prices
        # to make predictions from that point onwards
        for tr_i in range(w_i-num_unrollings+1, w_i-1):
          current_price = all_mid_data[tr_i]
          feed_dict[sample_inputs] = np.array(current_price).reshape(1, 1)    
          _ = session.run(sample_prediction, feed_dict=feed_dict)

        feed_dict = {}

        current_price = all_mid_data[w_i-1]

        feed_dict[sample_inputs] = np.array(current_price).reshape(1, 1)

        # Make predictions for this many steps
        # Each prediction uses the previous prediction as its current input
        for pred_i in range(n_predict_once):

          pred = session.run(sample_prediction, feed_dict=feed_dict)

          # pred has shape [1, 1]; item() extracts the Python scalar (np.asscalar is deprecated)
          our_predictions.append(pred.item())

          feed_dict[sample_inputs] = np.asarray(pred).reshape(-1, 1)

          if (ep+1)-valid_summary==0:
            # Only calculate x_axis values in the first validation epoch
            x_axis.append(w_i+pred_i)

          mse_test_loss += 0.5*(pred-all_mid_data[w_i+pred_i])**2

        session.run(reset_sample_states)

        predictions_seq.append(np.array(our_predictions))

        mse_test_loss /= n_predict_once
        mse_test_loss_seq.append(mse_test_loss)

        if (ep+1)-valid_summary==0:
          x_axis_seq.append(x_axis)

      current_test_mse = np.mean(mse_test_loss_seq)

      # Learning rate decay logic
      if len(test_mse_ot)>0 and current_test_mse > min(test_mse_ot):
          loss_nondecrease_count += 1
      else:
          loss_nondecrease_count = 0

      if loss_nondecrease_count > loss_nondecrease_threshold :
            session.run(inc_gstep)
            loss_nondecrease_count = 0
            print('\tDecreasing learning rate by 0.5')

      test_mse_ot.append(current_test_mse)
      print('\tTest MSE: %.5f'%np.mean(mse_test_loss_seq))
      predictions_over_time.append(predictions_seq)
      print('\tFinished Predictions')

Initialized
Average loss at step 1: 1.703350
    Test MSE: 0.00318
    Finished Predictions
  ...
  ...
  ...
Average loss at step 30: 0.033753
    Test MSE: 0.00243
    Finished Predictions

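Visualizing the Predictions

You can see how the MSE loss goes down with the amount of training. This is a good sign that the model is learning something useful. To quantify your findings, you can compare the network's MSE loss to the MSE loss you obtained when doing the standard averaging (0.004). You can see that the LSTM is doing better than the standard averaging. And you know that standard averaging (though not perfect) followed the true stock price movements reasonably well.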

best_prediction_epoch = 28 # replace this with the epoch that you got the best results when running the plotting code

plt.figure(figsize = (18, 18))
plt.subplot(2, 1, 1)
plt.plot(range(df.shape[0]), all_mid_data, color='b')

# Plotting how the predictions change over time
# Plot older predictions with low alpha and newer predictions with high alpha
start_alpha = 0.25
alpha  = np.arange(start_alpha, 1.1, (1.0-start_alpha)/len(predictions_over_time[::3]))
for p_i, p in enumerate(predictions_over_time[::3]):
    for xval, yval in zip(x_axis_seq, p):
        plt.plot(xval, yval, color='r', alpha=alpha[p_i])

plt.title('Evolution of Test Predictions Over Time', fontsize=18)
plt.xlabel('Date', fontsize=18)
plt.ylabel('Mid Price', fontsize=18)
plt.xlim(11000, 12500)

plt.subplot(2, 1, 2)

# Predicting the best test prediction you got
plt.plot(range(df.shape[0]), all_mid_data, color='b')
for xval, yval in zip(x_axis_seq, predictions_over_time[best_prediction_epoch]):
    plt.plot(xval, yval, color='r')

plt.title('Best Test Predictions Over Time', fontsize=18)
plt.xlabel('Date', fontsize=18)
plt.ylabel('Mid Price', fontsize=18)
plt.xlim(11000, 12500)
plt.show()

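Though not perfect, LSTMs seem to be able to predict stock price behavior correctly most of the time. Note that you are making predictions roughly in the range of 0 to 1.0 (that is, not the true stock prices). This is okay, because you're predicting the stock price movement, not the prices themselves.

Final Remarks

I hope you found this tutorial useful. I should mention that this was a rewarding experience for me. In this tutorial, I learned how difficult it can be to devise a model that is able to correctly predict stock price movements. You started with a motivation for why you need to model stock prices. This was followed by an explanation and code for downloading the data. Then you looked at two averaging techniques that allow you to make predictions one step into the future. You next saw that these methods are futile when you need to predict more than one step into the future. Thereafter you discussed how you can use LSTMs to make predictions many steps into the future. Finally you visualized the results and saw that your model (though not perfect) is quite good at correctly predicting stock price movements.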

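Here, I'm stating several takeaways of this tutorial.

Stock price/movement prediction is an extremely difficult task. Personally I don't think any of the stock prediction models out there should be taken for granted and blindly relied upon. However, models might be able to predict stock price movement correctly most of the time, but not always.

Do not be fooled by articles out there that show prediction curves that perfectly overlap the true stock prices. This can be replicated with a simple averaging technique and in practice it's useless. A more sensible thing to do is to predict the stock price movements.

The results you obtain are extremely sensitive to the model's hyperparameters. So a very good thing to do would be to run some hyperparameter optimization technique (for example, grid search / random search) on the hyperparameters. Below I list some of the most critical hyperparameters

The learning rate of the optimizer
The number of layers and the number of hidden units in each layer
The optimizer. I found Adam to perform the best
The type of the model. You can try GRU / standard LSTM / LSTM with peepholes and evaluate the performance difference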
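
As a rough sketch of what a random search over such hyperparameters could look like, the code below samples configurations from a small search space and keeps the best one. Everything here is hypothetical: the search space values are arbitrary, and toy_objective stands in for "train the model with this configuration and return its validation MSE".

```python
import random

def random_search(objective, space, n_trials, seed=0):
    """Sample n_trials random configurations and return the one with the lowest score."""
    rng = random.Random(seed)
    best_config, best_score = None, float('inf')
    for _ in range(n_trials):
        config = {name: rng.choice(options) for name, options in space.items()}
        score = objective(config)
        if score < best_score:
            best_config, best_score = config, score
    return best_config, best_score

space = {
    'learning_rate': [1e-2, 1e-3, 1e-4],
    'num_layers': [1, 2, 3],
    'hidden_units': [50, 100, 200],
}

def toy_objective(config):
    # Stand-in for a real training run; lower is better.
    return (abs(config['learning_rate'] - 1e-3)
            + abs(config['num_layers'] - 2)
            + abs(config['hidden_units'] - 100) / 100.0)

best_config, best_score = random_search(toy_objective, space, n_trials=50)
print(best_config, best_score)
```

In practice each call to the objective is a full training run, so the number of trials is usually limited by the compute budget rather than by the size of the search space.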

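In this tutorial you did something faulty (due to the small size of the data)! That is, you used the test loss to decay the learning rate. This indirectly leaks information about the test set into the training procedure. A better way of handling this is to have a separate validation set (apart from the test set) and decay the learning rate with respect to the performance on the validation set.

If you would like to get in touch with me, you can drop me an e-mail at sothh@gmail.com or connect with me via LinkedIn.

References

I referred to this repository to get an understanding of how to use LSTMs for stock predictions. But the details can be vastly different from the implementation found in the reference.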
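
A separate validation set for a time series is usually taken as the block between the training data and the test data, so that no future information flows backwards. The following sketch shows one way to do such a split; chronological_split is a hypothetical helper and the sizes are arbitrary.

```python
def chronological_split(series, n_valid, n_test):
    """Split a time-ordered series into train, validation and test parts without shuffling."""
    n_train = len(series) - n_valid - n_test
    if n_train <= 0:
        raise ValueError('series too short for the requested split')
    train = series[:n_train]
    valid = series[n_train:n_train + n_valid]
    test = series[n_train + n_valid:]
    return train, valid, test

series = list(range(100))  # stand-in for time-ordered mid prices
train, valid, test = chronological_split(series, n_valid=10, n_test=10)

print(len(train), len(valid), len(test))  # 80 10 10
```

The learning rate decay would then be driven by the error on valid, and test would only be used once at the end to report the final performance.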