This one is the most important, before starting anything you should learn about programming. Coding will make you assimilate a certain logic that’s close to mathematical formulas and can help you formalize your trading process. It’s essential to be able to understand everything that’s “under the hood”, what if you strategy starts to slow down after a few months and you’re not able to improve it yourself.
You won’t learn programming in a day, you should take your time to learn and understand the process. Fortunately, there are multiple free methods you can use to learn about Python. You can use websites like EDX, Coursera, and Udacity.
#2 Backtesting and training on the same period
Let’s say you found the perfect strategy that makes +300% in the 2014 period, you may want to backtest it on a different period, the strategy may work in that specific time but it could make you lose a lot on another period. This beginner mistake has a name: overfitting. Ideally you want to split your data set into at least 2 parts: train and test. But if you want to have a rock-solid performance, you can try K-Fold cross validation, it’ll split your data set into K parts, train 1 part and test it on the other ones, and so on.
#3 Not backtesting enough
Backtest, backtest and backtest. Use different time periods, adjust the trading size, the strategy could work by buying 100$ worth of stocks at a time but what if you want to scale it ? You could introduce slippage and of course broker fees.
Backtesting is good but paper trading is better, you should run the strategy in real-time but without any broker connection, this way you can simulate how it’s going to behave with current market situation.
#4 Not having a risk management strategy
Risk management is going to make a difference during bear markets or high-volatility periods. You can limit the maximum exposure and ignore any buying signal if you hit the limit, or automatically close any position older than a few days. These are suggestions, it’s important to make sure you won’t get stuck with a growing loss over time.
#5 Having unreliable data
Your strategy will be based on financial data, either real-time, minute or daily data, a single data point can destroy your profits. You need to make sure it’s coming from a reliable source and not some random websites, a good source is Quandl, some of their datasets are free.
Zipline is a backtesting engine for Python, if you’re a Quantopian member you should be familiar with it since it’s the one they’re using. It provides metrics about the strategy such as returns, standard deviations, Sharpe ratios etc. basically everything you need to know in order to validate or not a strategy before going live.
Zipline can be install using pip:
pip install zipline
If you’re on Windows I suggest using Conda:
conda install -c Quantopian zipline
Here is the basic structure of a strategy in Zipline:
from zipline.api import order, record, symbol
def initialize(context):
pass
def handle_data(context, data):
order(symbol('AAPL'), 10)
record(AAPL=data.current(symbol('AAPL'), 'price'))
In initialize you can set some global variables used for the strategy such as a list of stocks, certain parameters, the maximum percentage of portfolio invested. Then handle_data is entered at every tick, that’s where your strategy logic should be. You can check previous articles and incorporate strategies into your code.
Let’s breakdown the handle_data()code.
The order() function let you create an order, here we specify the AAPL ticker (Apple stock) with a quantity of 10. A positive value means you’re buying 10 stocks, a negative value would mean you’re selling the stock.
Then, the record() function allows you to save the value of a variable at each iteration. Here, you’re saving the current stock price under the variable named AAPL, you’ll then be able to retrieve that information in the backtest result, this way you can compare your strategy performance versus the stock price.
Now you want to finally backtest the strategy and see if it’s profitable. To do that, run the following command:
This command is going to run the backtest between 2015-01-01 and 2020-01-01 and output the result into a pickle file for later analysis. The pickle is simply a Pandas DataFrame with a line per day and (a lot of) columns regarding your strategy, such as the return, the number of orders, the portofolio size and so on.
To show you the full process of creating a trading strategy, I’m going to work on a super simple strategy based on the VIX and its futures. I’m just skipping the data downloading from Quandl, I’m using the VIX index from here and the VIX futures from here, only the VX1 and VX2 continuous contracts datasets.
Data loading
First we need to load all the necessary imports, the backtest import will be used later:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from backtest import backtest
from datetime import datetime
For the sake of simplicity, I’m going to put all values in one DataFrame and in different columns. We have the VIX index, VX1 and VX2, this gives us this code:
VIX = "VIX.csv"
VIX1 = "VX1.csv"
VIX2 = "VX2.csv"
data = []
fileList = []
# Create the base DataFrame
data = pd.DataFrame()
fileList.append(VIX)
fileList.append(VIX1)
fileList.append(VIX2)
# Iterate through all files
for file in fileList:
# Only keep the Close column
tmp = pd.DataFrame(pd.DataFrame.from_csv(path=file, sep=',')['Close'])
# Rename the Close column to the correct index/future name
tmp.rename(columns={'Close': file.replace(".csv", "")}, inplace=True)
# Merge with data already loaded
# It's like a SQL join on the dates
data = data.join(tmp, how = 'right')
# Resort by the dates, in case the join messed up the order
data = data.sort_index()
And here’s the result:
Date
VIX
VX1
VX2
02/01/2008
23.17
23.83
24.42
03/01/2008
22.49
23.30
24.60
04/01/2008
23.94
24.65
25.37
07/01/2008
23.79
24.07
24.79
08/01/2008
25.43
25.53
26.10
Signals
For this tutorial I’m going to use a very basic signal, the structure is the same and you can replace the logic with your whatever strategy you want, using very complex machine learning algos or just crossing moving averages.
The VIX is a mean-reverting asset, at least in theory, it means it will go up and down but in the end its value will move around an average. Our strategy will be to go short when it’s way higher than its mean value and to go short when it’s very low, based on absolute values to keep it simple.
high = 65
low = 12
# By default, set everything to 0
data['Signal'] = 0
# For each day where the VIX is higher than 65, we set the signal to -1 which means: go short
data.loc[data['VIX'] > high, 'Signal'] = -1
# Go long when the VIX is lower than 12
data.loc[data['VIX'] < low, 'Signal'] = 1
# We store only days where we go long/short, so that we can display them on the graph
buys = data.ix[data['Signal'] == 1]
sells = data.ix[data['Signal'] == -1]
Now we’d like to visualize the signal to check if, at least, the strategy looks profitable:
# Plot the VX1, not the VIX since we're going to trade the future and not the index directly
plt.plot(data.index, data['VX1'], label='VX1')
# Plot the buy and sell signals on the same plot
plt.plot(sells.index, data.ix[sells.index]['VX1'], 'v', markersize=10, color='r')
plt.plot(buys.index, data.ix[buys.index]['VX1'], '^', markersize=10, color='g')
plt.ylabel('Price')
plt.xlabel('Date')
plt.legend(loc=0)
# Display everything
plt.show()
The result is quite good, even though there’s no trade between 2009 and 2013, we could improve that later:
Backtesting
Let’s check if the strategy is profitable and get some metrics. We’re going to compare our strategy returns with the “Buy and Hold” strategy, which means we just buy the VX1 future and wait (and roll it at each expiry), this way we can see if our strategy is more profitable than a passive one.
I put the backtest method in a separate file to make the main code less heavy, but you can keep the method in the same file:
import numpy as np
import pandas as pd
# data = prices + dates at least
def backtest(data):
cash = 100000
position = 0
total = 0
data['Total'] = 100000
data['BuyHold'] = 100000
# To compute the Buy and Hold value, I invest all of my cash in the VX1 on the first day of the backtest
positionBeginning = int(100000/float(data.iloc[0]['VX1']))
increment = 1000
for row in data.iterrows():
price = float(row[1]['VX1'])
signal = float(row[1]['Signal'])
if(signal > 0 and cash - increment * price > 0):
# Buy
cash = cash - increment * price
position = position + increment
print(row[0].strftime('%d %b %Y')+" Position = "+str(position)+" Cash = "+str(cash)+" // Total = {:,}".format(int(position*price+cash)))
elif(signal < 0 and abs(position*price) < cash):
# Sell
cash = cash + increment * price
position = position - increment
print(row[0].strftime('%d %b %Y')+" Position = "+str(position)+" Cash = "+str(cash)+" // Total = {:,}".format(int(position*price+cash)))
data.loc[data.index == row[0], 'Total'] = float(position*price+cash)
data.loc[data.index == row[0], 'BuyHold'] = price*positionBeginning
return position*price+cash
In the main code I’m going to use the backtest method like this:
It’s important to display the annualized return, a strategy with a 20% return over 10 years is different than a 20% return over 2 months, we annualize everything so that we can compare strategies easily. The Sharpe Ratio is a useful metric, it allows us to see if the return is worth the risk, in this example I just assumed a 0% risk-free rate, if the ratio is > 1 it means the risk-adjusted return is interesting, if it’s > 10 it means the risk-adjusted return is very interesting, basically high return for a low volatility.
In our example we have a pretty nice Sharpe ratio of 4.6 which is quite good:
The strategy perfomed very well until 2010 but then from 2013 the PnL starts to stagnate:
Conclusion
I showed you a basic structure of creating a strategy, you can adapt it to your needs, for example you can implement your strategy using zipline instead of a custom bactktesting module. With zipline you’ll have way more metrics and you’ll easily be able to run your strategy on different assets, since market data is managed by zipline.
I didn’t mention any transactions fees or bid-ask spread in this post, the backtest doesn’t take into account all of this so maybe if we include them the strategy would lose money!