Fama-French regression

The purpose of this application is to estimate a Fama-French 3-factor model for a specific corporation. We will study IBM. First, we import the usual tools.

In [26]:
import pandas as pd # dataframes, R-style: creating, reshaping and merging data
import numpy as np # numerical tools; used below for a quick line fit
import matplotlib.pyplot as plt # plotting module with a MATLAB-like interface
%matplotlib inline
import requests # to make requests to web pages
import statsmodels.api as sm # standard statistical and econometric models, including OLS and time-series tools
from IPython.display import Latex # to display LaTeX in notebook cells
import getFamaFrenchFactors as gff # a library that downloads Fama-French data; DataReader could do that too
from pandas_datareader.data import DataReader # DataReader downloads from many sources, including Yahoo Finance
import datetime

Next we build the data we need to run our regressions, downloading the factor and price series with the libraries imported above.

In [27]:
# Get five years' worth of monthly data for the Fama-French 3-factor model

FF3data = gff.famaFrench3Factor(frequency='m') # DataReader could do this too
FF3data = FF3data.tail(60).copy() # keep the last 60 months (5 years); copy so we can add columns safely
FF3data['Month'] = FF3data['date_ff_factors'].dt.month
FF3data['Year'] = FF3data['date_ff_factors'].dt.year # month and year variables for merging below

#import yfinance as yf
#IBM = yf.Ticker("IBM")
#IBMdata=IBM.history(period="5y")

# get adjusted close price data for IBM

end = datetime.datetime.today()
IBMdata = DataReader('IBM', 'yahoo', '2015-01-01', end)['Adj Close'] # starts in 2015, so it covers (at least) the five years we need
IBMdataF = IBMdata.resample('M').last() # only keep the last observation of every month
IBMdataF = IBMdataF.to_frame() # converts the series to a frame, which makes the upcoming merge easier

IBMdataF['Year'] = IBMdataF.index.year
IBMdataF['Month'] = IBMdataF.index.month
IBMdataF.rename(columns={'Adj Close':'IBM'}, inplace=True)

# now we merge our two datasets into one, using year and month as the merging variables

datanow = pd.merge(
    IBMdataF,
    FF3data,
    on=['Year', 'Month'])

# finally we create IBM's monthly return data from the adjusted price series
# Note the use of pandas' shift method to create a lagged variable

datanow.sort_values(['date_ff_factors'], axis=0, ascending=False, inplace=True) # newest month first

datanow['rIBM'] = datanow['IBM']/datanow['IBM'].shift(-1) - 1 # with newest first, shift(-1) is last month's price, so this is the monthly return
datanow['Mkt'] = datanow['Mkt-RF'] + datanow['RF'] # this is the market return, which we will use below
datanow.dropna(subset=['rIBM'], inplace=True) # drop the oldest month, whose return is missing because it has no prior price

Now let's look at our data a bit to make sure it all looks good.

In [28]:
datanow[0:4]
Out[28]:
           IBM  Year  Month date_ff_factors  Mkt-RF     SMB     HML      RF      rIBM     Mkt
59  119.930939  2020      9      2020-09-30 -0.0363  0.0010 -0.0259  0.0001 -0.013300 -0.0362
58  121.547493  2020      8      2020-08-31  0.0763 -0.0026 -0.0295  0.0001  0.016142  0.0764
57  119.616600  2020      7      2020-07-31  0.0577 -0.0218 -0.0131  0.0001  0.017968  0.0578
56  117.505257  2020      6      2020-06-30  0.0246  0.0270 -0.0222  0.0001 -0.033067  0.0247
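
The return calculation above relies on the rows being sorted newest first, so that `shift(-1)` pulls in the previous month's price. As a minimal sketch on made-up prices (the `toy` frame and its numbers are purely illustrative, not IBM data), we can check that this route agrees with pandas' `pct_change` applied to the same series sorted oldest first:

```python
import numpy as np
import pandas as pd

# Hypothetical month-end prices, oldest first (illustrative numbers only)
toy = pd.DataFrame(
    {"price": [100.0, 110.0, 99.0, 108.9]},
    index=pd.to_datetime(["2020-01-31", "2020-02-29", "2020-03-31", "2020-04-30"]),
)

# Route 1: sort newest first, then divide by the next row (= previous month)
desc = toy.sort_index(ascending=False)
ret_shift = (desc["price"] / desc["price"].shift(-1) - 1).dropna()

# Route 2: keep oldest first and let pandas compute the percentage change
ret_pct = toy["price"].pct_change().dropna()

# Put both in the same (oldest first) order and compare
same = np.allclose(ret_shift.sort_index().values, ret_pct.values)
print(ret_pct)
print("Routes agree:", same)
```

Either route works; the `pct_change` version simply avoids having to remember which way the frame is sorted.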

Next we plot IBM's monthly return against the market return and fit a CAPM line. Here we estimate IBM's CAPM beta to be around $1.18$.

In [29]:
datanow.sort_values(['date_ff_factors'],axis=0, ascending=False, inplace=True)


x=datanow['Mkt']
y=datanow['rIBM']

fig, ax = plt.subplots()
plt.plot(x, y, 'o') # each dot is a given month
ax.set_ylabel('Return on IBM')
ax.set_xlabel('Market return')

m, b = np.polyfit(x, y, 1)  # fit the best possible line, beta is the slope of that line 
plt.plot(x, m*x + b)  # draw the line on our chart

print("Our estimate of IBM's (CAPM) beta is %.3f" % m)

plt.show() # show our work
Our estimate of IBM's (CAPM) beta is 1.184
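
The slope that `np.polyfit` returns for a degree-1 fit is the least-squares slope, which is the same number as the textbook beta: the covariance of the stock with the market divided by the variance of the market. A small sketch on simulated returns (the seed, the series `mkt` and `stock`, and the "true" beta of 1.2 are assumptions made up for illustration) confirms the equivalence:

```python
import numpy as np

rng = np.random.default_rng(0)           # fixed seed so the sketch is reproducible
mkt = rng.normal(0.01, 0.04, size=60)    # hypothetical monthly market returns
stock = 0.002 + 1.2 * mkt + rng.normal(0, 0.03, size=60)  # hypothetical stock returns

slope, intercept = np.polyfit(mkt, stock, 1)  # same call as in the cell above

# Textbook beta: covariance with the market over the market's variance
beta_cov = np.cov(stock, mkt, ddof=1)[0, 1] / np.var(mkt, ddof=1)

print(f"polyfit slope: {slope:.4f}, cov/var beta: {beta_cov:.4f}")
```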

Instead of taking the quick route above, we can also run a proper CAPM regression. For that we need the excess return on IBM, and then we estimate the following model: $$r^{IBM}_t-r^F_t= \alpha + \beta \left(r^{Market}_t-r^F_t\right)+ \epsilon_t,$$ where $r^{Market}_t$ is the Fama-French market return (the Mkt column). If the CAPM holds, the estimate of $\alpha$ should not be statistically different from zero. We first need to create the excess-return variable.

As shown below, we get about the same estimate of $\beta$ as with the quick route (that's because $r^F$ shows little variability so it is as if it were not in the regression at all) and we do get a statistically insignificant $\alpha.$

In [30]:
datanow['rIBM-rF']=datanow['rIBM']-datanow['RF']

y=datanow['rIBM-rF']
x=datanow['Mkt-RF']
x=sm.add_constant(x) # we run a standard OLS with constant 
mod=sm.OLS(y,x)
res=mod.fit()
res.summary()
Out[30]:
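                            OLS Regression Results
==============================================================================
Dep. Variable:                rIBM-rF   R-squared:                       0.570
Model:                            OLS   Adj. R-squared:                  0.563
Method:                 Least Squares   F-statistic:                     75.66
Date:                Fri, 20 Nov 2020   Prob (F-statistic):           4.82e-12
Time:                        11:26:16   Log-Likelihood:                 98.626
No. Observations:                  59   AIC:                            -193.3
Df Residuals:                      57   BIC:                            -189.1
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         -0.0092      0.006     -1.491      0.141      -0.022       0.003
Mkt-RF         1.1824      0.136      8.698      0.000       0.910       1.455
==============================================================================
Omnibus:                        2.485   Durbin-Watson:                   2.140
Prob(Omnibus):                  0.289   Jarque-Bera (JB):                1.632
Skew:                          -0.335   Prob(JB):                        0.442
Kurtosis:                       3.463   Cond. No.                         22.6
==============================================================================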


Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
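
To see where the coef, std err and t columns in a summary like the one above come from, here is a minimal numpy sketch on simulated data. The series, the seed and the "true" parameters are made up for illustration, and this is a hand computation under the nonrobust (homoskedastic) assumption, not statsmodels' internal code:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 59                                     # same sample size as above, data are made up
mkt_ex = rng.normal(0.01, 0.04, size=n)    # hypothetical market excess returns
y = 0.0 + 1.2 * mkt_ex + rng.normal(0, 0.04, size=n)  # hypothetical stock excess returns

X = np.column_stack([np.ones(n), mkt_ex])  # constant plus one regressor
coef, *_ = np.linalg.lstsq(X, y, rcond=None)  # OLS estimates [alpha, beta]

resid = y - X @ coef
dof = n - X.shape[1]                       # residual degrees of freedom (57 here)
sigma2 = resid @ resid / dof               # estimated error variance
cov = sigma2 * np.linalg.inv(X.T @ X)      # nonrobust covariance of the estimates
std_err = np.sqrt(np.diag(cov))
t_stats = coef / std_err                   # |t| above roughly 2 is significant at the 5% level

for name, c, s, t in zip(["const", "Mkt-RF"], coef, std_err, t_stats):
    print(f"{name:8s} coef={c: .4f}  std err={s:.4f}  t={t: .3f}")
```

The test of the CAPM restriction is then just a look at the t-statistic on the constant: a small absolute value means we cannot reject $\alpha = 0$.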

Now we want to estimate a 3-factor Fama-French model for IBM: $$ \left(r^{IBM}_t-r^F_t\right)= \alpha + \beta_{Market} \left(r^{Market}_t-r^F_t\right) + \beta_{HML} r^{HML}_t + \beta_{SMB} r^{SMB}_t + \epsilon_t$$ where $\epsilon_t$ is an error term with all the standard properties. The Python code for this and the results follow. Only the market beta is significant. Its size changes ever so slightly (though not significantly) relative to the CAPM regression. Both regressions use the same Fama-French market factor, so the change owes to the different specification rather than to a different market portfolio.

In [31]:
X=datanow[['Mkt-RF','SMB','HML']]
X=sm.add_constant(X) # here we're building our right-hand side variables
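y=datanow['rIBM'] # raw return; the model above calls for the excess return datanow['rIBM-rF'], which should change the estimates only slightly because RF varies little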

mod=sm.OLS(y,X)
res=mod.fit()
res.summary()
Out[31]:
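                            OLS Regression Results
==============================================================================
Dep. Variable:                   rIBM   R-squared:                       0.569
Model:                            OLS   Adj. R-squared:                  0.545
Method:                 Least Squares   F-statistic:                     24.16
Date:                Fri, 20 Nov 2020   Prob (F-statistic):           4.18e-10
Time:                        11:26:16   Log-Likelihood:                 98.555
No. Observations:                  59   AIC:                            -189.1
Df Residuals:                      55   BIC:                            -180.8
Df Model:                           3
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         -0.0084      0.007     -1.249      0.217      -0.022       0.005
Mkt-RF         1.1778      0.155      7.590      0.000       0.867       1.489
SMB            0.0164      0.282      0.058      0.954      -0.548       0.581
HML           -0.0116      0.198     -0.059      0.954      -0.408       0.385
==============================================================================
Omnibus:                        2.113   Durbin-Watson:                   2.135
Prob(Omnibus):                  0.348   Jarque-Bera (JB):                1.323
Skew:                          -0.307   Prob(JB):                        0.516
Kurtosis:                       3.403   Cond. No.                         47.4
==============================================================================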


Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
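
The individual t-statistics suggest that SMB and HML add nothing for IBM. A natural follow-up is a joint test that both loadings are zero, which compares the CAPM (restricted) and 3-factor (unrestricted) fits of the same dependent variable. The sketch below computes that nested-model F-statistic by hand with numpy on simulated data; the factor series, the seed and the helper `ssr` are hypothetical and only illustrate the arithmetic:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 59                                               # made-up data, same sample size as above
mkt_ex, smb, hml = rng.normal(0, 0.04, size=(3, n))  # hypothetical factor returns
y = 1.2 * mkt_ex + rng.normal(0, 0.04, size=n)       # stock loads on the market only

def ssr(X, y):
    """Sum of squared OLS residuals from regressing y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return resid @ resid

const = np.ones(n)
X_capm = np.column_stack([const, mkt_ex])            # restricted model
X_ff3 = np.column_stack([const, mkt_ex, smb, hml])   # unrestricted model

q = X_ff3.shape[1] - X_capm.shape[1]                 # number of restrictions (2)
dof = n - X_ff3.shape[1]                             # residual degrees of freedom (55)
F = ((ssr(X_capm, y) - ssr(X_ff3, y)) / q) / (ssr(X_ff3, y) / dof)

print(f"F({q}, {dof}) = {F:.3f}")
```

Under the null that both extra loadings are zero, this statistic follows an F distribution with 2 and 55 degrees of freedom, whose 5% critical value is roughly 3.2; a smaller value means the two factors are jointly insignificant.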