
Background

In an attempt to learn a bit more about investing and economics, I've begun writing a simple historical portfolio analysis tool in Python.

The Portfolio class is designed to facilitate portfolio analysis by fetching historical market data for a given list of assets and set of asset weightings within a specified date range. It calculates various portfolio metrics such as returns, volatility, and risk measures including P&L, beta, Sharpe ratio, and Conditional Value at Risk (CVaR). Additionally, it offers functionality to decompose asset volatility contributions and provides insights into portfolio performance over time.

I intend for the portfolio to rebalance automatically, and I (think I) have achieved that effect by simply pulling data at the same frequency as the sought rebalancing frequency and multiplying each period's market returns by the weightings. Unfortunately, I'm unable to compare this to readily available portfolio analysis tools, as they incorporate a variety of additional effects that I have not considered, and I'm not yet comfortable enough with the jargon to be sure this accomplishes the desired effect.
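For what it's worth, here is a toy sketch (made-up numbers, decimal returns rather than %) of the assumption I'm relying on: weighting each period's asset returns by fixed target weights amounts to rebalancing back to those weights every period, which gives a different result than buy-and-hold:

```python
import numpy as np

# Hypothetical two-asset, three-period returns (decimals, not %).
returns = np.array([
    [0.10, -0.05],
    [0.02,  0.03],
    [-0.04, 0.06],
])
weights = np.array([0.6, 0.4])

# Rebalanced each period: weights are reset to 60/40 before every period,
# so each period's portfolio return is the weighted average of asset returns.
rebalanced_growth = np.prod(1 + returns @ weights)

# Buy-and-hold: initial 60/40 split, then each sleeve compounds on its own.
buy_hold_growth = weights @ np.prod(1 + returns, axis=0)

print(rebalanced_growth, buy_hold_growth)  # the two differ
```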


Code

import yfinance as yf
import pandas as pd
import numpy as np
from dataclasses import dataclass

@dataclass
class Portfolio:
    tickers: list
    weights: list
    start_date: str
    end_date: str
    rebalancing_frequency: str = '1mo'

    def __post_init__(self):
        if len(self.tickers) != len(self.weights):
            raise ValueError("The number of tickers must match the number of weights")
        self.weights = np.array(self.weights) / np.sum(self.weights)  # Normalize weights
        self.market_data = self.get_market_data()
        self.market_returns = self.calculate_market_returns()

    def get_market_data(self):
        try:
            data = yf.download(self.tickers, start=self.start_date, end=self.end_date, 
                               interval=self.rebalancing_frequency, progress=False)['Adj Close']
            return data
        except Exception as e:
            print(f"Error fetching market data: {e}")
            return pd.DataFrame()

    def calculate_market_returns(self):
        returns = self.market_data.pct_change().dropna()
        return returns

    def asset_volatility_decomposition(self):
        asset_volatilities = self.market_returns.std(axis=0)
        asset_volatility_decomposition = asset_volatilities * self.weights
        return asset_volatility_decomposition

    def portfolio_return_metrics(self):
        portfolio_returns = self.market_returns @ self.weights
        portfolio_value = (1 + portfolio_returns).cumprod()
        cumulative_pnl = portfolio_value - 1
        pnl = portfolio_value.diff().fillna(0)
        return portfolio_returns, portfolio_value, cumulative_pnl, pnl

    def portfolio_volatility_metrics(self, risk_free_rate=0.0, alpha=0.05):
        portfolio_returns, _, _, _ = self.portfolio_return_metrics()
        market_returns = self.market_returns.mean(axis=1)
        portfolio_beta = np.cov(portfolio_returns, market_returns)[0, 1] / np.var(market_returns)
        portfolio_cvar = portfolio_returns.quantile(alpha)
        portfolio_annualized_std = portfolio_returns.std() * np.sqrt(12) * 100
        annualized_sharpe_ratio = (portfolio_returns.mean() - risk_free_rate) / portfolio_std * np.sqrt(12)
        downside_returns = portfolio_returns[portfolio_returns < risk_free_rate]
        downside_std = downside_returns.std() if not downside_returns.empty else np.nan
        sortino_ratio = (portfolio_returns.mean() - risk_free_rate) / downside_std if not np.isnan(downside_std) else np.nan
        return portfolio_annualized_std, portfolio_beta, annualized_sharpe_ratio, portfolio_cvar, sortino_ratio

Example Usage

portfolio = Portfolio(
    tickers=['APPL', 'MSFT'],
    weights=[0.6, 0.4],
    start_date='2020-01-01',
    end_date='2024-01-01'
)

returns, value, cumulative_pnl, pnl = portfolio.portfolio_return_metrics()
portfolio_annualized_std, portfolio_beta, sharpe_ratio, cvar, sortino_ratio = portfolio.portfolio_volatility_metrics()


Goals

I seek to:

  1. Improve readability & conciseness.
  2. Identify any missed edge cases, obvious bugs, etc.
  3. Improve error handling, where possible/necessary.
  • using types is good, but tickers: list is not enough: it could be a list of anything. I assume list[str], with weights as list[float]?
    – njzk2
    Commented May 28 at 18:50

2 Answers


time representation

class Portfolio:
    ...
    start_date: str
    end_date: str

I'm sad those aren't datetimes, or at least dates. Then we get validation (no February 30th), and there's no misunderstanding about what it represents.
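A minimal sketch of what that buys you (and I believe yf.download accepts datetime objects as well as strings, though it's worth double-checking against the yfinance docs):

```python
from datetime import date

start = date(2020, 1, 1)   # validated at construction time
end = date(2024, 1, 1)

# An impossible date fails immediately, not deep inside a download call:
try:
    date(2023, 2, 30)
except ValueError as exc:
    print(exc)  # "day is out of range for month"
```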

Also, tickers looks to me like it's really a list[str]. Similarly for the FP weights, really a list[float]. I encourage you to lint with $ mypy --strict *.py.

You're using the identifier weights to denote both "raw" and "normalized" weights. I think this is fine, but will note in passing that you might consider using two different names for the two different concepts, maybe raw_weights.

fortran style

These "coupled arrays" are reminiscent of how Fortran libraries had to consume such inputs:

        if len(self.tickers) != len(self.weights):

I am glad you're making the relationship very explicit here. But still. Consider inventing a new namedtuple pair, and storing a single list of that. Then the relationship is clear, and we can't possibly have the wrong number of tickers or weights.
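Something along these lines, where Holding is an invented name:

```python
from typing import NamedTuple

class Holding(NamedTuple):
    ticker: str
    weight: float

holdings = [Holding('MSFT', 0.6), Holding('GOOG', 0.4)]

# The coupled lists are still recoverable when an API needs them:
tickers = [h.ticker for h in holdings]
weights = [h.weight for h in holdings]
```

Now a length mismatch can't happen by construction, and mypy can see the element types for free.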

doing too much at init time

        self.market_data = self.get_market_data()
        self.market_returns = self.calculate_market_returns()

Consider deferring this pair of actions until later. As written, a test suite will have too many (slow!) dependencies. We can't possibly create a small test Portfolio without internet access plus Yahoo servers being in good shape.

Consider writing a local .csv cache file, so if you're re-running an analysis you can skip the download.
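One possible shape for that, where cached_download and its parameters are invented names; the fetch callable would wrap the actual yf.download call:

```python
from pathlib import Path
import pandas as pd

def cached_download(fetch, cache_path: str) -> pd.DataFrame:
    """Return cached data if present; otherwise fetch it and cache it.

    `fetch` is any zero-argument callable returning a DataFrame,
    e.g. lambda: yf.download(...)['Adj Close'].
    """
    path = Path(cache_path)
    if path.exists():
        return pd.read_csv(path, index_col=0, parse_dates=True)
    data = fetch()
    data.to_csv(path)
    return data
```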

fatal exception

In get_market_data() I can't imagine what use an empty dataframe will be to anyone. Rather than a try, consider just letting the exception bubble up the call stack.

In the parlance of DbC, you can't guarantee your own promise, so callers won't be able to guarantee theirs. Better to be honest about it and signal an exception, rather than swallowing it as the OP code does.

complex return values

        return portfolio_returns, portfolio_value, cumulative_pnl, pnl
        ...
        return portfolio_annualized_std, portfolio_beta, annualized_sharpe_ratio, portfolio_cvar, sortino_ratio

We're starting to see enough variables there that humans might possibly mix them up, transpose them, do the wrong thing when a new release adds one more, and so on.

Consider returning named tuples so there can be no such confusion.

        portfolio_returns, _, _, _ = self.portfolio_return_metrics()

Consider combining both methods, and returning just a single named tuple. Neither method seems to be all that expensive to compute, even for a caller that only needs a subset of the result.
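For example (ReturnMetrics is an invented name; the fields mirror the current tuple order):

```python
from typing import NamedTuple
import pandas as pd

class ReturnMetrics(NamedTuple):
    returns: pd.Series
    value: pd.Series
    cumulative_pnl: pd.Series
    pnl: pd.Series

# portfolio_return_metrics() would end with
#     return ReturnMetrics(portfolio_returns, portfolio_value, cumulative_pnl, pnl)
# and a caller that only wants one field asks for it by name:
#     self.portfolio_return_metrics().returns
```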

months, magic constant

I found this (and the next) line slightly disconcerting:

        portfolio_annualized_std = portfolio_returns.std() * np.sqrt(12) * 100

It seems to put 28-day February on the same footing as neighboring 31-day months, a ten percent discrepancy. We're not using mythical 30-day banker months or anything? We don't track days per month?

Some folks prefer to work only with isocalendar weeks. So for example, instead of considering an interval whose start is day 1 of some month, you might only consider an interval whose start is a Monday. And then your "monthly" figures cover 28 days. The downside, of course, is this tends to align poorly with quarterly earnings calls.

citation

You use several fairly ordinary finance formulas, with ordinary naming. Nonetheless, it wouldn't hurt to include # comments or """docstrings""" that mention the book or URL you relied on when writing down those formulas. It will help future maintenance engineers, who may not have easy access to the very nice Review Context that you helpfully supplied.


This codebase achieves its design goals.

I would be willing to delegate or accept maintenance tasks on it.


You've mixed degrees of freedom by computing some of the stats with Pandas (\$N-1\$) and others with NumPy (\$N\$):

  • pandas.Series.std (default ddof=1)

    asset_volatilities = self.market_returns.std(axis=0)
    ...
    portfolio_annualized_std = portfolio_returns.std() * np.sqrt(12) * 100
    ...
    downside_std = downside_returns.std() if not downside_returns.empty else np.nan
    
  • numpy.var (default ddof=0) — though note that numpy.cov actually defaults to the unbiased \$N-1\$ normalization (bias=False, ddof=None), so this line mixes ddofs even within NumPy: an \$N-1\$ covariance divided by an \$N\$ variance

    portfolio_beta = np.cov(portfolio_returns, market_returns)[0, 1] / np.var(market_returns)
    

So choose a ddof and stick to it. Note that numpy.var and numpy.std are the odd ones out by defaulting to the biased ddof=0, whereas Pandas, R (sd), and MATLAB (std) all use the unbiased ddof=1.
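To see the size of the discrepancy on a short series (a sketch with made-up numbers):

```python
import numpy as np
import pandas as pd

x = pd.Series([0.01, -0.02, 0.03, 0.00])

# Pandas defaults to the unbiased estimator (ddof=1)...
assert np.isclose(x.std(), np.std(x.values, ddof=1))

# ...while np.std and np.var default to the biased ddof=0,
# which understates volatility, noticeably so for small N.
print(x.std(), np.std(x.values))  # the two values differ
```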

Here the simplest fix would be to switch the NumPy functions (commented out below) to pandas.Series.cov and pandas.Series.var:

# portfolio_beta = np.cov(portfolio_returns, market_returns)[0, 1] / np.var(market_returns)
portfolio_beta = portfolio_returns.cov(market_returns) / market_returns.var()

Also a couple typos:

  1. APPL -> AAPL
  2. portfolio_std doesn't exist, so maybe you meant portfolio_annualized_std
