Since my previous post about factor models, I have embarked on creating a new package named factorpricingmodelriskmodel. The aim of this opensource project is to generate factor models from specified market data. To better understand the technical aspects and explore the possibilities, I delved into several existing libraries. Now, Iâ€™d like to share an update on my progress through this checkpoint post.
pyportfolioopt
Instead of starting from scratch, I initially explored existing opensource libraries to see if they provided the features I was seeking. One such library that caught my attention is pyportfolioopt, an opensource creation by Robert Martin designed for portfolio optimization toolkits. The project stands out with its support for various essential portfolio optimization models and techniques, including the Markowitz optimizer and BlackLitterman allocation. While it also covers risk model generation from raw market data, it is currently limited to computing covariances from raw prices. I find the library to be among the top 3 open source libraries that closely resemble how institutional investors and asset managers manage their portfolio. However, there seems to be a missing piece in the risk model generation specifically related to factor models, and thatâ€™s where my new library comes into play.
qlib
Another library is Microsoftâ€™s qlib, an AIoriented quantitative investment library offering a wide range of models and examples, from point in time databases to reinforcement learning. The library sounds powerful, but, similar to my perspective on most Microsoftâ€™s products (yes, I admit to having some bias towards MSFT products), its features are fragmented and lack good documentation. I found a few modules in the source code about covariance estimators, including structured covariance estimator and covariance shrinkage, but I could not find their descriptions on the official documentation site. While I regard qlib as a great reference, I believe my new library can provide a better structure and a more coherent solution for factor model generation and risk analysis.
Factor pricing model
My primary focus is to create a library that centers on factor models themselves. Unlike existing libraries and scripts that mainly address risk model construction, my goal is to cater specifically to factor models. With this library, users will be able to:
 Choose from various models, e.g. statistical models, to build their risk models
 Compare and benchmark the forecasting accuracy of different risk models
 Easily scale up the size of their risk models without encountering significant challenges
In the following sections, I will walk you through how I approach these problems and outline the solutions.
Data
Building and benchmarking risk models heavily relies on highquality market data. Unfortunately, obtaining such historical data, especially for equities, can be challenging and often requires costly subscriptions to data providers. To demonstrate the models and allow everyone to try them out in Colab notebooks, I have chosen cryptocurrency data as a starting point. It doesnâ€™t mean that I havenâ€™t tested the library with equity data â€“ I indeed have and compared it with a risk model from a prominent vendor. However, using cryptocurrency data provides a more accessible playground for anyone to start experimenting without incurring significant costs.
Statistical model
In developing the factorpricingmodelriskmodel
library, I decided to focus on statistical models as the starting point. Unlike fundamental models, which can be challenging to define factors beyond obvious ones like momentum, statistical models, such as Principal Component Analysis (PCA), offer a more straightforward approach and also serve as a natural benchmark for other statistical models.
Forecast accuracy
To evaluate the forecast accuracy of different models, we need appropriate metrics. For this purpose, I have incorporated two metrics:

Bias test: This metric assesses forecasting accuracy of volatility on a time series of observed portfolio returns.

Value at risk (VaR): VaR is a risk metric that represents the maximum possible loss for a given holding horizon and specific level of confidence.
These metrics provide valuable insights into how well the models can forecast in general.
Agility
Scalability is a crucial consideration, especially as we increase the size of the universe. With the covariance matrix growing to N x N for an Nsized universe, the runtime complexity can become a concern. I have been exploring ways to manage runtime efficiently, even when scaling up the universeâ€™s size.
Additionally, researchers often face the challenge of comparing various models with benchmarks once they have defined the estimation and model universe. This comparison may involve their existing models retrieved from vendors. Finding an efficient and reliable way to perform such comparisons is an essential aspect of the libraryâ€™s design.
Universe
The concept of universes plays a pivotal role in the factorpricingmodelriskmodel library. Two types of universes are defined:

Estimation Universe: This comprises the top market cap instruments in the market. It is used to construct factor returns.

Model Universe: This consists of instruments with sufficient trading liquidity, such as the top 50% based on average daily volume. The model universe derives its factor exposures and covariances from the computed factor returns.
Both universes include instruments actively trading in the past 6 months and exclude stablecoins and forkedlike pairs (e.g., BTC vs. wrapped BTC). Similar to constructing equity universes, the process involves excluding ETFs and instruments listed on multiple exchanges.
By carefully defining and managing these universes, the library ensures robust and accurate factor model construction and risk analysis.
The following graphs illustrate the number of valid instruments in the estimation universe
and that in the model universe.
Comparing the number of valid instruments in the two universes in cryptocurrency class, we can see the ratio between the estimation and model universe is around 10:1. This ratio number aligns with the universes of equity factor models provided by professional vendors.
Factor Model
With the defined universes and market data in place, we can now proceed to build the factor model and compute the covariances for the model universe.
To begin, we create a DataFrame to represent the estimation universe, capturing its validity within the date index and instrument information across columns.
index  algorand  apecoin  avalanche2  axieinfinity  binanceusd 

20220911 00:00:00  false  false  true  true  false 
20220912 00:00:00  false  false  true  true  false 
20220913 00:00:00  true  true  true  true  false 
20220914 00:00:00  false  true  true  true  false 
20220915 00:00:00  false  true  true  true  false 
Next, we generate a risk model using statistical factors obtained through PCA, which serves as our benchmark. The parameter n_components
is set to 0.9, indicating that we select the number of components needed to explain more than 90% of the total variance. This model operates on a rolling basis of 180 days, using daily returns from the estimation universe. To enhance stability, the daily returns are windorized by +/20%.
from fpm_risk_model.rolling_factor_risk_model import RollingFactorRiskModel
from fpm_risk_model.statistical import PCA
model = PCA(n_components=0.9)
rolling_risk_model = RollingFactorRiskModel(
window=180,
model=model,
show_progress=True,
)
rolling_risk_model.fit(
X=est_universe_returns,
validity=est_universe_validity,
)
Similarly, we define the model universe in the same format as the estimation universe. With a larger set of returns in the model universe, we can then transform the risk model from the estimation universe to the model universe, enabling us to effectively analyse and evaluate the covariances for the entire model universe.
rolling_risk_model.transform(
X=model_universe_returns,
validity=model_universe_validity,
)
Covariance estimation
The rolling risk model allows us to derive covariances from any subsets of the model universe.
For instance, we can retrieve the unadjusted covariance on a specified date (20230531)
rolling_risk_model.get("20230531").cov()
or the unadjusted correlation between two specified pairs (BTC v.s. ETH)
rolling_risk_model.get("20230430").corr().loc[["BTC_Bitcoin", "ETH_Ethereum"], ["BTC_Bitcoin", "ETH_Ethereum"]]
For instance, the pairwise correlation of BTC and ETH on 20230430 is computed as follows.
Â  BTC_Bitcoin  ETH_Ethereum 

BTC_Bitcoin  1  0.892195 
ETH_Ethereum  0.892195  1 
However, for better covariance estimation, we can employ two techniques. Firstly, we can adjust the diagonal entries using alternative volatility estimation methods. For example, replacing the existing volatility estimation, derived from longerterm observations, with GARCH volatility estimation can better capture the volatility structure.
The second technique involves â€ścoverage shrinkageâ€ť. Covariance estimation typically requires a large sample of data to converge towards the true population values. However, due to the curse of dimensionality, smaller observation sets often lead to noisy sample covariances and even noisier inverses. This phenomenon is especially common in financial data with extreme and potentially noisy returns.
Coverage shrinkage is a powerful method to address this issue. By introducing a parameter \(\delta\), we can suppress the influence of offdiagonal elements in the covariance (or correlation) matrix, achieving a better signaltonoise ratio. This technique yields more reliable results, particularly in scenarios with smaller datasets or when dealing with extreme and noisy financial data.
The covariance shrinkage parameter \(\delta\) can be specified during covariance estimator creation using the following formula:
\[\hat{Q} = (1  \delta) * Q + \delta * \mu * I\]Here, \(\hat{Q}\) is the adjusted covariance matrix, \(Q\) is the sample covariance matrix, \(\mu\) is the mean variance, and \(I\) is the identity matrix.
For instance, to specify a constant shrinkage with \(\delta\) equal to 0.2 and adjust the volatility using a GARCH estimation named garch_est, you can follow the specified procedure.
from fpm_risk_model import RollingCovarianceEstimator
estimator = RollingCovarianceEstimator(
rolling_risk_model,
shrinkage_method="constant",
delta=0.2
)
estimator.cov(volatility=garch_est)
Result
The results presented below are derived from the analysis conducted on the crypto universe dataset. The factor model is initially trained using statistical models with an estimation universe and subsequently transformed with the model universe. The covariances, both unadjusted and adjusted, are compared using accuracy estimators to evaluate their performance.
The following types of covariances are compared
 Unadjusted covariances derived by PCA factor model
 Unadjusted covariances derived by APCA factor model
 Covariances adjusted with EWMA volatilities derived by PCA factor model
 Covariances adjusted with EWMA volatilities and constant shrinkage (0.2) derived by PCA factor model
The observed results are as follows:
 PCA is the stable benchmark for comparison
 APCA does not work better even theoretically it should give better short term variance predictions
 PCA adjusted with short term EWMA volatility gives a slightly better result than the benchmark
 Constant shrinkage delta does not give a sufficiently stable result. More advanced shrinkage technique is needed to get optimal shrinkage delta.
Whatâ€™s next
In conclusion, as the factorpricingmodelriskmodel library is still in its beta stage, the results obtained so far serve as an encouraging glimpse of its potential. With each iteration, we are committed to refining and expanding the libraryâ€™s capabilities, ensuring it becomes even more powerful and versatile. The comparisons between covariance estimation techniques have provided valuable insights that will guide our ongoing development efforts.
While the library is a work in progress, it is open to contributions and welcomes ideas from the community. As more features are added and improvements made, the goal is to provide users with a comprehensive and powerful tool for factor pricing models across various asset classes. Feedback, suggestions, and contributions from the community will play a crucial role in shaping the libraryâ€™s future and ensuring it becomes a valuable resource for researchers, institutional investors, and asset managers in their pursuit of sophisticated risk analysis and datadriven decisionmaking.
All kinds of contributions and ideas from the community are welcomed, as we continue to refine the library. Please do not hesitate to drop me PMs or emails if you want to understand more about the library.
Library factorpricingmodelriskmodel