Online Portfolio Selection: Principles and Algorithms
Bin Li and Steven C. H. Hoi

With the aim of sequentially determining optimal allocations across a set of assets, Online Portfolio Selection (OLPS) has significantly reshaped the financial investment landscape. Online Portfolio Selection: Principles and Algorithms supplies a comprehensive survey of existing OLPS principles and presents a collection of innovative strategies that leverage machine learning techniques for financial investment.

The book presents four new algorithms based on machine learning techniques that were designed by the authors, as well as a new back-test system developed to evaluate the performance of trading strategies.
CRC Press
Taylor & Francis Group
© 2016 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business
Contents
List of Figures
List of Tables
Preface
Acknowledgments
Authors

I Introduction
1 Introduction
1.1 Background
1.1.1 Challenge 1: Voluminous Financial Instruments
1.1.2 Challenge 2: Human Behavioral Biases
1.1.3 Challenge 3: High-Frequency Trading
1.1.4 Algorithmic Trading and Machine Learning
1.2 What Is Online Portfolio Selection?
1.3 Methodology
1.4 Book Overview
2 Problem Formulation
2.1 Problem Settings
2.2 Transaction Costs and Margin Buying Models
2.3 Evaluation
2.4 Summary

II Principles
3 Benchmarks
3.1 Buy-and-Hold Strategy
3.2 Best Stock Strategy
3.3 Constant Rebalanced Portfolios
6 Pattern Matching
6.1 Sample Selection Techniques
6.2 Portfolio Optimization Techniques
6.3 Combinations
6.4 Summary
7 Meta-Learning
7.1 Aggregating Algorithms
7.2 Fast Universalization
7.3 Online Gradient and Newton Updates
7.4 Follow the Leading History
7.5 Summary

III Algorithms
8 Correlation-Driven Nonparametric Learning
8.1 Preliminaries
8.1.1 Motivation
8.2 Formulations
8.3 Algorithms
8.4 Analysis
8.5 Summary

IV Empirical Studies
12 Implementations
12.1 The OLPS Platform
12.1.1 Preprocess
12.1.2 Algorithmic Trading
12.1.3 Postprocess
12.2 Data
12.3 Setups
12.3.1 Comparison Approaches and Their Setups
12.4 Performance Metrics
12.5 Summary

V Conclusion
15 Conclusions
15.1 Conclusions
15.2 Future Directions
15.2.1 On Existing Work
15.2.2 On Practical Issues
15.2.3 Learning for Index Tracking

Bibliography
Index
Preface
Computational intelligence techniques, including machine learning and data mining,
have significantly reshaped the financial investment community over recent decades.
Examples include high-frequency trading and algorithmic trading. This book studies a fundamental problem in computational finance, namely online portfolio selection (OLPS), which aims to sequentially determine optimal allocations across a set of assets. This
book investigates this problem by conducting a comprehensive survey on existing
principles and presenting a family of new strategies using machine-learning tech-
niques. A back-test system using historical data has been developed to evaluate the
performance of trading strategies.
Our goal in writing this monograph is to present a self-contained text to a wide
range of audiences, including graduate students in finance, computer science, and
statistics, as well as researchers and engineers who are interested in computational
investment. The readers are encouraged to visit our project website for more updates:
[Link]
Organization
Part I introduces the OLPS problem. Chapter 1 introduces the background and sum-
marizes the contributions of this book. Chapter 2 formally formulates OLPS as a
sequential decision task.
Part II presents key principles for this task. Chapter 3 summarizes three benchmarks: the Buy-and-Hold strategy, the Best Stock strategy, and Constant Rebalanced Portfolios. Chapter 4 presents the principle of Follow the Winner, which moves weights from losing assets to winning assets. Chapter 5 presents the opposite principle, called Follow the Loser, which moves weights from winners to losers. Chapter 6 demonstrates the principle of Pattern Matching, which exploits similar patterns in historical markets. Chapter 7 discusses Meta-Learning, which views strategies themselves as assets and thus constructs hyperstrategies on top of them.
Part III designs four novel algorithms to solve the OLPS problem. All the algorithms apply state-of-the-art machine-learning techniques to the task. Chapter 8 designs a new strategy named CORrelation-driven Nonparametric learning (CORN), which overcomes a limitation of existing pattern matching–based strategies, namely their use of Euclidean distance to measure the similarity between two patterns. Chapter 9 develops Passive–Aggressive Mean Reversion (PAMR), which is based on the first-order passive–aggressive online learning method, and Chapter 10 designs Confidence-Weighted Mean Reversion (CWMR).
Bin Li
Economics and Management School
Wuhan University, People’s Republic of China
Dr. Bin Li received a bachelor’s degree in computer science from Huazhong Univer-
sity of Science and Technology, Wuhan, China, and a bachelor’s degree in economics
from Wuhan University, Wuhan, China, in 2006. He earned a PhD degree from the
School of Computer Engineering of Nanyang Technological University, Singapore, in
2013. He completed the CFA Program in 2013. He is currently an associate professor of
finance at the Economics and Management School of Wuhan University. Dr. Li was a
postdoctoral research fellow at the Nanyang Business School of Nanyang Technologi-
cal University. His research interests are computational finance and machine learning.
He has published several academic papers in premier conferences and journals.
Dr. Steven C.H. Hoi received his bachelor’s degree in computer science from
Tsinghua University, Beijing, China, in 2002, and both his master’s and PhD degrees in
computer science and engineering from The Chinese University of Hong Kong, Hong
Kong, China, in 2004 and 2006, respectively. He is currently an associate professor in
the School of Information Systems, Singapore Management University, Singapore.
Prior to joining SMU, he was a tenured associate professor in the School of Computer
Engineering, Nanyang Technological University, Singapore. His research interests
are machine learning and data mining and their applications to tackle real-world big
data challenges across varied domains, including computational finance, multimedia
information retrieval, social media, web search and data mining, computer vision and
pattern recognition, and so on. Dr. Hoi has published more than 150 refereed articles in premier international journals and conferences. As an active researcher in his
research communities, he has served as general co-chair for ACM SIGMM Workshops
on Social Media (WSM’09-11), program co-chair for Asian Conference on Machine
Learning (ACML’12), editor for Social Media Modeling and Computing, guest edi-
tor for journals such as Machine Learning and ACM TIST, associate editor-in-chief
of Neurocomputing, associate editor for several reputable journals, area chair/senior
PC member for conferences, including ACM Multimedia 2012 and ACML’11–’15,
technical PC member for many international conferences, and referee for top journals
and magazines. He has often been invited for external grant review by worldwide
funding agencies, including the US NSF funding agency, Hong Kong RGC funding
agency, and so on. He is a senior member of IEEE and a member of AAAI and ACM.
Introduction
∗ Sell side often refers to investment banks that sell investment services, such as routing orders to
exchanges, to asset management firms.
† Buy side usually refers to the asset management firms that buy the services from the sell side. For
example, Citadel, an asset management firm (buy side), may send their purchase orders via Goldman Sachs,
an investment bank (sell side).
‡ Calendar anomalies refer to the patterns in asset returns from year to year, or month to month. One
famous example is the January effect.
§ Fundamental anomalies are the patterns in asset returns related to the fundamental values of a company,
such as size effect and value effect.
¶ Technical anomalies are patterns related to historical prices, such as momentum and contrarian effects.
∗ Here, one million is an arbitrary number; of course, the more the better.
† A Treasury bill is often regarded as a risk-free asset, earning a guaranteed risk-free return. If he does not want to buy any stocks, he can put all his money in Treasury bills instead of cash.
‡ Here, “month” represents a period, which can be one day, one week, or one month, etc.
§ For example, he may buy $5000 of MSFT stock, $3000 of GS stock, and $2000 of Treasury bills.
1.3 Methodology
OLPS for real-world trading tasks is challenging in that the market information
(mainly the market data) arrives sequentially, and a portfolio manager has to make a
decision immediately based on the known information. The problem is endogenously
online. Two types of machine-learning methodologies have been explored to design
strategies for this task.
The first methodology is batch learning, where the model is trained from a
batch of training instances. In this way, we assume that all price information (and
maybe other information) is complete at one decision point, and thus one can deploy
batch-learning methods to learn the portfolios. In this mode, each decision is independent of previous decisions. In particular, we adopt this mode in one proposed
algorithm, which deploys nonparametric learning (or instance-based learning, or
case-based learning; Aha 1991; Aha et al. 1991; Cherkassky and Mulier 1998). With
an effective trading principle, such a mode can achieve the goal of our project.
The second methodology is online learning (or incremental learning), where the
model is trained from a single instance in a sequential manner (Shalev-Shwartz 2012;
Loveless et al. 2013). Online learning is the process of solving a sequence of problems, given (possibly partial) solutions to previous problems and perhaps additional
side information. This definition naturally fits our problem, which is innately online.
Contrary to the batch mode, in this mode, one decision is often connected to previous
decisions. In particular, in the remaining three of the four algorithms, we adopt two
types of online learning techniques (Crammer et al. 2006, 2008, 2009; Dredze et al.
2008) to solve the problem. In addition, to achieve the target of our project, it is also important to exploit an effective trading principle when designing a specific strategy.
In this book, we will introduce a variety of classical and modern trading principles
that are commonly used for designing OLPS strategies.
After designing a trading strategy, we need to evaluate the effectiveness of the
proposed strategy using a back-test methodology. In particular, we feed the historical
market data into the testbed to evaluate the strategy and examine how it performs.
Through extensive evaluation and analysis of the back-testing performance, we can judge how likely the proposed trading strategy is to survive in real-life applications. For this book, we developed an open-source back-testing system, named Online
Portfolio Selection, which allows us to benchmark empirical performances of differ-
ent strategies and algorithms on the same platform. Throughout the book, all the
algorithms and strategies will be evaluated on this platform.
[Figure: organization of the book. Part I: Problem Formulation. Part II: Principles (benchmarks; follow the winner; follow the loser; pattern matching–based; meta-algorithms). Part III: Algorithms (CORN: correlation-driven nonparametric learning; PAMR: passive–aggressive mean reversion; CWMR: confidence-weighted mean reversion; OLMAR: online moving average reversion). Part V: Conclusion.]
Part I introduces the background, motivations, and basic definitions of the OLPS
problem. Specifically, Chapter 1 introduces the background of computational finance,
algorithmic trading, and machine learning and their connections to OLPS. Chapter 2
formally formulates the problem of OLPS as a scientific task.
Part II summarizes the main principles and algorithms of OLPS. In particular,
Chapter 3 introduces a family of strategies commonly known as the benchmark principles for OLPS. Chapter 4 introduces the principle of "follow the winner," which is characterized by increasing the relative weights of more successful experts or stocks.
Problem Formulation
This chapter introduces the problem setting of online portfolio selection (OLPS) and formally formulates it as a sequential decision task.
We further relax the problem setting by adding two practical constraints: transaction
costs and margin buying. Finally, we introduce the idea of how to evaluate a strategy’s
performance.
Specifically, this chapter is organized as follows. Section 2.1 formally formulates
the OLPS task as a sequential decision problem. Section 2.2 relaxes the transaction
costs and margin buying constraints. Section 2.3 introduces several evaluation metrics
for the task. Finally, Section 2.4 summarizes this chapter.
∗ When m = 1, the problem is reduced to single-stock trading, which is out of the scope of this book.
†A period can be a week, a day, an hour, or even a second in high-frequency trading.
‡ Here we adopt the simple gross return, while one may choose the simple net return, i.e., (p_{t,i} − p_{t−1,i}) / p_{t−1,i}. For the calculation of the first period, suppose we have p_{0,i}.
§ For example, x_{t,i} = 2 means that the investment in an asset will increase by 100%, or double its initial value; x_{t,i} = 1 means that the capital will remain at its initial value.
b_t : R_+^{m(t−1)} → Δ_m, t = 2, 3, . . . ,

where b_t = b_t(x_1^{t−1}) is the portfolio determined at the beginning of the t-th period upon observing past market behaviors. We denote by b_1^n = {b_1, . . . , b_n} the strategy for n periods, which is the output of an OLPS strategy.

At the t-th period, a portfolio b_t produces a portfolio period return s_t; that is, the wealth increases by a factor of s_t = b_t^T x_t = Σ_{i=1}^m b_{t,i} x_{t,i}.† Since we reinvest and adopt relative prices, the wealth grows multiplicatively. Thus, after n periods, a portfolio strategy b_1^n produces a portfolio cumulative wealth S_n, which increases the initial wealth by a factor of Π_{t=1}^n b_t^T x_t; that is,

S_n(b_1^n, x_1^n) = S_0 Π_{t=1}^n b_t^T x_t.
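To make the multiplicative wealth model concrete, the following sketch computes period returns and cumulative wealth for a fixed rebalanced portfolio over a toy sequence of price relatives (all numbers are illustrative, not from the book):

```python
# Cumulative wealth S_n = S_0 * prod_t (b_t . x_t) for a sequence of
# price-relative vectors x_t and portfolio vectors b_t (illustrative data).

def period_return(b, x):
    """Portfolio period return s_t = b_t . x_t."""
    return sum(bi * xi for bi, xi in zip(b, x))

def cumulative_wealth(portfolios, price_relatives, s0=1.0):
    """Wealth grows multiplicatively: S_n = S_0 * prod_t s_t."""
    wealth = s0
    for b, x in zip(portfolios, price_relatives):
        wealth *= period_return(b, x)
    return wealth

# Two assets, three periods; rebalance to (0.5, 0.5) each period.
xs = [(1.10, 0.95), (0.98, 1.05), (1.02, 1.01)]
bs = [(0.5, 0.5)] * len(xs)
print(cumulative_wealth(bs, xs))
```

Because wealth compounds multiplicatively, maximizing log-wealth (the sum of log period returns) is equivalent to maximizing S_n, which is why log utility appears throughout the later chapters.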
The preceding assumptions are nontrivial. We will further analyze and discuss their
implications and effects for our empirical studies in Sections 13.4 and 14.1.
Finally, as we are going to design intelligent learning algorithms that fit the above
model, let us fix the objective of the proposed learning algorithms. For a portfolio
selection task, one can choose to maximize risk-adjusted return (Markowitz 1952;
Sharpe 1964) or to maximize cumulative return (Kelly 1956; Thorp 1971) at the end
of a period. Since the model is online and contains multiple periods, we choose
to maximize the cumulative return (Hakansson 1971),∗ which is also the objective of
most existing algorithmic studies.
Another practical issue is margin buying, which allows the portfolio managers
to buy securities with cash borrowed from securities brokers, using their own equity
positions as collateral. Following existing studies (Cover 1991; Helmbold et al. 1998;
Agarwal et al. 2006), we relax this constraint and evaluate it empirically. We assume
the margin setting to be 50% down and a 50% loan,∗ at an annual interest rate
of 6% (equivalently, the corresponding daily interest rate of borrowing, c, is set to
0.000238). With such a setting, a new asset named “margin component” is generated
for each asset, and its price relative for period t equals 2 × x_{t,i} − 1 − c. In the case of x_{t,i} ≤ (1 + c)/2, which means the stock drops by more than half, we simply set its margin component to 0 (Li et al. 2012).† As a result, if margin buying is allowed, the total number of assets becomes 2m. By adding such a "margin component," we magnify both the potential profit and the potential loss on the i-th asset.‡
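As a sketch of the margin-component construction described above (the daily borrowing rate c and the ruin cutoff follow the text; the price relatives are made up):

```python
# Margin component of an asset: price relative 2*x - 1 - c, floored at 0
# when x <= (1 + c)/2 (i.e., the stock loses more than half its value).
C = 0.000238  # daily interest rate of borrowing, from the 6% annual setting

def margin_component(x, c=C):
    if x <= (1 + c) / 2:
        return 0.0
    return 2 * x - 1 - c

def with_margin(price_relatives, c=C):
    """Augment an m-asset price-relative vector to 2m assets."""
    return list(price_relatives) + [margin_component(x, c) for x in price_relatives]

print(with_margin([1.1, 0.9]))  # a 10% gain/loss becomes roughly 20%
```

This reproduces the footnote's example: price relatives (1.1, 0.9) become (1.1, 0.9, 1.2 − c, 0.8 − c).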
2.3 Evaluation
One standard criterion to evaluate an OLPS strategy is its portfolio cumulative wealth
at the end of trading periods. As we set the initial wealth, S0 = 1 and thus Sn also
denote the portfolio cumulative return, which is the ratio of final portfolio cumulative
wealth divided by its initial wealth. Another equivalent criterion, which considers the compounding effect, is the annualized percentage yield (APY), that is, APY = S_n^{1/y} − 1, where y is the number of years corresponding to the n periods.§ APY measures the average wealth increment that a strategy could achieve in a year. Typically, the higher the portfolio cumulative wealth or annualized percentage yield, the better the strategy's performance.
Besides the absolute return metrics, it is also important to evaluate a strategy’s
risk and risk-adjusted return (Sharpe 1963, 1994). One common criterion is the annu-
alized standard deviation of portfolio period returns to measure volatility risk and
∗ That is, if one has $100 stock (down or collateral) one can borrow at most $100 cash (loan).
† Such a measure is not perfect since it manually changes the margin component, although less than
5 per dataset. One may refer to Györfi et al. (2012, Chapter 4) for other solutions to the possibility of ruin.
‡ For example, assume two assets with price relatives of (1.1, 0.9). After adjustment, the price relative
vector becomes (1.1, 0.9, 1.2, 0.8). Putting wealth on the latter two margin components, the portfolio’s
profit or loss magnifies. That is, 10% profit (1.1) becomes 20% (1.1 × 2 − 1 − c) and 10% loss (0.9) also
becomes 20% (0.9 × 2 − 1 − c). Note that the portfolio vector representing the proportions of capital is
still a simplex.
§ One year consists of 252 trading days or 50 trading weeks.
the annualized Sharpe ratio (SR), SR = (APY − R_f)/σ_p, where R_f is the risk-free return† and σ_p is the annualized standard deviation.
higher the annualized SR, the better the strategy’s (volatility) risk-adjusted return is.
The portfolio management community often conducts drawdown analysis (Magdon-Ismail and Atiya 2004) to measure the decline from a historical peak of portfolio cumulative wealth. Formally, a strategy's drawdown (DD) at period t is defined as DD(t) = sup[0, sup_{i∈(0,t)} S_i − S_t]. Its maximum drawdown (MDD) is the maximum of the drawdowns over all periods and effectively measures a strategy's downside risk. Formally, the maximum drawdown for a horizon of n is defined as MDD(n) = sup_{t∈(0,n)} DD(t).
Moreover, practitioners also adopt the Calmar ratio (CR) (Young 1991) to measure a strategy's drawdown risk-adjusted return:

CR = APY / MDD.

The smaller the maximum drawdown, the more drawdown risk the strategy can tolerate. The higher the Calmar ratio, the better the strategy's (drawdown) risk-adjusted return.
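As a sketch of these return and drawdown metrics (the wealth path and the 252-day year are illustrative; DD and MDD are measured in wealth units, following the definitions above):

```python
# APY, maximum drawdown (MDD), and Calmar ratio (CR) for an illustrative
# wealth path S_0, S_1, ..., S_n.

def apy(final_wealth, n_periods, periods_per_year=252):
    """APY = S_n**(1/y) - 1, where y is the number of years in n periods."""
    y = n_periods / periods_per_year
    return final_wealth ** (1 / y) - 1

def max_drawdown(wealth_path):
    """MDD = max over t of (running peak of S minus S_t), floored at 0."""
    peak, mdd = float("-inf"), 0.0
    for s in wealth_path:
        peak = max(peak, s)
        mdd = max(mdd, peak - s)
    return mdd

def calmar_ratio(wealth_path, periods_per_year=252):
    n = len(wealth_path) - 1
    return apy(wealth_path[-1], n, periods_per_year) / max_drawdown(wealth_path)

path = [1.00, 1.05, 1.02, 1.10, 1.04, 1.12]
print(apy(path[-1], len(path) - 1), max_drawdown(path), calmar_ratio(path))
```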
To test whether simple luck can generate the return achieved by a strategy, portfo-
lio management practitioners (Grinold and Kahn 1999) can conduct statistical tests.
Since all test datasets are just samples of the market population, such tests help assess whether a strategy's performance will generalize. We conduct a Student's t-test to determine the likelihood
that the observed profitability is due to chance alone (under the assumption that a
strategy is not profitable in the population). Since the sample profitability is being
compared with no profitability, 0 is subtracted from the sample mean profit/loss. Note
that (daily) profit/loss equals (daily) return minus 1. The standard error of mean is cal-
culated as the standard deviation divided by square root of the number of periods. The
t-statistic is the sample mean profit‡ divided by the standard error of the mean. Finally, the p-value of the t-statistic can be calculated with degrees of freedom equal to the number of periods minus 1. Note that the Student's t-test assumes that the underlying
distribution of data is normal. According to the central limit theorem, as the sample
size increases, the distribution of the sample mean approaches normal. If a sample
∗ Here, 252 denotes the average number of annual trading days. For other frequencies, we can choose
their corresponding numbers.
† Typically, it equals the return of Treasury bills, and we fix it at 4% per year, or 0.000159 per day.
‡ Suppose we compare the sample profit mean with 0.
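The t-statistic described above can be sketched as follows (daily profit/loss = daily return − 1; the return series is made up for illustration):

```python
# t-statistic for mean daily profit/loss against zero:
# t = mean / (stdev / sqrt(n)), with n - 1 degrees of freedom.
import math
import statistics

def profit_t_statistic(daily_returns):
    pnl = [r - 1.0 for r in daily_returns]             # daily profit/loss
    mean = statistics.mean(pnl)
    sem = statistics.stdev(pnl) / math.sqrt(len(pnl))  # standard error of mean
    return mean / sem

returns = [1.01, 0.99, 1.02, 1.00, 1.01, 1.03, 0.98, 1.02]
print(profit_t_statistic(returns), "df =", len(returns) - 1)
```

The p-value would then come from the Student's t distribution with n − 1 degrees of freedom (e.g., via scipy in practice); it is omitted here to keep the sketch dependency-free.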
2.4 Summary
Online portfolio selection (OLPS) is a fundamental and practical computational
finance problem. It can be mathematically formulated as a sequential decision task that aims to determine the best sequence of portfolios to maximize the investment goals in
the long run. It has been extensively studied in the literature, and recent years have wit-
nessed a rapid growth of fruitful research achievements. The next part will introduce
a family of important principles widely used for solving this challenging task.
Principles
Table II.1 (Continued) Principles and representative online portfolio selection algorithms

Classifications | Algorithms | Representative References
Pattern matching–based approaches | Nonparametric Histogram Log-Optimal Strategy | Györfi et al. (2006)
 | Nonparametric Kernel-Based Log-Optimal Strategy | Györfi et al. (2006)
 | Nonparametric Nearest Neighbor Log-Optimal Strategy | Györfi et al. (2008)
 | Correlation-Driven Nonparametric Learning Strategy | Li et al. (2011a)
 | Nonparametric Kernel-Based Semi-Log-Optimal Strategy | Györfi et al. (2007)
 | Nonparametric Kernel-Based Markowitz-Type Strategy | Ottucsák and Vajda (2007)
 | Nonparametric Kernel-Based GV-Type Strategy | Györfi and Vajda (2008)
Meta-algorithms | Aggregating Algorithm | Vovk (1990); Vovk and Watkins (1998)
 | Fast Universalization Algorithm | Akcoglu et al. (2005)
 | Online Gradient Updates | Das and Banerjee (2011)
 | Online Newton Updates | Das and Banerjee (2011)
 | Follow the Leading History | Hazan and Seshadhri (2009)

Source: Li and Hoi (2014).
Benchmarks
where b · x denotes the inner product b^T x. The BAH strategy with the uniform portfolio b_1 = (1/m, . . . , 1/m) is referred to as the uniform BAH strategy, which is usually adopted as a market strategy to produce a market index.†
∗ For example, assuming two assets with a price relative vector of (2, 1) and a portfolio of (0.5, 0.5) at the beginning of a period, the actual weights at the end of the period become (0.5 × 2, 0.5 × 1)/(0.5 × 2 + 0.5 × 1) = (0.67, 0.33).
† Market index can also be calculated using other methods, such as capitalization weighted index and
market share weighted index.
S_n(CRP(b)) = Π_{t=1}^n (b^T x_t).

One special CRP with the uniform portfolio b = (1/m, . . . , 1/m) is named the Uniform Constant Rebalanced Portfolios (UCRP) strategy. Another special CRP is the optimal offline† CRP strategy, whose portfolio can be calculated as

b* = arg max_{b∈Δ_m} S_n(CRP(b)) = arg max_{b∈Δ_m} Π_{t=1}^n (b^T x_t),

which is a convex optimization problem and can be solved efficiently. The CRP with b* is denoted as the Best Constant Rebalanced Portfolios (BCRP) strategy, which achieves a final cumulative wealth of S_n(BCRP) = max_{b∈Δ_m} Π_{t=1}^n (b^T x_t).
Note that BCRP is a hindsight strategy, which can only be calculated with com-
plete market sequences. Cover (1991) proved that BCRP is the best strategy in an
independent and identically distributed (i.i.d.) market and showed its benefits as a
target, that is, BCRP exceeds the Best Stock strategy, Value Line Index (geometric
mean of asset returns), and Dow Jones Index (arithmetic mean of asset returns, or
BAH). In addition, BCRP is invariant under permutations of market sequence, that
is, it does not depend on the order in which x1 , x2 , . . . , xn occur.
One desired theoretical result for an OLPS algorithm is universality (Cover 1991;
Ordentlich 2010). An algorithm Alg is universal if the average (external) regret
(Stoltz and Lugosi 2005; Blum and Mansour 2007) for n periods asymptotically
approaches 0, that is,

(1/n) regret_n(Alg) = (1/n)(log S_n(BCRP) − log S_n(Alg)) → 0 as n → ∞. (3.1)
In other words, for an arbitrary sequence of price relatives, a universal algorithm
asymptotically approaches the same exponential growth rate as the BCRP strategy.
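As an illustration of BCRP and the average-regret definition (3.1), the following sketch brute-forces the BCRP for two assets by grid search over the simplex and then measures the average regret of the uniform CRP on toy data (a purely illustrative search, not the book's solver; all numbers are made up):

```python
# Grid-search BCRP for m = 2 assets, plus the average regret (3.1) of UCRP.
import math

xs = [(1.05, 0.97), (0.96, 1.06), (1.04, 0.99), (0.98, 1.03)]  # toy price relatives

def crp_wealth(b, price_relatives):
    """S_n(CRP(b)) = prod_t (b . x_t)."""
    w = 1.0
    for x in price_relatives:
        w *= b[0] * x[0] + b[1] * x[1]
    return w

def bcrp_wealth(price_relatives, steps=1000):
    """Best constant rebalanced portfolio via grid search on the 2-simplex."""
    return max(crp_wealth((i / steps, 1 - i / steps), price_relatives)
               for i in range(steps + 1))

s_bcrp = bcrp_wealth(xs)
s_ucrp = crp_wealth((0.5, 0.5), xs)
avg_regret = (math.log(s_bcrp) - math.log(s_ucrp)) / len(xs)
print(s_bcrp, s_ucrp, avg_regret)
```

For m > 2 assets, practical implementations solve the convex log-wealth maximization with a proper optimizer rather than a grid.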
Since CRP rebalances to a fixed portfolio each period, its frequent transactions
will incur high transaction costs. Helmbold et al. (1998) proposed a Semi-Constant
Rebalanced Portfolio, which rebalances on selected periods rather than every period.
∗ CRP differs from BAH as the former actively rebalances to a predefined portfolio for every period,
while the latter does not rebalance during the entire trading period. However, the portfolio holding of BAH
passively changes as the stock prices fluctuate.
† Contrary to the online case, offline assumes that all price relatives over the n periods are available.
The first principle, follow the winner, is characterized by increasing the weights of
more successful experts or stocks. Rather than targeting the market or the best stock, algorithms in this category often aim to track the BCRP strategy; that is, their target is to be universal.
This chapter is organized as follows. Section 4.1 introduces Cover’s universal
portfolios (UP) algorithm, and Section 4.2 details the exponential gradient (EG) algo-
rithm. Sections 4.3 and 4.4 introduce the follow the leader (FTL) and follow the
regularized leader (FTRL) approaches, respectively. Finally, Section 4.5 summarizes
the follow the winner principle.
∗An FOF holds a portfolio of other investment funds, rather than directly investing in stocks, futures,
etc.
Note that at the beginning of period t + 1, one CRP manager's wealth (historical performance) equals S_t(b) dμ(b). Incorporating the initial wealth S_0 = 1, the final cumulative wealth is the weighted average of the CRP managers' wealth (Cover and Ordentlich 1996, Eq. (24)):

S_n(UP) = ∫_{Δ_m} S_n(b) dμ(b). (4.1)
One special case is that μ equals the uniform distribution; the portfolio update then reduces to Cover's UP (Cover 1991, Eq. (1.3)). Another special case is the Dirichlet(1/2, . . . , 1/2)-weighted UP (Cover and Ordentlich 1996), which is proved to achieve a better allocation.
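The weighted-average idea behind Eq. (4.1) can be sketched by Monte Carlo: sample CRP managers b from the uniform μ over the simplex, weight each by its historical wealth S_t(b), and average their portfolios. This is an illustrative approximation of the update, not an exact implementation, and the data are made up:

```python
# Monte Carlo approximation of Cover's UP: the next portfolio is the
# wealth-weighted average of sampled CRP portfolios.
import random

def sample_simplex(m, rng):
    """Uniform sample from the m-simplex via normalized exponentials."""
    draws = [rng.expovariate(1.0) for _ in range(m)]
    total = sum(draws)
    return [d / total for d in draws]

def crp_wealth(b, history):
    w = 1.0
    for x in history:
        w *= sum(bi * xi for bi, xi in zip(b, x))
    return w

def universal_portfolio(history, m, n_samples=20000, seed=0):
    rng = random.Random(seed)
    num, den = [0.0] * m, 0.0
    for _ in range(n_samples):
        b = sample_simplex(m, rng)
        w = crp_wealth(b, history)     # S_t(b): this manager's wealth so far
        den += w
        for i in range(m):
            num[i] += w * b[i]
    return [v / den for v in num]      # wealth-weighted average portfolio

history = [(1.10, 0.95), (1.08, 0.97)]   # asset 1 keeps winning
print(universal_portfolio(history, 2))   # tilts toward asset 1
```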
Alternatively, if a loss function is defined as the negative logarithm of the portfolio return, Cover's UP is actually an exponentially weighted average forecaster (Cesa-Bianchi and Lugosi 2006). The regret (Cover 1991) achieved by Cover's UP is O(m log n), and its time complexity is O(n^m), where m denotes the number of stocks and n refers to the number of periods. Cover and Ordentlich (1996) proved that the Dirichlet(1/2, . . . , 1/2)-weighted UP has the same order of regret bound, but a better constant term (Cover and Ordentlich 1996, Theorem 2).
As Cover’s UP is based on an ideal market model, one research direction is to
extend the algorithm to handle various realistic assumptions. Cover and Ordentlich
(1996) considered side information, including experts’ opinions and fundamental
data. Cover and Ordentlich (1998) extended the algorithm to handle short selling and
margin, and Blum and Kalai (1999) took account of transaction costs.
Another research direction is to generalize Cover’s UP with different base classes,
rather than the CRP strategy. Jamshidian (1992) generalized the algorithm for con-
tinuous time markets and presented its long-term performance. Vovk and Watkins
(1998) applied the aggregating algorithm (AA) (Vovk 1990) to a finite number of
arbitrary investment strategies, of which Cover’s UP becomes a specialized case
when applied to an infinite number of CRPs. Ordentlich and Cover (1998) analyzed
the minimal ratio of final wealth achieved by any nonanticipating investment strategy
to that of BCRP and presented a strategy to achieve such an optimal ratio. Cross and
Barron (2003) generalized Cover's UP from the CRP strategy class to any parameterized target class and proposed a computationally favorable universal strategy. Akcoglu
et al. (2005) extended Cover’s UP from a parameterized CRP class to a wide class
of investment strategies, including trading strategies operating on a single stock and
portfolio strategies operating on the whole stock market. Kozat and Singer (2011) proposed a similar universal algorithm based on the class of semiconstant rebalanced portfolios.
where R(b, bt ) denotes a regularization term and η > 0 denotes a learning rate. One
straightforward interpretation is to track the best stock in the last period while keeping
previous portfolio information via a regularization term.
Helmbold et al. (1998) proposed the EG strategy, which is based on the same
algorithm for mixture estimation (Helmbold et al. 1997). Following Equation 4.2,
EG adopts relative entropy as its regularization term, that is,
R(b, b_t) = Σ_{i=1}^m b_i log(b_i / b_{t,i}).
EG’s formulation is convex in b; however, it is hard to solve since the log function is
nonlinear. Thus, the authors adopted log’s first-order Taylor expansion at bt , that is,
log(b · x_t) ≈ log(b_t · x_t) + (x_t / (b_t · x_t)) · (b − b_t).
Then the nonlinear log term becomes linear and the optimization is easy to solve.
Solving the optimization, we can obtain EG’s update rule as
b_{t+1,i} = b_{t,i} exp(η x_{t,i} / (b_t · x_t)) / Z, i = 1, . . . , m,
where Z denotes the normalization term such that the portfolio weights sum to 1.
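A minimal sketch of this multiplicative update (η and the price relatives are made-up toy values):

```python
# Exponential gradient (EG) portfolio update:
# b_{t+1,i} = b_{t,i} * exp(eta * x_{t,i} / (b_t . x_t)) / Z.
import math

def eg_update(b, x, eta=0.05):
    ret = sum(bi * xi for bi, xi in zip(b, x))        # b_t . x_t
    unnormalized = [bi * math.exp(eta * xi / ret) for bi, xi in zip(b, x)]
    z = sum(unnormalized)                             # normalization term Z
    return [u / z for u in unnormalized]

b = [0.5, 0.5]
for x in [(1.10, 0.95), (1.07, 0.98)]:
    b = eg_update(b, x)
print(b)  # weight shifts toward the better-performing first asset
```

With η = 0 the portfolio is unchanged; larger η shifts weight more aggressively toward assets with high recent returns.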
b_{t+1} = b*_t = arg max_{b∈Δ_m} Σ_{τ=1}^t log(b · x_τ). (4.3)
Intuitively, this category follows the BCRP leader over the known periods, and the ultimate leader is the BCRP over all periods.
∗ In the case of η = 0, b_{t+1,i} = b_{t,i} = · · · = b_{1,i} = 1/m.
b_{t+1} = arg max_{b∈Δ_m} Σ_{τ=t−W+1}^t log(b · x_τ),
b_{t+1} = arg max_{b∈Δ_m} [Σ_{τ=1}^t log(b · x_τ) − (β/2) R(b)], (4.4)
where β denotes a trade-off parameter and R(b) is the regularization term on b. Note
that the first term includes all historical information; thus, the regularization term only
relates to the next portfolio, which is different from the EG algorithm. One typical regularization is the L2-norm, that is, R(b) = ‖b‖².
with A_t = Σ_{τ=1}^t (x_τ x_τ^T)/(b_τ · x_τ)² + I_m and c_t = (1 + 1/β) Σ_{τ=1}^t x_τ/(b_τ · x_τ), where β is a trade-off parameter, δ is a scaling term, I_m denotes the m × m identity matrix, and Π^{A_t}_{Δ_m}(·) is an exact projection onto the simplex domain.
ONS's regret bound is O(m^{1.5} log(mn)), which is slightly worse than that of Cover's UP. Since it iteratively updates the first- and second-order information, it costs O(m³) per period, which is independent of the number of periods. In total, its time cost is O(m³n).
While FTRL focuses on the worst-case investing, Hazan and Kale (2009, 2012)
linked the worst-case investing with practically widely used average-case investing,
that is, the geometric Brownian motion (GBM) model (Bachelier 1900; Osborne 1959;
Cootner 1964). The authors designed an investment strategy that is universal in the
worst case and is capable of exploiting the GBM model. The algorithm, or so-called
Exp-Concave-FTL, follows a similar formulation to ONS, that is,
b_{t+1} = \arg\max_{b \in \Delta_m} \sum_{\tau=1}^{t} \log(b \cdot x_\tau) - \frac{1}{2}\|b\|^2.
The optimization problem can be efficiently solved via online convex optimization,
which typically requires a high time complexity (i.e., similar to that of ONS). If the
stock price follows the GBM model, the regret bound becomes O(m \log Q), where Q
is the quadratic variability, calculated as n − 1 times the sample variance of the price
relative vectors. Since Q is typically much smaller than n, the regret bound is
significantly improved over the previous O(m \log n).
Besides the improved regret bound, the authors also discussed the relationship
between their algorithm and trading frequency. The authors asserted that increasing
the trading frequency would decrease the variance of minimum-variance CRP, while
the regret stays the same. Therefore, it is expected to see improved performance
as the trading frequency increases, which is empirically observed by Agarwal et al.
(2006).
Das and Banerjee (2011) further extended the FTRL approach to a generalized meta-
algorithm (MA) termed online Newton update (ONU), which guarantees that the overall performance
is no worse than any convex combination of the base experts.
The Best Constant Rebalanced Portfolios (BCRP) strategy is optimal if the market
is independent and identically distributed (i.i.d.; Cover 1991); however, this assump-
tion may not fit the real market and thus may lead to the inferior performance of the
“follow the winner” category. Rather than tracking the winners, the follow the loser
approach is characterized by transferring wealth from winners to losers. The
underlying assumption is the mean reversion (contrarian) idea (Bondt and Thaler
1985), which holds that good (poor)-performing assets will perform poorly (well)
in the subsequent periods. Thus, follow the loser approaches transfer capital from
good-performing assets (winners) to poor-performing assets (losers). Although this
principle is heavily investigated in finance journals, it has not been widely
disseminated in the area of online portfolio selection. However, some algorithms do
follow this principle. One famous example is the CRP benchmark. Moreover, Cover’s
UP, which buys and holds CRP strategies, can also be viewed as a follow the loser
approach from the underlying stocks’ perspective, while we categorize it as follow
the winner from the experts’ perspective.
This chapter is organized as follows. Section 5.1 illustrates the mean rever-
sion idea, which is the key underlying the “follow the loser” principle. Section 5.2
introduces a representative strategy in this category, or the Anticor strategy. Finally,
Section 5.3 summarizes the follow the loser principle.
the same after 2n periods. However, BCRP in hindsight can achieve a growth rate of
(5/4)^n over n trading periods.
Now let us analyze BCRP’s behaviors to show the underlying mean reversion
trading idea (Table 5.1). Suppose the initial portfolio is (1/2, 1/2), and at the end of
period 1 the close-price-adjusted portfolio distribution becomes (1/5, 4/5), while
cumulative wealth increases by a factor of 5/4. At the beginning of period 2, the
portfolio manager rebalances to the initial portfolio (1/2, 1/2) by transferring wealth
from the better-performing asset (B) to the worse-performing asset (A). At the
beginning of period 3, the wealth transfer with the mean reversion trading idea
continues. Although the market strategy gains nothing, BCRP achieves a growth rate
of 5/4 per period with the underlying mean reversion trading idea, which assumes
that if one asset performs worse, it tends to perform better in the subsequent trading
period. It actually profits from the volatility of the market, the so-called volatility
pumping (Luenberger 1998, Chapter 15).
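The volatility-pumping effect in this two-asset example is easy to verify numerically. A minimal simulation, assuming the alternating price relatives (1/2, 2) and (2, 1/2) implied by the discussion above:

```python
import numpy as np

# Alternating price relatives: each asset halves then doubles.
x = np.array([[0.5, 2.0], [2.0, 0.5]] * 5)   # 10 periods
b = np.array([0.5, 0.5])                     # rebalance to (1/2, 1/2) daily

wealth = 1.0
for x_t in x:
    wealth *= b @ x_t                        # each period multiplies wealth by 5/4

buy_and_hold = np.prod(x[:, 0])              # the market strategy gains nothing
```

After 10 periods the rebalanced portfolio grows by (5/4)^10 ≈ 9.3, while buying and holding either asset ends exactly where it started.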
Though extensive studies in finance show that mean reversion is a plausible idea
to be used in trading (Chan 1988; Poterba and Summers 1988; Lo and MacKinlay
1990; Conrad and Kaul 1998), its counterintuitive nature hides it from the OLPS
community. While the “follow the winner” strategies are sound in theory, they often
perform poorly when using real data, which will be shown in the empirical studies
in Part IV. Perhaps the reason is that their momentum principle does not fit the real
market, especially on the tested trading frequency (such as daily). It is thus natural
to utilize the mean reversion idea in developing new strategies so as to boost the
empirical performance.
5.2 Anticorrelation
Borodin et al. (2004) proposed a follow the loser strategy named Anticorrelation
(Anticor). Unlike Cover’s UP, which makes no distributional assumptions, Anticor
assumes that the market follows the mean reversion principle. To exploit the property,
it statistically makes bets on the consistency of positive lagged cross-correlation and
negative autocorrelation.
Following the mean reversion principle, Anticor transfers weights from the assets
increased more to the assets increased less, and the corresponding amounts are
adjusted by the cross-correlation matrix. In particular, if asset i increases more than
asset j and they are positively correlated, Anticor claims a transfer from asset i to
j with the amount equaling the cross-correlation (Mcor (i, j )) minus their negative
auto-correlation (min{0, Mcor (i, i)} and min{0, Mcor (j, j )}). Finally, these claims
are normalized to keep the portfolio in the simplex domain.
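The claim-and-transfer mechanism can be sketched as follows. This is a simplified reading of Borodin et al. (2004), not their exact procedure: correlations are computed between two consecutive log price-relative windows with population (ddof = 0) moments, and each asset's claims are normalized before the transfer:

```python
import numpy as np

def anticor_transfer(b, LX1, LX2):
    """One Anticor-style weight transfer; LX1, LX2 are w-by-m matrices of
    log price relatives for two consecutive market windows."""
    w, m = LX1.shape
    mu1, mu2 = LX1.mean(axis=0), LX2.mean(axis=0)
    # Cross-correlation matrix between the two windows.
    cov = (LX1 - mu1).T @ (LX2 - mu2) / w
    denom = np.outer(LX1.std(axis=0), LX2.std(axis=0))
    Mcor = np.divide(cov, denom, out=np.zeros((m, m)), where=denom > 0)
    # Claim a transfer i -> j when i grew more and they are positively correlated.
    claim = np.zeros((m, m))
    for i in range(m):
        for j in range(m):
            if i != j and mu2[i] > mu2[j] and Mcor[i, j] > 0:
                claim[i, j] = Mcor[i, j] - min(0, Mcor[i, i]) - min(0, Mcor[j, j])
    b_new = b.astype(float).copy()
    for i in range(m):
        total = claim[i].sum()
        for j in range(m):
            if total > 0:
                amount = b[i] * claim[i, j] / total  # normalized claims
                b_new[i] -= amount
                b_new[j] += amount
    return b_new

# Toy windows: asset 0 rises then falls, asset 1 falls then rises.
LX1 = np.array([[0.1, -0.1], [0.2, -0.2]])
LX2 = np.array([[-0.1, 0.1], [-0.2, 0.2]])
b_new = anticor_transfer(np.array([0.5, 0.5]), LX1, LX2)
```

In the toy example, asset 1 outperformed in the second window and is positively cross-correlated with asset 0, so all of its weight is claimed and transferred, and the result remains on the simplex.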
Given its mean reversion nature, it is difficult to obtain a useful regret bound for
Anticor. Although heuristic and without theoretical guarantees, Anticor empirically
outperformed all other strategies at the time. On the other hand, though Anticor obtains
good performance, its heuristic nature cannot fully exploit mean reversion. Thus,
exploiting the property via systematic learning algorithms is highly desired, which
motivates one part of our research.
5.3 Summary
Although counterintuitive, the follow the loser principle is quite useful in obtaining
a high cumulative return in the empirical studies. This may be attributed to the fact
that many financial research studies have validated that the market behaviors follow
the mean reversion principle. Thus, to better exploit the market, a trading strategy has
to incorporate the market behaviors. We further propose three novel mean reversion-
based algorithms in Chapters 9, 10, and 11, respectively.
Pattern Matching
Besides follow the winner and follow the loser, another category utilizes both
winners and losers, and it is based on pattern matching. This category mainly covers
nonparametric sequential investment strategies, which guarantee an optimal growth
of capital under minimal assumptions on the market, that is, stationary and ergodic
of the financial time series. Based on nonparametric prediction (Györfi and Schäfer
2003), this category consists of several pattern matching–based investment strate-
gies (Györfi et al. 2006, 2007, 2008; Li et al. 2011a). Note that in the data-mining
communities, some researchers focus on detecting important signals or patterns in
time series (Mcinish and Wood 1992; Berndt and Clifford 1994; Agrawal and Srikant
1995; Srikant and Agrawal 1996; Ting et al. 2006; Cañete et al. 2008; Du et al. 2009),
which is beyond our discussion.
In general, the pattern matching–based approaches (Györfi et al. 2006) consist of
two steps, that is, the sample selection and portfolio optimization steps. Suppose we are
choosing a portfolio for period t + 1. First, the sample selection step selects a set Ct of
similar historical indices, whose corresponding price relatives will be used to predict
the next one. Then, each price relative vector x_i, i ∈ C_t, is assigned a probability
P_i, i ∈ C_t. Existing methods often choose the uniform probability P_i = 1/|C_t|, where | · |
denotes the cardinality of a set. Second, the portfolio optimization step learns an
optimal portfolio based on the selected set, that is,

b_{t+1} = \arg\max_{b \in \Delta_m} \sum_{i \in C_t} P_i\, U(b \cdot x_i), \qquad (6.1)

where U(·) is a specified utility function, such as the log utility. In case of an empty
sample set, a uniform portfolio is adopted.
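The sample selection step can be sketched generically; here it uses a Euclidean ball over flattened w-period windows, in the spirit of the kernel-based methods discussed later, with the radius as an assumed parameter and made-up data:

```python
import numpy as np

def select_similar(history, w, radius):
    """Return indices i whose preceding w-period window is within `radius`
    (Euclidean distance) of the latest w-period window."""
    n = history.shape[0]
    latest = history[n - w:].ravel()
    return [i for i in range(w, n)
            if np.linalg.norm(history[i - w:i].ravel() - latest) <= radius]

history = np.array([[1.1, 0.9], [0.9, 1.1], [1.0, 1.0], [1.1, 0.9], [0.9, 1.1]])
C_t = select_similar(history, w=2, radius=0.2)
# the price relatives history[i], i in C_t, feed the portfolio optimization step
```

In this toy history, only index 2 is selected, because the window preceding it matches the latest two periods exactly.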
In this chapter, we concretize the sample selection step in Section 6.1 and the port-
folio optimization step in Section 6.2. We finally combine the two steps to formulate
specific online portfolio selection algorithms in Section 6.3. Based on this principle,
we further propose the correlation-driven nonparametric learning (CORN) algorithm
in Chapter 8.
Maximizing the above function results in a BCRP portfolio (Cover 1991) over the
similar price relatives.
Györfi et al. (2007) introduced semi-log-optimal utility function, which approx-
imates log utility in Equation 6.1 aiming to release its computational complexity;
and Vajda (2006) presented corresponding theoretical analysis and proved its
universality. The semi-log-optimal utility function is defined as
U_S(b, C_t) = E\{f(b \cdot x) \mid x_i, i \in C_t\} = \sum_{i \in C_t} P_i f(b \cdot x_i),
where f (·) is the second-order Taylor expansion of \log z at z = 1, that
is,

f(z) = (z - 1) - \frac{1}{2}(z - 1)^2.
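The quality of this quadratic approximation over a typical daily price-relative range is easy to check numerically:

```python
import numpy as np

z = np.linspace(0.9, 1.1, 201)           # typical daily price-relative range
f = (z - 1) - 0.5 * (z - 1) ** 2         # second-order Taylor expansion of log z
max_err = np.max(np.abs(f - np.log(z)))  # worst-case approximation error
```

The worst-case error stays below 10^-3 on this range, which is why the semi-log-optimal objective is a cheap but faithful substitute for the log utility.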
Györfi et al. (2007) adopted a uniform probability P_i = 1/|C_t|; thus, equivalently,

U_S(b, C_t) = \frac{1}{|C_t|} \sum_{i \in C_t} f(b \cdot x_i).
In any of the above procedures, if the similarity set is non-empty, we can obtain an
optimal portfolio based on the similar price relatives and their assumed probability. In
the case of an empty set, we can choose either a uniform portfolio or the last portfolio.
6.3 Combinations
Finally, let us combine the two steps and describe specific algorithms in the pattern
matching–based approach. Table 6.1 summarizes all existing combinations.
One default utility function is the log-optimal function. Györfi and Schäfer (2003)
introduced the nonparametric histogram-based log-optimal investment strategy (BH ),
which combines the histogram-based sample selection and log-optimal utility func-
tion. Györfi et al. (2006) presented the nonparametric kernel-based log-optimal
investment strategy (BK ), which combines the kernel-based sample selection and
log-optimal utility function. Györfi et al. (2008) proposed the nonparametric near-
est neighbor log-optimal investment strategy (BNN ), which combines the nearest
neighbor sample selection and log-optimal utility function.
Besides the log-optimal utility function, several algorithms using different util-
ity functions have been proposed. Györfi et al. (2007) proposed the nonparametric
kernel-based semi-log-optimal investment strategy (BS ) by combining the kernel-
based sample selection and semi-log-optimal utility function, which greatly eases
Table 6.1 Pattern matching–based approaches: sample selection and portfolio optimization

                            Sample Selection Techniques
Portfolio Optimization    Histogram          Kernel             Nearest Neighbor
Log-optimal               B_H: C_H + U_L    B_K: C_K + U_L     B_NN: C_N + U_L
Correlation-driven        —                 CORN               —
Semi-log-optimal          —                 B_S: C_K + U_S     —
Markowitz-type            —                 B_M: C_K + U_M     —
GV-type                   —                 B_GV: C_K + U_R    —

Note: —, no algorithm in the combinations.
6.4 Summary
This chapter summarizes the pattern matching–based principle, which mainly includes
pattern-matching and portfolio optimization steps. Empirically, these algorithms
exploit recurring patterns over the history and produce good empirical performance.
One of its key problems is to identify the recurring patterns, which leads to our CORN
strategy in Chapter 8.
Meta-Learning
∗ FOF selects portfolios on different fund managers, rather than on assets. For example, an FOF
manager may evenly split his fund, and put one part to fund A and the other to Fund B.
7.5 Summary
Meta-learning is another widely discussed principle in the research of online portfolio
selection (OLPS). It derives from base algorithms but treats these experts as the
underlying assets. Thus, from this aspect, meta-algorithms (MAs) can be widely
applied to all strategies discussed in previous chapters. We are interested in this
principle because practical trading systems usually contain multiple strategies, and
meta-learning can be used to combine these strategies in an effective way.
Algorithms
Correlation-Driven Nonparametric
Learning
As described in Part II, several approaches have been proposed to select portfolios
from financial markets. The pattern matching–based approach, which is intuitive in
nature, can achieve best performance at the present time. However, one key chal-
lenge to this approach is to effectively locate a set of trading days whose price
relative vectors are similar to the coming one. As detailed in Section 6.1, existing
strategies often adopt Euclidean distance to measure the similarity between two pre-
ceding market windows. Euclidean distance can somehow measure the similarity;
however, it simply considers the neighborhood of the latest market windows and
ignores the linear or nonlinear relationship between two market windows, which is
important for price relative estimation. In this chapter, we propose to exploit similar
patterns via a correlation coefficient, which effectively measures the linear relation-
ship, and further propose a novel pattern matching–based online portfolio selection
algorithm “CORrelation-driven Nonparametric learning” (CORN) (Li et al. 2011a).
The proposed CORN algorithm can better locate a similarity set, and thus can output
portfolios that are more effective than existing pattern matching–based strategies.
Moreover, we also proved CORN’s universal consistency,∗ which is a nice property
for the pattern matching–based algorithms. Further, in Part IV, we will extensively
evaluate the algorithm on several real stock markets, where the encouraging results
show that the proposed algorithm can easily beat both market index and best stock
substantially (without or with small transaction costs) and also surpass a variety of
the state-of-the-art techniques significantly.
This chapter is organized as follows. Section 8.1 motivates the proposed correla-
tion metric for selecting similarity sets. Section 8.2 details the ideas of the proposed
online portfolio selection algorithm, and then Section 8.3 illustrates the proposed
algorithms. Section 8.4 proves CORN’s universal consistency and further analyzes
the proposed algorithms. Finally, Section 8.5 summarizes this chapter and indicates
future directions.
∗ In our problem setting, there are no cash or risk-free assets. In reality, a weaker constraint (e.g., at
most, 90% of capital can be put in assets), may appear in mutual funds.
† Because their first asset is more favorable than the second one, which is different from the latest x_{t-2}^{t-1}.
‡ The radius is arbitrarily chosen to limit the number of selected price relatives.
[Figure: (a) historical market windows (A1, A2, B1, B2) and the latest market window x_{t-2}^{t-1}, plotted on the price-relative scale (0.80–1.20) over periods i−2, i−1, …, t−2, t−1; (b) the latest market window; (c) the similarity table below.]

Similarity to x_{t-2}^{t-1}    A1      A2      B1      B2      C1      C2
Euclidean distances            0.45    0.42    0.14    0       0.32    0.28
Similar? (Y/N)                 N       N       Y       Y       N       N
Correlation coefficients       −1      1       −1      1       −1      1
Similar? (Y/N)                 N       Y       N       Y       N       Y
measure will classify B1 and B2 into the similarity set, since they are both located
within the Euclidean ball of x_{t-2}^{t-1} (with a radius of 0.2). Such a classification is
clearly suboptimal, as it includes the harmful B1 and excludes the beneficial A2 and C2. As a
consequence of the imperfect similarity set, the subsequent portfolio optimization
will considerably suffer from irrelevant or even harmful market windows (such as
market window B1) and the neglect of beneficial market windows (such as market
windows A2 and C2).
8.2 Formulations
The proposed algorithm is mainly inspired by the idea of exploiting statistical correla-
tions between two market windows, and also driven by the consideration of exploring
the powerful nonparametric learning techniques to effectively optimize a portfolio.
Traditional portfolio selection methods in finance often try to estimate a target
function based on past data and build portfolios based on the learned function. How-
ever, since the financial market is complex and accurate modeling of its movements
is a difficult task, we adopt a nonparametric learning approach (or instance-based
learning, or case-based learning) (Aha 1991; Aha et al. 1991; Cherkassky and Mulier
1998). Nonparametric learning makes no assumptions on data distribution (or mar-
ket distribution), and it captures the knowledge from stored training data without
building any target functions. In particular, at the beginning of every period, the pro-
posed algorithm locates similar price relatives among all past price relatives, and then
maximizes the expected multiplicative portfolio return directly based on the similar
appearances. Without estimating any global functions of the market movements, the
proposed algorithm estimates a target value of next price relative.
To overcome the limitation of Euclidean distance in mining historical market
windows and the negligence of whole-market movements in all existing strategies,
we propose to employ the Pearson product–moment correlation coefficient, which is
an effective tool for measuring statistical linear relationships. Note that it measures
the statistical correlations between market windows of all assets, rather than pairs
of assets as Anticor does. Since market windows of all assets represent the whole-
market movements in a period, they could be more effective to match the similar price
relatives regarding the whole market.
We can now define a correlation-similar set that contains the historical trading days
whose previous market windows are statistically correlated with the latest one, and
formally define it as

C_t(w, \rho) = \left\{ w < i < t + 1 : \frac{\operatorname{cov}\left(x_{i-w}^{i-1}, x_{t-w+1}^{t}\right)}{\operatorname{std}\left(x_{i-w}^{i-1}\right) \operatorname{std}\left(x_{t-w+1}^{t}\right)} \geq \rho \right\},
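Under this definition, the correlation-similar set can be computed directly with the sample correlation over flattened windows. A sketch, with made-up data; degenerate windows with zero standard deviation are skipped:

```python
import numpy as np

def corn_similar_set(history, w, rho):
    """Indices i whose preceding w-window correlates at least rho with the
    latest w-window (both windows flattened across assets)."""
    n = history.shape[0]
    latest = history[n - w:].ravel()
    C = []
    for i in range(w, n):
        window = history[i - w:i].ravel()
        if window.std() > 0 and latest.std() > 0 and \
                np.corrcoef(window, latest)[0, 1] >= rho:
            C.append(i)
    return C

history = np.array([[1.1, 0.9], [0.9, 1.1], [1.0, 1.0], [1.1, 0.9], [0.9, 1.1]])
C_t = corn_similar_set(history, w=2, rho=0.5)
```

Unlike the Euclidean-ball test, the correlation test is invariant to the scale of the windows and responds to their co-movement pattern.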
8.3 Algorithms
Next, we present the proposed CORN algorithm, which exploits the correlation-
similar set in optimizing portfolios for actively rebalancing.
We start by defining a set of W × P experts, each expert indexed by (w, ρ), that is,
\{E(w, \rho) : w \geq 1, -1 \leq \rho \leq 1\},

where W represents the maximum window size and P represents the number of
correlation coefficient thresholds. Each expert E(w, ρ) represents a CORN expert learning
algorithm and outputs a portfolio, denoted as E(w, ρ) = b(w, ρ).
As summarized in Algorithm 8.1, a CORN expert learning algorithm consists of
two major steps. The first step, as illustrated in Section 8.2, is to locate a correlation-
similar set via the correlation coefficient metric, and the second step is to obtain an
optimized portfolio that can maximize the expected return, which is the main target
of our research. After calculating the correlation-similar set Ct (w, ρ) at the end of
period t, we propose to learn an optimal portfolio following the idea of BCRP (Cover
1991), which maximizes the expected multiplicative return over the sequence of
similar price relatives, that is,

b_{t+1}(w, \rho) = \arg\max_{b \in \Delta_m} \prod_{i \in C_t(w, \rho)} (b \cdot x_i). \qquad (8.1)

The final portfolio is a performance-weighted combination of the experts’ portfolios,

b_{t+1} = \frac{\sum_{w,\rho} q(w, \rho)\, S_t(w, \rho)\, b_{t+1}(w, \rho)}{\sum_{w,\rho} q(w, \rho)\, S_t(w, \rho)}, \qquad (8.2)

where b_{t+1}(w, ρ) represents the portfolio computed by expert E(w, ρ) and S_t(w, ρ)
represents the cumulative wealth achieved by expert E(w, ρ) till period t. For an
individual expert, the higher its historical return, the higher the weight assigned to it
in the final portfolio.
Therefore, it is straightforward that the cumulative wealth achieved by the pro-
posed CORN strategy after n periods is equivalent to a q-weighted sum of all experts’
returns,
S_n = \sum_{w,\rho} q(w, \rho)\, S_n(w, \rho). \qquad (8.3)
Clearly, the final cumulative return is affected by all underlying experts, and the
portions of contributions made by each expert are determined by the predefined
distribution q(w, ρ) and expert’s performance Sn (w, ρ).
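The q-weighted aggregation underlying Equation 8.3 can be sketched as follows; the prior weights, expert wealths, and expert portfolios here are illustrative inputs:

```python
import numpy as np

def combine_experts(q, S, B):
    """Wealth-weighted combination of expert portfolios.
    q: prior weights, S: experts' cumulative wealth, B: k-by-m portfolios."""
    w = q * S
    return (w[:, None] * B).sum(axis=0) / w.sum()

q = np.array([0.5, 0.5])                  # uniform prior (CORN-U style)
S = np.array([2.0, 1.0])                  # expert 0 earned twice as much
B = np.array([[1.0, 0.0], [0.0, 1.0]])    # each expert's current portfolio
b_next = combine_experts(q, S, B)         # tilted toward the richer expert
```

With a uniform prior, the combined portfolio simply tracks each expert in proportion to its cumulative wealth.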
Ideally, indexed by (w, ρ), we can choose CORN experts such that they cover all
possible parameter settings, thus eliminating their effects. However, the computational
cost of such a combination is prohibitively high. To boost efficiency, we can choose
finite discrete dimensions of the parameters, that is, a specified number of (w, ρ)
combinations.
The selection of experts also trades off an individual expert’s performance and its
computational time. First, Equation 8.3 clearly shows that each expert contributes to
the final cumulative wealth by its performance; thus, choosing a worse expert may
lower the final performance. Second, the mixture’s computation time is generally the
summation of all experts’ individual time. In other words, choosing too many experts,
which cost too much time, may affect its practical scalability.
In this study, we first adopted uniform combination, which chooses a uniform
distribution of q(w, ρ), and named it “CORN uniform combination” (CORN-U).
Algorithm 8.2 shows the details of the proposed CORN-U algorithm. In particular,
we assign the same weights to all CORN experts, although the weights can be adjusted
if we have more information. Moreover, CORN-U only considers P = 1 and chooses
a specific value of ρ.
The above uniform combination algorithm may include some poor experts, lead-
ing to the degradation of overall performance. To overcome such limitations, the
second algorithm, “CORN top-K combination” (CORN-K), combines only the top
K best experts. Algorithm 8.3 illustrates the proposed CORN-K algorithm. In partic-
ular, it chooses the top K experts with the highest historical returns and uniformly
combines them. That is, the strategy assigns the set of top K experts a uniform distri-
bution q(w, ρ) = K1 , while the weights assigned to other experts are simply set to 0.
Moreover, for the proposed CORN-K algorithm, we define P ≥ 1 associated experts,
each of which has a different ρ value.
end
Combine experts’ portfolios:

b_{t+1} = \frac{\sum_{w} q(w, \rho)\, S_t(w, \rho)\, b_{t+1}(w, \rho)}{\sum_{w} q(w, \rho)\, S_t(w, \rho)}

end
end
end
end
end
Combine top K experts’ portfolios:

b_{t+1} = \frac{\sum_{w,\rho} q(w, \rho)\, S_t(w, \rho)\, b_{t+1}(w, \rho)}{\sum_{w,\rho} q(w, \rho)\, S_t(w, \rho)}

end
end
end

If η = 1, then one gets the rule in Equation 8.2. The proof of universal consistency of
B_H, B_K, and B_NN works without any difficulties if η ≤ 1. However, the experimental
results are superior if η is much larger than 1, but there is no theoretical support for this
phenomenon. A large η corresponds to CORN-K with K = 1 (i.e., this rule is the
follow the winner rule, which is called follow the leader in the machine-learning
literature) (Cesa-Bianchi and Lugosi 2006). It would be nice to prove or disprove that
the follow the leader aggregation results in universally consistent strategies (i.e.,
asymptotically it is growth optimal for any stationary and ergodic market process).
Theorem 8.1 The portfolio scheme CORN is universal with respect to the class of
all ergodic processes such that E\{|\log X_j|\} < \infty, for j = 1, \ldots, m.
8.5 Summary
This chapter proposed a novel “CORrelation-driven Nonparametric learning”
(CORN) strategy for online portfolio selection, which effectively exploits the sta-
tistical correlations hidden in stock markets, and benefits from the exploration of
powerful nonparametric learning techniques. The proposed CORN algorithm is sim-
ple in nature and easy to implement, and has parameters that are easy to set. It also
enjoys the universal consistency property. Our empirical studies on real markets, in
Part IV, show that CORN can substantially beat the market index and the best stock,
and also consistently surpasses a variety of state-of-the-art algorithms.
Currently, the proposed CORN can capture the linear relationship between two
market windows, and it is possible to further capture their nonlinear relationship.
Although high return strategies are often associated with high risk, it would be more
attractive to develop a strategy that can manage the risk properly without slashing
too much return. As an extension to this work, we are currently developing such risk-
limiting strategies for CORN. In the future, we plan to investigate theoretical insights
of the algorithm and examine its extensions to improve the performance with high
transaction costs.
This chapter proposes a novel online portfolio selection (OLPS) strategy named
“passive–aggressive mean reversion” (PAMR) (Li et al. 2012). Unlike traditional
trend-following approaches, the proposed approach relies upon the mean reversion
relation of financial markets. We are the first to devise a loss function that reflects the
mean reversion principle. Further equipped with passive–aggressive online learning
(Crammer et al. 2006), the proposed strategy can effectively exploit mean reversion.
By analyzing PAMR’s update scheme, we find that it nicely trades portfolio return
with volatility risk and reflects the mean reversion principle. We conduct extensive
numerical experiments in Part IV to evaluate the proposed algorithms on various real
datasets. In most cases, the proposed PAMR strategy outperforms all benchmarks and
almost all state-of-the-art strategies under various performance metrics. In addition
to superior performance, the proposed PAMR runs extremely fast and thus is very
suitable for real-life online trading applications.
This chapter is organized as follows. Section 9.1 briefly reviews the ideas of
existing trend-following strategies and motivates the proposed strategy. Section 9.2
formulates the proposed PAMR strategy, and Section 9.3 derives the algorithms.
Section 9.4 further analyzes and discusses the algorithms. Finally, Section 9.5
summarizes this chapter and indicates future directions.
9.1 Preliminaries
9.1.1 Related Work
One popular trading idea in reality is trend following or momentum, which assumes
that historically outperforming stocks would still perform better than others in future.
Some existing algorithms, such as EG and ONS, approximate the expected loga-
rithmic daily return and logarithmic cumulative return, respectively, using historical
price relatives. Though this idea is easy to understand and makes fortunes for many
of the best traders and investors, trend following is hard to implement effectively. In
addition, in the short term, the stock price relatives may not follow previous
trends (Jegadeesh 1990; Lo and MacKinlay 1990).
9.1.2 Motivation
The proposed approach is motivated by CRP (Cover and Gluss 1986), which
adopts the mean reversion trading idea. As shown in Chapter 5, the mean reversion
principle has not been widely investigated for OLPS.
Another motivation of the proposed algorithm is that, in financial crisis, all
stocks drop synchronously or certain stocks drop significantly. Under such situations,
active rebalancing may be inappropriate since it puts too much wealth on “mine”
stocks, such as Bear Stearns‡ during the subprime crisis. To avoid potential risk
concerning such “mine” stocks, it is better to stick to a previous portfolio, which
∗ From the expert level, UP follows the winner. However, since its experts are CRP strategies, it also follows
the loser at the stock level. In the preceding survey, we classified it according to the expert level.
† Back-test refers to testing a trading strategy via historical market data.
‡ Bear Stearns was a US company whose stock price collapsed in September 2008.
9.2 Formulations
Now we shall formally devise the proposed PAMR strategy for the OLPS task.
PAMR is based on a loss function that exploits the mean reversion idea, which is
our innovation, and is equipped with the PA online learning technique (Crammer
et al. 2006).∗
First of all, given a portfolio vector b and a price relative vector x_t, we define an
ε-insensitive loss function for the t-th period as

\ell_\epsilon(b; x_t) = \begin{cases} 0 & b \cdot x_t \leq \epsilon, \\ b \cdot x_t - \epsilon & \text{otherwise}, \end{cases} \qquad (9.1)
∗ In fact, with the loss function, we can adopt any learning methods to exploit the mean reversion
property. We choose PA for its simplicity and effectiveness. Certainly, other learning techniques can be
adopted, if the new method can provide some new insights.
b_{t+1} = \arg\min_{b \in \Delta_m} \frac{1}{2}\|b - b_t\|^2 \quad \text{s.t.} \quad \ell_\epsilon(b; x_t) = 0. \qquad (9.2)
where C is a positive parameter to control the influence of the slack variable on the
objective function. We refer to this parameter as an aggressiveness parameter similar
to PA learning (Crammer et al. 2006) and call this variant “PAMR-1.”
Instead of a linear slack variable, for the second variant, we modify the objec-
tive function by introducing a term that scales quadratically with respect to a slack
variable ξ, which results in the following optimization problem.
∗ Here we use simple gross return, as defined in Section 9.2. Financial literature often adopts simple net
return (Tsay 2002), which fluctuates around 0.
9.3 Algorithms
We now derive the solutions for the three PAMR formulations using standard tech-
niques from convex analysis (Boyd and Vandenberghe 2004) and present the proposed
PAMR algorithms. Specifically, the following three propositions summarize their
closed-form solutions.
Update portfolio:
    b_{t+1} = b_t - \tau_t (x_t - \bar{x}_t \mathbf{1})
Normalize portfolio:
end
9.4 Analysis
To reflect the mean reversion trading idea, we are interested in analyzing PAMR’s
update rules, which mainly involve portfolio bt+1 and step size τt . In particular, we
want to examine how the update rules are related to return and risk—the two most
important concerns in a portfolio selection task.
First of all, we analyze the portfolio update rule for the three algorithms, that is,

b_{t+1} = b_t - \tau_t (x_t - \bar{x}_t \mathbf{1}).

The step size τ_t is nonnegative, and \bar{x}_t denotes the mean return, or market return. The term x_t - \bar{x}_t \mathbf{1}
represents stock abnormal returns with respect to the market on period t. We can
further interpret it as a directional vector for the weight transfer. The negative sign
before the term indicates that the update scheme is consistent with our motivation, that
is, to transfer weights from outperforming stocks (with positive abnormal returns) to
underperforming stocks (with negative abnormal returns).
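A minimal sketch of this update in Python. The step-size formula τ_t = ℓ_ε/‖x_t − x̄_t·1‖² is the standard passive-aggressive solution for the basic variant (an assumption here, since the closed forms appear in the propositions); the PAMR-1/PAMR-2 variants additionally cap or smooth τ with the aggressiveness parameter C, and the result is projected back onto the simplex:

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, v.size + 1) > (css - 1))[0][-1]
    theta = (css[rho] - 1) / (rho + 1)
    return np.maximum(v - theta, 0)

def pamr_update(b, x, eps=0.5):
    """One PAMR step: suffer loss when today's return exceeds eps, then move
    against the abnormal-return direction x - mean(x)."""
    loss = max(0.0, b @ x - eps)
    d = x - x.mean()                      # abnormal returns w.r.t. the market
    tau = loss / (d @ d) if d @ d > 0 else 0.0
    return project_simplex(b - tau * d)

b_next = pamr_update(np.array([0.5, 0.5]), np.array([1.2, 0.8]))
# weight moves from the outperforming asset to the underperforming one
```

In the example, asset 0 outperformed the market, so the update shifts wealth toward asset 1, exactly the contrarian transfer described above.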
It is interesting that the second part of the update,

a_t = -\tau_t (x_t - \bar{x}_t \mathbf{1}),
coincides with the general form (Lo and MacKinlay 1990, Eq. (1)) of return-based
contrarian strategies (Conrad and Kaul 1998; Lo 2008), except a changing multi-
plier τt . This part represents an arbitrage (zero-cost) portfolio, since its elements
always sum to 0, that is, at · 1 = 0. Adding the arbitrage portfolio to the last portfolio,
bt , results in the next portfolio. The long elements of the arbitrage portfolio (at,i > 0)
increase the corresponding elements of the whole portfolio, and the short elements
(at,i < 0) decrease the corresponding elements. Such an explanation is similar to the
analysis in the last paragraph and connects PAMR’s update with the general form of
return-based contrarian strategies.
Besides, another important update is the step size τ_t, calculated as in Equations 9.6
through 9.8 for the three PAMR methods, respectively. The step size τ_t adaptively
controls the scale of the weight transfer.
9.5 Summary
In this chapter, we proposed a novel online portfolio selection (OLPS) strategy,
passive–aggressive mean reversion (PAMR). Motivated by the idea of mean rever-
sion and passive–aggressive online learning, PAMR either aggressively updates the
portfolio following mean reversion, or passively keeps the previous portfolio. PAMR
executes in linear time, making it suitable for online applications. We also find that
its update scheme is based on the trade-off between return and volatility risk, which
is ignored by most existing strategies. This interesting property connects the PAMR
strategy with modern portfolio theory, which may provide further explanation from
the aspect of finance.
The proposed algorithms are still far from perfect and may be improved in the
following aspects. First of all, though the universality property may not be required
in real investment, PAMR’s universality is still an open question. Second, PAMR
sometimes fails if mean reversion does not exist in the market components. Thus, it is
crucial to locate asset sets exhibiting mean reversion. Finally, PAMR’s formulations
ignore transaction costs. Thus, directly incorporating the issue into formulations may
improve PAMR’s practical applicability.
Empirical evidence (Borodin et al. 2004) shows that stock price relatives may follow
the mean reversion property, which has not been fully exploited by existing strategies.
Moreover, all existing online portfolio selection (OLPS) strategies only focus on the
first-order information of a portfolio vector, though second-order information may
also benefit a strategy. This chapter proposes a novel strategy named “confidence-
weighted mean reversion” (CWMR) (Li et al. 2011b, 2013). Inspired by the mean
reversion principle in finance and confidence-weighted (CW) online machine learning
technique (Crammer et al. 2008; Dredze et al. 2008), CWMR models the portfolio vec-
tor as a Gaussian distribution, and sequentially updates the distribution following the
mean reversion principle. Analysis of CWMR’s closed form updates clearly reflects
the mean reversion trading idea and the interaction of first-order and second-order
information. Extensive experiments, in Part IV, on various real markets show that
CWMR is able to effectively exploit the power of mean reversion and second-order
information, and is superior to the state-of-the-art techniques.
This chapter is organized as follows. Section 10.1 motivates the proposed CWMR
strategy. Section 10.2 formulates the strategy, and Section 10.3 derives the algorithms
based on the formulations. Section 10.4 further analyzes the algorithms. Finally,
Section 10.5 summarizes this chapter and indicates future directions.
10.1 Preliminaries
10.1.1 Motivation
The proposed method, similar to passive–aggressive mean reversion (PAMR), is based
on the mean reversion trading idea, which, in the context of portfolio or multiple assets,
implies that good-performing assets tend to perform worse than others in subsequent
periods, and poor-performing assets are inclined to perform better. Thus, to maximize
the next portfolio return, we could minimize the expected return with respect to
today’s price relatives since next price relatives tend to revert. This seems somewhat
counterintuitive, but, according to Lo and MacKinlay (1990), the effectiveness of
mean reversion is due to the positive cross-autocovariances across assets.
10.2 Formulations
We model b as a Gaussian distribution with mean μ ∈ Rm and diagonal covariance
matrix Σ ∈ Rm×m with nonzero diagonal elements and zero off-diagonal elements.
The i-th element of μ represents the proportion of the i-th asset. The i-th diagonal
term of Σ stands for the confidence in the i-th proportion: the smaller the diagonal
term, the higher the confidence we have in the corresponding element of μ.
At the beginning of period t, we draw a b from the distribution N(μ, Σ),
that is, b ∼ N(μ, Σ). Then, after xt is revealed, the wealth increases by a factor
of b⊤xt. It is straightforward that the return D = b⊤xt can be viewed as a random
variable with the following univariate Gaussian distribution:
D ∼ N(μ⊤xt, xt⊤Σxt).
Its mean is the return of the mean vector, and its variance is proportional to the projection
of xt on Σ.
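A quick Monte Carlo check of this distributional claim, with illustrative, arbitrarily chosen values for μ, Σ, and xt:

```python
import numpy as np

# Illustrative toy values; any mean vector, diagonal covariance, and
# price-relative vector would do.
rng = np.random.default_rng(0)
mu = np.array([0.6, 0.4])
Sigma = np.diag([0.04, 0.09])
x = np.array([1.1, 0.9])

# Draw b ~ N(mu, Sigma) and look at the induced return D = b'x.
b = rng.multivariate_normal(mu, Sigma, size=200_000)
D = b @ x

mean_theory = mu @ x         # mu' x
var_theory = x @ Sigma @ x   # x' Sigma x
```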
For simplicity, we write Pr[b⊤xt ≤ ε] instead. Note that we are considering the mean
reversion profitability in a portfolio consisting of multiple stocks; thus, this definition
is equivalent to the motivating idea of buying poor-performing stocks or, equivalently,
selling good-performing stocks.
The algorithm adjusts the distribution to ensure that the probability of a mean
reversion profitable b is higher than a confidence-level parameter θ ∈ [0, 1]:
Pr[b⊤xt ≤ ε] ≥ θ.
This is somewhat counterintuitive but reasonable with respect to the mean reversion
idea. If it is highly probable that the portfolio return b xt is less than a threshold, it
is also highly probable that its next return based on xt+1 tends to be higher since xt+1
will revert.
Then, following the intuition underlying PA algorithms (Crammer et al. 2006),
our algorithm chooses a distribution closest to the current distribution N (μt , t ) in
terms of Kullback–Leibler (KL) divergence (Kullback and Leibler 1951). As a result,
at the end of period t, the algorithm updates the distribution by solving the following
optimization problem.
The optimization problem (10.1) clearly reflects our motivation. On the one hand,
if the current μt is mean reversion profitable, that is, the first constraint is satisfied,
CWMR chooses the same distribution, resulting in a passive CRP strategy. On the
other hand, if μt does not satisfy the mean reversion constraint, CWMR tries to
figure out a new distribution, which is expected to profit and not far from the current
distribution.
Let us reformulate the objective and constraints. For the objective part, the KL
divergence between two Gaussian distributions can be rewritten as
KL(N(μ, Σ)‖N(μt, Σt)) = (1/2)(log(det Σt/det Σ) + Tr(Σt⁻¹Σ) + (μt − μ)⊤Σt⁻¹(μt − μ) − m).
Substituting μD and σD by their definitions and rearranging the terms, we obtain
ε − μ⊤xt ≥ φ √(xt⊤Σxt),
where φ = Φ⁻¹(θ) and Φ is the cumulative distribution function of the standard
normal. Clearly, we require that the weighted sum of the return and its standard
deviation be less than a threshold. We can now rewrite the preceding
optimization problem.
For the optimization problem (10.2), the first constraint is not convex in Σ; therefore,
we have two ways to handle it. The first way (Dredze et al. 2008) is to linearize it
by omitting the square root, that is, ε − μ⊤xt ≥ φ xt⊤Σxt. As a result, we can finalize
the first optimization problem, named CWMR-Var.
The second way (Crammer et al. 2008) is to decompose the covariance matrix as Σ = Υ², where
Σ = QΛQ⊤, Q is orthonormal, λ1, . . . , λm are the eigenvalues of Σ, and Υ = QΛ^(1/2)Q⊤; since Σ is PSD, Υ is also PSD. This
reformulation yields the second final optimization problem, named CWMR-Stdev.
Clearly, the revised optimization problem (10.2) is equivalent to the raw optimiza-
tion problem (10.1). From the revised problem, we proposed two final optimization
problems, Equations 10.3 and 10.4, which are convex and thus can be efficiently
solved by convex optimization (Boyd and Vandenberghe 2004). The first variation,
CWMR-Var, linearizes the constraint; thus, it results in an approximate solution for the
revised and the raw optimizations. In contrast, the second variation, CWMR-Stdev, is
equivalent to the revised optimization problem (10.2) and results in an exact solution
for both the revised and raw optimization problems.
Remarks on Formulations: Note that the short version of this chapter (Li et al.
2011b) assumes log utility (Bernoulli 1954; Latané 1959) on μ⊤xt and is slightly
different from this version (Li et al. 2013). Since both ε and φ are adjustable, they have
similar effects on μ. Assuming all parameters other than μ are constant, since μ⊤xt >
log μ⊤xt, the current linear form can move μ toward the mean reversion profitable
portfolio more than the log form can. However, the log form in this constraint causes
another convexity issue besides the standard deviation on the right-hand side. To solve
the optimization problem with log, Li et al. (2011b) chose to replace the log term by
its linear approximation, which may converge to a different solution. Moreover, the
current form and log's linear approximation are essentially the same.∗ Thus, we adopt
the return without log, which avoids the above convexity issues concerning log and its linear
approximation.
10.3 Algorithms
Now, let us devise the proposed algorithms based on the optimization problem,
Equations 10.3 and 10.4. Their solutions are shown in Propositions 10.1 and 10.2,
respectively. Both proofs are presented in Appendices B.3.1 and B.3.2, respectively.
Proposition 10.1 The solution to the final optimization problem (10.3) (CWMR-Var),
without considering the non-negativity constraint (μ ⪰ 0), is expressed as
∗ The current form is μ⊤xt. Li et al. (2011b) use the approximation log μ⊤xt ≈ log μt⊤xt + xt⊤(μ − μt)/(μt⊤xt).
Both terms are linear in μ, although their scales are different.
μt+1 = μt − λt+1 Σt(xt − x̄t 1),    Σt+1⁻¹ = Σt⁻¹ + 2λt+1 φ diag²(xt).
The solution to problem (10.4) (CWMR-Stdev), given in Proposition 10.2, shares the same μ
update, with Σt+1⁻¹ = Σt⁻¹ + λt+1 (φ/√Ut) diag²(xt).
∗ In investment, notional leverage denotes total holding assets plus total notional amount of liability
divided by equity. If shorting is allowed, the notional leverage equals Σi |bi| to 1. The problem setting in
our study is long-only, in which we do not allow shorting/margin; thus, the leverage is always 1 to 1.
† Long-only means no shorting/margin is allowed; thus, the notional leverage is always 1 to 1.
Mt = μt⊤xt,  Vt = xt⊤Σtxt,  Wt = xt⊤Σt1,  x̄t = (1⊤Σtxt)/(1⊤Σt1).
Update the portfolio distribution:
CWMR-Var:
  λt+1 as in Equation B.10 in Appendix B.3.1,
  μt+1 = μt − λt+1 Σt(xt − x̄t 1),
  Σt+1 = (Σt⁻¹ + 2λt+1 φ diag²(xt))⁻¹.
CWMR-Stdev:
  λt+1 as in Equation B.14 in Appendix B.3.2,
  Ut = (−λt+1 φ Vt + √(λt+1² φ² Vt² + 4Vt))/2,
  μt+1 = μt − λt+1 Σt(xt − x̄t 1),
  Σt+1 = (Σt⁻¹ + λt+1 (φ/√Ut) diag²(xt))⁻¹.
end
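A minimal sketch of one CWMR-Var distribution update, assuming λt+1 is supplied externally (in the book it solves Equation B.10) and omitting the simplex projection of μ:

```python
import numpy as np

def cwmr_var_update(mu, sigma_diag, x, lam, phi=2.0):
    """One CWMR-Var distribution update with diagonal covariance.

    lam (lambda_{t+1}) is assumed to be given; simplex projection of
    mu is omitted.  Because x_bar is the Sigma-weighted average of x,
    the update direction sums to zero and mu keeps summing to one.
    """
    mu = np.asarray(mu, float)
    sigma_diag = np.asarray(sigma_diag, float)
    x = np.asarray(x, float)
    # Confidence-weighted average price relative: (1' Sigma x) / (1' Sigma 1)
    x_bar = (sigma_diag * x).sum() / sigma_diag.sum()
    mu_new = mu - lam * sigma_diag * (x - x_bar)
    # Sigma_{t+1} = (Sigma_t^{-1} + 2 lam phi diag^2(x))^{-1}, elementwise
    sigma_diag_new = 1.0 / (1.0 / sigma_diag + 2.0 * lam * phi * x ** 2)
    return mu_new, sigma_diag_new
```

Note how every observation shrinks the variances, mirroring the growing confidence of CW learning, while the mean moves against the confidence-weighted excess return.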
10.4 Analysis
In this section, we analyze and interpret the proposed algorithms. Firstly, we compare
CWMR with CW learning (Crammer et al. 2008; Dredze et al. 2008). Then, we
analyze CWMR’s update schemes, that is, μ and , with running examples. Further,
we describe the behavior of stochastic CWMR. Finally, we show its computational
time and compare it with existing work.
The proposed CWMR algorithms are partially motivated by CW learning, thus
their formulations and subsequent derivations are similar. However, they address
different problems, as CWMR handles OLPS while CW focuses on classification.
Although both objectives adopt KL divergence to measure the closeness between two
distributions, their constraints reflect that they are oriented toward different problems.
To be specific, CW's constraint is the probability of a correct classification, while CWMR's is the probability of a mean reversion profitable portfolio.
Deterministic CWMR: bt = μt.
Stochastic CWMR: b̃t ∼ N(μt, Σt),  bt = arg min_{b∈Δm} ‖b − b̃t‖².
Straightforwardly, we can rewrite the mean update elementwise as μt+1,i = μt,i − λt+1 σ²t,i(xt,i − x̄t). Obviously,
λt+1 is non-negative and Σt is PSD. The term xt − x̄t1 denotes the excess return
vector for period t, where x̄t is the confidence-weighted average of xt. Holding other terms constant,
μt+1 moves away from μt against the last excess return; that is, an asset's proportion
decreases as its last excess return increases, which is the mean reversion principle. Meanwhile, these movements
are dynamically adjusted by λt+1, the last covariance matrix Σt, and mean μt, which
capture both first- and second-order information. To the best of our knowledge, none
of the existing algorithms has explicitly exploited the second-order information of b,
even though such information could benefit the proposed algorithms.
Let us continue to analyze Σ. With only nonzero diagonal elements, we can write
the update of the i-th variance (for CWMR-Var) as σ²t+1,i = 1/(1/σ²t,i + 2λt+1 φ x²t,i).
t | xt | bt | bt⊤xt | λt | xt − x̄t1 | diag(Σt) | μt
0 | – | – | – | – | – | (0.25, 0.25) | (0.5, 0.5)
1 | (1.0, 0.5) | (0.5, 0.5) | 0.75 | 40.78 | (0.25, −0.25) | (0.10, 0.40) | (0.0, 1.0)
2 | (1.0, 2.0) | (0.0, 1.0) | 2.00 | 61.61 | (−0.80, 0.20) | (0.40, 0.10) | (1.0, 0.0)
3 | (1.0, 0.5) | (1.0, 0.0) | 1.00 | 75.56 | (0.10, −0.40) | (0.10, 0.40) | (0.0, 1.0)
4 | (1.0, 2.0) | (0.0, 1.0) | 2.00 | 61.61 | (−0.80, 0.20) | (0.40, 0.10) | (1.0, 0.0)
5 | (1.0, 0.5) | (1.0, 0.0) | 1.00 | 75.56 | (0.10, −0.40) | (0.10, 0.40) | (0.0, 1.0)
⋮
Stochastic CWMR involves both random sampling and an additional projection, as sometimes the stochastic b may fall
outside the simplex domain. To better understand the two aspects, let us continue Cover's game in
Table 10.2. For the first case, let μ = (0.5, 0.5) and diag(Σ) = (0.25, 0.25). We draw
stochastic b 10,000 times, and the average b after projection is (0.5038, 0.4962)
(before projection, the value is (0.5070, 0.4993)), which slightly deviates from the
optimal mean and will result in different performance. For the second case, let μ =
(0, 1) and diag(Σ) = (0.1, 0.4). We draw and project 10,000 stochastic b, and get an
average b of (0.1391, 0.8609), which is far from the optimal mean (0, 1). In both cases,
stochastic CWMR tends to deviate from the optimal mean, and thus underperforms
the deterministic one, as shown in the related experiments (Li et al. 2013,
Table VII).
Since computational time is of crucial importance for certain trading scenarios,
such as high-frequency trading (Aldridge 2010), which can occur in fractions of a
second, we finally show CWMR’s time complexity. In the implementation, we only
consider the diagonal elements of ; thus, its inverse costs linear time. Moreover,
the projection (Line 3 in Algorithm 10.1) can be implemented in O(m) time (Duchi
et al. 2008). Thus, in total, CWMR algorithms (Algorithm 10.1) take O(m) time per
period. Straightforwardly, OLPS with CWMR (Algorithm 10.2) takes O(mn) time.
Table 10.3 compares CWMR’s time complexity with that of existing strategies.∗
Clearly, CWMR takes no more time than any others.
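For reference, a sorting-based variant of the simplex projection of Duchi et al. (2008) can be sketched as follows; this version costs O(m log m), while the one the book cites achieves expected linear time via median selection:

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex.

    Sorting-based variant: find the threshold theta such that
    max(v - theta, 0) sums to one.
    """
    v = np.asarray(v, dtype=float)
    u = np.sort(v)[::-1]                  # sort descending
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - (css - 1.0) / k > 0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)
```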
10.5 Summary
In this chapter, we proposed a novel online portfolio selection (OLPS) strategy named
confidence-weighted mean reversion (CWMR), which effectively learns portfolios
by exploiting the mean reversion property in financial markets and the second-order
information of a portfolio. CWMR’s update schemes are obtained by solving two
optimization problems that consider both first- and second-order information of a
portfolio vector, which goes beyond any existing approaches that only consider first-
order information. As shown in Part IV, the proposed approach beats a number of
∗ Nonparametric learning approaches (BK, BNN, and CORN) require solving a nonlinear optimization
each period, that is, bt+1 = arg max_{b∈Δm} Σi log(b⊤xi), whose time complexity is generally high. To produce
an approximate solution, batch gradient projection algorithms (Helmbold et al. 1997) take O(mn), while the
batch convex Newton method (Agarwal et al. 2006) takes O(m³n). In the table, we report the O(mn) time
complexity. In our implementation, we adopt the MATLAB Optimization Toolbox™ (function fmincon
with active-set) to obtain exact solutions.
∗ Long-short portfolios can have negative weights, which denote the short positions.
Empirical evidence shows that a stock’s high and low prices are temporary, and stock
price relatives are likely to follow the mean reversion phenomenon. While exist-
ing mean reversion strategies can achieve good empirical performance on many real
datasets, they often make a single-period mean reversion assumption, which is not
always satisfied, leading to poor performance on some real datasets. To overcome the
limitation, this chapter (Li et al. 2015) proposes a multiple-period mean reversion,
or the so-called moving average reversion (MAR), and a new online portfolio selec-
tion (OLPS) strategy named the online moving average reversion (OLMAR), which
exploits MAR by applying powerful online learning techniques. Our empirical eval-
uations in Part IV show that OLMAR can overcome the drawbacks of existing mean
reversion algorithms and achieve significantly better results, especially on the datasets
where existing mean reversion algorithms failed. In addition to superior performance,
OLMAR also runs extremely fast, further supporting its practical applicability to a
wide range of applications.
This chapter is organized as follows. Section 11.1 analyzes existing works and
motivates the proposed strategy. Section 11.2 formulates the strategy, and Section 11.3
solves the formulations and derives the algorithms. Section 11.4 further analyzes the
proposed algorithm. Finally, Section 11.5 summarizes this chapter and indicates future
directions.
11.1 Preliminaries
11.1.1 Related Work
Most existing formulations follow the basic routine of Kelly-based portfolio selec-
tion (Kelly 1956; Thorp 1971). In particular, a portfolio manager predicts x̃t+1 in terms
of k possible values x̃¹t+1, . . . , x̃ᵏt+1 and their corresponding probabilities p1, . . . , pk.
Note that each x̃ⁱt+1 denotes one possible combination vector of individual price rela-
tive predictions. Then, he or she can figure out a portfolio by maximizing the expected
log return,
bt+1 = arg max_{b∈Δm} Σ_{i=1}^{k} pi log(b · x̃ⁱt+1).
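This maximization can be sketched with a generic constrained solver; the scenario values below are illustrative, and SLSQP is our choice here, not the book's implementation:

```python
import numpy as np
from scipy.optimize import minimize

def kelly_portfolio(scenarios, probs):
    """Maximize sum_i p_i log(b . x~^i) over the simplex.

    scenarios: k x m array of predicted price-relative vectors.
    A generic SLSQP solve; small lower bounds keep the log finite.
    """
    scenarios = np.asarray(scenarios, float)
    probs = np.asarray(probs, float)
    m = scenarios.shape[1]
    obj = lambda b: -probs @ np.log(scenarios @ b)
    cons = ({'type': 'eq', 'fun': lambda b: b.sum() - 1.0},)
    res = minimize(obj, np.ones(m) / m, bounds=[(1e-9, 1.0)] * m,
                   constraints=cons, method='SLSQP')
    return res.x
```

On the two-scenario toy market of Cover and Gluss (cash plus a stock that doubles or halves with equal probability), the solver recovers the familiar (1/2, 1/2) split.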
The formulations of PAMR and CWMR ignore the log utility due to the single-value prediction
and the consideration of convexity and computation. Though all three algorithms
assume that all information is fully reflected by xt , their performance diverges and
supports that mean reversion may better explain the markets. On the one hand, even
with a decent theoretical result, EG always performs poorly. On the other hand, though
without theoretical guarantees, PAMR and CWMR have produced the best results in
certain real markets. However, when such a single-period mean reversion assumption
is not satisfied, PAMR and CWMR would suffer from dramatic failures (Li et al.
2012, Table 4, the DJIA dataset), which motivates the following approach.
11.1.2 Motivation
Empirical results (Li et al. 2011b, 2012) show that mean reversion, which assumes
the poor stock may perform well in the subsequent periods, may better explain the
markets. PAMR and CWMR can exploit the mean reversion property well and achieve
good results on most datasets at the time, especially on the New York Stock Exchange
benchmark dataset (Cover 1991). However, they rely on a naïve assumption that the next price relative will be the inverse of the last one.∗
∗ This assumption requires some transformations. That is, given xt ∈ R₊ᵐ, minimizing b · xt is equivalent
to maximizing b · (1/xt). The latter follows the analysis framework here.
Table 11.1 Summary of existing optimization formulations and their underlying predictions
x̃t+1 = 1/xt  ⟹  p̃t+1/pt = pt−1/pt  ⟹  p̃t+1 = pt−1.
Note that both x and p are vectors and the above operations are element-wise.
Though empirically effective on most datasets, PAMR and CWMR’s single-
period assumption causes two potential problems. Firstly, both algorithms suffer
from frequently fluctuating raw prices, as they often contain a lot of noise. Secondly,
their assumption of single-period mean reversion may not always be satisfied in the
real world. Even two consecutive declining price relatives, which are common, can
deactivate or fail both algorithms. One real example (Li et al. 2012) is the DJIA
dataset (Borodin et al. 2004), on which PAMR performs the worst among the state of
the art. Thus, traders are more likely to predict prices using some long-term values.
Also on the DJIA dataset, Anticor, which exploits the multiperiod statistical corre-
lation, performs much better than others. However, due to its heuristic nature (Li
et al. 2011b, 2012), Anticor cannot fully exploit the mean reversion property. The
two problems caused by the single-period assumption and Anticor’s inability to fully
exploit mean reversion call for a more powerful approach to effectively exploit mean
reversion, especially in terms of multiple periods.
Now let us see a classic example (Cover and Gluss 1986) to illustrate the draw-
backs of single-period mean reversion, as shown in Table 11.2. The toy market consists
of cash and one volatile stock, whose market sequence follows A. It is easy to prove
that the best constant rebalanced portfolio (BCRP) (b = (1/2, 1/2)) can grow by a factor
of (9/8)^(n/2), while PAMR can grow by a better factor of (3/2) × 2^((n−1)/2). Note that this
virtual sequence is essentially single-period mean reversion, which perfectly fits
PAMR and CWMR's assumption. However, if the market sequence does not satisfy such
an assumption, both PAMR and CWMR fail badly. Let us extend the market
sequence to a two-period reversion, that is, market sequence B. In such a market,
BCRP can achieve the same growth as before. Contrarily, PAMR can achieve only a con-
stant wealth of 3/2, which has no growth! More generally, if we further extend to k-period
mean reversion,∗ BCRP can still achieve the same growth, while PAMR will grow to
(3/2) × (1/2)^((n−1)×(1/2 − 1/k)), which definitely approaches bankruptcy if k ≥ 3.
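The BCRP growth factor on market sequence A can be verified numerically: rebalancing to (1/2, 1/2) against a stock that alternates between doubling and halving gains 3/2 then 3/4, i.e., 9/8 every two periods.

```python
import numpy as np

# Cover-Gluss toy market A: cash (relative always 1) plus a volatile
# stock alternating 2, 1/2.  Rebalance to b = (1/2, 1/2) every period.
n = 20
b = np.array([0.5, 0.5])
wealth = 1.0
for t in range(n):
    x = np.array([1.0, 2.0]) if t % 2 == 0 else np.array([1.0, 0.5])
    wealth *= b @ x  # per-period growth: 1.5, then 0.75
```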
11.2 Formulations
In this chapter, we adopt two types of moving average. The first, the so-called simple
moving average (SMA), truncates the historical prices via a window and calculates
its arithmetic average:
SMAt(w) = (1/w) Σ_{i=t−w+1}^{t} pi,
where w denotes the window size and the summation is element-wise. Although
we can enlarge the window size such that SMA can include more historical price
relatives, the empirical evaluations in Part IV show that as the window size increases,
its performance drops.
To consider entire price relatives rather than a window, the second type, exponen-
tial moving average (EMA), adopts all historical prices, and each price is exponentially
weighted,
EMA1(α) = p1,
EMAt(α) = αpt + (1 − α)EMAt−1(α)
        = αpt + (1 − α)αpt−1 + (1 − α)²αpt−2 + · · · + (1 − α)^(t−1) p1,
∗ We calculate OLMAR’s growth using Algorithm 11.1. As the market sequences repeat themselves,
OLMAR will finally stabilize.
where α ∈ (0, 1) denotes the decaying factor and the operations are all element-wise.
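Both moving averages can be sketched in a few lines; the MAR prediction x̃t+1 = SMAt(w)/pt shown last is our reading of the OLMAR-1 predictor, included here as an assumption for illustration:

```python
import numpy as np

def sma(prices, t, w):
    """Simple moving average of the last w price vectors, up to index t."""
    return prices[max(0, t - w + 1): t + 1].mean(axis=0)

def ema(prices, t, alpha):
    """Exponential moving average: EMA_t = alpha p_t + (1-alpha) EMA_{t-1}."""
    out = prices[0].astype(float)
    for i in range(1, t + 1):
        out = alpha * prices[i] + (1 - alpha) * out
    return out

def mar_prediction(prices, t, w):
    """Predicted next price relative x~_{t+1} = SMA_t(w) / p_t (assumed form)."""
    return sma(prices, t, w) / prices[t]
```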
Based on the expected price relative vector in Equations 11.1 and 11.2, OLMAR
further adopts the idea of an effective online learning algorithm, that is, passive–
aggressive (PA) (Crammer et al. 2006) learning, to exploit the MAR. Generally
proposed for classification, PA passively keeps the previous solution if the classi-
fication is correct, while aggressively approaches a new solution if the classification
is incorrect. After formulating the proposed OLMAR, we solve its closed-form update
and design specific algorithms.
The proposed formulation, OLMAR, exploits MAR via PA online learning.
The basic idea is to maximize the expected return b · x̃t+1 and keep last portfolio
information via a regularization term. Thus, we follow the similar idea of PAMR (Li
et al. 2012) and formulate an optimization as follows.
bt+1 = arg min_{b∈Δm} (1/2)‖b − bt‖²  s.t.  b · x̃t+1 ≥ ε.
Note that we adopt expected return rather than expected log return. According to
Helmbold et al. (1998), to solve the optimization with expected log return, one can
adopt the first-order Taylor expansion, which is essentially linear. Such discussions
are illustrated in Sections 9.2 and 10.2.
The above formulation explicitly reflects the basic idea of the proposed OLMAR.
On the one hand, if its constraint is satisfied, that is, the expected return is higher than
a threshold, then the resulting portfolio becomes equal to the previous portfolio. On
the other hand, if the constraint is not satisfied, then the formulation will figure out
a new portfolio such that the expected return is higher than the threshold, while the
new portfolio is not far from the last one.
Since OLMAR follows the same learning principle as PAMR, their formulations
are similar. However, the two formulations are essentially different. In particular,
PAMR’s core constraint (i.e., b · xt ≤ ) adopts the raw price relative and has a dif-
ferent inequality sign. After a certain transformation, PAMR may be written in a
similar form, as shown in Table 11.1. However, the prediction functions are different
(i.e., OLMAR adopts multiperiod mean reversion, while PAMR exploits single-period
mean reversion).
where x̄t+1 = (1/m)(1 · x̃t+1) denotes the average predicted price relative, and λt+1 is
the Lagrangian multiplier, calculated as
λt+1 = max{0, (ε − bt · x̃t+1)/‖x̃t+1 − x̄t+1 1‖²}.
Normalize bt+1:
  bt+1 = arg min_{b∈Δm} ‖b − bt+1‖²
end
We then project the portfolio onto the simplex domain (Duchi et al. 2008), which costs linear
time.
To this end, we can design the proposed algorithm based on the proposition.
The proposed OLMAR procedure is demonstrated in Algorithm 11.1, and the OLPS
procedure utilizing the OLMAR algorithm is illustrated in Algorithm 11.2.
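One OLMAR step can be sketched as follows; the simplex projection is omitted because the zero-sum update direction already preserves the weight sum (nonnegativity may still require the projection), and ε = 10 is an illustrative choice of threshold:

```python
import numpy as np

def olmar_update(b, x_pred, eps=10.0):
    """One OLMAR step: b_{t+1} = b_t + lambda_{t+1} (x~ - x_bar 1).

    x_pred is the predicted price-relative vector x~_{t+1}; eps is the
    return threshold epsilon.  Simplex projection omitted for brevity.
    """
    x_pred = np.asarray(x_pred, float)
    x_bar = x_pred.mean()
    denom = np.linalg.norm(x_pred - x_bar) ** 2
    lam = 0.0 if denom == 0 else max(0.0, (eps - b @ x_pred) / denom)
    return b + lam * (x_pred - x_bar)
```

As the analysis below notes, weight flows toward assets whose predicted relatives exceed the prediction average.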
11.4 Analysis
The update of OLMAR is straightforward, that is, bt+1 = bt + λt+1 (x̃t+1 − x̄t+1 1).
This second part of the update formula, +λt+1 (x̃t+1 − x̄t+1 1), coincides with the
general form (Conrad and Kaul 1998, Eq. (1)) of return-based momentum strategies,
except the varying λt+1 . Intuitively, the update divides assets into two groups by pre-
diction average. For assets in the group with higher predictions than average, OLMAR
increases their proportions; for other assets, OLMAR decreases their proportions. The
transferred proportions are related to the surprise of predictions over their average
value and the nonnegative Lagrangian multiplier. This is consistent with the normal
portfolio selection procedure, that is, to transfer the wealth to assets with a better
prospect to grow.
Clearly, the OLMAR update costs linear time per period with respect to m, and
the normalization step can also be implemented in linear time (Duchi et al. 2008).
To the best of our knowledge, OLMAR’s linear time is no worse than any existing
algorithms, which can be inferred from Table 10.3.
Empirical Studies
Implementations
12.1.1 Preprocess
This step aims to prepare trading environments. As existing datasets are often in MAT
files,∗ OLPS accepts datasets in MAT format. The dataset often contains an n × m
matrix, where n denotes the number of trading periods and m refers to the number of
assets. It is straightforward to incorporate market feeds† from real markets, such that
the toolkit can handle real-time data and conduct paper or even real trading.‡
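A minimal preprocessing sketch with SciPy; the variable name 'data' inside the MAT file is an assumption, as actual datasets may store the matrix under a different key:

```python
import numpy as np
from scipy.io import loadmat

def load_dataset(path, key='data'):
    """Load an n x m matrix of price relatives from a MAT file.

    Returns the matrix plus (n, m): number of periods and assets.
    The key 'data' is a hypothetical default, not a toolkit guarantee.
    """
    mat = loadmat(path)
    data = np.asarray(mat[key], dtype=float)
    n_periods, n_assets = data.shape
    return data, n_periods, n_assets
```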
12.1.3 Postprocess
After the algorithmic trading simulation, this step processes the results by providing
the following performance metrics:
• Cumulative return: The most widely used in related studies;
• Volatility and Sharpe ratio: Typically used to measure risk-adjusted return in the
investment industry;
• Drawdown and Calmar ratio: Used to measure downside risk and related risk-
adjusted return;
• T-test statistics: Tests whether a strategy’s return is significantly different from that
of the market.
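The first three groups of metrics can be sketched as follows; the annualization convention and the zero risk-free rate are illustrative assumptions, not the toolkit's exact settings:

```python
import numpy as np

def metrics(price_relatives, periods_per_year=252):
    """Post-processing sketch: cumulative wealth, annualized Sharpe
    ratio (risk-free rate assumed 0), maximum drawdown, and Calmar
    ratio from a sequence of per-period price relatives.
    """
    x = np.asarray(price_relatives, float)
    wealth = np.cumprod(x)                      # cumulative wealth curve
    returns = x - 1.0
    sharpe = np.sqrt(periods_per_year) * returns.mean() / returns.std()
    running_max = np.maximum.accumulate(wealth)
    mdd = ((running_max - wealth) / running_max).max()   # max drawdown
    ann_ret = wealth[-1] ** (periods_per_year / len(x)) - 1.0
    calmar = ann_ret / mdd if mdd > 0 else np.inf
    return {'cum_wealth': wealth[-1], 'sharpe': sharpe,
            'max_drawdown': mdd, 'calmar': calmar}
```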
12.2 Data
In our study, we focus on historical daily closing prices in stock markets, which are
easy to obtain from public domains (such as Yahoo Finance and Google Finance∗ ),
and thus are publicly available to other researchers. Data from other types of markets,
such as high-frequency intraday quotes† and Forex markets, are either too expensive
or hard to obtain and process, and thus may reduce the experimental reproducibil-
ity. Summarized in Table 12.1, six real and diverse datasets from several financial
markets‡ are employed.
The first dataset, “NYSE (O),” is one “standard” dataset pioneered by Cover
(1991) and followed by others (Helmbold et al. 1998; Borodin et al. 2004; Agarwal
et al. 2006; Györfi et al. 2006, 2008). This dataset contains 5651 daily price relatives
of 36 stocks§ in the New York Stock Exchange (NYSE) for a 22-year period from
July 3, 1962, to December 31, 1984.
The second dataset is an extended version of the NYSE (O) dataset. For consis-
tency, we collected the latest data in the NYSE from January 1, 1985, to June 30,
2010, a period that consists of 6431 trading days. We denote this new dataset as
“NYSE (N).”¶ Note that the new dataset consists of 23 stocks rather than the pre-
vious 36 stocks owing to amalgamations and bankruptcies. All self-collected price
relatives are adjusted for splits and dividends, which is consistent with the previous
“NYSE (O)” dataset.
The third dataset, “TSE,” is collected by Borodin et al. (2004), and it consists
of 88 stocks from the Toronto Stock Exchange (TSE) containing price relatives of
1259 trading days, ranging from January 4, 1994, to December 31, 1998. The fourth
dataset, SP500, is collected by Borodin et al. (2004), and it consists of the 25 stocks
with the largest market capitalizations among the S&P 500 components. It ranges from
January 2, 1998, to January 31, 2003, containing 1276 trading days.
The fifth dataset is “MSCI,” which is a collection of global equity indices that
constitute the MSCI World Index.∗ It contains 24 indices that represent the equity
markets of 24 countries around the world, and it consists of a total of 1043 trading
days, ranging from April 1, 2006, to March 31, 2010. The final dataset is the DJIA
dataset (Borodin et al. 2004), which consists of 30 Dow Jones composite stocks. DJIA
contains 507 trading days, ranging from January 14, 2001, to January 14, 2003.
Besides the six real-market datasets, in the main experiments (i.e., Experiment 1 in
Section 13.1), we also evaluate each dataset in their reversed form (Borodin et al.
2004). For each dataset, we create a reversed dataset, which reverses the original
order and inverts the price relatives. We denote these reverse datasets using a ‘−1’
superscript on the original dataset names. By nature, these reverse datasets are quite
different from the original datasets, and we are interested in the behaviors of the
proposed algorithms on such artificial datasets.
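Constructing a reversed dataset is a one-liner: reverse the period order and invert each price relative.

```python
import numpy as np

def reverse_dataset(price_relatives):
    """Create the reversed dataset from an n x m matrix of price
    relatives: reverse the period order and invert each entry."""
    x = np.asarray(price_relatives, float)
    return 1.0 / x[::-1]
```

A useful sanity check: the market's cumulative wealth on the reversed dataset is the reciprocal of its wealth on the original one.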
Unlike previous studies, the above testbed covers much longer trading peri-
ods from 1962 to 2010 and much more diversified markets, which enables us to
examine the behaviors of the proposed strategies under different events and crises.
For example, it covers several well-known events in the stock markets, such as the
∗ The constituents of the MSCI World Index are available on MSCI Barra ([Link]), accessed on 28 May 2010.
12.3 Setups
In our experiments, we implemented all the proposed approaches: CORN-U,
CORN-K, PAMR, PAMR-1, PAMR-2, CWMR-Var, CWMR-Stdev, OLMAR-1, and
OLMAR-2. For CWMR algorithms, we only present the results achieved by the
deterministic versions. The results of the stochastic versions are presented in Li et al.
(2013). Besides individual algorithms, we also designed their buy and hold (BAH)
versions whose results can be found on their respective studies (Li et al. 2011b,
2012, 2013; Li and Hoi 2012). Without ambiguity, when referring to CORN, PAMR,
CWMR, and OLMAR, we often focus on their representative versions, that is,
CORN-U, PAMR, CWMR-Stdev, and OLMAR-1, respectively.
As the proposed algorithms are all online, we follow the existing work and simply
set the parameters empirically without tuning for each dataset separately. Note that
the best values for these parameters are often dataset dependent, and our choices are
not always the best, as we will further evaluate in Section 13.3. Below, we introduce
the parameter settings of the proposed algorithms.
For the proposed CORN experts, two possible parameters can affect their perfor-
mance, that is, the correlation coefficient threshold ρ and the window size w. In our
evaluations, we simply fix ρ = 0.1 and w = 5 for the CORN-U algorithm, which is
not always the best choice. For the CORN-K algorithm, we first fix w = 5, P = 10,
and K = 50, which means all experts are chosen in the experiments; we denote this
setting as "CORN-K1." We also provide "CORN-K2," whose parameters are fixed as w = 5,
P = 10, and K = 5.
There are two key parameters in the proposed PAMR algorithms. One is the
sensitivity parameter ε, and the other is the aggressiveness parameter C. Specifically,
for all datasets and experiments, we set the sensitivity parameter ε to 0.5 in the
three algorithms, and set the aggressiveness parameter C to 500 in both PAMR-1
and PAMR-2, with which the cumulative wealth achieved tends to be stable on most
datasets. Our experiments on the parameter sensitivity show that the proposed PAMR
algorithms are quite robust with respect to different parameter settings.
CWMR has two key parameters, that is, the confidence parameter φ and the
sensitivity parameter ε. We set the sensitivity parameter ε to 0.5 and set the confidence
parameter φ to 2.0, or equivalently a 95% confidence level, in both CWMR-Var
and CWMR-Stdev. As the results show, the proposed CWMR algorithm is generally
robust to these parameter choices.
The smaller the drawdown, the less the strategy's (downside) risk; the higher the CR values, the better the strat-
egy's (downside) risk-adjusted return. We summarize them in Table 12.2 and present
their details as follows.
12.5 Summary
A strategy has to be back-tested using historical market data, such that we have
confidence that it will continue to be effective in the unseen future markets. This
chapter introduces some implementation issues for the empirical studies, including the
platform, data, and various setups. In future, we can further extend the online portfolio
selection (OLPS) system using real-market feeds and execute the orders using a
paper trading account or real trading account. The next chapter will demonstrate the
empirical results obtained from the implementation and corresponding back-tests.
Empirical Results
This chapter introduces the empirical results of the algorithms using the historical
market data. These results will demonstrate the effectiveness of these strategies
and provide confidence on their practicability in real trading. We also relax some
constraints to evaluate their capability in real trading scenarios.
This chapter is organized as follows. Section 13.1 conducts the experiments
to evaluate the cumulative wealth for all the algorithms. Section 13.2 shows the
experimental results of risk-adjusted returns. Section 13.3 measures the sensitivity
of parameters for these algorithms. Section 13.4 relaxes the transaction cost and
margin buying constraints. Section 13.5 compares the computational times for different
algorithms. Section 13.6 further analyzes the behaviors of the proposed algorithms.
Finally, Section 13.7 summarizes this chapter and proposes some future directions.
Figure 13.1 Trends of cumulative wealth achieved by various strategies on the six datasets:
(a) NYSE (O); (b) NYSE (N); (c) TSE; (d) SP500; (e) MSCI; and (f) DJIA.
the evaluation results on the six datasets. In addition to the proposed four algorithms,
we also plot two benchmarks (Market and BCRP) and two state-of-the-art algorithms
(Anticor and BNN). In particular, Figures 13.2a and 13.2b depict the volatility risk
(standard deviation of daily returns) and the drawdown risk (maximum drawdown)
on the six stock datasets. Figures 13.2c and 13.2d compare their corresponding SRs
and CRs.
In the preceding results on cumulative wealth, we find that the proposed methods
achieve the highest cumulative return on most original datasets. However, high return
is associated with high risk, as no real financial instruments can guarantee high return
Figure 13.2 Risk and risk-adjusted performance of various strategies on the six datasets.
In each diagram, the rightmost four bars represent the results of our proposed strategies:
(a) volatility risk; (b) drawdown risk; (c) Sharpe ratio; and (d) Calmar ratio.
without high risk.∗ The volatility risk in Figure 13.2a shows that the proposed four
methods achieve nearly the highest volatility risk on most datasets. On
the other hand, the drawdown risk in Figure 13.2b shows that the proposed methods
also incur high drawdown risk on most datasets. These results validate the notion
that high return is often associated with high risk.
To further evaluate the return and risk, we examine the risk-adjusted return in
terms of an annualized SR and CR. The results in Figure 13.2c and 13.2d clearly
show that CORN, PAMR, and CWMR achieve excellent performance in most cases,
∗ It is true for the long-only portfolio, which is our setting. However, such a statement may be suspect
in regard to long-short portfolios.
Figure 13.3 Parameter sensitivity of CORN-U with respect to ρ with fixed W (W = 5):
(a) NYSE (O); (b) NYSE (N); (c) TSE; (d) SP500; (e) MSCI; and (f) DJIA.
we choose ε = 0.5 in the experiments, with which the cumulative wealth stabilizes
in most cases. In contrast, on the DJIA dataset, as ε approaches 0, the cumulative
wealth achieved by PAMR drops. Such phenomena can be interpreted to mean that
the motivating (single-period) mean reversion does not exist on the dataset, at least in
Figure 13.4 Parameter sensitivity of CORN-U with respect to W with fixed ρ (ρ = 0.1):
(a) NYSE (O); (b) NYSE (N); (c) TSE; (d) SP500; (e) MSCI; and (f) DJIA.
the sense of our motivation. We also note that, on some datasets, PAMR with ε = 0
achieves the best performance. Though ε = 0 means moving more weight to underperforming
stocks, it does not mean moving everything to the worst stock. On the one hand, the
objectives in the formulations prevent the next portfolio from straying far from the
Figure 13.5 Parameter sensitivity of PAMR with respect to ε: (a) NYSE (O); (b) NYSE (N);
(c) TSE; (d) SP500; (e) MSCI; and (f) DJIA.
last portfolio. On the other hand, PAMR-1 and PAMR-2 are designed to alleviate
such drastic changes. In summary, the experimental results indicate that the proposed PAMR
is robust with respect to the mean reversion sensitivity parameter in most cases.
Second, we evaluate the other important parameter for both PAMR-1 and
PAMR-2, that is, the aggressiveness parameter C. Figure 13.6 shows the effects on the
Figure 13.6 Parameter sensitivity of PAMR-1 (or PAMR-2) with respect to C with fixed
ε (ε = 0.5): (a) NYSE (O); (b) NYSE (N); (c) TSE; (d) SP500; (e) MSCI; and (f) DJIA.
Figure 13.7 Parameter sensitivity of CWMR with respect to ε: (a) NYSE (O); (b) NYSE (N);
(c) TSE; (d) SP500; (e) MSCI; and (f) DJIA.
initialized to the uniform portfolio. If α = 0, then its expected price relative vector equals
x̃_{t+1} = ∏_{i=1}^{t} 1/x_i (element-wise). Such price relatives are inversely related
to all of an asset’s historical price relatives and produce bad results.
Nevertheless, all the above observations show that OLMAR’s performance is
robust to its parameters, and it is convenient to choose satisfactory parameters.
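The α = 0 degeneracy discussed above can be verified with a small sketch of OLMAR-2's exponential-moving-average prediction (a Python illustration; the recursion x̃_{t+1} = α·1 + (1 − α)·x̃_t/x_t and the initialization x̃_1 = 1 follow the OLMAR-2 description, while the function name is our own):

```python
import numpy as np

def olmar2_prediction(x_hist, alpha=0.5):
    """Sketch of OLMAR-2's moving-average price-relative prediction.

    x_hist: list of past price relative vectors x_1 .. x_t.
    Recursion: x~_{t+1} = alpha * 1 + (1 - alpha) * x~_t / x_t (element-wise).
    """
    x_tilde = np.ones_like(x_hist[0])
    for x in x_hist:
        x_tilde = alpha * np.ones_like(x) + (1 - alpha) * x_tilde / x
    return x_tilde
```

With alpha = 0 the recursion telescopes to the element-wise inverse of the cumulative price relatives, which is exactly the degenerate case the text warns about.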
Figure 13.8 Parameter sensitivity of OLMAR-1 with respect to ε with fixed w (w = 5):
(a) NYSE (O); (b) NYSE (N); (c) TSE; (d) SP500; (e) MSCI; and (f) DJIA.
Figure 13.9 Parameter sensitivity of OLMAR-1 with respect to w with fixed ε (ε = 10):
(a) NYSE (O); (b) NYSE (N); (c) TSE; (d) SP500; (e) MSCI; and (f) DJIA.
Figure 13.10 Parameter sensitivity of OLMAR-2 with respect to α with fixed ε (ε = 10):
(a) NYSE (O); (b) NYSE (N); (c) TSE; (d) SP500; (e) MSCI; and (f) DJIA.
First, transaction cost is an important and unavoidable issue that must be
addressed in practice. To test the effects of transaction costs on the proposed strategies,
we adopt the proportional transaction cost model stated in Section 2.2. Figure 13.11
depicts the effects of proportional transaction costs when the algorithms are applied
to the six datasets, where the transaction cost rate γ varies from 0% to 1%. We
present only the results achieved by three representative algorithms (CORN, PAMR,
Figure 13.11 Scalability of the proposed strategies with respect to the transaction cost rate (γ):
(a) NYSE (O); (b) NYSE (N); (c) TSE; (d) SP500; (e) MSCI; and (f) DJIA.
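The proportional transaction cost back-test can be sketched as follows (a hedged Python illustration; the convention of charging γ/2 per side on the turnover between the drifted portfolio and the newly chosen one is an assumption consistent with common formulations of the proportional cost model, and the function name is our own):

```python
import numpy as np

def backtest_with_costs(B, X, gamma=0.0):
    """Back-test wealth under a proportional transaction cost model (sketch).

    B: (n, m) portfolios chosen each day; X: (n, m) price relatives;
    gamma: proportional cost rate. Each rebalance from the price-drifted
    portfolio b_hat to b_t costs (gamma/2) * sum_i |b_{t,i} - b_hat_i|
    of current wealth.
    """
    wealth = 1.0
    b_hat = B[0]                       # first day: buy-in cost ignored here
    for b, x in zip(B, X):
        wealth *= 1.0 - (gamma / 2.0) * np.abs(b - b_hat).sum()
        wealth *= b @ x                # gross daily growth
        b_hat = (b * x) / (b @ x)      # portfolio drifts with prices
    return wealth
```

Setting gamma = 0 recovers the frictionless wealth ∏ b_t · x_t; increasing gamma penalizes strategies with higher daily turnover.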
From the results, we can clearly see that CORN and the state-of-the-art algorithms
have high computational costs, and in all cases the proposed PAMR, CWMR, and OLMAR take
significantly less computational time than the others. Even though the computational
time in daily back-tests, especially per trading day, is small, it matters in certain
scenarios such as high-frequency trading (Aldridge 2010), where transactions may
occur in fractions of a second. Nevertheless, the results clearly demonstrate the
computational efficiency of the three proposed mean reversion strategies, which further
enhances their real-world, large-scale applicability.
∗ Due to the table constraints, we use indices to represent individual assets, whose symbols are available
at [Link]
† We ignore TSE, which has too many assets (m = 88) to show.
‡ Uniform CRP will be constant at uniform portfolio. This is approachable but not achievable as λ > 0.
Asset #      1       2       3       4       5       6       7       8       9
Std       0.0135  0.0171  0.0128  0.0170  0.0140  0.0257  0.0156  0.0136  0.0371
Ac        0.1344  0.1206  0.0817  0.0952  0.0927  0.0378  0.0615  0.0479 −0.0217

Asset #     10      11      12      13      14      15      16      17      18
Cum       14.16   10.70    6.85    7.86    6.75    7.64   32.65   30.61   12.21
Mean      1.0005  1.0006  1.0005  1.0005  1.0004  1.0004  1.0009  1.0008  1.0005
Std       0.0115  0.0175  0.0159  0.0138  0.0137  0.0130  0.0224  0.0202  0.0134
Ac        0.1114  0.1312  0.0455  0.0766  0.0637  0.0744  0.0449  0.0626  0.0064

Asset #     19      20      21      22      23      24      25      26      27
Cum        4.81    8.92   17.22   10.36    4.13    6.21    4.31   22.92   14.43
Mean      1.0004  1.0010  1.0006  1.0005  1.0015  1.0004  1.0005  1.0010  1.0006
Std       0.0146  0.0346  0.0151  0.0148  0.0505  0.0149  0.0230  0.0313  0.0142
Ac        0.1301  0.0243  0.0956  0.1047 −0.2089 −0.0042  0.0226 −0.0915  0.1002

Asset #     28      29      30      31      32      33      34      35      36
Cum        5.98   15.21   54.14    6.98   16.20   43.13    4.25    6.54    5.39
Mean      1.0004  1.0006  1.0008  1.0004  1.0006  1.0008  1.0004  1.0005  1.0004
Std       0.0139  0.0161  0.0153  0.0117  0.0159  0.0174  0.0143  0.0178  0.0143
Ac        0.0858  0.0697  0.1004  0.0870  0.0880  0.1024  0.0873  0.0626  0.0257

Note: “Cum” denotes the cumulative return (product of all price relatives) of an asset. “Mean” refers to one asset’s
Figure 13.12 Distributions of portfolio weights. The x-axis denotes indices of assets, and the
y-axis is each asset’s average weight. For each asset, the center of an error bar denotes its
portfolio mean (over 5651 trading days), and vertical lines denote its standard deviations:
(a) BCRP; (b) EG; (c) ONS; (d) BK ; (e) BNN ; and (f) CORN. (Continued)
Figure 13.12 (Continued) Distributions of portfolio weights. The x-axis denotes indices of
assets, and the y-axis is each asset’s average weight. For each asset, the center of an error
bar denotes its portfolio mean (over 5651 trading days), and vertical lines denote its standard
deviations: (g) Anticor; (h) PAMR; and (i) OLMAR.
Li et al. 2011a) that the asset is important in all these approaches. Moreover, the
increasing top five weights, which indicate more active exploitation, may explain
their improved performance. However, their volatilities also show that the subsets of
assets change from day to day, which is inconvenient from the point of view
of transaction costs. In any case, such observations confirm that their pattern-matching
process is improving and validate CORN’s motivation.
The three mean reversion algorithms (Anticor, PAMR, and OLMAR) generally
concentrate on the top five volatile stocks, as shown in Figure 13.12g through i and
Table 13.6, while their orders may vary. Since Anticor, PAMR/CWMR, and OLMAR,
in general, achieve the best performance on most other datasets, we also plot their
average allocations in Table C.5,∗ in Appendix C. From the figure and tables, we can
make several observations. First, similar to the pattern matching–based approaches,
these algorithms have much higher volatilities than EG or ONS. However, different
from the pattern matching–based algorithms, which only have higher volatilities on
∗ We ignore their corresponding figures, which are similar to Figure 13.12.
BNN       Asset #    23     20      9      6     26
          Weight   0.21   0.15   0.08   0.08   0.08
CORN      Asset #    23      9     26      6     20
          Weight   0.38   0.09   0.09   0.09   0.08
Anticor   Asset #    20     23      9     26      6
          Weight   0.11   0.10   0.10   0.06   0.05
PAMR      Asset #    23     20      9     26      6
          Weight   0.19   0.11   0.11   0.08   0.06
top five weighted assets, the three algorithms also have much higher volatilities on
other assets. Concerning their performance, it is possible that to achieve better per-
formance, a portfolio has to be frequently rebalanced, not only on certain assets as
the pattern matching–based algorithms do but also on all assets.
Second, most average weights of the state-of-the-art algorithms are assigned to
the assets with the highest volatilities (highest Std values). It is common knowl-
edge that high return is often associated with high risk,∗ while the reverse is not
always true. That is, although a portfolio has to be rebalanced among volatile assets,
such that the portfolio can gain profits from market volatility, high volatility can-
not guarantee high profit. For example, on the NYSE (O) dataset, although Anticor
and PAMR have the same top five average allocation pool, their performances are
drastically different.
Third, PAMR, which systematically exploits the mean reversion property, rebalances
more actively than Anticor, and OLMAR rebalances even more actively.
Connecting the rebalancing activity to their performance, we may conclude that even
though both are based on the same principle, more active rebalancing leads to better
performance, as it better exploits market volatility. PAMR’s concentration on asset
#23, which has the highest negative autocorrelation, sheds light on the possible connection
between mean reversion algorithms and the autocorrelation among assets (Lo
and MacKinlay 1990; Conrad and Kaul 1998; Lo 2008). Moreover, from Table C.5,
we can observe that most of the top average allocation weights of the mean reversion
algorithms go to assets with negative autocorrelations, except on DJIA.
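The lag-1 autocorrelation referred to above can be computed with a short sketch (a Python illustration; the function name is ours, and a strongly negative value is only weak, suggestive evidence of single-period mean reversion):

```python
import numpy as np

def lag1_autocorrelation(x):
    """Lag-1 autocorrelation of one asset's price relative series (sketch)."""
    x = np.asarray(x, dtype=float)
    a, b = x[:-1], x[1:]           # series paired with its one-day lag
    a = a - a.mean()
    b = b - b.mean()
    return (a @ b) / np.sqrt((a @ a) * (b @ b))
```

A perfectly alternating series such as 1.1, 0.9, 1.1, 0.9, ... has lag-1 autocorrelation −1, the idealized mean reversion pattern.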
13.7 Summary
In this chapter, we empirically evaluated the four proposed algorithms. The empirical
results clearly validate their effectiveness. In terms of
cumulative wealth, the main performance metric, our proposed algorithms
consistently beat the state-of-the-art algorithms. In terms of (volatility/drawdown)
risk-adjusted return, the proposed algorithms achieve high risk-adjusted returns,
∗ Such a statement is true in traditional finance. However, in recent years, some arbitrage strategies,
which can earn return without high risk, have emerged.
Threats to Validity
Profitable real trading systems are complex systems involving varying market
scenarios. While the empirical results have demonstrated the effectiveness of the
proposed strategies, there is still a long way to go before the production stage. In this chapter,
we provide some arguments about the various assumptions made in the trading model,
back-tests, and so on.
This chapter is organized as follows: Section 14.1 discusses the assumptions on
the model, and Section 14.2 discusses the assumptions on the mean reversion princi-
ples. Section 14.3 discusses the proposed algorithms from a theoretical perspective.
Section 14.4 validates the empirical studies. Finally, Section 14.5 summarizes this
chapter and proposes some future directions.
∗ However, we cannot say that we have removed or eliminated the impact of the bid–ask spread.
14.4 On Back-Tests
Due to the unavailability of the intraday data and order books, we have conducted
all the experiments based on public daily data, even though it may suffer from certain
potential problems. One potential problem is that our algorithms may be earning
“dealer’s profits” in an uncontrolled and unfair way, or simply earning from
the “bid–ask bounce” (McInish and Wood 1992; Porter 1992), which results
from trades executing alternately at the market maker’s bid and ask quotes. This suspicion is compatible
with the algorithms being contrarian strategies, such as PAMR, CWMR, and
OLMAR. To eliminate this possibility, it would be good to try to remove the bid–
ask bounce by replacing market prices with the midpoint of the best bid and ask
∗ Borodin et al. (2004) failed to provide a regret bound for Anticor strategy, which passively exploits
the mean reversion idea.
∗ In fact, we collected this dataset following Li et al. (2012)’s review comments, which means the dataset
did not exist before the paper’s third-round submission.
14.5 Summary
This chapter discussed some assumptions in our models and back-tests that any
empirical research on trading strategies must face. When back-testing a strategy,
researchers should be aware of these assumptions so that they can take measures to
weaken their impact on profits in real trading.
Conclusions
15.1 Conclusions
This book aims to advance the state of the art in online portfolio selection (OLPS).
Here, our objective is to achieve better performance on real markets. The main
principles we adopted are the principles of pattern matching and mean reversion.
For the principle of pattern matching, we try to locate similar patterns in the
historical market and construct optimal portfolios based on these patterns. Observing
that existing pattern matching–based approaches often adopt Euclidean distance to
measure the similarity between two patterns, we find that Euclidean distance ignores
their linear similarity and whole-market movements. Thus, we proposed to measure
the similarity via a correlation coefficient, which considers both ignored aspects,
and designed the CORrelation-driven Nonparametric learning (CORN) approach for
OLPS. The proposed CORN performs much better than existing pattern matching–
based strategies, which validates its motivations.
For mean reversion, we directly output portfolios based on the principle, which
assumes that prices will revert to their means. First, we proposed
to exploit the principle via passive–aggressive learning, resulting in passive–
aggressive mean reversion (PAMR). In particular, PAMR seeks portfolios
whose return on the last price relatives stays below a threshold while remaining close to
the last portfolio. PAMR’s formulation is easy to understand, and its closed-form
solutions reflect the mean reversion principle. PAMR achieved the best empirical
performance on most datasets at the time.
Observing that most existing algorithms only exploit the first-order information
of a portfolio, we proposed to exploit the second-order information and the mean
reversion property via confidence-weighted learning, resulting in a new family of
strategies called confidence-weighted mean reversion (CWMR). It models the port-
folio as a Gaussian distribution and sequentially updates the distribution similar to
A.1 Introduction
A.1.1 Target Task
In this section, we briefly formulate the OLPS model, which will be used in our model.
Suppose we have a finite number of m ≥ 2 investment assets, over which an investor
can invest for a finite number of n ≥ 1 periods.
At the t-th period, t = 1, . . . , n, the asset (close) prices are represented by a
vector p_t ∈ R_+^m, where each element p_{t,i}, i = 1, . . . , m, represents the close price of
asset i. The price changes are represented by a price relative vector x_t ∈ R_+^m, each
component of which denotes the ratio of the t-th close price to the last close price,
that is, x_{t,i} = p_{t,i}/p_{t−1,i}. Thus, an investment in asset i throughout period t changes by a
factor of x_{t,i}. Let us denote by x_1^n = {x_1, . . . , x_n} a sequence of price relative vectors
for n periods, and by x_s^e = {x_s, . . . , x_e}, 1 ≤ s < e ≤ n, a market window.
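The conversion from close prices to price relatives is mechanical and can be sketched in one line of Python (the toolbox itself is MATLAB; the function name here is our own):

```python
import numpy as np

def price_relatives(prices):
    """Convert an (n+1, m) close-price matrix into (n, m) price relatives.

    x_{t,i} = p_{t,i} / p_{t-1,i}, so a holding in asset i grows by a factor
    of x_{t,i} over period t.
    """
    prices = np.asarray(prices, dtype=float)
    return prices[1:] / prices[:-1]
```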
b_t : R_+^{m(t−1)} → Δ_m,    t = 2, 3, . . . ,

where b_t = b_t(x_1^{t−1}) is the portfolio determined at the beginning of the t-th period
upon observing past market behaviors. We denote by b_1^n = {b_1, . . . , b_n} the strategy
for n periods, which is the output of an OLPS strategy.
At the t-th period, a portfolio b_t produces a portfolio period return s_t; that is,
the wealth changes by a factor of s_t = b_t^⊤ x_t = Σ_{i=1}^{m} b_{t,i} x_{t,i}. Since we reinvest and
adopt relative prices, the wealth changes multiplicatively. Thus, after n periods,
a portfolio strategy b_1^n produces a portfolio cumulative wealth S_n, which changes
the initial wealth by a factor of ∏_{t=1}^{n} b_t^⊤ x_t:

S_n(b_1^n, x_1^n) = S_0 ∏_{t=1}^{n} b_t^⊤ x_t,
We present the framework of the above task in Algorithm A.1. In this task,
a portfolio manager’s goal is to produce a portfolio strategy (bn1 ) upon the market
price relatives (x1n ), aiming to achieve certain targets. He or she computes the portfo-
lios in a sequential manner. At each period t, the manager has access to the sequence
of past price relative vectors x1t−1 . He or she then computes a new portfolio bt for next
price relative vector xt , where the decision criterion varies among different managers.
∗ b ⪰ 0 denotes that each element of the vector b is nonnegative.
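The sequential task of Algorithm A.1 can be sketched in a few lines of Python (a hedged illustration; the toolbox itself is MATLAB, and the function name `run_olps` is our own):

```python
import numpy as np

def run_olps(X, strategy):
    """Sketch of the sequential task in Algorithm A.1.

    X: (n, m) price relatives; strategy: callable mapping the observed
    history X[:t] to the next portfolio b_t on the simplex. Returns the
    final cumulative wealth S_n = prod_t b_t . x_t (with S_0 = 1).
    """
    n, m = X.shape
    wealth = 1.0
    for t in range(n):
        b = strategy(X[:t])    # decide b_t before observing x_t
        wealth *= b @ X[t]
    return wealth
```

For example, passing `lambda history: np.ones(m) / m` back-tests the uniform constant rebalanced portfolio; any OLPS strategy is just a different `strategy` callable.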
A.1.2 Installation
A.1.2.1 Supported Platforms
OLPS is based on MATLAB (both 32- and 64-bit) and Octave (except the Graph-
ical User Interface [GUI] Part); thus, it is supported on 32- and 64-bit versions of
Linux, Mac OS, and Windows. The first version of OLPS is developed and tested on
MATLAB 2009a, while the latest version of OLPS is tested on MATLAB 2013a.
>> OLPS_gui
After executing the above command, the Trading Manager starts. As shown in
Figure A.1, the opening window has five buttons. The About and Exit buttons are
self-explanatory. The other three are the main functional buttons. The Algorithm
Analyser button will start a new window, in which the user can run a single algorithm
and analyze its performance relative to the basic benchmarks. The Experimenter but-
ton is used for selecting multiple algorithms and comparing their performances. The
Configuration button is used to add or delete algorithms and datasets that can be used
by the toolbox.
A.2.1.3 Experimenter
When devising trading strategies, we usually want to compare the performance of
these strategies relative to each other. For this purpose, we provide the Experimenter.
On pressing the Experimenter button, a new window opens that will offer us the plat-
form for comparing different strategies. First, the dataset is selected. From the list of
algorithms, a subset can be selected to be executed. Among the selected algorithms,
the input parameters have to be provided and saved (default values are already there).
Figure A.3 gives an example of comparing six strategies on the MSCI World Index
dataset. The six algorithms being compared are Uniform Buy & Hold, Uniform Con-
stant Rebalanced Portfolio, Best Constant Rebalanced Portfolio, Passive–Aggressive
Mean Reversion, Confidence Weighted Mean Reversion, and Online Moving Average
Reversion.
When the execution is over, the Results Manager shows all the basic performance
metrics of the algorithms. Since we have two different managers—one for analyzing a
single algorithm and one for comparing multiple algorithms—we made two different
Results Managers.
Results Manager 1 The first Results Manager for the Algorithm Analyzer is shown
in Figure A.4. The table in the window quantifies the results of the algorithm as
compared to the basic benchmarks. The numbers from this table can directly be
copied and pasted. There is a large graph space that displays the information on a
particular attribute selected in the left column.
Returns It contains information about the daily performance of the algorithm.
The user can choose to view the cumulative returns and the daily returns. The option
of a log (base 10) plot is provided for easier visualization when the difference in
performance of the algorithm and the benchmarks is significantly high.
Risk Analysis There are five metrics to evaluate the risk and risk-adjusted returns
of the algorithm: the Sharpe Ratio, Calmar Ratio, Sortino Ratio, Value at
Risk, and Maximum Drawdown. An input box called Window is provided next to each
metric. The purpose of the window is to analyze the consistency of the algorithm,
instead of just the final result. For example, entering 252 in the Sharpe Ratio Window
will plot a graph of the Sharpe Ratio of the algorithm for time period t − 252 to t, for
all t. When the window size is large such that t is less than the window size, then the
computation starts from t = 1. The risk metrics are assumed to be zero for the first
50 time periods. This has been done to avoid extreme values due to lack of data in
the initial periods.
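The windowed metric described above can be sketched as follows (a hedged Python illustration of the Results Manager's behavior, not the MATLAB code; the function name and the 252-day annualization are our assumptions, while the "start from t = 1 when the window exceeds t" and "zero for the first 50 periods" rules follow the description):

```python
import numpy as np

def rolling_sharpe(daily_ret, window=252, warmup=50, periods=252):
    """Rolling Sharpe ratio over a trailing window (sketch).

    For each t, use returns from max(0, t - window) to t; report 0 for
    the first `warmup` periods to avoid extreme values from scarce data.
    """
    r = np.asarray(daily_ret, dtype=float)
    out = np.zeros(len(r))
    for t in range(warmup, len(r)):
        seg = r[max(0, t - window):t + 1]
        sd = seg.std(ddof=1)
        out[t] = np.sqrt(periods) * seg.mean() / sd if sd > 0 else 0.0
    return out
```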
Portfolio Analysis The Portfolio Allocation shows the distribution of wealth
allocated to each asset by the algorithm. The Step by Step helps us look at the port-
folio allocation for any particular given day. Lastly, we have a portfolio Animation
that accepts an input called Window. Visualizing portfolio changes based on daily
frequency can be overwhelming and difficult to interpret, especially when the daily
portfolio changes are significant. Instead, we allow the user to choose a moving aver-
age portfolio of the last Window number of days. This results in a smoother change
of the portfolio allocation.
Result Manager 2 The second Results Manager is very similar to the first manager,
except that it is designed for the Experimenter. The table in the window quantifies
the performance of the algorithms relative to each other. Like the first manager, this
manager also has three sections. A preview of this manager can be seen in Figure A.5.
Returns The daily returns across the entire time period of the dataset for all the
algorithms can be overwhelming to view. A time period can be selected, and the daily
performance of the algorithms is displayed for only that time period.
Risk Analysis This section is almost identical to that of the first Results Manager.
The only difference is that here the metrics are evaluated for every algorithm and
displayed together.
Portfolio Analysis This shows the distribution of portfolio allocation for all the
algorithms.
New Strategy A template (“template.m”) has been provided in the Strategy folder
that is based on the general framework for OLPS (as described in Algorithm A.1).
The user should enter his code to learn the new portfolio within the specified region
of the loop. Without any changes to the code, the template will behave as a Uniform
Constant Rebalanced Portfolio strategy, owing to the fact that we start with a uniform
portfolio and never update it. All new strategies coded must remain in the Strategy
folder. Once the files are created in the folders, the configuration should be changed
using the Configuration Manager GUI, which controls the loading of algorithms and
datasets into the Trading Manager.
New Dataset A dataset is in the form of price relative vectors of various assets. The
t-th row represents the price relative of all the assets at time t. The user just has to
save the new price relative matrix in the Data folder. Data of different frequencies
can be used as well. All the datasets provided in the toolbox are of daily frequency.
Once the files are created in the folders, the configuration should be changed using the
Configuration Manager GUI, which controls the loading of algorithms and datasets
into the Trading Manager.
Configuration The configuration determines the algorithms and datasets being used
in the toolbox. Within the config folder, there is a file called config. This is the active
configuration, which means the toolbox uses this file to determine which algorithms
and datasets would be preloaded. There is another file config_default, which is the
configuration provided by the toolbox. Initially, the content of the default and the
active configuration are the same. A new configuration can be created by clicking
on the Configuration button in the start window. It automatically loads the active
configuration, to which the user can add or delete new algorithms or datasets.
The Trading Manager, as shown in Algorithm A.2, controls the whole simulation
of OLPS. At the start (Line 2), it loads market data from the specified dataset. Note
that this can be easily extended to load data from real brokers. Then, Lines 3 and 8
open and close two logging files, one for text and one for .MAT format. Lines 4 and
6 measure the computational time of the execution of a specified strategy. Measuring
the time in the trading manager ensures a fair comparison of the computational time
among different strategies. Line 5 is the core component, which calls the specified
strategy with specified parameters. Section A.3 will illustrate all included strategies
and their usages. Line 7 analyzes the executed results of the strategy, which will be
introduced later. The “manager.m” usage is shown as follows.
Usage
function [cum_ret, cumprod_ret, daily_ret, ra_ret, run_time]
    = manager(strategy_name, dataset_name, varargins, opts);
• cum_ret: cumulative return;
• cumprod_ret: a vector of cumulative returns at the end of every trading day;
• daily_ret: a vector of daily returns at the end of every trading day;
• ra_ret: analyzed result;
• run_time: computational time of the core strategy (excluding the manager routine);
Example This example calls the ubah (Uniform Buy and Hold, or commonly
referred to as the market strategy) strategy on the “NYSE (O)” dataset.
Usage
Adding Your Own Strategy or Data Adding new strategies and datasets in the CLI
mode is similar to that in the GUI mode. Adding the strategy involves replacing the
portfolio update component of the algorithms, and adding a dataset involves storing
the market matrix and placing the files in the data folder.
A.2.2.2 Examples
Example 1 Calling the BCRP strategy on the SP500 dataset, muting verbose output:
Example 2 Calling the BCRP strategy on the SP500 dataset, displaying verbose output:
A.3.1 Benchmarks
In the financial markets, there exist various benchmarks (such as market indices). In this
section, we introduce four benchmarks: Uniform Buy and Hold, Best Stock, Uniform
Constant Rebalanced Portfolios, and Best Constant Rebalanced Portfolios.
Usage
ubah(fid, data, {λ}, opts);
• fid: file handle for writing log file;
• data: market sequence matrix;
• λ ∈ [0, 1): proportional transaction cost rate; and
• opts: options for behavioral control.
Example Call market (uniform BAH) strategy on the “NYSE (O)” dataset with a
transaction cost rate of 0.
Its portfolio update can also be written explicitly in the same form as Equation A.1, except
that the initial portfolio equals b◦.
Usage
best(fid, data, {λ}, opts);
Example Call Best Stock strategy on the “NYSE (O)” dataset with a transaction
cost rate of 0.
S_n(CRP(b)) = ∏_{t=1}^{n} b^⊤ x_t.

In particular, UCRP chooses the uniform portfolio as the preset portfolio, that is,
b = (1/m, . . . , 1/m).
Usage
ucrp(fid, data, {λ}, opts);
• fid: file handle for writing log file;
• data: market sequence matrix;
Example Call UCRP strategy on the “NYSE (O)” dataset with a transaction cost
rate of 0.
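The cumulative wealth formula S_n(CRP(b)) = ∏_t (b · x_t) is a one-liner; the sketch below (illustrative Python, not toolbox code) also shows why rebalancing can beat holding on a volatile but trendless market:

```python
import numpy as np

def crp_wealth(b, X):
    """Cumulative wealth of CRP(b): S_n = prod over t of (b . x_t),
    rebalancing back to the fixed portfolio b at every period."""
    return float(np.prod(X @ np.asarray(b, dtype=float)))

# A trendless but volatile market: prices alternate and end where they began.
X = np.array([[2.0, 0.5],
              [0.5, 2.0]])
```

Here UCRP turns one dollar into 1.5625 over two periods, while holding either single asset only breaks even, which is the rebalancing ("volatility pumping") effect that motivates CRP-style benchmarks.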
Usage
bcrp(fid, data, {λ}, opts);
• fid: file handle for writing log file;
• data: market sequence matrix;
• λ ∈ [0, 1): transaction cost rate; and
• opts: options for behavioral control.
Example Call BCRP strategy on the “NYSE (O)” dataset with a transaction cost
rate of 0.
Example Call Cover’s Universal Portfolios on the “NYSE (O)” dataset with default
parameters and a transaction cost rate of 0.
where η refers to the learning rate and R(b, b_t) denotes the relative entropy, that is, R(b, b_t) = ∑_{i=1}^{m} b_i log (b_i / b_{t,i}). Solving the optimization, we can obtain EG's explicit portfolio update:

b_{t+1,i} = b_{t,i} exp( η x_{t,i} / (b_t · x_t) ) / Z,  i = 1, . . . , m,

where Z denotes the normalization term such that the portfolio elements sum to 1.
Usage
eg(fid, data, {η, λ}, opts);
• fid: file handle for writing log file;
• data: market sequence matrix;
• η: learning rate;
• λ: transaction cost rate; and
• opts: options for behavioral control.
Example Call EG on the “NYSE (O)” dataset with a learning rate of 0.05 and a
transaction cost rate of 0.
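The multiplicative EG update above can be prototyped in a few lines; this Python sketch (eg_update is an illustrative name) follows the formula directly:

```python
import numpy as np

def eg_update(b, x, eta=0.05):
    """EG update: b_{t+1,i} is proportional to b_{t,i} * exp(eta*x_i/(b.x));
    Z normalizes the result back onto the simplex."""
    b = np.asarray(b, dtype=float)
    x = np.asarray(x, dtype=float)
    w = b * np.exp(eta * x / (b @ x))
    return w / w.sum()
```

With η = 0.05 and x_t = (1.2, 0.8), the weight tilts slightly toward the better-performing asset while staying on the simplex.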
ONS's portfolio update is formulated as

b_{t+1} = argmax_{b ∈ Δ_m} ∑_{τ=1}^{t} log (b · x_τ) − (β/2) ‖b‖².

Solving the optimization, we can obtain the explicit portfolio update of ONS:

b_1 = (1/m, . . . , 1/m),  b_{t+1} = Π_{Δ_m}^{A_t} ( δ A_t^{−1} p_t ),

with

A_t = ∑_{τ=1}^{t} (x_τ x_τ^⊤)/(b_τ · x_τ)² + I_m,  p_t = (1 + 1/β) ∑_{τ=1}^{t} x_τ/(b_τ · x_τ),

where Π_{Δ_m}^{A_t} denotes the projection onto the simplex Δ_m in the norm induced by A_t.
Example Call the ONS on the “NYSE (O)” dataset with a transaction cost rate of 0.
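The ONS step can be sketched as follows. One deliberate simplification: the exact algorithm projects δA_t^{-1}p_t onto the simplex in the norm induced by A_t, whereas this illustrative Python version uses the plain Euclidean projection; all names are hypothetical:

```python
import numpy as np

def proj_simplex(v):
    """Euclidean projection onto the probability simplex (sort-based)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u - css / (np.arange(len(v)) + 1.0) > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

def ons_portfolio(B, X, beta=1.0, delta=0.125):
    """One ONS step from past portfolios B and price relatives X (rows):
    A_t = sum x x^T/(b.x)^2 + I,  p_t = (1 + 1/beta) sum x/(b.x),
    b_{t+1} = Proj(delta * A_t^{-1} p_t).
    Note: the Euclidean projection here simplifies the A_t-norm projection."""
    m = X.shape[1]
    A = np.eye(m)
    p = np.zeros(m)
    for b, x in zip(B, X):
        r = float(b @ x)
        A += np.outer(x, x) / r**2
        p += x / r
    p *= 1.0 + 1.0 / beta
    return proj_simplex(delta * np.linalg.solve(A, p))
```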
A.3.3.1 Anticorrelation
Description “Anticorrelation” (Anticor) (Borodin et al. 2004) transfers wealth from the outperforming stocks to the underperforming stocks via their cross-correlation and autocorrelation, computed over two consecutive windows of log-returns y1 and y2:

M_cov(i, j) = (1/(w − 1)) (y1,i − ȳ1)^⊤ (y2,j − ȳ2),

M_cor(i, j) = M_cov(i, j) / (σ1(i) σ2(j)) if σ1(i), σ2(j) ≠ 0, and 0 otherwise.
Then, following the cross-correlation matrix, Anticor moves proportions from the stocks that increased more to the stocks that increased less, where the corresponding amounts are adjusted according to the cross-correlation matrix. In particular, if asset i increased more than asset j and their sequences in the window are positively correlated, Anticor claims a transfer from asset i to j with the amount equaling the cross-correlation value (M_cor(i, j)) minus their negative autocorrelation values (min{0, M_cor(i, i)} and min{0, M_cor(j, j)}). These transfer claims are finally normalized to keep the portfolio in the simplex domain.
Example Call both Anticor algorithms on the “NYSE (O)” dataset with a maximal
window size of 30 and a transaction cost rate of 0.
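The claim computation described above can be sketched in Python (anticor_claims is an illustrative name; the toolbox's Anticor routine additionally handles the windowing and normalization):

```python
import numpy as np

def anticor_claims(y1, y2):
    """Raw Anticor transfer claims for one pair of consecutive log-return
    windows y1, y2 of shape (w, m). claim[i, j] is the (unnormalized)
    amount to move from asset i to asset j."""
    w, m = y1.shape
    mu1, mu2 = y1.mean(axis=0), y2.mean(axis=0)
    s1 = y1.std(axis=0, ddof=1)
    s2 = y2.std(axis=0, ddof=1)
    Mcov = (y1 - mu1).T @ (y2 - mu2) / (w - 1)
    denom = np.outer(s1, s2)
    Mcor = np.divide(Mcov, denom, out=np.zeros_like(Mcov), where=denom != 0)
    claim = np.zeros((m, m))
    for i in range(m):
        for j in range(m):
            # transfer i -> j only if i grew more than j in the second
            # window and the two are positively cross-correlated
            if mu2[i] > mu2[j] and Mcor[i, j] > 0:
                claim[i, j] = (Mcor[i, j]
                               - min(0.0, Mcor[i, i])
                               - min(0.0, Mcor[j, j]))
    return claim
```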
b_{t+1} = argmin_{b ∈ Δ_m} (1/2) ‖b − b_t‖²  s.t.  ℓ(b; x_t) = 0,

where ℓ(b; x_t) denotes a predefined loss function to capture the mean reversion property,

ℓ(b; x_t) = 0 if b · x_t ≤ ε, and b · x_t − ε otherwise.
Example Call the three PAMR algorithms on the “NYSE (O)” dataset with a mean
reversion threshold of 0.5, an aggressive parameter of 30, and a transaction cost rate
of 0.
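The PAMR update and its three step-size variants can be prototyped directly from the formulation above. A hedged Python sketch (names are illustrative, and the simplex projection stands in for the explicit nonnegativity handling):

```python
import numpy as np

def proj_simplex(v):
    """Euclidean projection onto the probability simplex (sort-based)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u - css / (np.arange(len(v)) + 1.0) > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

def pamr_update(b, x, eps=0.5, C=500.0, variant=0):
    """PAMR update: loss l = max(0, b.x - eps), step against the market
    direction, then project back onto the simplex.
    variant 0 (PAMR):   tau = l / ||x - xbar*1||^2
    variant 1 (PAMR-1): additionally caps tau at C
    variant 2 (PAMR-2): tau = l / (||x - xbar*1||^2 + 1/(2C))"""
    b = np.asarray(b, dtype=float)
    x = np.asarray(x, dtype=float)
    loss = max(0.0, float(b @ x) - eps)
    xbar = x.mean()
    denom = float(np.sum((x - xbar) ** 2))
    if variant == 2:
        tau = loss / (denom + 1.0 / (2.0 * C))
    else:
        tau = loss / denom if denom > 0 else 0.0
        if variant == 1:
            tau = min(C, tau)
    return proj_simplex(b - tau * (x - xbar))
```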
Example Call the two CWMR algorithms on the “NYSE (O)” dataset with a
confidence parameter of 2, a mean reversion parameter of 0.5, and a transaction cost
rate of 0.
x̃_{t+1}(α) = α1 + (1 − α) x̃_t / x_t,

where α ∈ (0, 1) denotes the decaying factor and the operations are all element-wise. Then, OLMAR's formulation is

b_{t+1} = argmin_{b ∈ Δ_m} (1/2) ‖b − b_t‖²  s.t.  b · x̃_{t+1} ≥ ε.
where x̄_{t+1} = (1/m)(1 · x̃_{t+1}) denotes the average predicted price relative and λ_{t+1} is the Lagrangian multiplier calculated as

λ_{t+1} = max{ 0, (ε − b_t · x̃_{t+1}) / ‖x̃_{t+1} − x̄_{t+1} 1‖² }.
Example Call the two OLMAR algorithms on the “NYSE (O)” dataset with a mean reversion threshold of 10, a window size of 5, a decaying factor of 0.5, and a transaction cost rate of 0.
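Combining the decaying-average prediction with the update above gives a compact OLMAR step; the Python below is an illustrative sketch, not the toolbox routine:

```python
import numpy as np

def proj_simplex(v):
    """Euclidean projection onto the probability simplex (sort-based)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u - css / (np.arange(len(v)) + 1.0) > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

def olmar_update(b, x_t, x_tilde_prev, eps=10.0, alpha=0.5):
    """One OLMAR step: predict x~_{t+1} = alpha*1 + (1-alpha)*x~_t/x_t
    (element-wise), then move lam = max(0, (eps - b.x~)/||x~ - xbar*1||^2)
    along (x~ - xbar*1) and project back onto the simplex."""
    b = np.asarray(b, dtype=float)
    x_t = np.asarray(x_t, dtype=float)
    x_tilde = alpha + (1.0 - alpha) * np.asarray(x_tilde_prev, dtype=float) / x_t
    xbar = x_tilde.mean()
    denom = float(np.sum((x_tilde - xbar) ** 2))
    lam = max(0.0, (eps - float(b @ x_tilde)) / denom) if denom > 0 else 0.0
    return proj_simplex(b + lam * (x_tilde - xbar))
```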
where c and ℓ are the thresholds used to control the number of similar samples. Then,
it obtains an optimal portfolio via solving Equation A.2.
Usage
bk_run(fid, data, {K, L, c, λ}, opts);
Example Call the BK algorithm on the “NYSE (O)” dataset with default parameters
and a transaction cost rate of 0.
Example Call the BNN algorithm on the “NYSE (O)” dataset with default
parameters and a transaction cost rate of 0.
Example Below we call three CORN algorithms with their default parameters.
A.4 Summary
In this manual, we describe the OLPS toolbox in detail. OLPS is the first toolbox for
the research of OLPS problems. It is easy to use and can be extended to include new
algorithms and datasets. We hope this toolbox can facilitate further research on this
topic.
lim_{n→∞} (1/n) log S_n(B) = W∗ almost surely.
Before we give the theorem and its proof, we introduce some necessary lemmas.
Lemma B.1 (Breiman 1957 [corrected version 1960]) Let Z = {Z_i}_{−∞}^{∞} be a stationary and ergodic process. For each positive integer i, let T^i denote the operator that shifts any sequence {. . . , z_{−1}, z_0, z_1, . . .} by i digits to the left. Let f_1, f_2, . . . be a sequence of real-valued functions such that lim_{n→∞} f_n(Z) = f(Z) almost surely for some function f. Assume that E sup_n |f_n(Z)| < ∞. Then,

lim_{n→∞} (1/n) ∑_{i=1}^{n} f_i(T^i Z) = E f(Z) almost surely.
Lemma B.2 (Algoet and Cover 1988) Let {Q_n}_{n∈ℕ∪{∞}} be a family of regular probability distributions over the set ℝ^d_+ of all market vectors such that E{|log U_n^{(j)}|} < ∞ for any coordinate of a random market vector U_n = (U_n^{(1)}, . . . , U_n^{(d)}) distributed according to Q_n. In addition, let B∗(Q_n) be the set of all log-optimal portfolios with respect to Q_n. If

Q_n → Q_∞ weakly as n → ∞ and F_k ↗ F_∞ ⊆ F,

then

E{ max_b E[log⟨b, X⟩ | F_k] } → E{ max_b E[log⟨b, X⟩ | F_∞] }.

∗ The proof idea is mainly provided by Vladimir Vovk, and the proof is then finished by Dingjiang Huang and Bin Li.
Lemma B.4 Let μ be the Lebesgue measure on the Euclidean space ℝⁿ and A be a Lebesgue measurable subset of ℝⁿ. Define the approximate density of A in an ε-neighborhood of a point x in ℝⁿ as

d_ε(x) = μ(A ∩ B_ε(x)) / μ(B_ε(x)),

where B_ε(x) denotes the closed ball of radius ε centered at x. Then, for almost every point x of A, the density

d(x) = lim_{ε→0} d_ε(x)

exists and equals 1.
cov(X, X′) / ( √Var(X) √Var(X′) ) ≥ ρ.

Since both Var(X) and |E{X − X′}| have the same order of magnitude,∗ they are in the range [10⁻⁴, 10⁻³]; therefore, the previous inequality approximately means that

2 Var(X) (1 − ρ) ≥ E{(X − X′)²}.
Lemma B.6 Assume that x_1, x_2, . . . are the realizations of the random vectors X_1, X_2, . . . drawn from the vector-valued stationary and ergodic process {X_n}_{−∞}^{∞}.
The fundamental limits (determined in Algoet 1992, 1994; Algoet and Cover 1988) reveal that the so-called log-optimum portfolio B∗ = {b∗(·)} is the best possible choice. More precisely, in trading period n, let b∗(·) be such that

lim sup_{n→∞} (1/n) log (S_n / S_n∗) ≤ 0 almost surely

and

lim_{n→∞} (1/n) log S_n∗ = W∗ almost surely,

where

W∗ = E{ max_{b(·)} E{ log⟨b(X_{−∞}^{−1}), X_0⟩ | X_{−∞}^{−1} } }.
Theorem B.1 The portfolio scheme CORN is universal with respect to the class of all ergodic processes such that E{|log X^{(j)}|} < ∞, for j = 1, 2, . . . , d.

Proof To prove that the strategy CORN is universal with respect to the class of all ergodic processes, we need to prove that, for each process in the class,

lim_{n→∞} (1/n) log S_n(B) = W∗ almost surely.

Since no strategy can asymptotically outperform the log-optimum growth rate W∗, it suffices to show that

lim inf_{n→∞} W_n(B) = lim inf_{n→∞} (1/n) log S_n(B) ≥ W∗ almost surely.
Without loss of generality, we may assume S_0 = 1, so that

W_n(B) = (1/n) log S_n(B)
 = (1/n) log ∑_{ω,ρ} q_{ω,ρ} S_n(B^{(ω,ρ)})
 ≥ (1/n) log sup_{ω,ρ} q_{ω,ρ} S_n(B^{(ω,ρ)})
 = (1/n) sup_{ω,ρ} ( log q_{ω,ρ} + log S_n(B^{(ω,ρ)}) )
 = sup_{ω,ρ} ( W_n(B^{(ω,ρ)}) + (log q_{ω,ρ})/n ).
Thus,

lim inf_{n→∞} W_n(B) = lim inf_{n→∞} sup_{ω,ρ} ( W_n(B^{(ω,ρ)}) + (log q_{ω,ρ})/n )
 ≥ sup_{ω,ρ} lim inf_{n→∞} ( W_n(B^{(ω,ρ)}) + (log q_{ω,ρ})/n )
 = sup_{ω,ρ} lim inf_{n→∞} W_n(B^{(ω,ρ)}). (B.1)
The simple argument above shows that the asymptotic growth rate of the strategy B is at least as large as the supremum of the growth rates of all elementary strategies B^{(ω,ρ)}. Thus, to estimate lim inf_{n→∞} W_n(B), it suffices to investigate
where I_A denotes the indicator function of the set A. If the above set of X_i's is empty, then let P_{j,s}^{(ω,ρ)} = δ_{(1,...,1)} be the probability measure concentrated on the vector (1, . . . , 1). In other words, P_{j,s}^{(ω,ρ)}(A) is the relative frequency of the vectors among X_{1−j+ω}, . . . , X_0 that fall in the set A.
Observe that for all s, with probability 1,

P_{j,s}^{(ω,ρ)} → P_s^{∗(ω,ρ)}
 = P_{X_0 | E{(X_{−ω}^{−1} − s)²} ≤ 2Var(s)(1−ρ)}  if P( E{(X_{−ω}^{−1} − s)²} ≤ 2Var(s)(1−ρ) ) > 0,
 = δ_{(1,...,1)}  if P( E{(X_{−ω}^{−1} − s)²} ≤ 2Var(s)(1−ρ) ) = 0,
 (B.2)
weakly as j → ∞, where P_s^{∗(ω,ρ)} denotes the limit distribution of P_{j,s}^{(ω,ρ)}, and P_{X_0 | E{(X_{−ω}^{−1} − s)²} ≤ 2Var(s)(1−ρ)} denotes the distribution of the vector X_0 conditioned on the event E{(X_{−ω}^{−1} − s)²} ≤ 2Var(s)(1−ρ). To see this, let f be a bounded continuous function defined on ℝ^d_+. Then, the ergodic theorem implies that if P( E{(X_{−ω}^{−1} − s)²} ≤ 2Var(s)(1−ρ) ) > 0, then
∫ f(x) P_{j,s}^{(ω,ρ)}(dx)
 = [ (1/|1−j+ω|) ∑_{i : 1−j+ω ≤ i ≤ 0, E{(X_{i−ω}^{i−1} − s)²} ≤ 2Var(s)(1−ρ)} f(X_i) ]
 / [ (1/|1−j+ω|) |{ i : 1−j+ω ≤ i ≤ 0, E{(X_{i−ω}^{i−1} − s)²} ≤ 2Var(s)(1−ρ) }| ].
On the other hand, if P( E{(X_{−ω}^{−1} − s)²} ≤ 2Var(s)(1−ρ) ) = 0, then, with probability 1, P_{j,s}^{(ω,ρ)} is concentrated on (1, . . . , 1) for all j, and ∫ f(x) P_{j,s}^{(ω,ρ)}(dx) = f(1, . . . , 1).
respect to the limit distribution P_s^{∗(ω,ρ)}. Then, using Lemma B.2, we infer from Equation B.2 that, as j tends to infinity, we have the almost sure convergence

lim_{j→∞} ⟨b^{(ω,ρ)}(X_{1−j}^{−1}, s), x_0⟩ = ⟨b_{ω,ρ}^∗(s), x_0⟩,

for P_s^{∗(ω,ρ)}-almost all x_0 and hence for P_{X_0}-almost all x_0. Since s was arbitrary, we obtain

lim_{j→∞} ⟨b^{(ω,ρ)}(X_{1−j}^{−1}, X_{−ω}^{−1}), x_0⟩ = ⟨b_{ω,ρ}^∗(X_{−ω}^{−1}), x_0⟩ almost surely. (B.3)
f_i(x_{−∞}^{∞}) = log⟨h^{(ω,ρ)}(x_{1−i}^{−1}), x_0⟩ = log⟨b^{(ω,ρ)}(x_{1−i}^{−1}, x_{−ω}^{−1}), x_0⟩,

defined on x_{−∞}^{∞} = (. . . , x_{−1}, x_0, x_1, . . .). Note that

|f_i(X_{−∞}^{∞})| = |log⟨h^{(ω,ρ)}(X_{1−i}^{−1}), X_0⟩| ≤ ∑_{j=1}^{d} |log X_0^{(j)}|,

and

f_i(X_{−∞}^{∞}) → log⟨b_{ω,ρ}^∗(X_{−ω}^{−1}), X_0⟩ almost surely as i → ∞.
W_n(B^{(ω,ρ)}) = (1/n) ∑_{i=1}^{n} log⟨h^{(ω,ρ)}(X_1^{i−1}), X_i⟩
 = (1/n) ∑_{i=1}^{n} f_i(T^i X_{−∞}^{∞})
 → E{ log⟨b_{ω,ρ}^∗(X_{−ω}^{−1}), X_0⟩ }
 =: θ_{ω,ρ} almost surely.
lim inf_{n→∞} W_n(B) ≥ sup_{ω,ρ} θ_{ω,ρ} ≥ sup_ω lim inf_ρ θ_{ω,ρ} almost surely,
and
μ_ω(B) = P{ X_{−ω}^{−1} ∈ B }.
Then, for any s ∈ support(μ_ω) and for all A,

P_s^{∗(ω,ρ)}(A) = P{ X_0 ∈ A | E{(X_{−ω}^{−1} − s)²} ≤ 2Var(s)(1−ρ) }
 = P{ X_0 ∈ A, E{(X_{−ω}^{−1} − s)²} ≤ 2Var(s)(1−ρ) } / P{ E{(X_{−ω}^{−1} − s)²} ≤ 2Var(s)(1−ρ) }
 = (1/μ_ω(S_{s,2Var(s)(1−ρ)})) ∫_{S_{s,2Var(s)(1−ρ)}} m_A(z) μ_ω(dz)
 → m_A(s) = P{ X_0 ∈ A | X_{−ω}^{−1} = s }
as ρ → 1 and for μ_ω-almost all s, by the Lebesgue density theorem (see Lemma B.4), and therefore

P_{X_{−ω}^{−1}}^{∗(ω,ρ)}(A) → P{ X_0 ∈ A | X_{−ω}^{−1} }.
of random variables forms a submartingale, that is, E{Y_{ω+1} | X_{−ω}^{−1}} ≥ Y_ω. To see this, note that

E{Y_{ω+1} | X_{−ω}^{−1}} = E{ E{ log⟨b_{ω+1}^∗(X_{−ω−1}^{−1}), X_0⟩ | X_{−ω−1}^{−1} } | X_{−ω}^{−1} }
 ≥ E{ E{ log⟨b_ω^∗(X_{−ω}^{−1}), X_0⟩ | X_{−ω−1}^{−1} } | X_{−ω}^{−1} }
 = E{ log⟨b_ω^∗(X_{−ω}^{−1}), X_0⟩ | X_{−ω}^{−1} }
 = Y_ω.
which has a finite expectation. The submartingale convergence theorem (see Stout 1974) implies that a submartingale converges almost surely, so sup_ω θ_ω^∗ is finite. In particular, by the submartingale property, θ_ω^∗ is a bounded increasing sequence, so that

sup_ω θ_ω^∗ = lim_{ω→∞} θ_ω^∗.
yields

sup_ω θ_ω^∗ = lim_{ω→∞} E{ max_{b(·)} E{ log⟨b(X_{−ω}^{−1}), X_0⟩ | X_{−ω}^{−1} } }
 = E{ max_{b(·)} E{ log⟨b(X_{−∞}^{−1}), X_0⟩ | X_{−∞}^{−1} } }
 = W∗.
Then

lim inf_{n→∞} W_n(B) ≥ sup_{ω,ρ} θ_{ω,ρ} ≥ sup_ω lim inf_ρ θ_{ω,ρ} = sup_ω θ_ω^∗ = W∗ almost surely,
and from the above three parts of the proof, we can get that

lim_{n→∞} (1/n) log S_n(B) = W∗ almost surely,

and the proof of Theorem B.1 is finished.
Proof First, if ℓ_t = 0, then b_t satisfies the constraint and is clearly the optimal solution. To solve the problem in the case of ℓ_t ≠ 0, we define the Lagrangian for the optimization problem (9.2) as
L(b, τ, λ) = (1/2) ‖b − b_t‖² + τ(x_t · b − ε) + λ(b · 1 − 1), (B.4)
where τ ≥ 0 is a Lagrange multiplier related to the loss function, λ is a Lagrange multiplier associated with the simplex constraint, and 1 denotes a column vector of m 1s. Note that the nonnegativity constraint on portfolio b is not considered, since introducing it complicates the derivation; as in the other proofs, the resulting solution can be projected onto the simplex afterward. Setting the partial derivative of L with respect to b to zero gives

0 = ∂L/∂b = (b − b_t) + τx_t + λ1.
Multiplying both sides by 1^⊤, we can get λ = −τ (x_t · 1)/m. Moreover, since x̄_t = (x_t · 1)/m, where x̄_t is the mean of the t-th price relatives, or the market return, we can rewrite λ as

λ = −τ x̄_t. (B.5)
L(τ) = (1/2) τ² ‖x_t − x̄_t 1‖² − τ² x_t · (x_t − x̄_t 1) + τ(b_t · x_t − ε)
 = −(1/2) τ² ‖x_t − x̄_t 1‖² + τ(b_t · x_t − ε).

Setting the derivative with respect to τ to zero gives

0 = ∂L/∂τ = −τ ‖x_t − x̄_t 1‖² + (b_t · x_t − ε).
Note that in the case of zero market volatility, that is, ‖x_t − x̄_t 1‖² = 0, we just set τ = 0. We can summarize the update scheme for the cases ℓ_t = 0 and ℓ_t > 0 by setting τ. Thus, we simplify the notation following Equation 9.1 and show the unified update scheme.
Proof We derive the solution of PAMR-1 following the same procedure as the
derivation of PAMR. If the loss is nonzero, we get a Lagrangian
L(b, ξ, τ, μ, λ) = (1/2) ‖b − b_t‖² + τ(x_t · b − ε) + ξ(C − τ − μ) + λ(1 · b − 1).
Setting the partial derivative of L with respect to b to zero gives

0 = ∂L/∂b = (b − b_t) + τx_t + λ1.

Multiplying both sides by 1^⊤, we can get λ = −τ (x_t · 1)/m = −τ x̄_t, and the solution is

b = b_t − τ(x_t − x̄_t 1).
Next, note that the minimum of the term ξ(C − τ − μ) with respect to ξ is zero whenever C − τ − μ = 0. If C − τ − μ ≠ 0, then the minimum can be made to approach −∞. Since we need to maximize the dual, we can rule out the latter case and pose the following constraint on the dual variables: C − τ − μ = 0. The KKT conditions confine μ to be nonnegative, so we conclude that τ ≤ C. We can project τ to the interval [0, C] and get

τ = max{ 0, min{ C, (b_t · x_t − ε) / ‖x_t − x̄_t 1‖² } } = min{ C, ℓ_t / ‖x_t − x̄_t 1‖² }.
Again, we simplify the notation according to Equation 9.1 and show a unified update
scheme.
Proof We derive the solution similarly to the derivations of PAMR and PAMR-1. In case the loss is not 0, we can get the Lagrangian

L(b, ξ, τ, λ) = (1/2) ‖b − b_t‖² + τ(b · x_t − ε) + Cξ² − τξ + λ(1 · b − 1).
2
Setting the partial derivatives of L with respect to b to zero gives
0 = ∂L/∂b = (b − b_t) + τx_t + λ1.

Multiplying both sides by 1^⊤, we can get λ = −τ (x_t · 1)/m = −τ x̄_t, and the solution is

b = b_t − τ(x_t − x̄_t 1).
Setting the partial derivatives of L with respect to ξ to zero gives
0 = ∂L/∂ξ = 2Cξ − τ  =⇒  ξ = τ/(2C).
Proof Since considering the nonnegativity constraint introduces too much complex-
ity, first we relax the optimization problem without it, and later we project the solution
to the simplex domain to obtain the required portfolio.
The Lagrangian for the optimization problem (10.3) is
L = (1/2) [ log (det Σ_t / det Σ) + Tr(Σ_t^{−1} Σ) + (μ_t − μ)^⊤ Σ_t^{−1} (μ_t − μ) ]
 + λ( φ x_t^⊤ Σ x_t + μ^⊤ x_t − ε ) + η(μ^⊤ 1 − 1).
Taking the derivative of the Lagrangian with respect to μ and setting it to zero, we can get the update of μ:

0 = ∂L/∂μ = Σ_t^{−1}(μ − μ_t) + λx_t + η1  =⇒  μ_{t+1} = μ_t − Σ_t(λx_t + η1), (B.7)

where Σ_t is assumed to be nonsingular. Multiplying both sides by 1^⊤, we can get η:

1 = 1 − 1^⊤ Σ_t (λx_t + η1)  =⇒  η = −λ x̄_t, (B.8)

where x̄_t = (1^⊤ Σ_t x_t)/(1^⊤ Σ_t 1) denotes the confidence-weighted average of the t-th price relatives. Plugging Equation B.8 into Equation B.7, we can get

μ_{t+1} = μ_t − λ Σ_t (x_t − x̄_t 1). (B.9)
Moreover, taking the derivative of the Lagrangian with respect to Σ and setting it to zero, we can have the update of Σ:

0 = ∂L/∂Σ = −(1/2) Σ^{−1} + (1/2) Σ_t^{−1} + λφ x_t x_t^⊤  =⇒  Σ_{t+1}^{−1} = Σ_t^{−1} + 2λφ x_t x_t^⊤. (B.10)
Let M_t = μ_t^⊤ x_t be the return mean, V_t = x_t^⊤ Σ_t x_t be the return variance of the t-th trading period before updating, and W_t = x_t^⊤ Σ_t 1 be the return covariance of the t-th price relative with cash. We can simplify the preceding equation to

λ² (2φV_t² − 2φ x̄_t V_t W_t) + λ (2φεV_t − 2φV_t M_t + V_t − x̄_t W_t) + (ε − M_t − φV_t) = 0. (B.12)
where diag(xt ) denotes a diagonal matrix with the elements of xt on its main diagonal.
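Putting Equations B.9, B.10, and B.12 together gives a compact CWMR step: form M_t, V_t, W_t and the confidence-weighted market average, solve the quadratic for λ, clip to [0, ∞), and update (μ, Σ). The Python below is a sketch under those equations only (it omits the final simplex projection of μ; names are illustrative):

```python
import numpy as np

def cwmr_step(mu, Sigma, x, eps=0.5, phi=2.0):
    """One CWMR update via Eqs. B.9, B.10, and B.12 (simplex projection of
    mu omitted)."""
    mu = np.asarray(mu, dtype=float)
    x = np.asarray(x, dtype=float)
    one = np.ones_like(x)
    M = float(mu @ x)                    # return mean
    V = float(x @ Sigma @ x)             # return variance
    W = float(x @ Sigma @ one)           # return covariance with cash
    xbar = float(one @ Sigma @ x) / float(one @ Sigma @ one)
    # quadratic (B.12): a*lam^2 + b*lam + c = 0
    a = 2 * phi * V**2 - 2 * phi * xbar * V * W
    b = 2 * phi * eps * V - 2 * phi * V * M + V - xbar * W
    c = eps - M - phi * V
    roots = np.roots([a, b, c]) if abs(a) > 1e-15 else np.array([-c / b])
    real = [r.real for r in roots if abs(r.imag) < 1e-9]
    lam = max([0.0] + real)              # clip the multiplier to [0, inf)
    mu_new = mu - lam * Sigma @ (x - xbar * one)          # Eq. B.9
    Sigma_new = np.linalg.inv(np.linalg.inv(Sigma)
                              + 2 * lam * phi * np.outer(x, x))  # Eq. B.10
    return mu_new, Sigma_new, lam
```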
Proof Similar to the proof of Proposition 10.1, we relax the optimization problem
without the nonnegativity constraint and project the solution to the simplex domain
to obtain the required portfolio.
Taking the derivative of the Lagrangian with respect to μ and setting it to zero, we
can get the update of μ,
0 = ∂L/∂μ = ϒ_t^{−2}(μ − μ_t) + λx_t + η1  =⇒  μ_{t+1} = μ_t − ϒ_t²(λx_t + η1),
Moreover, taking the derivative of the Lagrangian with respect to ϒ and setting it to zero, we have

0 = ∂L/∂ϒ = −ϒ^{−1} + (1/2) ϒ_t^{−2} ϒ + (1/2) ϒ ϒ_t^{−2} + λφ ( x_t x_t^⊤ ϒ + ϒ x_t x_t^⊤ ) / ( 2 √(x_t^⊤ ϒ² x_t) ),

which yields

ϒ_{t+1}^{−2} = ϒ_t^{−2} + λφ x_t x_t^⊤ / √( x_t^⊤ ϒ_{t+1}² x_t ).
The preceding two updates can be expressed in terms of the covariance matrix:

μ_{t+1} = μ_t − λ Σ_t (x_t − x̄_t 1),  Σ_{t+1}^{−1} = Σ_t^{−1} + λφ x_t x_t^⊤ / √( x_t^⊤ Σ_{t+1} x_t ), (B.13)

and

U_t = ( −λφV_t + √( λ²φ²V_t² + 4V_t ) ) / 2. (B.15)
The KKT condition implies that either λ = 0, and no update is needed; or the con-
straint in the optimization problem (10.4) is an equality after the update. Substituting
Equations B.13 and B.15 into the equality version of the constraint, after rearranging
in terms of λ, we get
λ² [ (V_t − x̄_t W_t + φ²V_t/2)² − φ⁴V_t²/4 ] + 2λ(ε − M_t)(V_t − x̄_t W_t + φ²V_t/2) + (ε − M_t)² − φ²V_t = 0. (B.16)
Let a = (V_t − x̄_t W_t + φ²V_t/2)² − φ⁴V_t²/4, b = 2(ε − M_t)(V_t − x̄_t W_t + φ²V_t/2), and c = (ε − M_t)² − φ²V_t. Note that we only consider real roots of the quadratic equation. Thus, we can obtain γ_t as its roots (two real roots: γ_{t1} and γ_{t2}; one real root: γ_{t3}):

γ_{t1} = ( −b + √(b² − 4ac) ) / (2a),  γ_{t2} = ( −b − √(b² − 4ac) ) / (2a),  or  γ_{t3} = −c/b.
To ensure the nonnegativity of the Lagrangian multiplier, we project the roots to
[0, +∞):
which corresponds to three cases (two, one, or zero real roots), respectively.
Following the proof of Proposition 10.1, we can update the diagonal covariance matrix as

Σ_{t+1}^{−1} = Σ_t^{−1} + λ (φ/√U_t) diag²(x_t),

where diag(x_t) denotes the diagonal matrix with the elements of x_t on its main diagonal.
1 = 1 + λ x̃_{t+1} · 1 − ηm  =⇒  η = λ x̄_{t+1},

where x̄_{t+1} denotes the average predicted price relative (the market). Plugging this into the update of b, we get the update of b. To solve for the Lagrangian multiplier, let us plug the update into the Lagrangian:

L(λ) = λ(ε − b_t · x̃_{t+1}) − (1/2) λ² ‖x̃_{t+1} − x̄_{t+1} 1‖².

Taking the derivative with respect to λ and setting it to zero, we get

0 = ∂L/∂λ = (ε − b_t · x̃_{t+1}) − λ ‖x̃_{t+1} − x̄_{t+1} 1‖²  =⇒  λ = (ε − b_t · x̃_{t+1}) / ‖x̃_{t+1} − x̄_{t+1} 1‖².

Further projecting λ to [0, +∞), we get λ = max{ 0, (ε − b_t · x̃_{t+1}) / ‖x̃_{t+1} − x̄_{t+1} 1‖² }.
This section provides some supplementary data and portfolio statistics, which mainly
complement the observations in Section 13.6.
Similar to Table 13.5, Tables C.1, C.2, C.3, and C.4 show some descriptive
statistics on the NYSE (N) dataset, SP500 dataset, MSCI dataset, and DJIA dataset,
respectively.
Table C.5 illustrates the top five average allocation weights of the proposed strategies on the datasets other than NYSE (O). Connecting these weights with the descriptive statistics, we observe patterns similar to those on NYSE (O): the proposed algorithms put more weight on volatile assets so as to exploit their volatility, and most of the top average allocation weights of the mean reversion algorithms are on assets with negative autocorrelations, except on DJIA.
[Fragment of the descriptive statistics in Tables C.1–C.4: Cum 0.8738, Mean 1.0002, Std 0.0242, Autocorrelation 0.0299.]
NYSE (N)
  Anticor  Asset #: 16, 10, 5, 17, 11   Weights: 0.15, 0.08, 0.06, 0.06, 0.06
  CORN     Asset #: 13, 4, 16, 23, 20   Weights: 0.15, 0.15, 0.15, 0.07, 0.05
  PAMR     Asset #: 16, 11, 5, 10, 22   Weights: 0.18, 0.07, 0.06, 0.06, 0.06
  OLMAR    Asset #: 16, 11, 10, 22, 5   Weights: 0.18, 0.07, 0.07, 0.06, 0.06
TSE
  Anticor  Asset #: 18, 24, 71, 79, 74  Weights: 0.09, 0.08, 0.07, 0.06, 0.06
  CORN     Asset #: 51, 87, 25, 71, 18  Weights: 0.14, 0.13, 0.12, 0.10, 0.05
  PAMR     Asset #: 24, 18, 71, 79, 74  Weights: 0.12, 0.08, 0.07, 0.04, 0.04
  OLMAR    Asset #: 24, 18, 71, 79, 32  Weights: 0.10, 0.09, 0.08, 0.05, 0.04
SP500
  Anticor  Asset #: 15, 12, 19, 25, 21  Weights: 0.09, 0.08, 0.07, 0.06, 0.05
  CORN     Asset #: 19, 24, 15, 18, 8   Weights: 0.20, 0.14, 0.14, 0.13, 0.08
  PAMR     Asset #: 15, 19, 12, 18, 24  Weights: 0.10, 0.08, 0.07, 0.07, 0.06
  OLMAR    Asset #: 15, 12, 19, 24, 18  Weights: 0.09, 0.08, 0.07, 0.06, 0.06
MSCI
  Anticor  Asset #: 9, 2, 7, 1, 20      Weights: 0.17, 0.16, 0.14, 0.07, 0.06
  CORN     Asset #: 2, 24, 1, 7, 6      Weights: 0.15, 0.14, 0.11, 0.10, 0.08
  PAMR     Asset #: 24, 14, 9, 16, 10   Weights: 0.11, 0.10, 0.09, 0.09, 0.07
  OLMAR    Asset #: 14, 16, 24, 9, 10   Weights: 0.12, 0.10, 0.09, 0.08, 0.08
DJIA
  Anticor  Asset #: 16, 18, 26, 14, 7   Weights: 0.08, 0.07, 0.07, 0.06, 0.05
  CORN     Asset #: 19, 24, 15, 18, 8   Weights: 0.20, 0.14, 0.14, 0.13, 0.08