This tutorial provides a gentle introduction to the particle
Metropolis-Hastings (PMH) algorithm for parameter inference in nonlinear
state-space models together with a software implementation in the statistical
programming language R. We employ a step-by-step approach to develop an
implementation of the PMH algorithm (and the particle filter within) together
with the reader. This final implementation is also available as the package
pmhtutorial in the CRAN repository. Throughout the tutorial, we provide some
intuition as to how the algorithm operates and discuss some solutions to
problems that might occur in practice. To illustrate the use of PMH, we
consider parameter inference in a linear Gaussian state-space model with
synthetic data and a nonlinear stochastic volatility model with real-world
We review the state of the art of clustering financial time series and the
study of their correlations alongside other interaction networks. The aim of
this review is to gather in one place the relevant material from different
fields, e.g. machine learning, information geometry, econophysics, statistical
physics, econometrics, and behavioral finance. We hope it will help researchers
use this alternative modeling of financial time series more effectively.
Decision makers and quantitative researchers may also be able to leverage its
insights. Finally, we also hope that this review will form the basis of an open
toolbox to study correlations, hierarchies, networks and clustering in
financial markets.
In this paper, we build tests for the presence of residual noise in a model
where the market microstructure noise is a known parametric function of some
variables from the limit order book. The tests compare two distinct
quasi-maximum likelihood estimators of volatility: one whose model includes a
residual noise term in the market microstructure noise and one whose model does
not. The limit
theory is investigated in a general nonparametric framework. In the presence of
residual noise, we examine the central limit theory of the related
quasi-maximum likelihood estimation approach.
We give complete algorithms and source code for constructing (multilevel)
statistical industry classifications, including methods for fixing the number
of clusters at each level (and the number of levels). Under the hood there are
clustering algorithms (e.g., k-means). However, what should we cluster?
Correlations? Returns? The answer turns out to be neither and our backtests
suggest that these details make a sizable difference. We also give an algorithm
and source code for building "hybrid" industry classifications by improving
off-the-shelf "fundamental" industry classifications by applying our
statistical industry classification methods to them. The presentation is
intended to be pedagogical and geared toward practical applications in
quantitative trading.
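
As a minimal sketch of the kind of clustering step mentioned above, the following plain-Python k-means groups short feature vectors. What to cluster is exactly the question the abstract raises, so the toy `features` below are purely hypothetical and are not the authors' choice. The function name `kmeans` and its deterministic initialization are likewise assumptions made for illustration.

```python
def kmeans(points, k, iterations=50):
    """Plain Lloyd's k-means on a list of equal-length tuples."""
    # Deterministic initialization (an assumption): the first k points.
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = []
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            new_labels.append(dists.index(min(dists)))
        # Stop once the assignment no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Hypothetical two-dimensional features for six tickers.
features = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9), (0.1, 0.2), (4.9, 5.0)]
labels, centroids = kmeans(features, k=2)
print(labels)
```

Fixing the number of clusters `k` in advance is the simplest option. The abstract indicates that choosing it (and the number of levels) is part of the method, which this sketch does not attempt.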
In this paper, we introduce quantile coherency to measure general dependence
structures emerging in the joint distribution in the frequency domain and argue
that this type of dependence is natural for economic time series but remains
invisible when only the traditional analysis is employed. We define estimators
which capture the general dependence structure, provide a detailed analysis of
their asymptotic properties and discuss how to conduct inference for a general
class of possibly nonlinear processes. In an empirical illustration we examine
the dependence of bivariate stock market returns and shed new light on
measurement of tail risk in financial markets. We also provide a modelling
exercise to illustrate how applied researchers can benefit from using quantile
coherency when assessing time series models.
In a very high-dimensional vector space, two randomly-chosen vectors are
almost orthogonal with high probability. Starting from this observation, we
develop a statistical factor model, the random factor model, in which factors
are chosen at random based on the random projection method. The randomness of
the factors means that the covariance matrix is well preserved in a
linear factor representation. It also enables derivation of probabilistic
bounds for the accuracy of the random factor representation of time-series,
their cross-correlations and covariances. As an application, we analyze
reproduction of time-series and their cross-correlation coefficients in the
well-diversified Russell 3,000 equity index.
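
The starting observation, that two random vectors in high dimension are nearly orthogonal, can be checked numerically. The sketch below draws two independent standard Gaussian vectors (the Gaussian choice and the dimension of 10,000 are assumptions for illustration) and computes their cosine similarity, which is typically of order one over the square root of the dimension.

```python
import math
import random


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def random_gaussian_vector(dim, rng):
    """Vector with independent standard normal entries."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


rng = random.Random(0)  # fixed seed so the run is reproducible
dim = 10_000
u = random_gaussian_vector(dim, rng)
v = random_gaussian_vector(dim, rng)
cos_uv = cosine_similarity(u, v)
print(f"|cos(u, v)| = {abs(cos_uv):.4f}, 1/sqrt(dim) = {1 / math.sqrt(dim):.4f}")
```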
We examine the effect of investor attention spillover on stock return
predictability. Using a novel measure, the News Network Triggered Attention
index (NNTA), we find that NNTA negatively predicts market returns with a
monthly in(out)-of-sample R-square of 5.97% (5.80%). In the cross-section, a
long-short portfolio based on news co-occurrence generates a significant
monthly alpha of 68 basis points. The results are robust to the inclusion of
alternative attention proxies, sentiment measures, and other news- and
information-based predictors, and hold across recession and expansion periods. We
further validate the attention spillover effect by showing that news
co-mentioning leads to greater increases in Google and Bloomberg search volumes
than unconditional news coverage. Our findings suggest that attention spillover
in a news-based network can lead to significant stock market overvaluations,
especially when arbitrage is limited.
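
For readers unfamiliar with the out-of-sample R-square quoted above, the sketch below computes it under a common convention: one minus the ratio of the model's squared forecast errors to those of a historical-mean benchmark. The abstract does not state which benchmark is used, so the expanding historical mean here is an assumption, as are the function names and the toy numbers.

```python
def expanding_mean_benchmark(history, realized):
    """Historical-average forecast for each period, using only past data."""
    past = list(history)
    forecasts = []
    for r in realized:
        forecasts.append(sum(past) / len(past))
        past.append(r)  # the realized value becomes available afterwards
    return forecasts


def out_of_sample_r2(realized, forecasts, benchmark):
    """1 - SSE(model) / SSE(benchmark); positive means the model beats the benchmark."""
    sse_model = sum((r - f) ** 2 for r, f in zip(realized, forecasts))
    sse_bench = sum((r - b) ** 2 for r, b in zip(realized, benchmark))
    return 1.0 - sse_model / sse_bench


# Hypothetical monthly returns, for illustration only.
history = [0.00, 0.01]
realized = [0.01, -0.02, 0.03]
benchmark = expanding_mean_benchmark(history, realized)
r2_perfect = out_of_sample_r2(realized, realized, benchmark)
r2_benchmark = out_of_sample_r2(realized, benchmark, benchmark)
```

A forecast identical to the benchmark gives a value of zero, and a perfect forecast gives one, which brackets the interpretation of figures such as the 5.80% reported above.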
Accurate prediction of financial time series has long been a difficult problem
because the series are unstable and noisy. Although traditional time series
models such as ARIMA and GARCH have been studied extensively and shown to be
effective for prediction, their performance is still far from satisfactory.
Machine learning, an emerging research field in recent years, has brought
about substantial improvements in tasks such as regression and classification,
and applying its methodology to financial time series prediction is also
promising. In this paper, we experimentally compare the prediction accuracy of
traditional time series models with that of mainstream machine learning models,
including some state-of-the-art deep learning models, using historical stock
index data. The results show that machine learning, as a modern method, far
surpasses traditional models in
A fundamental problem in studying and modeling economic and financial systems
is represented by privacy issues, which put severe limitations on the amount of
accessible information. Here we introduce a novel, highly nontrivial method to
reconstruct the structural properties of complex weighted networks of this kind
using only partial information: the total number of nodes and links, and the
values of the strength for all nodes. The latter are used as fitness to
estimate the unknown node degrees through a standard configuration model. Then,
these estimated degrees and the strengths are used to calibrate an enhanced
configuration model in order to generate ensembles of networks intended to
represent the real system. The method, which is tested on real economic and
financial networks, while drastically reducing the amount of information needed
to infer network properties, turns out to be remarkably effective, thus
representing a valuable tool for gaining insights on privacy-protected
socioeconomic systems.
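
The first step described above, estimating unknown degrees from strengths used as fitness, can be sketched as follows. The sketch assumes an undirected network and the commonly used fitness-model connection probability p_ij = z s_i s_j / (1 + z s_i s_j), with the single parameter z calibrated so that the expected number of links matches the known total. The abstract does not spell out this functional form, so treat it, the function names, and the toy strengths as assumptions.

```python
def expected_links(z, strengths):
    """Expected number of undirected links for a given z."""
    total = 0.0
    n = len(strengths)
    for i in range(n):
        for j in range(i + 1, n):
            x = z * strengths[i] * strengths[j]
            total += x / (1.0 + x)
    return total


def calibrate_z(strengths, n_links, tol=1e-10):
    """Bisection on z so that the expected link count equals n_links."""
    n = len(strengths)
    if not 0 < n_links < n * (n - 1) / 2:
        raise ValueError("n_links must lie strictly between 0 and n(n-1)/2")
    lo, hi = 0.0, 1.0
    while expected_links(hi, strengths) < n_links:
        hi *= 2.0  # expand the bracket until it contains the root
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if expected_links(mid, strengths) < n_links:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def expected_degrees(z, strengths):
    """Estimated degree of each node: sum of its connection probabilities."""
    degrees = []
    for i, s_i in enumerate(strengths):
        k = 0.0
        for j, s_j in enumerate(strengths):
            if i != j:
                x = z * s_i * s_j
                k += x / (1.0 + x)
        degrees.append(k)
    return degrees


# Hypothetical node strengths and total number of links.
strengths = [10.0, 5.0, 3.0, 1.0, 1.0]
n_links = 4
z = calibrate_z(strengths, n_links)
degrees = expected_degrees(z, strengths)
```

The second step, calibrating an enhanced configuration model on these degrees and the strengths, is not covered by this sketch.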
We propose a novel framework to investigate lead-lag relationships between
two financial assets. Our framework bridges a gap between continuous-time
modeling based on Brownian motion and the existing wavelet methods for lead-lag
analysis based on discrete-time models and enables us to analyze the
multi-scale structure of lead-lag effects. We also present a statistical
methodology for the scale-by-scale analysis of lead-lag effects in the proposed
framework and develop an asymptotic theory applicable to a situation including
stochastic volatilities and irregular sampling. Finally, we report several
numerical experiments to demonstrate how our framework works in practice.
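
To make the notion of a lead-lag relationship concrete, the sketch below uses a much simpler stand-in than the proposed framework: the plain sample cross-correlation over integer lags on regularly sampled data. It is not the wavelet-based, multi-scale method of the paper, and the function names and synthetic series are assumptions for illustration.

```python
import math
import random


def correlation(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def lagged_correlation(x, y, lag):
    """Correlation between x[t] and y[t + lag]; lag > 0 means x leads y."""
    if lag >= 0:
        return correlation(x[:len(x) - lag], y[lag:])
    return correlation(x[-lag:], y[:len(y) + lag])


def best_lag(x, y, max_lag):
    """Lag in [-max_lag, max_lag] with the largest absolute correlation."""
    return max(range(-max_lag, max_lag + 1),
               key=lambda lag: abs(lagged_correlation(x, y, lag)))


# Synthetic example: y repeats x two steps later, so x leads y by 2.
rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(500)]
y = [0.0, 0.0] + x[:-2]
estimated_lag = best_lag(x, y, max_lag=5)
```

This single-scale estimate ignores irregular sampling and stochastic volatility, which are the situations the proposed asymptotic theory is designed to handle.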
In this paper, we provide non-parametric statistical tools to test
stationarity of microstructure noise in general hidden Ito semimartingales, and
discuss how to measure liquidity risk using high frequency financial data. In
particular, we investigate the impact of non-stationary microstructure noise on
some volatility estimators, and design three complementary tests by exploiting
edge effects, information aggregation of local estimates and high-frequency
asymptotic approximation. The asymptotic distributions of these tests are
available under both stationary and non-stationary assumptions, which enables
us to conservatively control type-I errors while ensuring that the proposed
tests enjoy asymptotically optimal statistical power. It also
enables us to empirically measure aggregate liquidity risks using these test
statistics. As byproducts, functional dependence and endogenous microstructure
noise are briefly discussed. Simulation with a realistic configuration
corroborates our theoretical results, and our empirical study indicates the
prevalence of non-stationary microstructure noise on the New York Stock Exchange.
We conduct an extensive evaluation of price jump tests based on
high-frequency financial data. After providing a concise review of multiple
alternative tests, we document the size and power of all tests in a range of
empirically relevant scenarios. Particular focus is given to the robustness of
test performance to the presence of jumps in volatility and microstructure
noise, and to the impact of sampling frequency. The paper concludes by
providing guidelines for empirical researchers about which test to choose in
any given setting.
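
The abstract does not name the individual tests. As an illustration of one ingredient that several well-known jump tests share, the sketch below compares realized variance with bipower variation, which is robust to jumps. It returns a descriptive ratio only, not a formal test statistic with critical values, and the function names and toy returns are assumptions.

```python
import math


def realized_variance(returns):
    """Sum of squared returns; picks up both diffusive and jump variation."""
    return sum(r * r for r in returns)


def bipower_variation(returns):
    """(pi / 2) times the sum of products of adjacent absolute returns."""
    adjacent = zip(returns[1:], returns[:-1])
    return (math.pi / 2.0) * sum(abs(a) * abs(b) for a, b in adjacent)


def relative_jump(returns):
    """Share of realized variance not explained by bipower variation, floored at 0."""
    rv = realized_variance(returns)
    bv = bipower_variation(returns)
    return max(rv - bv, 0.0) / rv


# Toy intraday returns: a calm series, and the same series with one large jump.
calm = [0.01, -0.01] * 50
with_jump = calm[:50] + [0.5] + calm[50:]
ratio_calm = relative_jump(calm)
ratio_jump = relative_jump(with_jump)
```

Microstructure noise and volatility jumps, whose effects the paper examines, are not handled by this naive version.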
In this paper, we give a general time-varying parameter model, where the
multidimensional parameter possibly includes jumps. The quantity of interest is
defined as the integrated value over time of the parameter process $\Theta =
T^{-1} \int_0^T \theta_t^* dt$. We provide a local parametric estimator (LPE)
of $\Theta$ and conditions under which we can show the central limit theorem.
Roughly speaking, those conditions correspond to some uniform limit theory in
the parametric version of the problem. The framework is restricted to the
specific convergence rate $n^{1/2}$. Several examples of LPE are studied:
estimation of volatility, powers of volatility, volatility when incorporating
trading information and time-varying MA(1).
We prove strong consistency and asymptotic normality of least squares
estimators for the subcritical Heston model based on continuous time
observations. We also present some numerical illustrations of our results.
Researchers developed the Economic Complexity Index (ECI) as a measure of the
overall sophistication of a country's products. They argued that this measure
explains economic growth better than the conventional variables such as human
capital. This paper suggests a simpler measure of production complexity, the
logarithm of product diversification, which has a natural foundation in
information theory: it measures the information needed to encode the knowledge
required to make a country's products. This measure explains well the income
differences between countries. It has a basic link with ECI that is strongly
supported by the data.
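
The proposed measure is simple enough to compute in one line. The sketch below takes the number of distinct products a country makes and returns its logarithm. The abstract does not specify the base of the logarithm, so base 2 (giving a value in bits) is an assumption, as are the function name and the product counts.

```python
import math


def production_complexity(num_products):
    """Logarithm (base 2, an assumption) of product diversification."""
    if num_products < 1:
        raise ValueError("a country must make at least one product")
    return math.log2(num_products)


# Hypothetical product counts, for illustration only.
for count in (8, 64, 512):
    print(count, production_complexity(count))
```

Under this reading, doubling the number of products adds one bit to the measure.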
In this paper we estimate the mean-variance (MV) portfolio in the
high-dimensional case using the recent results from the theory of random
matrices. We construct a linear shrinkage estimator which is distribution-free
and is optimal in the sense of maximizing with probability $1$ the asymptotic
out-of-sample expected utility, i.e., the mean-variance objective function, for
several values of the risk aversion coefficient. In particular, this leads to
the maximization of the out-of-sample expected utility, to the maximization of
the out-of-sample Sharpe ratio, and to the minimization of the out-of-sample
variance. Its asymptotic properties are investigated when the number of assets
$p$ together with the sample size $n$ tend to infinity such that $p/n
\rightarrow c\in (0,+\infty)$. The results are obtained under weak assumptions
imposed on the distribution of the asset returns: only the existence of the
fourth moments is required. Thereafter we perform numerical and empirical
studies where the small- and large-sample behavior of the derived estimator is
investigated. The suggested estimator shows significant improvements over
naive diversification and is robust to deviations from normality.
In this paper we derive the optimal linear shrinkage estimator for the
high-dimensional mean vector using random matrix theory. The results are
obtained under the assumption that both the dimension $p$ and the sample size
$n$ tend to infinity in such a way that $p/n \to c\in(0,\infty)$. Under weak
conditions imposed on the underlying data generating mechanism, we find the
asymptotic equivalents to the optimal shrinkage intensities and estimate them
consistently. The proposed nonparametric estimator for the high-dimensional
mean vector has a simple structure and is proven to minimize asymptotically,
with probability $1$, the quadratic loss when $c\in(0,1)$. When $c\in(1,
\infty)$ we modify the estimator by using a feasible estimator for the
precision covariance matrix. To this end, an exhaustive simulation study and an
application to real data are provided where the proposed estimator is compared
with known benchmarks from the literature. It turns out that the existing
estimators of the mean vector, including the new proposal, converge to the
sample mean vector when the true mean vector has an unbounded Euclidean norm.
Financial markets are notoriously complex environments, presenting vast
amounts of noisy, yet potentially informative data. We consider the problem of
forecasting financial time series from a wide range of information sources
using online Gaussian Processes with Automatic Relevance Determination (ARD)
kernels. We measure the performance gain, quantified in terms of Normalised
Root Mean Square Error (NRMSE), Median Absolute Deviation (MAD) and Pearson
correlation, from fusing each of four separate data domains: time series
technicals, sentiment analysis, options market data and broker recommendations.
We show evidence that ARD kernels produce meaningful feature rankings that help
retain salient inputs and reduce input dimensionality, providing a framework
for sifting through financial complexity. We measure the performance gain from
fusing each domain's heterogeneous data streams into a single probabilistic
model. In particular our findings highlight the critical value of options data
in mapping out the curvature of price space and inspire an intuitive, novel
direction for research in financial prediction.
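
To show how an ARD kernel can yield a feature ranking, the sketch below implements a squared-exponential kernel with one length scale per input dimension. The abstract does not say which base kernel is used, so the squared-exponential form is an assumption. The length-scale values are invented for illustration and are not results from the paper; only the four domain names come from the text.

```python
import math


def ard_squared_exponential(x, y, length_scales, signal_variance=1.0):
    """Squared-exponential kernel with a separate length scale per dimension."""
    scaled_sq_dist = sum(((a - b) / ell) ** 2
                         for a, b, ell in zip(x, y, length_scales))
    return signal_variance * math.exp(-0.5 * scaled_sq_dist)


def relevance_ranking(length_scales, names):
    """Order inputs from most to least relevant (shortest length scale first)."""
    return [name for _, name in sorted(zip(length_scales, names))]


# Hypothetical fitted length scales, one per data domain.
names = ["technicals", "sentiment", "options", "broker"]
length_scales = [2.0, 10.0, 0.5, 50.0]
ranking = relevance_ranking(length_scales, names)

# A very long length scale makes the kernel nearly blind to that input.
k_same = ard_squared_exponential([0.0, 0.0], [0.0, 0.0], [1.0, 1e6])
k_moved = ard_squared_exponential([0.0, 0.0], [0.0, 5.0], [1.0, 1e6])
```

Inputs whose fitted length scale is very large can be dropped with little effect on the kernel, which is the mechanism behind reducing input dimensionality.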
This paper shows how to carry out efficient asymptotic variance reduction
when estimating volatility in the presence of stochastic volatility and
microstructure noise with the realized kernels (RK) from [Barndorff-Nielsen et
al., 2008] and the quasi-maximum likelihood estimator (QMLE) studied in [Xiu,
2010]. To obtain such a reduction, we chop the data into B blocks, compute the
RK (or QMLE) on each block, and aggregate the block estimates. The ratio of
asymptotic variance over the bound of asymptotic efficiency converges as B
increases to the ratio in the parametric version of the problem, i.e. 1.0025 in
the case of the fastest RK Tukey-Hanning 16 and 1 for the QMLE. The impact of
stochastic sampling times and jumps in the price process is examined carefully.
The finite sample performance of both estimators is investigated in
simulations, while empirical work illustrates the gain in practice.
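
The block procedure described above (chop, estimate per block, aggregate) has a simple structure, sketched below. Plain realized variance is used as a stand-in for the RK or QMLE, which are more involved. Aggregating by summation is an assumption based on integrated variance being additive over time blocks. The function names are hypothetical.

```python
def realized_variance(returns):
    """Stand-in block estimator; NOT robust to microstructure noise."""
    return sum(r * r for r in returns)


def split_into_blocks(returns, n_blocks):
    """Split a list into n_blocks contiguous blocks of near-equal size."""
    size, remainder = divmod(len(returns), n_blocks)
    blocks, start = [], 0
    for b in range(n_blocks):
        end = start + size + (1 if b < remainder else 0)
        blocks.append(returns[start:end])
        start = end
    return blocks


def blockwise_estimate(returns, n_blocks, estimator=realized_variance):
    """Apply the estimator on each block and sum the block estimates."""
    return sum(estimator(block) for block in split_into_blocks(returns, n_blocks))


# Toy returns, for illustration only.
returns = [0.01, -0.02, 0.015, -0.005, 0.02, -0.01, 0.005]
blocks = split_into_blocks(returns, 3)
aggregated = blockwise_estimate(returns, n_blocks=3)
```

With plain realized variance the aggregate equals the full-sample value, so this sketch only shows the mechanics. Any variance reduction would come from passing a noise-robust estimator as `estimator`.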
Identifying behavior that is relatively invariant under different conditions
is a challenging task in far-from-equilibrium complex systems. As an example of
how the existence of a semi-invariant signature can be masked by the
heterogeneity in the properties of the components comprising such systems, we
consider the exchange rate dynamics in the international currency market. We
show that the exponents characterizing the heavy tails of fluctuation
distributions for different currencies systematically diverge from a putative
universal form associated with the median value (~2) of the exponents. We
relate the degree of deviation of a particular currency from such an "inverse
square law" to fundamental macroscopic properties of the corresponding economy,
viz., measures of per capita production output and diversity of export
products. We also show that in contrast to uncorrelated random walks exhibited
by the exchange rate dynamics for currencies belonging to developed economies,
those of the less developed economies show characteristics of sub-diffusive
processes which we relate to the anti-correlated nature of the corresponding
fluctuations. Approaches similar to that presented here may help in identifying
invariant features obscured by the heterogeneous nature of components in other
complex systems.
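
One standard way to estimate a heavy-tail exponent is the Hill estimator, sketched below on synthetic data. The abstract does not say how the exponents were estimated, so this is a swapped-in illustrative technique, not the authors' procedure. The exponent here is that of the cumulative tail, P(|X| > x) ~ x^(-alpha), and the sample is drawn from an exact Pareto law with alpha = 2.

```python
import math
import random


def hill_tail_exponent(values, k):
    """Hill estimate of alpha from the k largest absolute values."""
    ordered = sorted((abs(v) for v in values), reverse=True)
    threshold = ordered[k]  # the (k+1)-th largest observation
    mean_log_excess = sum(math.log(x / threshold) for x in ordered[:k]) / k
    return 1.0 / mean_log_excess


# Synthetic Pareto sample via inverse-transform sampling.
rng = random.Random(42)
alpha_true = 2.0
sample = [(1.0 - rng.random()) ** (-1.0 / alpha_true) for _ in range(20_000)]
alpha_hat = hill_tail_exponent(sample, k=2_000)
print(f"estimated tail exponent: {alpha_hat:.2f}")
```

On real exchange-rate fluctuations the estimate depends on the choice of `k`, so it is usually inspected over a range of values.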
Financial markets provide a natural quantitative lab for understanding some
of the most advanced human behaviours. Among them is the use of mathematical
tools known as financial instruments. Besides money, the two most fundamental
financial instruments are bonds and equities. More than 30 years ago Mehra and
Prescott found the numerical performance of equities relative to government
bonds could not be explained by consumption-based (mainstream) economic
theories. This empirical observation, known as the Equity Premium Puzzle, has
been defying mainstream economics ever since. The recent financial crisis
revealed an even deeper need for understanding financial products. We show how
understanding the rational nature of product design resolves the Equity Premium
Puzzle. In doing so we obtain an experimentally tested theory of product
Using 1-min returns of Bitcoin prices, we investigate statistical properties
and multifractality of a Bitcoin time series. We find that the 1-min return
distribution is fat-tailed, and kurtosis largely deviates from the Gaussian
expectation. Although kurtosis is anticipated to approach the Gaussian
expectation for large sampling periods, we find that this convergence is very
slow. Skewness is found to be negative at time scales shorter than one day and
becomes consistent with zero at time scales longer than about one week. We also
investigate daily volatility-asymmetry by using GARCH, GJR, and RGARCH models,
and find no evidence of it. On exploring multifractality using multifractal
detrended fluctuation analysis, we find that the Bitcoin time series exhibits
multifractality. The sources of multifractality are investigated, confirming
that both temporal correlation and the fat-tailed distribution contribute to
it. The influence of "Brexit" (June 23, 2016) on the multifractal properties of
the GBP--USD exchange rate and Bitcoin is examined. We find that, while Brexit
influenced the GBP--USD exchange rate, Bitcoin was robust to Brexit.
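
The dependence of kurtosis on the sampling period can be examined by summing 1-min returns over non-overlapping windows and recomputing the statistic. The sketch below assumes log returns (so that they add across time) and uses the population-moment definition of excess kurtosis. The function names are hypothetical.

```python
def excess_kurtosis(xs):
    """Fourth standardized moment minus 3, so a Gaussian gives about 0."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / (m2 * m2) - 3.0


def aggregate_returns(returns, window):
    """Sum consecutive log returns over non-overlapping windows."""
    n_full = len(returns) // window  # an incomplete final window is dropped
    return [sum(returns[i * window:(i + 1) * window]) for i in range(n_full)]


def kurtosis_by_scale(returns, windows):
    """Excess kurtosis of the aggregated returns for each window length."""
    return {w: excess_kurtosis(aggregate_returns(returns, w)) for w in windows}
```

Calling `kurtosis_by_scale(one_minute_returns, [1, 60, 1440])` on a series of 1-min returns (`one_minute_returns` is a placeholder name) would give the values at 1-min, hourly, and daily scales, which is one way to inspect how slowly the statistic approaches zero.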
We study asymptotic properties of maximum likelihood estimators of drift
parameters for a jump-type Heston model based on continuous time observations,
where the jump process can be any purely non-Gaussian L\'evy process of not
necessarily bounded variation with a L\'evy measure concentrated on
$(-1,\infty)$. We prove strong consistency and asymptotic normality for all
admissible parameter values except one, where we show only weak consistency and
mixed normal (but non-normal) asymptotic behavior. It turns out that the
volatility of the price process is a measurable function of the price process.
We also present some numerical illustrations to confirm our results.
The liberalization of electricity markets and the development of renewable
energy sources have led to new challenges for decision makers. These challenges
are accompanied by increasing uncertainty about future electricity price
movements. The growing number of papers that aim to model and predict
electricity prices over short horizons has provided new opportunities for
market participants. However, the electricity price literature seems to be very
scarce on the issue of medium- to long-term price forecasting, which is
mandatory for investment and political decisions. Our paper closes this gap by
introducing a new approach to simulate electricity prices with hourly
resolution for several months up to three years. By accounting for the
uncertainty of future events, we are able to provide probabilistic forecasts
that can quantify the probability of price spikes even in the long run. As our
market, we use the EPEX day-ahead electricity market for Germany and Austria.
Our model extends the X-Model, which mainly utilizes the sale and purchase
curves for electricity day-ahead auctions. By applying our procedure, we are
able to give probabilities for six consecutive hours of negative prices, an
event that is of practical relevance due to the EEG. We find that using the
supply and demand
curve based model in the long-run yields realistic patterns for the time series
of electricity prices and leads to promising results considering common error
To understand the relationship between news sentiment and company stock price
movements, and to better understand connectivity among companies, we define an
algorithm for measuring sentiment-based network risk. The algorithm ranks
companies in networks of co-occurrences, and measures sentiment-based risk, by
calculating both individual risks and aggregated network risks. We extract
relative sentiment for companies to get a measure of individual company risk,
and input it into our risk model together with co-occurrences of companies
extracted from news on a quarterly basis. We show that the highest
quarterly risk value output by our risk model is correlated with a higher
chance of stock price decline up to 70 days after a risk measurement. Our
results show that the highest difference in the probability of stock price
decline, compared to the benchmark containing all risk values for the same
period, occurs during the interval from 21 to 30 days after a quarterly
measurement. The highest average probability of company stock price decline is
found at a delay of 28 days after a company has reached its maximum risk
value. The highest probability differences for a daily decline were calculated
to be 13 percentage points.