Arxiv.org
Jun 30, 2018 Matthias Kümmerer; Thomas S. A. Wallis; Matthias Bethge
The field of fixation prediction is heavily model-driven, with dozens of new models published every year. However, progress in the field can be difficult to judge because models are compared using a variety of inconsistent metrics. As soon as a saliency map is optimized for a certain metric, it is penalized by other metrics. Here we propose a principled approach to solve the benchmarking problem: we separate the notions of saliency models and saliency maps. We define a saliency model to be a...
Topics: Statistics, Applications, Computing Research Repository, Computer Vision and Pattern Recognition
Source: http://arxiv.org/abs/1704.08615
Arxiv.org
Jun 30, 2018 Stefano Beretta; Mauro Castelli; Ivo Goncalves; Daniele Ramazzotti
One of the most challenging tasks when adopting Bayesian Networks (BNs) is learning their structure from data. This task is complicated by the huge search space of possible solutions and is known to be NP-hard; hence, approximations are required. However, to the best of our knowledge, a quantitative analysis of the performance and characteristics of the different heuristics to solve this problem has never been done before. For this reason, in this work, we...
Topics: Learning, Machine Learning, Statistics, Artificial Intelligence, Computing Research Repository
Source: http://arxiv.org/abs/1704.08676
Arxiv.org
Jun 30, 2018 Kevin Yang
This paper is the first chapter of three of the author's undergraduate thesis. We study the random matrix ensemble of covariance matrices arising from random $(d_b, d_w)$-regular bipartite graphs on a set of $M$ black vertices and $N$ white vertices, for $d_b \gg \log^4 N$. We simultaneously prove that the Green's functions of these covariance matrices and the adjacency matrices of the underlying graphs agree with the corresponding limiting law (e.g. Marchenko-Pastur law for covariance...
Topics: Probability, Statistics Theory, Statistics, Combinatorics, Mathematics
Source: http://arxiv.org/abs/1704.08672
Arxiv.org
Jun 30, 2018 Linda Mhalla; Miguel de Carvalho; Valérie Chavez-Demoulin
We propose a vector generalized additive modeling framework for taking into account the effect of covariates on angular density functions in a multivariate extreme value context. The proposed methods are tailored for settings where the dependence between extreme values may change according to covariates. We devise a maximum penalized log-likelihood estimator, discuss details of the estimation procedure, and derive its consistency and asymptotic normality. The simulation study suggests that the...
Topics: Statistics, Methodology
Source: http://arxiv.org/abs/1704.08447
Arxiv.org
Jun 30, 2018 Ben Athiwaratkun; Andrew Gordon Wilson
Word embeddings provide point representations of words containing useful semantic information. We introduce multimodal word distributions formed from Gaussian mixtures, for multiple word meanings, entailment, and rich uncertainty information. To learn these distributions, we propose an energy-based max-margin objective. We show that the resulting approach captures uniquely expressive semantic information, and outperforms alternatives, such as word2vec skip-grams, and Gaussian embeddings, on...
Topics: Learning, Computing Research Repository, Machine Learning, Computation and Language, Statistics,...
Source: http://arxiv.org/abs/1704.08424
Arxiv.org
Jun 30, 2018 K. P. Harikrishnan; Rinku Jacob; R. Misra; G. Ambika
The analysis of observed time series from nonlinear systems is usually done by making a time-delay reconstruction to unfold the dynamics on a multi-dimensional state space. An important aspect of the analysis is the choice of the correct embedding dimension. The conventional procedure used for this is either the method of false nearest neighbors or the saturation of some invariant measure, such as the correlation dimension. Here we examine this issue from a complex network perspective and propose...
Topics: Physics, Neurons and Cognition, Data Analysis, Statistics and Probability, Nonlinear Sciences,...
Source: http://arxiv.org/abs/1704.08585
Arxiv.org
The paper presents a critical introduction to the complex statistical models used in ${}^{14}$C dating. The emphasis is on the estimation of the transit time between a sequence of archeological layers. Although a frequentist estimation of the parameters is relatively simple, confidence interval constructions are not standard as the models are not regular. I argue that the Bayesian paradigm is a natural approach to these models. It is simple, and gives immediate solutions to credible sets,...
Topics: Statistics, Applications
Source: http://arxiv.org/abs/1704.08479
Arxiv.org
Jun 30, 2018 Qingyang Li; Dajiang Zhu; Jie Zhang; Derrek Paul Hibar; Neda Jahanshad; Yalin Wang; Jieping Ye; Paul M. Thompson; Jie Wang
Genome-wide association studies (GWAS) have achieved great success in the genetic study of Alzheimer's disease (AD). Collaborative imaging genetics studies across different research institutions show the effectiveness of detecting genetic risk factors. However, the high dimensionality of GWAS data poses significant challenges in detecting risk SNPs for AD. Selecting relevant features is crucial in predicting the response variable. In this study, we propose a novel Distributed Feature Selection...
Topics: Learning, Machine Learning, Statistics, Computing Research Repository
Source: http://arxiv.org/abs/1704.08383
Arxiv.org
Jun 30, 2018 Dieter Hendricks; Stephen J. Roberts
The process of liquidity provision in financial markets can result in prolonged exposure to illiquid instruments for market makers. In this case, where a proprietary position is not desired, pro-actively targeting the right client who is likely to be interested can be an effective means to offset this position, rather than relying on commensurate interest arising through natural demand. In this paper, we consider the inference of a client profile for the purpose of corporate bond...
Topics: Quantitative Finance, Learning, Computing Research Repository, Machine Learning, Statistics,...
Source: http://arxiv.org/abs/1704.08488
Arxiv.org
Jun 30, 2018 I. Eghdami; H. Panahi; S. M. S. Movahed
Relying on the multifractal behavior of pulsar timing residuals (PTRs), we examine the capability of Multifractal Detrended Fluctuation Analysis (MF-DFA) and Multifractal Detrending Moving Average Analysis (MF-DMA), modified by Singular Value Decomposition (SVD) and Adaptive Detrending (AD), to detect the footprint of gravitational waves (GWs) superimposed on PTRs. These methods enable us to clarify the type of GWs, which is related to the value of the Hurst exponent. We introduce three...
Topics: Physics, Astrophysics, Data Analysis, Statistics and Probability, Solar and Stellar Astrophysics,...
Source: http://arxiv.org/abs/1704.08599
Arxiv.org
Jun 30, 2018 Shonosuke Sugasawa
Parametric empirical Bayes (EB) estimators have been widely used in a variety of fields, including small area estimation and disease mapping. Since the EB estimator is constructed by plugging in the estimator of parameters in prior distributions, it might perform poorly if the estimator of parameters is unstable. This can happen when the number of samples is small or moderate. This paper suggests a bootstrap averaging approach, known as "bagging" in the machine learning literature, to improve...
Topics: Statistics, Methodology
Source: http://arxiv.org/abs/1704.08440
Arxiv.org
Jun 30, 2018 Sreelakshmi N; Sudheesh K Kattumannil
Several attempts have been made in the literature to generalize univariate reliability concepts to bivariate as well as multivariate setups. Here we extend the univariate quantile-based reliability concepts to the bivariate case based on quantile curves. We propose a quantile-curve-based bivariate hazard rate and bivariate mean residual life function and study their uniqueness properties to determine the underlying quantile curve. A relationship between them is also derived.
Topics: Statistics, Methodology
Source: http://arxiv.org/abs/1704.08444
Arxiv.org
Jun 30, 2018 Robert J. Adler; Kevin Bartz; Sam C. Kou; Anthea Monod
We introduce Lipschitz-Killing curvature (LKC) regression, a new method to produce $(1-\alpha)$ thresholds for signal detection in random fields that does not require knowledge of the spatial correlation structure. The idea is to fit observed empirical Euler characteristics to the Gaussian kinematic formula via generalized least squares, which quickly and easily provides statistical estimates of the LKCs --- complex topological quantities that can be extremely challenging to compute, both...
Topics: Statistics Theory, Statistics, Mathematics
Source: http://arxiv.org/abs/1704.08562
Arxiv.org
Jun 30, 2018 Jiayi Hou; Anthony Paravati; Ronghui Xu; James Murphy
Competing risk analysis considers event times due to multiple causes, or of more than one event type. Commonly used regression models for such data include 1) the cause-specific hazards model, which focuses on modeling one type of event while acknowledging other event types simultaneously; and 2) the subdistribution hazards model, which links the covariate effects directly to the cumulative incidence function. Their use and in particular statistical properties in the presence of high-dimensional...
Topics: Statistics, Applications
Source: http://arxiv.org/abs/1704.07989
Arxiv.org
Jun 30, 2018 Tobias Glasmachers
End-to-end learning refers to training a possibly complex learning system by applying gradient-based learning to the system as a whole. An end-to-end learning system is specifically designed so that all modules are differentiable. In effect, not only a central learning machine, but also all "peripheral" modules like representation learning and memory formation are covered by a holistic learning process. The power of end-to-end learning has been demonstrated on many tasks, like playing a...
Topics: Learning, Machine Learning, Statistics, Computing Research Repository
Source: http://arxiv.org/abs/1704.08305
Arxiv.org
Jun 30, 2018 Robert J. Adler; Sarit Agami; Pratyush Pranav
Under the banner of 'Big Data', the detection and classification of structure in extremely large, high-dimensional data sets is one of the central statistical challenges of our times. Among the most intriguing approaches to this challenge is 'TDA', or 'Topological Data Analysis', one of the primary aims of which is providing non-metric, but topologically informative, pre-analyses of data sets which make later, more quantitative analyses feasible. While TDA rests on strong mathematical...
Topics: Other Statistics, Statistics, Applications, Methodology
Source: http://arxiv.org/abs/1704.08248
Arxiv.org
Jun 30, 2018 M. P. Wallace; E. E. M. Moodie; D. A. Stephens
Dynamic treatment regimes (DTRs) aim to formalize personalized medicine by tailoring treatment decisions to individual patient characteristics. G-estimation for DTR identification targets the parameters of a structural nested mean model known as the blip function from which the optimal DTR is derived. Despite considerable work deriving such estimation methods, there has been little focus on extending G-estimation to the case of non-additive effects, non-continuous outcomes or on model...
Topics: Statistics, Methodology
Source: http://arxiv.org/abs/1704.08229
Arxiv.org
Jun 30, 2018 Tejal Bhamre; Teng Zhang; Amit Singer
The missing phase problem in X-ray crystallography is commonly solved using the technique of molecular replacement, which borrows phases from a previously solved homologous structure, and appends them to the measured Fourier magnitudes of the diffraction patterns of the unknown structure. More recently, molecular replacement has been proposed for solving the missing orthogonal matrices problem arising in Kam's autocorrelation analysis for single particle reconstruction using X-ray free electron...
Topics: Computer Vision and Pattern Recognition, Biomolecules, Computing Research Repository, Quantitative...
Source: http://arxiv.org/abs/1704.07969
Arxiv.org
Jun 30, 2018 Gianna Vivaldo; Elisa Masi; Cosimo Taiti; Guido Caldarelli; Stefano Mancuso
Plant emission of volatile organic compounds (VOCs) is involved in a wide class of ecological functions, as VOCs play a crucial role in plant interactions with biotic and abiotic factors. Accordingly, they vary widely across species and underpin differences in ecological strategy. In this paper, VOCs spontaneously emitted by 109 plant species (belonging to 56 different families) have been qualitatively and quantitatively analysed in order to classify plant species. By using bipartite...
Topics: Physics, Quantitative Biology, Data Analysis, Statistics and Probability, Quantitative Methods
Source: http://arxiv.org/abs/1704.08062
Within machine learning, the supervised learning field aims at modeling the input-output relationship of a system from past observations of its behavior. Decision trees characterize the input-output relationship through a series of nested if-then-else questions, the testing nodes, leading to a set of predictions, the leaf nodes. Several such trees are often combined for state-of-the-art performance: random forest ensembles average the predictions of randomized decision trees...
Topics: Learning, Machine Learning, Statistics, Computing Research Repository
Source: http://arxiv.org/abs/1704.08067
Arxiv.org
Jun 30, 2018 Steven P. Lund; Hari K. Iyer
The forensic science community has increasingly sought quantitative methods for conveying the weight of evidence. Experts from many forensic laboratories summarize their findings in terms of a likelihood ratio. Several proponents of this approach have argued that Bayesian reasoning proves it to be normative. We find this likelihood ratio paradigm to be unsupported by arguments of Bayesian decision theory, which applies only to personal decision making and not to the transfer of information from...
Topics: Statistics, Applications
Source: http://arxiv.org/abs/1704.08275
Arxiv.org
Jun 30, 2018 Rafael Izbicki; Ann B. Lee
There is a growing demand for nonparametric conditional density estimators (CDEs) in fields such as astronomy and economics. In astronomy, for example, one can dramatically improve estimates of the parameters that dictate the evolution of the Universe by working with full conditional densities instead of regression (i.e., conditional mean) estimates. More generally, standard regression falls short in any prediction problem where the distribution of the response is more complex with...
Topics: Statistics, Methodology
Source: http://arxiv.org/abs/1704.08095
Arxiv.org
Jun 30, 2018 Sarah Fletcher Mercaldo; Jeffrey D. Blume
Missing data are a common problem for both the construction and implementation of a prediction algorithm. Pattern mixture kernel submodels (PMKS) - a series of submodels for every missing data pattern that are fit using only data from that pattern - are a computationally efficient remedy for both stages. Here we show that PMKS yield the most predictive algorithm among all standard missing data strategies. Specifically, we show that the expected loss of a forecasting algorithm is minimized when...
Topics: Statistics, Methodology
Source: http://arxiv.org/abs/1704.08192
Arxiv.org
Jun 30, 2018 Chunxia Zhang; Yilei Wu; Mu Zhu
In the context of variable selection, ensemble learning has gained increasing interest due to its great potential to improve selection accuracy and to reduce false discovery rate. A novel ordering-based selective ensemble learning strategy is designed in this paper to obtain smaller but more accurate ensembles. In particular, a greedy sorting strategy is proposed to rearrange the order by which the members are included into the integration process. Through stopping the fusion process early, a...
Topics: Learning, Machine Learning, Statistics, Computing Research Repository
Source: http://arxiv.org/abs/1704.08265
Arxiv.org
Jun 30, 2018 Adel Javanmard; Jason D. Lee
Hypothesis testing in the linear regression model is a fundamental statistical problem. We consider linear regression in the high-dimensional regime where the number of parameters exceeds the number of samples ($p > n$) and assume that the high-dimensional parameter vector is $s_0$-sparse. We develop a general and flexible $\ell_\infty$ projection statistic for hypothesis testing in this model. Our framework encompasses testing whether the parameter lies in a convex cone, testing the signal...
Topics: Learning, Statistics Theory, Computing Research Repository, Machine Learning, Applications,...
Source: http://arxiv.org/abs/1704.07971
Arxiv.org
Jun 30, 2018 Martin Saavedra; Tate Twinam
Historical studies of labor market outcomes frequently suffer from a lack of data on individual income. The occupational income score (OCCSCORE) is often used as an alternative measure of labor market outcomes, particularly in studies of the U.S. prior to 1950. While researchers have acknowledged that this approach introduces measurement error, no effort has been made to quantify its impact on inferences. Using modern Census data, we find that the use of OCCSCORE biases results towards zero and...
Topics: Statistics, Applications
Source: http://arxiv.org/abs/1704.08299
Arxiv.org
Jun 30, 2018 Prateek Jain; Sham M. Kakade; Rahul Kidambi; Praneeth Netrapalli; Aaron Sidford
There is widespread sentiment that it is not possible to effectively utilize fast gradient methods (e.g. Nesterov's acceleration, conjugate gradient, heavy ball) for the purposes of stochastic optimization due to their instability and error accumulation, a notion made precise in d'Aspremont 2008 and Devolder, Glineur, and Nesterov 2014. This work considers these issues for the special case of stochastic approximation for the least squares regression problem, and our main result refutes the...
Topics: Learning, Optimization and Control, Statistics Theory, Computing Research Repository, Machine...
Source: http://arxiv.org/abs/1704.08227
Arxiv.org
Jun 30, 2018 Yoshimasa Uematsu; Yingying Fan; Kun Chen; Jinchi Lv; Wei Lin
Many modern big data applications feature large scale in both numbers of responses and predictors. Better statistical efficiency and scientific insights can be enabled by understanding the large-scale response-predictor association network structures via layers of sparse latent factors ranked by importance. Yet sparsity and orthogonality have been two largely incompatible goals. To accommodate both features, in this paper we suggest the method of sparse orthogonal factor regression (SOFAR) via...
Topics: Machine Learning, Statistics, Methodology
Source: http://arxiv.org/abs/1704.08349
Arxiv.org
Jun 30, 2018 Yotam Hechtlinger; Purvasha Chakravarti; Jining Qin
This paper introduces a generalization of Convolutional Neural Networks (CNNs) from low-dimensional grid data, such as images, to graph-structured data. We propose a novel spatial convolution utilizing a random walk to uncover the relations within the input, analogous to the way the standard convolution uses the spatial neighborhood of a pixel on the grid. The convolution has an intuitive interpretation, is efficient and scalable and can also be used on data with varying graph structure....
Topics: Computer Vision and Pattern Recognition, Learning, Computing Research Repository, Machine Learning,...
Source: http://arxiv.org/abs/1704.08165
Arxiv.org
Jun 30, 2018 Jason M. Klusowski; Dana Yang; W. D. Brinda
We give convergence guarantees for estimating the coefficients of a symmetric mixture of two linear regressions by expectation maximization (EM). In particular, if the initializer has a large cosine angle with the population coefficient vector and the signal to noise ratio (SNR) is large, a sample-splitting version of the EM algorithm converges to the true coefficient vector with high probability. Here "large" means that each quantity is required to be at least a universal constant....
Topics: Machine Learning, Statistics
Source: http://arxiv.org/abs/1704.08231
Arxiv.org
Jun 30, 2018 Michael R. Smith; Aaron J. Hill; Kristofor D. Carlson; Craig M. Vineyard; Jonathon Donaldson; David R. Follett; Pamela L. Follett; John H. Naegle; Conrad D. James; James B. Aimone
Information in neural networks is represented as weighted connections, or synapses, between neurons. This poses a problem as the primary computational bottleneck for neural networks is the vector-matrix multiply when inputs are multiplied by the neural network weights. Conventional processing architectures are not well suited for simulating neural networks, often requiring large amounts of energy and time. Additionally, synapses in biological neural networks are not binary connections, but...
Topics: Neurons and Cognition, Computing Research Repository, Machine Learning, Quantitative Biology,...
Source: http://arxiv.org/abs/1704.08306
Arxiv.org
Jun 30, 2018 Taku Moriyama; Yoshihiko Maesono
We propose new smoothed versions of the median test and Wilcoxon's rank sum test. As pointed out by Maesono et al. (2016), some nonparametric discrete tests have a problem with their significance probability. Because of this problem, the selection of the median or Wilcoxon's test can be biased too; however, we show that the new smoothed tests are free from the problem. Significance probabilities and local asymptotic powers of the new tests are studied, and we show that they inherit good properties of the discrete...
Topics: Statistics Theory, Statistics, Mathematics
Source: http://arxiv.org/abs/1704.07977
Arxiv.org
Jun 30, 2018 Matias D. Cattaneo; Michael Jansson; Kenichi Nagasawa
This note proposes a consistent bootstrap-based distributional approximation for cube root consistent estimators such as the maximum score estimator of Manski (1975) and the isotonic density estimator of Grenander (1956). In both cases, the standard nonparametric bootstrap is known to be inconsistent. Our method restores consistency of the nonparametric bootstrap by altering the shape of the criterion function defining the estimator whose distribution we seek to approximate. This modification...
Topics: Statistics Theory, Statistics, Methodology, Mathematics
Source: http://arxiv.org/abs/1704.08066
Arxiv.org
Jun 30, 2018 Maxim Rabinovich; Mitchell Stern; Dan Klein
Tasks like code generation and semantic parsing require mapping unstructured (or partially structured) inputs to well-formed, executable outputs. We introduce abstract syntax networks, a modeling framework for these problems. The outputs are represented as abstract syntax trees (ASTs) and constructed by a decoder with a dynamically-determined modular structure paralleling the structure of the output tree. On the benchmark Hearthstone dataset for code generation, our model obtains 79.2 BLEU and...
Topics: Learning, Computing Research Repository, Machine Learning, Computation and Language, Statistics,...
Source: http://arxiv.org/abs/1704.07535
Arxiv.org
This paper considers the problem of decentralized optimization with a composite objective containing smooth and non-smooth terms. To solve the problem, a proximal-gradient scheme is studied. Specifically, the smooth and nonsmooth terms are dealt with by gradient update and proximal update, respectively. The studied algorithm is closely related to a previous decentralized optimization algorithm, PG-EXTRA [37], but has a few advantages. First of all, in our new scheme, agents use uncoordinated...
Topics: Numerical Analysis, Learning, Optimization and Control, Distributed, Parallel, and Cluster...
Source: http://arxiv.org/abs/1704.07807
Arxiv.org
Jun 30, 2018 Maxim Rabinovich; Dan Klein
As entity type systems become richer and more fine-grained, we expect the number of types assigned to a given entity to increase. However, most fine-grained typing work has focused on datasets that exhibit a low degree of type multiplicity. In this paper, we consider the high-multiplicity regime inherent in data sources such as Wikipedia that have semi-open type systems. We introduce a set-prediction approach to this problem and show that our model outperforms unstructured baselines on a new...
Topics: Learning, Computing Research Repository, Machine Learning, Computation and Language, Information...
Source: http://arxiv.org/abs/1704.07751
Arxiv.org
Jun 30, 2018 Alan Riva Palacio; Fabrizio Leisen
In many real problems, dependence structures more general than exchangeability are required. For instance, in some settings partial exchangeability is a more reasonable assumption. For this reason, vectors of dependent Bayesian nonparametric priors have recently gained popularity. They provide flexible models which are tractable from a computational and theoretical point of view. In this paper, we focus on their use for estimating multivariate survival functions. Our model extends the work of...
Topics: Computation, Statistics, Applications, Methodology
Source: http://arxiv.org/abs/1704.07645
Arxiv.org
In this paper, we study the stochastic gradient descent (SGD) method for nonconvex nonsmooth optimization, and propose an accelerated SGD method by combining the variance reduction technique with Nesterov's extrapolation technique. Moreover, based on the local error bound condition, we establish the linear convergence of our method to a stationary point of the nonconvex optimization problem. In particular, we prove that not only the sequence generated linearly converges to a stationary point...
Topics: Learning, Optimization and Control, Computing Research Repository, Machine Learning, Statistics,...
Source: http://arxiv.org/abs/1704.07953
Arxiv.org
Jun 30, 2018 Abhirup Datta; Sudipto Banerjee; James S. Hodges
Hierarchical models for regionally aggregated disease incidence data commonly involve region specific latent random effects which are modelled jointly as having a multivariate Gaussian distribution. The covariance or precision matrix incorporates the spatial dependence between the regions. Common choices for the precision matrix include the widely used intrinsic conditional autoregressive model which is singular, and its nonsingular extension which lacks interpretability. We propose a new...
Topics: Statistics, Methodology
Source: http://arxiv.org/abs/1704.07848
Arxiv.org
Jun 30, 2018 Rushil Anirudh; Jayaraman J. Thiagarajan
Using predictive models to identify patterns that can act as biomarkers for different neuropathological conditions is becoming highly prevalent. In this paper, we consider the problem of Autism Spectrum Disorder (ASD) classification. While non-invasive imaging measurements, such as resting-state fMRI, are typically used in this problem, it can be beneficial to incorporate a wide variety of non-imaging features, including personal and socio-cultural traits, into predictive modeling. We propose...
Topics: Learning, Machine Learning, Statistics, Computing Research Repository
Source: http://arxiv.org/abs/1704.07487
Arxiv.org
Jun 30, 2018 Marc Wiedermann; Jonathan F. Donges; Jürgen Kurths; Reik V. Donner
Complex networks are usually characterized in terms of their topological, spatial, or information-theoretic properties and combinations of the associated metrics are used to discriminate networks into different classes or categories. However, even with the present variety of characteristics at hand it still remains a subject of current research to appropriately quantify a network's complexity and correspondingly discriminate between different types of complex networks, like infrastructure or...
Topics: Physics, Data Analysis, Statistics and Probability, Physics and Society
Source: http://arxiv.org/abs/1704.07599
Arxiv.org
Jun 30, 2018 E. Castilla; A. Ghosh; N. Martín; L. Pardo
This paper develops a new family of estimators, MDPDEs, as a robust generalization of the maximum likelihood estimator for the polytomous logistic regression model (PLRM) by using the DPD measure. Based on these estimators, a family of Wald-type test statistics for linear hypotheses is introduced and their robustness properties are theoretically studied through the classical influence function analysis. Some numerical examples are presented to justify the requirement of a suitable robust statistical...
Topics: Statistics, Methodology
Source: http://arxiv.org/abs/1704.07868
Arxiv.org
Jun 30, 2018 David L. Woodruff; Stefan Zillmann
In this note we describe experiments on an implementation of two methods proposed in the literature for computing regions that correspond to a notion of order statistics for multidimensional data. Our implementation, which works for any dimension greater than one, is the only one that we know of that is publicly available. Experiments run using the software confirm that half-space peeling generally gives better results than directly peeling convex hulls, but at a computational cost.
Topics: Computation, Statistics
Source: http://arxiv.org/abs/1704.07806
Arxiv.org
Jun 30, 2018 Ziyue Chen; Eloise Kaizar
Randomized controlled trials (RCTs) provide strong internal validity compared with observational studies. However, selection bias threatens the external validity of randomized trials. Thus, RCT results may not apply to either broad public policy populations or narrow populations, such as specific insurance pools. Some researchers use propensity scores (PSs) to generalize results from an RCT to a target population. In this scenario, a PS is defined as the probability of participating in the...
Topics: Statistics, Methodology
Source: http://arxiv.org/abs/1704.07789
Arxiv.org
Jun 30, 2018 Maksim Butsenko; Johan Swärd; Andreas Jakobsson
In this paper, we introduce a wideband dictionary framework for estimating sparse signals. By formulating integrated dictionary elements spanning bands of the considered parameter space, one may efficiently find and discard large parts of the parameter space not active in the signal. After each iteration, the zero-valued parts of the dictionary may be discarded to allow a refined dictionary to be formed around the active elements, resulting in a zoomed dictionary to be used in the following...
Topics: Statistics, Information Theory, Computing Research Repository, Methodology, Mathematics
Source: http://arxiv.org/abs/1704.07584
Arxiv.org
Jun 30, 2018 Toshiki Sato; Yuichi Takano
This paper is concerned with the nonparametric item response theory (NIRT) for estimating item characteristic curves (ICCs) and latent abilities of examinees on educational and psychological tests. In contrast to parametric models, NIRT models can estimate various forms of ICCs under mild shape restrictions, such as the constraints of monotone homogeneity and double monotonicity. However, NIRT models frequently suffer from estimation instability because of the great flexibility of nonparametric...
Topics: Statistics, Applications
Source: http://arxiv.org/abs/1704.07736
Arxiv.org
Jun 30, 2018 Martin Bauer; Sarang Joshi; Klas Modin
In this article we explore an algorithm for diffeomorphic random sampling of nonuniform probability distributions on Riemannian manifolds. The algorithm is based on optimal information transport (OIT), an analogue of optimal mass transport (OMT). Our framework uses the deep geometric connections between the Fisher-Rao metric on the space of probability densities and the right-invariant information metric on the group of diffeomorphisms. The resulting sampling algorithm is a promising...
Topics: Probability, Statistics Theory, Statistics, Numerical Analysis, Mathematics
Source: http://arxiv.org/abs/1704.07897
Arxiv.org
Jun 30, 2018 Curtis B. Storlie; Terry M. Therneau; Rickey E. Carter; Nicholas Chia; John R. Bergquist; Jeanne M. Huddleston; Santiago Romero-Brufau
We describe the Bedside Patient Rescue (BPR) project, the goal of which is risk prediction of adverse events for non-ICU patients using ~200 variables (vitals, lab results, assessments, ...). There are several missing predictor values for most patients, which in the health sciences is the norm, rather than the exception. A Bayesian approach is presented that addresses many of the shortcomings of standard approaches to missing predictors: (i) treatment of the uncertainty due to imputation is...
Topics: Statistics, Applications
Source: http://arxiv.org/abs/1704.07904
Arxiv.org
Jun 30, 2018 Dragan Radulovic; Marten Wegkamp
We offer an umbrella-type result which extends the convergence of the classical empirical process on the line to more general processes indexed by functions of bounded variation. This extension is not contingent on the type of dependence of the underlying sequence of random variables. As a consequence we establish the weak convergence for stationary empirical processes indexed by general classes of functions under alpha mixing conditions.
Topics: Statistics Theory, Statistics, Mathematics
Source: http://arxiv.org/abs/1704.07873
Arxiv.org
Jun 30, 2018 Joseph Antonelli; Giovanni Parmigiani; Francesca Dominici
In observational studies, estimation of a causal effect of a treatment on an outcome relies on proper adjustment for confounding. If the number of the potential confounders ($p$) is larger than the number of observations ($n$), then direct control for all these potential confounders is infeasible. Existing approaches for dimension reduction and penalization are for the most part aimed at predicting the outcome, and are not suited for estimation of causal effects. We propose continuous spike and...
Topics: Statistics, Methodology
Source: http://arxiv.org/abs/1704.07532
Arxiv.org
Jun 30, 2018 Longshaokan Wang; Eric B. Laber; Katie Witkiewitz
Advances in mobile computing technologies have made it possible to monitor and apply data-driven interventions across complex systems in real time. Markov decision processes (MDPs) are the primary model for sequential decision problems with a large or indefinite time horizon. Choosing a representation of the underlying decision process that is both Markov and low-dimensional is non-trivial. We propose a method for constructing a low-dimensional representation of the original decision process...
Topics: Statistics Theory, Statistics, Machine Learning, Methodology, Mathematics
Source: http://arxiv.org/abs/1704.07531
Arxiv.org
Jun 30, 2018 Qiang Liu
Stein variational gradient descent (SVGD) is a deterministic sampling algorithm that iteratively transports a set of particles to approximate given distributions, based on an efficient gradient-based update that guarantees to optimally decrease the KL divergence within a function space. This paper develops the first theoretical analysis on SVGD, discussing its weak convergence properties and showing that its asymptotic behavior is captured by a gradient flow of the KL divergence functional...
Topics: Machine Learning, Statistics
Source: http://arxiv.org/abs/1704.07520
Arxiv.org
Jun 30, 2018 Vladimír Holý; Karel Šafr
In national accounts, relations between industries are analyzed using input-output tables. In the Czech Republic these tables are compiled every five years. For the remaining years, tables must be estimated. Typically, this is done by the RAS method, which takes the structure between industries from the last known table and adjusts it to the current industry consumption totals. This approach can also be used for more detailed tables, e.g. quarterly and regional tables. However, the regular...
Topics: Statistics, Applications
Source: http://arxiv.org/abs/1704.07814
Arxiv.org
Jun 30, 2018 Mario Chavez; Fanny Grosselin; Aurore Bussalb; Fabrizio De Vico Fallani; Xavier Navarro-Sune
Objective: The recent emergence and success of electroencephalography (EEG) in low-cost portable devices has opened the door to a new generation of applications processing a small number of EEG channels for health monitoring and brain-computer interfacing. These recordings are, however, contaminated by many sources of noise degrading the signals of interest, thus compromising the interpretation of the underlying brain state. In this work, we propose a new data-driven algorithm to effectively...
Topics: Physics, Data Analysis, Statistics and Probability, Medical Physics
Source: http://arxiv.org/abs/1704.07603
Arxiv.org
Jun 30, 2018 Snigdha Panigrahi; Nadia Fawaz
Characterization of consumers has been explored in prior works with different goals. The current work presents user personas in a VoD streaming space using a tenure timeline and temporal behavioral features in the absence of explicit user profiles. A choice of tenure timeline caters to the business need of understanding the evolution and phases of user behavior as accounts age, while temporal characteristics are necessary to capture the dynamic aspect of user persona labels. Our...
Topics: Machine Learning, Statistics
Source: http://arxiv.org/abs/1704.07554
Arxiv.org
Jun 30, 2018 Kelvin Guu; Panupong Pasupat; Evan Zheran Liu; Percy Liang
Our goal is to learn a semantic parser that maps natural language utterances into executable programs when only indirect supervision is available: examples are labeled with the correct execution result, but not the program itself. Consequently, we must search the space of programs for those that output the correct result, while not being misled by spurious programs: incorrect programs that coincidentally output the correct result. We connect two common learning paradigms, reinforcement learning...
Topics: Learning, Machine Learning, Statistics, Artificial Intelligence, Computing Research Repository
Source: http://arxiv.org/abs/1704.07926
Arxiv.org
Jun 30, 2018 Amanda Lenzi; Ingelin Steinsland; Pierre Pinson
The share of wind energy in total installed power capacity has grown rapidly in recent years around the world. Producing accurate and reliable forecasts of wind power production, together with a quantification of the uncertainty, is essential to optimally integrate wind energy into power systems. We build spatio-temporal models for wind power generation and obtain full probabilistic forecasts from 15 minutes to 5 hours ahead. Detailed analysis of the forecast performances on the individual wind...
Topics: Statistics, Applications
Source: http://arxiv.org/abs/1704.07606
Arxiv.org
Jun 30, 2018 Changde Du; Changying Du; Jinpeng Li; Wei-long Zheng; Bao-liang Lu; Huiguang He
In emotion recognition, it is difficult to recognize a person's emotional states using just a single modality. Besides, the annotation of physiological emotional data is particularly expensive. These two aspects make the building of effective emotion recognition models challenging. In this paper, we first build a multi-view deep generative model to simulate the generative process of multi-modality emotional data. By imposing a mixture of Gaussians assumption on the posterior approximation of the...
Topics: Learning, Machine Learning, Statistics, Artificial Intelligence, Computing Research Repository
Source: http://arxiv.org/abs/1704.07548
Arxiv.org
Jun 30, 2018 Feng Nan; Venkatesh Saligrama
We present a dynamic model selection approach for resource-constrained prediction. Given an input instance at test-time, a gating function identifies a prediction model for the input among a collection of models. Our objective is to minimize overall average cost without sacrificing accuracy. We learn gating and prediction models on fully labeled training data by means of a bottom-up strategy. Our novel bottom-up method is a recursive scheme whereby a high-accuracy complex model is first...
Topics: Learning, Machine Learning, Statistics, Computing Research Repository
Source: http://arxiv.org/abs/1704.07505
Arxiv.org
Jun 30, 2018 N. Balakrishnan; E. Castilla; N. Martin; L. Pardo
This paper develops a new family of estimators, the minimum density power divergence estimators (MDPDEs), for the parameters of the one-shot device model as well as a new family of test statistics, Z-type test statistics based on MDPDEs, for testing the corresponding model parameters. The family of MDPDEs contains as a particular case the maximum likelihood estimator (MLE) considered in Balakrishnan and Ling (2012). Through a simulation study, it is shown that some MDPDEs have a better behavior...
Topics: Statistics, Methodology
Source: http://arxiv.org/abs/1704.07865
Arxiv.org
Jun 30, 2018 Daniel J. Lizotte; Arezoo Tahmasebi
We develop and evaluate tolerance interval methods for dynamic treatment regimes (DTRs) that can provide more detailed prognostic information to patients who will follow an estimated optimal regime. Although the problem of constructing confidence intervals for DTRs has been extensively studied, prediction and tolerance intervals have received little attention. We begin by reviewing in detail different interval estimation and prediction methods and then adapting them to the DTR setting. We...
Topics: Machine Learning, Statistics, Methodology
Source: http://arxiv.org/abs/1704.07453
Arxiv.org
Jun 30, 2018 Jack Fitzsimons; Diego Granziol; Kurt Cutajar; Michael Osborne; Maurizio Filippone; Stephen Roberts
The scalable calculation of matrix determinants has been a bottleneck to the widespread application of many machine learning methods such as determinantal point processes, Gaussian processes, generalised Markov random fields, graph models and many others. In this work, we estimate log determinants under the framework of maximum entropy, given information in the form of moment constraints from stochastic trace estimation. The estimates demonstrate a significant improvement on state-of-the-art...
Topics: Numerical Analysis, Computing Research Repository, Machine Learning, Information Theory,...
Source: http://arxiv.org/abs/1704.07223
Arxiv.org
Jun 30, 2018 Sahand Negahban; Sewoong Oh; Kiran K. Thekumparampil; Jiaming Xu
When tracking user-specific online activities, each user's preference is revealed in the form of choices and comparisons. For example, a user's purchase history tracks her choices, i.e. which item was chosen among a subset of offerings. A user's comparisons are observed either explicitly as in movie ratings or implicitly as in viewing times of news articles. Given such individualized ordinal data, we address the problem of collaboratively learning representations of the users and the items. The...
Topics: Learning, Machine Learning, Statistics, Computing Research Repository
Source: http://arxiv.org/abs/1704.07228
Arxiv.org
Jun 30, 2018 Ian M Danilevicz; Ricardo S Ehlers
BDSAR is an R package which estimates distances between probability distributions and facilitates a dynamic and powerful analysis of diagnostics for Bayesian models from the class of Simultaneous Autoregressive (SAR) spatial models. The package offers a new plot for comparing models and works in an intuitive way, allowing any analyst to easily build fine plots. These are helpful to promote insights about influential observations in the data.
Topics: Computation, Statistics
Source: http://arxiv.org/abs/1704.07414
Arxiv.org
Jun 30, 2018 Ajay Jasra; Kody Law; Carina Suciu
This article reviews the application of advanced Monte Carlo techniques in the context of Multilevel Monte Carlo (MLMC). MLMC is a strategy employed to compute expectations which can be biased in some sense, for instance, by using the discretization of an associated probability law. The MLMC approach works with a hierarchy of biased approximations which become progressively more accurate and more expensive. Using a telescoping representation of the most accurate approximation, the method is able...
Topics: Computation, Statistics, Numerical Analysis, Methodology, Mathematics
Source: http://arxiv.org/abs/1704.07272
The angle of rotation of any target about the radar line of sight (LOS) is known as the polarization orientation angle. The orientation angle is found to be non-zero for undulating terrains and man-made targets oriented away from the radar LOS. This effect is more pronounced at lower frequencies (e.g., L- and P-bands). The orientation angle shift is induced not only by azimuthal slope but also by range slope. This shift increases the cross-polarization (HV) intensity and subsequently, the...
Topics: Physics, Data Analysis, Statistics and Probability, Classical Physics
Source: http://arxiv.org/abs/1704.07372
Arxiv.org
Jun 30, 2018 Liang-Hsuan Tai; Anuj Srivastava; Kyle A. Gallivan
The problem of estimating trend and seasonal variation in time-series data has been studied over several decades, although mostly using single time series. This paper studies the problem of estimating these components from functional data, i.e. multiple time series, in situations where seasonal effects exhibit arbitrary time warpings or phase variability across different observations. Rather than ignoring the phase variability, or using an off-the-shelf alignment method to remove phase, we take...
Topics: Statistics, Applications
Source: http://arxiv.org/abs/1704.07358
Arxiv.org
Complex computer codes are often too time-expensive to be used directly for uncertainty, sensitivity, optimization and robustness analyses. A widely accepted method to circumvent this problem consists in replacing CPU-time-expensive computer models with CPU-inexpensive mathematical functions, called metamodels. For example, the Gaussian process (Gp) model has shown strong capabilities to solve practical problems, often involving several interlinked issues. However, in case of high...
Topics: Statistics Theory, Statistics, Mathematics
Source: http://arxiv.org/abs/1704.07090
Arxiv.org
Jun 30, 2018 Caitlin E Buck; Miguel Juarez
Due to freely available, tailored software, Bayesian statistics is fast becoming the dominant paradigm in archaeological chronology construction. Such software provides users with powerful tools for Bayesian inference for chronological models with little need to undertake formal study of statistical modelling or computer programming. This runs the risk that it is reduced to the status of a black-box which is not sensible given the power and complexity of the modelling tools it implements. In...
Topics: Statistics, Applications
Source: http://arxiv.org/abs/1704.07141
Arxiv.org
Jun 30, 2018 Yuki Fujimoto; Toru Ohira
We present here a new model and algorithm which perform efficient natural gradient descent for multilayer perceptrons. Natural gradient descent was originally proposed from the point of view of information geometry, and it performs the steepest descent updates on manifolds in a Riemannian space. In particular, we extend an approach taken by the "Whitened neural networks" model. We apply the whitening process not only in the feed-forward direction as in the original model, but also in the...
Topics: Learning, Machine Learning, Statistics, Computing Research Repository
Source: http://arxiv.org/abs/1704.07147
Arxiv.org
Jun 30, 2018 Véronique Maume-Deschamps; Didier Rullière; Khalil Said
In [16], a new family of vector-valued risk measures called multivariate expectiles is introduced. In this paper, we focus on the asymptotic behavior of these measures in a multivariate regular variations context. For models with equivalent tails, we propose an estimator of these multivariate asymptotic expectiles, in the Fréchet attraction domain case, with asymptotic independence, or in the comonotonic case.
Topics: Statistics, Risk Management, Applications, Quantitative Finance
Source: http://arxiv.org/abs/1704.07152
Arxiv.org
Jun 30, 2018 Zhiqiang Tan; Cun-Hui Zhang
Additive regression provides an extension of linear regression by modeling the signal of a response as a sum of functions of covariates of relatively low complexity. We study penalized estimation in high-dimensional nonparametric additive regression where functional semi-norms are used to induce smoothness of component functions and the empirical $L_2$ norm is used to induce sparsity. The functional semi-norms can be of Sobolev or bounded variation types and are allowed to be different amongst...
Topics: Statistics Theory, Statistics, Mathematics
Source: http://arxiv.org/abs/1704.07229
Arxiv.org
Jun 30, 2018 Ashwin Pananjady; Martin J. Wainwright; Thomas A. Courtade
The multivariate linear regression model with shuffled data and additive Gaussian noise arises in various correspondence estimation and matching problems. Focusing on the denoising aspect of this problem, we provide a characterization of the minimax error rate that is sharp up to logarithmic factors. We also analyze the performance of two versions of a computationally efficient estimator, and establish their consistency for a large range of input parameters. Finally, we provide an exact algorithm...
Topics: Statistics Theory, Computing Research Repository, Machine Learning, Information Theory, Statistics,...
Source: http://arxiv.org/abs/1704.07461
Arxiv.org
It is becoming increasingly clear that complex interactions among genes and environmental factors play crucial roles in triggering complex diseases. Thus, understanding such interactions is vital, which is possible only through statistical models that adequately account for such intricate, albeit unknown, dependence structures. Bhattacharya & Bhattacharya (2016b) attempt such modeling, relating finite mixtures composed of Dirichlet processes that represent unknown number of genetic...
Topics: Statistics, Applications
Source: http://arxiv.org/abs/1704.07349
Arxiv.org
Jun 30, 2018 Federico Monti; Michael M. Bronstein; Xavier Bresson
Matrix completion models are among the most common formulations of recommender systems. Recent works have shown a boost in the performance of these techniques when introducing the pairwise relationships between users/items in the form of graphs, and imposing smoothness priors on these graphs. However, such techniques do not fully exploit the local stationarity structures of user/item graphs, and the number of parameters to learn is linear w.r.t. the number of users and items. We propose a novel...
Topics: Numerical Analysis, Learning, Computing Research Repository, Machine Learning, Information...
Source: http://arxiv.org/abs/1704.06803
Arxiv.org
Jun 30, 2018 Florentina Bunea; Yang Ning; Marten Wegkamp
Variable clustering is one of the most important unsupervised learning methods, ubiquitous in most research areas. In the statistics and computer science literature, most of the clustering methods lead to non-overlapping partitions of the variables. However, in many applications, some variables may belong to multiple groups, yielding clusters with overlap. It is still largely unknown how to perform overlapping variable clustering with statistical guarantees. To bridge this gap, we propose a...
Topics: Statistics Theory, Statistics, Machine Learning, Methodology, Mathematics
Source: http://arxiv.org/abs/1704.06977
Arxiv.org
Jun 30, 2018 Hamzeh Torabi; Sayyed Mahmoud Mirjalili; Hossein Nadeb
In this paper, a new goodness-of-fit test for a location-scale family based on progressively Type-II censored order statistics is proposed. Using Monte Carlo simulation studies, we observe that the proposed test for normality is consistent and quite powerful in comparison with existing goodness-of-fit tests based on progressively Type-II censored data. The new test statistic is also applied to a real data set, and the results show that our new test statistic performs...
Topics: Statistics Theory, Statistics, Mathematics
Source: http://arxiv.org/abs/1704.06787
Arxiv.org
Jun 30, 2018 Antonio Forcina
The models considered in this paper are a special subclass of Relational models which may be appropriate when a collection of independence statements must hold even after probabilities are re-scaled to sum to 1. After reviewing the basic properties of these models and deriving some new ones, two algorithms for computing maximum likelihood estimates are presented. Some new light is also thrown on the underlying geometry.
Topics: Statistics Theory, Statistics, Mathematics
Source: http://arxiv.org/abs/1704.06762
Arxiv.org
Jun 30, 2018 Daniel J. Eck
The multivariate linear regression model is an important tool for investigating relationships between several response variables and several predictor variables. The primary interest is in inference about the unknown regression coefficient matrix. We propose multivariate bootstrap techniques as a means for making inferences about the unknown regression coefficient matrix. These bootstrapping techniques are extensions of those developed in Freedman (1981), which are only appropriate for...
Topics: Statistics Theory, Statistics, Methodology, Mathematics
Source: http://arxiv.org/abs/1704.07040
Arxiv.org
Jun 30, 2018 Chao Gao; John Lafferty
We study the problem of testing for structure in networks using relations between the observed frequencies of small subgraphs. We consider the statistics \begin{align*} T_3 & =(\text{edge frequency})^3 - \text{triangle frequency}\\ T_2 & =3(\text{edge frequency})^2(1-\text{edge frequency}) - \text{V-shape frequency} \end{align*} and prove a central limit theorem for $(T_2, T_3)$ under an Erdős-Rényi null model. We then analyze the power of the associated $\chi^2$ test statistic...
Topics: Statistics Theory, Computing Research Repository, Social and Information Networks, Statistics,...
Source: http://arxiv.org/abs/1704.06742
Arxiv.org
Jun 30, 2018 Weixin Cai; Nima S. Hejazi; Alan E. Hubbard
Current statistical inference problems in areas like astronomy, genomics, and marketing routinely involve the simultaneous testing of thousands -- even millions -- of null hypotheses. For high-dimensional multivariate distributions, these hypotheses may concern a wide range of parameters, with complex and unknown dependence structures among variables. In analyzing such hypothesis testing procedures, gains in efficiency and power can be achieved by performing variable reduction on the set of...
Topics: Machine Learning, Statistics, Methodology
Source: http://arxiv.org/abs/1704.07008
Arxiv.org
Jun 30, 2018 Matthias Katzfuss; Jonathan R. Stroud; Christopher K. Wikle
The ensemble Kalman filter (EnKF) is a computational technique for approximate inference on the state vector in spatio-temporal state-space models. It has been successfully used in many real-world nonlinear data-assimilation problems with very high dimensions, such as weather forecasting. However, the EnKF is most appropriate for additive Gaussian state-space models with linear observation equation and without unknown parameters. Here, we consider a broader class of hierarchical state-space...
Topics: Computation, Statistics, Methodology
Source: http://arxiv.org/abs/1704.06988
Arxiv.org
Jun 30, 2018 Prabhat KC; K. Aditya Mohan; Charudatta Phatak; Charles Bouman; Marc De Graef
Lorentz Transmission Electron Microscopy (TEM) observations of magnetic nanoparticles contain information on the magnetic and electrostatic potentials. Vector Field Electron Tomography (VFET) can be used to reconstruct electromagnetic potentials of the nanoparticles from their corresponding LTEM images. The VFET approach is based on the conventional filtered back projection approach to tomographic reconstructions and the availability of an incomplete set of measurements due to experimental...
Topics: Physics, Condensed Matter, Materials Science, Computational Physics, Computation, Statistics,...
Source: http://arxiv.org/abs/1704.06947
Arxiv.org
Jun 30, 2018 Mathieu Cliche; David Rosenberg; Dhruv Madeka; Connie Yee
Charts are an excellent way to convey patterns and trends in data, but they do not facilitate further modeling of the data or close inspection of individual data points. We present a fully automated system for extracting the numerical values of data points from images of scatter plots. We use deep learning techniques to identify the key components of the chart, and optical character recognition together with robust regression to map from pixels to the coordinate system of the chart. We focus on...
Topics: Information Retrieval, Machine Learning, Statistics, Computing Research Repository, Computer Vision...
Source: http://arxiv.org/abs/1704.06687
Arxiv.org
In this paper we introduce a new feature selection algorithm to remove irrelevant or redundant features from data sets. In this algorithm, the importance of a feature is based on its fit to the Catastrophe model. The Akaike information criterion value is used for ranking the features in the data set. The proposed algorithm is compared with the well-known RELIEF feature selection algorithm. Breast Cancer, Parkinson Telemonitoring data and Slice locality data sets are used to evaluate the...
Topics: Learning, Machine Learning, Statistics, Computing Research Repository
Source: http://arxiv.org/abs/1704.06656
Arxiv.org
Jun 30, 2018 Zheng Tracy Ke; Minzhe Wang
In probabilistic topic models, the quantity of interest, a low-rank matrix consisting of topic vectors, is hidden in the text corpus matrix, masked by noise, and Singular Value Decomposition (SVD) is a potentially useful tool for learning such a low-rank matrix. However, the connection between this low-rank matrix and the singular vectors of the text corpus matrix is usually complicated and hard to spell out, so how to use SVD for learning topic models faces challenges. We overcome the...
Topics: Statistics, Methodology
Source: http://arxiv.org/abs/1704.07016
Arxiv.org
Jun 30, 2018 Vincent Brault; Christine Keribin; Mahendra Mariadassou
The Latent Block Model (LBM) is a model-based method to cluster simultaneously the $d$ columns and $n$ rows of a data matrix. Parameter estimation in LBM is a difficult and multifaceted problem. Although various estimation strategies have been proposed and are now well understood empirically, theoretical guarantees about their asymptotic behavior are rather sparse. We show here that under some mild conditions on the parameter space, and in an asymptotic regime where $\log(d)/n$ and $\log(n)/d$ tend...
Topics: Statistics Theory, Statistics, Mathematics
Source: http://arxiv.org/abs/1704.06629
Arxiv.org
Jun 30, 2018 H. Nadeb; H. Torabi; G. G. Hamedani
In this paper, we propose several statistics for testing uniformity under progressive Type-I interval censoring. We obtain the critical points of these statistics and study the power of the proposed tests against a representative set of alternatives via simulation. Finally, we generalize our methods for continuous and completely specified distributions.
Topics: Statistics Theory, Statistics, Mathematics
Source: http://arxiv.org/abs/1704.06666
Arxiv.org
Jun 30, 2018 Jinghui Chen; Lingxiao Wang; Xiao Zhang; Quanquan Gu
We consider the phase retrieval problem of recovering the unknown signal from the magnitude-only measurements, where the measurements can be contaminated by both sparse arbitrary corruption and bounded random noise. We propose a new nonconvex algorithm for robust phase retrieval, namely Robust Wirtinger Flow, to jointly estimate the unknown signal and the sparse corruption. We show that our proposed algorithm is guaranteed to converge linearly to the unknown true signal up to a minimax optimal...
Topics: Learning, Machine Learning, Statistics, Computing Research Repository
Source: http://arxiv.org/abs/1704.06256
Arxiv.org
Jun 30, 2018 Jonathan A. Chávez-Casillas; Robert J. Elliott; Bruno Rémillard; Anatoliy V. Swishchuk
We propose a simple stochastic model for the dynamics of a limit order book, extending the recent work of Cont and de Larrard (2013), where the price dynamics are endogenous, resulting from market transactions. We also show that the conditional diffusion limit of the price process is the so-called Brownian meander.
Topics: Trading and Market Microstructure, Statistics, Applications, Quantitative Finance
Source: http://arxiv.org/abs/1704.06572
Arxiv.org
Jun 30, 2018 Saviz Mowlavi; Themistoklis P. Sapsis
Stochastic dynamical systems with continuous symmetries arise commonly in nature and often give rise to coherent spatio-temporal patterns. However, because of their random locations, these patterns are not well captured by current order reduction techniques and a large number of modes is typically necessary for an accurate solution. In this work, we introduce a new methodology for efficient order reduction of such systems by combining (i) the method of slices, a symmetry reduction tool, with...
Topics: Physics, Fluid Dynamics, Data Analysis, Statistics and Probability, Dynamical Systems,...
Source: http://arxiv.org/abs/1704.06352
Arxiv.org
Jun 30, 2018 Sam Gross; Marc'Aurelio Ranzato; Arthur Szlam
Training convolutional networks (CNNs) that fit on a single GPU with minibatch stochastic gradient descent has become effective in practice. However, there is still no effective method for training large CNNs that do not fit in the memory of a few GPU cards, or for parallelizing CNN training. In this work we show that a simple hard mixture of experts model can be efficiently trained to good effect on large scale hashtag (multilabel) prediction tasks. Mixture of experts models are not new...
Topics: Machine Learning, Statistics, Computing Research Repository, Computer Vision and Pattern Recognition
Source: http://arxiv.org/abs/1704.06363
The rates of respiratory prescriptions vary by GP surgery across Scotland, suggesting there are sizeable health inequalities in respiratory ill health across the country. The aim of this paper is to estimate the magnitude, spatial pattern and drivers of this spatial variation. Monthly data on respiratory prescriptions are available at the GP surgery level, which creates an interesting methodological challenge as these data are not the classical geostatistical, areal unit or point process data...
Topics: Statistics, Applications
Source: http://arxiv.org/abs/1704.06492
Arxiv.org
Jun 30, 2018 Markus Bibinger; Christopher Neely; Lars Winkelmann
An extensive empirical literature documents a generally negative correlation, named the "leverage effect", between asset returns and changes of volatility. It is more challenging to establish such a return-volatility relationship for jumps in high-frequency data. We propose new nonparametric methods to assess and test for a discontinuous leverage effect --- that is, a relation between contemporaneous jumps in prices and volatility --- in high-frequency data with market microstructure...
Topics: Statistics Theory, Statistics, Mathematics
Source: http://arxiv.org/abs/1704.06537
Arxiv.org
Jun 30, 2018 Matineh Shaker; Deniz Erdogmus; Jennifer Dy; Sylvain Bouix
We present a method to estimate a multivariate Gaussian distribution of diffusion tensor features in a set of brain regions based on a small sample of healthy individuals, and use this distribution to identify imaging abnormalities in subjects with mild traumatic brain injury. The multivariate model receives a priori knowledge in the form of a neighborhood graph imposed on the precision matrix, which models brain region interactions, and an additional $L_1$ sparsity constraint. The model...
Topics: Statistics, Applications
Source: http://arxiv.org/abs/1704.06408
Arxiv.org
Jun 30, 2018 G. S. Rodrigues; D. Prangle; S. A. Sisson
A new recalibration post-processing method is presented to improve the quality of the posterior approximation when using Approximate Bayesian Computation (ABC) algorithms. Recalibration may be used in conjunction with existing post-processing methods, such as regression-adjustments. In addition, this work extends and strengthens the links between ABC and indirect inference algorithms, allowing more extensive use of misspecified auxiliary models in the ABC context. The method is illustrated...
Topics: Computation, Statistics, Methodology
Source: http://arxiv.org/abs/1704.06374
Arxiv.org
Jun 30, 2018 R. Sharma
It is shown that the formula for the variance of combined series yields surprisingly simple proofs of some well-known variance bounds.
Topics: Other Statistics, Statistics
Source: http://arxiv.org/abs/1704.06292
Arxiv.org
Jun 30, 2018 Guy Martial Nkiet
This paper deals with asymptotics for multiple-set linear canonical analysis (MSLCA). A definition of this analysis, that adapts the classical one to the context of Euclidean random variables, is given and properties of the related canonical coefficients are derived. Then, estimators of the MSLCA's elements, based on empirical covariance operators, are proposed and asymptotics for these estimators are obtained. More precisely, we prove their consistency and we obtain asymptotic normality for...
Topics: Statistics Theory, Statistics, Mathematics
Source: http://arxiv.org/abs/1704.06428
Arxiv.org
Jun 30, 2018 Lucas Lacasa; Wolfram Just
Visibility algorithms are a family of geometric and ordering criteria by which a real-valued time series of N data is mapped into a graph of N nodes. This graph has been shown to often inherit in its topology non-trivial properties of the series structure, and can thus be seen as a combinatorial representation of a dynamical system. Here we explore in some detail the relation between visibility graphs and symbolic dynamics. To do that, we consider the degree sequence of horizontal visibility...
Topics: Physics, Data Analysis, Statistics and Probability, Nonlinear Sciences, Dynamical Systems, Chaotic...
Source: http://arxiv.org/abs/1704.06467
Arxiv.org
Jun 30, 2018 Dimbihery Rabenoro
We present a functional form of the Erdős-Rényi law of large numbers for Lévy processes.
Topics: Statistics Theory, Statistics, Mathematics
Source: http://arxiv.org/abs/1704.06521