Giulia Bertagnolli

Photo

PhD in Math @unitn and @CoMuNe Lab - FBK. Interested in Complex Networks and working on their Geometry!

24 October 2018

MSc

by G. Bertagnolli

Thesis Title: Complex Networks and Statistical Data Depths.

Abstract

A statistical data depth is a measure of depth or outlyingness of a sample with respect to its underlying distribution and it provides a centre-outward ordering of sample points. It can be used to estimate, non-parametrically, location, scale, skewness an other data characteristics. The deepest point can be seen as multivariate Zuo, 2000a or functionalNieto-reyes, 2016 extension of the median; the trimmed regions of percentiles. Therefore depths are extremely important, since they enable us to generalise statistical tools like e.g. boxplots, DD-plots, test of hypothesis to more complex data Liu, 1999. Local versions of depth are also studied Agostinelli, 2011. Defining a depth for networks would mean the possibility to extend to graphs and complex networks these grounded descriptive and inference methods.

The problem with networks is that they do not come with a space, so before applying them the projected Tukey depth (PTD), a newly defined variation on the well-known halfspace Tukey depth, they need to be embedded in space. This can be achieved through a classical multidimensional scaling of a distance matrix. Two options are here available:

The free parameters are , the dimension of the embedding space and , diffusion time in case diffusion maps embedding. If is very small most of the information contained in the data is lost and this results in a random centrality of nodes; whereas if is too large compared to the number of nodes the sample becomes excessively diluted in , so there is not enough information for significant inference and all vertices appear to have the same depth, with some negligible fluctuations. In between we observe a range of values for for which nodes depth ranking remains stable. This, together with evaluation of the statistical content of the reduced sample enables us to select a proper embedding dimension. As shown in Coifman, 2006, plays the role of scale parameter since computing involves summing over all paths between and of length at most . This was further shown in De Domenico, PhysRevLett.118.168301, 2017 by direct application on complex networks: small values uncover the microscale structures and increasing uncovers the macroscale.

back

tags: statistics;networks;