Department of Statistics Seminar Series, Nos. 276-279

Department of Statistics Seminar Series, No. 276

Time: Monday, December 18, 2017, 14:30-15:30

Venue: Room 303, Starr Building (史带楼)

Host: Professor Xinsheng Zhang (张新生), Department of Statistics, School of Management, Fudan University

Topic: Sure Explained Variability and Independence Screening

Speaker: Research Professor Min Chen (陈敏), Academy of Mathematics and Systems Science, Chinese Academy of Sciences

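Bio: Min Chen is currently Executive Deputy Director of the Center for Statistical Science at the Academy of Mathematics and Systems Science, Chinese Academy of Sciences, and previously served as Vice President of the Academy. He also holds a number of professional appointments, including Editor-in-Chief of 《数理统计与管理》 (Journal of Applied Statistics and Management), Associate Editor of 《应用数学学报（中文版）》 (Acta Mathematicae Applicatae Sinica, Chinese Series), Chair of the national standardization committee on the application of statistical methods (全国统计方法应用技术标准化委员会), Vice President of the Chinese Mathematical Society (中国数学会), and Vice President of the Chinese society for statistical education (中国统计教育学会). His main research interests are financial statistics and risk management, statistical analysis of nonlinear time series, and large-sample theory for nonparametric estimation and testing.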

Abstract: In the era of Big Data, extracting the most important explanatory variables available in ultrahigh-dimensional data plays a key role in scientific research. Existing research has mainly focused on applying the extracted explanatory variables to describe the central tendency of their related response variables. For a response variable, its variability is as important as its central tendency in statistical inference. This paper focuses on the variability and proposes a new model-free feature screening approach: sure explained variability and independence screening (SEVIS). The core of SEVIS is to take advantage of recently proposed asymmetric and nonlinear generalized measures of correlation in the screening. Under some mild conditions, the paper shows that SEVIS not only possesses the desired sure screening property and ranking consistency property, but is also a computationally convenient variable selection method for ultrahigh-dimensional data sets with more features than observations. The superior performance of SEVIS, compared with existing model-free methods, is illustrated in extensive simulations. A real example in ultrahigh-dimensional variable selection demonstrates that the variables selected by SEVIS better explain not only the response variables, but also the variables selected by other methods.

Department of Statistics Seminar Series, No. 279

Time: Thursday, December 21, 2017, 14:30-15:30

Venue: Room 205, Starr Building (史带楼)

Host: Dr. Da Huang (黄达), Department of Statistics, School of Management, Fudan University

Topic: Combining multiple observational data sources to estimate causal effects

Speaker: Dr. Peng Ding (丁鹏), University of California, Berkeley

Bio: Peng Ding is an Assistant Professor in the Department of Statistics at UC Berkeley. He obtained a B.S. in mathematics, a B.A. in economics, and an M.S. in statistics from Peking University, and a Ph.D. in statistics from Harvard University. His research interest is causality.

Abstract: The era of big data has witnessed the increasing availability of multiple data sources for statistical analyses. As an important example in causal inference, we consider estimation of causal effects combining big main data with unmeasured confounders and smaller validation data with supplementary information on these confounders. Under the unconfoundedness assumption with completely observed confounders, the smaller validation data allow for constructing consistent estimators of causal effects, but the big main data can only give error-prone estimators in general. However, by leveraging the information in the big main data in a principled way, we can improve estimation efficiency while still preserving the consistency of the initial estimators based solely on the validation data. The proposed framework incorporates asymptotically normal initial estimators, including the commonly used regression imputation, weighting, and matching estimators, and does not require correct specification of the model relating the unmeasured confounders to the observed variables. Coupled with appropriate bootstrap procedures, our method is straightforward to implement, requiring only software routines for existing estimators.

Department of Statistics

2017-12-14