

Santa Barbara, CA 93106 • (805) 689-8609

University of California, Santa Barbara (UCSB) Santa Barbara, CA
Ph.D. candidate in Statistics and Applied Probability GPA: 3.98/4.0 Sept. 2016 – Dec. 2021 (Expected)
Wuhan University (WHU) Wuhan, China
B.S. in Statistics GPA: 3.87/4.0 (Rank 1/36) Sept. 2012 – June 2016
Statistical: Causal Inference, Sensitivity Analysis, Observational Studies, Multivariate Analysis, Predictive Modeling,
Machine Learning, Regression Analysis, Bayesian Inference, Data Visualization, Analytical Thinking
Programming/Software: R, Python, SQL, Jupyter, GitHub, Stan, SAS, LaTeX, MATLAB, Excel, Word, PowerPoint
Data Science Intern June 2020 – Sept. 2020
PayPal, San Jose, CA
• Developed and validated industry-leading credit and merchant models using advanced data mining and
machine learning techniques.
Graduate Research/Teaching Assistant Sept. 2016 – Present
Department of Statistics and Applied Probability, University of California, Santa Barbara
• Researching and developing sensitivity analysis methods for observational causal inference, specializing in
high-dimensional multivariate settings that commonly arise in practice but remain underexplored.
• Lead lab and discussion sessions for graduate and undergraduate core courses, including Advanced Statistical
Methods, Advanced Statistical Theory, Regression Analysis, and Bayesian Data Analysis.
Credit Model Enhancement with GAN-Synthetic Data (Internship) June 2020 – Sept. 2020
• Extracted, manipulated, and analyzed large credit datasets using tools such as Python, Hadoop, Spark, R, and SQL.
• Validated industry-leading credit models with Light Gradient Boosting Machine (LightGBM).
• Generated synthetic credit data using variants of Generative Adversarial Network (GAN) models, including DCGAN,
WGAN, and cGAN; applied tools to evaluate the quality of the synthetic credit data.
• Improved industry-leading credit solutions by 2% using the GAN-synthetic data, outperforming the
conventional method, Synthetic Minority Oversampling Technique (SMOTE), especially on the problematic segment.
• Presented project results in a clear and effective manner to the whole data science organization.
Copula-based Sensitivity Analysis for Observational Multivariate Causal Inference Apr. 2019 – Present
• Constructed a new sensitivity analysis method for causal inference with Gaussian multivariate treatments (e.g.,
genome-wide association studies) and/or outcomes by taking advantage of the multivariate structure.
• Extended results to non-Gaussian cases via Gaussian copula and advanced latent variable models such as
Semiparametric Bayesian Gaussian Copula (sbgcop), Logistic Factor Analysis (LFA), and Variational Autoencoder (VAE).
• Developed an intuitive and interpretable procedure for calibrating sensitivity parameters.
• Presented the project as a poster at the 2019 Atlantic Causal Inference Conference (ACIC).
Sensitivity Analysis for Observational Causal Inference with Multinomial Treatment Sept. 2018 – Apr. 2019
• Extended Tukey’s sensitivity analysis framework from cases with binary treatment to multinomial treatment.
• Enabled computationally efficient identification of observed potential outcomes by nonparametric methods or flexible
machine learning methods, such as Bayesian Additive Regression Trees (BART), unperturbed by sensitivity analysis.
• Derived an analytically tractable distribution for missing potential outcomes by adopting an adjacent-categories logit
model for the treatment assignment mechanism, and created an interpretable method to calibrate the magnitude of
sensitivity parameters via pseudo partial R², facilitating the use of prior knowledge or domain expertise.
• Developed the TukeySens package, which implements the proposed method and visualizes the results with heatmaps,
contour plots, and ribbon plots via ggplot2.
2016 Presidential Election Data Analysis (Machine Learning) March 2018
• Predicted voter behavior using machine learning tools and interpreted the most common prediction errors.
• Conducted preliminary data analysis including data wrangling and visualization using tidyverse, dplyr and ggplot2.
• Performed dimensionality reduction by PCA and investigated data via principal components and prominent loadings.
• Implemented various classification models, including decision tree, KNN, logistic regression, LDA, and QDA, with both
native attributes and principal components, and compared models by cross-validation scores.
Abraham Wald Memorial Prize, UCSB 2018
• Nominated and selected by the Faculty in recognition of scholarly excellence in graduate studies.
