
JIAJING (CAROLINE) ZHENG
Santa Barbara, CA 93106 • jzheng@pstat.ucsb.edu • (805) 689-8609

EDUCATION
University of California, Santa Barbara (UCSB) Santa Barbara, CA
Ph.D. candidate in Statistics and Applied Probability GPA: 3.98/4.0 Sept. 2016 – Dec. 2021 (Expected)
Wuhan University (WHU) Wuhan, China
B.S. in Statistics GPA: 3.87/4.0 (Rank 1/36) Sept. 2012 – June 2016
SKILLS
Statistical: Causal Inference, Sensitivity Analysis, Observational Studies, Multivariate Analysis, Predictive Modeling,
Machine Learning, Regression Analysis, Bayesian Inference, Data Visualization, Analytical Thinking
Programming/Software: R, Python, SQL, Jupyter, GitHub, Stan, SAS, LaTeX, MATLAB, Excel, Word, PowerPoint
RELEVANT WORK EXPERIENCE
Data Science Intern June 2020 – Sept. 2020
PayPal, San Jose, CA
• Developed and validated cutting-edge, industry-leading credit and merchant models using advanced data mining and
machine learning techniques.
Graduate Research/Teaching Assistant Sept. 2016 – Present
Department of Statistics and Applied Probability, University of California, Santa Barbara
• Researching and developing sensitivity analysis methods for observational causal inference, specializing in
high-dimensional multivariate settings that commonly arise in practice but remain underexplored.
• Lead lab and discussion sessions for graduate and undergraduate core courses including Advanced Statistical
Methods, Advanced Statistical Theory, Regression Analysis, and Bayesian Data Analysis.
PROJECTS
Credit Model Enhancement with GAN-Synthetic Data (Internship) June 2020 – Sept. 2020
• Extracted, manipulated and analyzed large credit datasets using tools such as Python, Hadoop, Spark, R and SQL.
• Validated cutting-edge, industry-leading credit models with Light Gradient Boosted Machine (LightGBM).
• Generated synthetic credit data using variants of Generative Adversarial Network (GAN) models including DCGAN,
WGAN and cGAN; employed tools to evaluate the quality of the synthetic credit data.
• Improved industry-leading credit solutions by 2% with the GAN-synthetic data, outperforming the conventional
method, Synthetic Minority Oversampling Technique (SMOTE), especially for the problematic segment.
• Presented project results in a clear and effective manner to the whole data science organization.
Copula-based Sensitivity Analysis for Observational Multivariate Causal Inference Apr. 2019 – Present
• Constructed a new sensitivity analysis method for causal inference with Gaussian multivariate treatment (e.g.,
genome-wide association studies) and/or outcome by taking advantage of the multivariate structure.
• Extended results to non-Gaussian cases via Gaussian copula and advanced latent variable models such as
Semiparametric Bayesian Gaussian Copula (sbgcop), Logistic Factor Analysis (LFA) and Variational Autoencoder (VAE).
• Developed an intuitive and interpretable procedure for calibrating sensitivity parameters.
• Compiled the project ideas into a poster presented at the 2019 Atlantic Causal Inference Conference (ACIC).
Sensitivity Analysis for Observational Causal Inference with Multinomial Treatment Sept. 2018 – Apr. 2019
• Extended Tukey’s sensitivity analysis framework from cases with binary treatment to multinomial treatment.
• Enabled computationally efficient identification of observed potential outcomes by nonparametric methods or flexible
machine learning methods, such as Bayesian Additive Regression Trees (BART), unperturbed by sensitivity analysis.
• Deduced analytically tractable distribution for missing potential outcomes by adopting adjacent-categories logits
model for treatment assignment mechanism, and created an interpretable method to calibrate the magnitude of
sensitivity parameters via pseudo partial R², facilitating the use of prior knowledge or domain expertise.
• Developed the TukeySens package, which implements the proposed method and visualizes the results with heatmaps,
contour plots and ribbon plots via ggplot2.
2016 Presidential Election Data Analysis (Machine Learning) March 2018
• Predicted voter behavior using machine learning tools and interpreted the common misclassifications.
• Conducted preliminary data analysis including data wrangling and visualization using tidyverse, dplyr and ggplot2.
• Performed dimensionality reduction by PCA and investigated data via principal components and prominent loadings.
• Implemented various classification models including decision tree, KNN, logistic regression, LDA and QDA with both
native attributes and principal components, and compared models by cross-validation scores.
HONORS & AWARDS
Abraham Wald Memorial Prize, UCSB 2018
• Nominated and selected by the Faculty in recognition of scholarly excellence in graduate studies.
