Who Am I

I graduated from Columbia University with a master's degree in statistics. I currently work as a data scientist at CVS Health, where my daily work involves statistical analysis, data mining, and data visualization to draw insights from data. I am a motivated person with outstanding oral and written communication skills, analytical skills, and data mining skills.

Coursework: Data Mining, Machine Learning, Linear Regression, Advanced Data Analysis

Specialties: Computer proficiency in R; Python (including pandas, NumPy, etc.); SQL; Teradata; Hive; Hadoop; Tableau; Bash; SAS; SPSS; Microsoft Word, PowerPoint, Excel

Things I Have Done

Education

Columbia University, Graduate School of Arts and Sciences, New York, NY
Sept. 2013 - Feb. 2015
MA in Statistics, GPA: 3.8/4.0
Relevant Coursework: Linear Regression Analysis, Data Mining, Machine Learning, Time Series, Math Finance, Marketing Research, Advanced Data Analysis

Hangzhou Dianzi University, College of Economics, Hangzhou, China
Sept. 2009 - June 2013
Bachelor's Degree in Economics, GPA: 4.0/5.0, Top 10% in Class
Relevant Coursework: Securities Investment, Sampling Theory, Accounting of National Economy

Project Experience

Who Will Subscribe to a Term Deposit -- A Study of a Direct Marketing Campaign, Columbia University, New York, NY

  • Organized and managed the project team by setting the schedule and encouraging team members to generate new ideas
  • Implemented classification algorithms to identify the main factors behind the success of the marketing campaign
  • Conducted intensive research on imputing missing data and handling imbalanced data
  • Prepared presentation materials and wrote the report on the findings

Cultivating the Regular -- A Study of Epicurean Management, Columbia University, New York, NY

  • Used regression analysis and statistical tests to identify differences in expectations between regular and non-regular customers of a restaurant
  • Assisted in designing the survey questions and sampling procedures
  • Conducted background research on the restaurant through in-depth interviews with the manager and clients
  • Wrote the analytical section of the final paper and prepared presentation materials

Zhejiang Provincial Marketing Research Competition, Hangzhou Dianzi University, Hangzhou, China
Project: Satisfaction Rate of Residents Towards Online Train Ticket Purchasing System

  • Designed a readable questionnaire for people of varying educational backgrounds
  • Identified the factors affecting satisfaction rate using factor analysis and logistic and linear regression in SPSS
  • Wrote a 40-page report identifying these factors and presented it to 200 judges, professors, and students

Relevant Quantitative Experience

CVS Health, Woonsocket, RI
April 2017 - Present
Senior Data Scientist/Manager

  • Implemented PTB and cross-sell models to build a category/sub-category-level product recommendation system
  • Expanded the spatial-temporal model for category/sub-category product recommendation

ownerIQ, Boston, MA
May 2015 - Mar. 2017
Data Scientist

  • Provided statistical consultancy to internal teams and solved business problems using statistical analysis and data mining methods such as statistical testing, correlation analysis, regression analysis, and random forests
  • Programmed web crawlers in Python to collect data from NOAA, a weather reporting website, and analyzed how severe weather alerts change online users' browsing activity using anomaly detection and spatial time series analysis; the results were implemented in a weather optimization campaign
  • Performed demographic analysis of online audiences for online retailers using Hive, MySQL, and Python, and visualized the data in Tableau to turn it into a data product
  • Designed and developed decay algorithms that weight opportunities by distance to geo-weight the delivery of online advertisements for a drive-in store campaign, resulting in 85% of impressions being delivered within 20 miles of the store and a 30% increase in conversion rate
  • Built a general regression model to forecast retailers' offline units sold from their online units sold, explaining 98% of the variance in the dependent variable
  • Upgraded the existing CPA predictor for different campaign types by introducing new features and building a random forest regression model, increasing the explained variance of the dependent variable to 73%
  • Designed and developed an automated process for generating data product reports using extensive Hive, Python, SQL, and Bash programming, reducing the processing time for one data product by 6 hours
  • Participated in software code reviews in Bash, Hive, SQL, and Python
  • Created Tableau templates to visualize data with bar plots, geographic heat maps, line plots, etc., and performed exploratory data analysis to summarize the data and find patterns in it
  • Supported daily data pull requests from various teams and attended client meetings to present data products

Columbia Business School, New York, NY
Mar. 2015 - May 2015
Research Assistant

  • Managed data received from clients, writing SQL queries to combine and select the data required by professors
  • Implemented time series analysis, including difference-in-differences (DID) methods, seasonal ARIMA models, and intervention analysis, along with other statistical methods, to study the impact of newly implemented incentive schemes on clients' revenue and to identify the schemes' impact on the distribution of employee wages

HERMES CAPITAL ADVISORY, New York, NY
Oct. 2014 - Dec. 2014
Software Development Intern

  • Implemented text mining algorithms to classify and cluster articles about companies, using Python to extract useful words, transform text into vectors, and apply topic models, including LDA
  • Crawled articles and data from companies' main pages and product and service pages using Python
  • Performed data manipulation and data cleaning using Python and MongoDB

CROWDNETICS, New York, NY
Summer 2014
Data Engineer Intern

  • Designed and built a system that determines which loan applicants should be approved, using logistic regression and ensemble algorithms implemented in R
  • Created loan and interest rate prediction models for the P2P market using tree-based classification algorithms in R
  • Performed data manipulation and data cleaning in R and applied exploratory data analysis (EDA) to the dataset
  • Documented the complete process flow to describe project development, logic, implementation, and coding

Additional Experience

NEW CHANNEL ENGLISH SCHOOL, Fuzhou, China
Summer 2013
Teaching Assistant

  • Gave presentations and taught students strategies for preparing for the TOEFL
  • Analyzed the effect of teachers on students' IELTS and TOEFL performance using t-tests and ANOVA in SAS

Certification

  • SAS Base Programming for SAS 9 (License Number: BP039314v9)
  • SAS Advanced Programming for SAS 9 (License Number: AP012890v9)

Skills & Interests

  • Computer: R (Proficient), Python, Hive, Hadoop, Tableau, Bash, SPSS, SAS, MATLAB, EViews, ITSM, SQL, VBA, MS Suite (Advanced Excel Skills), Google Docs
  • Languages: Native Mandarin
  • Interests: Guitar, Tennis

Leadership Experience

Columbia University Asia-Pacific Development Society, New York, NY
Dec. 2013 - Feb. 2015
Vice President of Operations

  • Planned and organized several forums, including a financial summit at Columbia University, and informational sessions, including one with McKinney & Company
  • Led members of the organization in contacting and inviting honored guests, selecting and laying out event sites, and keeping order on site
  • Negotiated with the university administration on space and logistics for Chinese cultural forums and celebrity lectures

Contact Me