Ree Jung Kim

Seoul, South Korea | reejung.kim@gmail.com


Machine Learning / Deep Learning Applications

Semantic segmentation

View repository
  • Objective: Identify what is in the image and where each class is located
  • Model/Algorithm used: PASCAL VOC, ADE20K

Object detection

Application for object detection
View repository
  • Objective: Detect and classify objects
  • DL framework: OpenCV
  • Model/Algorithm used: YOLOv3

Time series

View repository

Bankruptcy prediction

View code

  • Objective: Predict bankruptcies
  • Model candidates: random forest, gradient boosting, XGBoost, and LightGBM classifiers
  • Model selected: XGBoost classifier
  • Feature reduction algorithm: Recursive feature elimination with cross validation (RFECV)
  • Hyperparameter optimization: RandomizedSearchCV
  • Evaluation metrics: confusion matrix, ROC curve, accuracy, precision, recall, F1-score, support
  • Model interpretation: feature importance, Shapley values, Morris sensitivity analysis

Housing price prediction

View repository

  • Objective: Predict housing prices
  • Data collection: Kaggle data set
  • Data preprocessing: label encoding, robust scaling
  • Model/Algorithm used: random forest regression, support vector regression, gradient boosting regression
  • Hyperparameter optimization: RandomizedSearchCV
  • Evaluation metrics: mean absolute error (MAE), mean squared error (MSE), mean absolute percentage error (MAPE)
  • Challenge: dimensionality reduction
  • Model interpretation: tree feature importance, Shapley Additive Explanations (SHAP), Local Interpretable Model-Agnostic Explanations (LIME)
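
The three evaluation metrics listed above can be sketched in plain Python. This is an illustrative sketch only, not the code in the repository: the function names are hypothetical, and MAPE is assumed to be reported as a percentage with no zero values among the true prices.

```python
def mae(y_true, y_pred):
    """Mean absolute error: average size of the prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error: penalizes large errors more heavily than MAE."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent.

    Assumes every true value is non-zero (reasonable for housing prices).
    """
    return 100 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)
```

MAE and MSE are expressed in price units (and squared price units), while MAPE is scale-free, which makes it easier to compare across price ranges.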

Spam email detection

View repository

  • Objective: Spam classification based on NLP
  • Data collection: Kaggle data set
  • Data preprocessing: spelling correction, stopword removal, lemmatization, tokenization, padding
  • Model/Algorithm used: TensorFlow (Sequential model with Embedding and SimpleRNN layers)

Fraud detection

View repository

Outlier detection

View repository

  • Objective: Detect and remove outliers
  • Model/Algorithm used: Cook's distance
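
As a rough illustration of the technique named above, Cook's distance for a simple one-feature linear regression can be computed with the standard library alone. This is a minimal sketch under stated assumptions, not the repository's implementation: the function names are hypothetical, the model is assumed to be y = a + b*x, and the 4/n cutoff is a common rule of thumb rather than a value taken from the project.

```python
from statistics import mean


def cooks_distance(x, y):
    """Cook's distance of each observation for the fit y = a + b*x.

    Assumes the data are not perfectly collinear with the fitted line
    (otherwise the residual variance is zero).
    """
    n = len(x)
    p = 2  # fitted parameters: intercept and slope
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    resid_var = sum(e ** 2 for e in residuals) / (n - p)

    distances = []
    for xi, e in zip(x, residuals):
        leverage = 1 / n + (xi - x_bar) ** 2 / sxx  # diagonal of the hat matrix
        distances.append(e ** 2 / (p * resid_var) * leverage / (1 - leverage) ** 2)
    return distances


def flag_outliers(x, y):
    """Indices whose Cook's distance exceeds the 4/n rule of thumb."""
    threshold = 4 / len(x)
    return [i for i, d in enumerate(cooks_distance(x, y)) if d > threshold]
```

Points flagged this way are candidates for review rather than automatic removal, since a high Cook's distance indicates influence on the fit, not necessarily a data error.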

Market basket analysis

View repository
  • Objective: Recommend items based on retail purchase data
  • Data collection: UCI Machine Learning Repository
  • Model/Algorithm used: Apriori algorithm
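
The idea behind Apriori-style frequent itemset mining can be sketched in plain Python. This is an illustrative sketch, not the library function used in the repository: `frequent_itemsets` is a hypothetical name, and support is assumed to be the fraction of transactions that contain an itemset.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item in the itemset
        return sum(itemset <= b for b in baskets) / n

    # Level 1: frequent single items
    singles = {frozenset([item]) for b in baskets for item in b}
    current = {s: support(s) for s in singles}
    current = {s: v for s, v in current.items() if v >= min_support}
    result = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Join step: merge frequent (k-1)-itemsets into size-k candidates
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        current = {c: support(c) for c in candidates}
        current = {c: v for c, v in current.items() if v >= min_support}
        result.update(current)
        k += 1
    return result
```

Recommendations would then come from association rules derived from these itemsets, for example by ranking rules on confidence or lift.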


Exploratory Data Analysis / Visualization

EDA & Visualization

Vision

View repository

Optimization

View repository

  • Objective: Let users choose a data range and easily obtain the optimal portfolio weights
  • Data collection: Stock data from Yahoo Finance via pandas-datareader
  • Model/Algorithm used: Markowitz portfolio theory
  • Deployment method: Heroku, Streamlit
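
For illustration, the simplest case of Markowitz portfolio theory, the minimum-variance mix of two assets, has a closed form that fits in a few lines. This is a sketch under assumptions, not the deployed app's code: the function names are hypothetical, "optimal" is taken to mean minimum variance (the app may instead target another point on the efficient frontier), and short positions are allowed.

```python
from statistics import mean


def covariance(a, b):
    """Sample covariance of two equally long return series."""
    a_bar, b_bar = mean(a), mean(b)
    return sum((x - a_bar) * (y - b_bar) for x, y in zip(a, b)) / (len(a) - 1)


def min_variance_weights(returns_a, returns_b):
    """Weights (w_a, w_b) that minimize the variance of a two-asset portfolio.

    Assumes the two return series are not identical up to a constant shift,
    so the denominator is non-zero.
    """
    var_a = covariance(returns_a, returns_a)
    var_b = covariance(returns_b, returns_b)
    cov_ab = covariance(returns_a, returns_b)
    w_a = (var_b - cov_ab) / (var_a + var_b - 2 * cov_ab)
    return w_a, 1 - w_a


def portfolio_variance(w_a, returns_a, returns_b):
    """Variance of the portfolio holding w_a in asset A and the rest in asset B."""
    w_b = 1 - w_a
    return (
        w_a ** 2 * covariance(returns_a, returns_a)
        + w_b ** 2 * covariance(returns_b, returns_b)
        + 2 * w_a * w_b * covariance(returns_a, returns_b)
    )
```

With more than two assets the same problem is usually solved numerically with a quadratic-programming or general-purpose optimizer, using the full covariance matrix of returns.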

Experience

Big Data Research & Analysis

ZIGBANG

Conducted research and applied ML/DL algorithms and advanced statistical methods to analyze the real estate housing market, providing insights and improving decision-making systems.

Prototyped, developed, and implemented research analysis on a B2B web service, enabling customers in the construction and financial industries to understand real estate markets intuitively.

Developed ETL scripts in Python to load data from multiple sources and to clean and transform data for analysis.

Developed interactive dashboards to enable data exploration by clients.

Python, MySQL, Presto, Kubeflow, Airflow, Git, API, Tableau, AppScript, Slackbot.

Jun 2022 - Present

Seoul, Republic of Korea

Senior Data Analyst

Welcome Saving Digital Bank

Developed an alternative credit scoring system and oversaw model validation.

Developed and implemented dashboards to support business decisions.

Python, Oracle SQL, Tableau, Air Bridge.

May 2021 - May 2022

Seoul, Republic of Korea

Data Analyst Consultant

Ernst & Young (EY)

Built solutions to extract and transform complex data sets to validate data and check information accuracy and integrity.

Delivered tax advisory report for private equity executives and partners.

Python, VBA, Power Pivot, Power Query, HTML, CSS.

June 2019 - March 2021

London, U.K.

Business Intelligence Intern

Meero

Implemented regression analysis on time series data and client segmentation analysis.

Extracted and transformed raw data stored in SQL databases into actionable insights and business strategies using R.

MySQL, Tableau, VBA

April 2018 - August 2018

Paris, France

Data Specialist

Mini Pharmacy Enterprise Inc.

Designed dashboards in QlikSense and PowerBI to simplify the presentation of key business metrics.

Created VBA modules to automate the tracking and calculation of accounts receivable balances.

QlikView, QlikSense, PowerBI, VBA

November 2015 - October 2016

L.A., C.A., U.S.A.


Education

École Polytechnique

Masters 1 (completion of 1st year)
Economics, Data Analytics & Corporate Finance
September 2017 - September 2018

Palaiseau, France

University of California, Berkeley

B.A.
Economics, Sociology

Dean’s Honors list for Fall semester 2013.

August 2010 - May 2014

Berkeley, C.A., U.S.A.

Van Nuys Senior High

Math & Science magnet

Honors list

August 2007 - June 2010

Van Nuys, C.A., U.S.A.


Skill set

Programming/Query Languages
ML/DL frameworks
Visualization libraries
    Seaborn, Bokeh, Matplotlib, PyDeck, WordCloud, Interpret, sweetviz
XAI
    SHAP, LIME, Graphviz
NLP libraries
    NLTK, KoNLPy
Image processing libraries
BI tools
Other skill set
Languages
    English, Korean, French
Certificates
    SQL Developer. Completion ID: SQLD-044001484. Expires: April, 2024.
    Google Analytics Individual Qualification. Completion ID: 106382292. Expires: February 19, 2023.
    Certificate of Completion: Object-oriented programming with Java - an introduction. Completed in March 2019.