I am a fourth-year Ph.D. student in Computer Science at UC Santa Barbara and a member of the UCSB NLP Group, advised by William Yang Wang. During my Ph.D., I have done summer internships at AWS AI Labs, Apple MLR, and Microsoft Research. Before UCSB, I worked as a research scientist at NAVER and studied at Seoul National University. My main research area is machine learning for natural language processing, with a focus on improving the efficiency and reliability of language models by developing algorithms for retrieval augmentation, adaptive computation, and model training.

News:

  • [Oct 2024] EM-MIA paper is on arXiv.
  • [Sep 2024] I finished my summer internship at AWS AI Labs.

Interests

  • Efficient Language Models
  • Retrieval
  • Sequence Generation

Education

  • M.S. in Electrical and Computer Engineering, 2017

    Seoul National University

  • B.S. in Electrical and Computer Engineering / Mathematical Science (double major), 2014

    Seoul National University

  • High School (early graduation), 2010

    Seoul Science High School

Experience
Applied Scientist Intern

AWS AI Labs

Jun 2024 – Sep 2024 · Seattle, WA, United States

Worked on:

  • Data Contamination Detection for Large Language Models

Research Intern

Apple MLR

Jun 2023 – Sep 2023 · Cupertino, CA, United States

Worked on:

  • Efficient Long Context Modeling

Research Intern

Microsoft Research

Jun 2022 – Sep 2022 · Redmond, WA, United States

Worked on:

  • Efficient Long Document Summarization

Ph.D. Student

University of California, Santa Barbara

Sep 2021 – Present · Santa Barbara, CA, United States

Advisor: William Yang Wang

Working on:

  • Efficient Open-domain Question Answering
  • Retrieval-augmented Language Models
  • Attribution of Language Models
  • Long Context Modeling

Research Scientist

NAVER Clova & AI LAB

Sep 2017 – Aug 2021 · Seongnam, Republic of Korea

Worked on:

  • Fundamental Research for Big LMs

  • Language Representation by Clova (LaRva)

    • Pre-trained Language Models for Korean/Japanese
    • Korean Question Answering (KorQuAD)
    • Knowledge Distillation of BERT
    • Memory-Augmented Language Models
    • Efficient Transformer Inference
    • Data Augmentation for NLP Models
  • Context Center AI (CCAI)

    • Natural Language Understanding (Intent Classification & Slot Filling)
    • Automatic Speech Recognition Error Correction
    • End-to-End Spoken Language Understanding
    • Efficient Dialog State Tracking
  • Korean Grammatical Error Correction

  • Language Model based Query Auto-Completion

  • LINE Sticker Reply Recommendation

  • Community Question Answering based on Query Similarity

Data Scientist

Devsisters

May 2017 – Jul 2017 · Seoul, Republic of Korea

Worked on:

  • User Action Modeling for Churn Prediction
  • Customer Service Automation

Master’s Student

Seoul National University

Mar 2015 – Aug 2017 · Seoul, Republic of Korea

Worked on:

  • Language Model based Intrusion Detection System
  • RNA/Protein Secondary Structure Prediction

Research Intern

Seoul National University

Dec 2012 – Aug 2014 · Seoul, Republic of Korea

Worked on:

  • Biological Sequence Classification
  • Parallel Programming for Biological Sequence Alignment

Publications

* denotes equal contribution

Detecting Training Data of Large Language Models via Expectation Maximization

Novel membership inference attack framework for large language models via an expectation maximization algorithm - arXiv 2024
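
To make the EM framing concrete, here is a minimal sketch assuming per-example language-model scores are already computed; the two-component Gaussian mixture, the em_membership helper, and the toy scores are illustrative stand-ins rather than the paper's actual scoring and update rules:

    import numpy as np

    def em_membership(scores, n_iter=50, seed=0):
        # Cluster per-example LM scores into two groups (member vs.
        # non-member) with a 1-D two-component Gaussian mixture fit by EM.
        # Generic illustration of the EM structure, not the paper's
        # actual membership/prefix-score updates.
        rng = np.random.default_rng(seed)
        mu = rng.choice(scores, size=2, replace=False)   # component means
        sigma = np.full(2, scores.std() + 1e-8)          # component stds
        pi = np.array([0.5, 0.5])                        # mixing weights
        for _ in range(n_iter):
            # E-step: responsibility of each component for each score.
            logp = (-0.5 * ((scores[:, None] - mu) / sigma) ** 2
                    - np.log(sigma) + np.log(pi))
            logp -= logp.max(axis=1, keepdims=True)
            resp = np.exp(logp)
            resp /= resp.sum(axis=1, keepdims=True)
            # M-step: re-estimate means, stds, and mixing weights.
            nk = resp.sum(axis=0)
            mu = (resp * scores[:, None]).sum(axis=0) / nk
            sigma = np.sqrt((resp * (scores[:, None] - mu) ** 2).sum(axis=0) / nk) + 1e-8
            pi = nk / len(scores)
        member = int(np.argmax(mu))  # higher-likelihood component ~ "member"
        return resp[:, member]

    # Toy scores: training members tend to score higher under the model.
    rng = np.random.default_rng(1)
    scores = np.concatenate([rng.normal(0, 1, 100), rng.normal(3, 1, 100)])
    membership_probs = em_membership(scores)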

Towards Standardizing Korean Grammatical Error Correction: Datasets and Annotation

Collection of three datasets and development of a new automatic error-type annotation tool for Korean grammatical error correction - ACL 2023

Bridging the Training-Inference Gap for Dense Phrase Retrieval

Improving dense phrase retrieval with a unified loss and hard negatives, enabled by efficient validation on a subcorpus - Findings of EMNLP 2022

Consistency Training with Virtual Adversarial Discrete Perturbation

Virtual adversarial training with discrete token perturbation for text classification - NAACL 2022

Dynamic-TinyBERT: Boost TinyBERT's Inference Efficiency by Dynamic Sequence Length

Dynamic sequence length reduction for a TinyBERT model - ENLSP NeurIPS Workshop 2021

NASCUP: Nucleic Acid Sequence Classification by Universal Probability

Classification method for nucleotide sequences using compact context-tree models and universal probability from information theory - IEEE Access 2021

SSMix: Saliency-based Span Mixup for Text Classification

Token-level mixup approach based on saliency information - Findings of ACL 2021
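
As a rough illustration of the mixing step, assuming token-level saliency scores are already available (the paper derives them from input gradients); the ssmix helper and its length-ratio label weight are schematic, not the exact procedure:

    import numpy as np

    def ssmix(tokens_a, tokens_b, sal_a, sal_b, span_len):
        # Replace the least-salient span of A with the most-salient span
        # of B, then weight the labels by the fraction of swapped tokens.
        # Schematic only: tokenization details are ignored here.
        sums_a = [sal_a[i:i + span_len].sum()
                  for i in range(len(tokens_a) - span_len + 1)]
        i = int(np.argmin(sums_a))          # least-salient span in A
        sums_b = [sal_b[j:j + span_len].sum()
                  for j in range(len(tokens_b) - span_len + 1)]
        j = int(np.argmax(sums_b))          # most-salient span in B
        mixed = tokens_a[:i] + tokens_b[j:j + span_len] + tokens_a[i + span_len:]
        lam = span_len / len(mixed)         # weight of B's label
        return mixed, lam

    mixed, lam = ssmix("the movie was dull".split(),
                       "an absolute masterpiece".split(),
                       np.array([0.2, 0.1, 0.3, 0.9]),
                       np.array([0.1, 0.8, 0.7]),
                       span_len=2)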

Length-Adaptive Transformer: Train Once with Length Drop, Use Anytime with Search

Framework for training any transformer with length drop and then using it for anytime prediction via a multi-objective evolutionary search - ACL 2021
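
A toy sketch of the length-drop idea; the sampling rule, sample_length_schedule, and the numbers are invented for illustration:

    import numpy as np

    def sample_length_schedule(seq_len, n_layers, max_drop=0.2, rng=None):
        # LengthDrop-style schedule: each layer keeps a random fraction of
        # the remaining tokens, so lengths shrink monotonically with depth.
        # The sampling rule here is made up for illustration; the paper
        # samples per-layer keep ratios differently and prunes tokens by
        # importance rather than arbitrarily.
        rng = rng or np.random.default_rng()
        lengths, cur = [], seq_len
        for _ in range(n_layers):
            cur = max(1, int(cur * (1.0 - max_drop * rng.random())))
            lengths.append(cur)
        return lengths

    # Training with many random schedules makes one model usable at any of
    # them; at inference, a multi-objective evolutionary search explores
    # schedules to trade accuracy against FLOPs.
    print(sample_length_schedule(seq_len=128, n_layers=12))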

Two-stage Textual Knowledge Distillation for End-to-End Spoken Language Understanding

Knowledge distillation from text BERT to speech model by matching sequence-level contextualized representations in pretraining and predicted logits in finetuning - ICASSP 2021

ST-BERT: Cross-modal Language Model Pre-training For End-to-end Spoken Language Understanding

Speech-text cross-modal pretraining with cross-modal masked language modeling (CM-MLM) and cross-modal conditioned language modeling (CM-CLM) - ICASSP 2021

AdamP: Slowing Down the Slowdown for Momentum Optimizers on Scale-invariant Weights

Projecting out the radial component to mitigate the decay of effective step sizes for scale-invariant weights when updating with momentum-based optimizers - ICLR 2021
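
The core projection can be sketched in a few lines; this is a minimal illustration on a flattened weight vector, not the full optimizer:

    import numpy as np

    def project_radial(update, w, eps=1e-8):
        # Remove the component of `update` parallel to the weight vector
        # `w`. For scale-invariant weights (e.g., those followed by batch
        # or layer normalization) the radial component only inflates ||w||
        # and shrinks the effective step size. Minimal sketch: the actual
        # optimizer applies this inside Adam/SGD updates and gates it with
        # a cosine-similarity test.
        w_hat = w / (np.linalg.norm(w) + eps)
        return update - np.dot(update, w_hat) * w_hat

    w = np.array([1.0, 2.0, 2.0])
    u = np.array([0.5, -0.1, 0.3])
    u_proj = project_radial(u, w)
    assert abs(np.dot(u_proj, w)) < 1e-8   # projected update is tangential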

Large Product Key Memory for Pretrained Language Models

Improving accuracy and speed trade-off when finetuning pretrained language models by using large product key memory and mitigating a catastrophic drift with initialization and residual memory - Findings of EMNLP 2020
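
For context, a minimal sketch of a product-key memory lookup, the structure this work scales up; pkm_lookup and all sizes here are illustrative:

    import numpy as np

    def pkm_lookup(query, subkeys1, subkeys2, values, topk=4):
        # Product-key lookup: split the query in half, score each half
        # against a small set of sub-keys, and combine the two top-k lists
        # to address a |K1| x |K2| grid of slots with only |K1| + |K2|
        # comparisons. Shapes and scoring are a minimal sketch.
        d = query.shape[0] // 2
        s1 = subkeys1 @ query[:d]               # scores for first half
        s2 = subkeys2 @ query[d:]               # scores for second half
        top1 = np.argsort(s1)[-topk:]
        top2 = np.argsort(s2)[-topk:]
        # Each candidate pair (i, j) addresses slot i * |K2| + j.
        cand = sorted(((s1[i] + s2[j], i * len(subkeys2) + j)
                       for i in top1 for j in top2), reverse=True)[:topk]
        idx = [slot for _, slot in cand]
        weights = np.exp([score for score, _ in cand])
        weights /= weights.sum()                # softmax over selected slots
        return weights @ values[idx]            # weighted sum of values

    rng = np.random.default_rng(0)
    K, dim = 16, 8                              # 16 sub-keys per half -> 256 slots
    out = pkm_lookup(rng.normal(size=dim),
                     rng.normal(size=(K, dim // 2)),
                     rng.normal(size=(K, dim // 2)),
                     rng.normal(size=(K * K, 32)))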

Efficient Dialogue State Tracking by Selectively Overwriting Memory

Decomposition of open-vocabulary dialog state tracking into state operation prediction and slot value generation, achieving strong joint goal accuracy with highly efficient computation - ACL 2020

Subword Language Model for Query Auto-Completion

Utilization of a subword language model for faster query auto-completion, with a retrace algorithm and a reranking method based on approximate marginalization - EMNLP-IJCNLP 2019

Mimicry Resilient Program Behavior Modeling with LSTM based Branch Models

Anomaly detection robust to mimicry attacks with language modeling of branch sequences - S&P 2018 DLS Workshop

Training IBM Watson using Automatically Generated Question-Answer Pairs

Examination of IBM Watson training with manually labeled and automatically generated question-answer pairs - HICSS 2017 (IBM Best Technology Paper Honorarium)

LSTM-Based System-Call Language Modelling and Robust Ensemble Method for Designing Host-Based Intrusion Detection System

System-call language-modeling approach for designing anomaly-based host intrusion detection systems, with a novel ensemble method to enhance precision

Academic Activities

Organizing Committee

  • SustaiNLP 2022 @ EMNLP 2022, SustaiNLP 2023 @ ACL 2023

Reviewer/Program Committee

  • *CL/NLP Conferences/Workshops
    • ACL 2020/2023, EMNLP 2020/2021/2022/2023, NAACL 2021/2024, COLM 2024
    • EACL 2023/2024, COLING 2020/2022/2024, SustaiNLP 2020/2021, SUKI 2022, SoCalNLP 2022
    • ACL Rolling Review 2021-
    • NEJLT 2024-
  • ML/AI Conferences/Journals
    • ICLR 2021/2022/2023/2024/2025, NeurIPS 2016/2021/2022/2023/2024, ICML 2020/2021/2022/2023
    • AAAI 2017/2023/2024, TMLR 2022-

Volunteer

  • EMNLP 2022, ICML 2022, NAACL 2022, ACL 2021, NAACL 2021, ICLR 2021

Presentations

  • Poster Presentation at SoCal NLP Symposium 2022, 18 Nov 2022
  • Guest Lecture at UNIST, Efficient Natural Language Processing, 14 Sep 2022
  • Invited Talk at LG AI Research, Reducing Sequence Length for Efficient Transformer Inference, 17 Nov 2021
  • Poster Presentation at ALPS 2021, 21 Jan 2021
  • Invited Talk at Korea University, Efficient Natural Language Processing, 27 Nov 2020
  • Lecture at DEVIEW, Efficient BERT Inference, 25 Nov 2020
  • Invited Talk at Lomin, Recent Trends in Natural Language Processing, 14 Nov 2020
  • Guest Lecture at Yonsei University, Pretrained Language Models for Natural Language Processing, 14 Oct 2020

Teaching

  • Teaching Assistant, Problem Solving with Computers II, UCSB, Fall 2024
  • Teaching Assistant, Machine Learning, UCSB, Spring 2024
  • Teaching Assistant, Machine Learning, UCSB, Winter 2024
  • Teaching Assistant, Problem Solving with Computers I, UCSB, Fall 2023
  • Teaching Assistant, Machine Learning, UCSB, Winter 2023
  • Teaching Assistant, Machine Learning, Seoul National University, Spring 2016
  • Tutor, Programming Methodology, Seoul National University, Spring 2014
  • Problem Setter, Korean Olympiad in Informatics (KOI), 2010 – 2014
  • Student Coach, Training Camp for International Olympiad in Informatics (IOI), 2010 – 2014

Research Mentor

  • Alan Wang, Westlake High School (now Undergraduate at Carnegie Mellon University), Jun 2022 – Jul 2022
  • Sandra Ravishankar, Mountain View High School (now Undergraduate at Duke University), Jun 2022 – Jul 2022
  • Soyoung Yoon, Undergraduate at KAIST, Jul 2020 – Jan 2021 (now Ph.D. student at SNU)
  • Jungsoo Park, M.S. Student at Korea University, Jul 2020 – Jan 2021 (now Ph.D. student at Georgia Tech)
  • Sungbin Kim, M.S. Student at Inha University, Feb 2020 – Feb 2021 (now at LG Uplus)
  • Tae-Hwan Jung, Undergraduate at Kyung Hee University, Dec 2019 – Jun 2020
  • Bumju Kwak, Undergraduate at Seoul National University, Apr 2019 – Aug 2019 (now at Kakao Corp.)
  • Kyungwoo Song, Ph.D. Student at KAIST, Oct 2018 – Dec 2018 (now Assistant Professor at Yonsei University)

Awards

SustaiNLP 2021 Workshop

  • Best Paper Award

ACM International Collegiate Programming Contest (ACM-ICPC)

  • Asia Daejeon Regional: Gold Prize, 2013; Special Prize, 2012; Special Prize, 2011

Korean Collegiate Mathematical Competition

  • Division 1 (for math major): Silver Prize, 2013; Bronze Prize, 2012
  • Division 2 (for non-math major): Gold Prize, 2011

Korean Olympiad in Informatics (KOI)

  • Gold Prize, 2008

Korean Mathematical Olympiad (KMO)

  • Silver Prize, 2008

Scholarships

University of California, Santa Barbara

  • GSA Conference Travel Grant, 2022
  • Academic Excellence Fellowship, 2021

Seoul National University

  • Graduate Student Scholarship, 2015 – 2016
  • Partial Scholarship, 2011 – 2014

Korean Foundation for Advanced Study (KFAS)

  • Undergraduate Scholarship, 2011 – 2013

Korea Student Aid Foundation (KOSAF)

  • National Science and Engineering Scholarship, 2012