Publications

(2024). Detecting Training Data of Large Language Models via Expectation Maximization.

(2023). Towards Standardizing Korean Grammatical Error Correction: Datasets and Annotation.

(2022). Bridging the Training-Inference Gap for Dense Phrase Retrieval. Findings of EMNLP 2022.

(2021). Dynamic-TinyBERT: Boost TinyBERT's Inference Efficiency by Dynamic Sequence Length. ENLSP NeurIPS Workshop 2021.

(2021). NASCUP: Nucleic Acid Sequence Classification by Universal Probability.

(2021). SSMix: Saliency-based Span Mixup for Text Classification. Findings of ACL 2021.

(2020). AdamP: Slowing Down the Slowdown for Momentum Optimizers on Scale-invariant Weights.

(2019). Efficient Dialogue State Tracking by Selectively Overwriting Memory.

(2018). Mimicry Resilient Program Behavior Modeling with LSTM based Branch Models.

(2016). Training IBM Watson using Automatically Generated Question-Answer Pairs.
