Dongha Lee

Assistant Professor
Data & Language Intelligence Lab. @ Yonsei
Department of Artificial Intelligence
Yonsei University

Adjunct Professor
Graduate School of Artificial Intelligence
POSTECH

Biography

I am an assistant professor in the Department of Artificial Intelligence at Yonsei University. I received my Ph.D. in Computer Science from POSTECH, where I was advised by Prof. Hwanjo Yu. During my Ph.D., I was fortunate to work as a visiting scholar at UT Health under the supervision of Prof. Xiaoqian Jiang. After completing my Ph.D., I worked as a postdoctoral research fellow at UIUC with my advisor, Prof. Jiawei Han.

Research Interests

LLMs with Parametric and Non-Parametric Knowledge
Information Retrieval & Recommender Systems
Data Intelligence for Real-World Applications

Publications

Hippo-Video: Simulating Watch Histories with Large Language Models for History-Driven Video Highlighting
Jeongeun Lee, Youngjae Yu, Dongha Lee
COLM, 2025
paper / code

Imagine All The Relevance: Scenario-Profiled Indexing with Knowledge Expansion for Dense Retrieval
Sangam Lee, Ryang Heo, SeongKu Kang, Dongha Lee
COLM, 2025
paper / code

LLMs Think, But Not In Your Flow: Reasoning-Level Personalization for Black-Box Large Language Models
{Jieyong Kim, Tongyoung Kim}, Soojin Yoon, Jaehyung Kim, Dongha Lee
Preprint (arXiv), 2025
paper / code

MT-RAIG: Novel Benchmark and Evaluation Framework for Retrieval-Augmented Insight Generation over Multiple Tables
{Kwangwook Seo, Donguk Kwon}, Dongha Lee
ACL, 2025
paper / code

Rethinking Reward Model Evaluation Through the Lens of Reward Overoptimization
{Sunghwan Kim, Dongjin Kang}, Taeyoon Kwon, Hyungjoo Chae, Dongha Lee, Jinyoung Yeo
ACL, 2025
paper / code

Review-driven Personalized Preference Reasoning with Large Language Models for Recommendation
{Jieyong Kim, Hyunseo Kim}, Hyunjin Cho, SeongKu Kang, Buru Chang, Jinyoung Yeo, Dongha Lee
SIGIR, 2025
paper / code

Can Large Language Models be Effective Online Opinion Miners?
Ryang Heo, Yongsik Seo, Junseong Lee, Dongha Lee
Preprint (arXiv), 2025
paper / code

Towards Personalized Conversational Sales Agents with Contextual User Profiling for Strategic Action
{Tongyoung Kim, Jeongeun Lee}, Soojin Yoon, Seonghwan Kim, Dongha Lee
Preprint (arXiv), 2025
paper / code

How Diversely Can Language Models Solve Problems? Exploring the Algorithmic Diversity of Model-Generated Code
Seonghyeon Lee, Heejae Chon, Joonwon Jang, Dongha Lee*, Hwanjo Yu*
Preprint (arXiv), 2025
paper / code

Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation
Hyungjoo Chae, Namyoung Kim, Kai Tzu-iunn Ong, Minju Gwak, Gwanwoo Song, Jihoon Kim, Sunghwan Kim, Dongha Lee, Jinyoung Yeo
ICLR, 2025
paper / code

Towards Lifelong Dialogue Agents via Relation-aware Memory Construction and Timeline-augmented Response Generation
{Kai Tzu-iunn Ong, Namyoung Kim}, Minju Gwak, Hyungjoo Chae, Taeyoon Kwon, Yohan Jo, Seung-won Hwang, Dongha Lee, Jinyoung Yeo
NAACL, 2025
paper / code

Do LLMs Have Distinct and Consistent Personality? TRAIT: Personality Testset designed for LLMs with Psychometrics
Seungbeen Lee, Seungwon Lim, Seungju Han, Giyeong Oh, Hyungjoo Chae, Jiwan Chung, Minju Kim, Beong-woo Kwak, Yeonsoo Lee, Dongha Lee, Jinyoung Yeo, Youngjae Yu
NAACL Findings, 2025
paper / code

Unsupervised Robust Cross-Lingual Entity Alignment via Neighbor Triple Matching with Entity and Relation Texts
Soojin Yoon, Sungho Ko, Tongyoung Kim, SeongKu Kang, Jinyoung Yeo, Dongha Lee
WSDM, 2025
paper / code

Improving Scientific Document Retrieval with Concept Coverage-based Query Set Generation
SeongKu Kang, Bowen Jin, Wonbin Kweon, Yu Zhang, Dongha Lee, Jiawei Han, Hwanjo Yu
WSDM, 2025
paper / code

Stop Playing the Guessing Game! Target-free User Simulation for Evaluating Conversational Recommender Systems
Sunghwan Kim, Tongyoung Kim, Kwangwook Seo, Jinyoung Yeo, Dongha Lee
Preprint (arXiv), 2024
paper / code

Why These Documents? Explainable Generative Retrieval with Hierarchical Category Paths
Sangam Lee, Ryang Heo, SeongKu Kang, Susik Yoon, Jinyoung Yeo, Dongha Lee
Preprint (arXiv), 2024
paper / code

Can Code-Switched Texts Activate a Knowledge Switch in LLMs? A Case Study on English-Korean Code-Switching
Seoyeon Kim, Huiseo Kim, Chanjun Park, Jinyoung Yeo, Dongha Lee
Preprint (arXiv), 2024
paper / code

Bag of Tricks for Diabetic Retinopathy and Diabetic Macular Edema Classification in Ultra-Widefield Imaging
Hyeonmin Kim, Chanyang Seo, Wonyoung Seo, Yunnie Cho, Ohhyun Kwon, Dongha Lee
MICCAI Challenge: Ultra-Widefield Fundus Imaging for Diabetic Retinopathy (UWF4DR), 2024 **First Place Award**
paper / code

Train-Attention: Meta-Learning Where to Focus in Continual Knowledge Learning
Yeongbin Seo, Dongha Lee, Jinyoung Yeo
NeurIPS, 2024
paper / code

Taxonomy-guided Semantic Indexing for Academic Paper Search
SeongKu Kang, Yunyi Zhang, Pengcheng Jiang, Dongha Lee, Jiawei Han, Hwanjo Yu
EMNLP (Oral), 2024
paper / code

Evidence-Focused Fact Summarization for Knowledge-Augmented Zero-Shot Question Answering
{Sungho Ko, Hyunjin Cho}, Hyungjoo Chae, Jinyoung Yeo, Dongha Lee
EMNLP (Oral), 2024
paper / code

Unveiling Implicit Table Knowledge with Question-Then-Pinpoint Reasoner for Insightful Table Summarization
Kwangwook Seo, Jinyoung Yeo, Dongha Lee
EMNLP Findings, 2024
paper / code

Make Compound Sentences Simple to Analyze: Learning to Split Sentences for Aspect-based Sentiment Analysis
{Yongsik Seo, Sungwon Song, Ryang Heo}, Jieyong Kim, Dongha Lee
EMNLP Findings, 2024
paper / code

CACTUS: Towards Psychological Counseling Conversations using Cognitive Behavioral Theory
{Suyeon Lee, Sunghwan Kim, Minju Kim}, Dongjin Kang, Dongil Yang, Harim Kim, Minseok Kang, Dayi Jung, Min Hee Kim, Seungbeen Lee, Kyoung-Mee Chung, Youngjae Yu, Dongha Lee, Jinyoung Yeo
EMNLP Findings, 2024
paper / code

Eliciting Instruction-tuned Code Language Models' Capabilities to Utilize Auxiliary Function for Code Generation
Seonghyeon Lee, Suyeon Kim, Joonwon Jang, HeeJae Chon, Dongha Lee, Hwanjo Yu
EMNLP Findings, 2024
paper / code

SC-Rec: Enhancing Generative Retrieval with Self-Consistent Reranking for Sequential Recommendation
Tongyoung Kim, Soojin Yoon, Seongku Kang, Jinyoung Yeo, Dongha Lee
Preprint (arXiv), 2024
paper / code

Graph Signal Processing for Cross-Domain Recommendation
Jeongeun Lee, SeongKu Kang, Won-Yong Shin, Jeongwhan Choi, Noseong Park, Dongha Lee
Preprint (arXiv), 2024
paper / code

Can Large Language Models be Good Emotional Supporter? Mitigating Preference Bias on Emotional Support Conversation
{Dongjin Kang, Sunghwan Kim}, Taeyoon Kwon, Seungjun Moon, Hyunsouk Cho, Youngjae Yu, Dongha Lee, Jinyoung Yeo
ACL, 2024 **Outstanding Paper**
paper / code

VERIFINER: Verification-augmented NER via Knowledge-grounded Reasoning with Large Language Models
{Seoyeon Kim, Kwangwook Seo}, Hyungjoo Chae, Jinyoung Yeo, Dongha Lee
ACL, 2024
paper / code

PEARL: A Review-driven Persona-Knowledge grounded Conversational Recommendation Dataset
{Minjin Kim, Minju Kim}, Hana Kim, Beong-woo Kwak, Soyeon Jeon, Hyunseo Kim, SeongKu Kang, Youngjae Yu, Jinyoung Yeo, Dongha Lee
ACL Findings, 2024
paper / code / dataset

Self-Consistent Reasoning-based Aspect-Sentiment Quad Prediction with Extract-Then-Assign Strategy
{Jieyong Kim, Ryang Heo}, Yongsik Seo, SeongKu Kang, Jinyoung Yeo, Dongha Lee
ACL Findings, 2024
paper / code

Exploring Language Model’s Code Generation Ability with Auxiliary Functions
Seonghyeon Lee, Sanghwan Jang, Seongbo Jang, Dongha Lee, Hwanjo Yu
NAACL Findings, 2024
paper / code

RTSUM: Relation Triple-based Interpretable Summarization with Multi-level Salience Visualization
Seonglae Cho, Myungha Jang, Jinyoung Yeo, Dongha Lee
NAACL Demo, 2024
paper / code

Learning Discriminative Dynamics with Label Corruption for Noisy Label Detection
Suyeon Kim, Dongha Lee*, SeongKu Kang, Sukang Chae, Sanghwan Jang, Hwanjo Yu*
CVPR, 2024
paper / code

Unbiased, Effective, and Efficient Distillation from Heterogeneous Models for Recommender Systems
SeongKu Kang, Wonbin Kweon, Dongha Lee, Jianxun Lian, Xing Xie, Hwanjo Yu
ACM Transactions on Recommender Systems, 2024
paper / code

Improving Retrieval in Theme-specific Applications using a Corpus Topical Taxonomy
SeongKu Kang, Shivam Agarwal, Bowen Jin, Dongha Lee, Hwanjo Yu, Jiawei Han
WWW, 2024
paper / code

Evidentiality-Aware Retrieval for Overcoming Abstractiveness in Open-Domain Question Answering
Yongho Song, Dahyun Lee, Myungha Jang, Seung-won Hwang, Kyungjae Lee, Dongha Lee, Jinyoung Yeo
EACL Findings, 2024
paper / code

Commonsense-augmented Memory Construction and Management in Long-term Conversations via Context-aware Persona Refinement
Hana Kim, Kai Tzu-iunn Ong, Seoyeon Kim, Dongha Lee, Jinyoung Yeo
EACL, 2024
paper / code

Large Language Models are Clinical Reasoners: Reasoning-Aware Diagnosis Framework with Prompt-Generated Rationales
Taeyoon Kwon, Kai Ong, Dongjin Kang, Seungjun Moon, Jeong Ryong Lee, Dosik Hwang, Beomseok Sohn, Yongsik Sim, Dongha Lee, Jinyoung Yeo
AAAI, 2024
paper / code

Multi-Domain Recommendation to Attract Users via Domain Preference Modeling
Hyunjun Ju, SeongKu Kang, Dongha Lee, Junyoung Hwang, Sanghwan Jang, Hwanjo Yu
AAAI, 2024
paper / code

Dialogue Chain-of-Thought Distillation for Commonsense-aware Conversational Agents
Hyungjoo Chae, Yongho Song, Kai Tzu-iunn Ong, Taeyoon Kwon, Minjin Kim, Youngjae Yu, Dongha Lee, Dongyeop Kang, Jinyoung Yeo
EMNLP, 2023
paper / code

Unsupervised Story Discovery from Continuous News Streams via Scalable Thematic Embedding
Susik Yoon, Dongha Lee, Yunyi Zhang, Jiawei Han
SIGIR, 2023
paper / code

SCStory: Self-supervised and Continual Online Story Discovery
Susik Yoon, Yu Meng, Dongha Lee, Jiawei Han
WWW, 2023
paper / code

Distillation from Heterogeneous Models for Top-K Recommendation
SeongKu Kang, Wonbin Kweon, Dongha Lee, Jianxun Lian, Xing Xie, Hwanjo Yu
WWW, 2023
paper / code

Topology-Specific Experts for Molecular Property Prediction
Suyeon Kim, Dongha Lee, SeongKu Kang, Seonghyeon Lee, Hwanjo Yu
AAAI, 2023
paper / code

Topic Taxonomy Expansion via Hierarchy-Aware Topic Phrase Generation
Dongha Lee, Jiaming Shen, Seonghyeon Lee, Susik Yoon, Hwanjo Yu, Jiawei Han
EMNLP Findings, 2022
paper / code

Mitigating Viewpoint Sensitivity of Self-supervised One-class Classifiers
Hyunjun Ju, Dongha Lee, SeongKu Kang, Hwanjo Yu
Information Sciences, 2022
paper / code

Toward Interpretable Semantic Textual Similarity via Optimal Transport-based Contrastive Sentence Learning
Seonghyeon Lee, Dongha Lee, Seongbo Jang, Hwanjo Yu
ACL, 2022
paper / code

TaxoCom: Topic Taxonomy Completion with Hierarchical Discovery of Novel Topic Clusters
Dongha Lee, Jiaming Shen, SeongKu Kang, Susik Yoon, Jiawei Han, Hwanjo Yu
WWW, 2022
paper / code

Consensus Learning from Heterogeneous Objectives for One-Class Collaborative Filtering
SeongKu Kang, Dongha Lee, Wonbin Kweon, Junyoung Hwang, Hwanjo Yu
WWW, 2022
paper / code

Personalized Knowledge Distillation for Recommender System
SeongKu Kang, Dongha Lee, Wonbin Kweon, Hwanjo Yu
Knowledge-Based Systems, 2022
paper / code

Out-of-Category Document Identification Using Target-Category Names as Weak Supervision
Dongha Lee, Dongmin Hyun, Jiawei Han, Hwanjo Yu
ICDM, 2021
paper / code

Learnable Structural Semantic Readout for Graph Classification
Dongha Lee, Suyeon Kim, Seonghyeon Lee, Chanyoung Park, Hwanjo Yu
ICDM, 2021
paper / code

Weakly Supervised Temporal Anomaly Segmentation with Dynamic Time Warping
Dongha Lee, Sehun Yu, Hyunjun Ju, Hwanjo Yu
ICCV, 2021
paper / code

Out-of-manifold Regularization in Contextual Embedding Space for Text Classification
Seonghyeon Lee, Dongha Lee, Hwanjo Yu
ACL, 2021
paper / code

Bootstrapping User and Item Representations for One-Class Collaborative Filtering
Dongha Lee, SeongKu Kang, Hyunjun Ju, Chanyoung Park, Hwanjo Yu
SIGIR, 2021
paper / code

Learnable Dynamic Temporal Pooling for Time Series Classification
Dongha Lee, Seonghyeon Lee, Hwanjo Yu
AAAI, 2021
paper / code

Multi-class Data Description for Out-of-distribution Detection
Dongha Lee, Sehun Yu, Hwanjo Yu
KDD, 2020
paper / code

Generating Sequential Electronic Health Records using Dual Adversarial Autoencoder
Dongha Lee, Hwanjo Yu, Xiaoqian Jiang, Deevakar Rogith, Meghana Gudala, Mubeen Tejani, Qiuchen Zhang, Li Xiong
Journal of the American Medical Informatics Association (JAMIA), 2020
paper / code

Harmonized Representation Learning on Dynamic EHR Graphs
Dongha Lee, Xiaoqian Jiang, Hwanjo Yu
Journal of Biomedical Informatics, 2020
paper / code

Convolutional Neural Networks with Compression Complexity Pooling for Out-of-Distribution Image Detection
Sehun Yu, Dongha Lee, Hwanjo Yu
IJCAI, 2020
paper / code

PUMAD: PU Metric Learning for Anomaly Detection
Hyunjun Ju, Dongha Lee, Junyoung Hwang, Junghyun Namkung, Hwanjo Yu
Information Sciences, 2020
paper / code

Scalable Disk-based Topic Modeling for Memory Limited Devices
Byungju Kim, Dongha Lee, Jinoh Oh, Hwanjo Yu
Information Sciences, 2020
paper / code

OCAM: Out-of-core Coordinate Descent Algorithm for Matrix Completion
Dongha Lee, Jinoh Oh, Hwanjo Yu
Information Sciences, 2020
paper / code / webpage

Large-Scale Matrix and Tensor Completion based on Out-of-Core Approaches
Dongha Lee
Ph.D. Dissertation, 2020
paper

Semi-Supervised Learning for Cross-Domain Recommendation to Cold-Start Users
SeongKu Kang, Junyoung Hwang, Dongha Lee, Hwanjo Yu
CIKM, 2019
paper

Action Space Learning for Heterogeneous User Behavior Prediction
Dongha Lee, Chanyoung Park, Hyunjun Ju, Junyoung Hwang, Hwanjo Yu
IJCAI, 2019
paper / code

Fast Tucker Factorization for Large-scale Tensor Completion
Dongha Lee, Jaehyung Lee, Hwanjo Yu
ICDM, 2018
paper / code / webpage

Disk-based Matrix Completion for Memory Limited Devices
Dongha Lee, Jinoh Oh, Christos Faloutsos, Byungju Kim, Hwanjo Yu
CIKM, 2018
paper / webpage

DualSentiNet: Dual Prediction of Word and Document Sentiments Using Shared Word Embedding
Dongha Lee, Hyunjun Ju, Jung-Mi Park, Kye-Yoon Kim, Hwanjo Yu
IMCOM, 2018
paper

Compressing Model for Matrix Factorization with Quantization Using k-means Clustering
Junsu Cho, Dongha Lee, Hwanjo Yu
KDBC, 2017

GeoVideoIndex: Indexing for Georeferenced Videos
Dongha Lee, Jinoh Oh, Woong-Kee Loh, Hwanjo Yu
Information Sciences, 2016
paper

Teaching

Undergraduate

[CSI3106] Software Engineering, 2023F
[AAI3120] Machine Learning, 2023S, 2024S, 2025S
[AIC2110] Introduction to Data Science, 2024S, 2025S

Graduate

[AAI5009] Recommender Systems and Information Filtering, 2023S
[AAI5013] Advanced Data Mining, 2023F, 2024F

Work Experience

University of Illinois at Urbana-Champaign (UIUC), United States
Postdoctoral Research Fellow, 2021.07 - 2023.02
Department of Computer Science
Advisor: Prof. Jiawei Han

Pohang University of Science and Technology (POSTECH), South Korea
Postdoctoral Researcher, 2020.03 - 2021.06
Department of Computer Science and Engineering
Advisor: Prof. Hwanjo Yu

University of Texas Health Science Center at Houston (UT Health), United States
Visiting Scholar, 2018.09 - 2019.02
School of Biomedical Informatics
Advisor: Prof. Xiaoqian Jiang

Education

Pohang University of Science and Technology (POSTECH), South Korea
Ph.D. in Computer Science and Engineering, 2015.03 - 2020.02
Large-scale Matrix and Tensor Completion based on Out-of-core Approaches
Advisor: Prof. Hwanjo Yu

Technical University of Berlin (TU Berlin), Germany
B.S. in Computer Science, 2013.10 - 2014.02
Exchange Student

Pohang University of Science and Technology (POSTECH), South Korea
B.S. in Computer Science and Engineering, 2011.03 - 2015.02
Summa Cum Laude (Ranked 1st in the Department)
