Who am I?

Hi there, my name is Zihao Tang (唐子豪). Welcome to my website!

I am currently a student at Zhejiang University, pursuing my Master's degree under the supervision of Assoc. Prof. Kun Kuang and Prof. Fei Wu. I am also interning at MSRA.

Research Interests

I previously worked on model compression (data-free knowledge distillation, out-of-domain knowledge distillation, etc.), domain adaptation, and large-small model collaboration.

Currently, I focus on large language models (LLMs), particularly reinforcement learning.

Distributionally Robust Optimization For Language Modeling

Background: Datasets for training language models (LMs) are typically sampled from a mixture of many domains. For example, the Pile, a large publicly available dataset, is composed of 24% web data...

Jul 30, 2024 Paper Reading

Optimizing Language Models for Human Preferences is a Causal Inference Problem

Background: The drawbacks of DPO: it needs a reference model, so resource consumption is doubled; the preference dataset needs annotation by humans or an LLM; the objective of DPO does not necessarily ali...

Jun 16, 2024 Paper Reading

Token-level Direct Preference Optimization

[ICML'24] Fine-tuning pretrained LLMs is essential to align them with human values and intentions. The overall process is often done with pairwise comparisons and KL divergence against a reference ...

Jun 2, 2024 Paper Reading

SimPO: Simple Preference Optimization with a Reference-Free Reward

For background on DPO, please refer to TDPO. Drawbacks of DPO: it needs a reference model; the reward formulation is not directly aligned with the metric used to guide generatio...

Jun 2, 2024 Paper Reading

KL Divergence: Forward vs Reverse?

A discussion of the choice between forward and reverse KL divergence.

May 30, 2024 Technical Tips