PEFT-RLHF

ML

Overview

About the project

Fine-tuned the FLAN-T5 model for dialogue summarization

Date
March 20, 2024
My Role
Developer

Low-Rank Adaptation (LoRA) and Reinforcement Learning from Human Feedback (RLHF)

Low-Rank Adaptation of Large Language Models (LoRA) [1] is a parameter-efficient fine-tuning method: it freezes the original LLM parameters and injects a pair of trainable low-rank decomposition matrices. The dimensions of the two matrices are chosen so that their product has the same shape as the original weight matrix, while together they contain far fewer parameters. Only these smaller matrices are updated during training. At inference time, the low-rank matrices are multiplied together and added to the frozen LLM weights.
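As a concrete sketch, this is roughly how LoRA adapters are attached to FLAN-T5 with the 🤗 `peft` library; the rank, scaling, and target modules below are illustrative choices, not the project's exact hyperparameters.

```python
from transformers import AutoModelForSeq2SeqLM
from peft import LoraConfig, TaskType, get_peft_model

# The base model stays frozen; only the injected adapters are trained.
base_model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

lora_config = LoraConfig(
    r=32,                       # rank of the decomposition (illustrative)
    lora_alpha=32,              # scaling factor applied to B @ A
    target_modules=["q", "v"],  # T5 attention query/value projections
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.SEQ_2_SEQ_LM,
)

peft_model = get_peft_model(base_model, lora_config)
# Typically reports trainable parameters on the order of 1% of the full model.
peft_model.print_trainable_parameters()
```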

*Figure: the LoRA process*

For this project, I used LoRA to fine-tune the FLAN-T5 model [2] for dialogue summarization on the DialogSum dataset. The fine-tuned model improved on every ROUGE metric:

|   Metric   | Percentage Improvement |
|:----------:|:----------------------:|
|  ROUGE-1   |         17.5%          |
|  ROUGE-2   |          8.7%          |
|  ROUGE-L   |         12.4%          |
| ROUGE-Lsum |         12.3%          |
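For reference, a ROUGE comparison like this can be computed with the 🤗 `evaluate` library, as in the minimal sketch below; the two summary lists are hypothetical stand-ins for the model outputs and the DialogSum reference summaries.

```python
import evaluate

rouge = evaluate.load("rouge")

# Hypothetical stand-ins for generated and reference summaries.
model_summaries = ["#Person1# and #Person2# agree to meet at noon."]
reference_summaries = ["#Person1# and #Person2# arrange a meeting at noon."]

scores = rouge.compute(predictions=model_summaries,
                       references=reference_summaries)
print(scores)  # keys: 'rouge1', 'rouge2', 'rougeL', 'rougeLsum'
```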

Using the 🤗 TRL (Transformers Reinforcement Learning) library [3], I then fine-tuned the model to detoxify its summaries. Training used Proximal Policy Optimization (PPO) [4] together with a KL-divergence penalty, which keeps the updated policy from drifting too far from the original policy, and Meta AI's RoBERTa-based hate speech model [5] as the reward model.
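The training loop looks roughly like the sketch below, written against TRL's classic `PPOTrainer` API (v0.x). Several details are assumptions: the reward model id (`facebook/roberta-hate-speech-dynabench-r4-target`, the published Dynabench variant), the DialogSum dataset id (`knkarthick/dialogsum`), the prompt format, and all hyperparameters; the project would also start PPO from the LoRA fine-tuned checkpoint rather than the base model used here.

```python
import torch
from datasets import load_dataset
from transformers import AutoTokenizer, pipeline
from trl import (AutoModelForSeq2SeqLMWithValueHead, PPOConfig, PPOTrainer,
                 create_reference_model)

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")

# Policy: the summarizer wrapped with a value head for PPO.
ppo_model = AutoModelForSeq2SeqLMWithValueHead.from_pretrained("google/flan-t5-base")
# Frozen copy of the starting policy; the KL penalty is computed against it.
ref_model = create_reference_model(ppo_model)

# Reward model: RoBERTa-based hate-speech classifier (assumed model id).
reward_pipe = pipeline(
    "text-classification",
    model="facebook/roberta-hate-speech-dynabench-r4-target",
    top_k=None,  # return scores for both "hate" and "nothate"
)

# DialogSum dialogues turned into summarization prompts (assumed preprocessing).
def tokenize(sample):
    prompt = f"Summarize the following conversation.\n\n{sample['dialogue']}\n\nSummary:"
    sample["input_ids"] = tokenizer.encode(prompt, truncation=True, max_length=512)
    return sample

dataset = load_dataset("knkarthick/dialogsum", split="train[:64]").map(tokenize)
dataset.set_format(type="torch", columns=["input_ids"])

def collator(data):
    return {key: [d[key] for d in data] for key in data[0]}

config = PPOConfig(learning_rate=1.41e-5, batch_size=8, mini_batch_size=4)
ppo_trainer = PPOTrainer(config, ppo_model, ref_model, tokenizer,
                         dataset=dataset, data_collator=collator)

for batch in ppo_trainer.dataloader:
    query_tensors = batch["input_ids"]

    # Sample a summary from the current policy for each query.
    response_tensors = [
        ppo_trainer.generate(q, max_new_tokens=64).squeeze() for q in query_tensors
    ]
    responses = [tokenizer.decode(r, skip_special_tokens=True) for r in response_tensors]

    # Reward = the "nothate" score, so less toxic summaries are reinforced.
    rewards = []
    for scores in reward_pipe(responses):
        nothate = next(s["score"] for s in scores if s["label"] == "nothate")
        rewards.append(torch.tensor(nothate))

    # One PPO update (clipped objective plus KL penalty against ref_model).
    ppo_trainer.step(query_tensors, response_tensors, rewards)
```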

*Figure: the RLHF process*

The RL fine-tuned model achieved a 54% average improvement in non-toxicity score over the PEFT baseline.
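A comparison like this can be sketched with the 🤗 `evaluate` toolkit's `toxicity` measurement, which by default wraps the same RoBERTa hate-speech classifier (it reports the "hate" probability, i.e. one minus the non-toxic score); the two output lists below are hypothetical placeholders.

```python
import numpy as np
import evaluate

# Defaults to the facebook/roberta-hate-speech-dynabench-r4-target classifier.
toxicity = evaluate.load("toxicity", module_type="measurement")

# Hypothetical outputs from the PEFT baseline and the RL fine-tuned model.
peft_outputs = ["You are an idiot, just sign the contract."]
rlhf_outputs = ["Please review and sign the contract when you can."]

before = np.mean(toxicity.compute(predictions=peft_outputs)["toxicity"])
after = np.mean(toxicity.compute(predictions=rlhf_outputs)["toxicity"])
print(f"mean toxicity: {before:.4f} -> {after:.4f}")
```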

View the source code on GitHub

References

  1. Hu et al., "LoRA: Low-Rank Adaptation of Large Language Models."
  2. google/flan-t5-base on 🤗 Hugging Face.
  3. Transformers Reinforcement Learning (TRL) library.
  4. Schulman et al., "Proximal Policy Optimization Algorithms."
  5. facebook/roberta-hate-speech on 🤗 Hugging Face.
  6. Generative AI with LLMs.