"Unlearning" techniques aim to make generative AI models forget specific and undesirable information acquired during training, such as sensitive private data or copyrighted material. However, these techniques can significantly degrade a model's performance, making it much less capable of answering basic questions. This finding comes from a new study co-authored by researchers from the University of Washington (UW), Princeton, the University of Chicago, USC, and Google.
According to Weijia Shi, a researcher on the study and a Ph.D. candidate in computer science at UW, current unlearning methods are not yet practical for real-world use. There are no efficient techniques that allow a model to forget specific data without a considerable loss in utility.
Generative AI models, like OpenAI's GPT-4o or Meta's Llama 3.1 405B, function as statistical systems that predict words, images, speech, and other data. These models are trained on vast numbers of examples from public websites and datasets. The AI learns patterns and context from this data and makes informed guesses based on the likelihood of certain data occurring. For example, given an email ending in "Looking forward…", the model might suggest "…to hearing back," based on the patterns it has learned.
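To make that idea concrete, here is a minimal sketch of next-token prediction using the open-source Hugging Face transformers library; the small "gpt2" model and the example prompt are illustrative choices, not details from the study.

```python
# Minimal sketch of next-token prediction; "gpt2" is just a small, publicly
# available model chosen for illustration, not one discussed in the study.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Looking forward", return_tensors="pt")

# The model assigns a probability to every candidate next token; decoding
# strategies (greedy, sampling, beam search) turn those probabilities into text.
with torch.no_grad():
    logits = model(**inputs).logits[0, -1]   # scores for the very next token
probs = torch.softmax(logits, dim=-1)
top = torch.topk(probs, k=5)

for token_id, p in zip(top.indices, top.values):
    print(f"{tokenizer.decode(token_id.item())!r}: {p.item():.3f}")
```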
The training data for most generative AI models comes from public sources, and developers argue that this practice falls under fair use. However, copyright holders often disagree, leading to lawsuits from authors, publishers, and record labels.
Unlearning techniques have gained attention due to these copyright issues. Google and several academic institutions launched a competition last year to create new unlearning approaches. Unlearning could also be useful for removing sensitive information, like medical records or compromising photos, from existing models in response to requests or government orders.
Current unlearning techniques rely on algorithms designed to steer models away from certain data, aiming to influence the model's predictions so that it rarely outputs the data to be forgotten. To test the effectiveness of these algorithms, Shi and her collaborators devised a benchmark called MUSE (Machine Unlearning Six-way Evaluation). MUSE evaluates an algorithm's ability to prevent a model from regurgitating training data and to eliminate the model's knowledge of that data.
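The article does not spell out which algorithms are evaluated, but one widely studied family of unlearning methods fine-tunes the model to make the forget set less likely, for example via gradient ascent on those passages. A rough, hypothetical sketch of that recipe, not the specific algorithms MUSE benchmarks, might look like this:

```python
# Hypothetical sketch of one common unlearning recipe: gradient ascent on the
# "forget" examples, i.e., training the model to make that text LESS likely.
# Model choice, learning rate, and placeholder text are illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

forget_texts = ["<passage the model should no longer reproduce>"]

for text in forget_texts:
    inputs = tokenizer(text, return_tensors="pt")
    # The usual language-modeling loss measures how likely the model finds
    # this text; negating it before backprop pushes that likelihood down
    # instead of up.
    loss = model(**inputs, labels=inputs["input_ids"]).loss
    (-loss).backward()
    optimizer.step()
    optimizer.zero_grad()
```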
MUSE tests whether an unlearned model can forget specific information from sources like the Harry Potter series and news articles. For example, it checks whether the model can still recite a snippet from "Harry Potter and the Chamber of Secrets" or answer questions about the scene after the unlearning process. The benchmark also assesses whether the model retains related general knowledge, such as knowing that J.K. Rowling is the author of the Harry Potter series.
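A regurgitation check of the kind described here can be sketched roughly as follows; the helper names, the prefix split, and the similarity threshold are illustrative assumptions rather than MUSE's actual code.

```python
# Simplified, hypothetical sketch of a regurgitation check in the spirit of
# what the article describes. `model_generate` stands in for whatever text
# generation function is being evaluated; it is an assumed interface.
from difflib import SequenceMatcher

def regurgitates(model_generate, passage: str, prefix_len: int = 100,
                 threshold: float = 0.8) -> bool:
    """Prompt the model with the start of a training passage and check
    whether its continuation closely matches the original text."""
    prefix, reference = passage[:prefix_len], passage[prefix_len:]
    continuation = model_generate(prefix)
    similarity = SequenceMatcher(None, continuation, reference).ratio()
    return similarity >= threshold

# On this axis, an unlearning method succeeds if regurgitates(...) returns
# False for passages in the forget set while the model still answers
# unrelated general-knowledge questions correctly.
```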
The study found that while unlearning algorithms could make models forget certain information, they also reduced the models' overall utility, presenting a trade-off. Shi explained that knowledge in models is intricately entangled, making effective unlearning challenging. For example, removing copyrighted Harry Potter books from a model's training data could also impact the model's knowledge of freely available content from the Harry Potter Wiki.
Currently, there are no solutions to this problem, highlighting the need for further research in this area.
For now, vendors betting on unlearning as a solution to their training data woes appear to be out of luck. Perhaps a technical breakthrough will make unlearning feasible someday. But for the time being, vendors will have to find another way to prevent their models from saying things they shouldn't.