Nicholas Carlini
Field: Computer Security
Work Institutions: Google DeepMind
Alma Mater: University of California, Berkeley (PhD)
Doctoral Advisor: David Wagner
Website: nicholas.carlini.com
Thesis Title: Evaluation and Design of Robust Neural Network Defenses
Thesis Year: 2018
Nicholas Carlini is a researcher affiliated with Google DeepMind who has published research in the fields of computer security and machine learning. He is best known for his work on adversarial machine learning.
Nicholas Carlini obtained his Bachelor of Arts in Computer Science and Mathematics from the University of California, Berkeley in 2013.[1] He remained at Berkeley for his PhD, which he completed in 2018 under the supervision of David Wagner.[2] [3]
Much of Carlini's research concerns adversarial machine learning. In 2016, working alongside David Wagner, he developed the Carlini & Wagner attack, a method for generating adversarial examples against machine learning models. The attack proved effective against defensive distillation, a popular defense in which a student model is trained on the output probabilities of a teacher model to increase the student's robustness and generalizability. The attack gained prominence when it was shown to be effective against most other proposed defenses as well, rendering them ineffective (a simplified form of the attack's objective is given below).[4] [5] In 2018, Carlini demonstrated an attack against the Mozilla Foundation's DeepSpeech model, showing that by hiding malicious commands inside normal speech input he could make the model respond to the hidden commands even when they were not discernible by humans.[6] [7] In the same year, Carlini and his team at UC Berkeley showed that, of the 11 papers presenting defenses to adversarial attacks accepted at that year's ICLR conference, seven of the defenses could be broken.[8] More recently, he and his team have worked on large language models, creating a question set on which humans typically scored 35% while AI models scored around 40%; GPT-3 scored 38%, which could be improved to 40% through few-shot prompting. The best performer on the test was UnifiedQA, a model developed specifically for question-and-answer tasks.[9] Carlini has also developed methods of causing large language models such as ChatGPT to answer harmful questions, such as how to construct bombs.[10] [11]
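In simplified form, and following the general formulation in Carlini and Wagner's original paper, the attack searches for a small perturbation \delta to an input x by solving an optimization problem of roughly the form

    \min_{\delta} \; \|\delta\|_p + c \cdot f(x + \delta) \quad \text{subject to} \quad x + \delta \in [0, 1]^n

where c is a constant chosen by binary search and f is an objective function constructed so that it is minimized only when the classifier assigns the perturbed input the attacker's chosen target label; the exact choice of norm and of f varies between the attack's variants.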
He is also well known for his work studying the privacy of machine learning models. In 2020, he showed for the first time that large language models would memorize some of the text data they were trained on; for example, he found that GPT-2 could output personally identifiable information.[12] He then led an analysis of larger models, studying how memorization increases with model size. In 2022, he demonstrated the same vulnerability in generative image models, specifically diffusion models, by showing that Stable Diffusion could output images of people's faces that it was trained on.[13] Following this, Carlini showed that ChatGPT would also sometimes output exact copies of webpages it was trained on, including personally identifiable information.[14] Several of these studies have since been referenced by courts debating the copyright status of AI models.[15]
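The extraction studies described above broadly follow a generate-then-check pattern: sample text from the model, then test whether the samples reproduce training data verbatim. The following minimal Python sketch illustrates that pattern under simplifying assumptions; the generate callable, the list of known training documents, and the 50-character threshold are hypothetical placeholders, and the published studies use far larger-scale sampling and additional filtering of candidate samples.

def contains_verbatim(sample, training_docs, min_len=50):
    """Return True if `sample` reproduces a substring of at least `min_len`
    characters verbatim from any known training document."""
    if len(sample) < min_len:
        return False
    for start in range(len(sample) - min_len + 1):
        chunk = sample[start:start + min_len]
        if any(chunk in doc for doc in training_docs):
            return True
    return False

def extraction_test(prompts, training_docs, generate):
    """Generate a continuation for each prompt with the supplied `generate`
    callable and collect the (prompt, sample) pairs that leak training text
    verbatim."""
    memorized = []
    for prompt in prompts:
        sample = generate(prompt)  # hypothetical text-generation function
        if contains_verbatim(sample, training_docs):
            memorized.append((prompt, sample))
    return memorized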
Carlini received the Best of Show award at the 2020 IOCCC for implementing a tic-tac-toe game entirely with calls to printf, expanding on work from a 2015 research paper of his. The judges commented on his submission: "This year’s Best of Show (carlini) is such a novel way of obfuscation that it would be worth of a special mention in the (future) Best of IOCCC list!"[16]