Seminar

Machines that Think Like Lawyers: Issues, Methods, and Illustrations from Privacy Policies


  • Location
    University of Amsterdam, Roeterseiland campus building A, room A3.01
    Amsterdam
  • Date and time
    January 09, 2024
    13:00 - 14:15

Abstract

Privacy policies govern firms’ collection, use, sharing, and security of consumers’ personal information. These rich and complex legal documents reflect contractual terms as well as mandated disclosures and compliance with data protection regimes such as the European Union’s GDPR and California’s CCPA. Privacy policies tend to be detailed and lengthy, making it difficult for lay consumers to understand the terms and for regulators to police firm behavior. Our project joins recent efforts to classify the terms in privacy policies and automate their analysis using machine learning.

Machine learning relies on human-coded examples to train, adjust, and test the capabilities of AIs. Until very recently, AIs’ ability to process large, unstructured texts was very limited; datasets designed for legal tech reflect this by focusing on short phrases and simple legal concepts. In the past year, AI technology has made breathtaking strides in its ability to process text, largely through AI systems built on large language models (LLMs). To date, contract interpretation using LLMs has relied on untested AIs trained on generic, non-legal datasets.
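
As a concrete illustration of this train-and-test loop, here is a minimal sketch in Python with scikit-learn. The clause texts, label names, and data split are invented for illustration and are not drawn from the authors’ dataset or method:

    # Minimal sketch: fit a clause classifier on human-coded examples
    # and test it on held-out ones. All clauses and labels below are
    # hypothetical, not from the paper's dataset.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.metrics import classification_report

    # Hypothetical human-coded examples: (clause text, label)
    clauses = [
        ("We may share your data with third-party advertisers.", "third_party_sharing"),
        ("You can request deletion of your personal data at any time.", "data_deletion_right"),
        ("We retain information for as long as your account is active.", "data_retention"),
        ("Your information may be disclosed to our marketing partners.", "third_party_sharing"),
        ("Users may ask us to erase their stored information.", "data_deletion_right"),
        ("Records are kept for five years after account closure.", "data_retention"),
    ]
    texts, labels = zip(*clauses)

    # Hold out part of the human-coded data to test the model's capability.
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=0.5, random_state=0, stratify=labels
    )

    model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)
    print(classification_report(y_test, model.predict(X_test)))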

Our paper makes three contributions. First, it introduces an approach and toolset for labeling privacy policies that can generate datasets tailored to training and testing the ability of this new class of higher-capability AIs to process legal documents. We employ a granular coding approach that aims to capture the nuances inherent in contracts and to map coded terms against relevant legal benchmarks across the U.S. and the E.U. This hand-coded dataset could serve as a benchmark against which to measure machine-generated coding. Second, it aims to demonstrate how a dataset generated using our approach can be used to test and modify LLMs. We offer some preliminary results in the case of privacy policies, where we “tune” LLMs to label key aspects of privacy policies and automate our coding process in a way that is more consistent with legal practice. Third, we make our data and tools publicly available for others to use and extend.
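
To make the benchmarking step concrete, here is a minimal Python sketch of scoring machine-generated labels against a hand-coded benchmark. The label lists are invented, and the agreement metrics shown are common choices rather than necessarily the ones the authors use:

    # Minimal sketch: compare machine-generated labels against the
    # hand-coded benchmark. The label lists below are hypothetical.
    from sklearn.metrics import cohen_kappa_score, f1_score

    # Hand-coded (gold) labels and labels produced by a tuned LLM,
    # aligned clause by clause. Values are invented for illustration.
    gold = ["third_party_sharing", "data_retention", "data_deletion_right",
            "third_party_sharing", "data_retention"]
    machine = ["third_party_sharing", "data_retention", "data_retention",
               "third_party_sharing", "data_retention"]

    # Chance-corrected agreement between human and machine coding.
    print("Cohen's kappa:", round(cohen_kappa_score(gold, machine), 3))
    # Macro-averaged F1 treats every label category equally.
    print("Macro F1:", round(f1_score(gold, machine, average="macro"), 3))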

Find all information about the seminar and registration on the website of the University of Amsterdam.

Read about the speaker here.