Multidimensional Data Reversion: E-Discovery Evolution—From Paper Trails to AI | Ropes & Gray LLP

Welcome to Multidimensional Data Reversion, a four-part podcast series from Ropes & Gray’s Insights Lab where data analysis intersects with the law. On this first episode, join Shannon Capone Kirk, managing principal and global head of Ropes & Gray’s advanced e-discovery and AI strategy group, and David Yanofsky, director of data insights, analytics and visualization at the R&G Insights Lab, as they delve into the evolving world of e-discovery, data governance and the impact of generative AI (“GenAI”) on legal practices. Discover how technology is transforming the way legal professionals handle electronic data and learn about best practices for managing information in today’s digital age.
Transcript:
David Yanofsky: Hello, and welcome to Multidimensional Data Reversion, a show where we are digging into where data analysis intersects with the law. I’m David Yanofsky, director of data insights, analytics and visualization in the R&G Insights Lab.
Shannon Capone Kirk: And I’m Shannon Capone Kirk, managing principal and global head of Ropes & Gray’s advanced e-discovery and AI strategy group.
David Yanofsky: Shannon, I’m so glad we get to have this conversation because I get to ask you the question that I’ve been meaning to ask you for years now, which is, “What is e-discovery?”
Shannon Capone Kirk: Great question. Thank you, David. “E-discovery” is a term where I think it’s important to walk through the history, and then, how e-discovery has become somewhat of a Hydra-headed monster to encompass pretty much everything. But let’s go back in our time machine portal to roughly around 2006. People will quibble when the beginning was, but for me, I think that the biggest change and what is e-discovery starts to emerge around 2006. If you were practicing, like when I started out in 1998—so, that is eight years before 2006—we didn’t really conduct discovery in electronic means primarily as we do today. There were times, in fact, months at a time, that I and my junior associate colleagues would be in a warehouse somewhere with paper documents, or we would be roaming halls, going through random file cabinets trying to find relevant documents for production in litigation—that was how it was. I know a lot of people know this, but if you didn’t really come up through the law starting in the 1990s through 2000, you may not know that, but it really was the case that that was the situation. Anyway, around 2006, folks start using email more, and more, and more. People are using Blackberries. I do miss those days—it was a lot simpler then. And there became this sense that, “You know what? This is a different animal.” We didn’t really have a handle then on how to extract that data, search it, and put it in a tool that you could actually review in an organized way.
At my prior firm, another partner and I were dealing with this electronic data more and more, and it seemed to be the case that all of the case teams would come to us, so we started a committee. It was the first-ever—at that firm, and, I think, at a lot of firms, actually—e-discovery committee. Also, at that exact same time, there was what’s known as a bellwether e-discovery case, Zubulake, and it’s a series of opinions, actually. I don’t know who first coined the term “e-discovery,” but with that case and then with rules changes that came out around 2006 to deal with what’s called electronically stored information (“ESI”) in the Federal Rules of Civil Procedure, it just sort of became a shorthand way of referring to discovery but with an electronic component—you marry it together, and you get e-discovery.
When it started out, it was really just a way for litigators to address, “How do we deal with all these emails,” whether they were in a live system or in backup tapes. Keep in mind, too, way back then, it was rare to have cloud-based email, so we’re talking about email servers—physical boxes at clients. It then, obviously, grew exponentially to encompass all forms of electronic data and, “How do you preserve it, extract it, and review it for litigation or investigations?” Databases, cloud-based systems, you name it—really anything electronic. But then, it morphed again, and it has morphed, and it continues to morph. And now, it encompasses machine learning. Nowadays, we’re starting to employ generative artificial intelligence (“GenAI”), but it also encompasses cybersecurity issues and data breaches, and information governance. In other words, how do we handle a large client’s electronic data, whether it’s employee-based or mass database systems? How do we handle that and manage all of that before we get to litigation, before we get to an investigation? How do we control it? It’s sort of like the digital version of the show Hoarders—how do we clean house and keep order in a way that envisions the downstream cost and risk of having too much data?
David Yanofsky: It’s funny. You immediately went to hoarding. I immediately thought of librarians. What portion of companies out there are librarians, and what portion are hoarders?
Shannon Capone Kirk: I have a few examples in my mind, and they are the best examples of both ends of that spectrum. I can’t put a percentage on it, but I am thinking of a company that I’m very aware of, and they are stellar—they are probably the equivalent of the Library of Congress in terms of organization.
David Yanofsky: What makes their approach so different? What’s the culture that they have that makes them so good at that?
Shannon Capone Kirk: Culture. That’s such a great way to put it, because that’s what it is—it’s a culture. It’s a mindset that when they generate data, they think of that data’s entire life cycle, so from creation, through storage and just actual use in real time, and then, how much of that data they really do have to keep in backup for disaster recovery. And then, “What does it mean for when you get to litigation? How much is it going to cost to have to review this, and why are we keeping it around if we don’t have a duty to preserve it and we don’t have a use to save it?” That is a mindset and a culture, and it takes a lot of discipline because in most large organizations, you have sometimes hundreds of thousands of people, so it’s really got to be a core philosophy that folks take seriously. But one thing is if there’s no return on investment (“ROI”) on it, it tends to get pushed to the side. There’s not much ROI in cleaning house and staying on top of cleaning your digital house. That ROI is only ever recognized when you get into massive litigation or an investigation, where it’s now costing you millions of dollars to collect and review that information. Some organizations have gone through that pain point, and they get it, and they start to change culture, but we’ve had a couple of clients that start out that way, and maybe that’s because they have leaders that have come from other organizations where it went wrong.
David Yanofsky: I want to follow a bit of my own personal curiosity here. You mentioned that some companies, even when they keep their data in the most organized, Library of Congress-like fashion, say, “Why are we keeping some of this stuff around?” Has that conversation changed now as companies start thinking about large language models and generative AI in how long they want to keep information that could be used to generate custom AI models for their business?
Shannon Capone Kirk: Depends on whom you’re talking to. If you’re talking to research and development (“R&D”), if you’re talking to folks with more of a creative task on hand—creating something or manufacturing something—or folks in sales who want to have a competitive edge, they want the most data they can have. They want the most information they can have, especially with large language models (“LLMs”). If you’re talking with legal or compliance, it’s a different story—it’s, “How long do we have to keep X, Y, Z information, and where, and who has access?” So, you have both tensions, but it is definitely more stark now, and I don’t know that that’s a negative thing, actually. What for me is a positive thing about LLMs coming on the scene is highlighting that tension that’s been there for as long as I’ve been practicing. But it’s happening earlier, which is helpful because that conversation almost always in the past happened after a system was implemented or after data had been retained too long, whereas now, legal and compliance are being brought in earlier to think about all of the risks around GenAI and LLMs being able to have access to company data from the jump.
David Yanofsky: I want to get back to the beginning a little bit here. It sounds like e-discovery is, from its original incarnation, sort of the 2006 version. The 2006 version of e-discovery is now just what we call “discovery.” Now, all of our documents are natively digital. All of our communications are almost all natively digital—even the phone records, the metadata may be captured, or all of our phone calls may be recorded, depending on the companies that we work at and the requirements on them. So, what’s the difference today when we talk about e-discovery versus discovery?
Shannon Capone Kirk: Believe it or not, this can be somewhat of a hot-button or dramatic question and answer. There’s actually a lot of drama in e-discovery if you wanted to step into our little world—it’s like with any micro-micro-microcosm, there’s always drama. I would say, for me, I don’t think there is a difference. Go and read your local bar rules and requirements, especially your local courts, which almost all require litigators to have a basic understanding of electronic data and what it means for your clients. So, for me, e-discovery is discovery. As an e-discovery professional, I also have to deal with paper discovery. When we have complex litigation—and we have this a lot, actually—going back decades, the hardest part of that is dealing with the paper documents.
David Yanofsky: So much larger scope of information that’s coming in through discovery now combined with the technology that now exists to parse, to search, to filter that information. The time it takes to review a single document is probably an order of magnitude less than it used to be, but the amount of data, information, documents that you’re getting could be an order of magnitude greater than it used to be.
Shannon Capone Kirk: Correct.
David Yanofsky: What are the things that still take a lot of time, versus what are the things that used to take a lot of time and no longer do?
Shannon Capone Kirk: What has technology done to make a review of large bodies of data faster, and how? Let’s go back in our time machine—we would collect a bunch of emails, and there was really no order to them in terms of review other than you’d run search terms. Back in the day, without technology, you really had no way to organize them other than, “These are the documents that hit on this term, or this term, or this term.” And that was mainly it. We once had a case that took several months to just negotiate the search terms. In today’s time frame, if we essentially fast-track our way through that process and use technology, we’re already done with the review and production in the amount of time that it used to take to just negotiate search terms. So, we’re back in our time machine—we’ve negotiated search terms, and we have, let’s say, 500,000 documents. We would then hire, typically, an army of contract attorneys because most clients don’t want to pay firm rates to have somebody churn through 500,000 documents. We’re talking, on average, anywhere from 30 to 50 documents an hour—it’s a lot of time. And if you have to review, one by one, 500,000 documents, well, you need a whole army of contract attorneys to do what’s called “first-level review.” Then, you’ve got to have time for QC of that by the firm attorneys, you’ve got to build in time for privilege review, and you’ve got to build in time for preproduction QC.
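[Editor's note: The arithmetic behind that "army of contract attorneys" can be sketched roughly as follows. The 500,000-document set and the 30-to-50-documents-per-hour rate come from the conversation. The function name and the 40-hour work week are illustrative assumptions, not figures from the episode.]

```python
def first_level_review_hours(total_docs: int, docs_per_hour: float) -> float:
    """Hours for a single one-by-one pass over the documents at a steady rate."""
    if docs_per_hour <= 0:
        raise ValueError("docs_per_hour must be positive")
    return total_docs / docs_per_hour


TOTAL_DOCS = 500_000   # document count used in the conversation
HOURS_PER_WEEK = 40    # assumption: one reviewer working a 40-hour week

for rate in (30, 50):  # low and high ends of the quoted review rate
    hours = first_level_review_hours(TOTAL_DOCS, rate)
    reviewer_weeks = hours / HOURS_PER_WEEK
    print(f"{rate} docs/hour: {hours:,.0f} hours, "
          f"about {reviewer_weeks:,.0f} reviewer-weeks")
```

[Under these assumptions, first-level review alone comes to roughly 10,000 to 16,700 hours, before any time for QC, privilege review, or preproduction QC.]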
It was a major endeavor parsing out about 100 documents a batch to contract attorneys in no particular order at all. Sally is looking at a set of 100 documents, and Joe is looking at a set of 100 documents, and there’s no efficiency in that whatsoever where maybe they’re looking at the same chain, but Joe’s is slightly different. So, you have no efficiency, number one, and you definitely have inconsistencies between Sally and Joe. Imagine how many inconsistencies you’re going to have between Sally and Joe over 500,000 documents, then you’ve got to QC it. That is why it cost a lot of money and took a lot of time. Then, we started to get what lawyers called “technology-assisted review”—sometimes, it’s called “predictive coding.” Sometimes, people, I believe inaccurately, called it “AI.” The best analogy I can come up with is Pandora, where, when you listen to music, you are coding Pandora’s machine learning algorithm. You’re saying, “I, thumbs up, love this song by Ray LaMontagne.” And then, you give a thumbs down, “I absolutely hate this song by The Pretenders.” You’re training the algorithm. You, the human, are telling the algorithm, “I love Ray LaMontagne.” That’s how it works with machine learning. So, instead of just doling out random piles of mishmash documents to Sally and Joe contract attorneys with search term hits, you’re giving Sally and Joe what we call the “highest-scored documents.” In other words, these are the songs in my analogy—these are the documents that the human subject matter expert is saying are most likely going to be the songs, i.e., the documents that you, Sally and Joe, are going to like or find responsive. So, you can see how much more efficiency and consistency you have in that scenario.
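[Editor's note: The thumbs-up, thumbs-down idea can be sketched in a few lines of Python. This is a toy word-weight scorer standing in for the machine learning model. It is not how any particular review platform works, and the function names and sample documents are hypothetical. A human codes a few documents, the scorer learns which words lean responsive, and the unreviewed documents are ranked so the highest-scored ones go to reviewers first.]

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    return text.lower().split()


def train(coded_docs: list[tuple[str, bool]]) -> dict[str, float]:
    """Learn one weight per word from human-coded documents.

    A positive weight means the word shows up more in documents coded
    responsive ("thumbs up") than in those coded not responsive.
    """
    up, down = Counter(), Counter()
    for text, responsive in coded_docs:
        (up if responsive else down).update(tokenize(text))
    vocab = set(up) | set(down)
    up_total = sum(up.values()) + len(vocab)
    down_total = sum(down.values()) + len(vocab)
    # Log-odds with add-one smoothing so unseen counts never hit log(0).
    return {
        word: math.log((up[word] + 1) / up_total)
        - math.log((down[word] + 1) / down_total)
        for word in vocab
    }


def score(text: str, weights: dict[str, float]) -> float:
    """Sum the learned weights; words never seen in training count as 0."""
    return sum(weights.get(word, 0.0) for word in tokenize(text))


def prioritize(unreviewed: list[str], weights: dict[str, float]) -> list[str]:
    """Return unreviewed documents with the highest-scored first."""
    return sorted(unreviewed, key=lambda doc: score(doc, weights), reverse=True)


# Hypothetical training set: the subject matter expert's thumbs up/down.
coded = [
    ("pricing agreement with competitor", True),
    ("discuss pricing terms before the call", True),
    ("lunch menu for friday", False),
    ("office holiday party schedule", False),
]
unreviewed = [
    "friday party parking",
    "competitor pricing call notes",
    "new printer installed",
]

ranked = prioritize(unreviewed, train(coded))
for doc in ranked:
    print(doc)
```

[In this toy example the pricing note is ranked first and the party-parking message last, so Sally and Joe would see the likely responsive document before the noise.]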
David Yanofsky: If you’re a founder, you’re starting up your company and you think that as you go into the world, you are going to get sued, you are going to get investigated, and you want to make sure that your rocketship doesn’t have to slow down to accommodate those speed bumps, what are you doing? How are you setting up your systems?
Shannon Capone Kirk: With the exponential growth of GenAI, which is just going to proliferate beyond our imaginations in 2025 and beyond, if I’m starting a company now, I am going to stop with the whole, “We want to make sure that our employees feel comfortable with their technology and, therefore, they’re allowed to buy their own devices, use their own devices, and bring your own devices (“BYOD”).” And I would stop with the notion that IT shouldn’t have a good budget. I don’t think it’s helpful to an organization in this day and age to be thinking in terms of cutting IT spend at all. I think it is critical that we allow for there to be enterprise-wide budgeting and enterprise-wide attention to information governance, because it is not just an IT issue, and I’ve seen things go the wrong way when we don’t view data governance as an enterprise issue. You see it in mobile devices. You see it in laptops. You see it in cloud services. I would have as a core principle of my organization that information governance is a core need of the business, and I would make sure that I had control of my data. It’s not just from the risk of litigation and the cost of litigation and investigations. It’s cybersecurity. It’s ensuring data privacy. It’s ensuring compliance with contractual obligations if you’re contracting with third parties and have access to their data.
David Yanofsky: A two-phone future?
Shannon Capone Kirk: Well, I just spent the last two years helping over 200 clients deal with the issues around the SEC sweep and the phones, and so, perhaps you’re hitting me at a time when I’ve just gone through that. That doesn’t impact all industries or all companies, and there’s a spectrum of solutions to this. I’m not saying there’s no validity to BYOD programs across the board—so nobody quote me as saying that—but I am saying that if I started a company, I would be thinking about that and whether it was appropriate for my organization from day one.
David Yanofsky: Amazing. Well, that’s going to be it for Multidimensional Data Reversion for today. On our next episode, we will be talking about data analytics, so be sure to subscribe wherever you get your podcasts. I’m David Yanofsky.
Shannon Capone Kirk: And I’m Shannon Capone Kirk. Thank you for listening.