If you're looking for a quick way to become overwhelmed and discouraged, google "ediscovery" or "ediscovery resources."
You'll quickly be drowning in long-winded blog posts and whitepapers full of alphabet-soup acronyms, technical language, and legalese. Most of it comes from software or service providers whose primary intention is to improve their Google search rankings; helping prospects or clients is secondary.
I am not an attorney; I'm an MBA who was recruited into the eDiscovery industry to help scale a services business and launch a software business. The kicker? My first day was literally the day Indiana shut down due to COVID. So my industry training and onboarding consisted of reading articles and blog posts I found on Google, listening to podcasts, watching YouTube videos, downloading whitepapers, and consuming every digital resource I could find. I spent my first 30 days in this industry doing nothing other than consuming as much digital content as I could (I continue to read industry news or analysis at least 30 minutes/day). There are some really incredible resources out there, and I'm incredibly grateful to the many, many people in the industry who took time to speak with me over Teams and Zoom as I was learning.
My goal here is to compile a plain-language, no-nonsense resource that is accessible to business and IT leaders (who are key stakeholders in eDiscovery), as well as paralegals and attorneys who are not as well-versed in eDiscovery. To that end, you'll see a fair amount of bullet points and lists instead of long-winded paragraphs. If you're looking for traditional legal-style writing, you've come to the wrong place.
By no means is this an exhaustive resource. Dense textbooks have been written about this topic. But reading this will give you enough information to be conversant with practitioners, and can guide you to dig into other areas you may need to learn more about. This resource will be updated and expanded over time, so bookmark it and check back regularly. Comments, suggestions, and constructive criticism are welcome.
Table of Contents
- What is eDiscovery?
- Why does eDiscovery matter to me?
- How big of a market is eDiscovery? Who are the key providers?
- How do the economics of the industry work?
- What is the EDRM?
- When is a duty to preserve required?
- What are the roles and responsibilities of counsel?
- What does the FRCP say about eDiscovery?
- New factors in document review
- Foundational case law
What is eDiscovery?
Most commonly, eDiscovery is the process by which parties to litigation or a regulatory investigation determine what data is relevant (and must be produced to opposing counsel) and what information is irrelevant or protected by attorney-client privilege and does NOT need to be produced to opposing counsel.
While litigation and regulatory investigations predominate, other valuable use cases include:
- Data breach response
- M&A due diligence
- Complex bankruptcy proceedings
- Internal investigations
Why Does eDiscovery Matter to Me?
You certainly don't need to be an expert, but you should understand some core concepts.
Discovery accounts for 70% of the cost of litigation, so it has the potential to blow a massive hole in your legal department's budget.
Plus, if your organization is proactive about data mapping and implements defensible data destruction plans, you can:
- Drive down ongoing data hosting costs
- Dramatically reduce downstream discovery costs (especially on major cost drivers like document review and data hosting)
- Reduce dark data that offers bad actors entry points into your network
Look, I'm an MBA, not a JD, so I'll be candid with you: disciplined data management and eDiscovery aren't sexy and won't immediately generate massive returns for your shareholders, but they can absolutely become a very expensive nightmare if you aren't careful. This is especially true considering the Cambrian explosion of data creation (emails, tweets, Slack, Teams, Zoom, etc.).
Lucky you, at the nexus of business leaders who want to maximize revenue and the legal department who wants to minimize risk.
You are responsible for the architecture and maintenance of the policies set forth by the organization, including data preservation, destruction, and legal holds. The hardware and software controls are your responsibility, as is assisting the legal team when data collection is necessary. Your ability to troubleshoot and make proactive suggestions can help avoid catastrophe before it's too late to adjust course.
An attorney lacking the required competence for the eDiscovery issues in the case at issue has three options:
- acquire sufficient learning and skill before performance is required;
- associate with or consult technical consultants or competent counsel; or
- decline the client representation
Committee on Professional Responsibility and Conduct Formal Opinion No. 2015-193, California Bar
While the requirements and standards of eDiscovery may be relatively new to the legal profession, an attorney's core ethical duty of competence remains constant. The Federal Rules of Civil Procedure also contain extensive sections on eDiscovery.
How Big is the eDiscovery Market? Who Are the Key Providers?
Globally, eDiscovery is projected to be a $25 billion per year industry by 2025. That revenue is a combination of software and services.
Image source: #GreatExpectations, Part V: Cloudy with a Chance of Digital Disruption (220), Jae Um, Legal Evolution.
Software performs functions such as collecting ESI (electronically stored information) from disparate data sources, processing ESI (that is, converting file types so that all information can be reviewed in one platform), and document review.
Document review is typically the most expensive component of eDiscovery, so it was the problem many early entrepreneurs aimed to solve first. Key players (in alphabetical order) include:
Relativity is the 800 lb gorilla that has spawned an entire cottage industry. Channel partners host the on-prem version of their software and commonly layer in professional services to complement the data hosting; developers customize the platform and license their technology through channel partners or directly to law firms, corporations, and government entities.
Other software providers fall into categories like:
- Information Governance: Exterro, Onna
- Legal holds: PageFreezer, Zapproved
- Translation: RWS, Divergent Language Solutions
As these companies have grown and taken on more investment capital, many of them have expanded to provide other services, but these generally represent the areas in which they first earned success.
Neither of these lists is exhaustive by any stretch of the imagination, but they are examples of names you'll hear time and again.
Private equity has made a huge push in eDiscovery for the last 3-4 years and consolidation is happening quickly. Previously, services were highly specialized and highly localized; now, there are a handful of companies growing quickly through merger & acquisition, with many attempting to build an end-to-end solution capable of handling entire matters internally.
Traditionally, discovery was part and parcel of a law firm's engagement. Young associates would spend time in corporate conference rooms or in a windowless warehouse by the airport poring through bankers boxes stuffed full of paper documents. As eDiscovery evolved in the 1990s through the 2010s, it fueled the rise of ALSPs, that is, Alternative Legal Service Providers. While law firms initially viewed ALSPs as competitors (and some still do), the market is shifting toward strategic alliances; ALSPs can handle the work that requires higher technical proficiency and commoditized, process-based operations, while law firms can focus on more strategic legal counsel.
ALSPs come in various flavors, including captive (typically founded by and associated with a large law firm; initially formed to serve practice groups, but sometimes perform business development directly with corporate law departments or government agencies) and independent ALSPs (independent businesses who service law firms, corporate legal teams, and government agencies).
Services arrangements may be project-specific or follow more of a managed services model, and vendors may perform each phase of the EDRM, which you can read about below.
I won't bother with a list of names here because consolidation and rebranding are happening so quickly I'd be making updates almost every other day.
How Do the Economics of the Industry Work?
The billable hour model takes a lot of bullets, but if it has one thing going for it, it's easy to understand. Hourly rate multiplied by hours worked equals the invoice (I know I'm oversimplifying a bit here, but you get the drift).
eDiscovery economics are anything but straightforward.
First, there's frequently a distinction between the users of eDiscovery and those paying for the software and services. Anytime these parties are different, paranoia and confusion are easily sown. After all, the pressure to be frugal diminishes when someone else is footing the bill, right?
To a certain degree, that's a truth embedded in human nature. However, many eDiscovery providers in both the software and services spaces are actually doing a pretty impressive job of connecting the value of their tool to the pricing model.
Common formats invoices may take include:
- Billable hour (frequently seen in consulting engagements and document review projects)
- Per document charge (alternative during document review projects, gaining traction among software providers)
- Fee per GB per month (traditional hosting model, but in the process of evolving)
- Flat fee (may be applied in various instances, e.g. defensible destruction policy engagement)
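To make the formats above concrete, here is a minimal Python sketch that computes an invoice under each one. Every rate and volume is a hypothetical placeholder chosen for illustration, not a market figure, and the function names are my own.

```python
def billable_hour_invoice(hourly_rate: float, hours: float) -> float:
    """Hourly rate multiplied by hours worked."""
    return hourly_rate * hours


def per_document_invoice(rate_per_doc: float, documents: int) -> float:
    """A fixed charge for every document reviewed."""
    return rate_per_doc * documents


def hosting_invoice(rate_per_gb_month: float, gigabytes: float, months: int) -> float:
    """Fee per GB per month: the bill keeps growing for as long as data is hosted."""
    return rate_per_gb_month * gigabytes * months


# Hypothetical inputs (assumptions for illustration only).
review_hourly = billable_hour_invoice(hourly_rate=50.0, hours=400)
review_per_doc = per_document_invoice(rate_per_doc=1.0, documents=25_000)
hosting = hosting_invoice(rate_per_gb_month=10.0, gigabytes=200, months=12)
flat_fee = 15_000.0  # agreed up front, independent of volume

invoices = [
    ("Review, billable hour", review_hourly),
    ("Review, per document", review_per_doc),
    ("Hosting, per GB per month", hosting),
    ("Flat fee engagement", flat_fee),
]
for label, amount in invoices:
    print(f"{label:<28}${amount:>10,.2f}")
```

Notice that the hosting line is the only one that depends on time: the longer data sits on the platform, the larger the bill, which is one reason culling data before it is hosted matters.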
Often, these services are assessed to a law firm, who in turn passes them through to their client, usually without a markup. In other instances, corporate legal teams have direct relationships with providers and negotiate directly.
Insourcing some aspects of eDiscovery work has gained traction among in-house teams as the sources of data have exploded (e.g. collaboration platforms like Teams, Zoom, Slack, etc.) and software has made it easier for them to create workflows, monitor progress, etc. But many of the more technically complex or laborious pieces (e.g. data collections and large review projects) are still outsourced to ALSPs.
What is the EDRM?
The EDRM may refer to two things:
- A conceptual model that was first released in 2005; it's the foundational model for the services and processes that collectively comprise "eDiscovery"
- There is actually an incorporated entity also called EDRM. Its revenue comes from law firm and corporate sponsors who are then able to distribute content (whitepapers, podcasts, webinars, etc.) via EDRM's large email and social media reach.
Image source: EDRM.net
The steps of the EDRM are as follows:
- Information Governance: proactively managing your ESI can significantly reduce downstream discovery costs. Addressing things like data mapping and defensible destruction policies can massively reduce major cost drivers like document review and hosting costs.
- Identify: where does potentially relevant data live? Who has access to it?
- Preserve: legal hold; removal of any automated document destruction
- Collect: forensically-sound methods preserve metadata (which may be key to subsequent discovery disputes)
- Process: reducing the ESI and converting file types, if necessary, so that all information can be reviewed in one piece of software
- Review: what is relevant? what is protected by privilege?
- Analyze: of the relevant, non-privileged materials, what narrative emerges? Content, context, patterns, etc.
- Produce: these documents are presented to both parties' counsel in native or non-native format
- Present: data is used during depositions, hearings, trials, etc.
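As one illustration of the Collect and Process steps, here is a minimal Python sketch of hashing, which is commonly used to show that a collected file has not changed and to spot exact duplicates. The file names and contents are made up, and real processing tools do considerably more than this; treat it as a sketch under those assumptions, not a description of any vendor's workflow.

```python
import hashlib


def sha256_of(content: bytes) -> str:
    """Return a hex digest that acts as a fingerprint of the content."""
    return hashlib.sha256(content).hexdigest()


def deduplicate(files: dict[str, bytes]) -> dict[str, list[str]]:
    """Group file names by content hash; each group holds exact duplicates."""
    groups: dict[str, list[str]] = {}
    for name, content in files.items():
        groups.setdefault(sha256_of(content), []).append(name)
    return groups


# Hypothetical collection: two custodians hold the same draft.
collected = {
    "alice/contract_v1.docx": b"Master services agreement, draft 1",
    "bob/contract_v1.docx": b"Master services agreement, draft 1",
    "bob/contract_v2.docx": b"Master services agreement, draft 2",
}

groups = deduplicate(collected)
unique_count = len(groups)

for digest, names in groups.items():
    print(digest[:12], names)
print(f"{len(collected)} files collected, {unique_count} unique")
```

Only one copy from each group needs to go to review, which is how deduplication chips away at review and hosting costs.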
When is a Duty to Preserve Required? What Is Involved?
"Before a party can be sanctioned or held liable for failing to preserve evidence, the party must have been under a duty to preserve the evidence."
Reinbold v. Harris, 2000 U.S. Dist. LEXIS 16643, *3-4 (S.D. Ind. Nov. 7, 2000).
So, when does the duty arise, what is counsel’s responsibility and what must be preserved? That question is best answered by another citation:
"Whenever litigation is reasonably anticipated, threatened or pending against an organization, that organization has a duty to undertake reasonable and good faith actions to preserve relevant and discoverable information. This duty arises at the point in time when litigation is reasonably anticipated."
See Fujitsu Ltd. v. Federal Express Corp., 247 F.3d 423, 436 (2d Cir. 2001).
These triggers may take multiple forms, including:
- Settlement demand enclosing proposed complaint (Salvatore v. Pingel, 2009 U.S. Dist. LEXIS 37905 (D. Colo. 2009))
- Pre-filing communications between litigants (Goodman v. Praxair Servs., 632 F.Supp.2d 494 (D. Md. 2009))
- Demand letter (Id.)
- Series of cases in the industry (Adams & Assocs. v. Dell, Inc., et al., 621 F.Supp.2d 1173 (D. Utah 2009))
- Text messages (Clear-View Technologies, Inc., v. Rasnick et al, No. 5:2013cv02744 - Document 196 (N.D. Cal. 2015))
Reasonably accessible ESI, including email, shared drives, communication and collaboration tools, etc. must be preserved. This requires a legal hold, which should exhibit these characteristics:
- Be in writing with clear instructions
- Encompass all company personnel who may possess potentially relevant information (including key IT personnel)
- Include “active” collection from “key players”
- Encompass all possible sources of potentially relevant information (may include legacy data such as archives, backup tapes, former employee data, etc.)
- Require affirmative responses from all relevant personnel
- Be monitored
- Include procedures for periodic reissuance
- Be followed
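Several of these characteristics (affirmative responses, monitoring, periodic reissuance) come down to simple record keeping. Here is a minimal Python sketch of that tracking. The matter name, custodians, dates, and the 90-day reissue interval are all hypothetical assumptions; nothing in the case law cited here prescribes a specific interval.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class LegalHold:
    matter: str
    issued_on: date
    custodians: list[str]
    reissue_every: timedelta = timedelta(days=90)  # assumed interval
    acknowledged: dict[str, date] = field(default_factory=dict)

    def acknowledge(self, custodian: str, on: date) -> None:
        """Record a custodian's affirmative response to the hold notice."""
        if custodian not in self.custodians:
            raise ValueError(f"{custodian} is not on this hold")
        self.acknowledged[custodian] = on

    def outstanding(self) -> list[str]:
        """Custodians who have not yet responded and need follow-up."""
        return [c for c in self.custodians if c not in self.acknowledged]

    def reissue_due(self, today: date) -> bool:
        """True once the reissue interval has elapsed since issuance."""
        return today >= self.issued_on + self.reissue_every


hold = LegalHold(
    matter="Hypothetical v. Example Corp.",
    issued_on=date(2024, 1, 15),
    custodians=["Alice", "Bob", "IT admin"],
)
hold.acknowledge("Alice", date(2024, 1, 16))

print("Awaiting response:", hold.outstanding())
print("Reissue due on 2024-05-01?", hold.reissue_due(date(2024, 5, 1)))
```

A record like this is also what lets you show, after the fact, that the hold was monitored and followed.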
What are the Roles and Responsibilities of Counsel?
Initially, assess whether there are any eDiscovery needs and issues. If so, counsel should:
- Implement appropriate ESI preservation procedures, including informing the client of the legal requirement to take actions to preserve evidence potentially relevant to the issues raised in the litigation (that is, issue a legal hold)
- Analyze and understand ESI systems and storage (where is the data stored?)
- Identify Custodians of relevant ESI (who has access to it?)
- Develop, perform, and apply appropriate ESI-related searches and limitations
- Collect responsive ESI in a manner that preserves the integrity of the ESI (that is, preserves the metadata)
- Advise the client as to available options for collection and preservation of that ESI
- Engage in competent and meaningful meet-and-confer with opposing counsel concerning an eDiscovery plan (Rule 26(f) of the Federal Rules of Civil Procedure)
- Produce responsive ESI in a recognized and appropriate manner
You'll notice these are essentially the steps of the EDRM, although there is some nuance. That's because the Federal Rules of Civil Procedure (FRCP) have some things to say about eDiscovery.
What Does the FRCP Say About eDiscovery?
When the FRCP was first published in 1938, it's safe to say no one was concerned about identifying, collecting, processing, reviewing, and hosting data from cloud storage devices.
In 2006, ESI was first codified as a discoverable data source. But case law regarding spoliation varied across jurisdictions (was spoliation a result of negligence, or did it require bad faith?), so a "collect everything" mentality took hold to avoid losing potentially responsive documents, and costs exploded.
In 2015, rules were revised to emphasize proportionality, encourage cooperation between adverse parties, and mitigate motions gamesmanship by avoiding penalties in most instances when parties have demonstrated reasonable decision making and important data can still be produced.
Image source: Exterro
So, what does the FRCP actually say about eDiscovery? Here is a synopsis:
Rule 26(f): The "Meet and Confer" Conference for Discovery
This is when parties come together to agree on topics including:
- A list of custodians
- The data sources to be searched
- Timeline for completion
- Format in which the final production will be issued (e.g. native, near-native, PDF, etc.).
- Cost sharing
- Claw back agreement
- ESI software to be utilized (e.g. document review software and protocols)
Proportionality is at the heart of these conversations (the factors themselves are spelled out in Rule 26(b)(1)). If there's a $100,000 dispute, making demands that will generate a $90,000 discovery bill probably won't be received well by the bench. The court weighs the following factors:
- Importance of the issues at stake
- Amount in controversy
- Parties' access to relevant information
- Parties' resources
- Importance of discovery in resolving the issues
- Cost/benefit analysis of the proposed discovery
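The arithmetic behind that $100,000 example is easy to run yourself. Here is a minimal Python sketch. Note that the rules set no numeric threshold, so the ratio is only one input into the cost/benefit factor, and the function name is my own.

```python
def discovery_cost_ratio(estimated_discovery_cost: float, amount_in_controversy: float) -> float:
    """Share of the amount in controversy that the proposed discovery would consume."""
    if amount_in_controversy <= 0:
        raise ValueError("amount in controversy must be positive")
    return estimated_discovery_cost / amount_in_controversy


# Figures from the example in the text.
ratio = discovery_cost_ratio(estimated_discovery_cost=90_000, amount_in_controversy=100_000)
summary = f"Discovery would consume {ratio:.0%} of the amount in controversy"
print(summary)
```

A ratio like this doesn't decide anything on its own, but it is a quick way to frame the conversation with opposing counsel before anyone reaches for a motion.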
Rule 34: Producing Documents and Electronically Stored Information
This rule attempts to drive parties toward specific requests and objections and to avoid petty gamesmanship involving motions using boilerplate language like "this is overly broad and unduly burdensome" or "this is vague and ambiguous."
Why is it overly broad? What about the request is unduly burdensome? Is that position justified by the proportionality factors counsel weighs under Rule 26?
If a request is vague or ambiguous, you don't file a motion; you pick up the phone and call opposing counsel.
This all drives back to Rule 1 of the FRCP, which is to "secure the just, speedy, and inexpensive determination of every action and proceeding."
Rule 37(e): Failure to Preserve Electronically Stored Information
If electronically stored information that should have been preserved in the anticipation or conduct of litigation is lost because a party failed to take reasonable steps to preserve it, and it cannot be restored or replaced through additional discovery, the court:
(1) upon finding prejudice to another party from loss of the information, may order measures no greater than necessary to cure the prejudice; or
(2) only upon finding that the party acted with the intent to deprive another party of the information’s use in the litigation may:
(A) presume that the lost information was unfavorable to the party;
(B) instruct the jury that it may or must presume the information was unfavorable to the party; or
(C) dismiss the action or enter a default judgment.
Mistakes are inevitable. eDiscovery is not about perfection; it is about defensibility.
Can you demonstrate that you took reasonable steps to govern data?
Did you implement a legal hold as soon as you anticipated - or were served with - litigation?
Were your Rule 34, 12, and 26 processes documented, and - where appropriate - agreed to by opposing counsel?
Bottom line? Be reasonable.
New Factors in Document Review
"Document review" is legacy language, a relic of a paper-based era when documents were physically marked or stacked into piles indicating their relevance or privilege. So how do you account for things like emojis, online collaboration tools, the IoT, and deepfakes during document review?
And in spite of incredible advances in review platforms to categorize emails and other written documents based on machine learning, how does that apply to image-based, audio-based, or network-based information?
Despite marketing hype, human review isn't disappearing anytime soon.
Emojis
Roughly 8.5 billion texts are sent per day in the United States alone, equating to an average of 32 texts per person per day.
Approximately 70% of Americans who text use emojis, and more than 700 million emojis are used every day in Facebook posts.
The discovery issue is that forensic tools don't always capture emojis, and different platforms (e.g. Apple, Android) can display the exact same emoji in different ways.
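One way to see why capture and display can go wrong: an emoji is stored as a Unicode code point, not as a picture, and each platform draws its own image for that code point. The minimal Python sketch below uses a made-up message, and the ASCII export step is a deliberately naive stand-in for a tool that doesn't handle Unicode, not a description of any specific forensic product.

```python
import unicodedata

# Hypothetical message ending in a single emoji (code point U+1F602).
message = "Deal is done \U0001F602"

emoji = message[-1]
code_point = f"U+{ord(emoji):04X}"
official_name = unicodedata.name(emoji)
utf8_bytes = emoji.encode("utf-8")

print(code_point, official_name)
print("Stored as bytes:", utf8_bytes)

# A naive export that keeps only ASCII silently drops the emoji,
# and with it the tone of the message.
naive_export = message.encode("ascii", errors="ignore").decode("ascii")
print("After ASCII-only export:", repr(naive_export))
```

The code point and name are identical everywhere; only the artwork differs from platform to platform. That is why recording the underlying code point alongside the rendered image can matter when the meaning of a message is in dispute.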
Eric Goldman is attempting to track every U.S. court opinion in Westlaw and Lexis that references emojis. While no major substantive rulings on emojis have been issued yet, it's likely only a matter of time. Emojis during discovery were a hot topic for CLEs during the summer of 2020.
Online Collaboration Tools
These web-based apps offer teams the ability to work remotely on the same platform by utilizing services such as instant messaging, to-do lists, file sharing, scheduling, etc. Examples include:
- Microsoft Teams
- Google Docs
Their ability to streamline workflows and improve productivity is legendary or debatable, depending on who you ask, but they do often lead to a dramatic reduction in email communication. And incidentally, people speak far more casually in these platforms than they do via email.
While they have certainly benefited teams and companies in a variety of ways, collaboration tools come with their own problems, some of which have legal consequences. For example:
- they can be misused to harass co-collaborators
- many contain both public and private chat channels, which can make identification and collection of relevant information difficult
- chat messages and other features are sometimes encrypted and can often be created or deleted in discreet ways
- not designed for corporate data management or compliance
Beyond that, these apps present discovery-specific challenges, including:
- What counts as a "document" for the purpose of discovery?
- Who should be listed as the custodian?
- How should legal holds be applied and successfully enforced?
- How can information exported from the application be reviewed and utilized?
The Internet of Things (IoT)
The Internet of Things is the concept of connecting any device with an On/Off switch to the Internet (and/or to each other). Examples include:
- Smart speakers (Amazon Echo, Google Home)
- Wearables (Fitbit, Apple Watch, Garmin)
- Cell phones
- Coffee makers
- Washing machines
While many consumers have fallen in love with the convenience, discovery counsel is aware of some challenges, including:
- How do you access and preserve data?
- Potential for falsifying or hacking the device
- Each IoT device may generate its own proprietary data format (leading to increased collections and processing costs)
- Difficulty exporting information from some IoT devices to be reviewed and analyzed
Deepfakes
Deepfakes use artificial intelligence to take an existing image, video, or audio file and replace its content with fake or misleading content that is difficult, if not impossible, to detect.
This area of technology is still incredibly new (and terrifying). While I don't know enough to write extensively on the topic, some of the chief concerns are how to detect deepfakes, and the age-old concern that the means to produce them outpace the means to detect them.
The negative societal impact of deepfakes far outstrips potential concerns about he said-she said in a courtroom; they have a true tendency to destabilize countries or regions. Stay tuned, and stay vigilant.
Foundational Case Law
Zubulake v. UBS Warburg (2004)
Court: U.S. District Court, Southern District of New York
Judge: Shira Scheindlin
Significance: this landmark opinion set the foundation for balancing tests on recovering and reviewing ESI from backup tapes. Her seven-factor cost-shifting test includes:
- Extent to which the request is specifically tailored to discover relevant information;
- Availability of such information from other sources;
- Total costs of production compared to the amount in controversy;
- Total costs of production, compared to the resources available to each party;
- Relative ability of each party to control costs and its incentive to do so;
- Importance of the issues at stake in the litigation; and
- Relative benefits to the parties of obtaining the information.
Victor Stanley v. Creative Pipe (2008)
Court: U.S. District Court, District of Maryland
Judge: Paul Grimm
Significance: Defendant deemed to waive privilege because they:
- Did not work out a privilege protocol with opposing counsel
- Did not prove its process for determining privilege was reasonable
- Declined to use a clawback agreement
Mancia v. Mayflower Textile Services (2008)
Court: U.S. District Court, District of Maryland
Judge: Paul Grimm
Significance: court wrote that eDiscovery costs were increasing exponentially because of opposing counsel's failure to meet and agree on processes and standards. The FRCP would later be amended to emphasize the importance of the meet and confer phase.
Pension Committee of the University of Montreal Pension Plan v. Banc of America Securities (2010)
Court: U.S. District Court, Southern District of New York
Judge: Shira Scheindlin
Significance: court ruled that counsel's failure to issue a defensible legal hold is gross negligence. Case resulted in monetary sanctions and adverse inference (that is, instructing the jury to assume lost evidence is bad for the party that failed to produce it). Judge Scheindlin noted that sanctions should be moderate and proportional to the case and offense.
Rimkus Consulting Group v. Cammarata (2010)
Court: U.S. District Court, Southern District of Texas
Judge: Lee Rosenthal
Significance: a split emerges among the federal bench on when sanctions are appropriate; in this case, the court rules bad faith must be present. More than a decade later, there is still no consistent standard among the federal courts as to when and how sanctions should be applied.
Da Silva Moore v. Publicis Groupe & MSL Group (2012)
Court: U.S. District Court, Southern District of New York
Judge: Andrew Peck
Significance: first ruling in which predictive coding is explicitly approved (and upheld upon appeal).
More Recent Case Law
eDiscovery case law continues to evolve, and there are a number of good sources you can follow to stay informed. Generally, topics include:
- Self-collection of eDiscovery (generally frowned upon)
- TAR (technology assisted review) and whether you can compel another party to use it (generally, no)
- Responsibilities of counsel (who continue to receive reprimands from courts for not adhering to the FRCP)
Here are some key terms you'll hear over and over. You can also download a copy here.
- Batching: dividing data sets into groups for processing or review
- Bates Number/Bates Stamp: a unique identifier for a specific document
- Boolean Search: search using operators such as AND/OR/NOT to connect multiple keywords or phrases within a single query
- Clawback Agreement: if you accidentally produce privileged documents, the opposing side has to return them and cannot use them against you or claim they aren't privileged anymore
- Coding: in layman's terms, this means using tags to categorize documents.
- Container file: single file (usually compressed) that contains other files or documents
- Cost Shifting: while the producing party typically bears the costs, in some instances courts can order the requesting party to cover some or all of the costs. This is often applied as a remedy for overly-broad (disproportional) requests
- Culling: reducing a large data set to remove junk. Parties should agree on culling criteria.
- Custodian: a person who has access to potentially relevant information
- Data Mapping: the process of identifying what data you have, where it is stored, and who has access to it. Helpful proactive measure for responding to discovery requests.
- Deduplication: removal of redundant copies of the same document, e.g. the same email collected from several custodians
- Document: this is legacy language from when discovery was paper-based; a document could be a Word doc, email, text thread, or any other type of ESI
- EDRM: Electronic Discovery Reference Model; a visual model of the steps taken in the eDiscovery process to produce a defensible product
- ESI: Electronically Stored Information; common examples include email, Slack, Zoom, Teams, Microsoft Office documents, text messages, etc.
- Family: a group of connected documents (e.g. an email and its attachment)
- File Slack/unallocated areas: leftover space on a drive where a file is stored (free space may indicate tampering, and files may or may not be recoverable)
- Forensics: the collection and analysis of digital evidence using methods that preserve its integrity
- Harvesting: another term for collecting data
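- Hash Value: a digital fingerprint calculated from a file's contents; identical files produce identical hash values, which makes them useful for deduplication and for showing a file has not been altered
- Identification: which custodians possess data, storage locations, etc.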
- Information Governance: proactive management of ESI to manage risk and cost
- Keyword Search: using important terms to find relevant documents
- Legal Hold/Litigation Hold: formal notice to not delete or destroy any information based on expected or actual legal proceedings
- Legacy Data: information stored in an old or obsolete manner (e.g. backup tapes)
- Load file: database file that allows processed data to be loaded into document review software
- Meet and Confer: Rule 26(f) of the FRCP states parties must agree on discovery protocol to mitigate subsequent discovery disputes relating to cost, scope, production format, etc.
- Metadata: data about data; e.g. author, created date, edited date in a Microsoft Excel document
- Native format: the file format in which something was originally created (e.g. .xls for an Excel doc, instead of a PDF image of the file)
- OCR: optical character recognition; converts images of text (e.g. scanned pages) into searchable text
- Processing: converting files into a format that permits easier review
- Production: delivery of documents and ESI to other parties in litigation or investigation
- Parent/Child: relationship between related documents, e.g. an email (parent) and an attachment (child)
- Predictive coding: document review software learns from human reviewer decisions and categorizes unreviewed documents accordingly. Can significantly reduce the duration and expense of a project; has been repeatedly upheld in case law
- Redaction: removal of sensitive information
- Review: examining documents to determine whether they are relevant or privileged
- Spoliation: destruction or alteration of ESI. Intentional spoliation can result in sanctions.
- Structured data: categorized information (e.g. database)
- TAR: technology assisted review; that is, software platforms that use machine learning to help cull down data sets by automating whether a document is responsive, non-responsive, or needs further review
- Threading: logical sequence of correspondence (e.g. narrative form of multiple email threads that removes duplicates)
- TIFF: image file (like JPEG, PNG, or GIF)
- Unitization: converting large files into individual documents (e.g. a 500-page PDF of single-page documents may be unitized into 500 individual documents for review)
- Unstructured data: information not stored in a database (e.g. PowerPoint decks, emails)
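To make one of these terms concrete, here is a minimal Python sketch of a Boolean search over a handful of made-up documents. It matches whole lowercase words only, and the helper names and document IDs are hypothetical; review platforms support much richer syntax than this.

```python
import re


def words(text: str) -> set[str]:
    """Lowercase the text and split it into a set of whole words."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def search(docs: dict[str, str], all_of=(), any_of=(), none_of=()) -> list[str]:
    """Return IDs of documents matching: all_of AND (any of any_of) NOT none_of."""
    hits = []
    for doc_id, text in docs.items():
        found = words(text)
        if not all(term in found for term in all_of):
            continue  # AND: every required term must appear
        if any_of and not any(term in found for term in any_of):
            continue  # OR: at least one alternative must appear
        if any(term in found for term in none_of):
            continue  # NOT: excluded terms must be absent
        hits.append(doc_id)
    return hits


# Hypothetical documents, labeled with made-up identifiers.
documents = {
    "DOC-0001": "Please shred the merger files before the audit",
    "DOC-0002": "Lunch on Friday? The merger announcement is next week",
    "DOC-0003": "Audit schedule attached",
}

# merger AND (shred OR delete) NOT lunch
hits = search(documents, all_of=["merger"], any_of=["shred", "delete"], none_of=["lunch"])
print(hits)
```

Tightening or loosening the operators changes how many documents land in front of reviewers, which is why search terms are a common topic at the meet and confer.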