GUEST POST: eDiscovery software and eDiscovery skills – wolves in sheep’s clothing



1 November 2017 – Today’s column is from our intrepid pundit and friend Jonathan Maas, head of the eponymous Maas Consulting Group. As veteran readers know, among his many activities in the world of managing electronic and hard copy data is his daily collection of articles, posts and sources of information on Twitter in the areas of forensic technology and discovery which he shares via email blasts entitled “BONG!” which we have distributed weekly to our subscriber base for years.
*  *  *  *  *  *  *  *  *  *  *  *
eDiscovery software and eDiscovery skills: 
wolves in sheep’s clothing
About a year ago I started drafting an article exploring how expertise in eDiscovery in civil litigation was a transferrable skill (working title: Find, Filter and Free the Facts).  I did not have the opportunity to finish it as I got caught up in other things but I was looking at how every area of life that involves data (which is, well, every area of life) benefits from things that can identify, isolate, preserve, collect, collate, analyse and defensibly conclude or prove facts from that data.  EDRM anyone?

eDiscovery?  What’s that?

These “things” have been more prevalent and honed in the niche but very successful field of eDiscovery/eDisclosure.  Briefly, the process of discovery (as it’s known in the US) or disclosure (in the UK) in common law jurisdictions is a fundamental part of civil litigation procedure.  This procedure is tightly governed by the Federal Rules of Civil Procedure in the US and by the Civil Procedure Rules in the UK.  Regardless of jurisdiction, the outcome is broadly the same: they require the trial to be conducted on a level playing field with each party having had complete prior access to all the available evidence and no planned last minute surprises.

To achieve this level playing field each party needs to find (“discover”) and sift through all hard copy and electronic documents in their “power, possession or control” and produce (“disclose”) the potentially relevant ones to each other before trial.   A party’s documents are relevant to the matters in dispute regardless of whether they are harmful to or supportive of its own position in that dispute.  Failure to find and produce copies of known classes of documents or the production of tampered documents can lead the judge to infer a strong likelihood of foul play by the failing party.  Parties also have a duty not to produce reams of clearly irrelevant evidence to each other.  The trial is conducted using shared bundles of the evidence on which the parties will be relying.

The “EDRM” to which I referred above is the Electronic Discovery Reference Model, which has become the industry’s standard approach to electronic discovery.  It has evolved, and will continue to evolve, over time but will still essentially look like the diagram below.  With little variation it can equally apply to the process of discovering hard copy documents.

Discovery is but a small part of the entire litigation process.  Despite that, it is likely to be the single largest expense a litigating party will incur on its path to justice.  Completing discovery indicates very clearly that you are serious in the pursuit or defence of your case.

As indicated by the above Model (and click here for a more comprehensive description of the various stages), the process serves a number of purposes but, essentially,

  • it allows you to be clear about your own position based on the evidence you have collected;
  • it allows you to see how your opponent’s evidence supports or counters your position;
  • it allows you to obtain witness testimony through sworn depositions or statements on targeted facets of your case supported by contemporaneous evidence; and
  • it allows the court to do away with areas of agreement and focus on those areas most in need of judicial intervention.
For more detail on the what and why of lawyers’ use of eDiscovery software please see my article Legal AI vs eDiscovery.

Of relevance here is that in discovery the cost increases as you move to the right of the Model because, at that point, humans really need to read things, and humans with legal training are not traditionally cheap workers.

The pain point has always been volume: the more there is the more it costs, the longer it takes and the more obscure the facts.  I have always said that technology is the solution to the problem caused by technology.  To reduce the sheer volume of things entering the process you first need to use technology to winnow out the rubbish.  To use your expensive assets wisely you then need to make sure they are focused on the most important, controversial things that humans still need to read.  Being human, it also makes sense if you can feed your expensive assets with things that share themes (for instance, everything that references a particular product or project) so they develop expertise in that area to speed up their comprehension and therefore improve their analysis of and deduction from the facts.
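To make the winnowing idea concrete, here is a minimal sketch in Python.  It is purely illustrative and does not describe how any particular eDiscovery product works: the function name cull, the junk_terms list and the themes mapping are all hypothetical, and real tools use far more sophisticated de-duplication and filtering.  It shows the three steps described above: drop exact duplicates, discard obvious rubbish, and batch what remains by theme so that a reviewer can build expertise in one area.

```python
import hashlib
from collections import defaultdict


def cull(documents, junk_terms, themes):
    """Deduplicate, drop obvious junk, and batch the remainder by theme.

    documents:  iterable of (doc_id, text) pairs
    junk_terms: lowercase substrings that mark a document as rubbish (assumed list)
    themes:     mapping of theme name -> lowercase keywords (assumed mapping)
    """
    seen = set()
    batches = defaultdict(list)
    for doc_id, text in documents:
        # Normalise case and whitespace so trivially different copies hash alike.
        normalised = " ".join(text.lower().split())
        digest = hashlib.sha256(normalised.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # exact duplicate of something already kept or discarded
        seen.add(digest)
        if any(term in normalised for term in junk_terms):
            continue  # obvious rubbish never reaches a human reviewer
        matched = [
            name
            for name, keywords in themes.items()
            if any(keyword in normalised for keyword in keywords)
        ]
        # A document can belong to several themes; anything else is "unthemed".
        for name in matched or ["unthemed"]:
            batches[name].append(doc_id)
    return dict(batches)


documents = [
    ("a", "Project Falcon budget overrun"),
    ("b", "project falcon   budget overrun"),
    ("c", "Unsubscribe from our newsletter"),
    ("d", "Lunch on Friday?"),
]
batches = cull(documents, ["unsubscribe"], {"falcon": ["project falcon"]})
```

In this toy run, document "b" is treated as a duplicate of "a", "c" is discarded as junk, and only "a" and "d" are left for human eyes, each in its own batch.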

My point is that whilst eDiscovery software started as more “back office” (processing, hosting and presenting data for human review), it has developed over the past few years to include a wide offering of highly advanced analytics.  The sheep have grown fangs.

eDiscovery Analytics

Today’s eDiscovery software is about picking data apart: it allows an incredible amount of control over a myriad of raw data types, regardless of volume.  It herds cats.  The software has been designed to separate the relevant from the irrelevant, and the privileged from the relevant, so that a party can discharge its legal discovery obligations to the satisfaction of the court … and help win its case.  It includes the ability to defend any part of the process through functionality like audit trails and security permissions.

eDiscovery software can allow obscure facts to be laid bare, a story to be uncovered where there appeared to be no story.  It can take words, phrases, concepts and emotions and make connections between them to a level we have never seen before.  And it does this with data forensically collected from a plethora of devices: computers, tablets, mobile phones, PDAs, cameras, SatNavs and the like.

eDiscovery software does not discriminate.  If it can turn data into searchable text, it can analyse it.  Telephone recordings, video files, Bloomberg chat, emails, spreadsheets, audio files, photographic EXIF data, GPS co-ordinates, Facebook pages, Twitter feeds are all as one to the software.  Want to cut to a specific point in a trial transcript to watch the defendant’s mannerisms as a difficult question is fielded?  If it has been recorded, you can do it.

Even non-searchable images can be searched for specific skin tones and the like.  The tools developed for converting images of text into searchable text are also improving over the years (I know of software that claims to be able to convert old typewritten documents and faxes into searchable text with incredible accuracy).  I have watched as various media records of the same sequence of events have been analysed to provide impressions of those events from different points of view (for instance, to unravel a car accident).

eDiscovery software never sleeps.  It relentlessly chugs away in the background, grinding through terabytes of data to deliver up answers to questions you never thought of asking, offering up connections you never expected to see.

Artificial Intelligence

What’s so cool is that eDiscovery software can now be taught what to look for, how to do things, and go on to teach itself from that initial tutoring.  There is a massive upsurge in artificial intelligence, or machine learning, where the software is increasingly being used to the right of the above Model to find the meaty stuff you need.  Taught what’s interesting and what’s not for each particular use-case it can bubble to the surface data of likely interest without any further human intervention.
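The idea of "teaching" the software can be sketched with a toy example in Python.  This is a deliberately simple illustration under stated assumptions, not any vendor's method: the functions tokenize, train and rank are hypothetical, the scoring is a basic smoothed log-odds weight per word, and the sample documents are invented.  A reviewer tags a handful of documents as relevant or not, the code learns which words lean which way, and unreviewed documents are then ordered so the likeliest candidates bubble to the top.

```python
import math
from collections import Counter


def tokenize(text):
    """Split text into lowercase words with surrounding punctuation removed."""
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return [w for w in words if w]


def train(labelled):
    """Learn a weight per word from (text, is_relevant) reviewer decisions.

    Positive weights lean relevant, negative weights lean irrelevant.
    Add-one smoothing keeps unseen combinations from producing log(0).
    """
    relevant, irrelevant = Counter(), Counter()
    for text, is_relevant in labelled:
        (relevant if is_relevant else irrelevant).update(set(tokenize(text)))
    n_relevant = sum(1 for _, is_relevant in labelled if is_relevant)
    n_irrelevant = len(labelled) - n_relevant
    return {
        term: math.log((relevant[term] + 1) / (n_relevant + 2))
        - math.log((irrelevant[term] + 1) / (n_irrelevant + 2))
        for term in set(relevant) | set(irrelevant)
    }


def rank(weights, unreviewed):
    """Return doc ids ordered from most to least likely relevant."""
    scored = [
        (sum(weights.get(term, 0.0) for term in set(tokenize(text))), doc_id)
        for doc_id, text in unreviewed
    ]
    return [doc_id for _, doc_id in sorted(scored, key=lambda p: (-p[0], p[1]))]


labelled = [
    ("Falcon pricing was agreed off the record", True),
    ("Delete the Falcon pricing emails", True),
    ("Canteen menu for Friday", False),
    ("Friday parking arrangements", False),
]
weights = train(labelled)
order = rank(weights, [("x", "Friday canteen closed"), ("y", "Falcon pricing call tomorrow")])
```

In practice the loop would repeat: reviewers tag the top-ranked documents, the weights are retrained on the larger tagged set, and the ranking improves with each pass.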

As an aside, it is interesting to note that the huge Enron Corpus dataset and the Truth Tobacco Industry Documents archive have long been used by eDiscovery vendors to develop and test their advanced analytics capabilities and to demonstrate the same to the global legal community.

Time to Howl

It doesn’t surprise me that the ability of eDiscovery software to plough through massive amounts of incredibly mixed data has come to the attention of other fields, like journalism and cyber security.  In investigative reporting, from whistleblower dumps (such as the Panama Papers or Wikileaks) to FOIA requests, journalists and investigators need to analyse large data sets quickly and accurately.  They will make full use of the true power of eDiscovery analytics in ways that those involved in civil litigation never will, for the reasons I give below.  Most recently, Logikcull and iCONECT have independently loaded the JFK Papers into their eDiscovery tools to let people explore how software such as theirs can blow open such a varied data set.

My colleague Gregory Bufithis reminded me of this in his deliberately provocative article of 18 October 2017, written immediately following the heinous execution of his close friend and investigative journalist Daphne Caruana Galizia on Malta: The Panama Papers, eDiscovery … and a Murder in the afternoon sun.  In that article he describes the use of eDiscovery software called Nuix that enabled the International Consortium of Investigative Journalists (“ICIJ”) to piece together the information contained in the 11.5 million files (2.6 Terabytes) of the Panama Papers.

Ms Caruana Galizia led the ICIJ’s investigation into corruption in Malta.  In that first article Gregory introduced the eDiscovery community to this use of “their” technology and invited them to visit the International Journalism Festival in Perugia, Italy, in April 2018.  I certainly hope to be there.  He tells me there has been serious interest from eDiscovery vendors wanting to explore alternative markets for their wares (not surprising, really!).  The wool is turning into fur.

In the area of cyber security I have learned that companies like Decipher Forensics have teamed up with cyber security experts like FireEye.  Decipher Forensics is well known in the field of digital forensics (mobile phone forensics as well as complex data recovery) and that expertise led them to apply the same eDiscovery/forensics techniques to incident response and incident response preparedness and planning.

Gregory followed up his initial article on 23 October here.  I say I am not surprised that eDiscovery vendors are, like heliotropic flowers, turning to look at the sun: an unexpected and welcome new use of their tools.  As I say, I believe lawyers have limited use for these advanced analytics in civil litigation so it is only sensible for eDiscovery vendors to explore new markets for their sophisticated software.

Not for Civil Litigators
I have spent my entire professional life in this field and I believe these advanced analytics are not for civil litigators.  Firstly, in my experience civil matters rarely depend on a single “whodunit” moment, the “smoking gun”.  They tend to be about the relentless passage of events or non-events over time that left a frustrated organisation with no choice but to seek succour through the courts.  In document-intensive matters legal teams just grind through the documents day after day.  eDiscovery software simply (but vitally) helps to reduce that pain, shorten the time required to complete the task and reduce the cost.



Secondly, I believe analytics will have a limited, but nonetheless extremely valuable, application in civil litigation due to the proportionality test.

Finally, I think these advanced analytics are not for civil litigators because cases are conducted to a pretty aggressive timetable imposed, and agreed to, by people who tend to have little notion of these things.  I know that investigative journalism and cyber security are also extremely time-critical but the goal there is to get to the incontrovertible truth despite the cost, not to arrive at the most persuasive argument within a fixed budget.

In consideration here are:
  • the small likelihood of finding, or even needing to find, possible hidden evidence of wrong-doing when there’s usually so much visible evidence already in play;
  • the requirement to be proportionate; and
  • the “normal” eye-watering cost of Big Ticket civil litigation wrapped up in a fixed budget.


For these reasons advanced analytical tools of this nature will not have a greater place in the conduct of civil litigation.  They will obviously remain, in one guise or another, extremely valuable tools to help control cost and focus attention when faced with the ever-growing problem of data volume and its concomitant opacity.  But I see them, in this context, more as administrative tools delivering much-needed efficiencies.  Which is a disappointment, but civil litigation is more about raking over existing facts and advancing persuasive arguments based on those facts, or lack thereof.  It is less about raking over the coals to find out if any unknown but possible facts ever existed.

Wolf in the Belly

In criminal and state investigations where the investigators must necessarily look under every rock to ensure justice is done, and where investigators’ pockets must necessarily be deep enough to achieve that, eDiscovery software that hunts through Big Data to find unexpected connections and suggest common themes will thrive.  Such tools will be developed to be better and quicker at what they do, and cheaper to use.  They will probably even, like in the Tom Cruise film Minority Report, be used to prevent the commission of crime (a concept I know the Internal Affairs functions of numerous police departments are keen to explore).



Gregory opens up fantastic opportunities to introduce eDiscovery software to investigative journalists and cyber security specialists where, as proven by the ICIJ’s work with Nuix on the Panama Papers, there are enormous opportunities for facts to be found, filtered and freed.  I look forward to this future.

 *  *  *  *

This article was written by Jonathan Maas of The Maas Consulting Group.  Jonathan is an internationally renowned 35-year veteran speaker, writer and practitioner in the field of discovery and evidence management.

