A Trend in Legal Technology: Data-Enabled Case Evaluation

David Milstein, MD, Head of Legal Technology, Guggenheim Partners

What’s coming down the pike in terms of legal technology is case outcome forecasting. It is already being done, of course. First of all, plaintiff lawyers have always made a cost-benefit analysis before they take a case, wherein the probability of getting a desired judgement necessarily factors highly. Their business model depends on their ability to correctly predict outcomes.

Secondly, in big litigations, lawyers on both sides of the bar are running mock trials with focus groups to pick the optimum arguments to make. So, case prediction is already being done via two different methods: expert opinion, and crowd-sourcing. But it is possible to subject cases to a third method: statistical modeling via decision tree analysis. That’s the disruptive technology that’s really going to impact the legal field, and it is going to be huge.

Already, Supreme Court cases are being modeled in this way. At a high level, it works by classifying cases along various dimensions, or factors; the model is then trained on a large number of past cases, learning how outcomes correlate with specific factors. The best models currently get about 70 percent of the cases correct, compared to about 75 percent for the best human predictors.
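
To make the idea concrete, here is a minimal sketch of training a decision tree on classified cases. Everything in it is an assumption for illustration: the factor names (petitioner, circuit_split), the six sample cases, and the choice of Gini impurity as the splitting rule are mine, not a description of how any actual Supreme Court model works.

```python
from collections import Counter

# Hypothetical, hand-made sample: (factors, outcome) pairs. Not real court data.
CASES = [
    ({"petitioner": "government", "circuit_split": "yes"}, "reverse"),
    ({"petitioner": "government", "circuit_split": "no"}, "reverse"),
    ({"petitioner": "government", "circuit_split": "yes"}, "reverse"),
    ({"petitioner": "private", "circuit_split": "yes"}, "reverse"),
    ({"petitioner": "private", "circuit_split": "no"}, "affirm"),
    ({"petitioner": "private", "circuit_split": "no"}, "affirm"),
]

def gini(rows):
    """Gini impurity of the outcomes in rows (0.0 means all outcomes agree)."""
    counts = Counter(outcome for _, outcome in rows)
    return 1.0 - sum((n / len(rows)) ** 2 for n in counts.values())

def split_by(rows, factor):
    """Group rows by the value each case has for the given factor."""
    groups = {}
    for factors, outcome in rows:
        groups.setdefault(factors[factor], []).append((factors, outcome))
    return groups

def build_tree(rows, factors):
    """Recursively pick the factor whose split leaves the purest groups."""
    majority = Counter(outcome for _, outcome in rows).most_common(1)[0][0]
    if not factors or len({outcome for _, outcome in rows}) == 1:
        return majority  # leaf: predict the most common outcome seen here

    def impurity_after(factor):
        groups = split_by(rows, factor).values()
        return sum(len(g) / len(rows) * gini(g) for g in groups)

    best = min(factors, key=impurity_after)
    rest = [f for f in factors if f != best]
    branches = {value: build_tree(group, rest)
                for value, group in split_by(rows, best).items()}
    return {"factor": best, "branches": branches, "default": majority}

def predict(tree, case):
    """Walk the tree; fall back to a node's majority outcome on unseen values."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(case.get(tree["factor"]), tree["default"])
    return tree

def accuracy(tree, rows):
    """Fraction of rows whose outcome the tree predicts correctly."""
    return sum(predict(tree, f) == o for f, o in rows) / len(rows)

tree = build_tree(CASES, ["petitioner", "circuit_split"])
print(predict(tree, {"petitioner": "private", "circuit_split": "no"}))
print(accuracy(tree, CASES))
```

Note that the accuracy printed here is measured on the same cases the tree was trained on, so it says nothing about forecasting. A figure like the 70 percent above only means something when it is measured on cases the model has never seen.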

The power of data is scale. Once you have a classification system, the marginal cost of passing an additional case through an algorithm is effectively zero. So, in the near future, people will be looking for ways to run these models on current or prospective cases.

The problem is the effort of classification; of getting the data. You don’t want to have to classify things yourself–that’s a lot of work! What you want in the field of big data is to leverage structured information that was already being created anyway. That’s the reason for the existence of the Enterprise Data Warehouse. Enterprises chuck all the data they have in a big pile, and hopefully let data scientists connect the dots into a pretty picture. So, where to get this data about cases?

Well, the best source of data on federal cases is the Administrative Office of the US Courts. Federal case information is (except in the rare instances where it is sealed) a matter of public record, and thus available for download. However, current policy is to charge a ‘per page’ fee for downloaded material, a legacy of the paper-era policy of providing Xerox copies at cost at the courthouse. This fee has long frustrated social scientists, and in fact there have been a number of increasingly successful attempts to get around the policy and aggregate lots of data on federal cases.

In bankruptcy cases, we are almost completely there: enormous numbers of factors are already ascribed to each case, neatly bundled as one big data set. The forms for filing a petition for bankruptcy have been fully translated to a machine-readable format, an XML schema. There is no need for the documents themselves at this point; the PDFs of the forms are just a jacket that the data can put on, for visual display to users like court clerks, lawyers, and case trustees. Entities exist who will consolidate this data, and leverage it to great predictive effect, when they can get hold of it.
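
As a rough illustration of what "the data without the jacket" means, the sketch below turns a petition-like XML fragment into a flat dictionary of factors using Python's standard library. The element names (petition, chapter, debtorType, and so on) and the figures are invented for this example; they are not taken from the actual court schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment. Tag names and values are made up for illustration
# and do not reproduce the real bankruptcy filing schema.
SAMPLE_PETITION = """
<petition>
  <chapter>7</chapter>
  <debtorType>individual</debtorType>
  <totalAssets>15000.00</totalAssets>
  <totalLiabilities>82000.00</totalLiabilities>
</petition>
"""

def petition_to_factors(xml_text):
    """Flatten the top-level elements of a petition into a {tag: text} dict."""
    root = ET.fromstring(xml_text.strip())
    return {child.tag: (child.text or "").strip() for child in root}

factors = petition_to_factors(SAMPLE_PETITION)
print(factors["chapter"], factors["debtorType"])

# Numeric fields arrive as text and need converting before any modeling.
debt_ratio = float(factors["totalLiabilities"]) / float(factors["totalAssets"])
print(debt_ratio > 1.0)
```

Once every filing can be reduced to a row like this, no human has to classify anything; the classification was done by whoever filled in the form.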

As a side note, bankruptcy is very mechanistic. The bankruptcy process is essentially just following an algorithm as to how to divvy up remaining assets according to circumstance, as laid out in the Bankruptcy Code. The data already exists, as noted above. And the annotated bankruptcy code exists, a halfway step that interprets the legal code in natural language (English). The next step would be to translate that algorithm into computer code. It’s all a bunch of conditional logic statements. Automation of the entire process is possible, including the negotiation between creditors and the disposition of the case.
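
To show what "a bunch of conditional logic statements" might look like, here is a deliberately oversimplified sketch of a payout waterfall: classes of creditors are paid in order, and a class that cannot be paid in full shares what is left pro rata. The class names, creditors, and amounts are hypothetical, and the real Bankruptcy Code has many more rules (collateral, exemptions, and so on) than this toy captures.

```python
def distribute(assets, classes):
    """Pay creditor classes in order; share pro rata when funds run short.

    classes is an ordered list of (class_name, {creditor: claim_amount}).
    Returns {creditor: payout}. A simplified illustration, not actual law.
    """
    payouts = {}
    remaining = float(assets)
    for _class_name, claims in classes:
        total = sum(claims.values())
        if total == 0:
            continue  # nothing owed in this class
        if remaining >= total:
            ratio = 1.0                # class is paid in full
        else:
            ratio = remaining / total  # class splits whatever is left
        for creditor, amount in claims.items():
            payouts[creditor] = round(amount * ratio, 2)
        remaining -= total * ratio
    return payouts

# Hypothetical estate: 100,000 in assets against 160,000 in claims.
CLASSES = [
    ("administrative", {"trustee": 10_000}),
    ("priority", {"tax_authority": 30_000}),
    ("general_unsecured", {"supplier_a": 80_000, "supplier_b": 40_000}),
]

payouts = distribute(100_000, CLASSES)
for creditor, amount in payouts.items():
    print(creditor, amount)
```

In this example the first two classes are paid in full, which leaves 60,000 for 120,000 of general unsecured claims, so each supplier receives half of what it is owed.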

But I digress. More generally, in all civil cases, the cat will soon be completely out of the bag as well. Within a decade there will be more than one service provider offering clients strategic advice, such as how cases are likely to turn out and what their chances on appeal are, and perhaps even tactical advice, such as which motions should be filed when and which arguments (case citations) would best be deployed for a given purpose.

And do not suppose that the only customers of such services would be lawyers. Already, third parties are buying cases from law firms, exchanging a discounted expected value in cash for the law firm’s rights to the eventual payout if any.
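
The arithmetic behind such a purchase can be sketched in a few lines. This is only an assumed pricing rule for illustration: the funder takes a modeled probability of winning, multiplies it by the expected payout, and then applies its own discount. The function name, the inputs, and the numbers are all hypothetical.

```python
def funding_offer(win_probability, expected_payout, discount):
    """Cash a funder might offer for the rights to a case's eventual payout.

    win_probability: modeled chance of a favorable judgement (0 to 1)
    expected_payout: payout if the case is won
    discount: fraction the funder keeps for risk and delay (0 to 1)
    """
    if not 0.0 <= win_probability <= 1.0:
        raise ValueError("win_probability must be between 0 and 1")
    if not 0.0 <= discount <= 1.0:
        raise ValueError("discount must be between 0 and 1")
    expected_value = win_probability * expected_payout
    return round(expected_value * (1.0 - discount), 2)

# Hypothetical case: 60% chance of a 1,000,000 payout, 25% funder discount.
offer = funding_offer(0.6, 1_000_000, 0.25)
print(offer)
```

The better the model behind win_probability, the smaller the discount a funder needs to charge to stay safe, which is why better case data matters so much to this business.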

As an outgrowth of the increasing availability of case data, and of statistical models trained on that data, third parties are going to be using increasingly sophisticated analytics to fund litigations they see as having good potential for a large judgement. I would expect this industry to grow very significantly in the short term.

However, over time, one could even envision potential litigants themselves availing themselves of such a service, especially if both litigants were corporate entities with significant resources. Once the discovery phase has taken place, and large volumes of information about the dispute are available, it would seem eminently logical that both sides would look to a case predictor service and see what their chances were. Arguably, such services would position themselves as pre-arbiters, suggesting settlements based on probabilistic outcomes.
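
A simple way to picture a settlement suggested from probabilistic outcomes is the textbook bargaining-range calculation sketched below. It assumes both sides accept the same win probability from the predictor service and that each side would bear its own remaining litigation costs; the function name and all figures are hypothetical.

```python
def settlement_range(win_probability, judgement, plaintiff_costs, defendant_costs):
    """Return (low, high, midpoint) of settlements both sides should prefer to trial.

    low:  the least the plaintiff should accept (expected award minus its costs)
    high: the most the defendant should pay (expected award plus its costs)
    """
    expected_award = win_probability * judgement
    low = max(expected_award - plaintiff_costs, 0.0)
    high = expected_award + defendant_costs
    return low, high, (low + high) / 2

# Hypothetical dispute: 50% chance of a 2,000,000 judgement, with remaining
# litigation costs of 200,000 for the plaintiff and 300,000 for the defendant.
low, high, suggested = settlement_range(0.5, 2_000_000, 200_000, 300_000)
print(low, high, suggested)
```

When both sides trust the same probability, the range is never empty, because going to trial costs both of them money. Disagreement about the odds is what closes the gap, and that is exactly what a shared predictor would remove.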

In the farther future, I think the question will in fact become: ‘Why not let an algorithm decide the case itself, then?’ Once you can train the machine to arrive at the correct decision (defining ‘correct’ as predictive, meaning the model can predict what would have happened), why wait months (or years) to get in front of a judge? In fact there’s a social justice argument to be made: long wait times and high cost have meant the virtual disappearance of ‘ordinary people’ torts from federal court, and have necessitated class action suits, which are in some ways sub-optimal. Of course, at this point we are talking about a really significant technological disruption of the legal process itself, and policy considerations are going to be brought to bear. Certainly the judges are not about to let themselves be replaced by robots! Not if they can help it. But it will be very interesting to see how it all plays out.