See No Evil, Hear No Evil, Speak No Evil: Managing Audio in E-Discovery
By Jim Vint, MD, Legal Technology Solutions Practice and Leader of the eDiscovery Practice, Navigant
Document requests involving audio recordings are becoming more prevalent, especially when the requesting party is the CFTC or the SEC. Regulations under Dodd-Frank require corporations to retain potentially relevant oral communications such as telephone conversations, recorded meeting minutes, squawk box conversations, and both business and personal voicemail recordings. Previously, simply listening to these recordings to verify their content was sufficient to constitute a comprehensive and defensible review strategy; however, expansive data retention policies, reinforced by regulation, are making that strategy less practical in today’s legal environment. In addition, identifying the speakers in each audio recording presents another challenge: Who was speaking and giving instructions? Did the other speaker confirm them, and who was that confirming speaker? Was one of the key business managers under investigation on that call?
Imagine you have preserved 50 terabytes of audio that now requires review. Attempting to listen to each recording would be cost prohibitive. The financial services industry is a good example. Because squawk boxes can record 24 hours a day, 7 days a week, a corporation with 50 terabytes of audio could be facing roughly 920,000 hours of review time (assuming 18 audio hours per GB). This would be the equivalent of listening to the War and Peace audiobook 13,000 times. In response, legal teams are looking to leverage technology to identify relevant information within audio recordings more effectively.
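The arithmetic behind that estimate can be checked with a short sketch. The 18 hours per GB figure comes from the text above; the conversion of 1 TB to 1,024 GB is an assumption made here because it reproduces the rounded 920,000-hour figure, and the function name is purely illustrative.

```python
# Back-of-the-envelope estimate of audio review time.
# From the article: roughly 18 hours of audio per gigabyte.
# Assumption (not stated in the article): 1 TB = 1,024 GB.
HOURS_PER_GB = 18
GB_PER_TB = 1024


def review_hours(terabytes):
    """Hours of audio contained in the given number of terabytes."""
    return terabytes * GB_PER_TB * HOURS_PER_GB


total_hours = review_hours(50)
print(f"{total_hours:,.0f} hours")  # 921,600 hours

# Length of a single listen implied by the "13,000 times" comparison.
hours_per_listen = total_hours / 13_000
print(f"about {hours_per_listen:.0f} hours per listen")  # about 71 hours per listen

# The same volume expressed as years of round-the-clock listening.
years_nonstop = total_hours / 24 / 365
print(f"about {years_nonstop:.0f} years of continuous listening")  # about 105 years
```

Under a decimal conversion (1 TB = 1,000 GB) the total would be 900,000 hours, so the order of magnitude does not depend on the assumption.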
Automatic speech recognition (“ASR”), the translation of spoken words into text, is the technology that legal teams are turning to because it enables keyword searching and other text analytics to target potentially relevant content within audio recordings. With ASR, large teams of reviewers no longer have to sit and listen to each recording to identify relevant content. ASR is relatively new to the legal space, and while its results are not perfect, it can be used defensibly to implement keyword search when combined with a rigorous quality control sampling plan.
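One common shape for such a sampling plan is to draw a random sample of the recordings that did not hit on any keyword, have reviewers listen to them, and measure how often relevant content slipped through. The sketch below illustrates that idea only; the function names, the sample size of 50, and the reviewer results are all hypothetical and not taken from any particular tool or protocol.

```python
import random


def draw_qc_sample(non_hit_ids, sample_size, seed=0):
    """Randomly pick non-hit recordings for reviewers to listen to.

    A fixed seed keeps the draw reproducible so it can be documented.
    """
    rng = random.Random(seed)
    return rng.sample(non_hit_ids, min(sample_size, len(non_hit_ids)))


def elusion_rate(reviewer_calls):
    """Fraction of sampled non-hit recordings that reviewers found relevant."""
    if not reviewer_calls:
        raise ValueError("no sampled recordings were reviewed")
    return sum(reviewer_calls.values()) / len(reviewer_calls)


# Hypothetical population: 1,000 recordings that no keyword matched.
non_hits = [f"rec_{i:04d}" for i in range(1000)]
sample = draw_qc_sample(non_hits, 50)

# Hypothetical reviewer results: 2 of the 50 sampled recordings were relevant.
calls = {rec_id: False for rec_id in sample}
for rec_id in sample[:2]:
    calls[rec_id] = True

rate = elusion_rate(calls)
print(f"{rate:.1%} of sampled non-hits were relevant")  # 4.0% of sampled non-hits were relevant
```

A high rate would suggest revisiting the keyword list or the ASR settings before relying on the search results; what counts as acceptable is a judgment for the legal team.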
Legal technology applications use two main methods of ASR: phonetic searching and automatic transcription. Both can be beneficial, but it is important to consider their respective strengths and weaknesses.
The phonetic searching method converts an audio recording into a phonetic representation of its words. The representation is composed of a series of phonemes, the basic units of a language’s phonology. A search term is likewise translated into a string of phonemes, and when a recording contains the same sequence of sounds as that phoneme string, the recording is identified as a hit. Phonetic searching generally returns results quickly and does not require a comprehensive dictionary to define the keywords or industry-specific language. It does, however, return false positives when the sound of the keyword is similar to other words in the recording. Take, for example, the keyword “Maine”: recordings containing “mainstream”, “main”, or “mane” may be returned as hits because they contain the same sound as “Maine”.
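The “Maine” example can be illustrated with a toy sketch. Real phonetic engines derive phonemes from the audio itself; here, as a simplifying assumption, each recording is represented by a list of words that are looked up in a small, hypothetical pronunciation table using ARPAbet-style symbols. The recording names and functions are invented for illustration.

```python
# Toy pronunciation table (hypothetical entries, ARPAbet-style symbols).
PRONUNCIATIONS = {
    "maine": ["M", "EY", "N"],
    "main": ["M", "EY", "N"],
    "mane": ["M", "EY", "N"],
    "mainstream": ["M", "EY", "N", "S", "T", "R", "IY", "M"],
    "money": ["M", "AH", "N", "IY"],
    "office": ["AO", "F", "AH", "S"],
    "the": ["DH", "AH"],
}

# Stand-ins for audio: the words spoken in each (hypothetical) recording.
RECORDINGS = {
    "call_01": ["the", "maine", "office"],
    "call_02": ["the", "mainstream", "money"],
    "call_03": ["the", "mane"],
    "call_04": ["the", "money"],
}


def to_phonemes(words):
    """Flatten a sequence of words into one phoneme sequence."""
    phonemes = []
    for word in words:
        phonemes.extend(PRONUNCIATIONS[word.lower()])
    return phonemes


def contains(haystack, needle):
    """True if needle occurs as a contiguous run inside haystack."""
    n = len(needle)
    return any(haystack[i:i + n] == needle for i in range(len(haystack) - n + 1))


def phonetic_hits(term, recordings):
    """Names of recordings whose phonemes contain the term's phonemes."""
    needle = to_phonemes(term.split())
    return [name for name, words in recordings.items()
            if contains(to_phonemes(words), needle)]


print(phonetic_hits("Maine", RECORDINGS))  # ['call_01', 'call_02', 'call_03']
```

Only call_01 actually mentions Maine. The hits on “mainstream” and “mane” are the kind of false positives described above, because the search matches sounds rather than spellings or meanings.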
The automatic transcription method uses a speech recognition engine composed of a language model, an acoustic model, and a dictionary. The models and dictionary work together to evaluate the recording’s sounds and create a text transcript. A major strength of automatic transcription is that the transcript can provide benefits beyond keyword search. With a transcript, reviewers can read the contents of a recording instead of having to listen to it, which speeds up the review. This is also helpful if the legal team is still developing its keyword list and is not certain what content to search for. However, the models and dictionary may need to be manually adjusted to achieve acceptable results, making this method more complicated to implement than phonetic searching.
Legal teams today use both phonetic searching and automatic transcription; however, the automatic transcription method is better poised for future innovation. Advanced text analytics techniques such as text categorization (predictive coding), clustering, and conceptual search have been helping to increase review efficiency and reduce legal spend for many years now, and these same techniques can be applied to audio recordings, provided that the reviewing legal team has a text transcript. Text categorization could help answer questions like:
• Which recordings should be reviewed first?
• Can recordings that are likely irrelevant be outsourced to lower cost review teams or excluded from review altogether?
• How much is review going to cost?
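As a rough illustration of how categorization output could feed those three questions, the sketch below ranks recordings by a relevance score, splits them at a cutoff, and estimates cost from audio hours. Everything in it is hypothetical: the scores stand in for the output of a text categorization model run on transcripts, and the cutoff, durations, and hourly rates are invented figures, not benchmarks.

```python
from dataclasses import dataclass


@dataclass
class Recording:
    rec_id: str
    relevance: float  # hypothetical categorization score between 0 and 1
    hours: float      # audio duration in hours


def prioritize(recordings, cutoff):
    """Rank by score (highest first) and split at the cutoff."""
    ranked = sorted(recordings, key=lambda r: r.relevance, reverse=True)
    review_first = [r for r in ranked if r.relevance >= cutoff]
    low_priority = [r for r in ranked if r.relevance < cutoff]
    return review_first, low_priority


def estimated_cost(recordings, rate_per_hour):
    """Cost of reviewing the recordings at a flat rate per audio hour."""
    return sum(r.hours for r in recordings) * rate_per_hour


recordings = [
    Recording("a", 0.92, 0.5),
    Recording("b", 0.10, 2.0),
    Recording("c", 0.67, 1.0),
]

review_first, low_priority = prioritize(recordings, cutoff=0.5)
print([r.rec_id for r in review_first])  # ['a', 'c']
print([r.rec_id for r in low_priority])  # ['b']

# Hypothetical rates: 60 per hour for the core team, 25 for a lower-cost team.
print(estimated_cost(review_first, rate_per_hour=60))  # 90.0
print(estimated_cost(low_priority, rate_per_hour=25))  # 50.0
```

In practice the cost estimate would also need to account for review speed, since reading a transcript is generally faster than listening to the audio.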
Compliance teams could use text categorization in conjunction with clustering, conceptual search, and sentiment analysis to flag and review communications that trigger risk areas or portend litigious activities, particularly for consumer action or product liability.
ASR technology is helping legal teams implement more effective audio review strategies to comply with requests from regulators and other parties. A strategy that uses keyword search, enabled by ASR, can help target the right recordings to review and reduce the cost of reviewing them. Keyword search is a step in the right direction, but it is just the first step. As with email, additional analytics will be required to address growing data volumes. Soon there will be even more effective ways to target the most critical audio recordings for review, so that legal teams can prioritize their understanding of the merits of the case, tied to key custodians and communications, and spend less time sifting through unorganized volumes of content that have less relevance to the case. Starting to sound familiar?
A few years ago, a similar concern centered on structured data. How do you identify what you need, extract it without corruption or loss of integrity, review it, and produce it? What are the pitfalls, and where is the solutions base? These were all questions discussed in the White Paper from the Richmond Law Journal. The technology and knowledge to facilitate the management of audio in e-discovery, the next wave of communication relevant to dispute resolution, is here, which raises one question: what’s next?