To many, advanced analytics in the litigation and dispute context means little more than predictive coding. Although predictive coding has been an established breakthrough within dispute technology, Vista believes that this limited definition only scratches the surface. Our suite of custom technology and service enhancements can be used to improve speed, perform advanced quality control, assist in fact pattern development and otherwise augment the important work that attorneys perform.
Contained within everything we do is the expertise of our data scientists and engineers. We believe that this expertise is the difference between Vista and our competitors. Products, by definition, use one algorithm and one approach to solve a problem. We review in detail our clients’ data, the mix of algorithms that can be used and all parameter settings in order to create an optimized solution. We feel that the best a product can deliver is the median of solutions (one approach for all data/problems) when what is required is the optimized solution.
Automatic Privilege Prediction
Most complex litigation involves very long lists of keywords that identify potentially privileged documents. Putting together these keywords is a lengthy and costly endeavor. Vista Analytics has developed a process that starts from easily identified keywords (like law firm domain names), requires no training labels from the attorneys, and then iteratively expands the set of potentially privileged documents by using predictive features such as writing style, network topology, semantics and so forth. This technique can be used as a final QC or can be inserted into the standard privilege workflow, which allows our clients both to defend their privilege process confidently and to ensure that privileged documents are not produced.
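As a rough illustration of the seed-and-expand idea described above, here is a minimal Python sketch. It is not Vista's process: it assumes documents are plain strings, uses only bag-of-words cosine similarity in place of the richer features mentioned (writing style, network topology, semantics), and the function names, the `smithlaw.com` seed and the 0.3 threshold are hypothetical choices made for the example.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts (no stop-word removal, for brevity)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def expand_privilege_candidates(docs, seed_terms, threshold=0.3, max_rounds=5):
    """Flag documents containing a seed term, then repeatedly add documents
    that resemble the flagged set. The threshold is an illustrative
    assumption, not a recommended value."""
    vectors = {doc_id: vectorize(text) for doc_id, text in docs.items()}
    flagged = {
        doc_id
        for doc_id, text in docs.items()
        if any(seed in text.lower() for seed in seed_terms)
    }
    for _ in range(max_rounds):
        if not flagged:
            break
        # Summed term counts of everything flagged so far.
        centroid = Counter()
        for doc_id in flagged:
            centroid.update(vectors[doc_id])
        newly_flagged = {
            doc_id
            for doc_id, vec in vectors.items()
            if doc_id not in flagged and cosine(vec, centroid) >= threshold
        }
        if not newly_flagged:
            break
        flagged |= newly_flagged
    return flagged


docs = {
    "a": "From counsel@smithlaw.com: our legal advice regarding the merger agreement is attached",
    "b": "Forwarding the legal advice regarding the merger agreement from outside counsel",
    "c": "Lunch menu for the company outing on Friday",
}
candidates = expand_privilege_candidates(docs, ["smithlaw.com"])
```

In this toy example, document `a` is flagged by the seed domain and document `b` is added in the first expansion round because it shares most of its vocabulary with `a`. Document `c` stays out.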
Explainable Predictive Coding
Many firms are using predictive coding to augment human review for responsiveness or other coding decisions. Vista routinely provides these services. To achieve superior results, we combine ensemble learning (either across multiple algorithms or across multiple time periods, to adjust for concept drift), comprehensive model tuning, unlimited computing power through cloud computing, and attestation credentials through appropriate testing. As mentioned in the introductory section above, this process is not possible through the purchase of products but requires a service approach with some of the leading minds in this industry.
In contrast to vendors on the market who provide a black-box predictive coding solution, our predictive coding solution is transparent and explainable. Not only do we provide the workflow process, results, and testing statistics, we also highlight what part of the document makes it responsive. The highlighting process is unique to the Vista offering and allows much faster human review and understanding of the document population.
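The highlighting idea can be sketched in a few lines of Python. The source does not say how Vista's highlighting works, so this example swaps in a simple stand-in: smoothed per-term log-odds estimated from a handful of labeled documents, with the highest-weighted terms in a new document wrapped in `[[ ]]`. All function names and sample texts are hypothetical.

```python
import math
import re
from collections import Counter


def tokens(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def term_log_odds(responsive_docs, other_docs, smoothing=1.0):
    """Smoothed log-odds of each term appearing in responsive vs. other text."""
    pos = Counter(t for d in responsive_docs for t in tokens(d))
    neg = Counter(t for d in other_docs for t in tokens(d))
    vocab = set(pos) | set(neg)
    pos_total = sum(pos.values()) + smoothing * len(vocab)
    neg_total = sum(neg.values()) + smoothing * len(vocab)
    return {
        t: math.log((pos[t] + smoothing) / pos_total)
        - math.log((neg[t] + smoothing) / neg_total)
        for t in vocab
    }


def highlight(text, weights, top_n=3):
    """Wrap the top_n positively weighted terms of `text` in [[ ]]."""
    scored = {t: weights.get(t, 0.0) for t in set(tokens(text))}
    ranked = sorted(scored.items(), key=lambda kv: kv[1], reverse=True)
    top = {t for t, w in ranked[:top_n] if w > 0}
    return re.sub(
        r"[A-Za-z0-9]+",
        lambda m: f"[[{m.group(0)}]]" if m.group(0).lower() in top else m.group(0),
        text,
    )


weights = term_log_odds(
    responsive_docs=[
        "price fixing agreement with competitor",
        "discussed price agreement at meeting",
    ],
    other_docs=["lunch meeting at noon", "holiday schedule attached"],
)
marked = highlight("Please review the price agreement before the meeting", weights, top_n=2)
# marked == "Please review the [[price]] [[agreement]] before the meeting"
```

A reviewer reading the marked text can see at a glance which terms drove the score, which is the kind of transparency the paragraph above contrasts with a black-box result.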
Entity Extraction and Typing
Vista has developed a proprietary entity extraction and typing pipeline that can automatically find high quality entities, such as personal names, organizational names, locations, brands and dollar amounts, and then assign corresponding types to those entities across a large number of documents. Our solution does not require training labels and can be customized to highly specific domains. For example, for documents in the medical domain, our pipeline can automatically extract and assign types to medical treatments, diseases, symptoms and so on.
The process of extraction allows for the review and manipulation of entities as though they were in a metadata field. The value of extraction comes from a simple observation: search is very valuable, but only if you know the words you are searching for. Entity extraction provides an understanding of the contents of text without prior knowledge, which enables counsel to understand the contents of a document population much faster than with search alone.
By clustering documents with similar content together, the process of understanding the documents and moving quickly through a review can be greatly enhanced. Imagine a cluster of 100 documents discussing a single issue: after an attorney reviews the first 2 or 3 documents, reviewing the remaining documents becomes simple and fast. Clustering also allows individual attorneys to quickly become experts on a topic, while reducing both cost and timelines.
Similarity scores give each document in a database a numeric value indicating how similar it is to another document or to another set of documents. This can greatly improve quality control and testing. In a recent matter, Vista was asked to identify any outstanding privileged documents. This request came after a lengthy privilege identification process. We foldered the top 500 documents that looked most similar to the already identified privileged documents. An attorney reviewed the 500 documents and found no additional privileged documents. This provided a level of comfort for the client and attorney team.
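The "top documents most similar to a known set" step can be sketched as follows. This is a minimal illustration under stated assumptions, not the scoring Vista used: each candidate is scored by its best Jaccard overlap of word bigrams with any reference document, and the function names and sample texts are hypothetical.

```python
import re


def shingles(text, k=2):
    """Set of k-word sequences in the text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def top_similar(candidates, reference_docs, k=500):
    """Rank candidates by their best similarity to any reference document
    and return the k highest-scoring (doc_id, score) pairs."""
    reference_sets = [shingles(text) for text in reference_docs]
    scores = {
        doc_id: max((jaccard(shingles(text), ref) for ref in reference_sets), default=0.0)
        for doc_id, text in candidates.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]


known_privileged = ["privileged and confidential attorney client communication"]
unreviewed = {
    "x": "this is a privileged and confidential attorney client communication",
    "y": "quarterly sales numbers attached",
}
ranked = top_similar(unreviewed, known_privileged, k=1)
```

Here `ranked` contains only document `x`, with a score of 0.625 (5 shared bigrams out of 8 distinct ones). In practice the returned list would be foldered for attorney review, as in the matter described above.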
Fact Pattern Development
For each litigation matter, Vista provides a proprietary analytics dashboard that facilitates fact pattern development outside the responsive review process.
Timeline anomaly detection looks for abnormal changes in the frequency and type of communication (adjusting for non-substantive emails such as a company outing). The anomalies are plotted along a timeline, and selecting one takes you immediately to the underlying communication.
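One simple way to flag abnormal changes in communication frequency is a rolling z-score over daily message counts, sketched below in Python. This is an assumption-laden illustration rather than Vista's method: it covers frequency only (not type of communication), assumes non-substantive emails have already been filtered out of the counts, and the window size and threshold are hypothetical defaults.

```python
from statistics import mean, stdev


def flag_anomalies(daily_counts, window=7, z_threshold=3.0):
    """Return (day_index, z_score) for days whose count deviates sharply
    from the preceding `window` days."""
    anomalies = []
    for i in range(window, len(daily_counts)):
        history = daily_counts[i - window:i]
        mu = mean(history)
        # Fall back to a spread of 1.0 when the history is perfectly flat,
        # to avoid dividing by zero.
        sigma = stdev(history) or 1.0
        z = (daily_counts[i] - mu) / sigma
        if abs(z) >= z_threshold:
            anomalies.append((i, z))
    return anomalies


# Hypothetical daily email counts for one custodian.
counts = [10, 12, 11, 9, 10, 11, 12, 10, 48, 11]
spikes = flag_anomalies(counts)
```

With these counts only day index 8 (the jump to 48 messages) is flagged. Each flagged index could then be plotted on the timeline and linked back to the messages sent that day.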
Sentiment analysis has been frequently used in various analytics situations but has been rarely used in litigation. The basic idea is for the analysis to identify the sentiment of a document. Does the document provide a positive or negative view of the subject matter? Using sentiment analysis provides the ability to more quickly identify positive and negative documents for use in motion practice, depositions and ultimately trial.
Link analysis provides insights into the relationships between people in an organization and key entities mentioned in their documents. A heatmap is used to highlight the intensity of communication. Link analysis allows you to quickly identify who is frequently communicating with whom and what unique terms they are using in their communications.
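The data behind a communication heatmap can be sketched as a symmetric count matrix. The Python example below is a minimal illustration covering communication intensity only (not the unique-terms analysis). It assumes each message is available as a `(sender, recipients)` pair, and the names are hypothetical.

```python
from collections import Counter


def communication_matrix(messages):
    """Count messages exchanged between each pair of people.

    `messages` is an iterable of (sender, [recipients]) pairs. Returns the
    sorted list of people and a symmetric matrix of pairwise counts."""
    pair_counts = Counter()
    for sender, recipients in messages:
        for recipient in recipients:
            if recipient != sender:
                pair_counts[tuple(sorted((sender, recipient)))] += 1
    people = sorted({person for pair in pair_counts for person in pair})
    matrix = [
        [0 if a == b else pair_counts.get(tuple(sorted((a, b))), 0) for b in people]
        for a in people
    ]
    return people, matrix


messages = [
    ("alice", ["bob"]),
    ("bob", ["alice", "carol"]),
    ("alice", ["bob"]),
]
people, matrix = communication_matrix(messages)
# people == ["alice", "bob", "carol"]
# matrix == [[0, 3, 0], [3, 0, 1], [0, 1, 0]]
```

Each cell of `matrix` would map to one colored cell of the heatmap, so the alice/bob pair stands out as the most active link in this toy data.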
At Vista, our in-house team of researchers and data scientists has developed a series of advanced statistical testing and sampling techniques to ensure accurate representation of the data and highly accurate, defensible results. Most practitioners understand the importance of the test and validation sets used in predictive coding. It is imperative, however, that the test and validation sets accurately represent the total population of documents. This gives our clients the confidence to make high quality representations about their documents and data in front of the judiciary or opposing parties.
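One common way to make a test or validation set mirror the full population is proportional stratified sampling, sketched below in Python. The source does not describe Vista's specific techniques, so this is only an illustration. The stratum key (document type), function name and sample sizes are hypothetical, and because of rounding the returned sample can differ slightly from the requested size.

```python
import random
from collections import defaultdict


def stratified_sample(docs, stratum_of, sample_size, seed=0):
    """Draw a sample whose strata appear in roughly the same proportions
    as in the full document population."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    strata = defaultdict(list)
    for doc in docs:
        strata[stratum_of(doc)].append(doc)
    total = len(docs)
    sample = []
    for key in sorted(strata):
        members = strata[key]
        # Proportional allocation, with at least one document per stratum.
        n = max(1, round(sample_size * len(members) / total))
        sample.extend(rng.sample(members, min(n, len(members))))
    return sample


# Hypothetical population: 80 emails and 20 chat messages.
population = [("email", i) for i in range(80)] + [("chat", i) for i in range(20)]
validation_set = stratified_sample(population, lambda doc: doc[0], sample_size=10)
```

In this example the 10-document sample contains 8 emails and 2 chat messages, matching the 80/20 split of the population. A purely random draw of the same size could easily miss the smaller stratum.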