Xiao He, Author at Prosearch https://www.prosearch.com/author/xiao-he/ Enterprise eDiscovery and legal data analytics solutions. Thu, 28 Aug 2025 18:03:29 +0000

The Imperative for Responsible AI Guidelines https://www.prosearch.com/the-imperative-for-responsible-ai-guidelines/ Fri, 05 Apr 2024 17:47:53 +0000

The post The Imperative for Responsible AI Guidelines appeared first on Prosearch.


The Imperative for Responsible AI Guidelines

From revolutionizing industries to reshaping legal practices, AI is poised to redefine the way we live and work. But the power of AI also carries risks. Amid all the excitement, there is a growing recognition of the need for responsible AI guidelines and practices.

Responsible AI involves addressing potential biases, discrimination, privacy breaches, and other negative impacts that AI systems might inadvertently create. It also ensures transparency, fairness, and accountability in AI algorithms and decision-making processes.

Why the Need for Responsible AI?

Across our culture and economy, potential risks of AI have been identified. A few examples:

Discrimination and Bias

AI systems are not immune to the biases present in the data they are trained on. This raises concerns about discriminatory outcomes. Responsible AI guidelines should emphasize the need for unbiased algorithms and continuous monitoring to identify and rectify any unintended biases.

AI has gained traction in hiring processes, posing the challenge of algorithmic biases and potential discrimination. Responsible AI guidelines can provide a framework for fair and ethical hiring practices, ensuring that AI tools complement human decision-making rather than perpetuating biases.

Fairness in Lending

AI plays a crucial role in lending decisions, yet it must be implemented responsibly to avoid reinforcing existing inequalities. Guidelines should advocate for fairness and transparency in AI-driven lending practices, ensuring that all individuals have equal access to opportunities.

Plagiarism, Fakes, and Misinformation

As AI systems generate content, there’s an increased risk of plagiarism. Responsible AI guidelines should address the ethical use of AI-generated content, emphasizing the importance of originality and proper attribution.

Lack of Notice and Transparency

Users often lack awareness of how AI systems operate. Guidelines should mandate clear communication on the use of AI, providing users with transparency about when and how AI is employed to make decisions that impact them.

Unique Challenges in the Legal Industry

The legal industry faces distinct challenges in adopting AI, as initial cases in the courts have revealed. Issues such as the lack of transparency, validation, and quality controls highlight the necessity for guidelines tailored to the legal landscape.

Benefits of Adopting Ethical Principles

Organizations earnestly adopting trustworthy and ethical principles stand to benefit by mitigating reputational and financial damage. Ethical practices reinforce trust among employees and stakeholders, fostering a positive organizational culture.

ProSearch Principles of Responsible AI

Recognizing the need, the ProSearch team has put together our own Principles of Responsible AI.

Responsible AI is a risk management-focused approach that advocates for informed caution in AI deployment. It involves establishing foundational principles and guardrails, with a focus on notice, transparency, accuracy, and accountability.

ProSearch is committed to building new applications with responsibility and usefulness top of mind. To that end, our work aligns with these principles:

Practical Utility and Value

We focus on creating solutions that provide real-world value. Every ProSearch AI solution is designed to help our clients solve a specific legal or compliance challenge.

Fairness

We work to reduce unfair biases in AI models by thoughtfully designing AI solutions, carefully curating training data, and thoroughly testing models. Fairness is prioritized by proactively mitigating biases through inclusive data practices and rigorous testing.

Reliability

Our approach ensures AI systems perform consistently across different scenarios through robust training, monitoring, and testing.

Transparency

We are committed to clearly communicating what our solutions can and cannot do and how clients’ data is stored and processed by ProSearch. Being transparent about actual capabilities, limitations, and data handling practices is crucial.

Privacy and Security

We prioritize the design of AI solutions that protect privacy and are secure from intrusions. We collaborate closely with clients’ IT and compliance teams to ensure we align with ISO, cybersecurity frameworks, and data privacy regulations. We commit to building and communicating processes that protect the handling of data by any stakeholders interacting with the system directly or indirectly.

Accountability

ProSearch is dedicated to ensuring proper functioning of AI solutions. Most importantly, incorporating meaningful human oversight throughout the entire life cycle of an AI system, from development to deployment, maintains accountability. We commit to assessing the impact of incorrect predictions and, whenever possible, designing systems with human-in-the-loop review processes.

Adaptability

These principles guide our development of technologies and workflows and underscore our commitment to ProSearch clients and partners. As AI continues to evolve, we expect to evolve these principles over time, but always with the goal of driving positive change in the legal technology community.

As we continue to innovate with AI technologies in the legal industry, the ProSearch Responsible AI guidelines serve as a compass, steering us toward ethical and trustworthy practices. In the legal realm, where the stakes are so high, adopting and living by these principles is a necessity. By embracing responsible AI, ProSearch is paving the way for a future where technological advancements align with human values.

A Data Science Approach to Keyword Searching https://www.prosearch.com/data-science-approach-keyword-searching/ Wed, 07 Mar 2018 18:18:43 +0000

The post A Data Science Approach to Keyword Searching appeared first on Prosearch.

I recently read the January 3rd order in the In Re Broiler Chicken Antitrust Litigation matter. In eDiscovery circles, this case is gaining interest because of its document-intensive discovery involving some of the nation’s largest chicken producers.

Like many who read Special Master Maura Grossman’s order, I was pleased to see the level of detail at which the industry is now discussing methods used in analyzing data during discovery. Specifically, the order went into great detail about the validation and quality control measures parties must employ to demonstrate that their search protocols were adequate. However, unlike many others, the major takeaway for me was not the importance of using TAR or predictive coding to locate relevant documents. Rather, the biggest revelation for me was the attention paid to search terms. Grossman treated TAR and search terms on an even playing field, favoring neither approach over the other. Search terms are not going away, even in today’s world of artificial intelligence and machine learning. In fact, a great set of search terms can complement a well-defined TAR process quite nicely.

A New Era of Search Terms

In Re Broiler Chicken ushers in a new era in the use of search terms. In identifying documents for discovery, parties used to take an “educated guess” approach, choosing and agreeing to terms that seemed beneficial. In many instances, this still happens today. Two parties at a meet and confer exchange lists and, in essence, shake hands over a list of words, oftentimes without knowing how those terms will actually perform across a data set.

Now, don’t get me wrong, much thought often goes into these educated guesses, search term revisions, and handshake meetings. Data custodians are interviewed, subject matter experts weigh in, and software gurus hone lists. Sometimes a keyword “hit list” is even generated from a processing tool and used to select search terms. But at the end of the day, most parties underestimate the power and importance of creating a compelling and accurate set of highly relevant search terms through true statistical validation. (Note: Using a keyword hit list from a processing tool is not true validation, especially when randomly creating term combinations with the syntax that you think will return the desired relevant documents.)
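To make the contrast concrete, statistically validating a term means measuring how it actually performs, not just counting hits. A minimal sketch of that idea in Python follows; the function name, sample size, and labeling oracle are illustrative assumptions, not ProSearch's actual implementation (in practice the labels come from human reviewers):

```python
import math
import random

def estimate_precision(hit_docs, label_fn, sample_size=385, z=1.96):
    """Estimate a search term's precision by randomly sampling its hits.

    hit_docs : list of document IDs returned by the term
    label_fn : oracle returning True if a document is relevant
               (in practice, a reviewer's coding decision)
    Returns the sampled precision and a normal-approximation
    margin of error at the chosen confidence level.
    """
    sample = random.sample(hit_docs, min(sample_size, len(hit_docs)))
    relevant = sum(1 for doc in sample if label_fn(doc))
    p = relevant / len(sample)
    moe = z * math.sqrt(p * (1 - p) / len(sample))
    return p, moe
```

A term whose sampled precision is low (with a tight margin of error) is a candidate for revision or removal, regardless of how reasonable it looked at the meet and confer.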

 

A Data-driven Approach to Searching

Legal teams must have a methodology for selecting and validating the results of search terms. Most of the time, organizations need expert assistance in truly understanding their data, and any analysis of search terms should be informed by actual data. For example, in analyzing search terms for clients, ProSearch developed an automated, iterative process, working in collaboration with data scientists, linguists, and attorneys with subject matter expertise in the case. The result is a highly relevant set of search terms, backed up by statistics. This process can include predictive coding, but it can also be used on its own to create the most effective set of search terms.

But how do you go about applying a data-driven approach to keyword selection? This is where sampling plays the starring role: sampling methods form the backbone of the statistical analysis. Data points are gathered using a variety of sampling strategies, including:

  • Sampling against a population that is most unique (e.g., de-duplicated, consolidated, depending on the client’s preference)
  • Stratified sampling
  • Term sampling
  • Uncertainty sampling (if including predictive modeling in the process)
  • Random sampling
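The strategies above can be combined; a common building block is the stratified draw, where each search term defines a stratum and an equal-size random sample is pulled from each. A minimal sketch, with hypothetical names and a toy input shape (not ProSearch's actual tooling):

```python
import random
from collections import defaultdict

def stratified_sample(doc_terms, per_stratum=25, seed=42):
    """Draw an equal-size random sample from each stratum, where a
    stratum is the set of documents hit by a given search term.

    doc_terms : dict mapping doc_id -> list of terms that hit it
    Returns a dict mapping each term to its sampled doc IDs.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for doc, terms in doc_terms.items():
        for term in terms:
            strata[term].append(doc)
    return {term: rng.sample(docs, min(per_stratum, len(docs)))
            for term, docs in strata.items()}
```

Because every term gets its own sample, rare terms are measured just as carefully as high-volume ones, which a simple random sample over the whole population would not guarantee.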

Obviously, the more you can automate the sampling iterations, the more effective you will be at determining the most appropriate key terms. At ProSearch, we also take process documentation seriously, so no opposing party can claim that keywords were chosen in a “black box” that no one really understands. The end result is a highly effective set of search terms that is also defensible.

The bottom line? Using a data science approach, legal teams are no longer guided by what they think is best; they are informed by what they know is best. And the results speak for themselves – cost and risk reduction without sacrificing defensibility.

Validation Protocols that Apply for Keyword Searching and TAR

Developing the search terms is only part of the task; the work does not stop there. A legal team needs to prove that its search term schema is performing at adequate levels of effectiveness. This is where a validation methodology comes into play.

Grossman’s validation proposal has the potential to change what it means to conduct a complete and adequate document review using search terms. The longest part of the January 3rd order focuses on results, based on validation, rather than on the steps taken in a workflow. A process alone does not determine the adequacy of a review; validation metrics show whether a review is insufficient or inadequate.

Grossman’s validation schema is, at a high level, very similar to the approach we take at ProSearch in calculating recall. If anything, the order highlights the difficulties of using recall for validation; ensuring a certain level of recall only gets harder when the richness (the percentage of relevant documents) in a document set is low.

For example, Grossman’s order puts a stake in the ground on the volume of documents needed for validation – 3,000 documents – no matter the richness of the document set. Notably, the order leverages a relatively large sample from the unreviewed population to ensure that the error margins for calculating remaining richness are relatively small.

However, when it comes to calculating recall, it’s not just the size of the sample taken; it’s about the size of the unreviewed population compared to the reviewed population. For example, if there is an overall collection of 1 million documents, having a remaining richness of 1% in an unreviewed population of 900,000 vs. 90,000 would have a dramatic impact on both recall as well as error margins for recall.
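The arithmetic behind that point can be sketched in a few lines. The numbers below (50,000 relevant documents found in review) are purely illustrative assumptions layered onto the 1% richness and 900,000 vs. 90,000 unreviewed populations from the example above; the function is a generic normal-approximation sketch, not any party's actual validation protocol:

```python
import math

def recall_estimate(relevant_found, unreviewed_size, remaining_richness,
                    sample_size=3000, z=1.96):
    """Estimate recall when remaining richness was measured on a random
    sample of the unreviewed population, and propagate the sampling
    margin of error on richness into a rough interval for recall."""
    missed = unreviewed_size * remaining_richness
    recall = relevant_found / (relevant_found + missed)
    # Normal-approximation margin of error on the richness proportion
    moe = z * math.sqrt(remaining_richness * (1 - remaining_richness)
                        / sample_size)
    high_missed = unreviewed_size * (remaining_richness + moe)
    low_missed = unreviewed_size * max(remaining_richness - moe, 0.0)
    return recall, (relevant_found / (relevant_found + high_missed),
                    relevant_found / (relevant_found + low_missed))

# Same measured richness (1%), very different unreviewed populations:
r_large, ci_large = recall_estimate(50_000, 900_000, 0.01)
r_small, ci_small = recall_estimate(50_000, 90_000, 0.01)
```

With these assumed inputs, the 900,000-document unreviewed population implies roughly 9,000 missed documents against the 90,000-document population's roughly 900, so the same 1% richness yields a much lower recall estimate and a wider interval in the first case – exactly the dramatic impact described above.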

The Grossman order also does not account for variability in interpreting production requests, even within the same review team. Moreover, at the beginning of a review, the team’s definition of relevance changes as its understanding of the case improves. Both factors impact the efficacy of the blind test. Admittedly, there must be a process by which the review team doing the broader review can learn what it means to be relevant from a subject matter expert.

 

The End Game: Fulfill Your Discovery Obligation

At the end of the day, legal teams are simply trying to fulfill their discovery obligation and produce the desired relevant documents. Done correctly – mission accomplished!

Along the way, it helps to complete the mission at the lowest possible cost accompanied by a defensible methodology. Have we simply left search terms in the dust because we have been told that AI/TAR is less expensive and more accurate than search terms? Did we miss an opportunity to refine the process of using search terms? It’s hard to say, but one thing is certain: while no single process can account for every data set being readied for review, a strong set of search terms is an extremely powerful tool, standalone or in tandem with TAR. I applaud Special Master Maura Grossman for recognizing the importance of data science applications and of statistical validation.

 

Xiao He, Ph.D. Data Scientist, Linguistics, Analytics, & Data Science (LADS)

Dr. Xiao He is a data scientist on the Linguistics, Analytics, & Data Science team at ProSearch. In addition to implementing the Technology Assisted Review solution for ProSearch, Xiao develops custom solutions and workflows, and researches machine learning applications in eDiscovery. Xiao received his Ph.D. in Linguistics with emphases in experimentation and statistics from the University of Southern California, Los Angeles, and B.A. in Psychology from the University of California, Berkeley. Prior to joining ProSearch, Xiao worked as an assistant professor of linguistics and quantitative analysis at the University of Manchester, United Kingdom.

