2022 is officially behind us, and it was certainly a big year for AI development. Did you know that adoption of AI has more than doubled since 2017? Whether or not that surprises you, this edition of Voices of Trusted AI includes a valuable snapshot of the state of AI as it stands today and a look toward the future.
Generative models continue to be an important topic when it comes to trusted AI, and the articles below show just how critical human evaluation is for the safe use of these models.
We hope you enjoy this edition of Voices of Trusted AI, and we invite you to share your thoughts with our team.
P.S. Don’t forget to check out "Team Panda Picks" at the end for a list of articles curated by our team of data scientists.
What Do We Mean by Trusted AI?
Trusted AI is the discipline of designing AI-powered solutions that maximize the value of humans while being more fair, transparent, and privacy-preserving.
What it's about: The results of this year’s McKinsey Global Survey on AI show that adoption of AI has more than doubled since 2017, even though the proportion of companies using AI has plateaued between 50% and 60% in recent years. The report is separated into four sections: a review of the last five years of AI adoption, impact, and spend; AI leaders; AI talent; and an explanation of the research.
The first section reports that AI adoption has more than doubled since 2017, peaking at 58% in 2019, with this year’s figure standing at 50%. The average number of AI capabilities, such as robotic process automation and computer vision, used by organizations doubled from 1.9 in 2018 to 3.8 in 2022. Notably, natural-language text understanding is the third most common AI capability embedded in products or business units. For the past four years, optimization of service operations has remained the most commonly adopted AI use case.
Second, the level of investment in AI has increased to reflect the rise in use. The report also notes that the specific areas in which companies see value from AI have evolved: the most commonly reported revenue effects are now found in marketing and sales, product and service development, and strategy and corporate finance. Finally, there has been no significant increase in organizations’ reported mitigation of AI-related risks.
The ensuing sections on AI leaders and AI talent share that AI high performers, the companies seeing the greatest financial returns from AI, continue to pull ahead of their competitors. These leaders are making bigger investments in AI, engaging in advanced practices to enable faster AI development, and showing signs of faring well in the competitive market for AI talent. Overall, hiring AI talent remains a challenge for all organizations, and the tech talent shortage shows no signs of improving. The report mentions reskilling and upskilling as alternatives to hiring new talent, and it notes considerable room for improvement in gender, racial, and ethnic diversity on AI teams.
Why it matters: This report, drawing on a diverse range of responses, offers a fairly accurate picture of how AI is being handled and viewed across most industries today.
For an AI-powered (or AI-interested) organization, it is important to understand the current state of affairs in AI and to learn from how it has evolved over the last few years. In several instances, the numbers are noteworthy: for example, the report states that 63% of respondents expect their organizations’ investment in AI to increase over the next three years.
Perhaps most relevant is that while AI use has grown, there has been no considerable increase in the mitigation of AI-related risks from 2019 to now. Cybersecurity is the top risk that organizations are working to mitigate, cited by 51% of respondents. The next most-cited risk, regulatory compliance, trails behind at just 36%. What happens when AI use increases but risk mitigation doesn’t keep pace? That is a question we all need to consider.
What it's about: In this article from Stanford University, researchers benchmark 30 leading language models across a broad range of scenarios and metrics to explore their capabilities and risks. Language models, which derive their power from massive quantities of language data, represent the broader shift toward foundation models: machine learning models that can be adapted to a diverse range of tasks. Although many language models are in use, they have not been compared in a unified way.
As a new benchmarking approach, the Center for Research on Foundation Models developed the Holistic Evaluation of Language Models (HELM), a framework composed of three elements:
Coverage and recognition of incompleteness. Holistic evaluation should clarify which major scenarios and metrics are missing in the evaluation of language models.
Multi-metric measurement. While benchmarking in AI usually focuses on accuracy alone, holistic evaluation should capture many desiderata, such as accuracy, robustness, and fairness; a minimal sketch of this idea appears just after this list.
Standardization. Since the language model in general is the object of evaluation, all major LMs should be evaluated on the same scenarios.
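To make the multi-metric idea concrete, here is a minimal Python sketch of what scoring one model on more than accuracy could look like. The toy model, data, perturbation, and metric definitions are hypothetical stand-ins for illustration only, not the HELM implementation.

```python
# Minimal multi-metric evaluation sketch (illustrative only, not the HELM code).
# The model, data, and perturbation below are hypothetical stand-ins.
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, str]  # (input text, gold label)

def accuracy(model_fn: Callable[[str], str], examples: List[Example]) -> float:
    """Fraction of examples the model labels correctly."""
    return sum(model_fn(x) == y for x, y in examples) / len(examples)

def robustness(model_fn: Callable[[str], str], examples: List[Example],
               perturb: Callable[[str], str]) -> float:
    """Fraction of examples whose prediction survives a small input perturbation."""
    return sum(model_fn(perturb(x)) == model_fn(x) for x, _ in examples) / len(examples)

def evaluate(model_fn, examples, perturb) -> Dict[str, float]:
    """Report several desiderata side by side instead of accuracy alone."""
    return {
        "accuracy": accuracy(model_fn, examples),
        "robustness": robustness(model_fn, examples, perturb),
    }

if __name__ == "__main__":
    # Toy sentiment "model" and data, purely for illustration.
    toy_model = lambda text: "positive" if "good" in text.lower() else "negative"
    data = [("This movie was good.", "positive"), ("Terrible plot.", "negative")]
    add_typo = lambda text: text.replace("good", "goood")  # simple character-level noise
    print(evaluate(toy_model, data, add_typo))
    # A model can score perfectly on accuracy yet poorly on robustness,
    # which is exactly what a single-metric benchmark would miss.
```

HELM's real scenarios and metrics are far richer, but the shape is the same: the same models, the same scenarios, and several desiderata reported side by side instead of a single accuracy number.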
After running over 4,900 evaluations of different models on various scenarios, the researchers identified 25 top-level findings and selected five main points to share:
Instruction tuning (adjusting language models with human feedback) is effective across accuracy, robustness, and fairness.
Open models underperform non-open models.
On average, accuracy correlates with robustness and fairness.
The best strategy for adaptation is scenario- and model-dependent.
In some cases, human evaluation is essential.
Why it matters: These findings provide us a useful look into the landscape of language modeling, and foundation modeling in general, as it stands today. With the buzz and trepidation surrounding language models, it is more important than ever that they are measured rigorously. In other words, we need to take the time to understand what this technology is and is not capable of, and what risks it entails, so that we can truly comprehend its societal impact.
Designing and developing models you can trust means that transparency is essential. We’re already seeing how a lack of transparency in models like ChatGPT is raising major ethical concerns.
Recognizing that new scenarios, metrics, and models will continue to materialize, the Stanford University team invites us all to highlight further gaps, help them prioritize, and contribute additions of our own. We can all help each other make the benchmarking of AI models as holistic and complete as possible.
There's an explosion of powerful applications for these large language models. The key to making them useful is surrounding them with appropriate safeguards and risk management.
What it's about: While large language models have been celebrated as extraordinary and mind-blowing, people have also been harmed in very real ways by models failing to meet basic and necessary standards. Instead of focusing on the hype surrounding what these models are capable of, we need to hold model builders accountable for choices that hurt people.
One of the known vulnerabilities of large language models is their failure to process negation. As models continue to grow in size and complexity, negation is one linguistic skill that has not seen improvement. Linguists have raised concerns that models are learning English without possessing the inherent linguistic abilities that would signify true comprehension.
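For readers who want to see this failure mode firsthand, here is a minimal probe sketch, assuming the Hugging Face transformers library and its publicly available bert-base-uncased checkpoint; it illustrates the general phenomenon and is not the methodology behind the article's claims.

```python
# Minimal negation probe for a masked language model (illustrative sketch).
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

for prompt in ["A robin is a [MASK].", "A robin is not a [MASK]."]:
    top = fill_mask(prompt)[0]  # highest-scoring completion for this prompt
    print(f"{prompt!r} -> {top['token_str']} (score {top['score']:.3f})")

# Masked models frequently return the same completion (e.g., "bird") for both the
# affirmative and the negated prompt, which is the negation blind spot linguists
# have been pointing to.
```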
Another dangerous problem is language models’ ability to create responses that contradict authoritative external sources. This article provides examples of major models creating a ‘scientific paper’ on the benefits of eating crushed glass (Galactica) and a text on ‘how crushed porcelain added to breast milk can support the infant digestive system’ (ChatGPT).
While the potential and realized harms of generative models have been studied, the human decision-making that is a critical part of model development is ignored. Models’ strengths are attributed to technological marvels, while the deliberate engineering choices that lead to their failures go unnamed. The article argues that it is possible to make better choices that result in different models.
Why it matters (from Merilys Huhn): Large language models are becoming ubiquitous. Development has accelerated over the last ten years, and we've moved from basic rules-based approaches to chatbots whose responses are nearly indistinguishable from a human's. These models have proven to be wildly successful at a diverse range of tasks like question answering, summarization, and text-to-speech, but problems remain. There is a difference between identifying relevant information and developing actual understanding, and large language models have yet to cross that barrier.
The Galactica team calls this failure mode "hallucination": these models can output a response that sounds perfectly fine but unravels in the presence of a little expert knowledge. This lack of understanding, combined with an inclination toward information the model sees more frequently, promotes bias even when it is not intentional.
It is hard to build these models. It is a lot harder to protect them from misuse. Privacy policies and terms of service won't prevent harm when people are intent on causing it. They won't prevent harm when people are indifferent to it. In using these models, we must also accept responsibility for proactively identifying and preventing harm.
What it's about: As part of its “What’s Next in Tech” series, MIT Technology Review offers four big bets for AI in the new year. 2022 was an innovative year for the AI industry, with breakthroughs in text, image, and even video models. Two journalists offer an inside look at the four biggest trends to watch:
Multipurpose chatbots. Future language models may offer more functionality than the ones we are familiar with today; for example, they may be able to combine different modalities, like images or video, with text. A clear problem we’re already seeing, however, is that these models will also inherit this generation’s problems: bias and difficulty telling fact from fiction.
AI’s first regulators. In 2023, regulators and lawmakers could introduce laws that ban AI practices that threaten human rights, restrict the use of facial recognition, hold companies accountable for AI use, and monitor how companies use AI algorithms and data.
Changing leaders. With AI research getting more expensive and hiring proving challenging, startups and academia may come out on top in AI research. Ultimately, AI will be less defined by the big companies that have long dominated the industry.
New era for biotech. The potential for AI to revolutionize the pharmaceutical industry has emerged. AI can now predict protein structure and speed up drug discovery. The implications for Big Pharma are huge.
Why it matters: The most salient point highlights 2023 as the year regulators will jump on AI. Until now there have been very few rules governing AI use and development. The EU’s sweeping AI law, the AI Act, is expected to be finalized, and the Federal Trade Commission in the U.S. looks to protect Americans from unlawful surveillance and lax data security practices with “urgency and rigor”. There is no doubt that the enforcement of these regulations will affect how technology companies build, use, and sell AI. However, tech lobbyists argue that regulators will need to write the rules in a way that still allows for innovation.
The use of AI to assist in drug development means that Big Pharma will never be the same. Predicted protein structures have been shared in public databases, and biologists and drug makers are already taking advantage of these resources. 2023 could be the year of mind-blowing advances in pharmatech, and the timing is ripe for tools that address the gap between these emerging policies and this wave of AI innovation.
A number of organizations have started releasing their own versions of “responsible AI guidelines.” Which ones should I be following?
The recent increase in AI adoption has led to a number of high-profile instances of AI system failures that highlight the potential shortcomings and biases of these systems, resulting in greatly increased public scrutiny and concern.
To minimize the risks associated with AI systems, a number of organizations and industry groups have released their own responsible AI guidelines, and government regulations will soon be introduced as well (see the EU's AI Act). All of this can lead to information overload when trying to decide what your organization should be focusing on.
As a starting point, check and see if any groups within your industry have released a set of responsible AI guidelines, as these would likely be most applicable to your organization. The work done by the Responsible Artificial Intelligence Institute is one great example.
While you may not find a single published set of guidelines that can be directly adopted by your organization, it can be helpful to look at some of the common themes that appear in the guidelines developed and published by others. These include: safety, fairness, accountability, transparency, explainability, privacy, human control, robustness, team diversity, and continual monitoring.
You may decide to emphasize some of these items more than others depending on your industry, company, and use case, but hopefully this provides a good reference framework as you start to build your own set of responsible AI guidelines.
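If it helps to make that framework concrete, below is a lightweight, hypothetical sketch of how a team might track those common themes as an internal review checklist. The theme names come from the list above; the class, method names, and overall structure are illustrative assumptions rather than any published standard.

```python
# Hypothetical responsible-AI review checklist built around the common themes above.
# The class and method names are illustrative; adapt them to your own guidelines.
from dataclasses import dataclass, field
from typing import Dict, List

THEMES = [
    "safety", "fairness", "accountability", "transparency", "explainability",
    "privacy", "human control", "robustness", "team diversity", "continual monitoring",
]

@dataclass
class ResponsibleAIChecklist:
    """Tracks which themes have been reviewed for a given AI use case."""
    use_case: str
    reviewed: Dict[str, bool] = field(default_factory=lambda: {t: False for t in THEMES})

    def mark_reviewed(self, theme: str) -> None:
        if theme not in self.reviewed:
            raise ValueError(f"Unknown theme: {theme}")
        self.reviewed[theme] = True

    def outstanding(self) -> List[str]:
        """Themes that still need a review before sign-off."""
        return [theme for theme, done in self.reviewed.items() if not done]

# Example usage for a hypothetical use case
checklist = ResponsibleAIChecklist(use_case="customer churn model")
checklist.mark_reviewed("fairness")
checklist.mark_reviewed("privacy")
print(checklist.outstanding())  # everything not yet reviewed, e.g. safety, transparency, ...
```

A structure like this is only a starting point; each theme still needs its own concrete criteria, owners, and evidence, tailored to your industry, company, and use case.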