The new world of multimodal AI
As artificial intelligence continues to evolve, its integration into wearable technology is opening up new and exciting possibilities. One particularly intriguing application lies in the realm of legal proceedings, where wearable devices equipped with multimodal AI could provide real-time insights that were previously only seen in TV shows.
Multimodal AI
Multimodal AI refers to systems that can process and integrate multiple types of data simultaneously, such as text, audio, and visual inputs. This capability is a significant leap from traditional AI systems, which typically handle one type of data at a time. By combining various types of inputs, multimodal AI can achieve a more comprehensive understanding of complex situations.

One of the more important recent announcements from OpenAI and Google is an increased focus on improving the visual capabilities of their models. Google recently showed demos in which a user wearing camera-equipped glasses could share what they were seeing with the AI. OpenAI released similar demo videos shortly before Google, showcasing comparable abilities in the ChatGPT app for iPhone. By analyzing a live camera feed, the AI can now help solve problems it sees. To take an example that appears in both videos, if someone writes a math problem on a whiteboard or piece of paper in front of them, the AI can talk them through solving it. Before these demonstrations, one would have had to upload photos to the LLM; the new emphasis on continuous visual input and real-time interaction makes the process far more convenient.
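To make the idea concrete, here is a minimal sketch of what a multimodal request looks like in code, assuming the official OpenAI Python SDK: a single prompt that combines text with an image, much like snapping a photo of the whiteboard. The model name and image URL are placeholders, not a claim about what the demos actually ran.

```python
# A minimal sketch of a multimodal request: text plus an image in one prompt.
# Assumes the official `openai` Python package and an OPENAI_API_KEY in the
# environment; the model name and image URL are illustrative placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",  # a multimodal model; substitute whatever is current
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "What math problem is on this whiteboard, and how would you solve it?"},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/whiteboard.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)
```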
Wearable Tech: The Future of Legal Analytics?
Imagine a lawyer equipped with smart glasses during a trial. These glasses could have built-in cameras and microphones, feeding data to an AI system that analyzes visual and auditory cues in real time. The AI could monitor jurors' facial expressions, body language, and even voice tones to gauge their reactions to the case being presented. For instance, the Ray-Ban Meta smart glasses were recently upgraded to support multimodal AI; if they do not already offer sentiment analysis, a user could conceivably extend them to do so. The glasses also have speakers quiet enough that only the wearer can hear them, which means someone could be presenting and receive feedback on how they are doing in real time.
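To picture how the pieces might fit together, here is a deliberately hypothetical feedback loop. None of these device calls exist in any shipping SDK today; `glasses`, `sentiment_model`, and their methods are stand-ins for whatever a future product might expose.

```python
# A purely hypothetical event loop for the scenario above. The `glasses`
# object and its methods are invented stand-ins, not a real device API.
import time

def courtroom_feedback_loop(glasses, sentiment_model, interval_s=5.0):
    """Every few seconds: grab a frame, score the jury's mood, whisper a cue."""
    while glasses.is_recording():
        frame = glasses.capture_frame()             # built-in camera
        audio = glasses.capture_audio(interval_s)   # built-in microphone
        mood = sentiment_model.score(frame, audio)  # e.g. {"confused": 0.6, ...}
        if mood["confused"] > 0.5:
            glasses.speak("Jury looks confused; consider restating the point.")
        time.sleep(interval_s)
```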
Real-Time Sentiment Analysis
Sentiment analysis is a powerful tool that can determine the emotional tone behind words and expressions. Facial expression-based sentiment analysis relies on computer vision techniques to detect and interpret the emotions conveyed by a person's face. The process begins with face detection, where algorithms locate and isolate the face from the rest of the image or video frame. Next, facial landmark detection is applied to identify key points on the face, such as the corners of the eyes, nose, and mouth. These landmarks are then used to extract facial features, such as the position and shape of the eyebrows, the openness of the eyes, and the curvature of the lips. Machine learning models, often based on deep learning architectures like convolutional neural networks (CNNs), are trained on large datasets of labeled facial expressions to learn the associations between these facial features and specific emotions, such as happiness, sadness, anger, surprise, fear, and disgust. When a new facial image is presented to the system, it goes through the same process of face detection, landmark detection, and feature extraction, and the trained model predicts the most likely emotion based on the extracted features. This allows for real-time sentiment analysis in various applications, such as customer satisfaction monitoring, market research, and human-computer interaction.
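As a rough illustration of that pipeline, the sketch below uses OpenCV's bundled Haar cascade for face detection and a pretrained CNN for expression classification. The weights file `expression_cnn.h5`, the 48x48 input size, and the label order are all assumptions; explicit landmark detection and feature extraction are folded into the CNN here, as deep models commonly learn those features end to end from the cropped face.

```python
# A sketch of the face-detection-to-emotion pipeline described above.
# "expression_cnn.h5" is a hypothetical pretrained model; any CNN trained
# on a labeled facial-expression dataset would slot in here.
import cv2
import numpy as np
from tensorflow.keras.models import load_model

EMOTIONS = ["happy", "sad", "angry", "surprised", "fearful", "disgusted"]

face_detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
model = load_model("expression_cnn.h5")  # assumed pretrained weights

def classify_expression(frame_bgr):
    """Detect the largest face in a frame and predict its expression."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = face_detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])  # keep the largest face
    face = cv2.resize(gray[y:y + h, x:x + w], (48, 48)) / 255.0
    probs = model.predict(face.reshape(1, 48, 48, 1), verbose=0)[0]
    return dict(zip(EMOTIONS, probs.tolist()))
```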
In a courtroom setting, this capability could be revolutionary. By analyzing jurors' reactions, a lawyer could gain immediate feedback on how their arguments are being received. Are jurors engaged, confused, or skeptical? This information could help lawyers adjust their strategies on the fly, potentially increasing their chances of success. A notable example is Hume AI, which provides real-time sentiment analysis by listening to a speaker's voice. Of course, one might ask how that applies to jurors, who do not speak during trial. However, the OpenAI demo linked above showed sentiment analysis from facial expression alone: the user asked ChatGPT to guess how they felt while giving a great big grin to the camera. The expression in the demo made the answer obvious, but this technology will likely get better at reading subtler facial cues. A camera could record a juror's expressions throughout an entire trial, establishing a baseline expression precisely tailored to that individual and then flagging deviations from it. Considering that the basic building blocks for this technology are already commercially available, robust commercial sentiment-analysis systems may not be too far off.
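The baseline idea could look something like this in code: keep a running history of each juror's emotion scores and flag readings that drift well away from that juror's own norm. Everything here, from the 30-frame warm-up to the z-score threshold, is an illustrative assumption, and `classify_expression` refers to the sketch above.

```python
# A sketch of the per-juror baseline idea. Warm-up length and threshold
# are illustrative choices, not validated parameters.
import numpy as np

class JurorBaseline:
    """Track a running mean of one juror's emotion scores and flag outliers."""

    def __init__(self, emotions, z_threshold=2.0):
        self.emotions = emotions
        self.z_threshold = z_threshold
        self.history = []  # score vectors observed so far

    def update(self, scores):
        """Add a new observation; return emotions deviating from baseline."""
        vec = np.array([scores[e] for e in self.emotions])
        self.history.append(vec)
        if len(self.history) < 30:  # need enough frames to form a baseline
            return []
        past = np.stack(self.history[:-1])
        z = (vec - past.mean(axis=0)) / (past.std(axis=0) + 1e-8)
        return [e for e, dev in zip(self.emotions, z)
                if abs(dev) > self.z_threshold]
```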
AI-Powered Photo Analysis
Google's recent advancements in AI-powered photo analysis offer another layer of potential for legal analytics. Google Photos can now use AI to analyze and categorize images, providing detailed insights based on visual data. In Google's demo, a user asked the AI to show how their child's swimming had evolved over time; the AI collected images of the child swimming at different ages and in a variety of locations.
This capability could be adapted to analyze jurors' reactions captured through smart glasses, offering comprehensive feedback on their responses over time. Such analysis could help lawyers identify which arguments are most persuasive, refining their strategies throughout a trial. Of course, an attorney cannot watch the jurors at all times, so supplementary cameras would most likely be needed to cover the rest of the trial.
There are also interesting uses for this capability outside of sentiment analysis. For example, it could significantly reduce the time spent creating presentations for a case. One common way attorneys like to present evidence is through timelines. One could imagine a world where the user says to the AI, "Make me an analysis of how discrimination against my client has increased throughout her employment," and the AI does just that (a sketch of the idea follows below). Attorneys might also do the same for how a company's policies on an issue specific to their case have changed over the years.
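A toy sketch of that timeline idea, assuming case documents already carry dates and a model-assigned relevance score for the attorney's query; the `Evidence` structure and the sample entries are invented purely for illustration.

```python
# A sketch of the timeline idea: filter dated case documents by a
# model-assigned relevance score and lay them out chronologically.
from dataclasses import dataclass
from datetime import date

@dataclass
class Evidence:
    dated: date
    summary: str
    relevance: float  # hypothetical score from an embedding or LLM step

def build_timeline(evidence, min_relevance=0.5):
    """Return relevant items in chronological order for a presentation."""
    hits = [e for e in evidence if e.relevance >= min_relevance]
    return sorted(hits, key=lambda e: e.dated)

docs = [
    Evidence(date(2021, 3, 1), "Performance review praises client", 0.2),
    Evidence(date(2022, 6, 15), "Client passed over for promotion", 0.8),
    Evidence(date(2023, 1, 9), "Complaint filed with HR", 0.9),
]
for item in build_timeline(docs):
    print(f"{item.dated}: {item.summary}")
```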