
Future(connected) = IoT + AI + ML

July 14, 2025
When technology seamlessly integrates into our lives, it becomes the norm. We haven’t reached that point with AI yet, but the roadmap is clear and the waypoints identified. It will culminate in multimodal AI becoming the standard.

What you'll learn:

  • Why multimodal AI will move the technology from the early adopter phase to the early majority.
  • How autonomous systems, such as mobile robots, will combine multiple AI and ML models.
  • Approaches for implementing multimodal AI on resource-constrained edge hardware.

The pace of change in artificial intelligence (AI) is staggering. Advances arrive almost daily, and the availability of datasets and AI models is constantly expanding. In parallel, the hardware for training models on datasets and running those models to infer actionable insights is also developing rapidly.

The electronics industry provides the framework for these developments. When change happens at this scale, the entire industry reacts. We’re seeing that throughout the distribution industry, as more customers are eager to adopt AI and machine learning (ML) to enable their future products and services.

At a high level, the rapid expansion of AI and machine learning adoption is hugely important: Hugging Face alone now hosts more than 1.8 million AI models, and counting. The successful commercialization of any technology requires the momentum generated by a critical mass.

Many of the semiconductor manufacturing partners that work with distributors have recognized the direction of travel. These partners are collaborating to make AI and ML more accessible, using pre-trained models that can be deployed on their hardware platforms for evaluation or even large-scale deployment.

These models, kept in so-called “zoos,” are typically available at no cost, under open-source licenses, from repositories such as GitHub. Distributors work with suppliers to integrate these models into pre-compiled demonstration platforms that run on their hardware, “out of the box.”

Building Blocks of the Connected Future

Any new technology goes through the same phases. Initially, there’s academic interest (the innovator phase), where the user experience (UX) can be suboptimal. We then hit the early adopter phase, where the UX is refined enough to attract more attention, but the return on investment (ROI) has yet to be fully demonstrated. What that ROI looks like to you will differ, based on what you care about.

Refining the ROI for the masses takes us to the early majority phase. Re-spinning the ROI takes us to the late majority phase. Despite the huge uptake and demonstrable revenue being generated, AI is still at the early adopter phase. Multimodal AI will bring us into the early majority stage.

Multimodal AI is both inevitable and crucial to the future of the technology. In simple terms, multimodal means working with more than one type of data, or modality, at the same time. Today, most types of AI, like large language models, use just one modality. For chatbots, that modality is typically text. For facial recognition, the modality is images. Clearly, the data for each is represented differently, and the model needs to understand what it’s dealing with.

We’re also now seeing the emergence of agentic AI systems. Agentic refers to a system that comprises separate AI agents working collaboratively and with minimal human direction to achieve a task. For this reason, they’re also known as task-driven AI systems. An AI agent is an instantiation of an AI model trained for a specific task, such as facial recognition, and empowered to operate autonomously.

AI agents will be important building blocks for constructing autonomous systems. In turn, autonomous systems will serve as a waypoint, guiding us toward a future where AI seamlessly integrates into our lives.

AI for Physical Autonomy

Physically autonomous systems also rely on multiple modalities. Those systems can already be described as multimodal, but it’s probable that the subsystems have limited interaction. For example, an autonomous mobile robot (AMR) may use a LiDAR system to navigate a factory floor and, separately, indoor position sensing to understand its location in the building. To truly describe that as a multimodal AI solution, the outputs of those two subsystems would need to be processed together by AI.

In our AMR example, we might use the output of the navigation subsystems, combined with other sensor data, as the input to a multimodal AI model. As a separate function, that model could infer from the data presented the probability that human operatives are in the area. The AMR’s autonomy increases because multimodal AI processes data originally generated purely for navigational purposes.

Achieving multimodal AI presents clear challenges. The most obvious is that AI models normally operate on one type of data; in our example, the model would need to understand two types of sensor data. In a true multimodal AI system, each navigation subsystem would use its own single-modality AI model, with a third model that understands the outputs of the first two.
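
As a rough illustration of that three-model arrangement, the Python sketch below wires two placeholder single-modality models into a simple fusion step. Every class name, feature vector, and weight here is a hypothetical stand-in, not a reference to any particular vendor's models or runtime.

```python
# Hypothetical sketch of the three-model arrangement: two single-modality
# models plus a fusion model that interprets their combined outputs.
import numpy as np

class LidarNavModel:
    """Single-modality stand-in: consumes a LiDAR scan, returns a feature vector."""
    def infer(self, scan: np.ndarray) -> np.ndarray:
        # A trained network would go here; we just summarize the scan.
        return np.array([scan.mean(), scan.std(), scan.min()])

class IndoorPositionModel:
    """Single-modality stand-in: consumes indoor-positioning readings."""
    def infer(self, readings: np.ndarray) -> np.ndarray:
        return np.array([readings.mean(), readings.max()])

class FusionModel:
    """Third model: understands the outputs of the first two and infers the
    probability that human operatives are in the area."""
    def __init__(self, weights: np.ndarray, bias: float):
        self.weights, self.bias = weights, bias

    def infer(self, nav_feat: np.ndarray, pos_feat: np.ndarray) -> float:
        x = np.concatenate([nav_feat, pos_feat])
        return float(1.0 / (1.0 + np.exp(-(x @ self.weights + self.bias))))  # sigmoid

# Wire the three models together, as in the AMR example.
lidar, indoor = LidarNavModel(), IndoorPositionModel()
fusion = FusionModel(weights=np.array([0.4, -0.2, 0.1, 0.3, 0.05]), bias=-1.0)
p_human = fusion.infer(lidar.infer(np.random.rand(360)), indoor.infer(np.random.rand(8)))
print(f"Probability of human operatives nearby: {p_human:.2f}")
```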

Even this relatively simple example may need three AI models running concurrently. The processing resources required to execute just one AI model are significant, particularly at the edge of the network.

Implementing Multimodal AI at the Edge

For embedded and connected systems, multimodal AI will include ML. Since AI’s ascendancy, machine learning has become less prominent. This will change as multimodal approaches build momentum, because ML offers a (relatively) lightweight way to implement trained models on constrained devices.

More relevantly, ML is optimized for specific tasks. Today, these predominantly relate to analyzing the data produced by small sensors. Predictive maintenance is often cited as the “killer app” for ML at the edge, and there’s evidence to support that, but we’re also seeing increased interest around time-series data for other purposes. When we use ML to analyze time-series data for specific events, those observations can form part of a larger multimodal AI solution.
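
As a simplified illustration of that idea, the sketch below flags unusual samples in a vibration signal with a rolling z-score. The signal and threshold are invented for the example; in practice, a trained ML model would typically do this screening, with the flagged events feeding a larger multimodal solution.

```python
# Hypothetical sketch: lightweight time-series event detection at the edge.
# Flagged events could become one input modality for a larger multimodal model.
import numpy as np

def detect_events(signal: np.ndarray, window: int = 64, z_thresh: float = 3.0) -> list[int]:
    """Return indices of samples whose rolling z-score exceeds the threshold."""
    events = []
    for i in range(window, len(signal)):
        history = signal[i - window:i]
        mu, sigma = history.mean(), history.std() + 1e-9
        if abs(signal[i] - mu) / sigma > z_thresh:
            events.append(i)
    return events

# Simulated vibration data with an injected fault-like excursion.
rng = np.random.default_rng(0)
vibration = rng.normal(0.0, 1.0, 2000)
vibration[1500:1520] += 8.0

events = detect_events(vibration)
print(f"{len(events)} anomalous samples detected, first at index {events[0] if events else 'n/a'}")
```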

Achieving viable multimodal AI could take several forms:

  • Developing models that are trained on more than one data type
  • Running more models (using multiple single-core processors, or multicore processors)
  • Developing simpler models that can run concurrently on single-core processors
  • Cascading multiple AI or ML models on the same hardware platform
  • Combining multiple hardware-based ML solutions

There are examples of multimodal AI models that understand more than one type of data, but they have proportionately more parameters and work with larger token counts than single-modality models. That may be fine when using cloud processing, but the models would need to be optimized to run on the limited resources of edge-based solutions, such as our AMR example.
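
Quantization is one common optimization step (assumed here for illustration, since the optimization path varies by toolchain). The simplified sketch below maps float32 weights to int8 with a per-tensor scale, roughly quartering the memory footprint; production compilers and runtimes do considerably more than this.

```python
# Hypothetical sketch of post-training weight quantization (float32 -> int8).
import numpy as np

def quantize_int8(weights: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor quantization: int8 values plus a float scale."""
    scale = float(np.abs(weights).max()) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(256, 256).astype(np.float32)   # stand-in weight tensor
q, scale = quantize_int8(w)
err = np.abs(w - dequantize(q, scale)).mean()
print(f"fp32: {w.nbytes // 1024} KB, int8: {q.nbytes // 1024} KB, mean abs error: {err:.4f}")
```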

In these applications, it’s more common to see a single application-class processor (typically based on an Arm Cortex-A core) paired with one or more microcontrollers (often based on Arm Cortex-M).

Leading semiconductor suppliers are already making strategic acquisitions to strengthen their offerings in these areas. And that includes both hardware and software acquisitions. More development environments now support time-series data analysis for model training. There are also examples of how engineers can use these tools to develop multimodal solutions.

Practical Examples of Multimodal AI

In the list above, the last two options are perhaps the most technically interesting. Cascading is the AI equivalent of signal conditioning and pre-processing: it uses smaller models to make general assessments of the data, triggering larger models only when necessary.

This is very similar to how some microcontrollers use autonomous peripherals to monitor hardware when the core is in a deep-sleep state. The peripheral generates an interrupt to wake the main processing core only when it’s needed, reducing system power without sacrificing performance. Cascading ML models follow the same principle.
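
The sketch below captures that cascading pattern in a deliberately simplified form: a tiny, always-on gate screens every input and only hands off to a larger model when something looks interesting. Both functions are placeholders standing in for trained networks.

```python
# Hypothetical sketch of cascaded models: a cheap gate screens every frame,
# and the expensive model runs only when the gate fires (much like a
# peripheral waking a sleeping core).
import numpy as np

def gate_model(frame: np.ndarray) -> bool:
    """Tiny always-on check: is there enough activity to wake the big model?"""
    return frame.std() > 0.5

def large_model(frame: np.ndarray) -> str:
    """Expensive classifier, invoked only when the gate fires."""
    return "person" if frame.mean() > 0.0 else "background"

rng = np.random.default_rng(1)
# 50 frames of sensor data; every tenth frame carries real activity.
frames = [rng.normal(0.0, 1.0 if i % 10 == 0 else 0.1, 1024) for i in range(50)]

results = [large_model(f) for f in frames if gate_model(f)]
print(f"Large model ran on {len(results)} of {len(frames)} frames")
```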

Running algorithms on microcontrollers is the norm for ML at the edge. Some manufacturers are taking this a step further, embedding AI or ML directly in the sensor. We have seen examples that include vision sensors with built-in AI processors, and MEMS sensors with additional logic cores designed to execute trained ML models. The potential demand for multimodal AI will undoubtedly accelerate this trend.

Generative AI is the type of AI most people will be familiar with. Moving generative AI to the edge is a focus for many in the industry. The next step will be to make those edge-based systems multimodal.

Conclusion

An inexorable demand is building for deploying multimodal AI at the edge. We have the building blocks, the hardware, and, with middleware platforms like /IOTCONNECT on AWS, the cloud infrastructure.

Business cases are emerging more rapidly than most expected. We have passed the proof-of-concept stage, and the commercial opportunities that multimodal AI creates are real. The momentum created by AI over the last few years continues to gain speed, and distributors are prepared to support their customers through these advances.

About the Author

Jennifer Skinner-Gray | Senior Director, Supplier Technology Enablement, Avnet

Jennifer Skinner-Gray is senior director of supplier technology enablement for Avnet, leading the company’s IoT strategy. Skinner-Gray drives cloud enablement and secure connectivity strategy to solidify Avnet’s software and services market approach. 
