Multimodal Browser Extends Mobile Applications

Oct. 1, 2002

Now that text services are becoming ubiquitous on mobile devices, complaints are growing about "triple tapping" and trying to write PDA-style script. Customers simply don't want the hassle. Yet most computing applications, including those finding their way onto PDAs, were spawned from keyboard-reliant PCs. As a result, designers feel compelled to mimic that form of text entry somehow.

According to IBM (www.ibm.com) and Opera (www.opera.com), a more user-friendly approach can be found in multimodal technology. Together, the two companies are developing a multimodal browser based on the XHTML+Voice (X+V) specification. The beta version of the browser provides access to both Web and voice information from a single mobile device.

Multimodal technology accepts the interchangeable use of multiple forms of input and output within the same interaction. Examples include voice commands, keypads, and styluses. It therefore lets end users interact with the technology in the ways that are most appropriate to their situations. Multimodal applications can bring tremendous benefits to field-force automation, for instance. Off-site workers could vocally request inventory information from the factory floor and have hands-free access to information. Any information requested could be sent back to them as either text or graphics.
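
To make the idea of interchangeable input concrete, here is a minimal Python sketch of a single interaction that accepts voice, keypad, or stylus input for any part of a request. It is an illustration only: the names `InputEvent`, `Interaction`, and `respond`, and the sample part and site values, are hypothetical and do not come from X+V, the IBM toolkit, or the Opera browser.

```python
from dataclasses import dataclass, field


@dataclass
class InputEvent:
    """One piece of user input (hypothetical type for illustration)."""
    mode: str   # "voice", "keypad", or "stylus"
    slot: str   # which part of the request this input fills
    value: str


@dataclass
class Interaction:
    """A single request whose slots may be filled through any input mode."""
    required: tuple
    slots: dict = field(default_factory=dict)
    modes_used: list = field(default_factory=list)

    def accept(self, event: InputEvent) -> None:
        # The input mode does not matter; only the slot and value do.
        if event.slot not in self.required:
            raise ValueError(f"unknown slot: {event.slot}")
        self.slots[event.slot] = event.value
        self.modes_used.append(event.mode)

    def complete(self) -> bool:
        return all(name in self.slots for name in self.required)


def respond(interaction: Interaction, output_mode: str) -> str:
    """Summarize the finished request, labeled with the chosen output mode."""
    if not interaction.complete():
        raise ValueError("interaction is missing required input")
    summary = ", ".join(
        f"{name}={interaction.slots[name]}" for name in interaction.required
    )
    return f"{output_mode}: {summary}"


if __name__ == "__main__":
    # An off-site worker speaks the part name, then taps the site with a stylus.
    query = Interaction(required=("part", "site"))
    query.accept(InputEvent("voice", "part", "valve-7"))
    query.accept(InputEvent("stylus", "site", "plant-2"))
    print(respond(query, "text"))
```

The point of the sketch is that the request is assembled from slots rather than from a fixed input sequence, so a user can switch modes mid-request without starting over.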

This project builds upon the two companies' ongoing relationship. Last year, IBM, Motorola, and Opera submitted the multimodal standard X+V to the standards body W3C. That markup language leverages existing standards that are already familiar to voice and Web developers. Those developers can therefore use their current skills and resources to extend existing applications, instead of building new ones from scratch.

The browser partnership with Opera also comes on the heels of IBM's release of its multimodal toolkit for developers, as well as the planned addition of multimodal capabilities to its WebSphere Everyplace Access (WEA). Built on IBM's WebSphere Voice Toolkit, the multimodal toolkit will contain a multimodal editor, reusable blocks of X+V code, and a simulator for testing applications. In the multimodal editor, developers can write both XHTML and VoiceXML in the same application. The toolkit also adds Eclipse-based plug-ins to a Web developer's existing WebSphere Studio development environment.

Clearly, IBM is leveraging its resources and partnerships to mold technology into a more malleable form, one that adapts to users instead of the other way around. At the same time, this multimodal development is furthering speech technologies.