A New Paradigm of Visual Interaction in the AI Era
Foreword
I've been pondering: in the age of artificial intelligence, what form will the "tools" we're so accustomed to take?
Currently, the most common applications of artificial intelligence are chat interfaces that interact through text-based question and answer. As the technology advances, Vercel has launched Generative UI, which dynamically generates customized, interactive user interfaces based on the content and needs of a conversation. What will the future of UI look like? Will it evolve into the cool holographic displays and intuitive HUDs seen in movies? Or might we enter a new era that no longer relies on traditional GUIs at all?
If natural language becomes the fundamental interface to every piece of software, it might mean the end of GUIs.
Random Thoughts
Will I be able to simply say "Siri, book me a plane ticket"? Or, as in the brain-computer interface experiments Musk's Neuralink is running, will a thought alone be enough to transmit information? Will information be displayed directly in front of my eyes (could that mean blind people would see too?), with no external device at all? Content could also be controlled with fingers and eye movements, as with VR headsets. Perhaps the future really will look like Black Mirror.
Research
AI Pin
The Ai Pin launched by Humane is a pioneer in this field. The device attaches magnetically to clothing and has no screen; it is controlled by voice and a small touch surface. A laser can project an image onto your palm, and a camera recognizes gestures for navigation.
Its current and potential functions: laser-projected display, gesture and voice interaction, an AI assistant, real-time translation, health monitoring, music playback, message notifications, etc.

Rabbit
Rabbit R1 is an AI device launched by Rabbit, positioned as a pocket companion. It consists mainly of a screen, a camera, a scroll wheel, and a button.
Its current and potential functions: understanding and executing tasks (hailing a taxi, ordering food, playing music), real-time translation, a teach mode, etc. You can view a review video.

Dot
Dot is a chat application on iOS that lets users send text, voice memos, images, and PDF files; it can also search the web for you. Dot currently communicates through text and aims to be an always-available companion. Unlike most AI chatbots, Dot remembers what you've said before and rarely forgets anything.

Amazon Lex
Amazon Lex V2 is an AWS service that lets developers build conversational interfaces for applications, supporting both voice and text input. Users interact in natural language, for example to place an order. In addition, Lex V2 can mix web elements (buttons, selection boxes, etc.) into the dialogue, solving user problems or guiding them through specific tasks via semi-guided conversations.
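As a rough sketch of how this mixing of text and widgets looks in practice (the bot IDs are placeholders, and the reply shape is simplified): a Lex V2 bot can be queried from Python with boto3's `lexv2-runtime` client, and card-style messages in the reply carry the buttons that get rendered as clickable choices alongside the conversation.

```python
def ask_lex(text, session_id, bot_id, bot_alias_id, locale="en_US"):
    """Send one user utterance to a Lex V2 bot (needs AWS credentials)."""
    import boto3  # imported lazily so the helper below runs without AWS
    client = boto3.client("lexv2-runtime")
    return client.recognize_text(
        botId=bot_id, botAliasId=bot_alias_id, localeId=locale,
        sessionId=session_id, text=text,
    )

def extract_buttons(messages):
    """Collect button labels from ImageResponseCard messages, i.e. the
    clickable choices Lex mixes into an otherwise textual conversation."""
    buttons = []
    for msg in messages:
        if msg.get("contentType") == "ImageResponseCard":
            card = msg.get("imageResponseCard", {})
            buttons.extend(b["text"] for b in card.get("buttons", []))
    return buttons

# A simplified example of the message list a Lex reply might contain:
reply = [
    {"contentType": "PlainText", "content": "What size pizza would you like?"},
    {"contentType": "ImageResponseCard",
     "imageResponseCard": {"title": "Pick a size",
                           "buttons": [{"text": "Small", "value": "S"},
                                       {"text": "Large", "value": "L"}]}},
]
print(extract_buttons(reply))  # ['Small', 'Large']
```

The point of the sketch is the hybrid: the same reply carries free-form text and structured buttons, so the client can render whichever suits the user.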
Summary
The examples above are not necessarily representative, but they do illustrate the overall trend:
- The hardware carrier is changing. Beyond phones and computers, people are trying to present content on more portable devices such as necklaces, badges, and pager-like gadgets.
- Visual presentation is diversifying, e.g. projection and laser displays. Because of size constraints, the dominant forms are text, simple charts, and buttons.
- Natural language interaction will be used more and more widely.
My Thoughts
Generative UI is the Future Trend
Today an interface design must satisfy as many people as possible, and any experienced designer knows the main drawback of that approach: you can never completely satisfy anyone. Personalization and customization play only a small role.
This scenario is somewhat similar to the dynamic personalized cards on Alipay's home screen, which change with location (showing airport-related information when you are at the airport).

Moreover, generative UI is not limited to dynamic cards; it can dynamically generate an entire system for you based on personal data. The article Generative UI and Outcome-Oriented Design gives examples of what true personalization could look like:
- For users with dyslexia, the application displays special fonts and color contrasts
- For users who care about cost and time, it displays these prominently and ranks matching flights higher
- Though it is the same application, the interface, features, and more are all personalized
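A toy sketch of that idea (the profile fields, font name, and UI options here are all invented for illustration, not taken from the article): the same flight-search screen is derived from a user profile rather than fixed in advance.

```python
from dataclasses import dataclass, field

@dataclass
class UserProfile:
    dyslexia: bool = False  # hypothetical accessibility flag
    priorities: list = field(default_factory=list)  # e.g. ["price"]

def personalize_ui(profile: UserProfile) -> dict:
    """Derive one user's interface settings from their profile,
    instead of shipping a single universal design to everyone."""
    ui = {"font": "default", "contrast": "normal",
          "sort_flights_by": "relevance"}
    if profile.dyslexia:
        ui["font"] = "OpenDyslexic"  # a dyslexia-friendly typeface
        ui["contrast"] = "high"
    if "price" in profile.priorities:
        ui["sort_flights_by"] = "price"  # rank cheap flights higher
    return ui

print(personalize_ui(UserProfile(dyslexia=True, priorities=["price"])))
# {'font': 'OpenDyslexic', 'contrast': 'high', 'sort_flights_by': 'price'}
```

In a real generative-UI system the rules would not be hand-written like this; a model would produce the configuration, with the designer's job shifting to defining the space of allowed outputs, which is exactly the boundary-setting role described below.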

Designers will also shift from universal design to personalized design, setting boundaries for the various personalized scenarios and so constraining the artificial intelligence.
Traditional Interaction Will Not Be Replaced, But Will Be Greatly Simplified
Although natural language interaction has its advantages, visual elements such as buttons, icons, and gestures remain necessary wherever fast, accurate input matters; pure voice input can reduce efficiency, and not every task is well suited to a spoken conversation.
Different interaction methods fit different situations. The coexistence of visual and voice input/output best satisfies accessibility principles, letting people with disabilities choose the method that suits them. Once LLMs can complete specific tasks on their own, interfaces will be greatly simplified.
In addition, natural language interaction is not appropriate everywhere: in an office or a library, speaking aloud is disruptive and can leak private information.
Final Thoughts
Although I wanted to think as expansively as possible, I ended up back at the GUI itself, merely imagining the future form of existing products...
References
- ScreenAI: A visual language model for UI and visually-situated language understanding
- The AI Device Revolution Isn’t Going to Kill the Smartphone
- Meet Dot, an AI companion designed by an Apple alum, here to help you live your best life
- Malleable software in the age of LLMs
- Generative UI and Outcome-Oriented Design
- UFO: A UI-Focused Agent for Windows OS Interaction
- Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs