MOSCOW, March 15, by Vladislav Strekopytov. Researchers at Osaka University have developed a neural network capable of reconstructing the image a person is currently looking at. By analyzing functional MRI data, the system quite accurately reproduces not only the shapes but also the colors of objects. Scientists are already talking about the world's first mind-reading machine.
Computer eye
The potential applications of this promising computer vision technology are extremely broad: from communication for paralyzed people to recording human dreams and studying how various animals perceive the world around them.
The Japanese researchers made use of Stable Diffusion, a popular text-to-image generator. This open-source neural network is similar in principle to other generative text-to-image models, such as DALL-E 2 from OpenAI (the creator of the ChatGPT chatbot) or Midjourney.
It is based on diffusion, a machine-learning technique in which a visual image is formed by successive approximation: starting from noise, each new iteration refines the picture under the guidance of the text prompt.
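To make the principle concrete, here is a deliberately simplified sketch in Python. It has nothing to do with Stable Diffusion's real architecture: the "text encoder" and "denoising step" are toy stand-ins, illustrating only how an image that starts as pure noise is pulled, step by step, toward a prompt-defined target.

```python
# Toy illustration of the diffusion principle: start from noise and refine the
# "image" over many iterations, each guided by the same text-prompt embedding.
# This is NOT Stable Diffusion; the encoder and update rule are stand-ins.
import numpy as np

rng = np.random.default_rng(0)

def embed_prompt(prompt: str, size: int = 64) -> np.ndarray:
    """Stand-in for a text encoder: map a prompt to a deterministic target pattern."""
    seed = sum(map(ord, prompt)) % (2**32)
    return np.random.default_rng(seed).normal(size=(size, size))

def denoise_step(image: np.ndarray, guidance: np.ndarray, strength: float) -> np.ndarray:
    """One iteration of successive approximation toward the guided target."""
    return image + strength * (guidance - image)

prompt_target = embed_prompt("a red sports car on a coastal road")
image = rng.normal(size=prompt_target.shape)      # start from pure noise

for _ in range(50):                               # successive approximation
    image = denoise_step(image, prompt_target, strength=0.1)

print("residual distance to target:", np.linalg.norm(image - prompt_target))
```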
The Japanese researchers added an extra training stage to the standard Stable Diffusion pipeline. The neural network compared the brain-activity data of four experiment participants, who were shown various photographs, with text descriptions of those images.
The initial signals were functional magnetic resonance imaging (fMRI) data obtained on powerful scanners with a magnetic field strength of 7 tesla. By recording the flow of oxygen that working neurons consume, these devices can track which areas of the brain, responsible for particular sensations or emotions, are most active.
During the machine-learning stage, ten thousand images were shown to the participants while the system collected the resulting fMRI patterns, which were then decoded by the artificial intelligence. Some of the brain scans were held back and later used as a test task for the machine.
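The article does not describe the decoder itself, but a common way to implement such a training stage is a simple linear (ridge) regression from fMRI voxel responses to the feature vectors the image generator understands. The sketch below uses synthetic data and made-up dimensions purely for illustration; it is not the Osaka team's actual model.

```python
# Hypothetical sketch of an fMRI decoder: ridge regression from voxel responses
# to generator features. Synthetic data; sizes are scaled down for illustration
# and do not reflect the real experiment.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_images, n_voxels, n_features = 2_000, 1_000, 64

# Simulate fMRI patterns and the image features they (noisily) encode.
true_mapping = rng.normal(size=(n_voxels, n_features))
voxel_responses = rng.normal(size=(n_images, n_voxels))
image_features = voxel_responses @ true_mapping
image_features += rng.normal(scale=0.1 * np.abs(image_features).mean(),
                             size=image_features.shape)

# Hold back part of the scans as a test set, as described above.
X_train, X_test, y_train, y_test = train_test_split(
    voxel_responses, image_features, test_size=0.2, random_state=0)

decoder = Ridge(alpha=1.0).fit(X_train, y_train)
print("held-out R^2:", decoder.score(X_test, y_test))
```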
Brain Synergy
After analyzing the activity peaks recorded by fMRI in different areas of the brain, the scientists found that the temporal lobes are responsible for the content of the image; this is the so-called semantic zone. The occipital lobe, which houses the visual cortex, recreates the size and general arrangement of objects.
The results were generally consistent with the hypothesis of two streams of visual information, formulated in 1983 by the American neuropsychologist Mortimer Mishkin. He suggested that the cerebral cortex contains two anatomically and functionally distinct channels for processing spatial and object information: "Where?" and "What?".
In experiments on rhesus monkeys, Mishkin showed that the occipital (dorsal) "Where?" pathway is responsible for the perception of space, while the temporal (ventral) "What?" pathway, closely associated with memory, handles recognition.
The Japanese researchers combined the visual and semantic information. The diffusion algorithm matched the patterns of neural activity observed while viewing the photographs against patterns in the training dataset. Signals from the "visual" zone of the cortex were used to build the overall layout and perspective; then prompts from the semantic-signal decoder were brought in, and the initial picture, which looked more like static on a TV screen, gradually took on the outlines of recognizable objects.
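The following toy sketch illustrates this two-stream combination under the same simplifying assumptions as the earlier snippet: a coarse layout stands in for the signal decoded from the visual cortex, and a prompt-like target stands in for the semantic decoder's output. The real pipeline operates in Stable Diffusion's latent space, not on raw pixel grids.

```python
# Toy sketch of the two-stream combination: a coarse, noisy layout decoded from
# the "visual" cortex is refined under guidance from the "semantic" decoder.
# Purely illustrative; not the actual Osaka pipeline.
import numpy as np

rng = np.random.default_rng(1)
size = 64

# Stage 1: coarse layout from the visual-cortex signal (rough position and extent).
coarse_layout = np.zeros((size, size))
coarse_layout[16:48, 16:48] = 1.0
coarse_layout += rng.normal(scale=0.5, size=coarse_layout.shape)   # "TV static"

# Stage 2: semantic guidance (stand-in for the decoded text-prompt target).
semantic_target = np.zeros((size, size))
semantic_target[24:40, 20:44] = 1.0

image = coarse_layout.copy()
for _ in range(40):
    # Each iteration pulls the noisy layout toward the semantically guided target.
    image += 0.1 * (semantic_target - image)

print("correlation with semantic target:",
      np.corrcoef(image.ravel(), semantic_target.ravel())[0, 1])
```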
The scientists obtained about a thousand pictures, which matched the originals in meaning and content with roughly 80 percent accuracy. In most cases the AI even reproduced the color palette of the original image.
"The presented article demonstrates that the Stable Diffusion neural network can accurately reconstruct images from fMRI scans, which effectively allows people's minds to be read," notes the summary, which itself was written by a generative chatbot.
"We show that our method, based on human brain activity, is able to reconstruct images with sufficient resolution and high semantic accuracy," the researchers themselves specify.
Active Helpers
The authors of the work emphasize that their model is universal and does not require fine tuning to the brain of a particular individual. The algorithm interprets not only the activity in the "visual" cortex, which is responsible for the perception of shape and color, but also the processes in the neighboring "semantic" zone of the brain, where the visual cortex meets the auditory cortex and where the meanings of words are encoded.
So far, however, the test samples have come from brain scans of the same four participants on whose data the machine was trained. In other words, the semantic decoder was tuned to recognize specific, previously learned signals.
In addition, the subjects were actively engaged in the experiment. While the fMRI scanner imaged their brains, they mentally "narrated" the picture, describing in words everything they saw. For the machine, these were additional cues.
The Secret of Portrait Likeness
The Japanese have achieved a lot. But this is far from the first attempt to combine the capabilities of AI and modern high-precision devices that read brain signals.
Last year, researchers from the Netherlands showed two volunteers photographs of human faces while scanning their brains in an fMRI machine. AI processed the recorded data and reconstructed the original images. The likeness was such that experts questioned the authenticity of the experiment.
The secret lies in the thorough pre-training of the AI. At the initial stage, the same volunteers viewed digital images of faces on a computer screen, and the system tracked the neurons' responses "pixel by pixel," translated them into computer code, and reassembled the portraits.
The authors emphasize that for the main test they selected photos that neither the subjects nor the neural network had ever seen before.
There is no doubt that computer vision systems based on fMRI and AI have a great future.
"We are already developing camera implants for the brains of people who have gone blind through illness or accident, so that they can see again," Tirza Dado, a cognitive biologist at Radboud University, said in an interview with the Daily Mail. "These technologies will also be useful for certain clinical applications, such as communicating with patients in a deep coma."
In the next stage of research, scientists want to try to decipher and recreate the subjective experiences of the test subjects, their memories and even dreams. But real mind reading is still a long way off.