Machine Heart original report on large models
Editors: Xiaozhou, Zenan
It looks like users have finally broken Gemini.
At the end of last year, Google's Gemini shocked the industry. Billed as Google's "largest, most capable, and most versatile" AI system, it was claimed to be the first natively multimodal large model, with capabilities surpassing GPT-4, and was widely seen as Google's weapon for counterattacking Microsoft and OpenAI.
Accordingly, on February 8, Google renamed Bard, the service it had positioned against ChatGPT, to Gemini, highlighting its new mission: providing access to the "strongest family of models". Last week, Google also…
Yet less than a month after launch, Gemini has gone badly off the rails.
How outrageous is it? This is the "Elon Musk" that Gemini, a multimodal generative model, produces:
The face and expression are lifelike, but one big question remains: why is Musk depicted as Black?
Another user asked Gemini to draw German leaders from the 1940s, and the AI produced this:
Netizens on social networks also shared samples of Gemini-generated Vikings and popes. One shows an Asian woman dressed as the pope, yet every pope in history has been a man.
In short, many users of the portrait-generation feature found that Gemini seemed to refuse to depict white people, producing numerous images that contradict basic facts of gender, race, and religion.
In the past, most image-generation models were criticized for producing overwhelmingly white faces; has Gemini simply overcorrected?
In Reddit's meme threads, netizens joined in on the fun, for example asking Gemini to generate "Iron Man" actor Robert Downey Jr.:
You tell us whether it's a good likeness.
However, Gemini's bias does not always point the same way; sometimes the people in its images turn white instead. Asked for U.S. Supreme Court Justice Clarence Thomas, Gemini produced this:
In reality, Thomas is African-American.
A photo of the real Clarence Thomas.
Why did Gemini's bias flip 180 degrees when the profession in question was a judge?
These images, some genuine and some not, spread across social media like a virus, and Musk's own attention amplified the incident further. He remarked sternly that Google had "taken it too far" with text-to-image generation.
As the pile-on grew, many AI experts began to weigh in. Turing Award winner Yann LeCun said today that he had seen this coming.
He recalled that as early as four years ago, his comments on GAN-based portrait super-resolution were fiercely attacked, even though it is an obvious fact that image-reconstruction tasks are heavily biased by the statistics of the training dataset.
LeCun also cited "Studying Bias in GANs Through the Lens of Race", a paper presented at the computer-vision conference ECCV 2022, which found that the behavior of generative image models is shaped by the racial composition of their training data.
The study showed that the racial composition of generated images inherits that of the training data, and that generated images differ in both race and perceived quality: annotators consistently preferred the AI-generated images of white faces.
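The kind of audit described in that paper can be pictured as comparing the demographic statistics of the training set against those of a batch of generated samples. The sketch below is a minimal illustration of that idea, not the authors' code; `generator` and `race_classifier` are hypothetical stand-ins:

```python
# Hypothetical audit: does a generator's output inherit the
# demographic composition (and skew) of its training data?
from collections import Counter

def composition(images, race_classifier):
    """Fraction of images assigned to each predicted group."""
    labels = [race_classifier(img) for img in images]
    counts = Counter(labels)
    total = len(images)
    return {group: n / total for group, n in counts.items()}

def audit(train_images, generator, race_classifier, n_samples=10_000):
    train_stats = composition(train_images, race_classifier)
    generated = [generator.sample() for _ in range(n_samples)]
    gen_stats = composition(generated, race_classifier)
    # A generator that "inherits" dataset bias will show gen_stats
    # closely tracking train_stats, skew included.
    return train_stats, gen_stats
```

If `train_stats` is heavily skewed toward one group, the paper's finding predicts `gen_stats` will be skewed the same way, which is exactly the bias that naive image generators were criticized for.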
LeCun also retweeted a post by Perplexity AI CEO Aravind Srinivas, who argued that while biased data does distort model output, Google swung too far in the opposite direction and ended up botching Gemini.
Google: We were wrong and promise to improve
Under heavy pressure, Google acknowledged the problems with Gemini's image generation on Thursday.
The following is the latest response to the Gemini image fiasco from Prabhakar Raghavan, Google's Senior Vice President of Knowledge and Information:
Three weeks ago, we launched a new image-generation feature in the Gemini conversational app (formerly known as Bard), including the ability to create images of people.
Clearly, this feature missed the mark. Some of the generated images were inaccurate, even offensive. We are grateful for users' feedback, and we are sorry the feature did not work well.
We acknowledge the mistake, have paused image generation of people in Gemini, and are working on an improved version.
Google said the Gemini conversational app is a specific product, separate from Google Search, the company's underlying AI models, and its other products, and that its image-generation feature is built on the Imagen 2 model.
When building the image-generation feature for Gemini, Google tuned it to avoid traps that image-generation technology has fallen into in the past, such as creating violent or sexually explicit images, or depictions of real people.
Because Google's users come from all over the world, the company wants Gemini to serve everyone well. When asking for images of people, users likely don't want to see only a single ethnicity (or any other single characteristic).
And if you prompt Gemini for a specific type of person, such as "a Black teacher in a classroom" or "a white veterinarian with a dog", or for people in a specific cultural or historical context, you should absolutely get a response that accurately reflects your request.
So what went wrong with Gemini?
In short, two things. First, Google's tuning to ensure that Gemini shows a range of people failed to account for cases that clearly should not show a range. Second, over time the model became more cautious than its developers intended and began refusing to answer certain prompts entirely, wrongly interpreting innocuous prompts as sensitive.
Together, these two issues caused the model to overcompensate in some cases and be over-conservative in others, producing the errors seen in Gemini's image generation.
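Google has not published Gemini's actual pipeline, but the two failure modes are easy to picture as two independent layers wrapped around a text-to-image model. The sketch below is purely hypothetical; the trigger words, `DIVERSITY_SUFFIX`, and refusal list are all invented stand-ins:

```python
# Speculative sketch, NOT Google's actual pipeline: how a prompt
# "diversity rewriter" plus a sensitivity filter, bolted onto a
# text-to-image model, can produce both failure modes at once.

HISTORICAL_CUES = {"1940s", "viking"}                 # toy refusal list
DIVERSITY_SUFFIX = ", showing people of diverse ethnicities and genders"

def rewrite_prompt(prompt: str) -> str:
    # Failure mode 1: the diversity rewrite fires unconditionally,
    # with no check for historical or factual context.
    if any(w in prompt for w in ("person", "people", "portrait")):
        return prompt + DIVERSITY_SUFFIX
    return prompt

def is_sensitive(prompt: str) -> bool:
    # Failure mode 2: an over-broad filter misreads benign prompts
    # as sensitive and refuses them outright.
    return any(cue in prompt.lower() for cue in HISTORICAL_CUES)

def generate(prompt: str) -> str:
    if is_sensitive(prompt):
        return "REFUSED: prompt classified as sensitive"
    return f"IMAGE<{rewrite_prompt(prompt)}>"

print(generate("portrait of a medieval pope"))  # over-diversified output
print(generate("people at a viking feast"))     # over-cautious refusal
```

In this toy version, the remedy Google describes would amount to gating the rewrite on context (skipping it where historical accuracy matters) and narrowing the refusal list, rather than removing either layer.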
Google said, "This is not our original intention. We don’t want Gemini to refuse to create the image of any particular group. We don’t want it to create inaccurate historical images or any other images. Therefore, we have turned off the function of generating people’s images and will try to improve it before turning it back on. This process will include extensive testing. 」
One thing to keep in mind: Gemini is a creativity and productivity tool, and it may not always be reliable. It can make mistakes, especially when generating images or text about current events, developing news, or hot-button topics. Hallucination is, famously, a challenge faced by all large language models (LLMs), and mitigating it takes continuous effort.
We cannot guarantee that Gemini will never produce embarrassing, inaccurate, or offensive results, but we can guarantee that we will act whenever a problem is found. AI is an emerging technology, helpful in many ways and full of potential, and we are doing our best to advance it safely and responsibly.
Gemini has carried high expectations within Google despite earlier criticism, from accusations that its launch demo was staged to claims that it was trained on output from Baidu's ERNIE Bot. Now the problems with its generated content have left a very bad impression, and it is unclear how the damage can be repaired.
On the other hand, the episode arguably vindicates OpenAI's emphasis on safety and the foresight behind its Red Teaming Network.
Can a large model like Gemini fix a flaw like this quickly?
Reference:
https://blog.google/products/gemini/gemini-image-generation-issue/
Yann LeCun's original post (@ylecun, February 23, 2024): "Indeed, my remarks on a paper from Duke on GAN-based portrait super-resolution were met with an unusual level of vitriol, back in 2020. I merely pointed out the obvious fact that image reconstruction is heavily biased by the statistics of the training dataset. As it turns out, a…" https://t.co/md1JWBJ8re