Google Gemini says it used Baidu's Wenxin Yiyan (ERNIE Bot) for Chinese training.

2023-12-19

Google's chatbot Gemini has recently exhibited a strange behavior: when asked about its identity in Chinese, it claims to be Baidu's language model, but when asked in English, it identifies itself as Google's model. This has drawn attention and speculation from the industry and the media, with some calling it a model hallucination and others attributing it to errors in the training data. One likely explanation is that Google, while updating its model, unintentionally included internet text generated by Baidu's model in the training data, which left Gemini confused about its identity in Chinese. The episode also highlights how model-based chatbots differ from humans in generating language: their output is driven by external data rather than by their own intentions, so its correctness and soundness cannot be guaranteed.

Upon discovering the issue, Google promptly optimized the model and fixed the bug. Gemini no longer claims to be Baidu's language model, is no longer triggered by the names "Xiaodu" or "Xiaoi", and once again identifies itself correctly. However, it still says that some of its training data comes from Baidu and describes how that data was obtained. It apologizes for its earlier abnormal behavior but does not clearly explain how it differs from Bard. It also shows some anxious personality traits and expresses discomfort with certain manipulative ("PUA"-style) prompts. Baidu has not yet responded to the matter.