OpenAI Secretly Developing Q*, Advancing Towards Artificial General Intelligence

2023-11-24

According to reports, OpenAI is researching a project called Q* (pronounced Q-Star) that can solve unfamiliar mathematical problems. Some people at OpenAI believe that Q* could be a significant step towards achieving Artificial General Intelligence (AGI). However, the new model has raised concerns among some AI safety researchers, especially after a demonstration of it circulated internally at OpenAI in recent weeks, according to The Information, underscoring how quickly the technology is progressing. The model was created by OpenAI's Chief Scientist Ilya Sutskever together with top researchers Jakub Pachocki and Szymon Sidor.

Interestingly, this development comes shortly after Andrej Karpathy posted on X that he has been thinking about centralization and decentralization. Karpathy's point is that building an AI system involves trade-offs between centralized and decentralized decision-making and information, and that the best results come from balancing the two. Q-learning appears to fit naturally into that framing.

What is Q-learning?

Experts believe that Q* is built on the principles of Q-learning, a fundamental concept in artificial intelligence, particularly in reinforcement learning. Q-learning is classified as a model-free reinforcement learning algorithm and is designed to learn the value of taking a given action in a given state. Its ultimate goal is to find an optimal policy that defines the best action to take in each state, maximizing the cumulative reward over time.

Q-learning is based on the Q-function, also known as the state-action value function. This function takes two inputs, a state and an action, and returns an estimate of the total expected reward from starting in that state, taking that action, and then following the optimal policy. In simple settings, Q-learning maintains a table called the Q-table, where each row represents a state and each column represents an action. The entries in this table are Q-values, which the agent updates as it learns through exploration and exploitation, using the rule Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') − Q(s, a)], where α is the learning rate, γ is the discount factor, r is the reward received, and s' is the resulting state.
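To make the mechanics concrete, here is a minimal sketch of tabular Q-learning on a toy "corridor" environment. The environment, reward scheme, and hyperparameters (ALPHA, GAMMA, EPSILON, the episode count) are illustrative assumptions chosen for this example; nothing here reflects how Q* itself is implemented.

```python
import random

# Minimal tabular Q-learning on a toy 1-D corridor.
# All values below are illustrative assumptions, not details from the Q* reporting.

N_STATES = 5          # states 0..4; reaching state 4 ends the episode with a reward
ACTIONS = [0, 1]      # 0 = move left, 1 = move right
ALPHA = 0.1           # learning rate
GAMMA = 0.9           # discount factor
EPSILON = 0.1         # exploration rate for epsilon-greedy action selection

# Q-table: one row per state, one entry per action, initialized to zero.
Q = {s: {a: 0.0 for a in ACTIONS} for s in range(N_STATES)}

def step(state, action):
    """Apply an action; return (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done

def choose_action(state):
    """Epsilon-greedy: explore with probability EPSILON, otherwise exploit."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(Q[state], key=Q[state].get)

for episode in range(500):
    state, done = 0, False
    while not done:
        action = choose_action(state)
        next_state, reward, done = step(state, action)
        # Core Q-learning update:
        # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        best_next = max(Q[next_state].values())
        Q[state][action] += ALPHA * (reward + GAMMA * best_next - Q[state][action])
        state = next_state

# After training, the greedy policy should move right in every non-terminal state.
print({s: max(Q[s], key=Q[s].get) for s in range(N_STATES)})
```

An explicit Q-table like this only works when the state space is small enough to enumerate. At the scale OpenAI operates at, the table would be replaced by a neural network that approximates the Q-function, as in deep Q-learning.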