The distinction between a conventional model and a reasoning model is similar to the two types of thinking described by the Nobel Prize-winning economist Daniel Kahneman in his 2011 book Thinking, Fast and Slow: fast, instinctive System 1 thinking and slower, more deliberative System 2 thinking.
The kind of model that made ChatGPT possible, known as a large language model or LLM, produces instant responses to a prompt by querying a large neural network. These outputs can be strikingly clever and coherent, but they may fail on questions that require step-by-step reasoning, including simple arithmetic.
An LLM can be made to mimic deliberative reasoning if it is instructed to come up with a plan that it must then follow. This trick is not always reliable, however, and models typically struggle to solve problems that require extensive, careful planning. OpenAI, Google, and now Anthropic all use a machine learning method known as reinforcement learning to get their latest models to learn to generate reasoning that points toward correct answers. This requires gathering additional training data from humans on solving specific problems.
Penn says Claude’s reasoning mode received additional data on business applications, including writing and fixing code, using computers, and answering complex legal questions. “The things that we made improvements on are … technical subjects or subjects that require long reasoning,” Penn says. “What we have from our customers is a lot of interest in deploying our models into their actual workloads.”
Anthropic says Claude 3.7 is especially good at solving coding problems that require step-by-step reasoning, outscoring OpenAI’s o1 on some benchmarks such as SWE-bench. The company is also releasing a new tool today, called Claude Code, designed specifically for this kind of AI-assisted coding.
“The model is already good at coding,” Penn says. But “additional thinking would be good for cases that might require very complex planning, say you’re looking at an extremely large code base for a company.”