On Thursday, 12 September 2024, OpenAI released its latest models, o1-preview and o1-mini, and it feels like yet another watershed moment.
OpenAI designed these models to spend more time "thinking" before responding.
This approach gives the models the opportunity to plan how to tackle complex problems and to iterate on solutions until they meet the constraints.
In short, the results are exceptional.
The introduction of reasoning has specifically advanced ChatGPT's ability to handle technical and scientific tasks, such as answering advanced PhD-level astrophysics textbook questions in seconds or condensing the first year of an astrophysics PhD into an hour.
Outside of scientific disciplines, it is more challenging to prove that a response is correct, which previously led me to view ChatGPT as an enthusiastic office junior on their first day at work.
But now we're in a different game.
My impression is that o1-preview responses are akin to a bookish, competent university graduate.
As part of my testing, I asked ChatGPT to create an Acceptable Use Policy for a school cybersecurity programme.
I'll leave it for you to judge if it is comparable to the one you have in your school or business: https://chatgpt.com/share/66e83339-26ec-8001-951a-21a7c1a2e2ef.
What is clear from my experiments is that interacting with ChatGPT effectively is a skill. For example, I did not simply ask the new o1-preview model to write the Acceptable Use Policy. Instead, I did the following:
The Acceptable Use Policy linked above was o1-preview's first response and had the longest "thinking" time.
I tested the response's robustness by resubmitting exactly the same prompt twice more. Each run took less time, and I found minor variations between the responses.
For example, the ban on all external storage and USB drives was first watered down to a ban on personal USB drives and was then removed completely from the policy.
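For anyone who wants to repeat this kind of robustness check, the comparison between runs can be done with a simple text diff rather than by eye. The following is a minimal sketch in Python using the standard library's difflib. The function name changed_lines and the three short policy excerpts are hypothetical stand-ins that I made up to mirror the USB drive example above; they are not the actual ChatGPT output, and the sketch assumes you have saved each response as plain text.

```python
import difflib


def changed_lines(first: str, second: str) -> list[str]:
    """Return only the lines that were removed ("- ") or added ("+ ")
    when going from the first response to the second."""
    diff = difflib.ndiff(first.splitlines(), second.splitlines())
    # ndiff also emits unchanged lines ("  ") and hint lines ("? "); skip those.
    return [line for line in diff if line.startswith(("- ", "+ "))]


# Hypothetical excerpts standing in for three responses to the same prompt.
run_1 = "Use strong passwords.\nAll external storage and USB drives are banned."
run_2 = "Use strong passwords.\nPersonal USB drives are banned."
run_3 = "Use strong passwords."

for label, later in (("run 2", run_2), ("run 3", run_3)):
    print(f"run 1 vs {label}:")
    for line in changed_lines(run_1, later):
        print("  " + line)
```

A line-level diff like this only flags wording that changed. Deciding whether a change matters, such as a ban quietly disappearing from a policy, still needs someone who knows the subject.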
This watershed will be a hugely advantageous shortcut for people who already know a particular subject area, as they can investigate more possibilities more quickly. However, for the unwary, skipping to the outcome without a detailed understanding of the journey will continue to pose risks.
As to the question: is o1-preview artificial general intelligence (AGI)?
I don't know, and I'm not sure I care.
What is absolutely clear is that, as Dorothy said, "Toto, I've a feeling we're not in Kansas anymore."