Fresh Thoughts #136: We're Not in Kansas Anymore


On Thursday, 12 September 2024, OpenAI released their latest models: o1-preview and o1-mini—and it feels like yet another watershed moment.

OpenAI designed these models to spend more time "thinking" before responding.
This approach provides the opportunity to plan how to tackle complex problems and iterate solutions to meet constraints.

In short, the results are exceptional.

The introduction of reasoning has specifically advanced ChatGPT's ability to handle technical and scientific tasks, such as answering advanced PhD-level astrophysics textbook questions in seconds or condensing the first year of an astrophysics PhD into an hour.

Outside scientific disciplines, it is harder to prove that a response is correct.
That difficulty previously led me to view ChatGPT as an enthusiastic office junior on their first day at work.
But now we're in a different game.
My impression is that o1-preview responses are akin to a bookish, competent university graduate.

As part of my testing, I asked ChatGPT to create an Acceptable Use Policy for a school cybersecurity programme.
I'll leave it for you to judge if it is comparable to the one you have in your school or business: https://chatgpt.com/share/66e83339-26ec-8001-951a-21a7c1a2e2ef.

Final Thoughts

What is clear from my experiments is that interacting with ChatGPT is a skill. For example, I did not simply ask the new o1-preview model to write the Acceptable Use Policy. Instead, I did the following:

  • Used o1-mini to enumerate the questions and information needed to solve the task, and to create a prompt to generate the Acceptable Use Policy.
  • Used gpt-4o to generate typical responses to the questions.
  • Used o1-preview to run the prompt and create the policy.
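
For readers who prefer to see the chain written down, the three steps above can be sketched in Python. This is only an illustration: I ran the steps by hand in ChatGPT, so the function build_policy, the call_model parameter and the prompt wording are all assumptions made for this sketch, not the prompts I actually used. call_model stands in for whatever you use to send a prompt to a named model and get text back.

```python
from typing import Callable

# Hypothetical signature: takes a model name and a prompt, returns the reply text.
ModelCall = Callable[[str, str], str]


def build_policy(task: str, call_model: ModelCall) -> str:
    """Chain three model calls, one per step in the list above."""
    # Step 1: o1-mini lists the questions to answer and drafts the final prompt.
    # (Prompt wording here is illustrative only.)
    plan = call_model(
        "o1-mini",
        "List the questions and information needed to complete this task, "
        f"then write a prompt that would produce it.\nTask: {task}",
    )
    # Step 2: gpt-4o supplies typical answers to those questions.
    answers = call_model(
        "gpt-4o",
        f"Give typical answers to the questions in this plan:\n{plan}",
    )
    # Step 3: o1-preview runs the drafted prompt with the answers as context.
    return call_model("o1-preview", f"{plan}\n\nContext:\n{answers}")
```

Passing the model caller in as an argument keeps the sketch independent of any particular client library, and lets you try the flow with a simple stand-in function before spending real "thinking" time.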

The Acceptable Use Policy linked above was o1-preview's first response and had the longest "thinking" time.
I tested the response's robustness by resubmitting the exact same prompt twice more. Each run took less time, and I found minor variations in the response.
For example, the ban on all external storage and USB drives was first watered down to a ban on personal USB drives and was then removed completely from the policy.
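
I compared the three responses by reading them side by side, but a line-by-line diff is one way to make that kind of check repeatable. The sketch below uses Python's standard difflib module; the function name changed_lines is an assumption for illustration, and it only flags lines that differ, so judging whether a change matters is still up to you.

```python
import difflib


def changed_lines(first: str, rerun: str) -> list[str]:
    """Return the lines removed from ('- ') or added to ('+ ') a rerun."""
    diff = difflib.ndiff(first.splitlines(), rerun.splitlines())
    # ndiff also emits unchanged lines ('  ') and hint lines ('? '); drop those.
    return [line for line in diff if line.startswith(("+ ", "- "))]
```

Feeding it the first response and a later one would surface a softened or missing clause, such as the USB ban, without rereading the whole policy.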

This watershed will be a hugely advantageous shortcut for people who already know a particular subject area, as they can investigate more possibilities more quickly. However, for the unwary, skipping to the outcome without a detailed understanding of the journey will continue to pose risks.

As to the question of whether o1-preview is artificial general intelligence (AGI):
I don't know, and I'm not sure I care.
What is absolutely clear is that, as Dorothy said, "Toto, I have a feeling we're not in Kansas anymore."

September 17, 2024


Fresh Sec Limited
