o3-mini, the new Diamond model

Experienced ChatMDR, ChatIVDR or ChatFDA users know that Diamond is synonymous with the smartest model of them all. Diamond takes a bit of time to break a complex problem down into steps and then not only solves the problem, but also explains its reasoning. This makes the output verifiable and defensible, for example to senior management or your customer.

The AI models, from left to right: Fast, Smart and Diamond. You can find this selector on the left side of the screen.

Face like a diamond

When the first Diamond model came out in October 2024, it represented a major leap forward in the quality of the output. Several customers upgraded to ChatMDR Plus just to gain access to this model.

At the time, we ran a benchmark comparing the Fast, Smart and Diamond models. It showed that the Diamond model understood the assignment best and gave the most complete response.

Since October – by the way, has it really been only four months? – the o1-preview we used at the time was succeeded by o1 and then by o3-mini. The latter, as “mini” suggests, is smaller and therefore much faster. This was exactly the answer to some user feedback that Diamond was good, but did take its sweet time.

Our expert panel did a blind test and voted for the output of o3-mini 80% of the time.

With the introduction of o3-mini as the new Diamond model, it is no longer possible for you to compare the old o1-preview to the new model yourself. What we’ve done, therefore, is rerun the question we asked in the benchmark. Here’s what Diamond said:

Full response of o3-mini Diamond

Full response of o1-preview from October 2024 benchmark

Content-wise, there is not much to choose between the models. The new Diamond with o3-mini gives an answer that is a little more compact and easier to read. The ending is quite different, however. Even with a very short prompt, o3-mini understood the urgency embedded in the request. So where the old model finished with a fairly standard summary and some miscellaneous references without explaining what they mean, o3-mini actually gives some implementation tips:

Now you may feel that these tips are not rocket science. But then this is not the most difficult assignment (remember, the Fast model had to cope with it too) and it was only a prompt of two sentences. We look forward to hearing what you create – please feel free to share it with us through our Contact form. We would love to work with you to do another example workflow series with ChatMDR or one of its siblings.

Improvements to the fast model

While we were working on this, we decided to also give the assignment to the Fast model, whose performance also improved in version 3.3. The improvement here is not in the model itself (for connoisseurs: it’s gpt-4o-mini), but in the prompting and context length. Unlike in the first benchmark, where the Fast model omitted the need to engage a notified body for the Class IIa device, this time it included it and came back with a nice summary:
