Artificial Intelligence
~ 8 minute read
03 Feb 2025
Hey 👋 If you haven't heard about DeepSeek R1 yet, you're missing out!
In this article I'll go over why everybody is losing their minds over DeepSeek R1 and why its being open source is something to get hyped about.
In case you aren't familiar with Large Language Models and how they work, you can take a look at my other article The Hitchhikers Guide to GPT. However, if you're curious and don't have the time to read it, here's a short rundown.
Short Recap on LLMs
A generative Large Language Model (LLM) is a mathematical formula with billions of parameters that is specialised in autocompleting text.
If the user asks the LLM:
The LLM does this:
The LLM is conditioned during its training phase to either memorise the information it's shown or to answer in a certain way. It does not learn while you speak with it; instead, the developers collect your interactions and retrain it every now and then.
The Animated Transformer - Source Link
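If you're curious what that autocompletion looks like in code, here's a minimal sketch using the Hugging Face transformers library. GPT-2 is used here only because it's tiny; any causal LLM works the same way.

```python
# A minimal sketch of "autocompletion": the model repeatedly predicts the next
# token given everything it has seen so far.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")

# Generate a handful of tokens, one at a time, each conditioned on what came before.
outputs = model.generate(**inputs, max_new_tokens=10, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```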
How ChatGPT o1 is trained
There are 3 steps that go into building ChatGPT o1:
Let's break down the last part briefly.
Having more context to answer the question
If you've been playing with ChatGPT in the past few years, you've likely noticed that it does a better job when you give it some context. If you give it a snippet of your grandma's cookbook and tell it to extract the ingredients for a Gulas, it's more likely to get all the ingredients right than if you just ask “give me all the ingredients for Gulas”. That's because the information is already there, and the model only needs to extract it from the prompt.
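To make that concrete, here's a rough sketch of the two prompts side by side; the recipe text is made up for illustration.

```python
# Giving the model context: the answer only has to be extracted from the
# prompt, instead of recalled from whatever the model saw during training.
recipe_page = "Grandma's Gulas: 500 g beef, 2 onions, 2 tbsp paprika, 1 bell pepper, salt"

prompt_without_context = "Give me all the ingredients for Gulas."

prompt_with_context = (
    "Here is a page from a cookbook:\n"
    f"{recipe_page}\n\n"
    "Extract the list of ingredients for Gulas from the text above."
)
```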
Solving complex problems
There were some issues in the past with ChatGPT where you would ask it things like logic puzzles or how many “r”s are in the word strawberry, and it would get them wrong. That's because the model itself is not capable of reasoning; it just produces words that resemble the information it saw during training.
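For reference, the letter-counting question is trivial for ordinary code, which is exactly what makes the failure so jarring:

```python
# Counting letters is deterministic for a program, but a purely
# pattern-matching LLM has no step-by-step procedure for it.
print("strawberry".count("r"))  # prints 3
```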
The solution - Chain of Thought
Chain-of-Thought Prompting - Original Paper
To solve these problems, developers taught the model that, before answering a question, it first needs to list the steps required to answer it. The model then uses the steps it produces to solve the problem. Thus the example becomes something like:
If the user asks the LLM:
The LLM does this:
...
The final result of the thinking process might look something like this:
<think>
Okay, so I'm trying to figure out how to respond to the user's message where they said "Hello! How are you?" and then provided some text about being a helpful assistant. Hmm, wait, actually, looking back, it seems like there might be a mix-up here.
</think>
This is just part of the LLM's response; the process continues like this until it produces the final answer.
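Since the reasoning is wrapped in <think> tags, splitting it from the final answer is just a bit of string handling. A small sketch, using a made-up response:

```python
import re

# A made-up R1-style response: reasoning inside <think> tags, answer after it.
response = (
    "<think>The user greeted me, so a short friendly reply is enough.</think>\n"
    "Hello! I'm doing well, thanks for asking."
)

# Pull out the chain of thought, then strip it to keep only the final answer.
match = re.search(r"<think>(.*?)</think>", response, flags=re.DOTALL)
reasoning = match.group(1).strip() if match else ""
answer = re.sub(r"<think>.*?</think>", "", response, flags=re.DOTALL).strip()

print("Reasoning:", reasoning)
print("Answer:", answer)
```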
Takeaways
However, there are multiple catches:
Why DeepSeek r1 is awesome
DeepSeek r1 takes all those limitations, solves them and makes everything (except the training data) open source. Here's an overview.
Training
Now let's take a look at how DeepSeek r1 was trained:
In the end you'll have a DeepSeek r1 model that has capabilities comparable to ChatGPT o1.
Training of DeepSeek r1 - Source Link - Explanation video
However, the model and the information on how to train your own specialised model are both open source and free of charge.
And, most importantly, they allegedly spent only $6 million on training it, compared to OpenAI's massive $100 million (source for costs here).
Distillation
Let's say we don't have the necessary resources to load a 671-billion-parameter model on our machine. One way we can still get a glimpse of its capabilities is by using a distilled version of the model.
Through distillation (also called the teacher-student method) we take a big model like DeepSeek r1 (671B parameters) and ask it a bunch of questions. We then capture its answers and use them to train a smaller model (such as Llama 8B). The more question-answer pairs we use to train the student model, the better it resembles the teacher.
The catch is that you can only squeeze so much information into a tiny model. If you want to get even closer to the teacher model, you'll need to increase the number of parameters of the student. Eventually you'll reach a balance where you get good results while still being able to run the model locally.
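Here's a heavily simplified sketch of the teacher-student idea. ask_teacher is a hypothetical stand-in for querying the big 671B model (in practice an API or a large GPU cluster), and a real distillation run needs far more data plus an actual fine-tuning step.

```python
# Teacher-student distillation, heavily simplified.
def ask_teacher(question: str) -> str:
    # Hypothetical stand-in for the 671B teacher producing an answer.
    return "placeholder teacher answer"

questions = [
    "How many r's are in the word strawberry?",
    "Explain chain-of-thought prompting in one sentence.",
]

# Step 1: collect question-answer pairs from the teacher.
pairs = [(q, ask_teacher(q)) for q in questions]

# Step 2: turn each pair into a plain training text for the student.
# Fine-tuning a small student (e.g. Llama 8B) on many such texts with a
# standard next-token-prediction loss is what makes it imitate the teacher.
train_texts = [f"Question: {q}\nAnswer: {a}" for q, a in pairs]
```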
Running DeepSeek r1 locally
Since the model is now open source, you can simply download it and run it locally on a computer with no access to the internet (here's a YouTube video for that).
What that means for businesses and developers is that by running the model on your local machine, you can be 100% sure that no sensitive data within the company will leak outside.
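For example, assuming the distilled 8B weights (deepseek-ai/DeepSeek-R1-Distill-Llama-8B on Hugging Face) fit on your hardware, a minimal offline run with the transformers library could look like this:

```python
# Running a distilled DeepSeek r1 model locally: once the weights are on disk,
# nothing has to leave the machine.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"  # assumed model id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Hello! How are you?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```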
Also, by using the distilled models, you can fine-tune them on consumer hardware, teaching them all the cool stuff about your data, as if you were training an intern for their new job on your project.
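A minimal sketch of that "intern training" with LoRA adapters via the peft library, leaving out the data preparation and the actual training loop:

```python
# LoRA fine-tuning: only a tiny fraction of the parameters are trained,
# which is what makes consumer GPUs enough for the distilled models.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Llama-8B")

lora_config = LoraConfig(
    r=8,                                   # rank of the low-rank adapters
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],   # attention projections in Llama-style models
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of the weights

# From here, train on your own question/answer texts with any standard
# supervised fine-tuning loop (e.g. transformers' Trainer or trl's SFTTrainer).
```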
DeepSeek r1 Limitations
Now that we got ourselves hyped about what DeepSeek r1 can do, let's see some disadvantages:
Side-effects of Open Source
Now that DeepSeek has made their models open, a bunch of side effects have emerged, such as:
Conclusions
While I still can't run the bigger DeepSeek r1 model on my machine, I can't wait to train the distilled version for a cool project I'm working on. Also, I may need to start digging more into how others are using it and what drawbacks they found.
Overall, it's cool that there is finally some healthy competition in the Large Language Model space, and that we can host the good models ourselves. We don't need to solely rely on big companies such as OpenAI.
Machine learning has become fun again, now that we can also own the cool new toys.