Artificial Intelligence
~ 8 minute read
03 Feb 2025
Hey 👋 If you haven't heard about DeepSeek R1 yet, you're missing out!
In this article I'll go over why everybody is losing their minds over DeepSeek R1 and why its being open source is something to get hyped about.
In case you aren't familiar with Large Language Models and how they work, you can take a look at my other article The Hitchhikers Guide to GPT. However, if you're curious and don't have the time to read it, here's a short rundown.
Short Recap on LLMs
A generative Large Language Model (LLM) is a mathematical formula with billions of parameters that is specialised in autocompleting text.
If the user asks the LLM:
The LLM does this:
The LLM is conditioned during its training phase to either memorise the information it's shown or to answer in a certain way. It does not learn while you speak with it; instead, the developers collect your interactions and retrain it every now and then.
The Animated Transformer - Source Link
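If you're curious what that autocompletion looks like in code, here's a minimal sketch using the Hugging Face transformers library. GPT-2 is used here only because it's tiny; any causal LLM works the same way.

```python
# A minimal sketch of "autocompletion": the model repeatedly predicts the next
# token given everything it has seen so far.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")

# Generate a handful of tokens, one at a time, each conditioned on what came before.
outputs = model.generate(**inputs, max_new_tokens=10, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```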
How ChatGPT o1 is trained
There are 3 steps that go into building ChatGPT o1:
Let's break down the last part briefly.
Having more context to answer the question
If you've been playing with ChatGPT in the past few years, you've likely noticed that it does a better job when you give it some context. If you give it a snippet of your grandma's cookbook and tell it to extract the ingredients for a Gulas, it's more likely to get all the ingredients right than if you just ask “give me all the ingredients for Gulas”. That's because the information is already there, and the model only needs to extract it from the prompt.
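To make that concrete, here's a rough sketch of the two prompts side by side; the recipe text is made up for illustration.

```python
# Giving the model context: the answer only has to be extracted from the
# prompt, instead of recalled from whatever the model saw during training.
recipe_page = "Grandma's Gulas: 500 g beef, 2 onions, 2 tbsp paprika, 1 bell pepper, salt"

prompt_without_context = "Give me all the ingredients for Gulas."

prompt_with_context = (
    "Here is a page from a cookbook:\n"
    f"{recipe_page}\n\n"
    "Extract the list of ingredients for Gulas from the text above."
)
```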
Solving complex problems
There were some issues in the past with ChatGPT where you would ask it things like logic puzzles or how many “r”s are in the word strawberry, and it would get them wrong. That's because the model itself is not capable of reasoning; it just produces words that resemble the information it saw during training.
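For reference, the letter-counting question is trivial for ordinary code, which is exactly what makes the failure so jarring:

```python
# Counting letters is deterministic for a program, but a purely
# pattern-matching LLM has no step-by-step procedure for it.
print("strawberry".count("r"))  # prints 3
```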
The solution - Chain of Thought
Chain-of-Thought Prompting - Original Paper
To solve these problems, developers taught the model that, before answering a question, it first needs to list the steps required to answer it. The model then uses the steps it produces to solve the problem. Thus the example becomes something like:
If the user asks the LLM:
The LLM does this:
...
The final result of the thinking process might look something like this:
<think>
Okay, so I'm trying to figure out how to respond to the user's message where they said "Hello! How are you?" and then provided some text about being a helpful assistant. Hmm, wait, actually, looking back, it seems like there might be a mix-up here.
</think>
This is just part of the LLM's response; the process continues like this until it produces the final answer.
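Since the reasoning is wrapped in <think> tags, splitting it from the final answer is just a bit of string handling. A small sketch, using a made-up response:

```python
import re

# A made-up R1-style response: reasoning inside <think> tags, answer after it.
response = (
    "<think>The user greeted me, so a short friendly reply is enough.</think>\n"
    "Hello! I'm doing well, thanks for asking."
)

# Pull out the chain of thought, then strip it to keep only the final answer.
match = re.search(r"<think>(.*?)</think>", response, flags=re.DOTALL)
reasoning = match.group(1).strip() if match else ""
answer = re.sub(r"<think>.*?</think>", "", response, flags=re.DOTALL).strip()

print("Reasoning:", reasoning)
print("Answer:", answer)
```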
Takeaways
However, there are multiple catches:
Why DeepSeek r1 is awesome
DeepSeek r1 takes all those limitations, solves them and makes everything (except the training data) open source. Here's an overview.
Training
Now let's take a look at how DeepSeek r1 was trained:
In the end you'll have a DeepSeek r1 model that has capabilities comparable to ChatGPT o1.
Training of DeepSeek r1 - Source Link - Explanation video
However, the model and the information on how to train your own specialised model are both open source and free of charge.
And, most importantly, they allegedly spent only $6 million on training it, compared to OpenAI's massive $100 million (source for costs here).
Distillation
Let's say we don't have the necessary resources to load a 671-billion-parameter model on our machine. One way we can still get a glimpse of its capabilities is by using a distilled version of the model.
Through distillation (also called the teacher-student method) we take a big model like DeepSeek r1 (671B parameters) and ask it a bunch of questions. We then capture its answers and use them to train a smaller model (such as Llama 8B). The more question-answer pairs we use to train the student model, the better it resembles the teacher.
The catch is that you can only squeeze so much information into a tiny model. If you want to get even closer to the teacher model, you'll need to increase the number of parameters of the student. Eventually you'll reach a balance where you get good results while still being able to run the model locally.
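Here's a heavily simplified sketch of the teacher-student idea. ask_teacher is a hypothetical stand-in for querying the big 671B model (in practice an API or a large GPU cluster), and a real distillation run needs far more data plus an actual fine-tuning step.

```python
# Teacher-student distillation, heavily simplified.
def ask_teacher(question: str) -> str:
    # Hypothetical stand-in for the 671B teacher producing an answer.
    return "placeholder teacher answer"

questions = [
    "How many r's are in the word strawberry?",
    "Explain chain-of-thought prompting in one sentence.",
]

# Step 1: collect question-answer pairs from the teacher.
pairs = [(q, ask_teacher(q)) for q in questions]

# Step 2: turn each pair into a plain training text for the student.
# Fine-tuning a small student (e.g. Llama 8B) on many such texts with a
# standard next-token-prediction loss is what makes it imitate the teacher.
train_texts = [f"Question: {q}\nAnswer: {a}" for q, a in pairs]
```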
Running DeepSeek r1 locally
Since the model is now open source, you can simply download it and run it locally on a computer with no access to the internet (here's a YouTube video for that).
What that means for businesses and developers is that by running the model on your local machine, you can be 100% sure that no sensitive data within the company will leak outside.
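For example, assuming the distilled 8B weights (deepseek-ai/DeepSeek-R1-Distill-Llama-8B on Hugging Face) fit on your hardware, a minimal offline run with the transformers library could look like this:

```python
# Running a distilled DeepSeek r1 model locally: once the weights are on disk,
# nothing has to leave the machine.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"  # assumed model id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Hello! How are you?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```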
Also, by using the distilled models, you can fine-tune them on consumer hardware, teaching them all the cool stuff about your data, as if you were training an intern for their new job on your project.
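A minimal sketch of that "intern training" with LoRA adapters via the peft library, leaving out the data preparation and the actual training loop:

```python
# LoRA fine-tuning: only a tiny fraction of the parameters are trained,
# which is what makes consumer GPUs enough for the distilled models.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Llama-8B")

lora_config = LoraConfig(
    r=8,                                   # rank of the low-rank adapters
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],   # attention projections in Llama-style models
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of the weights

# From here, train on your own question/answer texts with any standard
# supervised fine-tuning loop (e.g. transformers' Trainer or trl's SFTTrainer).
```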
DeepSeek r1 Limitations
Now that we got ourselves hyped about what DeepSeek r1 can do, let's see some disadvantages:
Side-effects of Open Source
Now that DeepSeek has made their models open, a bunch of side effects have emerged, such as:
Conclusions
While I still can't run the bigger DeepSeek r1 model on my machine, I can't wait to train the distilled version for a cool project I'm working on. Also, I may need to start digging more into how others are using it and what drawbacks they found.
Overall, it's cool that there is finally some healthy competition in the Large Language Model space, and that we can host the good models ourselves. We don't need to solely rely on big companies such as OpenAI.
Machine learning has become fun again, now that we can also own the cool new toys.