
Open-R1: a fully open reproduction of DeepSeek-R1

Hey there! This post is an introduction to the project, not a claim that we’ve reproduced R1 yet. We’re building in the open, so as soon as we have evaluation numbers, we’ll share them. You can follow our progress on Hugging Face and GitHub.

True, but it looks like there’s nothing to be evaluated as of today. I assume the ultimate goal is to train a new reasoning model and then use the same evaluation metrics as o1 and DeepSeek-R1.

Well, there should be at least some sanity check and validation to make sure the model was trained properly.

Oh yes, if you are talking about the evaluation numbers of DeepSeek’s model, they’re coming soon!

As pointed out in the post, there is no model called Open-R1 to test at all … not yet anyhow. This is a blog post explaining that Hugging Face will take DeepSeek’s R1 model, work out how it was built as described in the paper and from what they released, and then reproduce that process.

In reality this is pretty much how science works … A comes up with a plan, discovery or invention, and it is tested by B, C and D to see if it is reproducible. That’s been the cornerstone of research for a few centuries now.

This blog is not saying they have already done so … It’s a post laying out an intent to start training a model like R1 and calling it Open-R1.

Also, DeepSeek-R1 was only released last week, and even in their paper they detailed the compute hours needed. While those are low compute hours for a SOTA model, that does not mean you can train said model in a week. I’d personally love to be able to train a transformer model in a week, but we may have to wait a while for that level of compute.

So there are no benchmarks for a model that has not been built yet, right? As outlined in the blog post, and again in reply to your question.

But fear not, there is already a GitHub repo with contributors (hell, I may join myself), some prelim work done, and a master plan. A good starting position.

@edbeeching has evaluated the released models already

( src: https://x.com/edwardbeeching/status/1884273209136275742)

R1 just trained on o1 outputs, so basically … /s. This is what the new AI czars are saying.

Hi! This post is an intro to the project, not a claim that we’ve reproduced R1 yet. We will definitely share the missing pieces when we have them; you can expect the models and datasets to be uploaded in this Hugging Face org and the code to be in this GitHub repo.

That’s great, and important for making sense of this tremendous buzz that lacks technical understanding and explanation. Science is about reproduction, and if they claim to be open, let them fulfill the open part.

Please do release the training cost.

We will!

Excalidraw

Hi @bojan2501, thanks, we will indeed be working hard to make sure this training recipe can work for small language models on consumer hardware, since not everybody has a cluster of H100s at home :-) The tool we used for the images was Excalidraw! https://excalidraw.com

Looking forward to it!

WTF are you talking about?

must be a joke

It’s really cool to see how the whole open source community comes together!

Oops …

5.5M is the number reported in the DeepSeek-V3 tech report (just the training run, not the experiments afaik); for R1 it’s hard to estimate tbh, but much less than 5.5M imo
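For anyone checking that figure, the arithmetic is in the DeepSeek-V3 technical report itself: they report the total H800 GPU-hours for the final training run and price them at an assumed $2/GPU-hour rental rate (their assumption, not a measured bill). A quick back-of-the-envelope check:

```python
# Figures as stated in the DeepSeek-V3 technical report
# (final training run only; ablations and prior research excluded).
h800_gpu_hours = 2.788e6   # total H800 GPU-hours
rate_per_gpu_hour = 2.00   # assumed rental price in USD

total_cost = h800_gpu_hours * rate_per_gpu_hour
print(f"${total_cost / 1e6:.3f}M")  # ≈ $5.576M, the "5.5M" quoted above
```

So the widely quoted "$6M" is a rounded, hypothetical rental cost for V3’s single final run, not a measured total for R1.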

Historically, they have never released code or datasets of their LLM training, so I wouldn’t expect this time to be different. If they did release it, that would be incredible of course!

Yes, of course!

So basically you’re asking to replace existing censorship with another flavour of censorship?

The code for the models is inside the model repositories, e.g. for V3: https://huggingface.co/deepseek-ai/DeepSeek-V3/blob/main/modeling_deepseek.py

Hello Team, I’m Ray Bernard, the author and creator of EQUATOR. My research team will be working on a paper focused on reproducing certain parts of DeepSeek R1. Our aim is to replicate the cold start and provide your team with a dataset that includes CoT and other techniques to support these efforts. We’d love to contribute our work to help. Please let me know if you find this useful. Best, Ray Bernard https://www.facebook.com/groups/1186310571520299/

Where are the evaluation numbers? Without them you can’t call it a reproduction.


True, but it looks like there’s nothing to be evaluated as of right now. I assume the ultimate goal is to train a new reasoning model and then use the same evaluation metrics as o1 and DeepSeek-R1.

That’s quite interesting. I was asking myself why the questions the author raised here are not being asked by others. I think the work they have done is remarkable, but at the same time I wonder why they wouldn’t publish these missing pieces if they are supposed to be fully open.
Why, even without reproduction and understanding of the invention, could they affect the market so much this way?


Hi! This post is an intro to the project, not a claim that we’ve reproduced R1 yet. We will definitely share the missing pieces when we have them; you can expect the models and datasets to be uploaded in this Hugging Face org and the code to be in this GitHub repo.

Interesting read, and it is great to see more effort in this direction: more optimization and less brute force.
Also wondering what tool the author used to create the step diagram.


Excalidraw

I’m so glad that efforts like this already exist, I’m gonna try to contribute :-)

Looking forward to it!

So racist article


WTF are you talking about?

Awesome to have this open reproduction started!

For Step #1, check out https://github.com/open-thoughts/open-thoughts!

https://x.com/ryanmart3n/status/1884284101265612856

Let’s do this thing!


It’s really cool to see how the whole open source community comes together!

Does anyone know the actual training cost of R1? I can’t find it in the paper or the announcement post. Is the $6M cost reported by the media just the number taken from V3’s training cost?


Oops …

Has anyone asked the DeepSeek team to release their training data and code, or at least share them privately with an independent replication project like this one? Have they declined such a request?

A faithful replication depends on using the same dataset and hyperparameters. Otherwise, any significant discrepancies with the published benchmarks would be hard to pin down: due to differences in training data, or to the replication method itself?


Historically, they have never released code or datasets of their LLM training, so I wouldn’t expect this time to be different. If they did release it, that would be amazing of course!

In the meantime we have to make best-guess estimates and see if we can get there ourselves.

You offer a great replication process for DeepSeek’s reasoning training. I will try something similar to it.

This is really good information. Can we fine-tune for a particular use case once the code is released?


Yes, of course!

Please consider removing biased, tainted or unaligned training data, and make an effort to remove copyrighted works from the crawl. This will make the model more usable. If you reused Anthropic’s curation checks, this might also help; removing obviously biased data will likely add a lot of value. We don’t want another polluted, unaligned open source model, right? And no business would ever use DeepSeek or a model that reuses it, right?
We appreciate your work for the benefit of humanity, we hope.
Miike C from NJ


So basically you’re asking to replace existing censorship with another flavour of censorship?

Can’t wait! Hopefully the model will be uncensored, but whatever you can do is fine! Love seeing open source building itself up. I’m not smart enough to really help, but I can contribute moral support lol

Hello guys, I am just looking for the code for DeepSeek-V2, in order to fully understand multi-head latent attention. You don’t seem to have code on Hugging Face even for that. Or am I missing something? I don’t see anything in src/transformers/models. MLA is not thoroughly explained in their paper, so it would be valuable to have code for it.
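Until reference code lands somewhere convenient, the core latent-KV idea behind MLA can at least be sketched in plain numpy. Everything below is illustrative only: the dimensions, initialization, and weight names are made up, and the real MLA described in the DeepSeek-V2 paper additionally compresses queries and carries a separate decoupled-RoPE key path, both omitted here. The point is just that K and V are reconstructed from one small per-token latent, so only that latent needs to be cached.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (made up for illustration)
d_model, d_latent, n_heads, d_head, seq = 64, 16, 4, 16, 8

W_dkv = rng.standard_normal((d_model, d_latent)) * 0.1          # down-projection; its output is all we cache
W_uk  = rng.standard_normal((d_latent, n_heads * d_head)) * 0.1  # up-project latent -> per-head keys
W_uv  = rng.standard_normal((d_latent, n_heads * d_head)) * 0.1  # up-project latent -> per-head values
W_q   = rng.standard_normal((d_model, n_heads * d_head)) * 0.1
W_o   = rng.standard_normal((n_heads * d_head, d_model)) * 0.1

def softmax(z):
    z = z - z.max(-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(-1, keepdims=True)

def mla(x):
    """Causal multi-head latent attention over one sequence x: (seq, d_model)."""
    n = x.shape[0]
    c_kv = x @ W_dkv                                   # (seq, d_latent): the entire KV cache
    k = (c_kv @ W_uk).reshape(n, n_heads, d_head)      # keys rebuilt from the latent
    v = (c_kv @ W_uv).reshape(n, n_heads, d_head)      # values rebuilt from the latent
    q = (x @ W_q).reshape(n, n_heads, d_head)
    scores = np.einsum("qhd,khd->hqk", q, k) / np.sqrt(d_head)
    scores = np.where(np.triu(np.ones((n, n), dtype=bool), k=1), -1e9, scores)  # causal mask
    out = np.einsum("hqk,khd->qhd", softmax(scores), v).reshape(n, n_heads * d_head)
    return out @ W_o, c_kv

x = rng.standard_normal((seq, d_model))
y, cache = mla(x)
print(y.shape, cache.shape)  # (8, 64) (8, 16)
```

With these toy sizes the cache holds d_latent = 16 floats per token instead of the 2 × n_heads × d_head = 128 a standard K+V cache would need, which is the memory saving MLA is after.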