Last week was phenomenal. We saw OpenAI giving a sneak peek of Operator, which lets users build agents that take control of the browser and perform tasks like booking a hotel suite or flight tickets. While they were flexing their CUA (Computer-Using Agent) architecture, a China-based company called DeepSeek was quietly taking over the internet and earning accolades from AI researchers worldwide. DeepSeek's latest model outperformed most established models like Claude, GPT and Llama. The interesting part: it cost DeepSeek $5.5 Million to train their latest model. That sounds like a huge number until you learn that OpenAI spent over $100 Million to train GPT-4, and Meta's Llama was trained at $50 Million. If that doesn't impress you already, DeepSeek is also open source, making it available to developers and researchers across the world. In this article we'll go deep into what DeepSeek's rising popularity actually means.
My Experience with DeepSeek: The Visible Chain-of-Thought
What I personally love the most about DeepSeek is how it thinks [that's the first time I have said that about a model]. You can see the internal monologue the model has with itself and the Chain-of-Thought reasoning it builds in a loop before generating an answer. I was talking to a developer about DeepSeek yesterday, and he said, "I think the CoT reasoning also teaches us how to think about a problem". I couldn't agree more.
The AI Superpowers:
Most frontier large-scale models built by US tech companies like OpenAI, Anthropic and Meta remain closed, while demonstrating the massive investment and infrastructure sophistication needed to build them. DeepSeek, on the other hand, breaks down that wall by building a state-of-the-art model that outperforms these closed models while needing fewer GPU hours and a lower training cost, leaving the door open for others to follow. Imposing sanctions on GPU exports to China had the opposite of the intended effect: rather than quelling China's AI progress, it only accelerated their quest to optimise models with less reliance on GPUs.
DeepSeek is a testament to Kai-Fu Lee's book, AI Superpowers. The central theme of the book is how Chinese entrepreneurs take original ideas from the West, emulate and improve on them, and become more profitable than their Western counterparts in doing so. The book is full of ripe examples: Baidu vs Google, Uber vs WeRide and eBay vs Alibaba. Countries are in a fast-paced race to become an AI superpower, taking us back to the era of the race to become a nuclear power, and China has leapfrogged many levels. Their sophisticated peer-to-peer payment systems like WeChat Pay and Alipay power the digital economy, similar to UPI but introduced 10 years before it. They lead the consumer drone space and rapid manufacturing, and their panopticon model of deep surveillance, supplemented with social credit systems and massive propaganda, makes them an advanced surveillance state, which is questionable, dangerous and a topic for another time.
Architecture, Cost-Effectiveness and Performance:
Architecture Overview:
DeepSeek-V3 is a Mixture-of-Experts (MoE) model with 671 Billion parameters, of which 37 Billion are active for each token. The model is trained on 14.8 Trillion tokens. DeepSeek's architecture is optimised for superior model performance through Multi-head Latent Attention (MLA) for efficient inference, while using DeepSeekMoE for cost-efficient training. In addition, DeepSeek includes an auxiliary-loss-free strategy for load balancing and a multi-token prediction training objective to boost overall performance. Pre-training the model is also cost effective thanks to support for FP8 training and meticulous engineering optimisations, while post-training focuses on distilling the reasoning capabilities demonstrated by the DeepSeek-R1 series of models.
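To see why only 37 Billion of the 671 Billion parameters fire per token, here is a minimal sketch of how MoE routing works in general. This is an illustrative toy, not DeepSeek's actual implementation; the expert count, top-k value and dimensions are made-up small numbers.

```python
# Toy sketch of Mixture-of-Experts (MoE) routing -- NOT DeepSeek's actual
# implementation. All sizes below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

N_EXPERTS = 8   # toy value; production MoE models route over far more experts
TOP_K = 2       # experts activated per token
D_MODEL = 16    # toy hidden size

# Each "expert" is a small feed-forward weight matrix; the router scores them.
experts = [rng.standard_normal((D_MODEL, D_MODEL)) * 0.02 for _ in range(N_EXPERTS)]
router = rng.standard_normal((D_MODEL, N_EXPERTS)) * 0.02

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route a single token vector x through only its top-k experts."""
    logits = x @ router                      # router score per expert
    top_idx = np.argsort(logits)[-TOP_K:]    # pick the k highest-scoring experts
    gates = np.exp(logits[top_idx])
    gates /= gates.sum()                     # softmax over the selected experts
    # Weighted sum over ONLY the selected experts: per-token compute scales
    # with TOP_K, not N_EXPERTS -- the source of MoE's efficiency.
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top_idx))

token = rng.standard_normal(D_MODEL)
out = moe_forward(token)
print(out.shape)  # (16,)
```

The same principle, scaled up, is how a 671B-parameter model can cost roughly as much per token as a dense 37B one.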
Training Cost:
DeepSeek is remarkably cost effective, and "Affordable AI" is their mission [which, surprisingly, is what OpenAI promised at inception]. DeepSeek-V3 was trained with 2,048 GPUs [the Llama 3 model used 24,000 Nvidia H100 chips at a cost of $50 Million]. Refer to the screenshot from their paper to understand the training costs better.
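The headline ~$5.5 Million figure can be reproduced from the GPU-hour numbers in the paper: roughly 2.788 million H800 GPU hours, priced at an assumed rental rate of $2 per GPU hour (the rate the paper itself uses for the estimate).

```python
# Reproducing DeepSeek's headline training-cost estimate:
# total H800 GPU hours x an assumed $2/GPU-hour rental rate.
H800_GPU_HOURS = 2.788e6   # total GPU hours reported in the DeepSeek-V3 paper
RATE_PER_GPU_HOUR = 2.0    # assumed rental cost in USD, per the paper

total_cost = H800_GPU_HOURS * RATE_PER_GPU_HOUR
print(f"${total_cost / 1e6:.3f}M")  # $5.576M -- the "~$5.5 Million" figure
```

Note this covers rented compute only; it excludes research salaries, prior experiments and infrastructure, so it is not directly comparable to all-in programme costs.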

To illustrate the actual cost difference, let's compare DeepSeek-V3 with GPT-4o and GPT-4. GPT-4o is priced at $10 per million output tokens and $3 per million input tokens. At DeepSeek-V3's pricing, that makes DeepSeek-V3 approximately 36 times cheaper for output and 214 times cheaper for input compared to GPT-4o. Compared to the standard GPT-4, DeepSeek-V3 is 214 times cheaper for output and 2143 times cheaper for input.
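A quick back-of-the-envelope check of the GPT-4o ratios quoted above. The OpenAI prices come from the text; the DeepSeek-V3 per-million-token prices below are assumptions implied by those ratios (the promotional pricing at the time), not figures stated in this article, so treat them as illustrative.

```python
# Sanity-checking the quoted price ratios. OpenAI prices are from the
# article; DeepSeek prices are assumed promotional rates implied by the
# quoted ratios, not official figures from this article.
GPT4O_OUTPUT = 10.0      # $ per 1M output tokens (from the article)
GPT4O_INPUT = 3.0        # $ per 1M input tokens (from the article)
DEEPSEEK_OUTPUT = 0.28   # assumed promotional output price, $ per 1M tokens
DEEPSEEK_INPUT = 0.014   # assumed promotional cache-hit input price, $ per 1M tokens

print(round(GPT4O_OUTPUT / DEEPSEEK_OUTPUT))  # 36  -> "~36x cheaper for output"
print(round(GPT4O_INPUT / DEEPSEEK_INPUT))    # 214 -> "~214x cheaper for input"
```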

Above: DeepSeek pricing model. Source: DeepSeek GitHub page.

Above: GPT-4 pricing model. Source: OpenAI website, 2025.
Model Performance and Benchmarks

DeepSeek's benchmarks against popular closed frontier models, from their research paper.
Future Goals:
DeepSeek is committed to long-termism, and their ultimate goal is to reach AGI [Artificial General Intelligence]. Some key areas they'll focus on improving:
Invest in research to study and redefine the existing architecture and improve training and inference efficiency.
Push beyond the boundaries of the Transformer's architectural limitations and thereby expand the boundaries of its modelling capabilities.
Improve deep thinking and problem solving abilities by expanding the model’s reasoning length and depth.
Continuously explore and iterate on the quality and quantity of training data and drive data scaling across more dimensions.
Open Source Software: Affordable AI Towards the Future
DeepSeek's benchmarks against other closed models highlight one thing: open source models are catching up and closing the performance gap with large-scale closed models. As Marc Andreessen pointed out, "LLMs are on a race to the bottom", heading towards a future where foundational models have no moats, because decreasing token costs erode profit margins.
DeepSeek makes a very powerful state-of-the-art AI tool available to a broader audience, including researchers, developers and businesses, enabling rapid experimentation. The lower training costs will encourage more AI adoption, enabling smaller organisations to leverage AI capabilities.
The Reality of Open Source Models and Their Adoption:
Closed-source models dominate and survive the market, and not many companies can sustain the cost of developing open source models. Research from CB Insights states that model training costs are increasing 2.4x every year because of hardware, talent and energy needs.
Closed models have good traction and generate revenue that keeps them afloat, whereas open source models tend to move towards a paid model eventually: most keep their cash flow running through services revenue, and in some cases they commercialise their flagship models.
Take Stability AI, once valued at $1 Billion, the company that brought text-to-image generation to the market through their Stable Diffusion model. Two years ago they secured $173 Million in funding and enjoyed widespread early success. The decision to keep the company open source without a clear vision and revenue model led to financial mismanagement in early 2024, eventually resulting in their CEO stepping down, while competitors like OpenAI monetised image generation models such as DALL-E behind rate-limited API access. A group of high-profile saviours still keeps Stability AI alive, forgiving $100 Million of debt owed to its vendors and investing $80 Million alongside a massive board restructuring. Stability AI now works with enterprises like HubSpot and AWS and provides paid API access, walking the line between open-source and closed-source models.
Mistral AI took a similar path towards democratising AI but was quick to course-correct, learning from others: Mistral's flagship model, Mistral Large, is commercialised and made available on Azure through a strategic partnership with Microsoft.
Closed-source model developers have also secured more than double the funding ($37.5 Billion) of open source model developers ($14.9 Billion) over the last 4 years. Nvidia and Microsoft invested in more open-source models not because they love open-source software, but to serve their own interest in selling AI chips and cloud infrastructure services compatible with both open-source and closed models, supporting a wide variety of enterprise business cases.
Smaller Open Source Models: Built with fewer parameters and lighter weights, smaller models are gaining huge popularity with enterprises. Open source companies face competition from smaller open source models released by industry leaders for specialised business use cases, like Microsoft's Phi, Google's Gemma and Apple's OpenELM. These smaller models are capable of running on your laptop or mobile phone, opening up a wide variety of use cases, and enterprises have the option to take a hybrid approach: closed models for sophisticated applications and smaller open source models for specialised needs.
Survey results from SlashData, released in Q3 2024, where 2,000 participants [developers] responded to the question: what types of models do you use to add AI functionality to your applications?
The findings suggest that 66% of respondents chose open-source AI models to build their AI-powered applications because of ease of integration, customisation flexibility and community support. However, 41% of professionals still opt for closed models that are optimised for specific applications, offering enhanced performance, technical support and security. Closed AI models are also costlier, making them a more popular choice as company size increases.

Source: SlashData Q3 2024 report, "How Developers Build AI-Enabled Applications".

Source: SlashData Q3 2024 report, "How Developers Build AI-Enabled Applications".
While building and training larger foundational models is costlier and requires talent and infrastructure support, 24% of the 1,768 respondents say that in-house models offer greater control and data privacy than open-source or closed models.
The major outcome of the survey also indicates that large enterprises are more likely to use proprietary or closed-source AI models than open source models.
While it's clear that open-source models push the limits of AI forward and make AI accessible to everyone, they still face questions on ethical risks, legal complexity and safety that transparency alone does not resolve. Nevertheless, DeepSeek will disrupt the entire AI industry in the coming months. While DeepSeek's future goals align towards AGI, we'll have to see how the landscape shifts, because anything can happen.