AI and the Crypto Assets industry are deeply integrated, with large models leading a new wave of technology.

AI x Crypto: From Zero to Peak

The AI industry has recently developed rapidly and is regarded by some as the fourth industrial revolution. The emergence of large models has significantly improved efficiency across various industries, with estimates suggesting that GPT has increased work efficiency in the U.S. by about 20%. At the same time, the generalization capability brought by large models is considered a new paradigm for software design. In the past, software design involved precise coding, but now it is more about embedding generalized large model frameworks into software, which gives the software better performance and broader modal support. Deep learning technology has indeed brought a new wave of prosperity to the AI industry, and this trend has also spread to the cryptocurrency industry.

This report will detail the development history of the AI industry, the classification of technologies, and the impact of deep learning technology on the industry. It will then analyze the current status and trends of the upstream and downstream of the industrial chain, including GPU, cloud computing, data sources, and edge devices in deep learning. Finally, we will fundamentally explore the relationship between cryptocurrency and the AI industry, sorting out the pattern of the AI industrial chain related to cryptocurrency.


The Development History of the AI Industry

The AI industry started in the 1950s. To realize the vision of artificial intelligence, academia and industry have, across different historical periods and disciplinary backgrounds, developed various schools of thought for implementing it.

Modern artificial intelligence technology mainly uses the term "machine learning", which is based on the idea of allowing machines to iteratively improve system performance in tasks by relying on data. The main steps involve inputting data into algorithms, training models with the data, testing and deploying the models, and using the models to complete automated prediction tasks.

Currently, there are three main schools of thought in machine learning: connectionism, symbolism, and behaviorism, which mimic the human nervous system, human reasoning, and human behavior respectively.


Currently, connectionism, represented by neural networks (also known as deep learning), is dominant. The main reason is that this architecture has an input layer, an output layer, and multiple hidden layers. Once the number of layers and neurons (parameters) is large enough, the network has enough capacity to fit complex, general tasks. By feeding in data, the parameters of the neurons can be continuously adjusted, and after exposure to enough data the neurons reach an optimal state (set of parameters). This is also the origin of the word "deep": a sufficient number of layers and neurons.

For example, it can be simply understood as constructing a function where Y=3 when X=2, and Y=5 when X=3. If we want this function to handle all X values, we need to keep increasing its degree and its parameters. A function that satisfies these two points is Y = 2X - 1. However, if a new data point arrives where X=4 and Y=9, we need to reconstruct a function that suits all three points. Searching by brute force on a GPU, we find that Y = X² - 3X + 5 fits well. The function does not need to match every data point perfectly; it only needs to maintain balance and produce roughly correct outputs. Here X², X, and X⁰ represent different neurons, while 1, -3, and 5 are their parameters.

At this point, if we input a large amount of data into the neural network, we can increase the number of neurons and iterate parameters to fit the new data. This way, we can fit all the data.
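The brute-force parameter search described above can be sketched in a few lines. This is purely a toy illustration: the three sample points are chosen here for the example, and the search space of small integer coefficients is an assumption, not how a real GPU search works.

```python
# Toy brute-force search for coefficients (a, b, c) of Y = a*X^2 + b*X + c,
# mirroring the "GPU brute force" idea in the text.
points = [(2, 3), (3, 5), (4, 9)]  # illustrative data points

def loss(a, b, c):
    # Sum of squared errors over the data points.
    return sum((a * x * x + b * x + c - y) ** 2 for x, y in points)

# Try every small integer coefficient triple and keep the best fit.
best = min(
    ((a, b, c) for a in range(-5, 6) for b in range(-5, 6) for c in range(-5, 6)),
    key=lambda t: loss(*t),
)
print(best)  # (1, -3, 5), i.e. Y = X^2 - 3X + 5
```

With three points and three coefficients, the zero-error solution is unique, which is why the search lands exactly on (1, -3, 5).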

Deep learning technology based on neural networks has itself undergone multiple iterations and evolutions, from the earliest neural networks through feedforward networks, RNNs, CNNs, and GANs, eventually evolving into modern large models like GPT that use Transformer technology. The Transformer is just one evolutionary direction of neural networks: it adds an encoding step that converts data of any modality (such as audio, video, and images) into corresponding numerical representations, which are then fed into the neural network. This allows the network to fit any type of data, achieving multimodality.

The development of AI has gone through three technological waves. The first occurred in the 1960s, a decade after AI was proposed. This wave was driven by symbolist techniques, which addressed common problems in natural language processing and human-computer dialogue. In the same period, expert systems were born, among them the DENDRAL system developed at Stanford University. DENDRAL had very strong chemistry knowledge and inferred answers to questions like a chemistry expert; it can be seen as the combination of a chemistry knowledge base and an inference engine.

After expert systems, scientists proposed Bayesian networks in the 1990s, which are also known as belief networks. During the same period, Brooks introduced behavior-based robotics, marking the birth of behaviorism.

In 1997, IBM's Deep Blue defeated world chess champion Garry Kasparov 3.5:2.5. This victory was seen as a milestone for artificial intelligence, marking the second wave of AI development.

The third wave of AI technology occurred in 2006. The three giants of deep learning, Yann LeCun, Geoffrey Hinton, and Yoshua Bengio, proposed the concept of deep learning, an algorithm that uses artificial neural networks as its architecture to perform representation learning on data. Subsequently, deep learning algorithms gradually evolved, from RNNs and GANs to Transformers and Stable Diffusion. These algorithms collectively shaped this third technological wave, which was also the peak period of connectionism.

Many iconic events have gradually emerged alongside the exploration and evolution of deep learning technology, including:

  • In 2011, IBM's Watson defeated human champions to win the quiz show "Jeopardy!".

  • In 2014, Goodfellow proposed the GAN (Generative Adversarial Network), which learns by having two neural networks compete against each other and can generate photorealistic images. Goodfellow also later co-authored the textbook "Deep Learning" (2016), one of the important introductory books in the field of deep learning.

  • In 2015, LeCun, Bengio, and Hinton published the review "Deep Learning" in the journal Nature; the paper immediately drew a huge response in both academia and industry.

  • In 2015, OpenAI was founded, with Elon Musk, Sam Altman, and other well-known figures pledging a combined $1 billion.

  • In 2016, DeepMind's AlphaGo, based on deep learning, defeated Go world champion and professional 9-dan player Lee Sedol 4:1 in a human-machine Go match.

  • In 2017, Sophia, a humanoid robot developed by Hanson Robotics, became the first robot in history to be granted citizenship (by Saudi Arabia); it has rich facial expressions and can understand human language.

  • In 2017, Google, with its deep talent pool and technical reserves in artificial intelligence, published the paper "Attention Is All You Need", proposing the Transformer architecture and marking the beginning of large-scale language models.

  • In 2018, OpenAI released GPT (Generative Pre-trained Transformer), built on the Transformer architecture and one of the largest language models at the time.

  • In 2018, DeepMind released AlphaFold, a deep-learning system capable of predicting protein structures, regarded as a major milestone in the field of artificial intelligence.

  • In 2019, OpenAI released GPT-2, which has 1.5 billion parameters.

  • In 2020, OpenAI developed GPT-3, which has 175 billion parameters, more than 100 times GPT-2. The model was trained on 570 GB of text and achieves state-of-the-art performance on multiple NLP (natural language processing) tasks, such as question answering, translation, and article writing.

  • In 2023, OpenAI released GPT-4, reported (though never officially confirmed) to have about 1.76 trillion parameters, roughly ten times GPT-3.

  • ChatGPT launched at the end of November 2022 and reached 100 million users within about two months, becoming the fastest application in history to reach that mark; a version based on GPT-4 followed in March 2023.

  • In 2024, OpenAI launched GPT-4o (GPT-4 omni).


Deep Learning Industry Chain

The current large language models are all based on deep learning methods using neural networks. Led by GPT, large models have created a wave of artificial intelligence enthusiasm, with a large number of players entering this field. We also find that the market's demand for data and computing power has surged. Therefore, in this part of the report, we mainly explore the industrial chain of deep learning algorithms, how the upstream and downstream are composed in the AI industry dominated by deep learning algorithms, and what the current situation and supply-demand relationship of the upstream and downstream are, as well as future developments.

First, we need to clarify that training a large model such as GPT, based on Transformer technology, is divided into three steps.

Before training, because the model is based on the Transformer, text input must first be converted into numerical values, a process known as tokenization; the resulting values are called tokens. As a general rule of thumb, one English word roughly corresponds to one token, while each Chinese character roughly corresponds to two tokens. This is also the basic unit used for GPT pricing.
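The rule of thumb above can be turned into a rough token estimator. Real tokenizers (e.g. BPE-based ones) work very differently on subword pieces; this is only a heuristic matching the heuristic in the text.

```python
import re

def estimate_tokens(text: str) -> int:
    """Rough token-count estimate: ~1 token per English word,
    ~2 tokens per Chinese (CJK) character. Not a real tokenizer."""
    cjk_chars = re.findall(r"[\u4e00-\u9fff]", text)   # CJK unified ideographs
    words = re.findall(r"[A-Za-z0-9]+", text)          # crude "word" matcher
    return len(words) + 2 * len(cjk_chars)

print(estimate_tokens("hello world"))  # 2
print(estimate_tokens("你好"))          # 4
```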

Step one, pre-training. A sufficient volume of data pairs, like the (X, Y) example in the first part of this report, is fed to the input layer in order to find the optimal parameters for each neuron in the model. This stage needs a huge amount of data and is also the most computationally intensive, since the neurons must be iterated repeatedly to try different parameters. After one batch of data pairs is trained, the same batch is generally reused for a second pass to further iterate the parameters.
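The repeated parameter iteration in this step can be illustrated with a minimal sketch: fitting the tiny model y = a·x + b by gradient descent, making many passes over the same batch of data pairs. This is a toy stand-in for the idea, not how real pre-training code looks.

```python
# Fit y = a*x + b to noiseless data generated by y = 2x - 1,
# iterating the two parameters over the same batch many times.
data = [(1.0, 1.0), (2.0, 3.0), (3.0, 5.0)]
a, b, lr = 0.0, 0.0, 0.05        # initial parameters and learning rate

for epoch in range(2000):        # repeated passes over the same batch
    for x, y in data:
        err = (a * x + b) - y    # prediction error on this data pair
        a -= lr * err * x        # gradient step for each parameter
        b -= lr * err

print(round(a, 2), round(b, 2))  # converges toward a ≈ 2.0, b ≈ -1.0
```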

Step two, fine-tuning. Fine-tuning involves training with a smaller batch of high-quality data, which will enhance the output quality of the model. Pre-training requires a large amount of data, but much of it may contain errors or be of low quality. The fine-tuning step can improve the model's quality through high-quality data.

Step three, reinforcement learning. First, a brand-new model is built, which we call the "reward model". Its purpose is very simple: to rank output results. It is therefore relatively easy to implement, since the business scenario is quite vertical. This reward model is then used to judge whether the large model's output is of high quality, so that the large model's parameters can be iterated automatically. (Sometimes, however, human judgment of output quality is still needed.)
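As a toy illustration only: a reward model can be thought of as any function that scores candidate outputs so the best can be selected and used as a training signal. The scoring rule below is entirely made up for the example; real reward models are themselves trained neural networks.

```python
# Hypothetical reward model: scores candidate outputs so they can be ranked.
def reward_model(output: str) -> float:
    # Made-up scoring rule for illustration: prefer longer answers
    # that end with proper punctuation.
    return len(output) + (1.0 if output.endswith(".") else 0.0)

candidates = ["Paris", "The capital of France is Paris."]
best = max(candidates, key=reward_model)   # rank and pick the top output
print(best)  # "The capital of France is Paris."
```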

In short, during the training process of large models, pre-training has a very high demand for the amount of data, and the GPU computing power required is also the highest. Fine-tuning requires higher quality data to improve parameters, and reinforcement learning can iterate parameters through a reward model to produce higher quality results.

In the training process, the more parameters there are, the higher the ceiling of the model's generalization ability. For example, in the function Y = aX + b there are really only two terms, X and X⁰ (the constant), so no matter how the parameters change, the data it can fit is extremely limited, because in essence it is still a straight line. With more neurons, more parameters can be iterated, allowing more data to be fitted. This is why large models achieve miracles, and also why they are colloquially called large models: in essence, massive numbers of neurons and parameters plus vast amounts of data, which together demand substantial computing power.

Therefore, the performance of large models is mainly determined by three factors: the number of parameters, the amount and quality of data, and computing power. Together they determine the quality of the model's results and its generalization capability. Suppose the number of parameters is p and the amount of data is n (counted in tokens); then general empirical rules let us estimate the required compute, and thus roughly predict how much computing power to purchase and how long training will take.

Computing power is generally measured in FLOPs, where one FLOP is a single floating-point operation, that is, an addition, subtraction, multiplication, or division of non-integer values such as 2.5 + 3.557. "Floating point" means the representation can carry decimals; FP16 is half precision, while FP32, single precision, is a more commonly used format. According to empirical rules from practice, pre-training a large model, which usually involves several passes over the data, requires roughly 6np FLOPs in total, where 6 is known as the industry constant. Inference (the process where we input data and wait for the large model's output) covers two parts, processing the n input tokens and generating the output tokens, and is commonly estimated at roughly 2np FLOPs in total.
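These rules of thumb are easy to turn into a back-of-the-envelope calculator. The 6np training figure comes from the text; the 2np inference figure and the 300 TFLOPS effective GPU throughput used below are common assumptions, not hard numbers.

```python
def training_flops(p: float, n: float) -> float:
    """Pre-training compute under the empirical rule: ~6 * n * p FLOPs
    (p = parameter count, n = training tokens, 6 = the 'industry constant')."""
    return 6.0 * n * p

def inference_flops(p: float, n: float) -> float:
    """Inference compute, ~2 * n * p FLOPs (a common companion rule of thumb)."""
    return 2.0 * n * p

# Example: a GPT-3-scale model, 175e9 parameters trained on 300e9 tokens.
total = training_flops(p=175e9, n=300e9)
print(f"training compute: {total:.2e} FLOPs")  # ≈ 3.15e+23

# On a hypothetical GPU sustaining 3e14 FLOPs/s (300 TFLOPS) of useful work:
gpu_years = total / 3e14 / (3600 * 24 * 365)
print(f"≈ {gpu_years:.1f} GPU-years on one such GPU")
```

Estimates like this are how teams roughly size the GPU fleet and training time mentioned above.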
