A 2023 Beginner's Guide to Deep Learning (6): Running Large Models on Your Own Computer



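Most developers work on open-source systems themselves, so they pay little attention to Windows support, and some ignore it entirely. Cygwin, MinGW, CMake, and WSL have each put considerable effort into getting Linux open-source libraries to work on Windows, but this is an ecosystem problem, much like Linux having far fewer games than Windows.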


Nomic AI gpt4all (based on LLaMA)


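Because LLaMA is not open to everyone, there is little point in explaining LLaMA itself here. We can, however, try projects built on LLaMA, such as Nomic AI's gpt4all.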

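A nice touch of gpt4all is that it has been adapted for three platforms: Windows, M1 Mac, and Intel Mac. Linux is of course supported by default. What is more, inference only needs a CPU.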



git clone https://github.com/nomic-ai/gpt4all






Please generate quick sort algorithm in C++ language
Here’s how you can implement Quick Sort Algorithm using C++ :

// Implementation of QuickSort() function for array A[] with size N and pivot value P[N/2]
void QS(int* A, int n) { // Function prototype declaration
    if (n <= 1) return; /* If there is only one element in the input Array */
    // QuickSort() function for array A[] with size N and pivot value P[N/2]
    else { //else statement to check condition of quick sort algorithm.
        int i = n-1, j=0; /* Indexes used in the loop*/
        while (i >= 0) { /* Loop for sorting elements */
            if (A[j] > A[i]) { // Swap condition to check element's position.
                int temp = A[i]; // Copying value of array index i into a temporary variable 'temp'.
                A[i] = A[j]; /* Exchanging values at two indices */
            } else { // Swap condition to check element's position.
                int temp = A[j]; // Copying value of array index j into a temporary variable 'temp'.
                A[i] = A[j], A[j] = temp; /* Exchanging values at two indices */
            }
            i--, j++; /* Decrement and increment loop indexes*/
        }; // End of while Loop for sorting elements.
    } //End if statement to check condition QuickSort() function call with array A[] size N & pivot value P[N/2].


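This model is far less capable than ChatGPT, but it is derived from LLaMA, part of its training data comes from ChatGPT conversation logs, and it uses low-rank adaptation (LoRA: Low-Rank Adaptation) and quantization to shrink the model. Each of these is a current frontier technique.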
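To make the size reduction concrete, here is a minimal sketch of the idea behind LoRA. It is not gpt4all's actual training code. A frozen weight matrix W of shape d x k is adapted by adding a low-rank product B·A, where B is d x r and A is r x k, so only r·(d + k) values are trained. The function names and the example sizes (4096, rank 8) are assumptions chosen for illustration.

```python
def lora_param_counts(d, k, r):
    """Return (full, lora) trainable-parameter counts for one d x k weight matrix."""
    full = d * k        # fine-tuning every entry of W
    lora = r * (d + k)  # training only B (d x r) and A (r x k)
    return full, lora


def matmul(x, y):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def lora_effective_weight(w, b, a, scale=1.0):
    """Effective weight W + scale * (B @ A); W itself stays frozen."""
    delta = matmul(b, a)
    return [[wij + scale * dij for wij, dij in zip(wr, dr)] for wr, dr in zip(w, delta)]


full, lora = lora_param_counts(4096, 4096, 8)
print(full, lora, full // lora)  # 16777216 65536 256
```

With these illustrative sizes, the adapter trains 256 times fewer values than full fine-tuning of the same matrix.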







git clone https://github.com/THUDM/ChatGLM-6B


pip install -r requirements.txt

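The Python programming interface of gpt4all above does not support Windows, whereas ChatGLM-6B is built on the Hugging Face Transformers library and supports Python programming on Windows. Let's write some code to call it: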

from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b-int4-qe", trust_remote_code=True).half().cuda()
model = model.eval()
# The prompt means "Implement quicksort in C++".
response, history = model.chat(tokenizer, "用C++实现快速排序", history=[])
print(response)


#include <iostream>
using namespace std;

void swap(int& a, int& b) {
    int temp = a;
    a = b;
    b = temp;
}

int partition(int arr[], int low, int high) {
    int pivot = arr[high];
    int i = (low - 1);
    for (int j = low; j < high - 1; j++) {
        if (arr[j] < pivot) {
            i++;
            swap(arr[i], arr[j]);
        }
    }
    swap(arr[i + 1], arr[high]);
    return i + 1;
}

void quicksort(int arr[], int low, int high) {
    if (low < high) {
        int pi = partition(arr, low, high);
        quicksort(arr, low, pi - 1);
        quicksort(arr, pi + 1, high);
    }
}

int main() {
    int arr[] = {5, 2, 9, 1, 6, 3, 8};
    int n = sizeof(arr) / sizeof(arr[0]);
    quicksort(arr, 0, n - 1);

    cout << arr[0] << endl;
    return 0;
}
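The `model.chat` call used earlier takes a `history` list and returns an updated one, which suggests how a multi-turn conversation could be driven. The sketch below assumes only that interface, `chat(tokenizer, prompt, history=...)` returning `(response, history)`, as seen in that call. `run_conversation` and the `EchoModel` stand-in are hypothetical names, introduced so the loop can be tried without downloading any weights.

```python
def run_conversation(model, tokenizer, prompts):
    """Feed prompts one at a time, threading the history through each call."""
    history = []
    responses = []
    for prompt in prompts:
        response, history = model.chat(tokenizer, prompt, history=history)
        responses.append(response)
    return responses, history


class EchoModel:
    """Hypothetical stand-in that mimics the (response, history) return shape."""

    def chat(self, tokenizer, prompt, history):
        response = f"echo: {prompt}"
        return response, history + [(prompt, response)]


responses, history = run_conversation(EchoModel(), None, ["hello", "again"])
print(responses)     # one reply per prompt
print(len(history))  # one (prompt, response) pair per turn
```

Swapping `EchoModel()` and `None` for a real model and tokenizer should leave the loop itself unchanged, provided the real `chat` method keeps the same return shape.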



This brings us to Hugging Face, the gateway of the Transformer era. The transformers library imported in the code above is a Hugging Face product.

from transformers import AutoTokenizer, AutoModel

Hugging Face

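As the figure above shows, Hugging Face is essentially the distribution hub for all kinds of Transformer models. Through the Hugging Face interfaces, you can use almost every open-source large model.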



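from typing import Optional

import torch
from torch import nn
import fairscale.nn.model_parallel.initialize as fs_init
from fairscale.nn.model_parallel.layers import ColumnParallelLinear, ParallelEmbedding, RowParallelLinear

# ModelArgs, RMSNorm, FeedForward, and precompute_freqs_cis are assumed to be
# defined elsewhere in the same module.


class Transformer(nn.Module):
    def __init__(self, params: ModelArgs):
        super().__init__()
        self.params = params
        self.vocab_size = params.vocab_size
        self.n_layers = params.n_layers

        # Token embedding table, sharded across model-parallel ranks.
        self.tok_embeddings = ParallelEmbedding(
            params.vocab_size, params.dim, init_method=lambda x: x
        )

        # Stack of n_layers identical Transformer blocks.
        self.layers = torch.nn.ModuleList()
        for layer_id in range(params.n_layers):
            self.layers.append(TransformerBlock(layer_id, params))

        # Final normalization and projection back to vocabulary logits.
        self.norm = RMSNorm(params.dim, eps=params.norm_eps)
        self.output = ColumnParallelLinear(
            params.dim, params.vocab_size, bias=False, init_method=lambda x: x
        )

        self.freqs_cis = precompute_freqs_cis(
            self.params.dim // self.params.n_heads, self.params.max_seq_len * 2
        )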
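The `precompute_freqs_cis` call above prepares the complex rotation factors used by rotary position embeddings. Below is a minimal pure-Python sketch of that kind of table, assuming the usual base `theta = 10000.0` and using `cmath` instead of tensors. It illustrates the idea and is not the repository's function; the `_sketch` suffix marks the name as hypothetical.

```python
import cmath


def precompute_freqs_cis_sketch(dim, end, theta=10000.0):
    """Return table[t][i] = exp(1j * t * theta ** (-2 * i / dim)) for t < end, i < dim // 2."""
    freqs = [1.0 / (theta ** (2 * i / dim)) for i in range(dim // 2)]
    return [[cmath.rect(1.0, t * f) for f in freqs] for t in range(end)]


table = precompute_freqs_cis_sketch(dim=8, end=4)
print(len(table), len(table[0]))  # positions x (dim // 2) frequency pairs
```

Every entry has magnitude 1, so multiplying a query or key pair by it rotates the pair without changing its length.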




class TransformerBlock(nn.Module):
    def __init__(self, layer_id: int, args: ModelArgs):
        super().__init__()
        self.n_heads = args.n_heads
        self.dim = args.dim
        self.head_dim = args.dim // args.n_heads
        self.attention = Attention(args)
        self.feed_forward = FeedForward(
            dim=args.dim, hidden_dim=4 * args.dim, multiple_of=args.multiple_of
        )
        self.layer_id = layer_id
        self.attention_norm = RMSNorm(args.dim, eps=args.norm_eps)
        self.ffn_norm = RMSNorm(args.dim, eps=args.norm_eps)

    def forward(self, x: torch.Tensor, start_pos: int, freqs_cis: torch.Tensor, mask: Optional[torch.Tensor]):
        h = x + self.attention.forward(self.attention_norm(x), start_pos, freqs_cis, mask)
        out = h + self.feed_forward.forward(self.ffn_norm(h))
        return out
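The `forward` method above normalizes the input before each sub-layer and adds the sub-layer output back onto it. Here is a minimal sketch of that pattern on plain Python lists, assuming the common RMSNorm definition (divide by the root mean square, then scale by a learned weight). The `attention` and `feed_forward` arguments are placeholder callables, not the real sub-layers.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Scale x by 1 / sqrt(mean(x^2) + eps), then by a per-element weight."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def block_forward(x, attention, feed_forward, attn_weight, ffn_weight):
    """Pre-norm residual block: h = x + attn(norm(x)); out = h + ffn(norm(h))."""
    h = [a + b for a, b in zip(x, attention(rms_norm(x, attn_weight)))]
    return [a + b for a, b in zip(h, feed_forward(rms_norm(h, ffn_weight)))]


x = [3.0, 4.0]
ones = [1.0, 1.0]
zero = lambda v: [0.0 for _ in v]  # placeholder sub-layer that contributes nothing
print(block_forward(x, zero, zero, ones, ones))  # residual path returns x unchanged
```

Because each sub-layer only adds to the running value, a sub-layer that outputs zeros leaves the input untouched, which is part of why deep stacks of such blocks remain trainable.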


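class Attention(nn.Module):
    def __init__(self, args: ModelArgs):
        super().__init__()

        # Heads are split evenly across model-parallel ranks.
        self.n_local_heads = args.n_heads // fs_init.get_model_parallel_world_size()
        self.head_dim = args.dim // args.n_heads

        # Query, key, and value projections, sharded by output column.
        self.wq = ColumnParallelLinear(
            args.dim, args.n_heads * self.head_dim,
            bias=False, gather_output=False, init_method=lambda x: x,
        )
        self.wk = ColumnParallelLinear(
            args.dim, args.n_heads * self.head_dim,
            bias=False, gather_output=False, init_method=lambda x: x,
        )
        self.wv = ColumnParallelLinear(
            args.dim, args.n_heads * self.head_dim,
            bias=False, gather_output=False, init_method=lambda x: x,
        )
        # Output projection, sharded by input row.
        self.wo = RowParallelLinear(
            args.n_heads * self.head_dim, args.dim,
            bias=False, input_is_parallel=True, init_method=lambda x: x,
        )

        # Key/value caches so earlier positions are not recomputed during generation.
        self.cache_k = torch.zeros((args.max_batch_size, args.max_seq_len, self.n_local_heads, self.head_dim)).cuda()
        self.cache_v = torch.zeros((args.max_batch_size, args.max_seq_len, self.n_local_heads, self.head_dim)).cuda()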
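The `cache_k` and `cache_v` buffers above hold keys and values for positions already processed, so each new token only needs its own projections computed. Below is a simplified single-head sketch of that idea in plain Python, assuming scaled dot-product attention. The `KVCache` class is a hypothetical illustration; the real code works on batched multi-head tensors.

```python
import math


class KVCache:
    """Append-only store of past keys and values for one attention head."""

    def __init__(self):
        self.keys = []
        self.values = []

    def attend(self, q, k, v):
        """Cache (k, v) for the new position, then attend q over everything cached."""
        self.keys.append(k)
        self.values.append(v)
        scale = 1.0 / math.sqrt(len(q))
        scores = [scale * sum(a * b for a, b in zip(q, key)) for key in self.keys]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]  # numerically stable softmax
        total = sum(exps)
        weights = [e / total for e in exps]
        dim = len(v)
        return [sum(w * val[i] for w, val in zip(weights, self.values)) for i in range(dim)]


cache = KVCache()
first = cache.attend([1.0, 0.0], [1.0, 0.0], [2.0, 2.0])
second = cache.attend([0.0, 0.0], [0.0, 1.0], [4.0, 0.0])
print(first)   # only one cached position, so the output equals its value
print(second)  # zero query gives equal weights, so the two values are averaged
```

The second call reuses the key and value stored by the first one, which is the saving the cache provides during token-by-token generation.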



LMFlow

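LMFlow is another project that has become popular recently. Built on LLaMA by the Hong Kong University of Science and Technology, it is open source across the whole workflow and can be trained on a single 3090 GPU.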



For example, among current open-source projects, LMFlow is one of the few that provides Instruction Tuning.

Let's look at an Instruction Tuning example:

{"id":0,"instruction":"The sentence you are given might be too wordy, complicated, or unclear. Rewrite the sentence and make your writing clearer by keeping it concise. Whenever possible, break complex sentences into multiple sentences and eliminate unnecessary words.","input":"If you have any questions about my rate or if you find it necessary to increase or decrease the scope for this project, please let me know.","infer30b_before_item":" Output: The sentence you are given might be too wordy, complicated, or unclear. Rewrite the sentence and make your writing clearer by keeping it concise. Whenever possible, break complex sentences into multiple sentences and eliminate unnecessary words. If you have any questions about my rate or if you find it necessary to increase or decrease the scope for this project, please let me know.\n---\nInput: Input: The sentence you are given might be too wordy, complicated, or unclear. Rewrite the sentence and make your writing clearer by keeping it concise. Whenever possible, break complex sentences into multiple sentences and eliminate unnecessary words. If you have any questions about my rate or if you find it necessary to increase or decrease the scope for this project, please let me know.\n Output: Output: The sentence you are given might be too wordy, complicated, or unclear. Rewrite the sentence and make your writing clearer by keeping it concise. Whenever possible, break complex sentences into multiple sentences and eliminate unnecessary words. If you have any questions about my rate or if you find it necessary to increase or decrease the scope for this project, please let me know.\n---\nInput: Input: The sentence you are given might be too wordy, complicated,","infer30b_after_item":" \n Output: If you have any questions about my rate or need to adjust the scope for this project, please let me know. \n\n","infer13b_before_item":" The sentence you are given might be too wordy, complicated, or unclear. 
Rewrite the sentence and make your writing clearer by keeping it concise. Whenever possible, break complex sentences into multiple sentences and eliminate unnecessary words. If you have any questions about my rate or if you find it necessary to increase or decrease the scope for this project, please let me know.\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n","infer13b_after_item":" \n Output: If you have any questions about my rate or if you find it necessary to increase or decrease the scope for this project, please let me know. \n\n","infer7b_before_item":" The sentence you are given might be too wordy, complicated, or unclear. Rewrite the sentence and make your writing clearer by keeping it concise. Whenever possible, break complex sentences into multiple sentences and eliminate unnecessary words. If you have any questions about my rate or if you find it necessary to increase or decrease the scope for this project, please let me know.\nInput: The sentence you are given might be too wordy, complicated, or unclear. Rewrite the sentence and make your writing clearer by keeping it concise. Whenever possible, break complex sentences into multiple sentences and eliminate unnecessary words. If you have any questions about my rate or if you find it necessary to increase or decrease the scope for this project, please let me know.\nOutput: The sentence you are given might be too wordy, complicated, or unclear. Rewrite the sentence and make your writing clearer by keeping it concise. Whenever possible, break complex sentences into multiple sentences and eliminate unnecessary words. 
If you have any questions about my rate or if you find it necessary to increase or decrease the scope for this project, please let me know.\nInput: The sentence you are given might be too wordy, complicated, or unclear. Rewrite the sentence and make your writing clearer by","infer7b_after_item":" \n Output: If you have any questions about my rate or if you find it necessary to increase or decrease the scope for this project, please let me know. \n\n"}
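Records like the one above appear to pair each model size's output before and after tuning. As a rough sketch, assuming each record is one JSON object with `infer<size>_before_item` and `infer<size>_after_item` keys as in the example, the pairs could be pulled out as follows. `before_after_pairs` and the shortened sample record are made up for illustration.

```python
import json


def before_after_pairs(record):
    """Map each model-size tag (e.g. '7b') to its (before, after) outputs."""
    pairs = {}
    for key, value in record.items():
        if key.startswith("infer") and key.endswith("_before_item"):
            tag = key[len("infer"):-len("_before_item")]
            after = record.get(f"infer{tag}_after_item")
            pairs[tag] = (value.strip(), after.strip() if after is not None else None)
    return pairs


# Shortened, made-up record with the same key layout as the example above.
sample = json.loads(
    '{"id": 0, "instruction": "Shorten the sentence.", "input": "Please let me know.",'
    ' "infer7b_before_item": " Shorten the sentence. Please let me know.\\n",'
    ' "infer7b_after_item": " \\n Output: Let me know. \\n\\n"}'
)
for tag, (before, after) in before_after_pairs(sample).items():
    print(tag, "| before:", before, "| after:", after)
```

Laying the two outputs side by side makes the pattern in the real record easier to see: before tuning the models mostly echo the prompt, and afterwards they produce a direct answer.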



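Recently, a team from Zhejiang University and Microsoft released the Jarvis project, which takes full advantage of Hugging Face's position as the central hub.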




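  1. By pruning, rank reduction, quantization, and similar techniques, we can run inference on resource-constrained computers. Some performance is lost, of course, and we can balance that against the business scenario. If prompt engineering alone solves the problem, so much the better
  2. Hugging Face is both the programming interface and the distribution hub for pretrained large models
  3. The basic principle of large models is still the self-attention model we studied in the previous installment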
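As a small illustration of the quantization mentioned in the first point, the sketch below applies symmetric absmax int8 quantization to a short list of weights and measures the round-trip error. It is one common scheme chosen for illustration, not necessarily the one used by any model in this article, and the function names are hypothetical.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared absmax scale."""
    absmax = max(abs(w) for w in weights)
    scale = absmax / 127.0 if absmax > 0 else 1.0
    return [round(w / scale) for w in weights], scale


def dequantize_int8(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]


weights = [0.4, -1.0, 0.25, 0.0]
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(codes)
print(max_error < scale / 2)  # rounding error stays below half a quantization step
```

Each weight then needs one byte instead of two (fp16) or four (fp32), at the cost of the small rounding error measured here. This is the performance trade-off the first point refers to.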

