Llama2 Local Deployment (Big Data / AI)


https://github.com/FlagAlpha/Llama2-Chinese

https://huggingface.co/FlagAlpha

https://hub.docker.com/r/longerhuya/llama2-chinese-7b

https://hub.docker.com/r/ninthkat/jupyterlab-pytorch-cuda


# Installing the NVIDIA Container Toolkit

https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html

curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | \
  sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo

yum install -y nvidia-container-toolkit

-Configuring Docker

nvidia-ctk runtime configure --runtime=docker

systemctl restart docker
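Before launching GPU containers, it helps to confirm the host driver is actually visible. A minimal check from Python, assuming only that `nvidia-smi` ships with the NVIDIA driver (the helper name is my own):

```python
import shutil
import subprocess

def gpu_driver_available() -> bool:
    """Return True if nvidia-smi is installed and runs successfully."""
    path = shutil.which("nvidia-smi")
    if path is None:
        return False
    try:
        # --list-gpus prints one line per visible GPU
        result = subprocess.run([path, "--list-gpus"], capture_output=True, timeout=10)
        return result.returncode == 0
    except (OSError, subprocess.TimeoutExpired):
        return False

print(gpu_driver_available())
```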


docker run -d --restart=always --name llama2 \
  -p 9999:8888 -p 15550:15550 -p 15551:15551 \
  -v /data/site/docker/data/llama2:/home/jovyan \
  -v /etc/localtime:/etc/localtime:ro \
  -e TZ='Asia/Shanghai' \
  --shm-size 12G \
  ninthkat/jupyterlab-pytorch-cuda:latest
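Once the container is up, a quick way to confirm the published ports (9999 for JupyterLab, 15550/15551 for the app) are actually listening — a small sketch using only the standard library; host and port numbers come from the `docker run` line above:

```python
import socket

def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Attempt a TCP connection; True means something is listening on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

for port in (9999, 15550, 15551):
    print(port, is_port_open("127.0.0.1", port))
```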


http://g.htmltoo.com:9999


-Alternative images and option notes:

-p 15550:15550  -p 15551:15551 

bitnami/pytorch

--shm-size 16G  

nvcr.io/nvidia/pytorch:21.08-py3


docker exec -it llama2 /bin/bash

-Install conda

cd /notebooks

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh

bash Miniconda3-latest-Linux-x86_64.sh

export PATH="/root/miniconda3/bin:$PATH"

-Create a conda environment:

python --version

conda --version

conda create -n llama2 python=3.10.11

-Install dependencies

conda init

conda activate llama2


docker exec -it llama2 env LANG=C.UTF-8 /bin/bash

cd /home/jovyan

conda activate llama2

git clone https://github.com/facebookresearch/llama.git

cd  llama

-Install dependencies:

pip install -e .

pip install -r requirements.txt  -i https://pypi.tuna.tsinghua.edu.cn/simple


-Pull the code and model weights

git clone https://github.com/FlagAlpha/Llama2-Chinese.git

-Pull the Llama2-Chinese-13b-Chat model weights and code

cd Llama2-Chinese

git clone https://huggingface.co/FlagAlpha/Llama2-Chinese-13b-Chat

-Check the download size:

du -sh Llama2-Chinese-13b-Chat

-Output:

25G    Llama2-Chinese-13b-Chat
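The same size check can be done from inside a notebook without shelling out to `du` — a small sketch using only the standard library (the helper names are my own):

```python
import os

def dir_size_bytes(path: str) -> int:
    """Sum the sizes of all regular files under path, like `du -sb` (apparent size)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            fp = os.path.join(root, name)
            if os.path.isfile(fp):
                total += os.path.getsize(fp)
    return total

def human(n: float) -> str:
    """Format a byte count with binary units, similar to `du -h`."""
    for unit in ("B", "K", "M", "G", "T"):
        if n < 1024 or unit == "T":
            return f"{n:.1f}{unit}" if unit != "B" else f"{int(n)}B"
        n /= 1024

print(human(dir_size_bytes("Llama2-Chinese-13b-Chat")))
```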


---Terminal test

-Enter the Python interpreter:

python3

-Run the following code:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# load_in_8bit=True requires the bitsandbytes package (pip install bitsandbytes)
model = AutoModelForCausalLM.from_pretrained('Llama2-Chinese-13b-Chat', device_map='auto', torch_dtype=torch.float16, load_in_8bit=True)
model = model.eval()
tokenizer = AutoTokenizer.from_pretrained('Llama2-Chinese-13b-Chat', use_fast=False)
tokenizer.pad_token = tokenizer.eos_token
# Prompt follows the Llama2-Chinese chat template: <s>Human: ...\n</s><s>Assistant:
input_ids = tokenizer(['<s>Human: 介绍一下深圳\n</s><s>Assistant: '], return_tensors="pt", add_special_tokens=False).input_ids.to('cuda')
generate_input = {
    "input_ids": input_ids,
    "max_new_tokens": 512,
    "do_sample": True,
    "top_k": 50,
    "top_p": 0.95,
    "temperature": 0.3,
    "repetition_penalty": 1.3,
    "eos_token_id": tokenizer.eos_token_id,
    "bos_token_id": tokenizer.bos_token_id,
    "pad_token_id": tokenizer.pad_token_id
}
generate_ids = model.generate(**generate_input)
text = tokenizer.decode(generate_ids[0])
print(text)


---Web page test

-Build a page with Gradio

pip3 install gradio -i https://pypi.tuna.tsinghua.edu.cn/simple

---Load the model and start the service

vi /notebooks/Llama2-Chinese/examples/chat_gradio.py

Go to line 94:
    demo.queue().launch(share=False, debug=True, server_name="0.0.0.0")
Change it to:
    demo.queue().launch(share=False, debug=True, server_name="0.0.0.0", server_port=15550)

Start the service:

python3 examples/chat_gradio.py --model_name_or_path Llama2-Chinese-13b-Chat


-If the following error appears:

  File "/notebooks/Llama2-Chinese/examples/chat_gradio.py", line 94
    demo.queue().launch(share=False, debug=True, server_name="0.0.0.0")
                                               ^
SyntaxError: invalid character ',' (U+FF0C)

-fix the code as follows:

vi /notebooks/Llama2-Chinese/examples/chat_gradio.py
:94
Replace the fullwidth Chinese commas (,) with ASCII commas (,):
94    demo.queue().launch(share=False,debug=True,server_name="0.0.0.0")
=>
94    demo.queue().launch(share=False, debug=True, server_name="0.0.0.0")

---Test

http://g.htmltoo.com:15550



Downloading the official Llama2 weights requires an application on Meta's website. Submitting the application may require a proxy; the download itself does not.

Application URL:

https://ai.meta.com/resources/models-and-libraries/llama-downloads/


cd  /opt

git clone https://github.com/facebookresearch/llama.git

cd llama

pip3 install -e .

pip3 install --upgrade torch torchvision fastNLP -i https://pypi.tuna.tsinghua.edu.cn/simple


Llama2 comes in three sizes: 7B, 13B and 70B (7, 13 and 70 billion parameters); more parameters demand a higher hardware configuration.

Each size ships as a base (fine-tunable) model and a chat model; here I pick the first one, the 7B version.

Llama-2-7b

Llama-2-7b-chat

Llama-2-13b

Llama-2-13b-chat

Llama-2-70b

Llama-2-70b-chat
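A rough rule of thumb for sizing hardware against the list above: weight memory ≈ parameter count × bytes per parameter (2 for fp16, 1 for int8, 4 for fp32), plus extra for activations and the KV cache. A sketch of that arithmetic (weights only; the helper name is my own):

```python
def weight_memory_gb(params_billions: float, bytes_per_param: int) -> float:
    """Approximate memory for model weights alone, in GiB."""
    return params_billions * 1e9 * bytes_per_param / 1024**3

# ~13 GiB fp16 for 7B matches the ~13 GB download size mentioned below
for size in (7, 13, 70):
    fp16 = weight_memory_gb(size, 2)
    int8 = weight_memory_gb(size, 1)
    print(f"{size}B: ~{fp16:.0f} GiB fp16, ~{int8:.0f} GiB int8 (weights only)")
```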


bash download.sh

Follow the prompts: paste the download URL provided in the email, choose the versions to download, and wait for the download to finish. The 7B files are about 13 GB and take a long time; the other versions are larger.



https://blog.csdn.net/cecere/article/details/132120423

https://blog.csdn.net/zengNLP/article/details/131965453

https://zhuanlan.zhihu.com/p/647067870

