Running ChatGLM3 on Multiple GPUs

When the precision is explicitly 16-bit (FP16), use the load_model_on_gpus function from the utils file in the ChatGLM project to set up the model. num_gpus=4 means the model runs on 4 GPUs.
from utils import load_model_on_gpus

model = load_model_on_gpus(model_name, num_gpus=4)

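from transformers import AutoModel

if quantize == 16:
    # FP16: split the model across 4 GPUs
    model = load_model_on_gpus(model_name, num_gpus=4)
else:
    # Otherwise quantize to the requested bit width and move the model to CUDA
    model = AutoModel.from_pretrained(model_name, device_map="auto", trust_remote_code=True).half().quantize(quantize).cuda()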
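As a rough illustration of what "running on 4 GPUs" involves, the sketch below assigns transformer layers to GPU indices in contiguous, near-equal chunks. It is not the code of load_model_on_gpus: the function name sketch_device_map, the key names, and the layer count of 28 are assumptions made only for this example. A dict of this general shape (module name to device index) is the kind of value that from_pretrained accepts for device_map.

```python
from typing import Dict


def sketch_device_map(num_layers: int, num_gpus: int) -> Dict[str, int]:
    """Assign each transformer layer to a GPU index, as evenly as possible.

    The key names ("embedding", "layers.N", "output") are illustrative only
    and do not match the module names of any particular checkpoint.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    # Keep the embedding and the output head on the first GPU.
    device_map = {"embedding": 0, "output": 0}
    for i in range(num_layers):
        # Integer arithmetic yields contiguous, near-equal chunks of layers.
        device_map[f"layers.{i}"] = i * num_gpus // num_layers
    return device_map


if __name__ == "__main__":
    # 28 layers over 4 GPUs is an example value, not read from a real model.
    for name, gpu in sketch_device_map(num_layers=28, num_gpus=4).items():
        print(f"{name} -> cuda:{gpu}")
```

Keeping each GPU's layers contiguous means activations only cross a device boundary num_gpus - 1 times per forward pass.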

At launch time, use the -d command-line argument to specify which GPUs to run on.

parser.add_argument('--device', '-d', help='device, -1 means cpu, other means gpu ids', default='0')

python fastapiGPU.py -d 0,1,2,3
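How fastapiGPU.py consumes the -d value is not shown here, so the following is only one plausible way to do it: parse the comma-separated list and export it as CUDA_VISIBLE_DEVICES before any CUDA work starts. The helper names parse_device_arg and main are hypothetical.

```python
import argparse
import os
from typing import List, Optional


def parse_device_arg(value: str) -> List[int]:
    """Turn '0,1,2,3' into [0, 1, 2, 3]; '-1' (CPU) becomes an empty list."""
    ids = [int(part) for part in value.split(",") if part.strip()]
    if ids == [-1]:
        return []
    if any(i < 0 for i in ids):
        raise ValueError(f"invalid device list: {value!r}")
    return ids


def main(argv: Optional[List[str]] = None) -> List[int]:
    parser = argparse.ArgumentParser()
    parser.add_argument("--device", "-d", default="0",
                        help="device, -1 means cpu, other means gpu ids")
    args = parser.parse_args(argv)
    gpu_ids = parse_device_arg(args.device)
    if gpu_ids:
        # Assumption: the variable is set before CUDA is initialised,
        # i.e. before the first GPU call made through torch.
        os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return gpu_ids


# Same arguments as the command line above, passed explicitly.
gpu_ids = main(["-d", "0,1,2,3"])
print(gpu_ids, os.environ.get("CUDA_VISIBLE_DEVICES"))
```

The length of the returned list could then be passed as num_gpus to load_model_on_gpus.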

This is equivalent to CUDA_VISIBLE_DEVICES in SD (Stable Diffusion WebUI).

Add the export line to webui-user.sh.

export CUDA_VISIBLE_DEVICES=0,1,2,3

./stable-diffusion-webui/webui.sh --listen --device-id 1
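Launched this way, this SD instance runs on GPU 1, so together with an instance on the default GPU 0, SD can use GPU 0 and GPU 1 at the same time.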

CUDA_VISIBLE_DEVICES=0,1,2,3 python launch.py --share

CUDA_VISIBLE_DEVICES=1 python launch.py --share

cmd_args.py --device-id

The device-id parameter appears in the cmd_args.py file.
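CUDA_VISIBLE_DEVICES and --device-id interact: the environment variable decides which physical GPUs the process can see and renumbers them from 0, and the device id then indexes into that renumbered list. The sketch below models this with plain Python. The function names are hypothetical, it assumes integer GPU ids only (no UUIDs), and it assumes index 0 is used when no device id is given.

```python
from typing import List, Optional


def visible_gpus(env_value: Optional[str], total_gpus: int) -> List[int]:
    """Physical GPU ids visible for a given CUDA_VISIBLE_DEVICES value."""
    if env_value is None:
        # Variable unset: every GPU is visible in its natural order.
        return list(range(total_gpus))
    return [int(part) for part in env_value.split(",") if part.strip()]


def physical_gpu(env_value: Optional[str], device_id: Optional[int],
                 total_gpus: int = 4) -> int:
    """Map a logical device id to the physical GPU it lands on."""
    gpus = visible_gpus(env_value, total_gpus)
    index = 0 if device_id is None else device_id
    if not 0 <= index < len(gpus):
        raise IndexError(f"device id {index} not among visible GPUs {gpus}")
    return gpus[index]


print(physical_gpu("0,1,2,3", 1))   # 1: all four visible, pick the second
print(physical_gpu("1", None))      # 1: only GPU 1 visible, seen as index 0
print(physical_gpu("2,3", 1))       # 3: second entry of the visible list
```

This is why CUDA_VISIBLE_DEVICES=1 with no device id and CUDA_VISIBLE_DEVICES=0,1,2,3 with --device-id 1 both end up on the same physical card.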