
Running deepseek and other large models with Ollama 0.5.7 in the SCNet DCU heterogeneous environment

news 2025/10/28 13:49:20

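The SCNet DCU heterogeneous environment lets you pick the official Ollama image at creation time.

Creating from the official Ollama image

During creation, select the heterogeneous accelerator card "AI" option, then choose "Base image" - Ollama image. The image ships Ollama 0.5.7. Frankly, this version is somewhat old, and some newer models, such as qwen3 and the ERNIE 4.5 models, are not supported.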
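Since the whole post hinges on which Ollama version the image ships, here is a minimal Python sketch for checking it programmatically. It assumes `ollama serve` is listening on the default address `http://127.0.0.1:11434` and uses the `/api/version` endpoint. The threshold in the usage note is a hypothetical value for illustration, not the actual minimum version any particular model needs.

```python
import json
import urllib.request

OLLAMA_URL = "http://127.0.0.1:11434"  # Ollama's default listen address


def parse_version(text):
    """Turn '0.5.7' (or '0.5.7-rc1') into a comparable tuple such as (0, 5, 7)."""
    core = text.strip().lstrip("v").split("-")[0]
    return tuple(int(part) for part in core.split("."))


def is_older(current, required):
    """True if `current` is a lower version than `required` (numeric compare)."""
    return parse_version(current) < parse_version(required)


def server_version(base_url=OLLAMA_URL):
    """Ask a running `ollama serve` for its version via GET /api/version."""
    with urllib.request.urlopen(base_url + "/api/version", timeout=5) as resp:
        return json.load(resp)["version"]
```

With the server running, `is_older(server_version(), "0.6.0")` would report whether the image is below the (assumed) threshold. Comparing tuples rather than strings matters here: as strings, "0.12.3" would sort before "0.5.7".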

Starting the service

ollama serve

Testing with the ERNIE 0.3B model

ollama run dengcao/ERNIE-4.5-0.3B-PT

It exited immediately:

efore.free="485.6 GiB" before.free_swap="0 B" now.total="503.6 GiB" now.free="485.6 GiB" now.free_swap="0 B"
time=2025-10-27T11:24:06.240Z level=DEBUG source=amd_linux.go:490 msg="updating rocm free memory" gpu=GPU-714e872b32e82041 name=1d94:6210 before="64.0 GiB" now="64.0 GiB"
time=2025-10-27T11:24:06.489Z level=WARN source=sched.go:646 msg="gpu VRAM usage didn't recover within timeout" seconds=5.252584939 model=/root/.ollama/models/blobs/sha256-72511d0ebf100f82b036a1a868cd3a2b5a1c0c99a51ed4cedc5e726313def1ca
time=2025-10-27T11:24:06.491Z level=DEBUG source=gpu.go:406 msg="updating system memory data" before.total="503.6 GiB" before.free="485.6 GiB" before.free_swap="0 B" now.total="503.6 GiB" now.free="485.6 GiB" now.free_swap="0 B"
time=2025-10-27T11:24:06.491Z level=DEBUG source=amd_linux.go:490 msg="updating rocm free memory" gpu=GPU-714e872b32e82041 name=1d94:6210 before="64.0 GiB" now="64.0 GiB"
time=2025-10-27T11:24:06.739Z level=WARN source=sched.go:646 msg="gpu VRAM usage didn't recover within timeout" seconds=5.502722848 model=/root/.ollama/models/blobs/sha256-72511d0ebf100f82b036a1a868cd3a2b5a1c0c99a51ed4cedc5e726313def1ca

This new model is not supported: Error: llama runner process has terminated: this model is not supported by your version of Ollama. You may need to upgrade
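The server log lines above follow a `key=value` layout with quoted values, which makes them easy to filter when hunting for warnings. The sketch below is one way to do that in Python with the standard `shlex` module. The helper names are made up for this example, and the sample line is copied from the log above with the `model` field left out for brevity.

```python
import shlex


def parse_log_line(line):
    """Split one key=value log line into a dict; quoted values keep their spaces."""
    try:
        tokens = shlex.split(line)
    except ValueError:
        # Unbalanced quotes, e.g. model output mixed into the log stream.
        return {}
    fields = {}
    for token in tokens:
        key, sep, value = token.partition("=")
        if sep:  # ignore stray tokens that are not key=value pairs
            fields[key] = value
    return fields


def warnings_only(lines):
    """Return the parsed fields of every line whose level is WARN."""
    parsed = (parse_log_line(line) for line in lines)
    return [fields for fields in parsed if fields.get("level") == "WARN"]


SAMPLE = (
    'time=2025-10-27T11:24:06.489Z level=WARN source=sched.go:646 '
    'msg="gpu VRAM usage didn\'t recover within timeout" '
    'seconds=5.252584939'
)
```

Feeding a saved log file through `warnings_only(open(path))` would leave only the WARN entries, such as the repeated VRAM timeout messages.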

Trying the deepseek 1.5b model

ollama run deepseek-r1:1.5b

This one is really fast!

你好！很高兴见到你，有什么我可以帮忙的吗？😊[GIN] 2025/10/27 - 11:30:09 | 200 | 1.075429335s |

A reply in about 1 second!

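Next, the classic birds-in-a-tree question ("There are 10 birds in a tree; 2 are shot dead. How many are left?"):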

树上10只鸟，打死2只，还有几只？
time=2025-10-27T11:31:14.072Z level=DEBUG source=sched.go:575 msg="evaluating already loaded" model=/root/.ollama/models/blobs/sha256-aabd4debf0c8f08881923f2c25fc0fdeed24435271c2b3e92c4af36704040dbc
⠙ time=2025-10-27T11:31:14.130Z level=DEBUG source=routes.go:1470 msg="chat request" images=0 prompt="<｜User｜>hello<｜Assistant｜><think>\n\n</think>\n\nHello! How can I assist you today? 😊<｜end▁of▁sentence｜><｜User｜>你好啊<｜Assistant｜><think>\n\n</think>\n\n你好！很高兴见到你，有什么我可以帮忙的吗？😊<｜end▁of▁sentence｜><｜User｜>树上10只鸟，打死2 只，还有几只？<｜Assistant｜>"
time=2025-10-27T11:31:14.150Z level=DEBUG source=cache.go:104 msg="loading cache slot" id=0 cache=41 prompt=59 used=41 remaining=18
<think>
嗯，好的。让我仔细想想这个问题。题目是：“树上有10只鸟，枪声让2只飞走了，那么还剩下多少只呢？”一开始可能会觉得答案很
简单，就是10减去2，等于8只鸟了。不过，我还是有点犹豫，为什么会有这样的疑问呢？或许是因为有时候会有人误解题目中的“打死”这个词。在中文里，“打死”通常是
指确保所有鸟都被击落，包括已经飞走的和刚被击落的那些。所以，在这种情况下，除了已经飞走的2只之外，树上原本就有10只鸟
，那么剩下的应该是全部的10只，而不是8只。再仔细想想，如果枪声让2只鸟飞走了，那说明有2只鸟已经被击中并离开树了，这时候树上的鸟数就剩下原来的总数减去被击中的数
量。所以，正确的答案应该还是10只鸟Launch params (512, 1, 1) are larger than launch bounds (256) for kernel _ZL12soft_max_f32ILb1ELi0ELi0EfEvPKfPKT2_Pfiiffffj please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program !
，而不是8只。当然，也有可能是因为语言的歧义性或者其他因素，导致误解。不过，在中文里，“打死”通常指的是确保所有鸟都被击落，包括刚飞
走的和被击中而飞落的。因此，剩下的鸟数应该是原来的总数，也就是10只。综上所述，我觉得正确答案应该是10只鸟，而不是8只。
</think>树上有10只鸟，枪声让2只飞走了。在这种情况下，按照中文语境，“打死”通常意味着确保所有被击中的鸟都被击落，包括已经飞走
的和刚被击落的。因此，除了已经被击走的2只鸟外，剩下的鸟数仍然是树上的全部10只。所以，正确的答案是：还剩下10只鸟。[GIN] 2025/10/27 - 11:31:20 | 200 |  6.014265646s |       127.0.0.1 | POST     "/api/chat"

Answered in 6 seconds. With this speed and this answer, deepseek really holds its own!
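The timings quoted here come from the `[GIN]` request lines, which show the CLI posting to `/api/chat` on 127.0.0.1. To measure this more systematically, a sketch like the following could send the same request from Python and derive a tokens-per-second figure. It assumes the default port 11434 and relies on the `eval_count` and `eval_duration` fields (the latter in nanoseconds) that a non-streaming `/api/chat` reply carries; the function names are illustrative.

```python
import json
import time
import urllib.request

OLLAMA_URL = "http://127.0.0.1:11434"  # assumed default address


def chat_once(model, prompt, base_url=OLLAMA_URL):
    """POST one non-streaming request to /api/chat; return (reply, wall seconds)."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }).encode("utf-8")
    request = urllib.request.Request(
        base_url + "/api/chat",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    start = time.perf_counter()
    with urllib.request.urlopen(request, timeout=300) as resp:
        reply = json.load(resp)
    return reply, time.perf_counter() - start


def tokens_per_second(reply):
    """Generation speed from eval_count / eval_duration, or None if unavailable."""
    duration_ns = reply.get("eval_duration", 0)
    if not duration_ns:
        return None
    return reply.get("eval_count", 0) / (duration_ns / 1e9)
```

For example, `reply, seconds = chat_once("deepseek-r1:1.5b", "树上10只鸟，打死2只，还有几只？")` followed by `tokens_per_second(reply)` gives a number that is easier to compare across the 1.5b and 14b models than wall-clock time alone, since the answers differ in length.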

deepseek 14b model

It took 24 seconds to answer the birds-in-a-tree question.

qwen3 14b model

qwen3 fails as well, probably because this Ollama version is too old.

time=2025-10-27T11:48:44.361Z level=WARN source=sched.go:646 msg="gpu VRAM usage didn't recover within timeout" seconds=5.251786058 model=/root/.ollama/models/blobs/sha256-a8cc1361f3145dc01f6d77c6c82c9116b9ffe3c97b34716fe20418455876c40e
time=2025-10-27T11:48:44.361Z level=DEBUG source=gpu.go:406 msg="updating system memory data" before.total="503.6 GiB" before.free="479.4 GiB" before.free_swap="0 B" now.total="503.6 GiB" now.free="479.4 GiB" now.free_swap="0 B"
time=2025-10-27T11:48:44.361Z level=DEBUG source=amd_linux.go:490 msg="updating rocm free memory" gpu=GPU-714e872b32e82041 name=1d94:6210 before="51.0 GiB" now="51.0 GiB"
time=2025-10-27T11:48:44.611Z level=WARN source=sched.go:646 msg="gpu VRAM usage didn't recover within timeout" seconds=5.502008897 model=/root/.ollama/models/blobs/sha256-a8cc1361f3145dc01f6d77c6c82c9116b9ffe3c97b34716fe20418455876c40e

Sure enough, it fails: Error: llama runner process has terminated: this model is not supported by your version of Ollama. You may need to upgrade

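Summary

The official Ollama image is fast at inference but does not support newer models: both the ERNIE model and qwen3 fail. The error message confirms that it is a version problem: Error: llama runner process has terminated: this model is not supported by your version of Ollama. You may need to upgrade

The catch is that the newer Ollama I installed myself is a CPU build, so it is slow. See: https://skywalk.blog.csdn.net/article/details/153737200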
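One way to confirm whether a given Ollama install is actually using the accelerator is to look at how much of a loaded model sits in VRAM. The sketch below assumes the `/api/ps` endpoint on the default port, whose entries report `size` and `size_vram` in bytes; the helper names are invented for this example.

```python
import json
import urllib.request

OLLAMA_URL = "http://127.0.0.1:11434"  # assumed default address


def gpu_fraction(entry):
    """Share of a loaded model that resides in VRAM, from 0.0 to 1.0."""
    size = entry.get("size", 0)
    if size <= 0:
        return 0.0
    return entry.get("size_vram", 0) / size


def describe(entry):
    """One-line summary of where a loaded model is running."""
    fraction = gpu_fraction(entry)
    if fraction == 0:
        where = "CPU only"
    elif fraction >= 1:
        where = "GPU only"
    else:
        where = "split CPU/GPU"
    return f"{entry.get('name', '?')}: {where} ({fraction:.0%} in VRAM)"


def loaded_models(base_url=OLLAMA_URL):
    """List the currently loaded models via GET /api/ps."""
    with urllib.request.urlopen(base_url + "/api/ps", timeout=5) as resp:
        return json.load(resp).get("models", [])
```

While a model is loaded, printing `describe(m)` for each `m` in `loaded_models()` should show "CPU only" on a CPU build and a non-zero VRAM share on the official image.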
