Installing the RAGFlow and Dify knowledge bases
RAGFlow is installed by following the official Quick start guide:
https://ragflow.io/docs/dev/
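The installation runs in a virtual machine with 8 CPU cores and 16 GB of RAM, running Ubuntu 22.04.4 LTS.
Raise vm.max_map_count, the maximum number of memory map areas a single process may use. The default is 65530; here it is set to 262144. The setting is also written to /etc/sysctl.conf so that it is applied automatically after a reboot.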
```
# sysctl vm.max_map_count
vm.max_map_count = 65530
#
# sysctl -w vm.max_map_count=262144
vm.max_map_count = 262144
#
# sysctl vm.max_map_count
vm.max_map_count = 262144
#
#
# cat /etc/sysctl.conf
#
# /etc/sysctl.conf - Configuration file for setting system variables
# See /etc/sysctl.d/ for additional system variables.
# See sysctl.conf (5) for information.
#
#kernel.domainname = example.com
# Uncomment the following to stop low-level messages on console
#kernel.printk = 3 4 1 3
vm.max_map_count=262144
```
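The transcript above shows the line already present in /etc/sysctl.conf, but not how it was added. The following is a minimal sketch of one way to add it idempotently, assuming a POSIX shell with awk and mktemp available. `ensure_sysctl_line` is a hypothetical helper name, not part of any tool.

```shell
#!/bin/sh
# Hypothetical helper: make sure FILE contains exactly one active "KEY=VALUE" line.
# Usage: ensure_sysctl_line FILE KEY VALUE
ensure_sysctl_line() {
    file=$1 key=$2 value=$3
    touch "$file"
    tmp=$(mktemp)
    # Copy every line except active assignments of KEY (commented lines are kept).
    awk -v k="$key" '{
        line = $0
        sub(/^[ \t]+/, "", line)      # ignore leading whitespace
        split(line, parts, "=")
        name = parts[1]
        sub(/[ \t]+$/, "", name)      # ignore whitespace before "="
        if (name != k) print $0
    }' "$file" > "$tmp"
    # Append the desired assignment, then write back in place.
    printf '%s=%s\n' "$key" "$value" >> "$tmp"
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Example (as root), followed by reloading the file:
#   ensure_sysctl_line /etc/sysctl.conf vm.max_map_count 262144
#   sysctl -p
```

Running the helper twice leaves a single assignment in the file, so it is safe to repeat in a provisioning script.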
Clone the RAGFlow repository and check out v0.16.0, the latest release at the time of writing.
```
# git clone https://github.com/infiniflow/ragflow.git
Cloning into 'ragflow'...
remote: Enumerating objects: 26810, done.
remote: Counting objects: 100% (4/4), done.
remote: Total 26810 (delta 3), reused 3 (delta 3), pack-reused 26806 (from 2)
Receiving objects: 100% (26810/26810), 62.49 MiB | 9.49 MiB/s, done.
Resolving deltas: 100% (19405/19405), done.
#
# cd ragflow/
#
# git checkout -f v0.16.0
Note: switching to 'v0.16.0'.
```
Install the dependencies:
```
# apt install docker.io docker-compose-v2
```
Pulling the prebuilt Docker images fails with the following error:
```
# docker compose -f docker/docker-compose.yml up
WARN[0000] The "HF_ENDPOINT" variable is not set. Defaulting to a blank string.
WARN[0000] The "MACOS" variable is not set. Defaulting to a blank string.
[+] Running 9/11
⠏ minio [⣿⣿⣿⣿⣿⣿⣿] 51.27MB / 52.78MB Pulling 15.9s
✔ f72461870632 Pull complete 3.5s
✔ 683391db8929 Pull complete 3.6s
⠋ ba8b8055313f Extracting [==================================================>] 35.51MB/35.... 10.1s
✔ a8e0787fb7ed Download complete 4.7s
✔ fd20cadb8d39 Download complete 5.0s
✔ 3738ac54d510 Download complete 6.3s
✔ 128c59a31db4 Download complete 6.5s
✘ mysql Error Get "https://registry-1.docker.io/v2/": context deadline exceeded 15.9s
✘ ragflow Error context canceled 15.9s
✘ redis Error context canceled 15.9s
Error response from daemon: Get "https://registry-1.docker.io/v2/": context deadline exceeded
```
Add the following registry mirrors to /etc/docker/daemon.json:
```
# cat /etc/docker/daemon.json
{
"registry-mirrors": ["https://docker.m.daocloud.io/", "https://huecker.io/",
"https://dockerhub.timeweb.cloud",
"https://noohub.ru/",
"https://dockerproxy.com",
"https://docker.nju.edu.cn",
"http://hub.daocloud.io",
"https://dockerhub.icu",
"https://docker.ckyl.me",
"https://docker.awsl9527.cn",
"https://hub.uuuadc.top",
"https://docker.anyhub.us.kg",
"https://dockerhub.jobcher.com",
"https://registry.docker-cn.com",
"http://hub-mirror.c.163.com",
"https://mirror.baidubce.com",
"https://mirror.aliyuncs.com",
"https://dockerpull.org",
"https://hub.geekery.cn",
"https://docker.1ms.run",
"https://docker.1panel.dev",
"https://docker.1panel.live",
"https://docker.foreverlink.love",
"https://dytt.online",
"https://func.ink",
"https://lispy.org",
"https://docker.nju.edu.cn",
"https://docker.mirrors.ustc.edu.cn"]
}
```
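A long, hand-maintained mirror list is easy to get wrong (a stray comma, a repeated entry). As a rough sketch, the JSON can instead be generated from a plain list of URLs, one per line. `make_daemon_json` is a hypothetical helper, and it emits only the `registry-mirrors` key, so it is suitable only when daemon.json holds no other settings.

```shell
#!/bin/sh
# Hypothetical helper: read mirror URLs on stdin (one per line) and print a
# daemon.json body. Blank lines and exact duplicates are dropped; the
# first-seen order is kept.
make_daemon_json() {
    awk '
        NF == 0 || seen[$1]++ { next }
        { urls[++n] = $1 }
        END {
            printf "{\n  \"registry-mirrors\": ["
            for (i = 1; i <= n; i++) {
                printf "%s\n    \"%s\"", (i > 1 ? "," : ""), urls[i]
            }
            printf "%s]\n}\n", (n > 0 ? "\n  " : "")
        }
    '
}

# Example, with mirrors.txt as an assumed file name:
#   make_daemon_json < mirrors.txt > /etc/docker/daemon.json
```

Duplicates are matched as exact strings, so the same host written with and without a trailing slash counts as two entries.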
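Alternatively, follow the official documentation: in docker/.env, choose the Huawei Cloud or Alibaba Cloud image as suggested by the comments on the RAGFLOW_IMAGE variable.
- Huawei Cloud image name:
swr.cn-north-4.myhuaweicloud.com/infiniflow/ragflow
- Alibaba Cloud image name:
registry.cn-hangzhou.aliyuncs.com/infiniflow/ragflow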
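For the .env route, the change amounts to pointing RAGFLOW_IMAGE at one of those names. The sketch below does that with a small hypothetical helper, `set_env_var`. The `v0.16.0-slim` tag in the usage comment is an assumption (it should match whatever tag the default RAGFLOW_IMAGE line in docker/.env uses).

```shell
#!/bin/sh
# Hypothetical helper: set KEY=VALUE in a dotenv-style file.
# The first active assignment is replaced, later active duplicates are
# dropped, and the assignment is appended if the key is missing.
# Commented-out lines are left untouched.
# Usage: set_env_var FILE KEY VALUE
set_env_var() {
    file=$1 key=$2 value=$3
    tmp=$(mktemp)
    awk -v k="$key" -v v="$value" '
        index($0, k "=") == 1 {
            if (!done) { print k "=" v; done = 1 }
            next
        }
        { print }
        END { if (!done) print k "=" v }
    ' "$file" > "$tmp"
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Example (tag assumed, see above):
#   set_env_var docker/.env RAGFLOW_IMAGE \
#       swr.cn-north-4.myhuaweicloud.com/infiniflow/ragflow:v0.16.0-slim
```

Values are passed through `awk -v`, which interprets backslash escapes; that is harmless for image names but worth knowing before reusing the helper for other values.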
This walkthrough uses the daemon.json mirror approach, which requires restarting Docker:
```
# systemctl daemon-reload
# systemctl restart docker
# systemctl status docker
```
Once the images have been pulled, start RAGFlow. Run the command from the ragflow/docker subdirectory; otherwise it reports an error.
```
# cd docker
# docker compose up
```
List the pulled Docker images:
```
# docker images
REPOSITORY                                      TAG                            IMAGE ID       CREATED         SIZE
infiniflow/ragflow                              v0.16.0-slim                   dc20d4239e94   3 weeks ago     7.03GB
valkey/valkey                                   8                              405e0e9d7a59   7 weeks ago     134MB
mysql                                           8.0.39                         f5da8fc4b539   7 months ago    573MB
quay.io/minio/minio                             RELEASE.2023-12-20T01-00-02Z   73f9f0b5b015   14 months ago   151MB
docker.elastic.co/elasticsearch/elasticsearch   8.11.3                         792fab0c0bd8   15 months ago   1.43GB
```
Commands for viewing the logs:
```
# docker logs -f ragflow-es-01
# docker logs -f ragflow-server
# docker logs -f ragflow-redis
# docker logs -f ragflow-mysql
```
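Open a browser and log in.
An account has to be registered first. After logging in, connect an LLM provider. Alibaba's Tongyi Qianwen is one option, and SiliconFlow is also supported; the latter appears to give new accounts a few tens of yuan in credit, which is basically enough for testing.
After that, a knowledge base can be created.
Upload the documents manually. Note that each file must be parsed after upload; otherwise it cannot be used.
Finally, create a chat assistant and link it to the knowledge base created above. The prompt can be edited under Prompt Engine, and the large language model can be changed under Model Setting.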
Dify is installed on a separate virtual machine, also running Ubuntu 22.04.4 LTS. Clone the Dify repository as follows:
```
# git clone https://github.com/langgenius/dify.git
```
Pull the images:
```
# cd dify/docker/
# cp .env.example .env
# docker compose up -d
```
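Because of network restrictions, /etc/docker/daemon.json again has to be edited to add registry mirrors. Even so, the download is very slow; in my experience the network tends to be better after 6 p.m. each day.
Once the pull completes, Dify starts automatically.
Check the size of /var/lib/docker/. It takes up about 12 GB, almost all of it in overlay2: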
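When pulls keep timing out on a slow link, one option is to wrap the pull in a retry loop rather than rerunning it by hand. This is a generic sketch, not something the Dify documentation prescribes. `retry` is a hypothetical helper, and the attempt count and delay in the usage comment are arbitrary.

```shell
#!/bin/sh
# Hypothetical helper: run a command up to MAX times, sleeping DELAY seconds
# after a failure and doubling the delay each time.
# Usage: retry MAX DELAY COMMAND [ARGS...]
retry() {
    max=$1 delay=$2
    shift 2
    attempt=1
    while :; do
        "$@" && return 0
        if [ "$attempt" -ge "$max" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        echo "attempt $attempt failed; retrying in ${delay}s" >&2
        sleep "$delay"
        attempt=$((attempt + 1))
        delay=$((delay * 2))
    done
}

# Example: try the pull up to 5 times, waiting 30s, 60s, 120s, ... in between:
#   retry 5 30 docker compose pull
```

Layers that were already downloaded are kept between attempts, so each retry only has to fetch what is still missing.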
```
root@logging:/var/lib/docker# du -sh *
112K buildkit
556K containers
4.0K engine-id
13M image
100K network
12G overlay2
16K plugins
4.0K runtimes
4.0K swarm
4.0K tmp
48K volumes
```
List the pulled Docker images:
```
root@logging:~# docker image ls
REPOSITORY                      TAG           IMAGE ID       CREATED         SIZE
ubuntu/squid                    latest        dae40da440fe   2 days ago      243MB
langgenius/dify-plugin-daemon   0.0.3-local   98e493768cf1   2 days ago      909MB
langgenius/dify-api             1.0.0         e75fe639e420   2 days ago      2.14GB
langgenius/dify-web             1.0.0         0325f9a1e172   2 days ago      472MB
postgres                        15-alpine     afbf3abf6aeb   3 days ago      273MB
nginx                           latest        b52e0b094bc0   3 weeks ago     192MB
redis                           6-alpine      6dd588768b9b   7 weeks ago     30.2MB
langgenius/dify-sandbox         0.2.10        4328059557e8   4 months ago    567MB
semitechnologies/weaviate       1.19.0        8ec9f084ab23   22 months ago   52.5MB
```
Open Dify in a browser, register an account, and log in.
Dify's data sources support crawling websites, which is an advantage over RAGFlow.