
Migrating a Docker-deployed RAGFlow service: moving the data volumes

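Contents

  • Background
  • 1. My configuration file
    • ✅ Data storage paths at a glance (based on the configuration file above)
    • About these `volumes`
    • 📁 How to find where these volumes live on the host
        • Docker commands that may be useful
    • 📦 Summary and recommendations
    • 🛠️ Customizing these paths
    • TODO: to be continued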

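Background

I deployed a local RAGFlow service with docker compose -f docker-compose.yml -p ragflow up -d and now want to move it to another server. The service itself is easy to recreate: pull the latest code from https://github.com/infiniflow/ragflow, reconfigure it, and start it again.
But how can the data added on the original server, such as knowledge bases, be migrated to the new server so that nothing has to be added or configured a second time?

1. My configuration file

The base configuration file used to start RAGFlow is docker-compose-base.yml, shown below.
It lists the volume name and mount point of each base service.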

(base) root@hostname:/usr/local/soft/ai/rag/v0.19.0/ragflow/docker# vim docker-compose-base.yml

services:
  es01:
    container_name: ragflow-es-01
    profiles:
      - elasticsearch
    image: elasticsearch:${STACK_VERSION}
    volumes:
      - esdata01:/usr/share/elasticsearch/data
    ports:
      - ${ES_PORT}:9200
    env_file: .env
    environment:
      - node.name=es01
      - ELASTIC_PASSWORD=${ELASTIC_PASSWORD}
      - bootstrap.memory_lock=false
      - discovery.type=single-node
      - xpack.security.enabled=true
      - xpack.security.http.ssl.enabled=false
      - xpack.security.transport.ssl.enabled=false
      - cluster.routing.allocation.disk.watermark.low=5gb
      - cluster.routing.allocation.disk.watermark.high=3gb
      - cluster.routing.allocation.disk.watermark.flood_stage=2gb
      - TZ=${TIMEZONE}
    mem_limit: ${MEM_LIMIT}
    ulimits:
      memlock:
        soft: -1
        hard: -1
    healthcheck:
      test: ["CMD-SHELL", "curl http://localhost:9200"]
      interval: 10s
      timeout: 10s
      retries: 120
    networks:
      - ragflow
    restart: on-failure

  opensearch01:
    container_name: ragflow-opensearch-01
    profiles:
      - opensearch
    image: hub.icert.top/opensearchproject/opensearch:2.19.1
    volumes:
      - osdata01:/usr/share/opensearch/data
    ports:
      - ${OS_PORT}:9201
    env_file: .env
    environment:
      - node.name=opensearch01
      - OPENSEARCH_PASSWORD=${OPENSEARCH_PASSWORD}
      - OPENSEARCH_INITIAL_ADMIN_PASSWORD=${OPENSEARCH_PASSWORD}
      - bootstrap.memory_lock=false
      - discovery.type=single-node
      - plugins.security.disabled=false
      - plugins.security.ssl.http.enabled=false
      - plugins.security.ssl.transport.enabled=true
      - cluster.routing.allocation.disk.watermark.low=5gb
      - cluster.routing.allocation.disk.watermark.high=3gb
      - cluster.routing.allocation.disk.watermark.flood_stage=2gb
      - TZ=${TIMEZONE}
      - http.port=9201
    mem_limit: ${MEM_LIMIT}
    ulimits:
      memlock:
        soft: -1
        hard: -1
    healthcheck:
      test: ["CMD-SHELL", "curl http://localhost:9201"]
      interval: 10s
      timeout: 10s
      retries: 120
    networks:
      - ragflow
    restart: on-failure

  infinity:
    container_name: ragflow-infinity
    profiles:
      - infinity
    image: infiniflow/infinity:v0.6.0-dev3
    volumes:
      - infinity_data:/var/infinity
      - ./infinity_conf.toml:/infinity_conf.toml
    command: ["-f", "/infinity_conf.toml"]
    ports:
      - ${INFINITY_THRIFT_PORT}:23817
      - ${INFINITY_HTTP_PORT}:23820
      - ${INFINITY_PSQL_PORT}:5432
    env_file: .env
    environment:
      - TZ=${TIMEZONE}
    mem_limit: ${MEM_LIMIT}
    ulimits:
      nofile:
        soft: 500000
        hard: 500000
    networks:
      - ragflow
    healthcheck:
      test: ["CMD", "curl", "http://localhost:23820/admin/node/current"]
      interval: 10s
      timeout: 10s
      retries: 120
    restart: on-failure

  sandbox-executor-manager:
    container_name: ragflow-sandbox-executor-manager
    profiles:
      - sandbox
    image: ${SANDBOX_EXECUTOR_MANAGER_IMAGE-infiniflow/sandbox-executor-manager:latest}
    privileged: true
    ports:
      - ${SANDBOX_EXECUTOR_MANAGER_PORT-9385}:9385
    env_file: .env
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    networks:
      - ragflow
    security_opt:
      - no-new-privileges:true
    environment:
      - TZ=${TIMEZONE}
      - SANDBOX_EXECUTOR_MANAGER_POOL_SIZE=${SANDBOX_EXECUTOR_MANAGER_POOL_SIZE:-3}
      - SANDBOX_BASE_PYTHON_IMAGE=${SANDBOX_BASE_PYTHON_IMAGE:-infiniflow/sandbox-base-python:latest}
      - SANDBOX_BASE_NODEJS_IMAGE=${SANDBOX_BASE_NODEJS_IMAGE:-infiniflow/sandbox-base-nodejs:latest}
      - SANDBOX_ENABLE_SECCOMP=${SANDBOX_ENABLE_SECCOMP:-false}
      - SANDBOX_MAX_MEMORY=${SANDBOX_MAX_MEMORY:-256m}
      - SANDBOX_TIMEOUT=${SANDBOX_TIMEOUT:-10s}
    healthcheck:
      test: ["CMD", "curl", "http://localhost:9385/healthz"]
      interval: 10s
      timeout: 5s
      retries: 5
    restart: on-failure

  mysql:
    # mysql:5.7 linux/arm64 image is unavailable.
    image: mysql:8.0.39
    container_name: ragflow-mysql
    env_file: .env
    environment:
      - MYSQL_ROOT_PASSWORD=${MYSQL_PASSWORD}
      - TZ=${TIMEZONE}
    command:
      --max_connections=1000
      --character-set-server=utf8mb4
      --collation-server=utf8mb4_unicode_ci
      --default-authentication-plugin=mysql_native_password
      --tls_version="TLSv1.2,TLSv1.3"
      --init-file /data/application/init.sql
      --binlog_expire_logs_seconds=604800
    ports:
      - ${MYSQL_PORT}:3306
    volumes:
      - mysql_data:/var/lib/mysql
      - ./init.sql:/data/application/init.sql
    networks:
      - ragflow
    healthcheck:
      test: ["CMD", "mysqladmin" ,"ping", "-uroot", "-p${MYSQL_PASSWORD}"]
      interval: 10s
      timeout: 10s
      retries: 3
    restart: on-failure

  minio:
    image: quay.io/minio/minio:RELEASE.2025-06-13T11-33-47Z
    container_name: ragflow-minio
    command: server --console-address ":9001" /data
    ports:
      - ${MINIO_PORT}:9000
      - ${MINIO_CONSOLE_PORT}:9001
    env_file: .env
    environment:
      - MINIO_ROOT_USER=${MINIO_USER}
      - MINIO_ROOT_PASSWORD=${MINIO_PASSWORD}
      - TZ=${TIMEZONE}
    volumes:
      - minio_data:/data
    networks:
      - ragflow
    restart: on-failure
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
      interval: 30s
      timeout: 20s
      retries: 3

  redis:
    # swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/valkey/valkey:8
    image: valkey/valkey:8
    container_name: ragflow-redis
    command: redis-server --requirepass ${REDIS_PASSWORD} --maxmemory 128mb --maxmemory-policy allkeys-lru
    env_file: .env
    ports:
      - ${REDIS_PORT}:6379
    volumes:
      - redis_data:/data
    networks:
      - ragflow
    restart: on-failure
    healthcheck:
      test: ["CMD", "redis-cli", "-a", "${REDIS_PASSWORD}", "ping"]
      interval: 5s
      timeout: 3s
      retries: 3
      start_period: 10s

volumes:
  esdata01:
    driver: local
  osdata01:
    driver: local
  infinity_data:
    driver: local
  mysql_data:
    driver: local
  minio_data:
    driver: local
  redis_data:
    driver: local

networks:
  ragflow:
    driver: bridge

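Based on the docker-compose-base.yml shown above, the following describes where each RAGFlow component stores its data locally.

✅ Data storage paths at a glance (based on the configuration file above)

| Component | Path inside the container | Local path (default) | Purpose |
|---|---|---|---|
| MySQL | /var/lib/mysql | mysql_data (Docker-managed volume by default) | Structured data such as users and document metadata |
| MinIO | /data | minio_data (Docker volume) | Uploaded original documents, processed text, and so on |
| Elasticsearch (es01) | /usr/share/elasticsearch/data | esdata01 (Docker volume) | Keyword-search index data |
| OpenSearch (opensearch01) | /usr/share/opensearch/data | osdata01 (Docker volume) | Optional search engine used for retrieval |
| Infinity (vector database) | /var/infinity | infinity_data (Docker volume) | Vector indexes |
| Redis | /data | redis_data (Docker volume) | Cache data, task state, and so on |
| Sandbox Executor Manager | none | no persistent data | Runs sandbox tasks; nothing is persisted |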

About these volumes

The end of docker-compose-base.yml defines the following volumes:

volumes:
  esdata01:
    driver: local
  osdata01:
    driver: local
  infinity_data:
    driver: local
  mysql_data:
    driver: local
  minio_data:
    driver: local
  redis_data:
    driver: local

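These volumes are Docker-managed named volumes. By default they are stored under Docker's data directory:

  • On Linux this is usually /var/lib/docker/volumes/<volume_name>/_data

Because the stack was started with -p ragflow, Compose prefixes every volume name with the project name. For example:

  • MySQL data actually lives in /var/lib/docker/volumes/ragflow_mysql_data/_data
  • MinIO data actually lives in /var/lib/docker/volumes/ragflow_minio_data/_data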
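
The naming rule can be sketched as a small shell helper. This is only an illustration: it assumes Docker's default data root /var/lib/docker and the project name ragflow; `DOCKER_ROOT` and `volume_host_path` are names invented here, not part of Docker.

```shell
#!/bin/sh
# Sketch: build the expected host path of a Compose-managed named volume.
# Assumption: Docker uses its default data root; override DOCKER_ROOT otherwise.
DOCKER_ROOT="${DOCKER_ROOT:-/var/lib/docker}"

volume_host_path() {
    # $1 = Compose project name (the value passed to -p)
    # $2 = volume key as written in docker-compose-base.yml
    printf '%s/volumes/%s_%s/_data\n' "$DOCKER_ROOT" "$1" "$2"
}

# Print the expected path for every volume declared in the file.
for v in esdata01 osdata01 infinity_data mysql_data minio_data redis_data; do
    volume_host_path ragflow "$v"
done
```

The computed paths are only a guess at what exists on disk; `docker volume inspect` remains the authoritative source.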

📁 How to find where these volumes live on the host

Docker commands that may be useful:
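
As a minimal sketch, the two helpers below wrap the standard `docker volume ls` and `docker volume inspect` commands. The function names are made up for this example, and the project name ragflow is taken from the -p option used at deployment.

```shell
# Sketch: locate the volumes that belong to one Compose project.
# Function names are illustrative; the docker subcommands and flags are standard.

# List every named volume that Compose created for the given project.
list_project_volumes() {
    docker volume ls \
        --filter "label=com.docker.compose.project=$1" \
        --format '{{ .Name }}'
}

# Print only the host mount point of one volume.
volume_mountpoint() {
    docker volume inspect --format '{{ .Mountpoint }}' "$1"
}

# Usage (needs access to the Docker daemon):
#   list_project_volumes ragflow
#   volume_mountpoint ragflow_esdata01
```

Filtering on the com.docker.compose.project label avoids guessing volume names, since Compose attaches that label to every volume it creates.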

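📦 Summary and recommendations

| Goal | Recommendation |
|---|---|
| Find where data is stored | Use docker volume inspect <volume_name> |
| Back up data | Back up the corresponding path, e.g. /var/lib/docker/volumes/ragflow_mysql_data/_data |
| Customize paths | Edit the volumes configuration in docker-compose-base.yml |
| Clean up data | Delete the corresponding volume, e.g. docker volume rm ragflow_mysql_data |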
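
For the backup row, one common approach is to archive a volume from inside a throwaway container rather than copying /var/lib/docker directly. The sketch below assumes the alpine image is available and that the stack has been stopped first so the database files are not changing. `backup_volume` and the directory /opt/ragflow-backup are hypothetical names chosen for this example.

```shell
# Sketch: archive one named volume into a tarball with a temporary container.
# Assumptions: the alpine image can be pulled, and the services were stopped
# beforehand, e.g. docker compose -f docker-compose.yml -p ragflow stop
backup_volume() {
    # $1 = full volume name, e.g. ragflow_mysql_data
    # $2 = absolute path of an existing output directory on the host
    docker run --rm \
        -v "$1":/from:ro \
        -v "$2":/backup \
        alpine tar czf "/backup/$1.tar.gz" -C /from .
}

# Usage (hypothetical output directory):
#   mkdir -p /opt/ragflow-backup
#   for v in esdata01 mysql_data minio_data redis_data; do
#       backup_volume "ragflow_$v" /opt/ragflow-backup
#   done
```

Mounting the source volume read-only keeps the backup step from modifying the data it is copying.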

So there are two ways to find the actual path of a volume:

  • find / -name '*esdata01*'
  • docker volume inspect xxxx

This time we use the second one. The name esdata01 is taken from the configuration file above:

(base) root@hostname:/home/ltkj# docker volume inspect esdata01
[]
Error response from daemon: get esdata01: no such volume
(base) root@hostname:/home/ltkj# docker volume inspect ragflow_esdata01

The first attempt fails because Compose created the volume under the project-prefixed name ragflow_esdata01. Example output of the second command:

[
    {
        "CreatedAt": "2025-06-13T10:52:28Z",
        "Driver": "local",
        "Labels": {
            "com.docker.compose.config-hash": "3bcef595c5f477c290ccaa07cbf05671d287fef95c1fa4b67fad841e66481794",
            "com.docker.compose.project": "ragflow",
            "com.docker.compose.version": "2.34.0",
            "com.docker.compose.volume": "esdata01"
        },
        "Mountpoint": "/var/lib/docker/volumes/ragflow_esdata01/_data",
        "Name": "ragflow_esdata01",
        "Options": null,
        "Scope": "local"
    }
]


This shows where the data of base services such as esdata01, MinIO, and MySQL is stored.
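
Once the locations are known, the data can be carried over to the new server. The sketch below is one possible approach, not a tested procedure from this article. It assumes each volume was archived as <volume_name>.tar.gz (for instance with tar inside a temporary container), that the archives were copied to the new host, and that the RAGFlow containers there are stopped. `restore_volume` is a hypothetical helper name.

```shell
# Sketch: unpack a volume archive into a named volume on the new host.
# Assumptions: "$2/$1.tar.gz" exists, the alpine image can be pulled, and no
# container is using the target volume while it is restored.
restore_volume() {
    # $1 = full volume name, e.g. ragflow_mysql_data
    # $2 = absolute path of the directory that holds the archives
    docker run --rm \
        -v "$1":/to \
        -v "$2":/backup:ro \
        alpine tar xzf "/backup/$1.tar.gz" -C /to
}

# Usage (hypothetical archive directory):
#   restore_volume ragflow_mysql_data /opt/ragflow-backup
#   docker compose -f docker-compose.yml -p ragflow up -d
```

Keeping the same project name (-p ragflow) and the same image versions on the new server is the safest assumption, because the volume names and on-disk formats then match what the services expect.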


🛠️ Customizing these paths

A local path can be specified in the volumes configuration, for example:

volumes:
  mysql_data:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /opt/ragflow/mysql_data

MySQL data is then stored in /opt/ragflow/mysql_data.
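
With a bind-backed volume like this, the device directory has to exist before the stack starts; otherwise mounting the volume fails. A minimal sketch for preparing the directories follows. `prepare_bind_dirs` is a name invented for this example, and the base path /opt/ragflow comes from the configuration above.

```shell
# Sketch: create the host directories for bind-backed volumes up front.
prepare_bind_dirs() {
    # $1 = base directory; remaining arguments = one subdirectory per volume
    base="$1"
    shift
    for name in "$@"; do
        mkdir -p "$base/$name"
    done
}

# Usage:
#   prepare_bind_dirs /opt/ragflow mysql_data minio_data
```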


TODO: to be continued.
