Debugging Parlant's LLM configuration, and finally hand-writing a g4f module to mount
For Parlant installation, see: https://skywalk.blog.csdn.net/article/details/152094280
For the hand-written g4f module, see: https://skywalk.blog.csdn.net/article/details/152253434
Honestly, Parlant is the hardest project I have run into so far when it comes to configuring an LLM. It does not expose a configuration file, so when you want to switch models you have no idea what to write. Partly that is on me for not reading the manual carefully enough, but the manual also does not cover unofficial LLM providers. The official docs do explain how to write your own NLP service, but it is so involved that it turned out easier to just take one of the official adapter .py files and modify it.
On top of that, Trae has been acting up these past few days and was no help at all.
And g4f's gpt-4o model has also been flaky lately, which made debugging even harder.
First, a look at Parlant's manual
Environment Variables
Configure the Ollama service using these environment variables:
# Ollama server URL (default: http://localhost:11434)
export OLLAMA_BASE_URL="http://localhost:11434"
# Model size to use (default: 4b)
# Options: gemma3:1b, gemma3:4b, llama3.1:8b, gemma3:12b, gemma3:27b, llama3.1:70b, llama3.1:405b
export OLLAMA_MODEL="gemma3:4b"
# Embedding model (default: nomic-embed-text)
# Options: nomic-embed-text, mxbai-embed-large
export OLLAMA_EMBEDDING_MODEL="nomic-embed-text"
# API timeout in seconds (default: 300)
export OLLAMA_API_TIMEOUT="300"
Example Configuration
# For development (fast, good balance)
export OLLAMA_MODEL="gemma3:4b"
export OLLAMA_EMBEDDING_MODEL="nomic-embed-text"
export OLLAMA_API_TIMEOUT="180"
# higher accuracy cloud
export OLLAMA_MODEL="gemma3:4b"
export OLLAMA_EMBEDDING_MODEL="nomic-embed-text"
export OLLAMA_API_TIMEOUT="600"
Recommended Models
⚠️ IMPORTANT: Pull these models before running Parlant to avoid API timeouts during first use:
Text Generation Models
# Recommended for most use cases (good balance of speed/accuracy)
ollama pull gemma3:4b-it-qat
# Fast but may struggle with complex schemas
ollama pull gemma3:1b
# embedding model required for creating embeddings
ollama pull nomic-embed-text
Large Models (Cloud/High-end Hardware Only)
# Better reasoning capabilities
ollama pull llama3.1:8b
# High accuracy for complex tasks
ollama pull gemma3:12b
# Very high accuracy (requires more resources)
ollama pull gemma3:27b-it-qat
# ⚠️ WARNING: Requires 40GB+ GPU memory
ollama pull llama3.1:70b
# ⚠️ WARNING: Requires 200GB+ GPU memory (cloud-only)
ollama pull llama3.1:405b
Embedding Models
To use a custom embedding model, set the OLLAMA_EMBEDDING_MODEL environment variable to the required model name. Note that this implementation is tested with nomic-embed-text. ⚠️ IMPORTANT: Support for other embedding models has been added, including a custom embedding model of your own choice. Make sure to set OLLAMA_EMBEDDING_VECTOR_SIZE to a value compatible with your embedding model before starting the server. Tested with snowflake-arctic-embed with a vector size of 1024. It is NOT necessary to set OLLAMA_EMBEDDING_VECTOR_SIZE if you are using the supported nomic-embed-text, mxbai-embed-large or bge-m3; the vector size defaults to 768, 1024 and 1024 respectively for these. (A concrete configuration example follows the pull command below.)
# Alternative embedding model (512 dimensions)
ollama pull mxbai-embed-large:latest
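To make the custom-embedding configuration described above concrete, this is roughly how it would look, set here via os.environ the same way the test scripts later in this post do it (a minimal sketch; snowflake-arctic-embed and the 1024 vector size come from the manual text above):
import os

# Custom embedding model: OLLAMA_EMBEDDING_VECTOR_SIZE must match the model's output dimension
os.environ["OLLAMA_EMBEDDING_MODEL"] = "snowflake-arctic-embed"
os.environ["OLLAMA_EMBEDDING_VECTOR_SIZE"] = "1024"
# For the supported nomic-embed-text / mxbai-embed-large / bge-m3, the vector size can be
# omitted (it defaults to 768 / 1024 / 1024 respectively).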
Configuration
export PARLANT_MODEL_URL="http://192.168.1.5:1337/v1"
export PARLANT_MODEL_API_KEY="key sample"
export PARLANT_MODEL_NAME="gpt-4o"

set PARLANT_MODEL_URL="http://192.168.1.5:1337/v1"
set PARLANT_MODEL_API_KEY="key sample"
set PARLANT_MODEL_NAME="gpt-4o"

set PARLANT_MODEL_URL="http://192.168.0.98:1337/v1"
set PARLANT_MODEL_API_KEY="key sample"
set PARLANT_MODEL_NAME="gpt-4o"
It didn't work.
Looking at the Parlant source code
The configuration inside the Ollama adapter:
class OllamaEstimatingTokenizer(EstimatingTokenizer):
    """Simple tokenizer that estimates token count for Ollama models."""

    def __init__(self, model_name: str):
        self.model_name = model_name
        self.encoding = tiktoken.encoding_for_model("gpt-4o-2024-08-06")

    @override
    async def estimate_token_count(self, prompt: str) -> int:
        """Estimate token count using tiktoken"""
        tokens = self.encoding.encode(prompt)
        return int(len(tokens) * 1.15)


class OllamaSchematicGenerator(SchematicGenerator[T]):
    """Schematic generator that uses Ollama models."""

    supported_hints = ["temperature", "max_tokens", "top_p", "top_k", "repeat_penalty", "timeout"]

    def __init__(
        self,
        model_name: str,
        logger: Logger,
        base_url: str = "http://localhost:11434",
        default_timeout: int | str = 300,
    ) -> None:
        self.model_name = model_name
        self.base_url = base_url.rstrip("/")
        self._logger = logger
        self._tokenizer = OllamaEstimatingTokenizer(model_name)
        self._default_timeout = default_timeout
        self._client = ollama.AsyncClient(host=base_url)

    @property
    @override
    def id(self) -> str:
        return f"ollama/{self.model_name}"

    @property
    @override
    def tokenizer(self) -> EstimatingTokenizer:
        return self._tokenizer

    @property
    @override
    def max_tokens(self) -> int:
        if "1b" in self.model_name.lower():
            return 12288
        elif "4b" in self.model_name.lower():
            return 16384
        elif "8b" in self.model_name.lower():
            return 16384
        elif "12b" in self.model_name.lower() or "70b" in self.model_name.lower():
            return 16384
        elif "27b" in self.model_name.lower() or "405b" in self.model_name.lower():
            return 32768
        else:
            return 16384
Here base_url is given a hard-coded default: base_url: str = "http://localhost:11434",
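In other words, to point the adapter at a different Ollama server you would override it through the OLLAMA_BASE_URL environment variable described in the manual quoted above, rather than editing this default. Roughly like this (192.168.0.98 is just my own LAN server, used as an example):
import os

# Must be set before Parlant constructs the Ollama adapter
os.environ["OLLAMA_BASE_URL"] = "http://192.168.0.98:11434"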
The related OpenAI code:
class OpenAISchematicGenerator(SchematicGenerator[T]):
    supported_openai_params = ["temperature", "logit_bias", "max_tokens"]
    supported_hints = supported_openai_params + ["strict"]
    unsupported_params_by_model: dict[str, list[str]] = {
        "gpt-5": ["temperature"],
    }

    def __init__(
        self,
        model_name: str,
        logger: Logger,
        tokenizer_model_name: str | None = None,
    ) -> None:
        self.model_name = model_name
        self._logger = logger
        self._client = AsyncClient(api_key=os.environ["OPENAI_API_KEY"])
        self._tokenizer = OpenAIEstimatingTokenizer(
            model_name=tokenizer_model_name or self.model_name
        )
DeepSeek
The DeepSeek adapter at least shows how to set the base URL:
class DeepSeekEstimatingTokenizer(EstimatingTokenizer):
    def __init__(self, model_name: str) -> None:
        self.model_name = model_name
        self.encoding = tiktoken.encoding_for_model("gpt-4o-2024-08-06")

    @override
    async def estimate_token_count(self, prompt: str) -> int:
        tokens = self.encoding.encode(prompt)
        return len(tokens)


class DeepSeekSchematicGenerator(SchematicGenerator[T]):
    supported_deepseek_params = ["temperature", "logit_bias", "max_tokens"]
    supported_hints = supported_deepseek_params + ["strict"]

    def __init__(
        self,
        model_name: str,
        logger: Logger,
    ) -> None:
        self.model_name = model_name
        self._logger = logger
        self._client = AsyncClient(
            base_url="https://api.deepseek.com",
            api_key=os.environ["DEEPSEEK_API_KEY"],
        )
        self._tokenizer = DeepSeekEstimatingTokenizer(model_name=self.model_name)
The problem is that it also uses self.encoding = tiktoken.encoding_for_model("gpt-4o-2024-08-06").
Since I don't have access to a GPT model, does that mean I can't use it?
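(Jumping ahead a little: this turned out not to be a blocker. tiktoken.encoding_for_model() only picks a local BPE encoding from the model name; it downloads the vocabulary file on first use but never calls the OpenAI chat API, so no GPT access is required. A quick check, plus a fallback that avoids the model-name lookup entirely, in the same spirit as the UniversalTokenizer shown further down:)
import tiktoken

# encoding_for_model() just maps the name to a local encoding; no API key needed
enc = tiktoken.encoding_for_model("gpt-4o-2024-08-06")
print(len(enc.encode("hello world 测试完成")))

# Fallback that skips the model-name lookup altogether
print(len(tiktoken.get_encoding("cl100k_base").encode("hello world 测试完成")))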
The GLM adapter:
class GLMEmbedder(Embedder):
    supported_arguments = ["dimensions"]

    def __init__(self, model_name: str, logger: Logger) -> None:
        self.model_name = model_name
        self._logger = logger
        self._client = AsyncClient(
            base_url="https://open.bigmodel.cn/api/paas/v4",
            api_key=os.environ["GLM_API_KEY"],
        )
        self._tokenizer = GLMEstimatingTokenizer(model_name=self.model_name)
Parlant's manual for calling the Ollama API
Reference: docs/adapters/nlp/ollama.md in the Gitee mirror of parlant (码云 / 开源中国), which is the same document quoted above.
Embedding
For that OpenAI embedding/tokenizer issue, this can be used instead:
import tiktoken

class UniversalTokenizer:
    def __init__(self, encoding_name="cl100k_base"):
        self.encoding = tiktoken.get_encoding(encoding_name)

    def estimate(self, text, ratio=1.1):
        return int(len(self.encoding.encode(text)) * ratio)
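A quick usage check (my own snippet, not Parlant code):
tokenizer = UniversalTokenizer()  # defaults to the cl100k_base encoding, no model name required
print(tokenizer.estimate("hello world 测试完成"))  # token estimate with a 1.1 safety margin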
Calling the model
import parlant.sdk as p
from parlant.sdk import NLPServices

async with p.Server(nlp_service=NLPServices.ollama) as server:
    agent = await server.create_agent(
        name="Healthcare Agent",
        description="Is empathetic and calming to the patient.",
    )
What I planned to do
Change the code directly: drop gpt-4o-2024-08-06 and just estimate tokens by character length.
class DeepSeekEstimatingTokenizer(EstimatingTokenizer):
    def __init__(self, model_name: str) -> None:
        self.model_name = model_name
        # self.encoding = tiktoken.encoding_for_model("gpt-4o-2024-08-06")

    @override
    async def estimate_token_count(self, prompt: str) -> int:
        # tokens = self.encoding.encode(prompt)
        tokens = prompt  # use the raw character length as the estimate
        return len(tokens)
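Worth noting: estimating by raw character length is crude. For English text it over-counts by roughly a factor of 3-4 (about four characters per token on average), which errs on the safe side for budget checks but wastes context; for Chinese it lands much closer to the real count. A quick comparison (my own check, not Parlant code):
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
for text in ("hello world, this is a short English sentence.", "这是一个简短的中文句子。"):
    # character-length estimate vs. actual token count
    print(len(text), len(enc.encode(text)))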
I get it!
When I changed base_url to point at my own LLM server (say 192.168.1.5:1337 or 127.0.0.1:1337), that change probably also redirected the gpt-4o-2024-08-06 lookup to my custom server, which then errored because that model doesn't exist there.
(A side gripe: my pinyin input method's hotkey suddenly stopped responding and I had to click the taskbar to switch. When it rains, it pours.)
So I just need to use another provider's configuration, such as deepseek or ollama; that way the gpt-4o-2024-08-06 tokenizer is left undisturbed and keeps working.
First, test from inside China whether the gpt-4o-2024-08-06 encoding can actually be loaded:
import tiktoken
import time
import asyncio

prompt = "hello world 测试完成"
prompt = '国内无法用这个模型怎么办? tiktoken.encoding_for_model("gpt-4o-2024-08-06")'
testencoding = tiktoken.encoding_for_model("gpt-4o-2024-08-06")
tokens = testencoding.encode(prompt)
print(tokens, len(tokens))
output:
[48450, 53254, 5615, 41713, 184232, 50182, 4802, 260, 8251, 2488, 154030, 11903, 10928, 568, 70, 555, 12, 19, 78, 12, 1323, 19, 12, 3062, 12, 3218, 1405] 27
The docs also show passing a custom loader via nlp_service=load_custom_nlp_service, which is the "write your own NLP service" route mentioned at the beginning.
A few current problems
I wanted to use deepseek, but found that NLPServices doesn't include it:
from parlant.sdk import NLPServices
dir(NLPServices)
'anthropic',
'azure',
'cerebras',
'gemini',
'glm',
'litellm',
'ollama',
'openai',
'qwen',
'snowflake',
'together',
'vertex'
Using ollama, I found it wants its own tokenizer model, and I didn't really feel like setting Ollama up again.
Mainly, once Ollama is running the whole machine is under heavy load, and I can only run models of about 8GB or smaller, which are noticeably weaker than what I get through g4f.
How each built-in service builds its tokenizer (and, where relevant, its base URL):
'anthropic', self._estimating_tokenizer = AnthropicEstimatingTokenizer(self._client, model_name)
'azure', self._tokenizer = AzureEstimatingTokenizer(model_name=self.model_name)
'cerebras', self.encoding = tiktoken.encoding_for_model("gpt-4o-2024-08-06")
'gemini',
'glm', self.encoding = tiktoken.encoding_for_model("gpt-4o-2024-08-06"), base_url="https://open.bigmodel.cn/api/paas/v4", api_key=os.environ["GLM_API_KEY"]
'litellm',
'ollama', self.model_name = os.environ.get("OLLAMA_EMBEDDING_MODEL", "nomic-embed-text")
'openai', self._tokenizer = OpenAIEstimatingTokenizer(model_name=tokenizer_model_name or self.model_name)
'qwen',
'snowflake',
'together',
'vertex'
The OpenAI part of the code deserves a careful look:
class OpenAIEstimatingTokenizer(EstimatingTokenizer):
    def __init__(self, model_name: str) -> None:
        self.model_name = model_name
        self.encoding = tiktoken.encoding_for_model(model_name)

    @override
    async def estimate_token_count(self, prompt: str) -> int:
        tokens = self.encoding.encode(prompt)
        return len(tokens)


class OpenAISchematicGenerator(SchematicGenerator[T]):
    supported_openai_params = ["temperature", "logit_bias", "max_tokens"]
    supported_hints = supported_openai_params + ["strict"]
    unsupported_params_by_model: dict[str, list[str]] = {
        "gpt-5": ["temperature"],
    }

    def __init__(
        self,
        model_name: str,
        logger: Logger,
        tokenizer_model_name: str | None = None,
    ) -> None:
        self.model_name = model_name
        self._logger = logger
        self._client = AsyncClient(api_key=os.environ["OPENAI_API_KEY"])
        self._tokenizer = OpenAIEstimatingTokenizer(
            model_name=tokenizer_model_name or self.model_name
        )
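The catch: OpenAIEstimatingTokenizer passes the real model name to tiktoken.encoding_for_model(), so a non-OpenAI name such as "default" raises a KeyError before any request is even made. One possible workaround (my own sketch, in the spirit of the UniversalTokenizer above, not Parlant code) is to fall back to a fixed encoding:
import tiktoken

def safe_encoding(model_name: str, fallback: str = "cl100k_base"):
    # Unknown model names (e.g. "default") make encoding_for_model() raise KeyError
    try:
        return tiktoken.encoding_for_model(model_name)
    except KeyError:
        return tiktoken.get_encoding(fallback)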
Testing
I used this script for debugging:
# Import the necessary libraries
import tiktoken
import time
import asyncio
import os

os.environ["DEEPSEEK_API_KEY"] = "your_custom_api_key"  # custom API key (any value, placeholder only)
os.environ["DEEPSEEK_BASE_URL"] = "http://192.168.0.98:1337/"  # custom LLM API base URL

os.environ["OLLAMA_API_KEY"] = "your_custom_api_key"  # custom API key (any value, placeholder only)
os.environ["OLLAMA_BASE_URL"] = "http://192.168.0.98:1337/"  # custom LLM API base URL
os.environ["OLLAMA_MODEL"] = "default"  # model name

os.environ["SNOWFLAKE_AUTH_TOKEN"] = "your_custom_api_key"  # custom API key (any value, placeholder only)
os.environ["SNOWFLAKE_CORTEX_BASE_URL"] = "http://192.168.0.98:1337/"
os.environ["SNOWFLAKE_CORTEX_CHAT_MODEL"] = "default"

import parlant.sdk as p
from parlant.sdk import NLPServices

async def main():
    async with p.Server(nlp_service=NLPServices.snowflake) as server:
        agent = await server.create_agent(
            name="Otto Carmen",
            description="You work at a car dealership",
        )

asyncio.run(main())
Neither the ollama nor the deepseek route suited my setup.
The final solution
In the end I decided to hand-write a g4f service file myself. After some intense debugging (with Trae still acting up and helping not one bit), it finally ran.
The process of hand-writing the g4f service code is described at: https://skywalk.blog.csdn.net/article/details/152253434?spm=1011.2415.3001.5331
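For orientation, the adapter lives at parlant/adapters/nlp/g4f_service.py and, like the built-in adapters, is driven by environment variables. The skeleton below is only a rough sketch of its shape, assuming it mirrors the OpenAI adapter; the real, complete implementation (schematic generator, embedder, tokenizer and so on) is in the post linked above, and anything not shown in this article is my own illustration:
# Rough sketch only; see the linked post for the actual g4f_service.py
import os
from openai import AsyncClient  # g4f exposes an OpenAI-compatible API, so the openai client works

class G4FService:
    @staticmethod
    def verify_environment() -> str | None:
        # sdk.py calls this before constructing the service (see the snippet at the end of this post)
        if not os.environ.get("G4F_BASE_URL"):
            return "G4F_BASE_URL is not set"
        return None

    def __init__(self, logger) -> None:
        self._logger = logger
        self.model_name = os.environ.get("G4F_MODEL", "default")
        self._client = AsyncClient(
            base_url=os.environ["G4F_BASE_URL"],         # e.g. http://192.168.0.98:1337/v1
            api_key=os.environ.get("G4F_API_KEY", "x"),  # g4f accepts any placeholder key
        )
    # ...plus the SchematicGenerator / Embedder plumbing that Parlant's NLPService interface expects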
The test file test_server.py is written as follows, with the environment variables set through the os module. The ones that actually take effect are the G4F_-prefixed variables such as G4F_API_KEY:
# Import the necessary libraries
import tiktoken
import time
import asyncio
import os

os.environ["DEEPSEEK_API_KEY"] = "your_custom_api_key"  # custom API key (any value, placeholder only)
os.environ["DEEPSEEK_BASE_URL"] = "http://192.168.0.98:1337/"  # custom LLM API base URL

os.environ["OLLAMA_API_KEY"] = "your_custom_api_key"  # custom API key (any value, placeholder only)
os.environ["OLLAMA_BASE_URL"] = "http://192.168.0.98:1337/"  # custom LLM API base URL
os.environ["OLLAMA_MODEL"] = "default"  # model name

os.environ["SNOWFLAKE_AUTH_TOKEN"] = "your_custom_api_key"  # custom API key (any value, placeholder only)
os.environ["SNOWFLAKE_CORTEX_BASE_URL"] = "http://192.168.0.98:1337/"
os.environ["SNOWFLAKE_CORTEX_CHAT_MODEL"] = "default"

# These G4F_* variables are the ones that actually matter here
os.environ["G4F_API_KEY"] = "your_custom_api_key"  # custom API key (any value, placeholder only)
os.environ["G4F_BASE_URL"] = "http://192.168.0.98:1337/v1"  # custom LLM API base URL
os.environ["G4F_MODEL"] = "default"  # model name

os.environ["OPENAI_API_KEY"] = "your_custom_api_key"  # custom API key (any value, placeholder only)
os.environ["OPENAI_BASE_URL"] = "http://192.168.0.98:1337/v1"  # custom LLM API base URL
os.environ["OPENAI_MODEL"] = "default"  # model name

import parlant.sdk as p
from parlant.sdk import NLPServices

async def main():
    async with p.Server(nlp_service=NLPServices.g4f) as server:
        agent = await server.create_agent(
            name="Otto Carmen",
            description="You work at a car dealership",
            # model="default"
        )

asyncio.run(main())
When it runs, it looks like this:
Debugging
The snowflake service couldn't be found:
PS E:\work\parlwork> python .\testdeepseek.py
Traceback (most recent call last):
  File "E:\work\parlwork\testdeepseek.py", line 30, in <module>
    asyncio.run(main())
  File "E:\py312\Lib\asyncio\runners.py", line 195, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "E:\py312\Lib\asyncio\runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\py312\Lib\asyncio\base_events.py", line 691, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "E:\work\parlwork\testdeepseek.py", line 24, in main
    async with p.Server(nlp_service=NLPServices.snowflake) as server:
                                    ^^^^^^^^^^^^^^^^^^^^^
Good grief, why is this one missing too?
>>> from parlant.sdk import NLPServices
>>> dir(NLPServices)
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'anthropic', 'azure', 'cerebras', 'gemini', 'litellm', 'ollama', 'openai', 'together', 'vertex']
>>>
It turned out the local parlant source wasn't installed, so things only work when executed from the github\parlant\src directory; in other words, the test file has to be run from that directory.
Also, for the hand-written g4f adapter, the corresponding import and registration have to be added to sdk.py:
# Modeled on the openai entry: add g4f
@staticmethod
def g4f(container: Container) -> NLPService:
    """Creates a G4F NLPService instance using the provided container."""
    from parlant.adapters.nlp.g4f_service import G4FService

    if error := G4FService.verify_environment():
        raise SDKError(error)

    return G4FService(container[Logger])