
Tutorial: Integrating gemini-2.5-flash-image-preview

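I. Prerequisites

1. Understand the model's requirements

The gemini-2.5-flash-image-preview model used in this tutorial inherits the multimodal capabilities of the Gemini family and supports text-to-image generation as well as image editing driven by text plus an input image. Note that the model does not support image-only output: the request must configure the dual-modality output ["TEXT", "IMAGE"]. Every generated image carries a SynthID watermark. Prompts are currently supported in English, Spanish (Mexico), Japanese, Simplified Chinese, and Hindi, among other languages. Audio and video input are not yet supported.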
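As a minimal sketch of the dual-modality requirement described above, the helper below assembles a request body that always sets `responseModalities` to both `TEXT` and `IMAGE`. The name `build_payload` is hypothetical and not part of any SDK; the field names are taken from the request bodies used later in this tutorial.

```python
from typing import Optional


def build_payload(prompt: str,
                  image_base64: Optional[str] = None,
                  mime_type: str = "image/png") -> dict:
    """Build a generateContent request body (hypothetical helper).

    The text prompt always goes first; an optional Base64 image is
    appended as an inlineData part for image editing.
    """
    parts = [{"text": prompt}]
    if image_base64 is not None:
        parts.append({"inlineData": {"mimeType": mime_type, "data": image_base64}})
    return {
        "contents": [{"parts": parts}],
        # Both modalities are always requested, as the model requires.
        "generationConfig": {"responseModalities": ["TEXT", "IMAGE"]},
    }
```

Centralizing the body in one function makes it harder to forget the modality setting when switching between generation and editing requests.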

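2. Environment setup

  • Install an HTTP client: for example, the requests library for Python or the axios library for JavaScript, used to send API requests to the specified BaseURL.
  • Prepare a Base64 encoding tool: for image editing, local images must be converted to Base64 before being passed in the request parameters.
  • Obtain a Gemini API key (GEMINI_API_KEY): it is used for authentication and must be sent in the request headers or parameters (this step can be skipped if the BaseURL service already manages the key).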
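One way to handle the optional key described above is to read it from an environment variable and add the `Authorization` header only when a key is present. This is a sketch under assumptions: `build_headers` is a hypothetical name, and the `Bearer` scheme simply mirrors the request examples in this tutorial; the actual service may expect something different.

```python
import os


def build_headers(env_var: str = "GEMINI_API_KEY") -> dict:
    """Return request headers, adding Authorization only if a key is set."""
    headers = {"Content-Type": "application/json"}
    api_key = os.environ.get(env_var, "").strip()
    if api_key:
        # Bearer scheme assumed from the tutorial's examples.
        headers["Authorization"] = f"Bearer {api_key}"
    return headers
```

Keeping the key out of the source file also avoids committing it to version control by accident.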

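II. Core integration steps

1. Text-to-image generation (Text-to-Image)

Generate an image from a text prompt. The examples below, in different programming languages, are all built against the specified BaseURL (http://api.aaigc.top).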

Python implementation
import base64
from io import BytesIO

import requests
from PIL import Image

# Basic configuration
BASE_URL = "http://api.aaigc.top"
# Endpoint (modeled on the Gemini API convention; confirm against the actual service)
ENDPOINT = "/v1beta/models/gemini-2.5-flash-image-preview:generateContent"
API_KEY = "YOUR_GEMINI_API_KEY"  # can be removed if the service manages the key

# Text prompt
prompt = (
    "3D render style: a small pig wearing a top hat and wings, flying over a "
    "futuristic sci-fi city full of green plants, with towering buildings and neon lights"
)

# Request body
payload = {
    "contents": [{"parts": [{"text": prompt}]}],
    "generationConfig": {"responseModalities": ["TEXT", "IMAGE"]},  # dual-modality output is required
}

# Request headers
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {API_KEY}",  # can be removed if the service manages the key
}

# Send the request and handle the response
response = requests.post(f"{BASE_URL}{ENDPOINT}", json=payload, headers=headers)
response.raise_for_status()
data = response.json()

# Parse the text and image parts of the first candidate
for part in data["candidates"][0]["content"]["parts"]:
    if part.get("text"):
        print("Model text reply:", part["text"])
    elif part.get("inlineData", {}).get("data"):
        image_data = base64.b64decode(part["inlineData"]["data"])
        image = Image.open(BytesIO(image_data))
        image.save("gemini-text-to-image.png")
        image.show()
        print("Image saved: gemini-text-to-image.png")
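The parsing loop above indexes straight into `candidates[0]`, which raises an exception if the response has no candidates or a candidate has no content. A more defensive variant might look like the sketch below. `extract_parts` is a hypothetical helper, and the response shape is assumed to match the one used in this tutorial.

```python
import base64
from typing import List, Tuple


def extract_parts(data: dict) -> Tuple[List[str], List[bytes]]:
    """Collect all text strings and decoded image bytes from a response.

    Missing "candidates", "content", or "parts" keys are treated as empty
    instead of raising KeyError or IndexError.
    """
    texts: List[str] = []
    images: List[bytes] = []
    for candidate in data.get("candidates") or []:
        content = candidate.get("content") or {}
        for part in content.get("parts") or []:
            if part.get("text"):
                texts.append(part["text"])
            inline = part.get("inlineData") or {}
            if inline.get("data"):
                images.append(base64.b64decode(inline["data"]))
    return texts, images
```

An empty image list can then be handled explicitly, for example by reporting that the model returned only text.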
JavaScript implementation (Node.js)
const axios = require('axios');
const fs = require('fs');
const path = require('path');

// Basic configuration
const BASE_URL = "http://api.aaigc.top";
const ENDPOINT = "/v1beta/models/gemini-2.5-flash-image-preview:generateContent";
const API_KEY = "YOUR_GEMINI_API_KEY";

// Text prompt
const prompt = "3D render style: a small pig wearing a top hat and wings, flying over a futuristic sci-fi city full of green plants, with towering buildings and neon lights";

// Request body
const payload = {
  contents: [{ parts: [{ text: prompt }] }],
  generationConfig: { responseModalities: ["TEXT", "IMAGE"] }
};

// Request headers
const headers = {
  "Content-Type": "application/json",
  "Authorization": `Bearer ${API_KEY}`
};

// Send the request and handle the response
async function generateImageFromText() {
  try {
    const response = await axios.post(`${BASE_URL}${ENDPOINT}`, payload, { headers });
    const data = response.data;
    for (const part of data.candidates[0].content.parts) {
      if (part.text) {
        console.log("Model text reply:", part.text);
      } else if (part.inlineData && part.inlineData.data) {
        const imageBuffer = Buffer.from(part.inlineData.data, 'base64');
        const savePath = path.join(__dirname, "gemini-text-to-image.png");
        fs.writeFileSync(savePath, imageBuffer);
        console.log(`Image saved: ${savePath}`);
      }
    }
  } catch (error) {
    console.error("Request failed:", error.response?.data || error.message);
  }
}

generateImageFromText();

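2. Image editing (Image + Text-to-Image)

Pass in the original image in Base64 format together with an editing prompt, and the model modifies the image as requested. The key steps are as follows: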

Preliminary step: convert the image to Base64 (Python example)
import base64


def image_to_base64(image_path):
    """Read a local image file and return its contents as a Base64 string."""
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


# Convert a local image
original_image_path = "original-image.png"
image_base64 = image_to_base64(original_image_path)
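Because the request also needs a `mimeType` that matches the file, the encoding step can be combined with MIME type detection. The sketch below uses the standard-library `mimetypes` module to guess the type from the file extension. `image_part_from_file` is a hypothetical helper name, and guessing by extension is an assumption that fails if a file is misnamed.

```python
import base64
import mimetypes
from pathlib import Path


def image_part_from_file(image_path: str) -> dict:
    """Return an inlineData part for a local image (hypothetical helper)."""
    # The type is guessed from the extension only; file contents are not inspected.
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot determine an image MIME type for {image_path!r}")
    encoded = base64.b64encode(Path(image_path).read_bytes()).decode("ascii")
    return {"inlineData": {"mimeType": mime_type, "data": encoded}}
```

The returned dictionary can be placed directly in the `parts` list next to the text prompt.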
Python image editing example
import base64
from io import BytesIO

import requests
from PIL import Image


def image_to_base64(image_path):
    """Read a local image file and return its contents as a Base64 string."""
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


# Basic configuration (same as for text-to-image)
BASE_URL = "http://api.aaigc.top"
ENDPOINT = "/v1beta/models/gemini-2.5-flash-image-preview:generateContent"
API_KEY = "YOUR_GEMINI_API_KEY"

# Base64 encoding of the original image
original_image_path = "original-image.png"
image_base64 = image_to_base64(original_image_path)

# Editing prompt
edit_prompt = (
    "Add a white alpaca next to the person, facing the person, keeping the overall "
    "style consistent with the original image (for example, if the original is "
    "photorealistic, the alpaca should be too)"
)

# Request body
payload = {
    "contents": [{
        "parts": [
            {"text": edit_prompt},
            # mimeType must match the actual image format
            {"inlineData": {"mimeType": "image/png", "data": image_base64}},
        ]
    }],
    "generationConfig": {"responseModalities": ["TEXT", "IMAGE"]},
}

# Request headers (same as for text-to-image)
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {API_KEY}",
}

# Send the request and parse the response
response = requests.post(f"{BASE_URL}{ENDPOINT}", json=payload, headers=headers)
response.raise_for_status()
data = response.json()

# Save the edited image and print any accompanying text
for part in data["candidates"][0]["content"]["parts"]:
    if part.get("inlineData", {}).get("data"):
        image_data = base64.b64decode(part["inlineData"]["data"])
        edited_image = Image.open(BytesIO(image_data))
        edited_image.save("gemini-edited-image.png")
        edited_image.show()
        print("Edited image saved: gemini-edited-image.png")
    elif part.get("text"):
        print("Model edit notes:", part["text"])

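III. Common issues and notes

  1. Only text is returned: state explicitly in the prompt that an image should be generated or updated, for example change "add an alpaca" to "generate an image with the alpaca added".
  2. Generation is interrupted: retry the request or simplify the prompt, and avoid packing too many elements into a single prompt.
  3. Base64 encoding errors: make sure the encoded data is complete (no stray spaces or line breaks) and that mimeType matches the image format (image/jpeg for JPG, image/png for PNG).
  4. Regional availability: if the response says the service is temporarily unavailable, check whether the model is offered in your region; see the regional support notes for the BaseURL service.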
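The retry advice for interrupted generation can be sketched as a small wrapper with exponential backoff. Everything here is illustrative: `call_with_retry` is a hypothetical helper, the attempt count and delays are arbitrary defaults, and catching every `Exception` is a simplification (in real use, narrowing it to network errors such as `requests.RequestException` is probably preferable).

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(func: Callable[[], T],
                    attempts: int = 3,
                    base_delay: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call func, retrying on failure with delays of base_delay * 2**attempt."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            # Re-raise the original error once the last attempt has failed.
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
    raise RuntimeError("unreachable")
```

A request could then be wrapped as `call_with_retry(lambda: requests.post(url, json=payload, headers=headers))`, where `url`, `payload`, and `headers` are whatever the caller has already built.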

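IV. Examples

1. A kawaii-style sticker of a happy bear. The background is set to white, and the design uses clean outlines and bold colors, making it lively and eye-catching.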

Prompt template (logo with rendered text): Create a [image type] for [brand/concept] with the text “[text to render]” in a [font style]. The design should be [style description], with a [color scheme].

from io import BytesIO

from google import genai
from PIL import Image

# The client reads the API key from the environment (e.g. GEMINI_API_KEY)
client = genai.Client()

# Generate an image from a text prompt
response = client.models.generate_content(
    model="gemini-2.5-flash-image-preview",
    contents="Create a modern, minimalist logo for a coffee shop called 'The Daily Grind'. The text should be in a clean, bold, sans-serif font. The design should feature a simple, stylized icon of a coffee bean seamlessly integrated with the text. The color scheme is black and white.",
)

# Collect the raw bytes of every image part in the response
image_parts = [
    part.inline_data.data
    for part in response.candidates[0].content.parts
    if part.inline_data
]

# Save and display the first image, if any
if image_parts:
    image = Image.open(BytesIO(image_parts[0]))
    image.save('logo_example.png')
    image.show()
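The bracketed prompt template above can be filled in programmatically rather than by hand. The sketch below replaces each `[placeholder]` with a supplied value and fails loudly if one is missing. `fill_template` is a hypothetical helper, and the bracket syntax is assumed to be just the notation this tutorial uses for its templates.

```python
import re


def fill_template(template: str, values: dict) -> str:
    """Replace every [placeholder] in template with values[placeholder]."""
    def replace(match):
        key = match.group(1)
        if key not in values:
            raise KeyError(f"No value supplied for placeholder [{key}]")
        return values[key]

    # Match the shortest bracketed span that contains no nested brackets.
    return re.sub(r"\[([^\[\]]+)\]", replace, template)
```

For example, `fill_template("A [style] sticker of a [subject].", {"style": "kawaii", "subject": "happy bear"})` produces a prompt that can be passed as `contents`.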

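2. An official example: a close-up of an elderly ceramic artist. Soft golden sunlight streams through a window into the frame, lighting up the fine texture of the clay and the wrinkles on the old man's face.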

Prompt template (sticker): A [style] sticker of a [subject], featuring [key characteristics] and a [color palette]. The design should have [line style] and [shading style]. The background must be white.

from io import BytesIO

from google import genai
from PIL import Image

# The client reads the API key from the environment (e.g. GEMINI_API_KEY)
client = genai.Client()

# Generate an image from a text prompt (photorealistic portrait)
response = client.models.generate_content(
    model="gemini-2.5-flash-image-preview",
    contents="A photorealistic close-up portrait of an elderly Japanese ceramicist with deep, sun-etched wrinkles and a warm, knowing smile. He is carefully inspecting a freshly glazed tea bowl. The setting is his rustic, sun-drenched workshop with pottery wheels and shelves of clay pots in the background. The scene is illuminated by soft, golden hour light streaming through a window, highlighting the fine texture of the clay and the fabric of his apron. Captured with an 85mm portrait lens, resulting in a soft, blurred background (bokeh). The overall mood is serene and masterful.",
)

# Collect the raw bytes of every image part in the response
image_parts = [
    part.inline_data.data
    for part in response.candidates[0].content.parts
    if part.inline_data
]

# Save and display the first image, if any
if image_parts:
    image = Image.open(BytesIO(image_parts[0]))
    image.save('photorealistic_example.png')
    image.show()

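3. A cat eating a banana in a luxurious restaurant under the Gemini constellation. The cat's table is even set with a knife, fork, and wine glass, and there are guests at the other tables, so the scene is full of detail.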

from io import BytesIO

from google import genai
from PIL import Image

# The client reads the API key from the environment (e.g. GEMINI_API_KEY)
client = genai.Client()

# Generate an image from a text prompt.
# As given in the source, this repeats the portrait prompt from example 2;
# replace it with a prompt describing the cat scene.
response = client.models.generate_content(
    model="gemini-2.5-flash-image-preview",
    contents="A photorealistic close-up portrait of an elderly Japanese ceramicist with deep, sun-etched wrinkles and a warm, knowing smile. He is carefully inspecting a freshly glazed tea bowl. The setting is his rustic, sun-drenched workshop with pottery wheels and shelves of clay pots in the background. The scene is illuminated by soft, golden hour light streaming through a window, highlighting the fine texture of the clay and the fabric of his apron. Captured with an 85mm portrait lens, resulting in a soft, blurred background (bokeh). The overall mood is serene and masterful.",
)

# Collect the raw bytes of every image part in the response
image_parts = [
    part.inline_data.data
    for part in response.candidates[0].content.parts
    if part.inline_data
]

# Save the first image, if any
if image_parts:
    image = Image.open(BytesIO(image_parts[0]))
    image.save('photorealistic_example.png')

Reposted from: http://www.dtcms.com/a/370333.html
