LangChain4j (5): Implementing RAG with LangChain4j, an Introduction to RAG

1 Introduction to RAG

RAG: Retrieval-Augmented Generation

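A foundation model has only general knowledge. Its parameters are trained on public internet data, and that data lags behind the present, so the model cannot know about specific business data or real-time data. Such data usually lives in various files and stores (for example txt, Word, or HTML files, or databases).

Function calling and SystemMessage can solve part of this problem, but they can only carry a small amount of information. To supply a large amount of business-domain knowledge, you need to attach an external knowledge base to the model.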

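For example:

(1) I ask the model how much it costs to cancel a booking.

(2) That information has probably been written into a document by the product or requirements team: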

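These Terms of Service apply to your experience with the airline. By booking a flight, you agree to these terms.
1. Booking a flight
- Book through our website or mobile app.
- Full payment is required at the time of booking.
- Make sure your personal information (name, ID, etc.) is accurate, because corrections may incur a fee of 25.
2. Changing a booking
- Changes are allowed up to 24 hours before departure.
- Change online or by contacting our support staff.
- Change fee: economy 50, premium economy 30, business class free.
3. Canceling a booking
- Cancel no later than 48 hours before departure.
- Cancellation fee: economy 75 USD, premium economy 50 USD, business class 25 USD.
- Refunds are processed within 7 business days.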

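So this requirements information first has to be stored in a vector database. This process is called embedding, and it involves reading the documents, splitting them into segments, vectorizing the segments, and storing the vectors.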

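(3) Query the vector database for information related to "cancellation fees".

(4) Send the retrieved data together with the conversation to the large model.

(5) The model can now answer how much a cancellation costs.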
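The steps above can be sketched in plain Java without any model or database. This is only an illustration under loud assumptions: the class and method names (RagFlowSketch, splitBySection, retrieve, buildPrompt) and the prompt wording are hypothetical, and the keyword filter in retrieve is a stand-in for the vector search of step (3), not a replacement for it.

```java
import java.util.ArrayList;
import java.util.List;

final class RagFlowSketch {

    // Step (2): split the document into one segment per numbered section ("1. ", "2. ", ...).
    static List<String> splitBySection(String document) {
        List<String> segments = new ArrayList<>();
        for (String part : document.split("(?m)^(?=\\d+\\. )")) {
            if (!part.isBlank()) {
                segments.add(part.strip());
            }
        }
        return segments;
    }

    // Step (3), stand-in only: a real system compares embeddings instead of keywords.
    static List<String> retrieve(List<String> segments, String keyword) {
        List<String> hits = new ArrayList<>();
        for (String segment : segments) {
            if (segment.toLowerCase().contains(keyword.toLowerCase())) {
                hits.add(segment);
            }
        }
        return hits;
    }

    // Step (4): put the retrieved segments and the user's question into one prompt.
    static String buildPrompt(String question, List<String> context) {
        return "Answer the question using only the information below.\n\n"
                + String.join("\n\n", context)
                + "\n\nQuestion: " + question;
    }

    public static void main(String[] args) {
        String terms = String.join("\n",
                "1. Booking a flight",
                "- Full payment is required at the time of booking.",
                "2. Changing a booking",
                "- Changes are allowed up to 24 hours before departure.",
                "3. Canceling a booking",
                "- Cancellation fee: economy 75 USD, premium economy 50 USD, business class 25 USD.");

        List<String> segments = splitBySection(terms);
        List<String> hits = retrieve(segments, "cancellation");
        // The resulting prompt is what step (4) would send to the model.
        System.out.println(buildPrompt("How much does it cost to cancel?", hits));
    }
}
```

Only the cancellation section ends up in the prompt, which is the point of retrieval: the model sees the relevant clause rather than the whole document.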

2 Related concepts

2.1 Vectors

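Vectors are commonly used for similarity search. A one-dimensional semantic vector, for instance, can express how close words or phrases are in meaning: "你好" (Chinese for "hello"), "hello", and "nice to meet you" can each be placed on a single axis that shows how close their meanings are.

For a more complex object, such as a dog, one dimension is not enough for similarity search. We need to extract several features, such as color, size, and breed, and represent each feature as one dimension of the vector, which gives a multi-dimensional vector. A small brown Teddy dog, for example, can be written as the multi-dimensional vector [brown, small, Teddy].

For more precise retrieval results we need vectors with even more dimensions, which span a higher-dimensional space, and similarity search in such a space becomes more complex. We need a measure such as cosine similarity or Euclidean distance to compute how similar two vectors are. A vector database does this for us.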
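To make the two measures concrete, here is a minimal sketch of cosine similarity and Euclidean distance in plain Java. The class name VectorSimilarity and the sample values are made up for illustration; a vector database computes the same quantities internally, usually with indexing on top.

```java
final class VectorSimilarity {

    // Cosine similarity: 1.0 = same direction, 0.0 = orthogonal, -1.0 = opposite.
    // The result is NaN if either vector is all zeros.
    static double cosineSimilarity(double[] a, double[] b) {
        requireSameLength(a, b);
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Euclidean distance: 0.0 = identical vectors, larger = further apart.
    static double euclideanDistance(double[] a, double[] b) {
        requireSameLength(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    private static void requireSameLength(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException(
                    "Vectors must have the same dimension: " + a.length + " vs " + b.length);
        }
    }

    public static void main(String[] args) {
        // Made-up 3-dimensional vectors, for illustration only.
        double[] first = {0.9, 0.1, 0.3};
        double[] second = {0.8, 0.2, 0.4};
        System.out.println("cosine similarity:  " + cosineSimilarity(first, second));
        System.out.println("euclidean distance: " + euclideanDistance(first, second));
    }
}
```

Cosine similarity only looks at direction, so it ignores vector length, while Euclidean distance is sensitive to it. Which one fits better depends on how the embedding model was trained.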

2.2 Text vectorization

Let's call an embedding model from LangChain4j to vectorize a sentence:

package org.example;

import dev.langchain4j.community.model.dashscope.QwenEmbeddingModel;
import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.model.output.Response;

public class VectorTest {

    public static void main(String[] args) {

        // Embedding model
        QwenEmbeddingModel embeddingModel = QwenEmbeddingModel.builder()
                .apiKey("your_api_key")
                .build();
        // Vectorize the text; the sentence means "Hello, my name is Zhuge Jian"
        Response<Embedding> embed = embeddingModel.embed("你好,我叫诸葛奸");
        // Print the whole embedding, then the number of dimensions
        System.out.println(embed.content().toString());
        System.out.println(embed.content().vector().length);

    }
}

Running the code produces:

Embedding { vector = [0.009697899, -0.0020659368, 0.06201601, -0.05168001, -0.01889037, -0.036643527, -0.033206616, -0.01348229, -0.025423026, -0.015137567, -0.0014475773, -0.0049310816, -0.025284033, 0.010070652, -0.01971169, 0.033181347, -0.016476952, 0.02782381, -0.059943754, -0.014493147, -1.6762044E-4, 0.029213736, 0.0017911104, 0.009862163, 0.0075751017, -0.00998852, 0.0043498394, -0.031033276, -0.024222635, 0.026509697, -0.0067032385, -0.02964335, 3.5019642E-5, 0.0019285235, -0.0012540931, 0.023085423, -0.01897882, 0.018220678, 0.009659992, 0.024273178, 0.011018329, -0.012585156, 0.014290975, -0.045336887, -0.0032236828, -0.013861362, 0.0012193449, 0.055950876, 0.022744259, 0.032600105, 0.0029836043, 0.05696173, 0.006677967, 0.02479124, 0.03871578, -0.014644775, -0.0072402554, -0.036062285, -0.03184196, -0.0034527048, 0.007827816, -0.011763835, -0.010032745, -0.02576419, -0.05317102, 0.015150203, -0.031033276, 7.008864E-4, -0.0043056146, -0.01260411, -0.0036801472, -0.026332796, 0.0038854773, 5.3978123E-4, 0.0066147884, -0.019471612, -0.0033389835, -0.037755467, 0.04248122, 0.03270119, 0.02767218, 0.0010716652, -0.014720589, 0.0016331641, 0.013709733, -0.025978997, -0.03308026, -0.025486205, 0.001415988, 0.03833671, 0.026155896, 0.007979444, 0.050188996, 0.024260541, 0.014177254, 0.0026329637, 0.0067790523, -0.035051428, 0.0066084703, -0.021657588, 0.039777182, 0.0025129246, 0.01756362, 0.020381382, 0.021581775, 0.0183344, 0.008522779, 0.020330839, 0.029137922, 0.018561842, 0.076572336, -0.013596012, 0.015238653, -0.004371952, 0.054535676, 0.0046436195, 0.002978866, 0.01918099, 0.021872396, -0.041723076, -0.02062146, 0.012503024, 0.041470364, -0.018928276, -0.022807436, -0.013444384, -0.02432372, 0.00790363, 0.0052343383, -0.03995408, 0.018321764, -0.038791597, -0.021101616, 0.019193627, 0.0170961, -0.01645168, 0.0368457, -0.011681704, 0.0010321786, 0.043441534, 0.012635699, 0.016173694, -0.025448298, -0.007082309, 0.009198789, -0.025410391, -0.01627478, 
-0.037831284, 0.051326208, 5.354377E-4, -0.015200746, -0.019370526, 0.031917777, -0.004852108, 0.012446163, 6.9022505E-4, -0.015883073, 3.628815E-4, -0.033964757, 0.016047338, -0.008712314, 0.023780385, -0.013848726, 0.0040086755, -0.029719165, -0.025625197, -0.010538173, 0.016641216, 0.040585864, -0.014290975, 0.013027405, 0.016426409, -0.019496884, -0.030780563, 0.029340092, -0.016919201, -0.033888943, -0.012730466, -0.031235447, 0.005092187, 0.057264987, -0.009798985, -0.021278517, 0.01915572, -0.010329684, -0.0071454877, 0.0045141033, 0.025423026, 0.0022286214, 0.010809841, -0.007202348, -0.008396422, 7.589317E-4, 0.013368569, 0.011498486, 0.022516815, 0.02447535, -0.0026171692, 0.013355934, 0.0101717375, -0.002511345, 0.05969104, -0.010001156, 0.005957732, -0.02949172, 0.02144278, 0.0069559524, 0.01492276, 0.026307525, -0.0077520013, 0.0028161814, -0.04288556, 0.014872218, -0.053019393, 0.021341696, 0.04283502, 0.018966185, 0.048344184, 0.023312865, -0.022605265, 0.043441534, -0.036213912, 0.023110693, 0.012964227, 0.046752088, 0.012351396, 7.0049154E-4, 0.046575185, 0.016565401, -0.01997704, -0.050669152, 0.015971523, -0.024513256, -0.0011245772, -0.008491189, -0.058831815, -0.019231534, 0.0033674138, -0.020987896, -0.033029716, 0.01292632, 0.041167106, 0.053069934, -0.0041760984, 0.011915464, -0.025435662, 0.022984337, -0.011100462, 0.015377645, -0.009520999, -0.020773089, -0.005370172, 0.032322116, -0.0064410474, -0.025195584, 0.026863497, -0.012029185, 0.0049974187, 0.014328883, 0.037250042, -0.012755738, -0.0074108373, -0.02602954, 0.02094999, 0.025170311, 0.003149448, 0.009767395, 0.01680548, -0.070861, 0.01454369, 0.031766146, 0.022377823, 0.012553567, -0.02059619, 0.010639259, -0.03671934, 0.020836268, -0.005657634, -0.040029895, -0.012389303, -0.013002134, -0.035556857, -0.009205107, 0.012553567, -0.0024955506, 0.044224948, 0.06863712, -0.009299874, -0.015529274, -0.0020517216, -0.040055167, -0.009261968, 0.047535498, 6.152006E-4, 0.01926944, 
0.069445804, 0.0118901925, -0.0040086755, 0.011972325, 0.044452388, 0.039448652, -0.039676096, 0.022200923, 0.035885386, 0.03899377, -0.03211995, 0.00332003, 0.016615944, -6.306004E-4, -0.009748442, 0.015655631, 0.01574408, 0.019724326, 0.016085245, 0.021430146, 0.021227974, -0.032524288, 0.026080083, 0.016085245, -0.004747864, 0.009937977, -5.058228E-4, 9.571542E-4, 0.017146643, -0.019054634, 0.023603486, -0.01089829, -0.038437795, 0.019231534, 0.010670848, -0.012528296, 0.01125209, -0.030805834, -0.0129768625, -0.036188643, -0.0471817, -0.030325677, -0.017942693, 0.014379425, 0.039852995, 0.02487969, 0.025006048, -0.061864384, 0.01004538, 0.015541909, -0.0064757955, -0.041723076, 0.012218721, -0.011195229, 0.013987719, -0.0077835904, 0.049077056, -0.019294713, -0.0012793646, 0.0016837069, -0.05423242, 0.015642995, 0.00705072, -0.0016900247, 0.029618079, 0.010152784, 0.027571095, -0.010993058, -0.021341696, -0.0043119323, 0.011631161, -0.015036481, 0.03919594, 0.01654013, -0.021657588, -0.0334846, 0.02837978, 0.03815981, 0.0055218004, 0.0361381, 0.006384187, 0.012951591, -0.0013828193, 0.005458622, -0.0011743302, 0.027747994, -0.0045551695, -0.03110909, -0.05564762, 0.027520552, -0.0069496343, 0.04917814, -0.018486027, 0.047864027, -0.014455239, -0.062319268, -0.017285636, -0.005540754, -0.014531054, 0.031159634, 0.042430677, -0.005496529, 6.7561504E-4, 0.011447943, -0.044452388, 0.017058194, -0.036264457, 0.018751377, -0.031058548, -0.018675564, 0.033914216, -0.004608871, -0.014429968, -0.042961378, 0.020343475, -0.008674407, -0.009382007, 0.021366967, -0.012553567, 0.024867056, 0.0340153, 0.026105354, 0.029036837, 0.018473392, 0.013520198, 0.010114877, 0.025246127, -0.0043435213, 0.03545577, -0.008668089, -0.014657411, -0.036769886, -0.0045488514, 0.026812954, -0.008983982, 0.030300407, -9.3346223E-4, 0.034394372, 0.03601174, -0.002776695, 0.0018606067, -0.017866878, 0.050871324, -0.023843564, 0.01680548, -0.0024276336, -0.010089606, 0.016906565, -0.0229717, 
0.02150596, 0.001930103, -0.025271397, -0.012370349, -0.008187933, -0.0024386898, -0.05423242, -0.0722762, -0.028480865, 0.09668837, -0.024728063, 0.021076346, 0.0033737316, 0.014063532, 0.014442604, 0.04649937, -0.0075498302, -0.030098235, -0.05443459, 0.018928276, 0.0073792483, 0.017930057, 0.018018506, -8.1421284E-4, -0.015137567, -0.015478731, 0.01392454, 0.005294358, 0.04483146, 0.0012698878, -0.042001065, 9.966408E-4, 0.009830574, -0.0121744955, -0.054838933, 1.5330657E-4, 0.0109741045, -0.04303719, 0.05660793, 0.015301831, 0.028632494, -9.168779E-4, 0.024513256, 0.0102854585, -0.008668089, -0.038210355, 0.013760276, 0.0024039417, 0.013873997, 0.0011893351, -0.0366688, -0.029365364, 0.007265527, 0.019433705, -0.015074389, -0.018170135, 0.018106956, 0.036491897, -0.012717831, -0.0038507292, -0.023666665, -0.023641393, 0.017171916, 0.021632317, 0.03287809, -0.028531408, -0.028607223, 0.019736962, -0.018751377, -0.01751308, 0.0100580165, -0.0069433167, 0.04124292, 0.031184904, 0.01089829, -0.023704572, -0.0049626706, -0.0027751154, -0.005995639, 0.036795154, 0.01216186, 0.012799963, -0.013987719, -0.019964404, 0.05443459, -0.018144863, -0.02015394, 0.032473747, 0.03434383, 0.04409859, -0.0334846, -0.032600105, -0.020406654, -0.028152337, -0.010373909, -0.011814378, 0.032018863, -0.011359493, -0.0114921685, -0.035986472, -0.0031873551, -0.004829996, 0.021341696, -0.017652072, -0.007524559, 0.026080083, -0.03924648, -0.059792127, -0.013633919, -0.02276953, -0.054990564, -0.020141304, 0.001889037, 0.029896064, -0.009868481, -0.020292932, 0.011738564, 0.018043779, -0.0051837955, -0.008503825, -0.015466096, -0.021948209, -0.020672003, -0.022984337, 0.003278964, 0.013823454, 0.046928987, -0.0024655408, 0.0015123353, -0.0015968365, -0.010696119, -0.027343653, -0.0012311909, 0.024917599, -0.04227905, 0.036795154, 0.02124061, 0.0014839049, 0.008687043, -0.0056481576, -0.031033276, -0.0026282254, -0.036314998, -0.02215038, 0.010822476, -0.014404696, 0.020482467, 
0.03469763, -0.01518811, 0.0013385944, 0.019762233, -0.03037622, -0.03376259, -0.038260896, -0.032979175, 0.060651354, -0.005667111, 0.0334846, -0.03234739, 0.012799963, 0.009268285, 0.006302055, 0.0095968135, 0.025625197, 0.00772673, -0.0032473747, 0.0031889344, 0.03563267, 0.035986472, 0.014859582, 0.0016071029, -0.0010187533, 0.0035506315, 0.03073002, -0.015289196, 0.025031319, 0.013810819, 0.031740874, 0.012319806, 0.035708483, 0.00420137, -0.026231712, -0.005869282, -0.010708755, -0.022251466, 0.011744882, 0.037704926, 0.07556148, 0.040737495, 0.021910302, -0.0348998, -0.0043024556, -0.0021938733, 5.8005756E-4, -0.0063241674, 0.0377302, 0.019673783, 0.013355934, 0.0028730421, -0.062622525, 0.025625197, -0.0022681078, -0.023439221, -0.013684462, -0.010272823, 0.018372307, 6.108571E-4, 0.015314467, -0.01383609, -0.007341341, 0.002479756, -0.01536501, 0.030527849, 0.01756362, 0.003601174, 0.02097526, -0.021619681, -0.052867763, 0.013052677, 0.01536501, 5.370172E-4, 0.036618255, 0.03871578, 0.020141304, 0.015933616, 0.008421693, 0.031210177, -0.0061472673, -0.02303488, 0.0016094721, 0.009261968, -0.018903006, 0.04018152, 0.037755467, -0.031740874, 0.025663104, 0.005212226, -3.1628733E-4, -0.009742124, -0.010980423, 0.0114921685, 0.04250649, 0.009009253, 0.014973303, 0.0013622863, 0.00787204, 0.0063083726, -0.00857964, -0.072074026, -0.009931659, 0.0582253, 0.04025734, 6.633742E-4, 0.010620305, 0.040661678, -0.009552589, -0.008592275, -0.03646663, 0.018953549, 0.010430769, -0.014050897, 0.015642995, -0.040105708, -0.012041821, -0.0027609002, -0.0348998, -0.008895532, 0.015289196, 0.014290975, -0.0057144947, 0.040990207, -0.0023107533, -0.0026076923, -0.008383786, 0.0041350327, 0.010531855, 0.023186507, -0.054485135, -0.012781009, -9.058217E-4, 0.021455416, 0.003217365, -0.00196801, 0.016641216, -0.016552765, 0.0020232913, 0.010506583, -0.0040781717, 0.0018369147, 9.5636444E-4, 0.005209067, 0.015996795, 0.019004092, 0.017879514, -0.02097526, 0.052918307, 
-0.019484248, -0.012465117, 0.0056007737, -0.03866524, -0.009603132, -0.01624951, 0.0021686018, -0.05408079, -0.0065263384, -0.0059356196, -0.019421069, 6.2388764E-4, 0.010828794, -0.012300853, -0.04341626, -0.016691757, -0.025423026, 0.04301192, 0.00804894, -0.044250216, -0.016944472, -0.018776648, 0.021177432, 0.021202702, -0.047636583, 0.04124292, -0.027217295, 0.030654205, 0.0048805387, -0.030325677, 0.012212403, -0.027040396, 0.037553295, -0.027545823, -0.08299127, 0.01671703, -0.030249864, 0.023868835, 0.010083288, 0.032271575, -0.02276953, -0.04591813, 0.009470456, 0.025372483, -0.034975614, 0.044553474, -0.019231534, -5.417556E-4, 0.019673783, -4.0631669E-4, 0.005654475, -0.040762763, -0.0170961, 0.018473392, -0.035506316, -0.016868658, -0.005910348, -0.045766503, 0.005325947, -0.0060588177, 0.03778074, 0.012389303, 0.003857047, 0.00511114, 0.0019806458, 0.022832708, 0.023338135, -0.009451503, -0.020027583, 0.035657942, -0.010133831, -0.026711868, 0.018915642, 0.02784908, 0.01116364, 0.011416354, -0.040535323, -0.035759028, 0.0053954436, -0.009862163, 0.01915572, -0.02323705, -0.005540754, 0.0040813307, 0.009122974, -0.01257252, 0.023085423, -0.016957108, -0.04839473, -0.0121744955, 0.04938031, -0.012079728, -0.0011451102, 0.03075529, 0.023616122, -0.036441356, 0.008630183, 0.0014317826, 0.014733225, -0.028961021, 0.0050953454, 0.040004622, -0.019395798, -0.022782166, -0.032827545, -0.015263924, 0.044224948, 0.012812599, 0.0015518218, -2.3988084E-4, 0.023186507, -0.008023669, 0.039676096, -0.025094498, 0.014531054, -0.022377823, 0.018662928, 0.015314467, 0.005878759, 0.005998798, 0.00381914, -0.0085670035, -0.021986116, -0.06676703, -0.027798537, -0.020078126, 0.03760384, 0.013697098, 0.0012035504, -0.029744435, -0.040232066, -0.02144278, 0.011043601, -0.015049118, -0.028683037, -0.0060209106, -0.008554368, -0.003771756, 0.007909947, -0.025663104, 0.038816866, 0.0026929833, 0.0010187533, 0.011694339, -0.014834311, 0.03292863, 0.020760454, 0.010323366, 
0.013911905, -0.008320608, 0.037022598, 0.015402917, -0.012048139, -0.038791597, -0.033383515, -0.008307972, 0.044553474, 0.0473586, -0.033130802, 0.033332974, 0.039398108, 0.0031684015, 0.0032915995, -0.015642995, 0.022782166, -0.0041697808, -0.008409058, 0.0050985045, -0.009274603, -0.0024829148, -0.0070696734, -0.022668444, -0.0029046312, 0.0033768904, -0.04844527, 0.032397933, 0.027621638, 0.004242436, 0.010064335, 0.014809039, 0.004820519, -0.004615189, 0.009761077, 0.026610782, 0.010304413, -0.027090939, -0.0038412525, 0.007954173, 0.0045520104, 0.041975792, 0.0026455994, -0.022099838, -0.03184196, -7.1233755E-4, 9.777859E-5, 0.015933616, -0.057062816, -0.0036769884, -0.00990007, 0.016110515, 0.023742478, 0.03975191, 0.015238653, -0.04586759, -0.002544514, -0.0047257515, 0.03249902, -0.012458799, 0.014897489, -0.021177432, -2.4205261E-4, -0.028657766, 0.0048457906, -0.0231486, -0.007391884, 0.043896418, -0.009615767, -0.020659368, -0.012730466, -0.026433881, 0.012559885, -0.020255025, 0.0055597075, 0.0072844806, -0.018764013, -0.067019746, -0.018675564, 7.54983E-4, 0.0071202163, -0.043997504, -0.0070380843, -0.023451857, -0.015175475, 0.010961469, 0.0060998835, 0.023995193, -0.03540523, 0.0077014584, 0.034823988, -0.016072609, 0.00796049, -0.0188651, 0.0044382894, 0.041141834, -0.009021889, -0.059893213, 0.00490581, 0.025701012, 0.029340092, 0.011694339, -0.010184374, -0.023401314, -0.022605265, -0.036567714, 0.008503825, 0.0052659274, -0.04642356, 0.0043119323, -0.018574478, 0.024702791, 0.011656432, 0.0021654428, 0.038210355, 0.0032410568, -0.008055258, -0.028607223, 0.022428365, -0.023666665, 0.045261074, 0.0081058005, -0.0039581326, 0.0016347435, 0.009344099, 0.02041929, 0.012010232, 0.013393841, -0.028885208, -0.02822815, 0.040105708, 0.015150203, 0.032650646, -0.025043955, 0.013469655, 0.036416084, 0.009110339, -0.011789107, -0.004785771, 0.004779453, 0.03580957, -0.020229753, -0.00478893, -0.026838224, -0.10047908, 0.04025734, 0.017108737, 0.017424628, 
0.07611745, 0.0069875414, -0.016426409, -0.010348638, -0.03709841, -0.06686812, 0.020785725, 0.1360612, -0.07490442, 7.340551E-4, 0.0051490474, 0.0068548666, 0.022125108, -0.047636583, 0.014354154, -0.02094999, 0.02268108, -0.027520552, -0.0033895262, 0.048546355, -0.007562466, -0.039777182, -0.038589425, 0.0047604996, -0.022895886, 0.03922121, -0.053474277, 0.013633919, 0.010847747, -0.012932638, -0.009021889, 0.013520198, 0.007202348, -0.014493147, 3.557739E-4, 0.04227905, 0.00664006, 0.0072465735, -0.03090692, 0.016818115, -0.027596366, -0.006538974, 0.016552765, -0.0035727439, 0.00787204, -0.009122974, -0.04667627, 0.006200969, -0.016148424, -6.412617E-4, 0.016236873, -0.021392237, -0.02062146, 0.006210446, 0.03431856, -0.017841607, -0.01980014, -0.0103423195, 0.020570917, 0.004744705, -0.0051869545, -0.031690333, 0.0013275382, -0.012585156, 0.011062554, 0.03709841, -0.026332796, -0.0017073988, 0.010582398, 0.018688198, 0.020697275, -0.001930103, 0.022125108, 0.0038823185, 0.07035557, 0.0074171554, 0.045387432, 0.01926944, 0.007846769, 0.01022228, -0.029794978, 0.0059293015, 0.01700765, 0.026307525, 0.0356074, -0.02458907, -0.016527494, 0.047611315, 0.0039676093, 0.0152133815, 0.0085101435, -0.05205908, -0.034950342, -0.0046752086, 0.023704572, -0.023818292, -0.029441178, -0.008585958, 0.02041929, 0.016401136, -0.01680548, -0.020166576, 0.0033768904, 0.022630537, 0.009483092, 0.0039612916, -0.0022602107, 0.045159988, 0.020217119, -0.003087849, 0.007094945, 0.06236981, 0.0072402554, 0.029188465, -0.04245595, 0.042430677, -0.04953194, -0.039297022, 0.017816335, -0.0091419285, -0.014265704, -0.014038262, 0.061156783, -0.020634096, -0.0015115455, -0.019762233, -0.010885655, 0.027975438, 0.008396422, 0.014253069, -0.025031319, -0.022744259, -0.01627478, -0.009678945, -0.011523757, 0.019812776, -0.018561842, 0.03745221, -0.034975614, -0.031336535, -0.018511299, 0.018296491, 0.047965113, 0.04030788, -5.7610887E-4, -0.009040843, -0.028101794, 0.014556325, 
-0.0012848927, -0.0077456837, 0.027571095, 0.026231712, -0.027166752, -0.026610782, 0.019041998, -0.0011988119, 0.024487985, 0.02276953, -0.023376044, -0.034823988, 0.020078126, 0.0013227997, -0.05392916, 0.028961021, -0.003395844, 0.01889037, -0.05534436, 0.018347034, 0.02197348, -0.016754936, 0.022428365, -0.0076066907, -0.01842285, 0.006008275, -0.027368924, 0.0564563, 0.030679477, -0.03780601, -0.022226194, -0.016641216, -0.025473569, -0.0030325677, -0.0049279225, 0.010506583, -0.0064821136, -0.020937353, -0.0058819177, -0.016224237, -0.019888591, 0.0065137027, -0.004087649, -6.0493406E-4, 0.005919825, -0.010632941, -0.03037622, 0.009028207, -0.009805302, 0.004823678, -0.0022744257, -0.008061576, -0.007132852, -0.0058756, 0.015554545, 0.01427834, -0.0022760052, 0.012774692, -0.035986472, 0.0034779762, -0.016388502, 0.010828794, -0.02276953, 0.0019790663, -0.0019522154, 0.0038380935, -0.024462713, -0.015251288, -0.022845343, 0.011201547, 0.02658551, 0.034925073, 0.013987719, 0.008844989, -0.010241234, 0.011719611, 0.003610651, 3.8746188E-5, -0.018498663, -0.008876579, 0.032524288, -0.040232066, 0.019471612, -0.0043498394, 0.022542087, -3.9210153E-4, 0.00966631, 0.026433881, 0.05620359, 1.5488603E-4, 0.012433528, -0.023755115, -0.01791742, 0.0034274333, -0.014644775, 0.028531408, 0.005458622, -0.01971169, -0.032145217, -0.031311262, -0.007353977, -0.042683393, -0.006981224, 0.010474995, -0.018486027, 0.0038728418, 0.0036422403, 0.025903182, 0.012743102, 0.004662573, -3.2912046E-4, -0.08607438, 0.043795332, 0.0036896241, -5.6544755E-4, -0.006791688, -0.008358515, 0.002042245, -0.018220678, 0.0032884406, 0.022352552, -0.0052153845, -0.021126889, -0.024968142, 0.028961021, 0.010247552, -0.006074612, 0.028127065, 0.017639436, 0.01454369, 0.023641393, 0.006677967, -0.0025271398, 0.010424452, 0.015592452, 0.0070380843, -0.0038033454, 0.0042140055, 0.015781987, -0.027419467, -0.039827723, -0.03217049, -0.04662573, 0.008270065, -0.012515659, 0.007954173, 0.041268192, 
0.0067727347, -0.04958248, -0.00854805, 0.01624951, -0.007973126, -0.032018863, 0.06595835, -0.043087732, 0.04083858, 0.002479756, -0.051048223, 0.03795764, 0.0564563, -0.0053417417, -0.033737317, -0.021644952, -0.034823988, -0.036239184, 0.009451503, -0.013406477, 0.010449723, 0.007594055, -0.0034337512, -0.022857979, -0.02341395, -0.013027405, -0.029744435, -0.011934417, -0.013596012, 0.0126925595, 0.013242212, 0.012206085, -0.015440824, 8.244793E-4, -0.0018937754, -0.05317102, 0.0066084703, 0.019838048, 0.010538173, 0.018233314, -0.035481043, 0.0116627505, -0.05640576, -0.012357714, 0.032069404, 0.03287809, 0.024601705, 0.014632139, -0.043820605, 0.0058756, -0.027242567, 0.017323542, -0.015642995, -0.058326386, -0.026737139, 0.019648511, 0.011220501, 0.02487969, -0.026307525, 0.022314643, -0.03527887, -0.0013030565, 0.017437264, -0.049835198, -0.00857964, -0.011466897, 0.031892505, -0.01671703, 0.004441448, -0.008270065, -0.00948941, 0.0170961, -0.022744259, -0.004160304, -0.0043119323, 0.009590495, 0.01025387, 0.005344901, -0.02053301, 0.05185691, -0.01771525, -0.021215338, -0.007594055, -0.059236158, 0.012357714, -0.011182593, 0.017462537, 0.028329236, -0.0071897125, 0.01627478, -0.046271928, -0.013899269, 0.0048963334, 0.015984159, -0.04356789, 0.019319983, 0.019130448, -0.0023992034, 0.030174049, 0.0039928807, 0.007271845, -0.015137567, 0.024639612, -0.07081046, 0.04644883, 2.100685E-4, -0.0015707753, 0.01997704, -0.03704787, 0.014404696, -0.020217119, 0.0468279, 0.0055597075, -0.025347212, 0.015023846, -0.0010740344, 9.5004664E-4, -0.013760276, 0.026787682, 0.021935573, -0.023388678, 0.0014309929, -0.0038475704, -0.004697321, 0.011447943, 0.05569816, -0.0024039417, 0.013330663, 0.037907097, -0.044907276, 0.015554545, -0.006753781, -0.017222458, -0.03452073, 0.065048575, 0.005278563, -0.0025697853, 7.2299887E-4, -0.025877912, -0.035000887, 0.017020287, -0.011852286, 0.011757518, 0.036946785, -0.0042550717, -0.037881825, -0.024311084, 0.0028146019, 
-2.564652E-4, 0.02875885, 0.010651894, 0.032423202, 0.007884677, 0.02729311, 0.03219576, 0.04139455, 0.0117512, 0.018612385, -0.0226179, -0.013419112, -0.0028572474, -0.00943255, 0.0016284257, -0.0354305, -0.019231534, 0.030174049, -0.03707314, 0.01004538, 0.009483092, 0.0012296115, 0.058275845, -0.0076193265, -0.0024228953, -0.005262769, 0.022251466, 0.026307525, -0.007897312, 0.011062554, 0.0091672, 0.062268723, 0.018106956, 0.040585864, -0.013709733, 0.018486027, -0.04288556, 0.003569585, -0.029441178, 0.023072787, 0.006753781, -0.018486027, 0.010942515, -0.010961469, 0.004033947, 0.05190745, -0.046170846, 0.0068990914, 0.011991278, -8.8765786E-4, 0.009881116, -0.04124292, -0.018599749, 0.034268014, 0.027520552, 0.032777004, 0.037250042] }
1536

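The result shows that QwenEmbeddingModel turns the sentence "你好,我叫诸葛奸" into a float array of length 1536. Note that 1536 is fixed for this model; it does not change with the length of the sentence.

So what is the vector of a sentence good for? A great deal, because vectors let us judge how similar two sentences are. For example:

Suppose we look for dogs similar to an Akita. Each dog in the vector database is stored as a multi-dimensional vector built from its characteristics. The Akita's vector turns out to be closest to the Shiba Inu's, so the Shiba Inu is returned as a similar dog. (This is only an illustration, to give you a feel for vector databases.)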
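The dog lookup can be sketched as a brute-force nearest-neighbor search. Everything here is invented for illustration: the class name NearestDogSketch, the three features (size, coat length, ear erectness), and all feature values. Real embeddings come from a model, not from hand-picked numbers.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class NearestDogSketch {

    // Euclidean distance between two vectors of equal length.
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // Returns the name of the dog closest to queryName, skipping the query itself.
    static String nearest(String queryName, Map<String, double[]> dogs) {
        double[] query = dogs.get(queryName);
        String best = null;
        double bestDistance = Double.MAX_VALUE;
        for (Map.Entry<String, double[]> entry : dogs.entrySet()) {
            if (entry.getKey().equals(queryName)) {
                continue;
            }
            double d = distance(query, entry.getValue());
            if (d < bestDistance) {
                bestDistance = d;
                best = entry.getKey();
            }
        }
        return best;
    }

    // Hypothetical feature vectors: {size, coat length, ear erectness}, each in [0, 1].
    static Map<String, double[]> sampleDogs() {
        Map<String, double[]> dogs = new LinkedHashMap<>();
        dogs.put("Akita", new double[]{0.9, 0.6, 1.0});
        dogs.put("Shiba Inu", new double[]{0.5, 0.5, 1.0});
        dogs.put("Chihuahua", new double[]{0.1, 0.2, 0.9});
        dogs.put("Teddy", new double[]{0.3, 0.9, 0.1});
        return dogs;
    }

    public static void main(String[] args) {
        System.out.println(nearest("Akita", sampleDogs())); // prints "Shiba Inu"
    }
}
```

A linear scan like this is fine for a handful of vectors. Vector databases exist because the same lookup has to stay fast over millions of high-dimensional vectors.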

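2.3 Vector databases

Vectors produced by an embedding model can be persisted in a vector database, which can then compute the similarity between two vectors or find the vectors most similar to a given one. In LangChain4j, EmbeddingStore represents a vector database, and more than 20 embedding stores are supported:

Documentation: Comparison table of all supported Embedding Stores | LangChain4j

Several databases we already know can store vectors, such as Elasticsearch, MongoDB, Neo4j, PostgreSQL (Pg), and Redis. The complete matching flow (section 2.4) is demonstrated mainly with the in-memory store.

The storage example in this section uses Redis; the other vector databases are not covered. Note: install Redis 7.0 or later. Plain Redis does not support vector storage and search; that requires the additional RediSearch module, so installing the Redis Stack distribution directly is recommended.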

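Redis Stack is an extended distribution of Redis. In addition to all the basic Redis features, it adds the following:

  1. Full-text search: efficient full-text search over your data.
  2. Graph database features: graph storage and queries.
  3. Time series data: dedicated support for time series.
  4. Fast data analysis: a library for data analysis.

Note: Linux is recommended. If you must use Windows, install Docker. The official site (https://redis.io/docs/install/install-stack/windows/) says: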

To install Redis Stack on Windows, you will need to have Docker installed. When Docker is up and running, open Windows PowerShell and follow the instructions described in Run Redis Stack on Docker. Then, use Docker to connect with redis-cli as explained in that topic.

In short: to deploy Redis Stack on Windows you first need Docker installed. Setting up Docker on Windows is fairly simple; installing Docker Desktop is enough.

Continuing from the previous code:

(1) Add the dependency

    <!-- https://mvnrepository.com/artifact/dev.langchain4j/langchain4j-redis -->
    <dependency>
      <groupId>dev.langchain4j</groupId>
      <artifactId>langchain4j-redis</artifactId>
      <version>1.0.0-alpha1</version>
    </dependency>

(2) Write the code

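    @Test
    public void test02() {
        // Embedding model
        QwenEmbeddingModel embeddingModel = QwenEmbeddingModel.builder()
                .modelName("text-embedding-v3")
                .apiKey("your_api_key")
                .build();
        // Redis-backed store; needs import dev.langchain4j.store.embedding.redis.RedisEmbeddingStore
        RedisEmbeddingStore embeddingStore = RedisEmbeddingStore.builder()
                .host("192.168.222.131")
                .port(6379)
                .password("123456")
                // Must equal the length of the vectors the embedding model returns
                .dimension(1536)
                .build();
        // Generate the vector; the sentence means "I am Zhuge Jian"
        Response<Embedding> embed = embeddingModel.embed("我是诸葛奸");
        // Store the vector
        embeddingStore.add(embed.content());
    }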

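dimension is the dimension of the vectors to be stored, hence 1536 here. Vectors from a different model may have a different dimension, so set this value to the length the model actually returns; you can check it with embed.content().vector().length, as printed in section 2.2.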
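Why the dimension has to match can be shown with a toy store that rejects vectors of the wrong length. FixedDimensionStore is a hypothetical class written for this illustration; it is not how RedisEmbeddingStore is implemented, and the error a real store reports on a mismatch will look different.

```java
import java.util.ArrayList;
import java.util.List;

final class FixedDimensionStore {

    private final int dimension;
    private final List<float[]> vectors = new ArrayList<>();

    FixedDimensionStore(int dimension) {
        if (dimension <= 0) {
            throw new IllegalArgumentException("dimension must be positive");
        }
        this.dimension = dimension;
    }

    // Accepts only vectors whose length equals the configured dimension.
    void add(float[] vector) {
        if (vector.length != dimension) {
            throw new IllegalArgumentException(
                    "Expected " + dimension + " dimensions but got " + vector.length);
        }
        vectors.add(vector.clone());
    }

    int size() {
        return vectors.size();
    }

    public static void main(String[] args) {
        FixedDimensionStore store = new FixedDimensionStore(1536);
        store.add(new float[1536]); // matches the configured dimension
        try {
            store.add(new float[1024]); // a vector from a model with another output size
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
        System.out.println("Stored vectors: " + store.size());
    }
}
```

The similarity measures only make sense between vectors of the same length, which is why the index is created with one fixed dimension and every stored or queried vector has to agree with it.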

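(3) Check the result

The data has been stored in Redis as a vector.

2.4 Matching vectors

The example here is the airline cancellation process:

The code is as follows: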

    @Test
    public void test03() {
        // Imports needed beyond the earlier examples:
        //   dev.langchain4j.data.segment.TextSegment
        //   dev.langchain4j.store.embedding.EmbeddingSearchRequest
        //   dev.langchain4j.store.embedding.EmbeddingSearchResult
        //   dev.langchain4j.store.embedding.inmemory.InMemoryEmbeddingStore

        // ---------------------- Embedding phase ----------------------

        InMemoryEmbeddingStore<TextSegment> embeddingStore = new InMemoryEmbeddingStore<>();

        // Create the embedding model
        QwenEmbeddingModel embeddingModel = QwenEmbeddingModel.builder()
                .apiKey("your_api_key")
                .build();

        // Segment 1 is the "Booking a flight" clause. The segment texts stay in Chinese
        // so that the score shown in the output below remains the one actually recorded.
        TextSegment segment1 = TextSegment.from("""
                预订航班:
                - 通过我们的网站或移动应用程序预订。
                - 预订时需要全额付款。
                - 确保个人信息(姓名、ID 等)的准确性,因为更正可能会产生 25 的费用。
                """);
        // Vectorize the segment, then store vector and text together
        Embedding embedding1 = embeddingModel.embed(segment1).content();
        embeddingStore.add(embedding1, segment1);

        // Segment 2 is the "Canceling a booking" clause
        TextSegment segment2 = TextSegment.from("""
                取消预订:
                - 最晚在航班起飞前 48 小时取消。
                - 取消费用:经济舱 75 美元,豪华经济舱 50 美元,商务舱 25 美元。
                - 退款将在 7 个工作日内处理。
                """);
        Embedding embedding2 = embeddingModel.embed(segment2).content();
        embeddingStore.add(embedding2, segment2);

        // ---------------------- Retrieval phase ----------------------

        // Vectorize the query; it means "How much does a refund cost?"
        Embedding queryEmbedding = embeddingModel.embed("退票要多少钱").content();

        // Build the search request
        EmbeddingSearchRequest searchRequest = EmbeddingSearchRequest.builder()
                .queryEmbedding(queryEmbedding)
                // Maximum number of results to return, 1 here
                .maxResults(1)
                // Minimum match score; segments scoring below 0.7 are not returned
                .minScore(0.7)
                .build();

        // Run the search
        EmbeddingSearchResult<TextSegment> searchResult = embeddingStore.search(searchRequest);
        searchResult.matches().forEach(embeddingMatch -> {
            System.out.println(embeddingMatch.score());           // match score
            System.out.println(embeddingMatch.embedded().text()); // text of the matched segment
        });

    }

Running the code produces:

0.7321294156328824
取消预订:
- 最晚在航班起飞前 48 小时取消。
- 取消费用:经济舱 75 美元,豪华经济舱 50 美元,商务舱 25 美元。
- 退款将在 7 个工作日内处理。

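Because the number of results is set to 1, only the segment with the highest match score is returned. With the number of results set to 2, the "Booking a flight" segment could be returned as well (as long as its score still reaches minScore), but its score is lower, so it adds little.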
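How maxResults and minScore interact can be sketched with a small in-memory search in plain Java. The names here (TopKSearchSketch, Entry, Match, relevance) and the 2-dimensional vectors are made up for illustration, and this is not LangChain4j's actual implementation. The mapping from cosine similarity to a score in [0, 1] via (cosine + 1) / 2 reflects my understanding of how LangChain4j derives its relevance score; treat it as an assumption.

```java
import java.util.ArrayList;
import java.util.List;

final class TopKSearchSketch {

    // A stored segment: its text plus its embedding vector.
    static final class Entry {
        final String text;
        final double[] vector;
        Entry(String text, double[] vector) { this.text = text; this.vector = vector; }
    }

    // A search hit: the segment text plus its score.
    static final class Match {
        final String text;
        final double score;
        Match(String text, double score) { this.text = text; this.score = score; }
    }

    // Assumption: score = (cosine similarity + 1) / 2, which maps [-1, 1] to [0, 1].
    static double relevance(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        double cosine = dot / (Math.sqrt(normA) * Math.sqrt(normB));
        return (cosine + 1.0) / 2.0;
    }

    // Drop entries scoring below minScore, sort best first, keep at most maxResults.
    static List<Match> search(List<Entry> entries, double[] query, int maxResults, double minScore) {
        List<Match> matches = new ArrayList<>();
        for (Entry entry : entries) {
            double score = relevance(query, entry.vector);
            if (score >= minScore) {
                matches.add(new Match(entry.text, score));
            }
        }
        matches.sort((x, y) -> Double.compare(y.score, x.score));
        return new ArrayList<>(matches.subList(0, Math.min(maxResults, matches.size())));
    }

    public static void main(String[] args) {
        List<Entry> entries = new ArrayList<>();
        entries.add(new Entry("Booking a flight", new double[]{0.2, 0.8}));
        entries.add(new Entry("Canceling a booking", new double[]{0.9, 0.1}));
        double[] query = {1.0, 0.0};

        // Two results allowed, but the 0.7 threshold still filters the weaker segment.
        for (Match match : search(entries, query, 2, 0.7)) {
            System.out.println(match.score + " " + match.text);
        }
    }
}
```

With these toy vectors, raising maxResults to 2 does not bring back the booking segment, because minScore filters it out first. The two parameters are independent: one caps the count, the other sets the quality floor.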
