[ElasticSearch in Practice - 01] Requirements Analysis and Data Generation
ElasticSearch in Practice: series overview
Content | Link |
---|---|
[1] ElasticSearch in Practice: Requirements Analysis and Data Generation | https://zhenghuisheng.blog.csdn.net/article/details/149178534 |
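Contents
- I. [ElasticSearch in Practice] Requirements Analysis and Data Generation
- 1. Requirements analysis
- 1.1 Business analysis
- 1.2 Data analysis
- 1.3 Feature goals
- 2. Data generation code
- 2.1 Basic configuration
- 2.2 Thread pool and task configuration
- 2.3 Inserting the data
- 3. Viewing the data in Kibana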
If you repost this article, please include the link: https://blog.csdn.net/zhenghuishengq/article/details/149178534
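I. [ElasticSearch in Practice] Requirements Analysis and Data Generation
To become more fluent with Elasticsearch (its query syntax, underlying principles, and use in real business development), this series studies ES in depth through hands-on practice.
1. Requirements analysis
1.1 Business analysis
Suppose I need to build a simple matchmaking platform. Users will be filtered on basic profile information such as gender, age, height, weight, education level, hometown, and city of work. The series uses this requirement to study ES in depth and get familiar with its syntax and how it works.
Based on that requirement, the fields ES needs to store are listed below. The index is named user; an index is roughly analogous to a table in MySQL.
- When defining the mapping, fields that hold basic values can simply use the matching basic type, such as Long or Integer;
- If a field needs exact-match queries, map it as keyword: the value is not tokenized, so it can be found with a term query;
- If a field is mapped as text, its value is split into tokens by the configured analyzer, so it should be queried with match.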
import lombok.Data;
import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;

@Data
@Document(indexName = "user")
public class UserEO {

    @Id
    @Field(type = FieldType.Long)
    private Long id;

    @Field(type = FieldType.Keyword)
    private String nickName;

    /** Gender: 1 = male, 0 = female */
    @Field(type = FieldType.Integer)
    private Integer sex;

    /** Birth year */
    @Field(type = FieldType.Integer)
    private Integer birthYear;

    /** Birth month */
    @Field(type = FieldType.Integer)
    private Integer birthMonth;

    /** Birth day of month */
    @Field(type = FieldType.Integer)
    private Integer birthDay;

    /** Height */
    @Field(type = FieldType.Integer)
    private Integer height;

    /** Weight */
    @Field(type = FieldType.Integer)
    private Integer weight;

    /** Education: 3 = below junior college, 4 = junior college, 5 = bachelor, 6 = master, 7 = doctorate */
    @Field(type = FieldType.Integer)
    private Integer eduLevel;

    /** Residence: province */
    @Field(type = FieldType.Keyword)
    private String liveProvince;

    /** Residence: city */
    @Field(type = FieldType.Keyword)
    private String liveCity;

    /** Hometown: province */
    @Field(type = FieldType.Keyword)
    private String regProvince;

    /** Hometown: city */
    @Field(type = FieldType.Keyword)
    private String regCity;

    /** Deleted flag: 0 = not deleted, 1 = deleted */
    @Field(type = FieldType.Integer)
    private Integer delFlag;
}
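To make the keyword/text distinction above more concrete, here is a minimal sketch of the two request bodies involved. It only builds the query DSL as strings; the class and method names are hypothetical, and `selfIntro` is an assumed text field that does not exist in UserEO.

```java
// Sketch only: builds Elasticsearch query DSL bodies as plain strings.
// QueryBodySketch and its methods are hypothetical names for illustration.
// Values are assumed not to need JSON escaping.
class QueryBodySketch {

    /** Exact match on a keyword field: the value is compared as one un-analyzed term. */
    static String termQuery(String field, String value) {
        return "{\"query\":{\"term\":{\"" + field + "\":{\"value\":\"" + value + "\"}}}}";
    }

    /** Full-text match on a text field: the input text is analyzed before matching. */
    static String matchQuery(String field, String text) {
        return "{\"query\":{\"match\":{\"" + field + "\":{\"query\":\"" + text + "\"}}}}";
    }

    public static void main(String[] args) {
        // liveCity is declared as keyword in UserEO, so term is the natural fit
        System.out.println(termQuery("liveCity", "长沙市"));
        // selfIntro is a hypothetical text field, used only to show the match form
        System.out.println(matchQuery("selfIntro", "喜欢旅行"));
    }
}
```

Either body can be pasted into Kibana after a `GET /user/_search` line to try it against the index.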
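1.2 Data analysis
In real projects the data in ES is often synchronized from MySQL through the Canal middleware: Canal poses as a replica of the MySQL primary, listens to the primary's binlog, and forwards the changes. To get comfortable with ES syntax first, this article instead writes the data into ES by hand from a Spring Boot project, starting with 100,000 documents.
The user data is generated programmatically: nickname and gender are random; the birth year falls between 1990 and 2010, with a random month and day; height is 150-185; weight is 100-160; education level ranges from below junior college to doctorate (codes 3-7); provinces cover the whole country, and each province's city is its capital. All of these ranges can be adjusted as needed.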
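1.3 Feature goals
Goal: query data quickly, and surface the best-matching users first based on weighted scoring.
- Query dynamically for the users someone is looking for, for example high-quality opposite-sex users who live in the same city and are well educated, with optional filters on height and weight;
- Surface high-quality users first (male users, for example): same-city opposite-sex users rank first, users of similar age rank first, users with the same or higher education rank first, and so on;
- Return the requested results quickly.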
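The weighted-scoring idea in the goals above can be illustrated with a small plain-Java sketch. The weights and the class name are hypothetical, chosen only for illustration; in the actual system the ranking would presumably be computed by ES itself (for example with boosted clauses), not in application code.

```java
// Sketch only: a plain-Java illustration of weighted match scoring.
// All weights below are hypothetical values, not taken from the article.
class MatchScoreSketch {

    static final int SAME_CITY_WEIGHT = 50;
    static final int AGE_WEIGHT = 30;
    static final int EDU_WEIGHT = 20;

    /** Scores a candidate ("other") from the point of view of the searching user ("my"). */
    static int score(String myCity, int myBirthYear, int myEdu,
                     String otherCity, int otherBirthYear, int otherEdu) {
        int score = 0;
        // Same city ranks first
        if (myCity.equals(otherCity)) {
            score += SAME_CITY_WEIGHT;
        }
        // Similar age: full points for the same birth year, minus 10 per year of gap, never below 0
        int ageGap = Math.abs(myBirthYear - otherBirthYear);
        score += Math.max(0, AGE_WEIGHT - 10 * ageGap);
        // Same or higher education level
        if (otherEdu >= myEdu) {
            score += EDU_WEIGHT;
        }
        return score;
    }

    public static void main(String[] args) {
        System.out.println(score("长沙市", 1998, 5, "长沙市", 1999, 6)); // prints 90
        System.out.println(score("长沙市", 1998, 5, "广州市", 1990, 4)); // prints 0
    }
}
```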
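2. Data generation code
The data is generated by submitting many batch inserts to a thread pool.
2.1 Basic configuration
The detailed code follows. First, the core dependency: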
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
    <version>2.7.10</version> <!-- choose the version that matches your Spring Boot version -->
</dependency>
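Next comes the YAML configuration. The host and port go into application.yml so they are managed in one place:
es:
  param:
    connect:
      hostname: xx.xx.xx.xx
      port: 9200
The properties class bound to this configuration is: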
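import lombok.Data;
import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.stereotype.Component;

@Component
@ConfigurationProperties(prefix = "es.param.connect")
@Data
public class EsConnectProperties {
    private String hostname;
    private Integer port;
}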
Then configure the ES connection and register the client as a bean in the Spring container:
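import lombok.extern.slf4j.Slf4j;
import org.apache.http.HttpHost;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestClientBuilder;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
@Slf4j
public class ElasticSearchConfig {

    public static final RequestOptions COMMON_OPTIONS;

    static {
        RequestOptions.Builder builder = RequestOptions.DEFAULT.toBuilder();
        COMMON_OPTIONS = builder.build();
    }

    private final EsConnectProperties esConnectProperties;

    public ElasticSearchConfig(EsConnectProperties esConnectProperties) {
        this.esConnectProperties = esConnectProperties;
    }

    @Bean
    public RestHighLevelClient esRestClient() {
        log.info("ES client configured: {}:{}", esConnectProperties.getHostname(), esConnectProperties.getPort());
        // Build the low-level client against the configured host and port
        RestClientBuilder builder = RestClient.builder(
                new HttpHost(esConnectProperties.getHostname(), esConnectProperties.getPort()));
        // Connect timeout 5 s, socket (read) timeout 60 s
        builder.setRequestConfigCallback(requestConfigBuilder ->
                requestConfigBuilder.setConnectTimeout(5000).setSocketTimeout(60000));
        // Connection pool limits: 100 in total, 20 per route
        builder.setHttpClientConfigCallback(httpClientBuilder ->
                httpClientBuilder.setMaxConnTotal(100).setMaxConnPerRoute(20));
        return new RestHighLevelClient(builder);
    }
}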
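2.2 Thread pool and task configuration
Define a custom thread pool. As the code shows, it is sized with the I/O-bound rule (at most 2N threads for N CPU cores), and its work queue is a bounded linked blocking queue.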
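import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import lombok.extern.slf4j.Slf4j;

@Slf4j
public class ThreadPoolUtil {

    /**
     * I/O-bound work: a maximum of about 2N threads lets the CPU switch between waiting threads;
     * keep the core size at or below 2N and leave a little headroom.
     * CPU-bound work: a maximum of N or N+1 threads; N fully uses the CPU, and the extra thread
     * keeps the CPU busy when one thread stalls on a page fault. Keep the core size at or below N+1.
     * When to use a thread pool: 1) each task is short; 2) there are many tasks to process.
     */
    private static ThreadPoolExecutor pool = null;

    public static synchronized ThreadPoolExecutor getThreadPool() {
        if (pool == null) {
            // Number of CPU cores on this machine
            int cpuNum = Runtime.getRuntime().availableProcessors();
            log.info("CPU cores on this machine: {}", cpuNum);
            int maximumPoolSize = cpuNum * 2;
            pool = new ThreadPoolExecutor(
                    maximumPoolSize - 2,
                    maximumPoolSize,
                    5L, // idle threads above the core size are kept for 5 s
                    TimeUnit.SECONDS,
                    new LinkedBlockingQueue<>(50), // bounded linked queue
                    Executors.defaultThreadFactory(), // default thread factory
                    // When the pool and queue are full, the submitting thread runs the task itself.
                    // The default AbortPolicy would throw RejectedExecutionException once the
                    // 1000 submissions below outrun the queue capacity of 50.
                    new ThreadPoolExecutor.CallerRunsPolicy());
        }
        return pool;
    }
}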
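With a queue capacity of 50 and a bounded number of threads, a pool like the one above can hold only a limited number of pending tasks, so the rejection policy decides what happens when many tasks are submitted at once. The following is a minimal, self-contained sketch of that behaviour; the pool sizes, the 1 ms sleep standing in for a batch insert, and the class name are assumptions for illustration.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch only: compares rejection policies on a small bounded pool.
class BoundedPoolSketch {

    /** Submits taskCount short tasks and returns how many actually completed. */
    static int runTasks(int taskCount, RejectedExecutionHandler handler) {
        AtomicInteger completed = new AtomicInteger();
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 4, 5L, TimeUnit.SECONDS,
                new LinkedBlockingQueue<>(50),
                Executors.defaultThreadFactory(),
                handler);
        try {
            for (int i = 0; i < taskCount; i++) {
                pool.submit(() -> {
                    sleepQuietly(1); // stand-in for one batch insert
                    completed.incrementAndGet();
                });
            }
        } catch (RejectedExecutionException e) {
            // AbortPolicy: the first task that does not fit ends the submission loop
        }
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        // CallerRunsPolicy throttles the submitter, so every task runs
        System.out.println("CallerRunsPolicy completed: "
                + runTasks(1000, new ThreadPoolExecutor.CallerRunsPolicy()));
        // AbortPolicy rejects once 4 threads are busy and 50 tasks are queued
        System.out.println("AbortPolicy completed: "
                + runTasks(1000, new ThreadPoolExecutor.AbortPolicy()));
    }
}
```

CallerRunsPolicy trades throughput on the submitting thread for not losing work; a larger queue or a blocking submit would be other reasonable choices.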
Define the task. Implementing Runnable is enough; the task sets every field of each generated user.
import cn.hutool.core.util.IdUtil; // assumed: IdUtil is Hutool's utility class
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;
import lombok.extern.slf4j.Slf4j;

@Slf4j
public class UserSaveTask implements Runnable {

    private final UserRepository userRepository;

    public UserSaveTask(UserRepository userRepository) {
        this.userRepository = userRepository;
    }

    /**
     * Inserts one batch of 100 users; 1000 such tasks produce 100,000 documents in total.
     */
    @Override
    public void run() {
        List<UserEO> list = new ArrayList<>();
        // 100 documents per batch
        log.info("Start inserting data...");
        for (int i = 0; i < 100; i++) {
            list.add(buildUserBaseInfo());
        }
        userRepository.saveAll(list);
        log.info("Finished inserting data...");
    }

    /**
     * Builds the basic information of one random user.
     */
    public UserEO buildUserBaseInfo() {
        UserEO user = new UserEO();
        // User id generated with the snowflake algorithm
        user.setId(IdUtil.getSnowflakeNextId());
        user.setNickName("用户" + getRandomString(6));
        // Gender: 0 or 1
        user.setSex(ThreadLocalRandom.current().nextInt(0, 2));
        // Birth year, month, and day
        int year = randBetween(1990, 2010);
        int month = randBetween(1, 12);
        int day = getRandomDay(year, month);
        user.setBirthYear(year);
        user.setBirthMonth(month);
        user.setBirthDay(day);
        // Height and weight
        user.setHeight(randBetween(150, 185));
        user.setWeight(randBetween(100, 160));
        user.setEduLevel(randBetween(3, 7)); // below junior college ~ doctorate
        // Residence: province + city
        String[] live = CityUtil.getRandomCity();
        user.setLiveProvince(live[0]);
        user.setLiveCity(live[1]);
        // Hometown: province + city
        String[] reg = CityUtil.getRandomCity();
        user.setRegProvince(reg[0]);
        user.setRegCity(reg[1]);
        // Not deleted by default
        user.setDelFlag(0);
        return user;
    }

    private static String getRandomString(int length) {
        String chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
        StringBuilder sb = new StringBuilder(length);
        ThreadLocalRandom r = ThreadLocalRandom.current();
        for (int i = 0; i < length; i++) {
            sb.append(chars.charAt(r.nextInt(chars.length())));
        }
        return sb.toString();
    }

    /** Random integer in [start, end], both ends inclusive. */
    private static int randBetween(int start, int end) {
        return ThreadLocalRandom.current().nextInt(start, end + 1);
    }

    private static int getRandomDay(int year, int month) {
        // Bounded by the number of days in that month
        return randBetween(1, LocalDate.of(year, month, 1).lengthOfMonth());
    }
}
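The date handling in the task above relies on two details: `nextInt(start, end + 1)` makes the upper bound inclusive, and `lengthOfMonth()` keeps the day valid for short months and leap years. A minimal standalone sketch of just that part follows; the class name and the `randomBirthday` helper are hypothetical.

```java
import java.time.LocalDate;
import java.util.concurrent.ThreadLocalRandom;

// Sketch only: isolates the random birthday logic for experimentation.
class RandomBirthdaySketch {

    /** Random integer in [start, end]; nextInt's upper bound is exclusive, hence end + 1. */
    static int randBetween(int start, int end) {
        return ThreadLocalRandom.current().nextInt(start, end + 1);
    }

    /** Picks a valid calendar date with the year in [startYear, endYear]. */
    static LocalDate randomBirthday(int startYear, int endYear) {
        int year = randBetween(startYear, endYear);
        int month = randBetween(1, 12);
        // lengthOfMonth() is 28, 29, 30, or 31 depending on the month and leap year
        int day = randBetween(1, LocalDate.of(year, month, 1).lengthOfMonth());
        // LocalDate.of would throw DateTimeException for an impossible date such as Feb 30
        return LocalDate.of(year, month, day);
    }

    public static void main(String[] args) {
        System.out.println(LocalDate.of(2000, 2, 1).lengthOfMonth()); // prints 29
        System.out.println(LocalDate.of(1990, 2, 1).lengthOfMonth()); // prints 28
        System.out.println(randomBirthday(1990, 2010));
    }
}
```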
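The city utility class follows. It only needs each province paired with its capital city; the list also includes the four municipalities.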
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

public class CityUtil {

    // Each entry is {province, city}
    private static final List<String[]> PROVINCE_AND_CITY_LIST = Arrays.asList(
            new String[]{"北京市", "北京市"},
            new String[]{"天津市", "天津市"},
            new String[]{"上海市", "上海市"},
            new String[]{"重庆市", "重庆市"},
            new String[]{"河北省", "石家庄市"},
            new String[]{"山西省", "太原市"},
            new String[]{"辽宁省", "沈阳市"},
            new String[]{"吉林省", "长春市"},
            new String[]{"黑龙江省", "哈尔滨市"},
            new String[]{"江苏省", "南京市"},
            new String[]{"浙江省", "杭州市"},
            new String[]{"安徽省", "合肥市"},
            new String[]{"福建省", "福州市"},
            new String[]{"江西省", "南昌市"},
            new String[]{"山东省", "济南市"},
            new String[]{"河南省", "郑州市"},
            new String[]{"湖北省", "武汉市"},
            new String[]{"湖南省", "长沙市"},
            new String[]{"广东省", "广州市"},
            new String[]{"海南省", "海口市"},
            new String[]{"四川省", "成都市"},
            new String[]{"贵州省", "贵阳市"},
            new String[]{"云南省", "昆明市"},
            new String[]{"陕西省", "西安市"},
            new String[]{"甘肃省", "兰州市"},
            new String[]{"青海省", "西宁市"},
            new String[]{"台湾省", "台北市"},
            new String[]{"内蒙古自治区", "呼和浩特市"},
            new String[]{"广西壮族自治区", "南宁市"},
            new String[]{"西藏自治区", "拉萨市"},
            new String[]{"宁夏回族自治区", "银川市"},
            new String[]{"新疆维吾尔自治区", "乌鲁木齐市"},
            new String[]{"香港特别行政区", "香港"},
            new String[]{"澳门特别行政区", "澳门"});

    /** Returns a random {province, city} pair. */
    public static String[] getRandomCity() {
        return PROVINCE_AND_CITY_LIST.get(
                ThreadLocalRandom.current().nextInt(PROVINCE_AND_CITY_LIST.size()));
    }
}
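2.3 Inserting the data
Define the UserRepository interface and annotate it with @Repository (Spring Data detects repository interfaces automatically, so the annotation is optional here):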
@Repository
public interface UserRepository extends ElasticsearchRepository<UserEO, Long> {
}
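Next, define a UserMatchService interface with a single insert method for now:
public interface UserMatchService {
    AjaxResult matchSave();
}
Then implement the interface and its method, submitting 1,000 tasks to the thread pool in a loop: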
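import java.util.concurrent.ThreadPoolExecutor;
import javax.annotation.Resource;
import org.springframework.stereotype.Service;

/**
 * @Author zhenghuisheng
 * @Date: 2025/6/23 15:50
 */
@Service
public class UserMatchServiceImpl implements UserMatchService {

    @Resource
    private UserRepository userRepository;

    // Shared thread pool
    private final ThreadPoolExecutor threadPool = ThreadPoolUtil.getThreadPool();

    /**
     * Generates 100,000 users in batches through the thread pool.
     */
    @Override
    public AjaxResult matchSave() {
        for (int i = 0; i < 1000; i++) {
            // Submit one batch task
            threadPool.submit(new UserSaveTask(userRepository));
        }
        // The tasks run asynchronously, so inserts may still be in progress when this returns
        return AjaxResult.success("Data generation tasks submitted");
    }
}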
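Because the service above returns as soon as the tasks are submitted, the response does not tell the caller whether all inserts finished. If that matters, one option is to keep the returned futures and wait on them. The sketch below assumes a generic `ExecutorService` and uses a stand-in for the real `saveAll` call; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch only: waits for every submitted batch before reporting a total.
class AwaitBatchSketch {

    /** Submits taskCount batches and blocks until all of them have finished. */
    static int submitAndWait(ExecutorService pool, int taskCount, int docsPerTask) {
        List<Future<Integer>> futures = new ArrayList<>();
        for (int i = 0; i < taskCount; i++) {
            // Stand-in for userRepository.saveAll(batch): reports how many documents the batch held
            futures.add(pool.submit(() -> docsPerTask));
        }
        int total = 0;
        for (Future<Integer> future : futures) {
            try {
                total += future.get(); // blocks until this batch is done
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for batches", e);
            } catch (ExecutionException e) {
                throw new IllegalStateException("a batch failed", e.getCause());
            }
        }
        return total;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            System.out.println(submitAndWait(pool, 1000, 100)); // prints 100000
        } finally {
            pool.shutdown();
        }
    }
}
```

Waiting inside an HTTP request keeps the connection open for the whole run, so for large volumes returning immediately and checking `_count` afterwards, as this article does, is also a reasonable design.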
Finally, add the controller:
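import javax.annotation.Resource;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/es/user")
public class UserMatchController {

    @Resource
    private UserMatchService userMatchService;

    @GetMapping("/matchSave")
    public AjaxResult matchSave() {
        return userMatchService.matchSave();
    }
}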
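3. Viewing the data in Kibana
After starting the project and calling the endpoint above, check the data in the index. The total document count:
GET /user/_count
Next, look at the mapping, that is, the data type of each field. Note that the output below shows the types Elasticsearch's dynamic mapping infers (text with a keyword sub-field for strings, long for numbers) rather than the keyword and integer types declared on UserEO, which suggests the annotated mapping was not applied when this index was created. With this mapping, exact term queries on string fields need to target the keyword sub-field, for example liveCity.keyword.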
GET /user/_mapping
{
  "user" : {
    "mappings" : {
      "properties" : {
        "_class" : { "type" : "text", "fields" : { "keyword" : { "type" : "keyword", "ignore_above" : 256 } } },
        "birthDay" : { "type" : "long" },
        "birthMonth" : { "type" : "long" },
        "birthYear" : { "type" : "long" },
        "delFlag" : { "type" : "long" },
        "eduLevel" : { "type" : "long" },
        "height" : { "type" : "long" },
        "id" : { "type" : "long" },
        "liveCity" : { "type" : "text", "fields" : { "keyword" : { "type" : "keyword", "ignore_above" : 256 } } },
        "liveProvince" : { "type" : "text", "fields" : { "keyword" : { "type" : "keyword", "ignore_above" : 256 } } },
        "nickName" : { "type" : "text", "fields" : { "keyword" : { "type" : "keyword", "ignore_above" : 256 } } },
        "regCity" : { "type" : "text", "fields" : { "keyword" : { "type" : "keyword", "ignore_above" : 256 } } },
        "regProvince" : { "type" : "text", "fields" : { "keyword" : { "type" : "keyword", "ignore_above" : 256 } } },
        "sex" : { "type" : "long" },
        "weight" : { "type" : "long" }
      }
    }
  }
}
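Query the data with pagination. from is a zero-based offset, so from=1&size=5 skips the first hit and returns the next five:
GET /user/_search?from=1&size=5
{
  "query": {
    "match_all": {}
  }
}
The response is shown below. hits.total reports 10000 with relation gte because, by default, Elasticsearch stops counting exact hits at 10,000; the _count request above returns the exact total.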
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : { "value" : 10000, "relation" : "gte" },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "user", "_type" : "_doc", "_id" : "1937084380165083277", "_score" : 1.0,
        "_source" : {
          "_class" : "com.zhs.elasticsearch.match.eo.UserEO",
          "id" : 1937084380165083277, "nickName" : "用户Rxo729", "sex" : 0,
          "birthYear" : 1998, "birthMonth" : 6, "birthDay" : 16,
          "height" : 179, "weight" : 153, "eduLevel" : 4,
          "liveProvince" : "河北省", "liveCity" : "石家庄市",
          "regProvince" : "辽宁省", "regCity" : "沈阳市",
          "delFlag" : 0
        }
      },
      {
        "_index" : "user", "_type" : "_doc", "_id" : "1937084380165083281", "_score" : 1.0,
        "_source" : {
          "_class" : "com.zhs.elasticsearch.match.eo.UserEO",
          "id" : 1937084380165083281, "nickName" : "用户pLNM3B", "sex" : 0,
          "birthYear" : 2007, "birthMonth" : 7, "birthDay" : 14,
          "height" : 172, "weight" : 131, "eduLevel" : 7,
          "liveProvince" : "西藏自治区", "liveCity" : "拉萨市",
          "regProvince" : "内蒙古自治区", "regCity" : "呼和浩特市",
          "delFlag" : 0
        }
      },
      {
        "_index" : "user", "_type" : "_doc", "_id" : "1937084380165083286", "_score" : 1.0,
        "_source" : {
          "_class" : "com.zhs.elasticsearch.match.eo.UserEO",
          "id" : 1937084380165083286, "nickName" : "用户yupBE5", "sex" : 0,
          "birthYear" : 1999, "birthMonth" : 10, "birthDay" : 29,
          "height" : 166, "weight" : 140, "eduLevel" : 7,
          "liveProvince" : "贵州省", "liveCity" : "贵阳市",
          "regProvince" : "澳门特别行政区", "regCity" : "澳门",
          "delFlag" : 0
        }
      },
      {
        "_index" : "user", "_type" : "_doc", "_id" : "1937084380165083290", "_score" : 1.0,
        "_source" : {
          "_class" : "com.zhs.elasticsearch.match.eo.UserEO",
          "id" : 1937084380165083290, "nickName" : "用户fTGRMJ", "sex" : 1,
          "birthYear" : 2003, "birthMonth" : 7, "birthDay" : 9,
          "height" : 182, "weight" : 128, "eduLevel" : 6,
          "liveProvince" : "海南省", "liveCity" : "海口市",
          "regProvince" : "辽宁省", "regCity" : "沈阳市",
          "delFlag" : 0
        }
      },
      {
        "_index" : "user", "_type" : "_doc", "_id" : "1937084380165083295", "_score" : 1.0,
        "_source" : {
          "_class" : "com.zhs.elasticsearch.match.eo.UserEO",
          "id" : 1937084380165083295, "nickName" : "用户v6ZwfS", "sex" : 0,
          "birthYear" : 1995, "birthMonth" : 12, "birthDay" : 11,
          "height" : 173, "weight" : 140, "eduLevel" : 5,
          "liveProvince" : "湖南省", "liveCity" : "长沙市",
          "regProvince" : "江苏省", "regCity" : "南京市",
          "delFlag" : 0
        }
      }
    ]
  }
}
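The from parameter in the paginated search above is an offset rather than a page number, and from + size is limited by the index's result window (index.max_result_window, 10,000 by default). A minimal sketch of turning a 1-based page number into an offset under those assumptions follows; the class and method names are hypothetical.

```java
// Sketch only: converts a 1-based page number into the "from" offset used by _search.
class PageOffsetSketch {

    // Default value of index.max_result_window; an index may be configured differently
    static final int MAX_RESULT_WINDOW = 10_000;

    static int fromOffset(int page, int size) {
        if (page < 1 || size < 1) {
            throw new IllegalArgumentException("page and size must be at least 1");
        }
        long from = (long) (page - 1) * size;
        if (from + size > MAX_RESULT_WINDOW) {
            throw new IllegalArgumentException("from + size exceeds the result window");
        }
        return (int) from;
    }

    public static void main(String[] args) {
        System.out.println(fromOffset(1, 5)); // prints 0
        System.out.println(fromOffset(2, 5)); // prints 5
    }
}
```

Page 1 with size 5 therefore maps to `from=0&size=5`; the request shown earlier with `from=1` starts at the second document.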
The data has now been generated successfully.
The full code is available on Gitee: https://gitee.com/zhenghuisheng/elasticsearch_study