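Redis Bloom Filter
Starting from an interview scenario
The pain point behind the phone-number problem
What is a Bloom filter?
A Bloom filter consists of a bit array whose bits are all initially 0, plus several hash functions. It is used to quickly test whether an element is in a set.
Design idea
At its core, it answers whether a given piece of data exists in one large set. A Bloom filter is a Set-like data structure, except that on very large data sets its answers are slightly imperfect: it can report false positives.
What can a Bloom filter do?
It offers efficient inserts and lookups in very little space, at the cost of results that are probabilistic and not exact.
Key point: if the filter reports that an element exists, the element may or may not exist; if it reports that an element does not exist, the element definitely does not exist. Elements can be added to a Bloom filter but not deleted: membership is decided by hashed bit positions that several elements may share, so deleting one element can clear bits that other elements rely on and produce wrong answers for them.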
How a Bloom filter works
Implementation principle and data structure
Adding a key and querying a key
Hash collisions make the results imprecise
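A toy sketch can make the collision problem concrete. It is not part of the original course code: the class name ToyBloomFilter, the 10-bit array, and the two deliberately simple hash functions (last digit and tens digit of a non-negative integer) are all assumptions chosen so the bit positions can be checked by hand.

```java
import java.util.BitSet;

// Toy filter for illustration only: 10 bits and two hand-checkable hash functions.
class ToyBloomFilter {
    static final int SIZE = 10;
    private final BitSet bits = new BitSet(SIZE); // all bits start at 0

    // Hash 1: last decimal digit. Hash 2: tens digit. Inputs are assumed non-negative.
    static int h1(int x) { return x % SIZE; }
    static int h2(int x) { return (x / 10) % SIZE; }

    // Adding a key sets every bit position the key hashes to.
    void add(int x) {
        bits.set(h1(x));
        bits.set(h2(x));
    }

    // true  = every probed bit is 1, so the key MAY have been added
    // false = at least one probed bit is 0, so the key was never added
    boolean mightContain(int x) {
        return bits.get(h1(x)) && bits.get(h2(x));
    }

    public static void main(String[] args) {
        ToyBloomFilter filter = new ToyBloomFilter();
        filter.add(12); // sets bits 2 and 1
        filter.add(35); // sets bits 5 and 3

        System.out.println(filter.mightContain(12)); // true: it was added
        System.out.println(filter.mightContain(24)); // false: bit 4 is still 0
        System.out.println(filter.mightContain(23)); // true, although 23 was never added
    }
}
```

The key 23 probes bit 3 (set by 35) and bit 2 (set by 12), so the filter reports it as possibly present. That is a false positive caused purely by shared bit positions, while a "false" answer is always trustworthy.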
Hash functions
An example of hash collisions in Java
package com.atguigu.redis7.demo;

import java.util.HashSet;
import java.util.Set;

public class HashCodeConflictDemo
{
    public static void main(String[] args)
    {
        // Distinct objects can end up with the same identity hash code
        Set<Integer> hashCodeSet = new HashSet<>();
        for (int i = 0; i < 200000; i++) {
            int hashCode = new Object().hashCode();
            if (hashCodeSet.contains(hashCode)) {
                System.out.println("Duplicate hashcode found: " + hashCode + "\t at iteration " + i);
                break;
            }
            hashCodeSet.add(hashCode);
        }
        // Different strings can also share a hash code
        System.out.println("Aa".hashCode());
        System.out.println("BB".hashCode());
        System.out.println("柳柴".hashCode());
        System.out.println("柴柕".hashCode());
    }
}
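Result: "Aa" and "BB" both print 2112, and "柳柴" and "柴柕" both print 851553. On a typical HotSpot JVM the loop also reports a duplicate identity hash code before reaching 200000 objects, though the exact iteration varies from run to run.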
Steps for using a Bloom filter
- Initialize the bitmap
- Add elements (occupy their slots)
- Check whether an element exists
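The three steps can be sketched in plain Java, with java.util.BitSet standing in for the Redis bitmap. The class name SimpleBloomFilter, the seed values, and the bit-array size are assumptions chosen for illustration; they are not taken from the original project.

```java
import java.util.BitSet;

// Minimal in-memory sketch: one seeded polynomial hash per seed value.
class SimpleBloomFilter {
    private static final int[] SEEDS = {31, 131, 1313}; // illustrative multipliers
    private final int size;
    private final BitSet bits;

    // Step 1: initialize the bitmap, every bit starts at 0.
    SimpleBloomFilter(int size) {
        this.size = size;
        this.bits = new BitSet(size);
    }

    // Seeded string hash mapped into [0, size); floorMod keeps the result non-negative.
    private int position(String key, int seed) {
        int h = 0;
        for (int i = 0; i < key.length(); i++) {
            h = seed * h + key.charAt(i);
        }
        return Math.floorMod(h, size);
    }

    // Step 2: add a key by setting one bit per hash function.
    void add(String key) {
        for (int seed : SEEDS) {
            bits.set(position(key, seed));
        }
    }

    // Step 3: a key may be present only if every one of its bits is set.
    boolean mightContain(String key) {
        for (int seed : SEEDS) {
            if (!bits.get(position(key, seed))) {
                return false; // definitely never added
            }
        }
        return true; // possibly added
    }

    public static void main(String[] args) {
        SimpleBloomFilter whitelist = new SimpleBloomFilter(1 << 16);
        whitelist.add("customers:10");
        System.out.println(whitelist.mightContain("customers:10")); // true
        System.out.println(whitelist.mightContain("customers:11")); // false
    }
}
```

An added key is always found again, because the same hash functions lead back to the same bits. Only the "present" answer carries uncertainty.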
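False-positive rate, and why elements should not be deleted
In a Bloom filter, "present" means very likely present, while "absent" means definitely absent, 100% absent. In practice, do not let the actual number of elements grow far beyond the capacity the filter was initialized for; allocate enough up front so that it never needs to grow. Once the actual element count exceeds the initial capacity, rebuild the filter: allocate a new, larger filter and add all historical elements to it in bulk.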
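To see why sizing matters, the sketch below applies the standard Bloom filter formulas: for n expected elements and a target false-positive rate p, the bit count is m = -n ln(p) / (ln 2)^2, the hash-function count is k = (m / n) ln 2, and the expected false-positive rate after n insertions is (1 - e^(-kn/m))^k. The class name BloomSizing and the example numbers are assumptions for illustration.

```java
// Sizing helper based on the textbook Bloom filter formulas (illustrative names).
class BloomSizing {
    // m = -n * ln(p) / (ln 2)^2, rounded up to a whole number of bits
    static long optimalBits(long n, double p) {
        return (long) Math.ceil(-n * Math.log(p) / (Math.log(2) * Math.log(2)));
    }

    // k = (m / n) * ln 2, rounded to the nearest integer and at least 1
    static int optimalHashCount(long n, long m) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    // Expected false-positive rate once n elements have been inserted
    static double expectedFalsePositiveRate(long n, long m, int k) {
        return Math.pow(1 - Math.exp(-(double) k * n / m), k);
    }

    public static void main(String[] args) {
        long planned = 1_000_000L; // capacity the filter is initialized for
        double target = 0.01;      // 1% target false-positive rate
        long m = optimalBits(planned, target);
        int k = optimalHashCount(planned, m);
        System.out.println("bits = " + m + ", hash functions = " + k);
        System.out.println("rate at planned load:    "
                + expectedFalsePositiveRate(planned, m, k));
        System.out.println("rate at twice that load: "
                + expectedFalsePositiveRate(2 * planned, m, k));
    }
}
```

With these numbers the filter needs a little under 9.6 million bits and 7 hash functions. Doubling the element count without rebuilding pushes the expected false-positive rate from about 1% to well above 10%, which is why the filter should be rebuilt at a larger size once the planned capacity is exceeded.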
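Use cases for Bloom filters
Preventing cache penetration, in combination with Redis bitmaps
Blacklist checks, such as spam detection
Hand-writing a Bloom filter on top of a bitmap to get a feel for the idea
Overall architecture
Step design
Implementation with Spring Boot + Redis + MyBatis
Database DDL script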
CREATE TABLE `t_customer` (
`id` int(20) NOT NULL AUTO_INCREMENT,
`cname` varchar(50) NOT NULL,
`age` int(10) NOT NULL,
`phone` varchar(20) NOT NULL,
`sex` tinyint(4) NOT NULL,
`birth` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `idx_cname` (`cname`)
) ENGINE=InnoDB AUTO_INCREMENT=10 DEFAULT CHARSET=utf8mb4
pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.atguigu.redis7</groupId>
<artifactId>redis7_study</artifactId>
<version>1.0-SNAPSHOT</version>
<parent>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<version>2.6.10</version>
<relativePath/>
</parent>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<maven.compiler.source>1.8</maven.compiler.source>
<maven.compiler.target>1.8</maven.compiler.target>
<junit.version>4.12</junit.version>
<log4j.version>1.2.17</log4j.version>
<lombok.version>1.16.18</lombok.version>
</properties>
<dependencies>
<!-- Guava: Google's open-source library, which ships its own Bloom filter -->
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>23.0</version>
</dependency>
<!-- Spring Boot common dependency module -->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-web</artifactId>
</dependency>
<!--jedis-->
<dependency>
<groupId>redis.clients</groupId>
<artifactId>jedis</artifactId>
<version>4.3.1</version>
</dependency>
<!--lettuce-->
<!--<dependency>
<groupId>io.lettuce</groupId>
<artifactId>lettuce-core</artifactId>
<version>6.2.1.RELEASE</version>
</dependency>-->
<!-- Spring Boot and Redis integration -->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-data-redis</artifactId>
</dependency>
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-pool2</artifactId>
</dependency>
<!--swagger2-->
<dependency>
<groupId>io.springfox</groupId>
<artifactId>springfox-swagger2</artifactId>
<version>2.9.2</version>
</dependency>
<dependency>
<groupId>io.springfox</groupId>
<artifactId>springfox-swagger-ui</artifactId>
<version>2.9.2</version>
</dependency>
<!-- MySQL database driver -->
<dependency>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
<version>5.1.47</version>
</dependency>
<!-- Druid connection pool integration for Spring Boot -->
<dependency>
<groupId>com.alibaba</groupId>
<artifactId>druid-spring-boot-starter</artifactId>
<version>1.1.10</version>
</dependency>
<dependency>
<groupId>com.alibaba</groupId>
<artifactId>druid</artifactId>
<version>1.1.16</version>
</dependency>
<!-- MyBatis and Spring Boot integration -->
<dependency>
<groupId>org.mybatis.spring.boot</groupId>
<artifactId>mybatis-spring-boot-starter</artifactId>
<version>1.3.0</version>
</dependency>
<!--hutool-->
<dependency>
<groupId>cn.hutool</groupId>
<artifactId>hutool-all</artifactId>
<version>5.2.3</version>
</dependency>
<!--persistence-->
<dependency>
<groupId>javax.persistence</groupId>
<artifactId>persistence-api</artifactId>
<version>1.0.2</version>
</dependency>
<!-- Common Mapper -->
<dependency>
<groupId>tk.mybatis</groupId>
<artifactId>mapper</artifactId>
<version>4.1.5</version>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-autoconfigure</artifactId>
</dependency>
<!-- Common base dependencies: junit/devtools/test/log4j/lombok -->
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>${junit.version}</version>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-test</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
<version>${log4j.version}</version>
</dependency>
<dependency>
<groupId>org.projectlombok</groupId>
<artifactId>lombok</artifactId>
<version>${lombok.version}</version>
<optional>true</optional>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-maven-plugin</artifactId>
</plugin>
</plugins>
</build>
</project>
application.properties configuration file
server.port=7777
spring.application.name=redis7_study
# ========================logging=====================
logging.level.root=info
logging.level.com.atguigu.redis7=info
logging.pattern.console=%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %-5level %logger- %msg%n
logging.file.name=redis7_study.log
logging.pattern.file=%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %-5level %logger- %msg%n
# ========================swagger=====================
spring.swagger2.enabled=true
# With Spring Boot 2.6.x and Swagger 2.9.x, startup fails with a documentationPluginsBootstrapper NullPointerException.
# The cause is that Spring Boot 2.6.x changed the default Spring MVC path matching strategy from AntPathMatcher to PathPatternParser.
# The fix is to switch matching-strategy back to the previous ant_path_matcher.
spring.mvc.pathmatch.matching-strategy=ant_path_matcher
# ========================redis standalone=====================
spring.redis.database=0
# Change this to your own real IP
spring.redis.host=172.18.8.229
spring.redis.port=6379
spring.redis.password=root
spring.redis.lettuce.pool.max-active=8
spring.redis.lettuce.pool.max-wait=-1ms
spring.redis.lettuce.pool.max-idle=8
spring.redis.lettuce.pool.min-idle=0
# ========================alibaba.druid=====================
spring.datasource.type=com.alibaba.druid.pool.DruidDataSource
spring.datasource.driver-class-name=com.mysql.jdbc.Driver
spring.datasource.url=jdbc:mysql://localhost:3306/data?useUnicode=true&characterEncoding=utf-8&useSSL=false
spring.datasource.username=root
spring.datasource.password=root
spring.datasource.druid.test-while-idle=false
# ========================mybatis===================
mybatis.mapper-locations=classpath:mapper/*.xml
mybatis.type-aliases-package=com.atguigu.redis7.entities
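The entity, mapper, and other ORM mapping code for the t_customer table can be produced with a code generator; the important part is the business code in the service module.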
package com.atguigu.redis7.service;

import com.atguigu.redis7.entities.Customer;
import com.atguigu.redis7.mapper.CustomerMapper;
import com.atguigu.redis7.utils.CheckUtils;
import lombok.extern.slf4j.Slf4j;
import org.springframework.data.redis.core.RedisTemplate;
import org.springframework.stereotype.Service;

import javax.annotation.Resource;

@Service
@Slf4j
public class CustomerFilterSerivce {
    public static final String CACHE_KEY_CUSTOMERS = "customers:";

    @Resource
    private CustomerMapper customerMapper;
    @Resource
    private RedisTemplate redisTemplate;
    @Resource
    private CheckUtils checkUtils;

    // Write path
    public void addCustomer(Customer customer) {
        int i = customerMapper.insertSelective(customer);
        if (i > 0) {
            // The MySQL insert succeeded: read the row back so it can be written to Redis
            Customer select = customerMapper.selectByPrimaryKey(customer.getId());
            // Redis cache key
            String key = CACHE_KEY_CUSTOMERS + customer.getId();
            // Write the row that was read back into Redis
            redisTemplate.opsForValue().set(key, select);
        }
    }

    // Read path
    public Customer findCustomerById(int customerId) {
        Customer customer = null;
        // Redis cache key
        String key = CACHE_KEY_CUSTOMERS + customerId;
        // Look in Redis first
        customer = (Customer) redisTemplate.opsForValue().get(key);
        // A Redis hit is returned directly; on a miss, query the database
        if (customer == null) {
            // Query the database
            customer = customerMapper.selectByPrimaryKey(customerId);
            // Present in MySQL but missing from Redis
            if (customer != null) {
                // Write back to Redis to keep the cache consistent with the database
                redisTemplate.opsForValue().set(key, customer);
            }
        }
        return customer;
    }

    /**
     * BloomFilter → redis → mysql
     * Whitelist: whitelistCustomer
     * @param customerId
     * @return
     */
    public Customer findCustomerByIdWithBloomFilter(Integer customerId)
    {
        Customer customer = null;
        // Redis cache key
        String key = CACHE_KEY_CUSTOMERS + customerId;
        // Bloom filter check: "absent" is definite, "present" is only probable
        //===============================================
        if (!checkUtils.checkWithBloomFilter("whitelistCustomer", key))
        {
            log.info("Customer is not in the whitelist, access denied: " + key);
            return null;
        }
        //===============================================
        // 1 Query Redis
        customer = (Customer) redisTemplate.opsForValue().get(key);
        // Redis miss: fall through to MySQL
        if (customer == null) {
            // 2 Load the customer from MySQL
            customer = customerMapper.selectByPrimaryKey(customerId);
            // Present in MySQL but missing from Redis
            if (customer != null) {
                // 3 Write the MySQL row into Redis so the next query hits the cache
                redisTemplate.opsForValue().set(key, customer);
            }
        }
        return customer;
    }
}
Controller layer code
package com.atguigu.redis7.controller;

import com.atguigu.redis7.entities.Customer;
import com.atguigu.redis7.service.CustomerFilterSerivce;
import com.atguigu.redis7.service.CustomerSerivce;
import io.swagger.annotations.Api;
import io.swagger.annotations.ApiOperation;
import lombok.extern.slf4j.Slf4j;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.RestController;

import javax.annotation.Resource;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.util.Random;
import java.util.Date;
import java.util.concurrent.ExecutionException;

/**
 * @auther zzyy
 * @create 2022-07-23 13:55
 */
@Api(tags = "Customer API + Bloom filter walkthrough")
@RestController
@Slf4j
public class CustomerController
{
    @Resource
    private CustomerSerivce customerSerivce;
    @Resource
    private CustomerFilterSerivce customerFilterSerivce;

    @ApiOperation("Initialize the database by inserting 2 Customer records")
    @RequestMapping(value = "/customer/add", method = RequestMethod.POST)
    public void addCustomer()
    {
        for (int i = 0; i < 2; i++) {
            Customer customer = new Customer();
            customer.setCname("customer" + i);
            customer.setAge(new Random().nextInt(30) + 1);
            customer.setPhone("1381111XXXX");
            customer.setSex((byte) new Random().nextInt(2));
            customer.setBirth(Date.from(LocalDateTime.now().atZone(ZoneId.systemDefault()).toInstant()));
            customerFilterSerivce.addCustomer(customer);
        }
    }

    @ApiOperation("Query a single customer by customer id")
    @RequestMapping(value = "/customer/{id}", method = RequestMethod.GET)
    public Customer findCustomerById(@PathVariable Integer id)
    {
        return customerFilterSerivce.findCustomerById(id);
    }

    @ApiOperation("Query by customer id, guarded by the Bloom filter")
    @RequestMapping(value = "/customerbloomfilter/{id}", method = RequestMethod.GET)
    public Customer findCustomerByIdWithBloomFilter(@PathVariable int id)
    {
        return customerFilterSerivce.findCustomerByIdWithBloomFilter(id);
    }
}
Bloom filter initialization code
package com.atguigu.redis7.filter;

import lombok.extern.slf4j.Slf4j;
import org.springframework.data.redis.core.RedisTemplate;
import org.springframework.stereotype.Component;

import javax.annotation.PostConstruct;
import javax.annotation.Resource;

/**
 * Whitelist initializer for the Bloom filter: part of the data is registered
 * as whitelisted at startup.
 * Whitelist convention: if the Bloom filter says present, Redis very likely has the data.
 * Whitelist: whitelistCustomer
 */
@Component
@Slf4j
public class BloomFilterInit
{
    @Resource
    private RedisTemplate redisTemplate;

    @PostConstruct // Loads the whitelist at startup; comment this out to keep the startup log quiet
    public void init() {
        // 1. Load whitelisted customer 10 into the Bloom filter
        String key = "customers:10";
        // 2. Compute the hash; it can be negative, so take the absolute value
        //    (caveat: Math.abs(Integer.MIN_VALUE) is still negative)
        int hashValue = Math.abs(key.hashCode());
        // 3. Take the hash modulo 2^32 to get the slot index; a Redis bitmap holds at most 2^32 bits
        //    (a non-negative int is always below 2^32, so this step leaves the value unchanged)
        long index = (long) (hashValue % Math.pow(2, 32));
        log.info(key + " maps to index: {}", index);
        // 4. Set that slot to 1 in the Redis bitmap that backs the whitelist: whitelistCustomer
        redisTemplate.opsForValue().setBit("whitelistCustomer", index, true);
    }
}
Bloom filter check logic
package com.atguigu.redis7.utils;

import lombok.extern.slf4j.Slf4j;
import org.springframework.data.redis.core.RedisTemplate;
import org.springframework.stereotype.Component;

import javax.annotation.Resource;

/**
 * Bloom filter check logic
 */
@Component
@Slf4j
public class CheckUtils
{
    @Resource
    private RedisTemplate redisTemplate;

    public boolean checkWithBloomFilter(String checkItem, String key)
    {
        // Same index computation as in BloomFilterInit, so a key maps to the same slot
        // (caveat: Math.abs(Integer.MIN_VALUE) is still negative)
        int hashValue = Math.abs(key.hashCode());
        long index = (long) (hashValue % Math.pow(2, 32));
        // Read the bit at that slot: true means the key may be whitelisted, false means it is not
        boolean existOK = redisTemplate.opsForValue().getBit(checkItem, index);
        log.info("--->key:" + key + " slot index: " + index + " exists: " + existOK);
        return existOK;
    }
}
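The check above probes a single bit derived from String.hashCode(), whereas the definition at the top of these notes uses several hash functions. The sketch below shows one way several bitmap offsets per key could be derived. It is an assumption-laden illustration, not the course's method: the class BitmapOffsets and its offsets method are hypothetical, and double hashing with CRC32 as the second hash is a swapped-in technique. Math.floorMod is used so that offsets are never negative.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.CRC32;

// Hypothetical helper: derives several non-negative bitmap offsets for one key.
class BitmapOffsets {
    // Redis bitmap offsets must lie in [0, 2^32)
    static final long REDIS_BITMAP_BITS = 1L << 32;

    static long[] offsets(String key, int hashCount, long bitmapBits) {
        long h1 = key.hashCode(); // first base hash, may be negative
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        long h2 = crc.getValue() | 1L; // second base hash, forced to be non-zero

        long[] result = new long[hashCount];
        for (int i = 0; i < hashCount; i++) {
            // Double hashing: offset_i = (h1 + i * h2) mod bitmapBits, always >= 0
            result[i] = Math.floorMod(h1 + i * h2, bitmapBits);
        }
        return result;
    }

    public static void main(String[] args) {
        // Math.abs cannot represent +2^31, so this prints a negative number
        System.out.println(Math.abs(Integer.MIN_VALUE)); // -2147483648
        System.out.println(Arrays.toString(offsets("customers:10", 3, REDIS_BITMAP_BITS)));
    }
}
```

Each offset would then go through the same setBit call when adding a key and the same getBit call when checking it, with "present" reported only if every bit is 1. Redis grows the bitmap string to cover the highest offset written, so passing a smaller bitmapBits value keeps memory use down.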
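Demo
Database:
Redis
Application log
Summary
Pros and cons of Bloom filters
Pros
- Efficient inserts and lookups, with a small memory footprint measured in bits
Cons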
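- Elements cannot be deleted
Deleting an element makes the filter give wrong answers: because of hash collisions, one bit position may be shared by several elements, so clearing the bits of one element may effectively delete other elements as well.
- False positives exist, so filtering is not exact
"Present" means very likely present; "absent" means definitely absent, 100% absent.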
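The deletion problem can be reproduced with a toy sketch. Everything in it is an assumption for illustration: the class name DeletionDemo, the 10-bit array, and the two simple hash functions (last digit and tens digit of a non-negative integer). The naiveRemove method is exactly what a Bloom filter must not do.

```java
import java.util.BitSet;

// Toy demonstration of why clearing bits breaks a Bloom filter.
class DeletionDemo {
    static final int SIZE = 10;

    static int h1(int x) { return x % SIZE; }        // last decimal digit
    static int h2(int x) { return (x / 10) % SIZE; } // tens digit

    static void add(BitSet bits, int x) {
        bits.set(h1(x));
        bits.set(h2(x));
    }

    static boolean mightContain(BitSet bits, int x) {
        return bits.get(h1(x)) && bits.get(h2(x));
    }

    // Naive "delete": clears the element's bits, including any shared ones.
    static void naiveRemove(BitSet bits, int x) {
        bits.clear(h1(x));
        bits.clear(h2(x));
    }

    public static void main(String[] args) {
        BitSet bits = new BitSet(SIZE);
        add(bits, 12); // sets bits 2 and 1
        add(bits, 13); // sets bits 3 and 1, so bit 1 is shared with 12

        System.out.println(mightContain(bits, 13)); // true
        naiveRemove(bits, 12);                      // clears bits 2 and 1
        System.out.println(mightContain(bits, 13)); // false, although 13 is still a member
    }
}
```

After the naive removal, 13 is reported as absent even though it was added and never removed. That breaks the one guarantee the filter offers, that "absent" is always correct.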
Cuckoo filter