当前位置：首页 > wzjs >正文

信息类网站郴州网站建设

wzjs 2025/7/19 0:25:05

信息类网站,郴州网站建设,做网站领券收佣金,常州网上房地产官网目录一，unordered系列关联式容器 1，unordered_map 2，unordered_set 二、底层结构 1， 哈希概念 2，哈希函数 1）哈希函数设计原则 2）常见哈希函数 3， 哈希冲突 1）…

一，unordered系列关联式容器

1，unordered_map

2，unordered_set

二、底层结构

1，哈希概念

2，哈希函数

1）哈希函数设计原则

2）常见哈希函数

3，哈希冲突

1）闭散列

2）开散列

三、哈希的实现

四、哈希的应用

1，位图

1）经典题目

2）位图的概念

3）位图的应用

4）位图的实现

2，布隆过滤器

1）布隆过滤器的概念

2）布隆过滤器的应用

3）布隆过滤器的实现

点个赞吧！666

一，unordered系列关联式容器

在C++98中，STL提供了底层为红黑树结构的一系列关联式容器，在查询时效率可达到$log_2N$，即最差情况下需要比较红黑树的高度次，当树中的节点非常多时，查询效率也不理想。

最好的查询是，进行很少的比较次数就能够将元素找到，因此在C++11中，STL又提供了4个unordered系列的关联式容器，这四个容器与红黑树结构的关联式容器使用方式基本类似，只是其底层结构不同。

对比map/set的区别：

1，map和set遍历是有序的，unordered系列是无序的

2，map和set是双向迭代器，unordered系列是单向的

1，unordered_map

文档链接：https://cplusplus.com/reference/unordered_map/unordered_map/?kw=unordered_map

文档翻译：

1. unordered_map是存储<key, value>键值对的关联式容器，其允许通过keys快速的索引到与其对应的value。

2. 在unordered_map中，键值通常用于惟一地标识元素，而映射值是一个对象，其内容与此键关联。键和映射值的类型可能不同。

3. 在内部,unordered_map没有对<kye, value>按照任何特定的顺序排序, 为了能在常数范围内找到key所对应的value，unordered_map将相同哈希值的键值对放在相同的桶中。

4. unordered_map容器通过key访问单个元素要比map快，但它通常在遍历元素子集的范围迭代方面效率较低。

5. unordered_maps实现了直接访问操作符(operator[])，它允许使用key作为参数直接访问value。

6. 它的迭代器至少是前向迭代器。

使用实例：

#pragma onceusing namespace std;#include "Hash.h"namespace ccc
{template<class K, class Hash = HashFunc<K>>class unordered_set{struct SetKeyOfT{const K& operator()(const K& key){return key;}};public:typedef typename Bucket::HashTable<K, K, Hash, SetKeyOfT>::iterator iterator;iterator begin(){return _ht.begin();}iterator end(){return _ht.end();}pair<iterator, bool> insert(const K& key){return _ht.Insert(key);}private:Bucket::HashTable<K, K, Hash, SetKeyOfT> _ht;};void test_set(){unordered_set<int> s;s.insert(2);s.insert(3);s.insert(1);s.insert(2);s.insert(5);unordered_set<int>::iterator it = s.begin();//auto it = s.begin();while (it != s.end()){cout << *it << " ";++it;}cout << endl;}}

2，unordered_set

文档链接：https://cplusplus.com/reference/unordered_set/unordered_set/?kw=unordered_set

文档翻译：

1，无序集是不按特定顺序存储唯一元素的容器，它允许根据各个元素的值快速检索各个元素。
2，在unordered_set中，元素的值同时是其键，用于唯一标识它。键是不可变的，因此，unordered_set中的元素在容器中一次就不能修改 - 不过，它们可以入和删除。
3，在内部，unordered_set中的元素不按任何特定顺序排序，而是根据其哈希值组织到桶中，以允许直接通过其值快速访问各个元素（平均平均时间复杂度保持不变）。
4，unordered_set 容器通过键访问单个元素的速度比 set 容器更快，尽管它们通常对其元素子集进行范围迭代的效率较低。
5，容器中的迭代器至少是前向迭代器。

使用实例：

#pragma onceusing namespace std;#include "Hash.h"namespace csy
{template<class K, class V, class Hash = HashFunc<K>>class unordered_map{struct MapKeyOfT{const K& operator()(const pair<K, V>& kv){return kv.first;}};public:typedef typename Bucket::HashTable<K, pair<K, V>, Hash, MapKeyOfT>::iterator iterator;iterator begin(){return _ht.begin();}iterator end(){return _ht.end();}pair<iterator, bool> Insert(const pair<K, V>& kv){return _ht.Insert(kv);}V& operator[](const K& key){pair<iterator, bool> ret = _ht.Insert(make_pair(key, V()));return ret.first->second;}private:Bucket::HashTable<K, pair<K, V>, Hash, MapKeyOfT> _ht;};void test_map(){unordered_map<string, string> dict;dict.Insert(make_pair("sort", "排序"));dict.Insert(make_pair("string", "字符串"));dict.Insert(make_pair("left", "左边"));unordered_map<string, string>::iterator it = dict.begin();while (it != dict.end()){cout << it->first << ":" << it->second << endl;++it;}cout << endl;unordered_map<string, int> countMap;string arr[] = { "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11" };for (auto e : arr){countMap[e]++;}for (auto& kv : countMap){cout << kv.first << ":" << kv.second << endl;}}
}

二、底层结构

1，哈希概念

顺序结构以及平衡树中，元素关键码与其存储位置之间没有对应的关系，因此在查找一个元素时，必须要经过关键码的多次比较。顺序查找时间复杂度为O(N)，平衡树中为树的高度，即O($log_2 N$)，搜索的效率取决于搜索过程中元素的比较次数。

理想的搜索方法：可以不经过任何比较，一次直接从表中得到要搜索的元素。

如果构造一种存储结构，通过某种函数(hashFunc)，使元素的存储位置与它的关键码之间，能够建立一一对应的映射关系，那么在查找时通过该函数时，可以很快找到该元素。

当向该结构中：

1）插入元素

根据待插入元素的关键码，以此函数计算出该元素的存储位置并按此位置进行存放。

2）搜索元素

对元素的关键码进行同样的计算，把求得的函数值当做元素的存储位置，在结构中按此位置取元素比较，若关键码相等，则搜索成功。

该方式即为哈希(散列)方法，哈希方法中使用的转换函数称为哈希(散列)函数，构造出来的结构称为哈希表(Hash Table)(或者称散列表) 。

例如：数据集合{1，7，6，4，5，9}；哈希函数设置为：hash(key) = key % capacity; capacity为存储元素底层空间总的大小。

用该方法进行搜索不必进行多次关键码的比较，因此搜索的速度比较快。

2，哈希函数

引起哈希冲突的一个原因可能是：哈希函数设计不够合理。

1）哈希函数设计原则

哈希函数的定义域必须包括需要存储的全部关键码，而如果散列表允许有m个地址时，其值域必须在0到m-1之间

哈希函数计算出来的地址能均匀分布在整个空间中

哈希函数应该比较简单

2）常见哈希函数

A，直接定址法--(常用)

取关键字的某个线性函数为散列地址：Hash（Key）= A*Key + B

优点：简单、均匀。

缺点：需要事先知道关键字的分布情况。

使用场景：适合查找比较小且连续的情况。

B，除留余数法--(常用)

设散列表中允许的地址数为m，取一个不大于m，但最接近或者等于m的质数p作为除数，按照哈希函数：Hash(key) = key% p(p<=m),将关键码转换成哈希地址。

3，哈希冲突

对于两个数据元素的关键字 $k_i$和 $k_j$(i != j)，有$k_i$ != $k_j$，但有：Hash($k_i$) == Hash($k_j$)，即：不同关键字通过相同哈希哈数计算出相同的哈希地址，该种现象称为哈希冲突或哈希碰撞。把具有不同关键码而具有相同哈希地址的数据元素称为“同义词”。

解决哈希冲突的思路是：扩容和负载因子。

解决哈希冲突两种常见的方法是：闭散列和开散列

1）闭散列

A，线性探测，往后占据下一个位置，容易大片冲突

B，二次探测，往后跳跃的占据位置，减少大片冲突，但还是无法避免

2）开散列

A，拉链法（哈希桶），用单链表指向下一个冲突的数据（冲突超过一定数量，就改成搜索树提高效率）

三、哈希的实现

#pragma once// 仿函数，转换类型，就能取模找位置啦
template<class K>
struct HashFunc
{size_t operator()(const K& key){return (size_t)key;}
};
// 仿函数——string特化版本
template<>
struct HashFunc<string>
{// BKDR算法size_t operator()(const string& key){size_t val = 0;for (auto ch : key){val *= 131;val += ch;}return val;}
};namespace CloseHash
{enum State{EMPTY,EXIST,DELETE};template<class K, class V>struct HashData{pair<K, V> _kv; // 键值key对应的数据valueState _state = EMPTY; // 内存状态标识：空，存在，删除};// 设置第三个模板参数，方便各种类型键值的取模查找template<class K, class V, class Hash = HashFunc<K>>class HashTable{public:bool Insert(const pair<K, V>& kv){// 如果数据存在，插入错误if (Find(kv.first)) return false;// 扩容，载荷控制在70%以下if (_table.size() == 0 || 10 * _size / _table.size() >= 7){// 计算新空间，创建新表，开辟空间size_t newSize = _table.size() == 0 ? 10 : _table.size() * 2;HashTable<K, V, Hash> newHT;newHT._table.resize(newSize);// 把旧表的数据映射到新表for (auto e : _table){if (e._state == EXIST){newHT.Insert(e._kv);}}// 把旧表转换成新表_table.swap(newHT._table);}// 线性探测查找位置, 存在数据的位置不能放值Hash hfc;size_t index = hfc(kv.first) % _table.size();while (_table[index]._state == EXIST){index++;index = index % _table.size();}// 放数据，标识存在数据_table[index]._kv = kv;_table[index]._state = EXIST;_size++;// 插入成功return true;}HashData<K, V>* Find(const K& key){if (_table.size() == 0) return nullptr;Hash hfc;size_t index = hfc(key) % _table.size();size_t sign = index;while (_table[index]._state != EMPTY){if (_table[index]._state != DELETE&& _table[index]._kv.first == key){return &_table[index];}index++;index = index % _table.size();if (index == sign) break;}return nullptr;}bool Erase(const K& key){HashData<K, V>* ret = Find(key);if (ret){ret->_state = DELETE;--_size;return true;}else{return false;}}void Print(){for (size_t i = 0; i < _table.size(); i++){if (_table[i]._state == EXIST){printf("[%d:%d] ", i, _table[i]._kv.second);}else{printf("[%d:*] ", i);}}cout << endl;}private:// vector<pair<K, V>> _table // 原始写法不带sizevector<HashData<K, V>> _table; // 哈希表size_t _size = 0; // 表里存储了多少个数据};void test3(){HashTable<int, int> h1;int a[] = { 1, 11, 4, 15, 26, 7, 44, 9 ,10 };for (auto e : a){h1.Insert(make_pair(e, e));}h1.Erase(4);cout << h1.Find(4) << endl;cout << h1.Find(44) << endl;h1.Print();}void test4(){HashTable<string, int> h2;string a[] = { "苹果", "草莓", "西瓜", "苹果", "草莓", "西瓜", "苹果", "草莓", "西瓜" };for (auto& e : a){auto ptr = h2.Find(e);if (ptr){ptr->_kv.second++;}else{h2.Insert(make_pair(e, 1));}}h2.Print();}void test5(){HashFunc<string> hash;// 如果算出来的值一样，位置会冲突cout << hash("abcd") << endl;cout << hash("cdab") << endl;cout << hash("cadb") << endl;cout << hash("dcba") << endl;}
}namespace HashBucket
{template<class K, class V>struct HashNode{pair<K, V> _kv; // 键值key对应的数据valueHashNode<K, V>* _next; // 链接下一个节点HashNode(const pair<K, V>& kv):_kv(kv), _next(nullptr){}};template<class K,class V,class Hash = HashFunc<K>>class HashTable{typedef HashNode<K, V> Node;public:~HashTable(){for (size_t i = 0; i < _tables.size(); ++i){Node* cur = _tables[i];while (cur){Node* next = cur->_next;free(cur);cur = next;}_tables[i] = nullptr;}}// 开辟空间大小为质数，能减少哈希冲突size_t Hash_tables_new_capacity(size_t n){static const size_t arr_num = 28;static const size_t arr[arr_num] ={53,			 97,		  193,	      389,	      769,1543,		 3079,		  6151,	      12289,	  24593,49157,		 98317,		  196613,	  393241,	  786433,1572869,	 3145739,	  6291469,    12582917,   25165843,50331653,	 100663319,	  201326611,  402653189,  805306457,1610612741,  3221225473,  4294967291};for (size_t i = 0; i < arr_num; ++i){if (arr[i] > n){return arr[i];}}return -1;}bool Insert(const pair<K, V>& kv){// 去重if (Find(kv.first)) return false;//扩容,负载因子到100%if (_size == _tables.size()){// 计算新空间，创建新表，开辟空间//size_t newSize = _tables.size() == 0 ? 10 : _tables.size() * 2;size_t newSize = Hash_tables_new_capacity(_tables.size());vector<Node*> newTable;newTable.resize(newSize, nullptr);// 旧表中的节点，移动到新表for (size_t i = 0; i < _tables.size(); ++i){Node* cur = _tables[i];while (cur){Node* next = cur->_next;Hash hfc;size_t hashi = hfc(cur->_kv.first) % newSize;cur->_next = newTable[hashi];newTable[hashi] = cur;cur = next;}_tables[i] = nullptr;}_tables.swap(newTable);}// 找位置Hash hfc;size_t hashi = hfc(kv.first) % _tables.size();// 头插Node* newnode = new Node(kv);newnode->_next = _tables[hashi];_tables[hashi] = newnode;++_size;// 返回return true;}Node* Find(const K& key){if (_tables.size() == 0) return nullptr;Hash hfc;size_t hashi = hfc(key) % _tables.size();Node * cur = _tables[hashi];while (cur){if (cur->_kv.first == key){return cur;}cur = cur->_next;}return nullptr;}bool Erase(const K& key){if (_tables.size() == 0) return false;Hash hfc;size_t hashi = hfc(key) % _tables.size();Node* prev = nullptr;Node* cur = _tables[hashi];while (cur){if (cur->_kv.first == key) // 找到{if (prev == nullptr) // 头删{_tables[hashi] = cur->_next;}else // 中间删{prev = cur->_next;}delete cur;return true;}else // 没找到{prev = cur;cur = cur->_next;}}return false;}// 数据个数size_t Size(){return _size;}// 表的长度size_t TablesSize(){return _tables.size();}// 桶的个数size_t BucketNum(){size_t num = 0;for (size_t i = 0; i < _tables.size(); ++i){if (_tables[i]) { ++num;};}return num;}// 桶的长度size_t BucketSize(){size_t maxLen = 0;for (size_t i = 0; i < _tables.size(); ++i){size_t len = 0;Node* cur = _tables[i];while (cur){++len;cur = cur->_next;}if (len > maxLen){maxLen = len;}}return maxLen;}private:vector<Node*> _tables;size_t _size = 0;	};void test6(){HashTable<int, int> h1;int a[] = { 1, 11, 4, 15, 26, 7, 44, 55, 99, 78 };for (auto e : a){h1.Insert(make_pair(e, e));}h1.Insert(make_pair(22, 22));h1.Erase(1);h1.Erase(11);h1.Erase(4);h1.Erase(44);}void test7(){HashTable<string, int> h2;string a[] = { "苹果", "草莓", "西瓜", "苹果", "草莓", "西瓜", "苹果", "草莓", "西瓜" };for (auto& e : a){auto ptr = h2.Find(e);if (ptr){ptr->_kv.second++;}else{h2.Insert(make_pair(e, 1));}}h2.Insert(make_pair("葡萄", 7));}void test8(){int n = 1000000;vector<int> v;v.reserve(n);srand(time(0));for (int i = 0; i < n; ++i){//v.push_back(i);v.push_back(rand() + i);  // 重复少//v.push_back(rand());  // 重复多}size_t begin1 = clock();HashTable<int, int> ht;for (auto e : v){ht.Insert(make_pair(e, e));}size_t end1 = clock();cout << "数据个数:" << ht.Size() << endl;cout << "表的长度:" << ht.TablesSize() << endl;cout << "桶的个数:" << ht.BucketNum() << endl;cout << "平均每个桶的长度:" << (double)ht.Size() / (double)ht.BucketNum() << endl;cout << "最长的桶的长度:" << ht.BucketSize() << endl;cout << "负载因子:" << (double)ht.Size() / (double)ht.TablesSize() << endl;}
};namespace Bucket
{template<class T>struct HashNode{T _data; // T类型数据dataHashNode<T>* _next; // 链接下一个节点HashNode(const T& data):_data(data), _next(nullptr){}};// 前置声明template<class K, class T, class Hash, class KeyOfT>class HashTable;// 迭代器，只有单向template<class K, class T, class Hash, class KeyOfT>struct __HashIterator{typedef HashNode<T> Node;typedef HashTable<K, T, Hash, KeyOfT> HT;typedef __HashIterator<K, T, Hash, KeyOfT> Self;Node* _pnode;HT* _pht;__HashIterator(Node* pnode, HT* pht):_pnode(pnode), _pht(pht){}T& operator * (){return _pnode->_data;}T* operator -> (){return &_pnode->_data;}bool operator == (const Self& s) const{return _pnode == s._pnode;}bool operator != (const Self& s) const{return _pnode != s._pnode;}Self& operator ++ (){if (_pnode->_next) // 当前桶中迭代{_pnode = _pnode->_next;}else // 走到下一个桶{Hash hfc;KeyOfT kot;size_t i = hfc(kot(_pnode->_data)) % _pht->_tables.size();++i;for (; i < _pht->_tables.size(); ++i){if (_pht->_tables[i]){_pnode = _pht->_tables[i];break;}}if (i == _pht->_tables.size()){_pnode = nullptr;}}return *this;}};template<class K, class T, class Hash, class KeyOfT>class HashTable{typedef HashNode<T> Node;template<class K, class T, class Hash, class KeyOfT>friend struct __HashIterator;public:typedef __HashIterator<K, T, Hash, KeyOfT> iterator;iterator begin(){for (size_t i = 0; i < _tables.size(); ++i){if (_tables[i]){return iterator(_tables[i], this);}}return end();}iterator end(){return iterator(nullptr, this);}~HashTable(){for (size_t i = 0; i < _tables.size(); ++i){Node* cur = _tables[i];while (cur){Node* next = cur->_next;delete cur;cur = next;}_tables[i] = nullptr;}}// 开辟空间大小为质数，能减少哈希冲突inline size_t Hash_tables_new_capacity(size_t n){static const size_t arr_num = 28;static const size_t arr[arr_num] ={53,			 97,		  193,	      389,	      769,1543,		 3079,		  6151,	      12289,	  24593,49157,		 98317,		  196613,	  393241,	  786433,1572869,	 3145739,	  6291469,    12582917,   25165843,50331653,	 100663319,	  201326611,  402653189,  805306457,1610612741,  3221225473,  4294967291};for (size_t i = 0; i < arr_num; ++i){if (arr[i] > n){return arr[i];}}return -1;}pair<iterator, bool> Insert(const T& data){Hash hfc;KeyOfT kot;// 去重//if (Find(kv.first)) return false;iterator ret = Find(kot(data));if (ret != end()){return make_pair(ret, false);}//扩容,负载因子到100%if (_size == _tables.size()){// 计算新空间，创建新表，开辟空间//size_t newSize = _tables.size() == 0 ? 10 : _tables.size() * 2;size_t newSize = Hash_tables_new_capacity(_tables.size());vector<Node*> newTable;newTable.resize(newSize, nullptr);// 旧表中的节点，移动到新表for (size_t i = 0; i < _tables.size(); ++i){Node* cur = _tables[i];while (cur){Node* next = cur->_next;size_t hashi = hfc(kot(cur->_data)) % newSize;cur->_next = newTable[hashi];newTable[hashi] = cur;cur = next;}_tables[i] = nullptr;}_tables.swap(newTable);}// 找位置size_t hashi = hfc(kot(data)) % _tables.size();// 头插Node* newnode = new Node(data);newnode->_next = _tables[hashi];_tables[hashi] = newnode;++_size;// 返回return make_pair(iterator(newnode, this), true);}iterator Find(const K& key){if (_tables.size() == 0) return end();Hash hfc;KeyOfT kot;size_t hashi = hfc(key) % _tables.size();Node* cur = _tables[hashi];while (cur){if (kot(cur->_data) == key){return iterator(cur, this);}cur = cur->_next;}return end();}bool Erase(const K& key){if (_tables.size() == 0) return false;Hash hfc;KeyOfT kot;size_t hashi = hfc(key) % _tables.size();Node* prev = nullptr;Node* cur = _tables[hashi];while (cur){if (cur->_data == key) // 找到{if (prev == nullptr) // 头删{_tables[hashi] = cur->_next;}else // 中间删{prev = cur->_next;}delete cur;return true;}else // 没找到{prev = cur;cur = cur->_next;}}return false;}// 数据个数size_t Size(){return _size;}// 表的长度size_t TablesSize(){return _tables.size();}// 桶的个数size_t BucketNum(){size_t num = 0;for (size_t i = 0; i < _tables.size(); ++i){if (_tables[i]){++num;};}return num;}// 桶的长度size_t BucketSize(){size_t maxLen = 0;for (size_t i = 0; i < _tables.size(); ++i){size_t len = 0;Node* cur = _tables[i];while (cur){++len;cur = cur->_next;}if (len > maxLen){maxLen = len;}}return maxLen;}private:vector<Node*> _tables;size_t _size = 0;};
}

四、哈希的应用

1，位图

1）经典题目

1，给40亿个不重复的无符号整数和一个无符号整数，没排过序，如何快速判断这一个数是否在这40亿个数中？

2. 给两个文件，分别有100亿个整数，我们只有1G内存，如何找到两个文件交集？

3. 位图应用变形：1个文件有100亿个int，1G内存，设计算法找到出现次数不超过2次的所有整数

解决方法：

A. 遍历，时间复杂度O(N)

B. 排序(O(NlogN))，利用二分查找: logN

C. 位图解决

2）位图的概念

所谓位图，就是用每一位来存放某种状态，适用于海量数据，数据无重复的场景。

通常是用来判断某个数据存不存在的。

位图的特点是：快、节省空间，但相对局限，只能映射处理整形。

数据是否在给定的整形数据中，结果是在或者不在，刚好是两种状态，那么可以使用一个二进制比特位来代表数据是否存在的信息，如果二进制比特位为1，代表存在，为0 代表不存在。比如下图：

3）位图的应用

1. 快速查找某个数据是否在一个集合中

2. 排序 + 去重

3. 求两个集合的交集、并集等

4. 操作系统中磁盘块标记

4）位图的实现

template<size_t N>
class BitMap
{
public:BitMap(){_bits.resize(N / 8 + 1, 0);}void push(size_t x){size_t i = x / 8;size_t j = x % 8;_bits[i] |= (1 << j);// 或等}void pop(size_t x){size_t i = x / 8;size_t j = x % 8;_bits[i] &= ~(1 << j);// 与等}bool empty(size_t x){size_t i = x / 8;size_t j = x % 8;return _bits[i] & (1 << j);// 与}
private:vector<char> _bits;
};void test_bitmap1()
{BitMap<100> bm1;bm1.push(8);bm1.push(9);bm1.push(20);cout << bm1.empty(8) << endl;cout << bm1.empty(9) << endl;cout << bm1.empty(20) << endl;bm1.pop(8);bm1.pop(9);bm1.pop(20);cout << bm1.empty(8) << endl;cout << bm1.empty(9) << endl;cout << bm1.empty(20) << endl;BitMap<-1> bm2;BitMap<1024*1024*1024*4> bm3;BitMap<0xffffffff> bm4;
}// 小题目
// 1个文件有100亿个int，1G内存，设计算法找到出现次数不超过2次的所有整数
// kv模型统计次数
// 0次以上		00
// 1次以上		01
// 2次以上		10
template<size_t N>
class TowBitMap
{
public:void push(size_t x){bool insert1 = _tbm1.empty(x);bool insert2 = _tbm2.empty(x);// 00if (insert1 == false && insert2 == false){// -> 01_tbm2.push(x);}else if (insert1 == false && insert2 == true){// -> 10_tbm1.push(x);_tbm2.pop(x);}else if (insert1 == true && insert2 == false){// -> 01_tbm1.push(x);_tbm2.push(x);}}// 记录出现只一次的数void print_once_num(){for (size_t i = 0; i < N; ++i){if (_tbm1.empty(i) == false && _tbm2.empty(i) == true){cout << i << endl;}}}private:BitMap<N> _tbm1;BitMap<N> _tbm2;
};void test_bitmap2()
{int a[] = { 3, 4, 5, 2, 3, 4, 4, 4, 4, 12, 77, 65, 44, 4, 44, 99, 33, 33, 33, 6, 5, 34, 12 };TowBitMap<100> tbm1;for (auto e : a){tbm1.push(e);}tbm1.print_once_num();
}

2，布隆过滤器

1）布隆过滤器的概念

布隆过滤器是由布隆（Burton Howard Bloom）在1970年提出的，一种紧凑型的、比较巧妙的概率型数据结构，特点是高效地插入和查询。

布隆过滤器其实就是位图的一个变形和延申，虽然无法避免存在哈希冲突，但我们可以想办法降低误判的概率。

当一个数据映射到位图中时，布隆过滤器会用多个哈希函数将其映射到多个比特位，当判断一个数据是否在位图当中时，需要分别根据这些哈希函数计算出对应的比特位，如果这些比特位都被设置为1则判定为该数据存在，否则则判定为该数据不存在。

布隆过滤器使用多个哈希函数进行映射，目的就在于降低哈希冲突的概率，一个哈希函数产生冲突的概率可能比较大，但多个哈希函数同时产生冲突的概率可就没那么大了。

2）布隆过滤器的应用

通常用于快速查询数据是否存在，并且可以存在误判。

3）布隆过滤器的实现

struct HashBKDR
{size_t operator()(const string& key){size_t val = 0;for (auto ch : key){val *= 131;val += ch;}return val;}
};struct HashAP
{size_t operator()(const string& key){size_t hash = 0;for (size_t i = 0; i < key.size(); i++){if ((i & 1) == 0){hash ^= ((hash << 7) ^ key[i] ^ (hash >> 3));}else{hash ^= (~((hash << 11) ^ key[i] ^ (hash >> 5)));}}return hash;}
};struct HashDJB
{size_t operator()(const string& key){size_t hash = 5381;for (auto ch : key){hash += (hash << 5) + ch;}return hash;}
};// N表示准备映射N个数据
// K表示数据类型
// Hash123表示映射的3个位置
template<size_t N,class K = string,class Hash1 = HashBKDR, class Hash2 = HashAP, class Hash3 = HashDJB>
class BloomFilter
{
public:void Set(const K& key){size_t hash1 = Hash1()(key) % (_ratio * N);_bits->push(hash1);size_t hash2 = Hash2()(key) % (_ratio * N);_bits->push(hash2);size_t hash3 = Hash3()(key) % (_ratio * N);_bits->push(hash3);}bool Test(const K& key){size_t hash1 = Hash1()(key) % (_ratio * N);if (!_bits->empty(hash1)) return false; // 准确的size_t hash2 = Hash2()(key) % (_ratio * N);if (!_bits->empty(hash2)) return false; // 准确的size_t hash3 = Hash3()(key) % (_ratio * N);if (!_bits->empty(hash3)) return false; // 准确的return true; // 可能存在误判}
private:const static size_t _ratio = 5; // 1个值映射3个位置，开5倍的空间就比较合适BitMap<N*_ratio>* _bits = new BitMap<N* _ratio>; // 默认初始化
};void Test_BloomFilter_1()
{BloomFilter<10> bf;string arr1[] = { "苹果", "西瓜", "阿里", "美团", "苹果", "字节", "西瓜", "苹果", "香蕉", "苹果", "腾讯" };for (auto& str : arr1){bf.Set(str);}for (auto& str : arr1){cout << bf.Test(str) << endl;}cout << endl << endl;string arr2[] = { "苹果111", "西瓜", "阿里2222", "美团", "苹果dadcaddxadx", "字节", "西瓜sSSSX", "苹果 ", "香蕉", "苹果$", "腾讯" };for (auto& str : arr2){cout << str << ":" << bf.Test(str) << endl;}cout << endl << endl;
}