[C++ Challenge Notes] Implementing unordered_map and unordered_set on Top of a Hash Table

news 2025/11/6 7:00:49

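Series articles

[C++ Challenge Notes] A simple map and set built on a red-black tree (CSDN blog)

[C++ Challenge Notes] Under the hood of unordered_map and unordered_set: the hash table (hash buckets) (CSDN blog)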

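Contents

Series articles

Preface

Differences and similarities between unordered_map/unordered_set and map/set

I. Core framework

1. Structure of unordered_map and unordered_set

The HashData structure

KeyOfT

2. Framework implementation

II. Implementing the iterator

1. Design

operator++

2. Implementation

III. Overloading [ ] for unordered_map

1. Why insert returns a pair< >

2. Implementing operator[ ]

IV. Complete code

1. The underlying hash table

2. The unordered_set implementation

3. The unordered_map implementation

Summary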

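Preface

Differences and similarities between unordered_map/unordered_set and map/set

unordered_map, unordered_set, map, and set are all associative containers in the C++ Standard Library, that is, containers whose elements are accessed by key. map and set are ordered containers, while unordered_map and unordered_set are unordered containers introduced in C++11.

Feature	`map/set` (ordered)	`unordered_map/unordered_set` (unordered)
Underlying structure	Red-black tree	Hash table
Time complexity (lookup, insert, erase)	O(log n)	O(1) on average, O(n) worst case
Element order	Sorted by key	Unordered (determined by the hash function)
Key requirements	Must define `<` or supply a custom comparator	Must define `==` and have a hash function (`std::hash` or a custom one)
Memory usage	Usually more compact	Possible extra overhead from preallocated buckets
Iterator stability	Stable across insert/erase (except for the erased element)	Insertion may invalidate all iterators (on rehash)
Main use	Ordered traversal, range queries	Very fast single-key access where order does not matter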
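
To make the "element order" row concrete, here is a small sketch using the standard containers. The helper names (keys_in_order, make_ordered, make_unordered) are made up for this illustration. The ordered map always iterates in key order, whereas for the unordered map only membership and lookup can be relied on.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

// Collects the keys of an ordered map in iteration order.
std::vector<int> keys_in_order(const std::map<int, std::string>& m)
{
    std::vector<int> keys;
    for (const auto& kv : m)
        keys.push_back(kv.first);
    return keys;
}

// Both helpers insert the same three entries, deliberately out of order.
std::map<int, std::string> make_ordered()
{
    return { {3, "c"}, {1, "a"}, {2, "b"} };
}

std::unordered_map<int, std::string> make_unordered()
{
    return { {3, "c"}, {1, "a"}, {2, "b"} };
}
```

keys_in_order(make_ordered()) yields 1, 2, 3 regardless of insertion order. The iteration order of make_unordered() is unspecified, so code should only depend on find, at, and size.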

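I. Core framework

1. Structure of unordered_map and unordered_set

As noted above, unordered_map, unordered_set, map, and set are all associative containers accessed by key, and in practice unordered_map and unordered_set are used in almost exactly the same way as map and set.

As the figure above shows, unordered_map and unordered_set also mirror map and set structurally.

We already know that map and set are built on a red-black tree, while unordered_map and unordered_set are built on a hash table.

The question now is: can unordered_map and unordered_set share a single underlying implementation, the way map and set do?

The HashData structure

Suppose a single HashTable serves both the key-only layout and the key/value layout. Then unordered_set should pass K to HashTable twice (once as the key type and once as the stored data type), while unordered_map passes K and pair<key, value>.

Can this work? Yes. The element type underlying HashTable, HashNode, can use a template parameter T for its data: if the layer above passes a key, the node stores a key; if it passes a pair<key, value>, the node stores a pair.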

namespace karsen
{
    template<class T>
    struct HashNode
    {
        T _data;
        HashNode* _next = nullptr;

        HashNode(const T& data)
            : _data(data), _next(nullptr)
        {}
    };
}

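KeyOfT

HashNode lets unordered_map and unordered_set store their data in the same container, but it immediately raises another problem. A hash table holds together by using a hash function to map each key to a storage position. Because HashTable is generic, it does not know whether T is a K or a pair<K, V>.

Insert, however, has to convert the key to an integer and take it modulo the bucket count to find that position. For u_set this is easy, because the stored data is the key itself. But what about u_map? How can u_set and u_map share one Insert function?

The solution is for unordered_map and unordered_set to each implement a key-extracting functor (map_KeyOfT and set_KeyOfT in the code below) and pass it to HashTable as KeyOfT. Inside HashTable, the KeyOfT functor extracts the K object from an object of type T. That key is then converted to an integer for the modulo step, and it is also what key comparisons use.

As shown below: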

unordered_set:

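namespace karsen
{
    template<class K, class hash = HashFunc<K>>
    class unordered_set
    {
        // A set element is its own key.
        struct set_KeyOfT
        {
            const K& operator()(const K& key) const
            {
                return key;
            }
        };
    private:
        HashTable<K, K, hash, set_KeyOfT> _set;
    };
}

unordered_map:

namespace karsen
{
    template<class K, class V, class Hash = HashFunc<K>>
    class unordered_map
    {
        // The key of a map element is the pair's first member.
        // The parameter type matches the stored type exactly, so the returned
        // reference never refers to a temporary converted pair.
        struct map_KeyOfT
        {
            const K& operator()(const std::pair<K, V>& kv) const
            {
                return kv.first;
            }
        };
    private:
        karsen::HashTable<K, std::pair<K, V>, Hash, map_KeyOfT> _map;
    };
}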
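
The role KeyOfT plays inside the generic code can be seen in isolation with a minimal sketch. Everything here is hypothetical and simplified for illustration: the keys are fixed to int, and IdentityKey, PairFirstKey, and bucket_index are made-up names rather than part of the implementation in this post.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical key extractors, mirroring set_KeyOfT and map_KeyOfT.
struct IdentityKey
{
    const int& operator()(const int& key) const { return key; }
};

struct PairFirstKey
{
    const int& operator()(const std::pair<int, std::string>& kv) const
    {
        return kv.first;
    }
};

// Generic code only ever sees T; KeyOfT tells it where the key lives.
template<class T, class KeyOfT>
std::size_t bucket_index(const T& data, std::size_t bucket_count)
{
    KeyOfT kot;
    return static_cast<std::size_t>(kot(data)) % bucket_count;
}
```

With 53 buckets, both bucket_index<int, IdentityKey>(60, 53) and the pair version called with {60, "x"} land in bucket 7, because the same key is extracted in both cases.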

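Readers may wonder what the Hash = HashFunc<K> passed to HashTable in the code above is. As explained in the previous post, [C++ Challenge Notes] Under the hood of unordered_map and unordered_set: the hash table (hash buckets) (CSDN blog), HashFunc converts a key to an integer so that the hash function can map the key to a storage position.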
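
One caveat about string keys: the std::string specialization used in this post simply adds up the characters, so any two strings with the same letters in a different order collide. The sketch below contrasts that with a BKDR-style hash. The BKDR variant is a swapped-in alternative and not this post's code; sum_hash and bkdr_hash are names invented here, and the multiplier 131 is a commonly used choice.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// The additive scheme used by HashFunc<std::string> in this post
// (equivalent for ASCII input).
std::size_t sum_hash(const std::string& key)
{
    std::size_t h = 0;
    for (unsigned char ch : key)
        h += ch;
    return h;
}

// BKDR-style alternative: multiply before adding, so that the position
// of each character affects the result.
std::size_t bkdr_hash(const std::string& key)
{
    std::size_t h = 0;
    for (unsigned char ch : key)
        h = h * 131 + ch;
    return h;
}
```

sum_hash gives 294 for both "abc" and "cba", while bkdr_hash separates them. Either function could be wrapped in a functor and passed as the Hash template argument.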

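The framework below is built on top of a hash table. Readers who are not familiar with hash tables can follow the link to the previous post above for the concepts and the implementation. To keep this post short, the hash table logic itself is not repeated here.

2. Framework implementation

Putting the above together, we can sketch the unordered_map and unordered_set framework on top of the hash table.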

#pragma once
#include <algorithm> // std::lower_bound
#include <iostream>
#include <string>
#include <utility>   // std::pair, std::swap
#include <vector>

// Used when Insert grows the table to get the next bucket count.
// Returns the smallest prime in the list that is not less than n.
inline size_t next_prime(size_t n)
{
    // Note: assumes long is at least 32 bits.
    static const int nums = 28;
    static const unsigned long _prime_list[nums] = {
        53, 97, 193, 389, 769,
        1543, 3079, 6151, 12289, 24593,
        49157, 98317, 196613, 393241, 786433,
        1572869, 3145739, 6291469, 12582917, 25165843,
        50331653, 100663319, 201326611, 402653189, 805306457,
        1610612741, 3221225473, 4294967291
    };
    const unsigned long* first = _prime_list;
    const unsigned long* last = _prime_list + nums; // [first, last)
    const unsigned long* pos = std::lower_bound(first, last, n);
    return pos == last ? *(last - 1) : *pos;
}

// Converts a key to an integer.
template<class K>
struct HashFunc
{
    size_t operator()(const K& key) const
    {
        return size_t(key);
    }
};

// Specialization for std::string
template<>
struct HashFunc<std::string>
{
    size_t operator()(const std::string& key) const
    {
        // Local accumulator: a data member would carry over between calls
        // on the same functor and corrupt the hash during a rehash.
        size_t ret = 0;
        for (auto& ch : key)
            ret += ch;
        return ret;
    }
};

namespace karsen
{
    template<class T>
    struct HashNode
    {
        T _data;
        HashNode* _next = nullptr;

        HashNode(const T& data)
            : _data(data), _next(nullptr)
        {}
    };

    // V here plays the role of the T described above.
    // Iterator and End() come from the iterator implemented in Section II.
    template<class K, class V, class HashFunc, class KeyOfT>
    class HashTable
    {
        typedef HashNode<V> Node;
    public:
        HashTable()
            : _tables(next_prime(0)), _cnt(0)
        {}

        // Copy constructor
        HashTable(const HashTable<K, V, HashFunc, KeyOfT>& kv)
        {
            _tables.resize(kv._tables.size(), nullptr);
            _cnt = kv._cnt;
            for (size_t i = 0; i < kv._tables.size(); ++i)
            {
                // Head insertion into the same bucket index
                Node* cur = kv._tables[i];
                while (cur)
                {
                    Node* newNode = new Node(cur->_data);
                    newNode->_next = _tables[i];
                    _tables[i] = newNode;
                    cur = cur->_next;
                }
            }
        }

        // Copy-and-swap: kv is a by-value copy that takes over the old contents.
        HashTable<K, V, HashFunc, KeyOfT>& operator=(HashTable<K, V, HashFunc, KeyOfT> kv)
        {
            std::swap(kv._tables, _tables);
            std::swap(kv._cnt, _cnt);
            return *this;
        }

        ~HashTable()
        {
            for (size_t i = 0; i < _tables.size(); ++i)
            {
                Node* cur = _tables[i];
                while (cur)
                {
                    Node* next = cur->_next;
                    delete cur;
                    cur = next;
                }
                _tables[i] = nullptr;
            }
        }

        std::pair<Iterator, bool> Insert(const V& data)
        {
            HashFunc ht;
            KeyOfT kot;
            Iterator it = Find(kot(data));
            if (it != End())
                return { it, false };

            // Grow once the load factor reaches 1
            if (_cnt == _tables.size())
            {
                std::vector<Node*> newTable(next_prime(_tables.size() + 1));
                for (size_t i = 0; i < _tables.size(); ++i)
                {
                    Node* cur = _tables[i];
                    while (cur)
                    {
                        Node* next = cur->_next;
                        // Head insertion: relink the existing node into its new bucket.
                        // Ignore the first insertion into a bucket and the pattern is easy to see.
                        size_t newPos = ht(kot(cur->_data)) % newTable.size();
                        cur->_next = newTable[newPos];
                        newTable[newPos] = cur;
                        cur = next;
                    }
                }
                _tables.swap(newTable);
            }

            size_t hashi = ht(kot(data)) % _tables.size();
            // Head insertion
            Node* newNode = new Node(data);
            newNode->_next = _tables[hashi];
            _tables[hashi] = newNode;
            _cnt++;
            return { Iterator(newNode, this), true };
        }

        Iterator Find(const K& key)
        {
            HashFunc ht;
            KeyOfT kot;
            size_t pos = ht(key) % _tables.size();
            Node* cur = _tables[pos];
            while (cur)
            {
                // Compare keys, not hash values
                if (kot(cur->_data) == key)
                    return Iterator(cur, this);
                cur = cur->_next;
            }
            return End();
        }

        bool Erase(const K& key)
        {
            HashFunc ht;
            KeyOfT kot;
            size_t goalPos = ht(key) % _tables.size();
            Node* cur = _tables[goalPos];
            Node* prev = nullptr;
            while (cur)
            {
                if (kot(cur->_data) == key)
                {
                    if (!prev)
                        _tables[goalPos] = cur->_next; // erasing the bucket head
                    else
                        prev->_next = cur->_next;
                    delete cur;
                    --_cnt;
                    return true;
                }
                prev = cur;
                cur = cur->_next;
            }
            return false;
        }

    private:
        std::vector<HashNode<V>*> _tables;
        size_t _cnt = 0;
    };
}
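
The growth step in Insert asks for next_prime(_tables.size() + 1), so the bucket count walks up the prime table one entry at a time. The sketch below reproduces that lookup with a shortened copy of the table. The function name next_bucket_count and the six-entry table are assumptions made purely for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Shortened copy of the prime table (first six entries only), for illustration.
std::size_t next_bucket_count(std::size_t n)
{
    static const std::size_t primes[] = {53, 97, 193, 389, 769, 1543};
    const std::size_t* first = primes;
    const std::size_t* last = primes + sizeof(primes) / sizeof(primes[0]);
    // lower_bound finds the first entry that is not less than n.
    const std::size_t* pos = std::lower_bound(first, last, n);
    // Past the end of the table: fall back to the largest entry.
    return pos == last ? *(last - 1) : *pos;
}
```

An empty table therefore starts with 53 buckets (n = 0), and the first rehash requests 54 and receives 97. Passing size + 1 rather than size matters: next_bucket_count(53) would return 53 again and the table would never grow.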

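II. Implementing the iterator

1. Design

Because the hash table is a vector of singly linked lists, its iterator is a forward iterator.

The approach is essentially the same as for the list iterator: wrap a node pointer in a class, then overload operators so that the iterator behaves like a pointer.

begin() returns an iterator built from the first node of the first non-empty bucket; end() returns an iterator whose node pointer is null.

Next, consider how the unordered_map and unordered_set iterators differ. The unordered_set iterator does not allow elements to be modified at all, while the unordered_map iterator allows the value to be modified but not the key. We can therefore change the second template argument that unordered_set passes to HashTable to const K, and change the first member of the pair that unordered_map passes to const K, as shown below:
unordered_set: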

namespace karsen
{
    template<class K, class hash = HashFunc<K>>
    class unordered_set
    {
        // set_KeyOfT as defined earlier
    private:
        HashTable<K, const K, hash, set_KeyOfT> _set;
    };
}

unordered_map:

namespace karsen
{
    template<class K, class V, class Hash = HashFunc<K>>
    class unordered_map
    {
        // map_KeyOfT as defined earlier, now taking const std::pair<const K, V>&
    private:
        karsen::HashTable<K, std::pair<const K, V>, Hash, map_KeyOfT> _map;
    };
}
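
Why does storing pair<const K, V> protect the key? The effect can be checked at compile time with standard type traits. This is only a sketch: MapEntry, key_is_writable, and value_is_writable are illustrative names, with int and std::string standing in for K and V.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// The element type a map-like container stores once the key is made const.
using MapEntry = std::pair<const int, std::string>;

// decltype on a member access yields the member's declared type,
// so these ask: "can I assign to entry.first / entry.second?"
constexpr bool key_is_writable =
    std::is_assignable<decltype(std::declval<MapEntry&>().first)&, int>::value;
constexpr bool value_is_writable =
    std::is_assignable<decltype(std::declval<MapEntry&>().second)&, std::string>::value;

static_assert(!key_is_writable, "the key cannot be modified through the element");
static_assert(value_is_writable, "the mapped value can still be modified");
```

Because the restriction lives in the stored type itself, even a non-const iterator that hands out MapEntry& cannot be used to change a key and silently break the bucket placement.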

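operator++

The trickiest part is operator++, because there are two cases:

① If the current node has a successor in its bucket, simply move the node pointer to that successor.

② If the current bucket is exhausted, find the next non-empty bucket.

The difficulty is that case ② needs access to the hash table itself, so the iterator must hold a pointer to the hash table object. Once the current bucket is exhausted, the iterator uses the key to compute the current bucket's index and then scans forward for the next non-empty bucket.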
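
The two cases can be sketched independently of the templates. This is a simplified stand-in and not the iterator from this post: Node holds a plain int, next_node is a made-up name, and the current bucket index is passed in as a parameter, whereas the real iterator recomputes it from the key.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal stand-in for HashNode (int payload only).
struct Node
{
    int value;
    Node* next;
};

// Returns the node that follows `cur`, or nullptr when iteration is finished.
// `bucket` is the index of the bucket that `cur` lives in.
Node* next_node(const std::vector<Node*>& buckets, std::size_t bucket, Node* cur)
{
    if (cur->next)                      // case 1: successor in the same bucket
        return cur->next;
    for (std::size_t i = bucket + 1; i < buckets.size(); ++i)
        if (buckets[i])                 // case 2: first later non-empty bucket
            return buckets[i];
    return nullptr;                     // nothing left: this is end()
}
```

For example, with buckets {a -> b, empty, empty, c}, stepping from a gives b, stepping from b skips the two empty buckets and gives c, and stepping from c gives nullptr.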

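2. Implementation

With the cases worked out, we can implement the iterator directly. The code is commented to explain what each part does.

About class Ref and class Ptr: treating V as T, Ref is T& (or const T&) and Ptr is T* (or const T*). This lets a single iterator class serve as both the ordinary iterator and the const_iterator, simply by instantiating it with different arguments.

For example: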

typedef HashTableIterator<K, V, V&, V*, HashFunc, KeyOfT> Iterator;
typedef HashTableIterator<K, V, const V&, const V*, HashFunc, KeyOfT> ConstIterator;

Iterator implementation

namespace karsen
{
    // The iterator needs to access HashTable, so forward-declare it here.
    template<class K, class V, class HashFunc, class KeyOfT>
    class HashTable;

    // V here plays the role of the T described above.
    template<class K, class V, class Ref, class Ptr, class HashFunc, class KeyOfT>
    struct HashTableIterator
    {
        typedef HashNode<V> Node;
        typedef HashTableIterator<K, V, Ref, Ptr, HashFunc, KeyOfT> Self;
        // HT is the hash table type
        typedef HashTable<K, V, HashFunc, KeyOfT> HT;

        Node* _node = nullptr;
        // Pointer back to the owning hash table
        const HT* _ht = nullptr;

        // Takes const HT* so that const member functions of HashTable can pass `this`.
        HashTableIterator(Node* node, const HT* ht)
            : _node(node), _ht(ht)
        {}

        Self& operator++()
        {
            if (_node->_next)
            {
                _node = _node->_next;
                return *this;
            }

            // This bucket is exhausted: look for the next bucket that has data.
            HashFunc hf;
            KeyOfT kot;
            size_t hashi = hf(kot(_node->_data)) % _ht->_tables.size();
            ++hashi;
            while (hashi < _ht->_tables.size())
            {
                _node = _ht->_tables[hashi];
                if (_node)
                    return *this;
                ++hashi;
            }

            // Every remaining bucket is empty: become end(), i.e. nullptr.
            _node = nullptr;
            return *this;
        }

        Ref operator*() const
        {
            return _node->_data;
        }

        Ptr operator->() const
        {
            return &(_node->_data);
        }

        bool operator==(const Self& it) const
        {
            return it._node == _node;
        }

        bool operator!=(const Self& it) const
        {
            return it._node != _node;
        }
    };
}

With the iterator in place, the core framework above can now gain Begin, End, and related functions.

namespace karsen
{
    // V here plays the role of the T described above.
    template<class K, class V, class HashFunc, class KeyOfT>
    class HashTable
    {
        typedef HashNode<V> Node;

        // operator++ of the iterator reads the private member _tables, so the
        // iterator must be a friend. The parameter names differ from the
        // enclosing template's because those names cannot be redeclared here.
        template<class K2, class V2, class Ref, class Ptr, class HashFunc2, class KeyOfT2>
        friend struct HashTableIterator;

    public:
        typedef HashTableIterator<K, V, V&, V*, HashFunc, KeyOfT> Iterator;
        typedef HashTableIterator<K, V, const V&, const V*, HashFunc, KeyOfT> ConstIterator;

        // Returns the first node of the first non-empty bucket, or End() if there is none.
        Iterator Begin()
        {
            if (_cnt == 0)
                return End();
            for (size_t hashPos = 0; hashPos < _tables.size(); ++hashPos)
            {
                if (_tables[hashPos])
                    return Iterator(_tables[hashPos], this);
            }
            return End();
        }

        Iterator End()
        {
            return Iterator(nullptr, this);
        }

        ConstIterator Begin() const
        {
            if (_cnt == 0)
                return End();
            for (size_t hashPos = 0; hashPos < _tables.size(); ++hashPos)
            {
                if (_tables[hashPos])
                    return ConstIterator(_tables[hashPos], this);
            }
            return End();
        }

        ConstIterator End() const
        {
            return ConstIterator(nullptr, this);
        }

        // Constructors, Insert, Find, Erase, and the data members are unchanged.
    };
}

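III. Overloading [ ] for unordered_map

1. Why insert returns a pair< >

In the standard library, unordered_map's insert, like map's insert, returns a pair<iterator, bool> object.

Its first member, first, is an iterator to the element's position in the hash table, whether the insertion succeeded or failed because the key already existed. Its second member, second, is a bool that reports whether the insertion took place: true on success, false on failure.

It is worth noting that operator[] is built precisely on this return value: whether the key was just inserted or was already present, first points at the element it needs.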
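
The same contract can be observed with the standard std::unordered_map. The helper try_insert below is a name invented for this sketch; it just unpacks the pair that insert returns.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>

// Tries to insert (key, value). Reports whether the map actually changed
// and which value is stored under the key afterwards.
std::pair<bool, int> try_insert(std::unordered_map<std::string, int>& m,
                                const std::string& key, int value)
{
    auto result = m.insert({key, value});   // pair<iterator, bool>
    return { result.second, result.first->second };
}
```

Inserting ("apple", 1) into an empty map reports true and 1. A second attempt with ("apple", 99) reports false and still 1: the existing element is left untouched, and the iterator points at it.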

2. Implementing operator[ ]

Note that only unordered_map supports modification through [ ], so the overloaded operator[] lives in our unordered_map class.

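namespace karsen
{
    template<class K, class V, class Hash = HashFunc<K>>
    class unordered_map
    {
    public:
        // iterator, insert, and map_KeyOfT are the members shown in the complete code.
        // Inserts { key, V() } if key is absent, then returns a reference to the
        // value stored under key, whether it was just inserted or already there.
        V& operator[](const K& key)
        {
            std::pair<iterator, bool> ret = insert({ key, V() });
            return ret.first->second;
        }
    private:
        karsen::HashTable<K, std::pair<const K, V>, Hash, map_KeyOfT> _map;
    };
}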
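
A typical use of this pattern is counting. The sketch below expresses the same "insert a default, then return a reference" idea against the standard std::unordered_map, so it can be tried without the custom classes. The names slot and count_words are made up for this example.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Same idea as the operator[] above: insert a default value if the key is
// missing, then return a reference to the stored value.
int& slot(std::unordered_map<std::string, int>& m, const std::string& key)
{
    return m.insert({key, int()}).first->second;
}

// Counts how many times each word occurs.
std::unordered_map<std::string, int> count_words(const std::vector<std::string>& words)
{
    std::unordered_map<std::string, int> counts;
    for (const auto& w : words)
        ++slot(counts, w);   // the first occurrence starts from int() == 0
    return counts;
}
```

Since a failed insert still yields an iterator to the existing element, the increment works the same way for new and repeated words, with no separate find step.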

IV. Complete code

1. The underlying hash table

#pragma once
#include <algorithm> // std::lower_bound
#include <iostream>
#include <string>
#include <utility>   // std::pair, std::swap
#include <vector>

// Returns the smallest prime in the list that is not less than n.
inline size_t next_prime(size_t n)
{
    // Note: assumes long is at least 32 bits.
    static const int nums = 28;
    static const unsigned long _prime_list[nums] = {
        53, 97, 193, 389, 769,
        1543, 3079, 6151, 12289, 24593,
        49157, 98317, 196613, 393241, 786433,
        1572869, 3145739, 6291469, 12582917, 25165843,
        50331653, 100663319, 201326611, 402653189, 805306457,
        1610612741, 3221225473, 4294967291
    };
    const unsigned long* first = _prime_list;
    const unsigned long* last = _prime_list + nums; // [first, last)
    const unsigned long* pos = std::lower_bound(first, last, n);
    return pos == last ? *(last - 1) : *pos;
}

// Converts a key to an integer.
template<class K>
struct HashFunc
{
    size_t operator()(const K& key) const
    {
        return size_t(key);
    }
};

// Specialization for std::string
template<>
struct HashFunc<std::string>
{
    size_t operator()(const std::string& key) const
    {
        // Local accumulator, so repeated calls on one functor do not add up.
        size_t ret = 0;
        for (auto& ch : key)
            ret += ch;
        return ret;
    }
};

namespace karsen
{
    template<class T>
    struct HashNode
    {
        T _data;
        HashNode* _next = nullptr;

        HashNode(const T& data)
            : _data(data), _next(nullptr)
        {}
    };

    // Forward declaration
    template<class K, class V, class HashFunc, class KeyOfT>
    class HashTable;

    // V here plays the role of the T described above.
    template<class K, class V, class Ref, class Ptr, class HashFunc, class KeyOfT>
    struct HashTableIterator
    {
        typedef HashNode<V> Node;
        typedef HashTableIterator<K, V, Ref, Ptr, HashFunc, KeyOfT> Self;
        typedef HashTable<K, V, HashFunc, KeyOfT> HT;

        Node* _node = nullptr;
        const HT* _ht = nullptr;

        // Takes const HT* so that const member functions of HashTable can pass `this`.
        HashTableIterator(Node* node, const HT* ht)
            : _node(node), _ht(ht)
        {}

        Self& operator++()
        {
            if (_node->_next)
            {
                _node = _node->_next;
                return *this;
            }

            // This bucket is exhausted: look for the next bucket that has data.
            HashFunc hf;
            KeyOfT kot;
            size_t hashi = hf(kot(_node->_data)) % _ht->_tables.size();
            ++hashi;
            while (hashi < _ht->_tables.size())
            {
                _node = _ht->_tables[hashi];
                if (_node)
                    return *this;
                ++hashi;
            }

            // Every remaining bucket is empty: become end(), i.e. nullptr.
            _node = nullptr;
            return *this;
        }

        Ref operator*() const { return _node->_data; }
        Ptr operator->() const { return &(_node->_data); }
        bool operator==(const Self& it) const { return it._node == _node; }
        bool operator!=(const Self& it) const { return it._node != _node; }
    };

    // V here plays the role of the T described above.
    template<class K, class V, class HashFunc, class KeyOfT>
    class HashTable
    {
        typedef HashNode<V> Node;

        // operator++ of the iterator reads the private member _tables.
        // The parameter names differ from the enclosing template's because
        // those names cannot be redeclared here.
        template<class K2, class V2, class Ref, class Ptr, class HashFunc2, class KeyOfT2>
        friend struct HashTableIterator;

    public:
        typedef HashTableIterator<K, V, V&, V*, HashFunc, KeyOfT> Iterator;
        typedef HashTableIterator<K, V, const V&, const V*, HashFunc, KeyOfT> ConstIterator;

        Iterator Begin()
        {
            if (_cnt == 0)
                return End();
            for (size_t hashPos = 0; hashPos < _tables.size(); ++hashPos)
            {
                if (_tables[hashPos])
                    return Iterator(_tables[hashPos], this);
            }
            return End();
        }

        Iterator End()
        {
            return Iterator(nullptr, this);
        }

        ConstIterator Begin() const
        {
            if (_cnt == 0)
                return End();
            for (size_t hashPos = 0; hashPos < _tables.size(); ++hashPos)
            {
                if (_tables[hashPos])
                    return ConstIterator(_tables[hashPos], this);
            }
            return End();
        }

        ConstIterator End() const
        {
            return ConstIterator(nullptr, this);
        }

        HashTable()
            : _tables(next_prime(0)), _cnt(0)
        {}

        HashTable(const HashTable<K, V, HashFunc, KeyOfT>& kv)
        {
            _tables.resize(kv._tables.size(), nullptr);
            _cnt = kv._cnt;
            for (size_t i = 0; i < kv._tables.size(); ++i)
            {
                // Head insertion into the same bucket index
                Node* cur = kv._tables[i];
                while (cur)
                {
                    Node* newNode = new Node(cur->_data);
                    newNode->_next = _tables[i];
                    _tables[i] = newNode;
                    cur = cur->_next;
                }
            }
        }

        // Copy-and-swap: kv is a by-value copy that takes over the old contents.
        HashTable<K, V, HashFunc, KeyOfT>& operator=(HashTable<K, V, HashFunc, KeyOfT> kv)
        {
            std::swap(kv._tables, _tables);
            std::swap(kv._cnt, _cnt);
            return *this;
        }

        ~HashTable()
        {
            for (size_t i = 0; i < _tables.size(); ++i)
            {
                Node* cur = _tables[i];
                while (cur)
                {
                    Node* next = cur->_next;
                    delete cur;
                    cur = next;
                }
                _tables[i] = nullptr;
            }
        }

        std::pair<Iterator, bool> Insert(const V& data)
        {
            HashFunc ht;
            KeyOfT kot;
            Iterator it = Find(kot(data));
            if (it != End())
                return { it, false };

            // Grow once the load factor reaches 1
            if (_cnt == _tables.size())
            {
                std::vector<Node*> newTable(next_prime(_tables.size() + 1));
                for (size_t i = 0; i < _tables.size(); ++i)
                {
                    Node* cur = _tables[i];
                    while (cur)
                    {
                        Node* next = cur->_next;
                        // Head insertion: relink the existing node into its new bucket.
                        size_t newPos = ht(kot(cur->_data)) % newTable.size();
                        cur->_next = newTable[newPos];
                        newTable[newPos] = cur;
                        cur = next;
                    }
                }
                _tables.swap(newTable);
            }

            size_t hashi = ht(kot(data)) % _tables.size();
            // Head insertion
            Node* newNode = new Node(data);
            newNode->_next = _tables[hashi];
            _tables[hashi] = newNode;
            _cnt++;
            return { Iterator(newNode, this), true };
        }

        Iterator Find(const K& key)
        {
            HashFunc ht;
            KeyOfT kot;
            size_t pos = ht(key) % _tables.size();
            Node* cur = _tables[pos];
            while (cur)
            {
                // Compare keys, not hash values
                if (kot(cur->_data) == key)
                    return Iterator(cur, this);
                cur = cur->_next;
            }
            return End();
        }

        bool Erase(const K& key)
        {
            HashFunc ht;
            KeyOfT kot;
            size_t goalPos = ht(key) % _tables.size();
            Node* cur = _tables[goalPos];
            Node* prev = nullptr;
            while (cur)
            {
                if (kot(cur->_data) == key)
                {
                    if (!prev)
                        _tables[goalPos] = cur->_next; // erasing the bucket head
                    else
                        prev->_next = cur->_next;
                    delete cur;
                    --_cnt;
                    return true;
                }
                prev = cur;
                cur = cur->_next;
            }
            return false;
        }

    private:
        std::vector<HashNode<V>*> _tables;
        size_t _cnt = 0;
    };
}

2. The unordered_set implementation

#pragma once
#include "HashTable.h"

namespace karsen
{
    template<class K, class hash = HashFunc<K>>
    class unordered_set
    {
        // A set element is its own key.
        struct set_KeyOfT
        {
            const K& operator()(const K& key) const
            {
                return key;
            }
        };
    public:
        typedef typename karsen::HashTable<K, const K, hash, set_KeyOfT>::Iterator iterator;
        typedef typename karsen::HashTable<K, const K, hash, set_KeyOfT>::ConstIterator const_iterator;

        iterator begin() { return _set.Begin(); }
        iterator end() { return _set.End(); }
        const_iterator begin() const { return _set.Begin(); }
        const_iterator end() const { return _set.End(); }

        std::pair<iterator, bool> insert(const K& key)
        {
            return _set.Insert(key);
        }

        iterator find(const K& key)
        {
            return _set.Find(key);
        }

        bool erase(const K& key)
        {
            return _set.Erase(key);
        }

    private:
        HashTable<K, const K, hash, set_KeyOfT> _set;
    };
}

3. The unordered_map implementation

#pragma once
#include "HashTable.h"

namespace karsen
{
    template<class K, class V, class Hash = HashFunc<K>>
    class unordered_map
    {
        // The key of a map element is the pair's first member.
        struct map_KeyOfT
        {
            const K& operator()(const std::pair<const K, V>& kv) const
            {
                return kv.first;
            }
        };
    public:
        typedef typename karsen::HashTable<K, std::pair<const K, V>, Hash, map_KeyOfT>::Iterator iterator;
        typedef typename karsen::HashTable<K, std::pair<const K, V>, Hash, map_KeyOfT>::ConstIterator const_iterator;

        iterator begin() { return _map.Begin(); }
        iterator end() { return _map.End(); }
        const_iterator begin() const { return _map.Begin(); }
        const_iterator end() const { return _map.End(); }

        std::pair<iterator, bool> insert(const std::pair<const K, V>& kv)
        {
            return _map.Insert(kv);
        }

        // Inserts { key, V() } if key is absent, then returns a reference to
        // the value stored under key.
        V& operator[](const K& key)
        {
            std::pair<iterator, bool> ret = insert({ key, V() });
            return ret.first->second;
        }

        iterator find(const K& key)
        {
            return _map.Find(key);
        }

        bool erase(const K& key)
        {
            return _map.Erase(key);
        }

    private:
        karsen::HashTable<K, std::pair<const K, V>, Hash, map_KeyOfT> _map;
    };
}

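Summary

This post first summarized the differences and similarities between unordered_map/unordered_set and map/set. It then analyzed the structure of their shared foundation, the hash table, focusing on the HashData structure and on how KeyOfT works. Next, it implemented the hash table's iterator, explained how operator++ works, and showed how operator[ ] in unordered_map reuses insert. Finally, it collected the complete code for HashTable, unordered_map, and unordered_set.

I hope this post is helpful.

If you found it useful, a like would be appreciated~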

http://www.dtcms.com/a/572963.html