Exploring node importance in networkx graphs: k_core and betweenness_centrality
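k-core finds the subgraph that satisfies a given core number k, and betweenness_centrality computes each node's betweenness centrality. Both can be used to identify the nodes that sit in central, important positions in a graph.
Using networkx, a graph library for Python, this post walks through example computations of k-core and betweenness_centrality and then looks at how they are applied in a RAG graph.
The example code was collected from online resources.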
1 networkx
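networkx provides simple methods for creating nodes and edges, and an intuitive way to access information about the network.
It covers everything from basic network structures and analysis tools to complex network algorithms, and supports several graph types, including undirected graphs, directed graphs, and multigraphs.
In a Python environment, networkx can be installed as follows: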
pip install networkx -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install matplotlib -i https://pypi.tuna.tsinghua.edu.cn/simple
matplotlib is used mainly for plotting and display.
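As a quick check that the installation works, here is a minimal sketch of the three graph types mentioned above. The node names and edges are made up purely for illustration.

```python
import networkx as nx

# Undirected graph: nodes are created implicitly when an edge is added
G = nx.Graph()
G.add_node("a")
G.add_edge("a", "b", weight=2)
G.add_edges_from([("b", "c"), ("c", "a")])

# Directed graph built directly from an edge list
D = nx.DiGraph([(1, 2), (2, 3)])

# Multigraph: parallel edges between the same pair of nodes are kept
M = nx.MultiGraph()
M.add_edge(1, 2)
M.add_edge(1, 2)

print(G.number_of_nodes(), G.number_of_edges())
print(dict(G.degree()))
print(list(D.successors(2)))
print(M.number_of_edges(1, 2))
```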
2 k_core
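k-core is a subgraph-mining algorithm. It is typically used to find the subgraph of a graph that satisfies a given core number k: the maximal subgraph in which every node is connected to at least k other nodes of that same subgraph. The larger k is, the smaller the resulting subgraph and the more tightly connected its nodes. In a sense, the subgraphs obtained this way play an important role in the original graph, for example when tracing a graph's origin and evolution, or when identifying intermediary nodes.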
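The definition can be made concrete with a small peeling procedure: keep deleting nodes whose degree is below k until none are left. The sketch below is an illustration only, not how networkx implements it; the helper name k_core_by_peeling is hypothetical. It compares the result with nx.k_core on the karate club graph.

```python
import networkx as nx


def k_core_by_peeling(graph, k):
    """Repeatedly remove nodes with degree < k; what remains is the k-core."""
    core = graph.copy()
    while True:
        low_degree = [node for node, degree in core.degree() if degree < k]
        if not low_degree:
            return core
        # Removing nodes can push their neighbors below k, hence the loop
        core.remove_nodes_from(low_degree)


G = nx.karate_club_graph()
peeled = k_core_by_peeling(G, 3)
reference = nx.k_core(G, 3)

print(sorted(peeled.nodes()) == sorted(reference.nodes()))
print(min(degree for _, degree in peeled.degree()))
```

A single pass is not enough, because deleting one low-degree node lowers the degree of its neighbors; the loop stops only when every remaining node has at least k neighbors inside the remaining subgraph.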
The following shows the 3-core structure
import networkx as nx
import networkx.algorithms as algos
import matplotlib.pyplot as plt

# Build Zachary's karate club graph
G = nx.karate_club_graph()
# nx.draw(G, with_labels=True)

# Core number of every node
print(algos.core_number(G))
# Onion layer of every node
print(algos.onion_layers(G))

# Draw the 3-core
nx.draw(algos.k_core(G, 3), with_labels=True)
plt.show()
The following shows the 4-core structure
# Draw the 4-core
nx.draw(algos.k_core(G, 4), with_labels=True)
plt.show()
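To see how the subgraph shrinks as k grows without relying on the plots, the sizes of all k-cores can be printed. This sketch also checks the relationship between core_number and k_core: the k-core consists of exactly the nodes whose core number is at least k. The variable names are illustrative.

```python
import networkx as nx

G = nx.karate_club_graph()

# core[node] is the largest k for which the node still belongs to the k-core
core = nx.core_number(G)
max_k = max(core.values())

sizes = {}
for k in range(1, max_k + 1):
    sub = nx.k_core(G, k)
    # The k-core is the set of nodes whose core number is >= k
    assert set(sub.nodes()) == {node for node, c in core.items() if c >= k}
    sizes[k] = sub.number_of_nodes()
    print(f"{k}-core: {sub.number_of_nodes()} nodes, {sub.number_of_edges()} edges")
```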
3 betweenness_centrality
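Betweenness centrality (betweenness_centrality) measures the fraction of shortest paths between pairs of other nodes that pass through a given node. It is typically high for nodes that connect different subgraphs.
The more shortest paths pass through a node, the higher its betweenness centrality, and the more critical and important that node is to the graph.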
import networkx as nx
G = nx.Graph()
G.add_edges_from([[0, 1], [0, 2], [1, 2], [2,3], [3,4], [4,5], [3,5]])
nx.draw(G, with_labels=True)
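The resulting graph consists of two triangles, 0-1-2 and 3-4-5, joined by the edge 2-3.
Compute the betweenness_centrality of each node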
nx.betweenness_centrality(G, k=None)
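The output is shown below. Only nodes 2 and 3 have a nonzero betweenness value, because they are the only nodes that link the two subgraphs; they are indeed the most important nodes in this graph.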
{0: 0.0, 1: 0.0, 2: 0.6000000000000001, 3: 0.6000000000000001, 4: 0.0, 5: 0.0}
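The value 0.6 can be reproduced by hand. Six node pairs (0 or 1 on one side, 3, 4 or 5 on the other) have their only shortest path running through node 2, out of the 10 pairs that do not contain node 2, so 6/10 = 0.6. The brute-force sketch below applies the same counting to every node; the function name brute_force_betweenness is hypothetical, and the approach is only practical for small undirected graphs.

```python
from itertools import combinations

import networkx as nx


def brute_force_betweenness(graph):
    """Normalized betweenness for a small undirected graph, by enumeration."""
    nodes = list(graph.nodes())
    n = len(nodes)
    score = dict.fromkeys(nodes, 0.0)
    for s, t in combinations(nodes, 2):
        if not nx.has_path(graph, s, t):
            continue
        paths = list(nx.all_shortest_paths(graph, s, t))
        for path in paths:
            # Interior nodes share the credit when several shortest paths exist
            for v in path[1:-1]:
                score[v] += 1.0 / len(paths)
    # Number of unordered pairs that do not contain the node itself
    pairs = (n - 1) * (n - 2) / 2
    return {v: value / pairs for v, value in score.items()}


G = nx.Graph()
G.add_edges_from([(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (4, 5), (3, 5)])

manual = brute_force_betweenness(G)
reference = nx.betweenness_centrality(G)
print(manual)
print(all(abs(manual[v] - reference[v]) < 1e-9 for v in G.nodes()))
```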
4 NodeRAG example
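NodeRAG uses a heterogeneous graph to parse and store documents whose content is densely interrelated, which improves the efficiency of retrieval augmentation over text.
NodeRAG uses k_core and betweenness_centrality to select the important nodes of the graph when building the heterogeneous graph. The example code follows.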
import networkx as nx
import numpy as np
import math
import asyncio
import os
from sortedcontainers import SortedDict
from rich.console import Console

from ...storage import (
    Mapper,
    storage
)
from ..component import Attribute
from ...config import NodeConfig
from ...logging import info_timer


class NodeImportance:

    def __init__(self, graph: nx.Graph, console: Console):
        self.G = graph
        self.important_nodes = []
        self.console = console

    def K_core(self, k: int | None = None):
        if k is None:
            k = self.defult_k()
        self.k_subgraph = nx.core.k_core(self.G, k=k)
        for nodes in self.k_subgraph.nodes():
            # Node attributes are read via G.nodes[n]; G[n] would return the neighbors of n
            if self.G.nodes[nodes]['type'] == 'entity' and self.G.nodes[nodes]['weight'] > 1:
                self.important_nodes.append(nodes)

    def avarege_degree(self):
        print(f"self.G.number_of_nodes(): {self.G.number_of_nodes()}")
        print(f"dict(self.G.degree()).values()): {dict(self.G.degree()).values()}")
        average_degree = sum(dict(self.G.degree()).values()) / self.G.number_of_nodes()
        return average_degree

    def defult_k(self):
        # Default k: ln(number of nodes) * sqrt(average degree), rounded
        k = round(np.log(self.G.number_of_nodes()) * self.avarege_degree() ** (1 / 2))
        return k

    def betweenness_centrality(self):
        # k=10 estimates betweenness from 10 sampled nodes instead of all nodes
        self.betweenness = nx.betweenness_centrality(self.G, k=10)
        average_betweenness = sum(self.betweenness.values()) / len(self.betweenness)
        scale = round(math.log10(len(self.betweenness)))
        for node in self.betweenness:
            # Keep nodes whose betweenness is well above the average
            if self.betweenness[node] > average_betweenness * scale:
                if self.G.nodes[node]['type'] == 'entity' and self.G.nodes[node]['weight'] > 1:
                    self.important_nodes.append(node)

    def main(self):
        self.K_core()
        self.console.print('[bold green]K_core done[/bold green]')
        self.betweenness_centrality()
        self.console.print('[bold green]Betweenness done[/bold green]')
        # Deduplicate nodes selected by both criteria
        self.important_nodes = list(set(self.important_nodes))
        return self.important_nodes
https://github.com/Terry-Xu-666/NodeRAG/blob/main/NodeRAG/build/pipeline/attribute_generation.py
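The class above depends on NodeRAG's own modules, so it cannot be run in isolation. The following is a self-contained sketch of the same two-step selection on a toy graph, not NodeRAG's actual code. The function names default_k, is_heavy_entity and important_nodes are hypothetical, and the 'type' and 'weight' values assigned to the karate club nodes are made up for illustration. Exact betweenness is used by default (samples=None) because the toy graph is small.

```python
import math

import networkx as nx


def default_k(graph):
    """round(ln(N) * sqrt(average degree)), mirroring the excerpt above."""
    n = graph.number_of_nodes()
    average_degree = sum(degree for _, degree in graph.degree()) / n
    return round(math.log(n) * math.sqrt(average_degree))


def is_heavy_entity(graph, node):
    data = graph.nodes[node]
    return data.get("type") == "entity" and data.get("weight", 0) > 1


def important_nodes(graph, k=None, samples=None, seed=0):
    if k is None:
        k = default_k(graph)
    # Step 1: entity nodes inside the k-core
    picked = {n for n in nx.k_core(graph, k).nodes() if is_heavy_entity(graph, n)}
    # Step 2: entity nodes whose betweenness is well above the average
    bc = nx.betweenness_centrality(graph, k=samples, seed=seed)
    threshold = sum(bc.values()) / len(bc) * round(math.log10(len(bc)))
    picked |= {n for n, v in bc.items() if v > threshold and is_heavy_entity(graph, n)}
    return picked


# Toy data: every node is an "entity" and its weight is its degree (assumption)
G = nx.karate_club_graph()
for node in G.nodes():
    G.nodes[node]["type"] = "entity"
    G.nodes[node]["weight"] = G.degree(node)

k_default = default_k(G)
result = important_nodes(G, k=4)
print(k_default, nx.k_core(G, k_default).number_of_nodes())
print(sorted(result))
```

On this small graph the default k is larger than any core that actually exists, so the default k-core is empty and only the betweenness step would contribute. The formula is presumably tuned for much larger graphs, which is why the sketch passes k=4 explicitly.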
---
networkx
https://networkx.org/
Installing NodeRAG with conda on Linux: an example
https://blog.csdn.net/liliang199/article/details/151101894
Analyzing the NodeRAG graph construction process through its source code
https://blog.csdn.net/liliang199/article/details/151217412
Computing edge importance with networkx: edge betweenness centrality (edge_betweenness)
https://blog.csdn.net/weixin_39925939/article/details/121767972
Notes on the Python graph library Networkx - Node and Centrality
https://zhuanlan.zhihu.com/p/145601101
NodeRAG
https://github.com/Terry-Xu-666/NodeRAG/tree/main
Getting started with Networkx: graph analysis with k-core
https://www.jianshu.com/p/a1f7f8d6c6d0
Graph algorithms: k-Core
https://blog.csdn.net/ningyanggege/article/details/117708034