
SecretFlow open-source privacy-preserving computing: vertical federated SecureBoost with a measured 10x performance improvement (now open source)

news 2025/9/19 13:23:06

Open the link below and give the community a Star to light the way forward. Every Star motivates the community's contributors.

GitHub organization: https://github.com/secretflow

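SecretFlow recently open-sourced a high-performance implementation of SecureBoost, a vertical federated learning algorithm.

Compared with SS-XGB, which is built on secret sharing, SecureBoost delivers better performance. However, because it is not an MPC algorithm, its security level is lower than that of SS-XGB.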

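SecretFlow SecureBoost (hereafter SecretFlow SGB) builds on SecretFlow's security foundation and its distributed architecture for multi-party joint computation, which greatly improves the efficiency and flexibility of computation on encrypted data. With a simple configuration change, SecretFlow SGB can switch between homomorphic encryption schemes such as Paillier and OU (Okamoto-Uchiyama), so that security and computational efficiency can be balanced for different scenarios.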
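As a rough illustration of the "simple configuration change" described above, the sketch below starts from the HEU configuration dictionary used in this article's benchmark script and changes only the `schema` field. The helper name `make_heu_config` is hypothetical, and the schema string `'ou'` is an assumption about how HEU names the Okamoto-Uchiyama scheme; check the HEU documentation for the identifiers your version supports.

```python
import copy

# Baseline copied from the benchmark script in this article (three-party layout).
BASE_HEU_CONFIG = {
    'sk_keeper': {'party': 'alice'},
    'evaluators': [{'party': 'bob'}, {'party': 'carol'}],
    'mode': 'PHEU',
    'he_parameters': {
        'schema': 'paillier',
        'key_pair': {'generate': {'bit_size': 2048}},
    },
    'encoding': {
        'cleartext_type': 'DT_I32',
        'encoder': "IntegerEncoder",
        'encoder_args': {"scale": 1},
    },
}


def make_heu_config(schema, bit_size=2048):
    """Hypothetical helper: return a copy of the baseline with another HE schema."""
    config = copy.deepcopy(BASE_HEU_CONFIG)  # never mutate the shared baseline
    config['he_parameters']['schema'] = schema
    config['he_parameters']['key_pair']['generate']['bit_size'] = bit_size
    return config


paillier_config = make_heu_config('paillier')
ou_config = make_heu_config('ou')  # 'ou' is an assumed schema name
```

Either dictionary would then be passed to `sf.HEU(...)` in the same way the benchmark script passes `heu_config`.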

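This article describes the test environment, steps, and data used to benchmark SecretFlow SGB, so that you can see how the protocol is used, how it performs, and whether it fits your business needs. Let's take a closer look at what SecretFlow SGB can do.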

Test method and steps

1. Test machine

Python: 3.8
pip: >= 19.3
OS: CentOS 7
CPU/Memory: recommended minimum is 8 cores and 16 GB
Disk: 500 GB
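The requirements above can be checked before starting with a small preflight script. This is only a sketch: the function name `check_environment` is hypothetical, the memory probe relies on Linux `sysconf` names, and it assumes the 500 GB figure refers to total disk size rather than free space.

```python
import os
import shutil
import sys


def check_environment(min_cpus=8, min_mem_gb=16, min_disk_gb=500, path="/"):
    """Return a list of warning strings; an empty list means every check passed."""
    warnings = []

    if sys.version_info[:2] != (3, 8):
        warnings.append(
            f"Python {sys.version_info.major}.{sys.version_info.minor} detected; "
            "the test environment in this article uses 3.8"
        )

    cpus = os.cpu_count() or 0
    if cpus < min_cpus:
        warnings.append(f"{cpus} CPU cores found; recommended minimum is {min_cpus}")

    try:
        # Linux-specific probe. Reported memory is usually slightly below the
        # nominal size, so treat a near miss as a hint rather than a failure.
        mem_gb = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3
        if mem_gb < min_mem_gb:
            warnings.append(f"{mem_gb:.1f} GB memory found; recommended minimum is {min_mem_gb}")
    except (AttributeError, ValueError, OSError):
        warnings.append("could not determine physical memory on this platform")

    disk_gb = shutil.disk_usage(path).total / 1024**3
    if disk_gb < min_disk_gb:
        warnings.append(f"{disk_gb:.0f} GB disk found at {path}; the article lists {min_disk_gb} GB")

    return warnings


if __name__ == "__main__":
    for message in check_environment():
        print("WARNING:", message)
```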

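2. Install conda

Use conda to manage the Python environment. If conda is not installed on the machine, install it first.

# If wget is missing: sudo yum install wget (CentOS) or sudo apt-get install wget (Debian/Ubuntu)
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh

# Install
bash Miniconda3-latest-Linux-x86_64.sh

# Keep pressing Enter, then type yes
please answer 'yes' or 'no':
>>> yes

# Choose the install path; a leading dot in the name makes the directory hidden
Miniconda3 will now be installed into this location:
>>> ~/.miniconda3

# Add the configuration to ~/.bashrc
Do you wish the installer to initialize Miniconda3 by running conda init? [yes|no]
[no] >>> yes

# Reload the configuration file, or reboot the machine
source ~/.bashrc

# Check the installation; if a version number is printed, it succeeded
conda --version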

3. Install secretflow

conda create -n sf-benchmark python=3.8
conda activate sf-benchmark
pip install -U secretflow
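To confirm that the package landed in the active environment, a quick check like the following can be run inside `sf-benchmark`. It is a minimal sketch that uses only the standard library (`importlib.metadata`, available from Python 3.8); the helper name `installed_version` is hypothetical.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("secretflow")
print("secretflow version:", version if version else "not installed")
```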

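4. Data requirements

Two-party data size:

alice: 1,000,000 rows, 50 features
bob: 1,000,000 rows, 50 features

Three-party data size:

alice: 1,000,000 rows, 34 features
bob: 1,000,000 rows, 33 features
carol: 1,000,000 rows, 33 features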
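The article does not say how the benchmark data was produced. For a dry run, a synthetic stand-in with the same shape can be generated as sketched below. Everything here is an assumption made for illustration: the function name `write_vertical_split`, the feature column names, the uniform random features, and the rule that derives the binary label. Only the file naming (`independent_linear.1.csv`, `.2.csv`, `.3.csv`), the `id` key column, and the `y` label column held by alice follow the benchmark script.

```python
import csv
import os
import random
import tempfile


def write_vertical_split(out_dir, n_rows, dims, label_party=0, seed=0):
    """Write one CSV per party and return the file paths.

    Each file has an 'id' column plus that party's feature columns; the file
    at index `label_party` also carries the binary label column 'y'.
    """
    rng = random.Random(seed)
    total = sum(dims)
    weights = [rng.uniform(-1.0, 1.0) for _ in range(total)]
    # Party i owns features [starts[i], starts[i] + dims[i]).
    starts = [sum(dims[:i]) for i in range(len(dims))]

    os.makedirs(out_dir, exist_ok=True)
    paths = [
        os.path.join(out_dir, f"independent_linear.{i + 1}.csv")
        for i in range(len(dims))
    ]
    files = [open(p, "w", newline="") for p in paths]
    try:
        writers = [csv.writer(f) for f in files]
        for i, writer in enumerate(writers):
            header = ["id"] + [f"x{starts[i] + j}" for j in range(dims[i])]
            if i == label_party:
                header.append("y")
            writer.writerow(header)

        for row_id in range(n_rows):
            x = [rng.uniform(-1.0, 1.0) for _ in range(total)]
            score = sum(w * v for w, v in zip(weights, x))
            y = 1 if score > 0 else 0  # label from the sign of a linear score
            for i, writer in enumerate(writers):
                row = [row_id] + x[starts[i]:starts[i] + dims[i]]
                if i == label_party:
                    row.append(y)
                writer.writerow(row)
    finally:
        for f in files:
            f.close()
    return paths


if __name__ == "__main__":
    # Small demo in a temporary directory.
    demo_dir = tempfile.mkdtemp()
    print(write_vertical_split(demo_dir, n_rows=10, dims=[3, 2, 2]))
```

For the three-party layout above, the call would be `write_vertical_split(".../100w_100d_3pc", 1_000_000, [34, 33, 33])`, after which each party keeps only its own file. Pure-Python generation at that size is slow, so it is worth trying a small `n_rows` first.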

5. Benchmark script

import logging
import socket
import sys
import time

import spu
from sklearn.metrics import mean_squared_error, roc_auc_score

import secretflow as sf
from secretflow.data import FedNdarray, PartitionWay
from secretflow.device.driver import reveal, wait
from secretflow.ml.boost.sgb_v import Sgb
from secretflow.utils.simulation.datasets import create_df
from secretflow.data.vertical import read_csv as v_read_csv

# init log
logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.info("test")

_parties = {
    # you may change the addresses
    # replace the IPs of alice, bob, and carol with the actual IPs
    'alice': {'address': '192.168.0.1:23041'},
    'bob': {'address': '192.168.0.2:23042'},
    'carol': {'address': '192.168.0.3:23043'},
}


def setup_sf(party, alice_ip, bob_ip, carol_ip):
    cluster_conf = {
        'parties': _parties,
        'self_party': party,
    }

    # init cluster
    _system_config = {'lineage_pinning_enabled': False}
    sf.init(
        address='local',
        num_cpus=8,
        log_to_driver=True,
        cluster_config=cluster_conf,
        exit_on_failure_cross_silo_sending=True,
        _system_config=_system_config,
        _memory=5 * 1024 * 1024 * 1024,
        cross_silo_messages_max_size_in_bytes=2 * 1024 * 1024 * 1024 - 1,
        object_store_memory=5 * 1024 * 1024 * 1024,
    )

    # SPU settings
    # Note: this script only reads runtime_config['field'] from cluster_def;
    # no SPU device is created, so the node addresses are not used.
    cluster_def = {
        'nodes': [
            {'party': 'alice', 'id': 'local:0', 'address': alice_ip},
            {'party': 'bob', 'id': 'local:1', 'address': bob_ip},
            {'party': 'carol', 'id': 'local:2', 'address': carol_ip},
        ],
        'runtime_config': {
            # SEMI2K supports 2/3 PC, ABY3 only supports 3PC, CHEETAH only supports 2PC.
            # Pay attention to the number of nodes above: it must match the PC setting.
            'protocol': spu.spu_pb2.ABY3,
            'field': spu.spu_pb2.FM64,
        },
    }

    # HEU settings
    heu_config = {
        'sk_keeper': {'party': 'alice'},
        'evaluators': [{'party': 'bob'}, {'party': 'carol'}],
        'mode': 'PHEU',
        # change the homomorphic encryption settings here
        'he_parameters': {
            'schema': 'paillier',
            'key_pair': {
                'generate': {
                    'bit_size': 2048,
                },
            },
        },
        'encoding': {
            'cleartext_type': 'DT_I32',
            'encoder': "IntegerEncoder",
            'encoder_args': {"scale": 1},
        },
    }
    return cluster_def, heu_config


class SGB_benchmark:
    def __init__(self, cluster_def, heu_config):
        self.alice = sf.PYU('alice')
        self.bob = sf.PYU('bob')
        self.carol = sf.PYU('carol')
        self.heu = sf.HEU(heu_config, cluster_def['runtime_config']['field'])

    def run_sgb(self, test_name, v_data, label_data, y, logistic, subsample, colsample):
        sgb = Sgb(self.heu)
        start = time.time()
        params = {
            'num_boost_round': 5,
            'max_depth': 5,
            'sketch_eps': 0.08,
            'objective': 'logistic' if logistic else 'linear',
            'reg_lambda': 0.3,
            'subsample': subsample,
            'colsample_by_tree': colsample,
        }
        model = sgb.train(params, v_data, label_data)
        #    reveal(model.weights[-1])
        print(f"{test_name} train time: {time.time() - start}")

        # First prediction pass: timed, result revealed to the driver.
        start = time.time()
        yhat = model.predict(v_data)
        yhat = reveal(yhat)
        print(f"{test_name} predict time: {time.time() - start}")
        if logistic:
            print(f"{test_name} auc: {roc_auc_score(y, yhat)}")
        else:
            print(f"{test_name} mse: {mean_squared_error(y, yhat)}")

        # Second prediction pass: the result is delivered to alice only.
        fed_yhat = model.predict(v_data, self.alice)
        assert len(fed_yhat.partitions) == 1 and self.alice in fed_yhat.partitions
        yhat = reveal(fed_yhat.partitions[self.alice])
        assert yhat.shape[0] == y.shape[0], f"{yhat.shape} == {y.shape}"
        if logistic:
            print(f"{test_name} auc: {roc_auc_score(y, yhat)}")
        else:
            print(f"{test_name} mse: {mean_squared_error(y, yhat)}")

    def test_on_linear(self, sample_num, total_num):
        """
        sample_num: int. this number * 10000 = sample number in dataset.
        """
        io_start = time.perf_counter()
        common_path = "/root/sf-benchmark/data/{}w_{}d_3pc/independent_linear.".format(
            sample_num, total_num
        )
        vdf = v_read_csv(
            {
                self.alice: common_path + "1.csv",
                self.bob: common_path + "2.csv",
                self.carol: common_path + "3.csv",
            },
            keys='id',
            drop_keys='id',
        )
        # split y out of dataset,
        # <<< !!! >>> change 'y' if label column name is not y in dataset.
        label_data = vdf["y"]
        # v_data remains all features.
        v_data = vdf.drop(columns="y")
        # <<< !!! >>> change alice if y does not belong to alice.
        y = reveal(label_data.partitions[self.alice].data)
        wait([p.data for p in v_data.partitions.values()])
        io_end = time.perf_counter()
        print("io takes time", io_end - io_start)
        self.run_sgb("independent_linear", v_data, label_data, y, True, 1, 1)


def run_test(party):
    cluster_def, heu_config = setup_sf(
        party,
        _parties['alice']['address'],
        _parties['bob']['address'],
        _parties['carol']['address'],
    )
    test_suite = SGB_benchmark(cluster_def, heu_config)
    test_suite.test_on_linear(100, 100)
    sf.shutdown()


if __name__ == '__main__':
    import argparse

    parser = argparse.ArgumentParser(prog='sgb benchmark remote')
    parser.add_argument('party')
    args = parser.parse_args()
    run_test(args.party)
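The script's comment about protocols and party counts (SEMI2K supports 2 or 3 parties, ABY3 only 3, CHEETAH only 2) is easy to get wrong when adapting the configuration. The sketch below encodes that rule as a plain lookup so a mismatch fails early. It is illustrative only: `check_protocol` is a hypothetical helper, it takes the protocol as a string rather than the `spu.spu_pb2` enum, and it is not part of SecretFlow or SPU.

```python
# Allowed party counts per protocol, as stated in the benchmark script's comment.
SUPPORTED_PARTY_COUNTS = {
    'SEMI2K': {2, 3},
    'ABY3': {3},
    'CHEETAH': {2},
}


def check_protocol(protocol, nodes):
    """Raise ValueError if the number of nodes does not fit the protocol."""
    allowed = SUPPORTED_PARTY_COUNTS.get(protocol)
    if allowed is None:
        raise ValueError(f"unknown protocol: {protocol}")
    if len(nodes) not in allowed:
        raise ValueError(
            f"{protocol} supports {sorted(allowed)} parties, got {len(nodes)} nodes"
        )


three_nodes = [{'party': 'alice'}, {'party': 'bob'}, {'party': 'carol'}]
check_protocol('ABY3', three_nodes)        # passes silently
check_protocol('SEMI2K', three_nodes[:2])  # passes silently
```

Calling `check_protocol('ABY3', three_nodes[:2])` raises `ValueError`, which is the case to watch for when trimming the script down to two parties.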

Download the script to each test machine and save it as, for example, sgb_benchmark.py. Alice, bob, and carol all use the same script.

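Two-party SGB is started as follows. Note that the script as listed references carol throughout, so it has to be adapted to two parties first:

alice: python sgb_benchmark.py alice
bob: python sgb_benchmark.py bob

Three-party SGB is started as follows:

alice: python sgb_benchmark.py alice
bob: python sgb_benchmark.py bob
carol: python sgb_benchmark.py carol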