Splitting Large Files in C#

news 2025/11/1 1:36:21

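Requirement:

In project development we sometimes run into a single file larger than 1 TB. Such a file can only be read as one stream, so reading it takes a long time and leaves customers dissatisfied with the experience.

To speed up parsing of large files, I came up with this plan: first split the large file into small files, then parse several of them in parallel and load the results into the database.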
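The parallel step of that plan could look roughly like the sketch below. It is only an illustration: `ParallelParseSketch`, `ParseAndLoad`, and `ParseAll` are hypothetical names, and the "parse and load" work is replaced by a simple line count because the article does not show the real parser.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

static class ParallelParseSketch
{
    // Hypothetical stand-in for the real parse-and-load step: it only counts lines.
    public static long ParseAndLoad(string path)
    {
        long lines = 0;
        using (StreamReader reader = new StreamReader(path))
        {
            while (reader.ReadLine() != null)
            {
                lines++;
            }
        }
        return lines;
    }

    // Processes all split files in parallel and returns the total line count.
    public static long ParseAll(string[] splitFiles, int maxParallel)
    {
        long total = 0;
        ParallelOptions options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };
        Parallel.ForEach(splitFiles, options, path =>
        {
            long lines = ParseAndLoad(path);
            // Several workers update the total, so the addition must be atomic.
            Interlocked.Add(ref total, lines);
        });
        return total;
    }
}
```

Capping `MaxDegreeOfParallelism` is a design choice worth tuning: when all split files sit on the same disk or network share, too many concurrent readers can end up slower than a few.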

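So how can one large file be split into several small files?

If I split by size, the lines that straddle a cut point are broken, so some data is lost. If I split by line count, the file has to be read line by line, which inevitably makes the split itself take too long.

Discussion: suppose we split a 1 TB file by size, with each split file being 200 MB. That gives about 5,200 files. Each cut breaks at most one line into two fragments, so at most about 10,000 lines are damaged, which is negligible.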
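The estimate can be checked with a little arithmetic, assuming binary units (1 TB = 1024 × 1024 MB), as the 1024 multipliers in the code suggest:

```latex
\[
\frac{1\,\mathrm{TB}}{200\,\mathrm{MB}}
  = \frac{1024 \times 1024\,\mathrm{MB}}{200\,\mathrm{MB}}
  \approx 5243 \text{ files}
\]
\[
5242 \text{ cut points} \times 2 \text{ fragments per cut}
  = 10484 \approx 10^{4} \text{ damaged line fragments}
\]
```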

So, to keep the time spent on splitting low, we split by fixed size.

Implementation 1:

Read 1 MB at a time until 200 MB has been read, then start writing the next split file.
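// Assumption: file and splitFileFormat are defined as in Implementation 2, and
// splitFileSize here is the number of 1M reads per split file (e.g. 200), not a byte count.
using (FileStream readerStream = new FileStream(file, FileMode.Open, FileAccess.Read))
{
    // Case: the file is larger than 1 GB (the size check itself is not shown)
    using (BinaryReader reader = new BinaryReader(readerStream))
    {
        int fileCursor = 0;   // index of the split file being written
        int readerCursor = 0; // number of reads written to the current split file
        // Note: with char[], BinaryReader decodes the input as text (UTF-8 by default),
        // so each read returns up to 1M characters rather than 1M bytes.
        char[] buffer = new char[1024 * 1024];
        int length = 0;

    NextFileBegin:
        string filePath = string.Format(splitFileFormat, fileCursor);

        Console.WriteLine("Start reading file [{1}]: {0}", filePath, DateTime.Now.ToString("yyyy-MM-dd HH:mm:ss.fff"));
        // FileMode.Create truncates a leftover file from an earlier run.
        using (FileStream writerStream = new FileStream(filePath, FileMode.Create, FileAccess.Write))
        {
            using (BinaryWriter writer = new BinaryWriter(writerStream))
            {
                while ((length = reader.Read(buffer, 0, buffer.Length)) > 0)
                {
                    readerCursor++;

                    writer.Write(buffer, 0, length);

                    // Enough reads for this split file: move on to the next one.
                    if (readerCursor >= splitFileSize)
                    {
                        Console.WriteLine("Finished reading file [{1}]: {0}", filePath, DateTime.Now.ToString("yyyy-MM-dd HH:mm:ss.fff"));

                        readerCursor = 0;
                        fileCursor++;

                        goto NextFileBegin;
                    }
                }
            }
        }
    }
}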

Implementation 2:

Read 200 MB in one call, write it to a split file immediately, then move on to the next split file.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using System.Configuration;

namespace BigFileSplitTest
{
    class Program
    {
        static void Main(string[] args)
        {
            /*
             * Example appSettings entries:
             * <add key="BigFile.Split" value="true"/>
             * <add key="BigFile.SplitMinFileSize" value="10" />
             * <add key="BigFile.SplitFileSize" value="200"/>
             * <add key="BigFile.FilePath" value="\\172.x1.xx.xx\文件拷贝\xx\FTP\xx\2016-04-07\x_20160407.txt"/>
             * <add key="BigFile.FileSilitPathFormate" value="\\172.x1.xx.xx\文件拷贝\liulong\FTP\xx\2016-04-07\x_20160407{0}.txt"/>
             */

            string file = ConfigurationManager.AppSettings.Get("BigFile.FilePath");
            string splitFileFormat = ConfigurationManager.AppSettings.Get("BigFile.FileSilitPathFormate");
            // Minimum file size (in GB in the config) that triggers a split.
            // Stored as long because 10 GB does not fit in an int.
            long splitMinFileSize = Convert.ToInt64(ConfigurationManager.AppSettings.Get("BigFile.SplitMinFileSize")) * 1024 * 1024 * 1024;
            // Size of each split file (in MB in the config), converted to bytes.
            int splitFileSize = Convert.ToInt32(ConfigurationManager.AppSettings.Get("BigFile.SplitFileSize")) * 1024 * 1024;

            FileInfo fileInfo = new FileInfo(file);
            if (fileInfo.Length > splitMinFileSize)
            {
                Console.WriteLine("Result: the file needs to be split.");
            }
            else
            {
                Console.WriteLine("Result: the file does not need to be split.");
                Console.ReadKey();
                return;
            }

            // Number of full-size split files (informational; not used below).
            int steps = (int)(fileInfo.Length / splitFileSize);
            using (FileStream fs = new FileStream(file, FileMode.Open, FileAccess.Read))
            {
                using (BinaryReader br = new BinaryReader(fs))
                {
                    int couter = 1;
                    bool isReadingComplete = false;
                    while (!isReadingComplete)
                    {
                        string filePath = string.Format(splitFileFormat, couter);
                        Console.WriteLine("Start reading file [{1}]: {0}", filePath, DateTime.Now.ToString("yyyy-MM-dd HH:mm:ss.fff"));

                        // Read one whole split file's worth of bytes in a single call.
                        byte[] input = br.ReadBytes(splitFileSize);
                        using (FileStream writeFs = new FileStream(filePath, FileMode.Create))
                        {
                            using (BinaryWriter bw = new BinaryWriter(writeFs))
                            {
                                bw.Write(input);
                            }
                        }

                        // A short read means the end of the file. The position check also stops
                        // when the file ends exactly on a chunk boundary, avoiding an empty last file.
                        isReadingComplete = (input.Length != splitFileSize) || (fs.Position >= fs.Length);
                        if (!isReadingComplete)
                        {
                            couter += 1;
                        }
                        Console.WriteLine("Finished reading file [{1}]: {0}", filePath, DateTime.Now.ToString("yyyy-MM-dd HH:mm:ss.fff"));
                    }
                }
            }

            Console.WriteLine("Splitting complete. Press any key to exit...");
            Console.ReadKey();
        }
    }
}
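Since the concern with size-based splitting is losing data, a quick sanity check after a run may be useful: the sizes of the split files should add up to the size of the original. The sketch below assumes the split files are numbered consecutively through the same format string; `SplitCheckSketch` and `PartsAddUp` are hypothetical names, not part of the original program.

```csharp
using System;
using System.IO;

static class SplitCheckSketch
{
    // Returns true when the split files, numbered consecutively from firstIndex,
    // together contain exactly as many bytes as the original file.
    public static bool PartsAddUp(string originalPath, string splitFileFormat, int firstIndex)
    {
        long expected = new FileInfo(originalPath).Length;
        long actual = 0;
        for (int i = firstIndex; ; i++)
        {
            string part = string.Format(splitFileFormat, i);
            if (!File.Exists(part))
            {
                break; // first missing index marks the end of the sequence
            }
            actual += new FileInfo(part).Length;
        }
        return actual == expected;
    }
}
```

Note that leftover split files from an earlier run with a higher index would be counted too, so this check is most reliable against an empty output directory. It only compares byte counts; it says nothing about lines broken at cut points.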

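The experiment shows that Implementation 1 takes about 10 times as long as Implementation 2.

Why is that? Think about it:

Implementation 1: read 1 MB at a time until 200 MB has been read, then start writing the next split file.

Implementation 2: read 200 MB in one call, write it to a split file immediately, then move on to the next split file.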
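One factor that may be worth testing: the first implementation reads into a `char[]`, so `BinaryReader` decodes the input as text and `BinaryWriter` encodes it again on every 1M read. The sketch below is a variant that keeps the small buffer but copies raw bytes, so the two effects (buffer size versus text decoding) can be measured separately. It is a hypothesis to check, not a measured result from this article; `ByteBufferSplitSketch` and `Split` are hypothetical names, and `partSize` is assumed to be positive.

```csharp
using System;
using System.IO;

static class ByteBufferSplitSketch
{
    // Splits sourcePath into parts of at most partSize bytes, copying through a
    // small byte buffer. Returns the number of parts written.
    public static int Split(string sourcePath, string partFormat, long partSize, int bufferSize)
    {
        byte[] buffer = new byte[bufferSize];
        int partIndex = 0;
        using (FileStream source = new FileStream(sourcePath, FileMode.Open, FileAccess.Read))
        {
            while (source.Position < source.Length)
            {
                string partPath = string.Format(partFormat, partIndex);
                using (FileStream part = new FileStream(partPath, FileMode.Create, FileAccess.Write))
                {
                    long remaining = partSize;
                    while (remaining > 0)
                    {
                        // Never read past the end of the current part.
                        int toRead = (int)Math.Min(buffer.Length, remaining);
                        int read = source.Read(buffer, 0, toRead);
                        if (read == 0)
                        {
                            break; // end of the source file
                        }
                        part.Write(buffer, 0, read);
                        remaining -= read;
                    }
                }
                partIndex++;
            }
        }
        return partIndex;
    }
}
```

Timing this variant with a 1 MB buffer against both implementations (for example with `System.Diagnostics.Stopwatch`) would show how much of the gap, if any, comes from the character decoding rather than from the read size.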

Reference: https://www.cnblogs.com/yy3b2007com/p/5558877.html
