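Using DeepSeek to help write a C program on Windows with MinGW that uses memory-mapped files and expat
The same XMLTOCSV C program, once compiled, runs faster on an arm64 machine with 8 GB of RAM than on an amd64 machine with 16 GB. My guess is that WSL's slow I/O is dragging it down again, so I wanted to test it on native Windows.
1. Rewriting the memory-mapped file code
To be on the safe side, I found a Windows memory-mapped file example online, confirmed that it compiles under MinGW, and then uploaded it to DeepSeek: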
#include <windows.h>
#include <stdio.h>

int main() {
    /* Open test.txt, creating it if it does not exist yet */
    HANDLE hFile = CreateFile("test.txt", GENERIC_READ | GENERIC_WRITE, 0, NULL, OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hFile == INVALID_HANDLE_VALUE) {
        printf("Failed to create the file!\n");
        return -1;
    }

    /* Create a 1024-byte read/write mapping object backed by the file */
    HANDLE hMapping = CreateFileMapping(hFile, NULL, PAGE_READWRITE, 0, 1024, NULL);
    if (hMapping == NULL) {
        printf("Failed to create the file mapping!\n");
        CloseHandle(hFile);
        return -1;
    }

    /* Map the whole mapping object into the address space */
    char* pData = (char*)MapViewOfFile(hMapping, FILE_MAP_WRITE | FILE_MAP_READ, 0, 0, 0);
    if (pData == NULL) {
        printf("Failed to map the view!\n");
        CloseHandle(hMapping);
        CloseHandle(hFile);
        return -1;
    }

    // Use pData to work with the memory-mapped region
    UnmapViewOfFile(pData);
    CloseHandle(hMapping);
    CloseHandle(hFile);
    return 0;
}
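For an XML-to-CSV converter the input only needs to be read, so a read-only mapping is enough. The following is a minimal sketch of that variation, not the code DeepSeek returned. The names MappedFile, map_file_readonly and unmap_file are hypothetical, and the POSIX branch is included only so that the same file also builds under WSL or Linux.

```c
#include <stdio.h>
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>
#else
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#endif

/* Hypothetical helper type: a read-only view of a whole file. */
typedef struct {
    const char *data;   /* NULL when nothing is mapped */
    size_t size;        /* size of the mapped file in bytes */
#ifdef _WIN32
    HANDLE file;
    HANDLE mapping;
#endif
} MappedFile;

/* Maps the whole file read-only. Returns 0 on success, -1 on failure.
   Empty files are rejected because a zero-length mapping is not possible. */
static int map_file_readonly(const char *path, MappedFile *mf) {
    mf->data = NULL;
    mf->size = 0;
#ifdef _WIN32
    LARGE_INTEGER sz;
    mf->file = CreateFileA(path, GENERIC_READ, FILE_SHARE_READ, NULL,
                           OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (mf->file == INVALID_HANDLE_VALUE) return -1;
    if (!GetFileSizeEx(mf->file, &sz) || sz.QuadPart == 0) {
        CloseHandle(mf->file);
        return -1;
    }
    /* Size 0,0 means "use the current size of the file". */
    mf->mapping = CreateFileMappingA(mf->file, NULL, PAGE_READONLY, 0, 0, NULL);
    if (mf->mapping == NULL) {
        CloseHandle(mf->file);
        return -1;
    }
    mf->data = (const char *)MapViewOfFile(mf->mapping, FILE_MAP_READ, 0, 0, 0);
    if (mf->data == NULL) {
        CloseHandle(mf->mapping);
        CloseHandle(mf->file);
        return -1;
    }
    mf->size = (size_t)sz.QuadPart;
#else
    struct stat st;
    void *p;
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }
    p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED) return -1;
    mf->data = (const char *)p;
    mf->size = (size_t)st.st_size;
#endif
    return 0;
}

/* Releases the view and, on Windows, the mapping and file handles. */
static void unmap_file(MappedFile *mf) {
    if (mf->data == NULL) return;
#ifdef _WIN32
    UnmapViewOfFile(mf->data);
    CloseHandle(mf->mapping);
    CloseHandle(mf->file);
#else
    munmap((void *)mf->data, mf->size);
#endif
    mf->data = NULL;
    mf->size = 0;
}
```

A caller would declare a MappedFile, call map_file_readonly(path, &mf), scan mf.data[0 .. mf.size-1] as one contiguous buffer, and finish with unmap_file(&mf).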
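Prompt:
Following the attached example, change our mmap version of the XML parser to use Windows function calls. Return only the complete code and do nothing else.
Then I uploaded Mr. Zhang Zepeng's to-csv.c program to it as well:
Change the attached mmap version to use Windows function calls. Return only the complete code and do nothing else.
Because my own program also depends on the expat library, I compiled Mr. Zhang's program first.
REM Add the gcc directory to PATH
set path=%path%;C:\d\mingw64\bin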
gcc tocsvwin.c -o tocsvwin.exe -O3
In file included from C:/d/mingw64/x86_64-w64-mingw32/include/windef.h:9,
                 from C:/d/mingw64/x86_64-w64-mingw32/include/windows.h:69,
                 from tocsvwin.c:1:
tocsvwin.c:12:9: error: expected identifier or '(' before 'int'
   12 | int min(int, int);
      |         ^~~
...
tocsvwin.c: In function 'read_text':
tocsvwin.c:257:25: error: empty character constant
  257 |     } else if (value == '') {
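The build failed. The first error is a name clash: the Windows headers already define min and max (as macros), so the program's own min and max declarations no longer parse.
The second error is a bit silly: it copied '>' as the empty character constant ''. Both were easy to fix, after which the program compiled and ran successfully.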
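As a sketch of how such a clash can be avoided (this is not necessarily the fix I applied), the helpers can be given names the Windows headers do not use. The names min_int and max_int are hypothetical. Defining NOMINMAX before including windows.h is the other common approach, and it is shown here as well on the assumption that the MinGW headers honour it.

```c
#ifdef _WIN32
/* Ask the Windows headers not to define the min/max macros. */
#define NOMINMAX
#include <windows.h>
#endif
#include <stdio.h>

/* Renamed helpers: these identifiers cannot collide with the min/max macros. */
static int min_int(int a, int b) { return a < b ? a : b; }
static int max_int(int a, int b) { return a > b ? a : b; }
```

With the rename, every call site changes from min(a, b) to min_int(a, b), which is a mechanical search and replace.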
C:\d>timer64 tocsvwin lineitem/xl/worksheets/sheet1.xml A1:P1000000 out.csv
Kernel Time = 0.156 = 11%
User Time = 1.046 = 79%
Process Time = 1.203 = 91% Virtual Memory = 2 MB
Global Time = 1.310 = 100% Physical Memory = 321 MB
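Absurdly fast: a 324 MB XML file with 600,000 rows and 16 columns was written out to the CSV file in about 1.3 seconds.
2. Using the prebuilt expat library
Without a VC environment, building the expat library from source is a bit of a hassle, so I looked online for a ready-made build, and there really was one.
I downloaded the win64 version from SourceForge. It is also available on GitHub, and the two are identical.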
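I unpacked it into the c:\d\expatwin directory. The DLL and the example programs are in separate subdirectories,
so I wrote the following command line: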
C:\d\expatwin\Source\examples>gcc outline.c -I ../lib -L ../../bin -o outline.exe -lexpat
It compiled. The program parses XML read from standard input, so I ran it with input redirection, and it listed the contents of the XML:
C:\d\expatwin\Bin>..\Source\examples\outline <c:\d\sheet.xml
worksheet xmlns='http://schemas.openxmlformats.org/spreadsheetml/2006/main' xmlns:r='http://schemas.openxmlformats.org/officeDocument/2006/relationships'
sheetViews
sheetView showGridLines='false' workbookViewId='0'
c r='F23' s='24'
c r='H23' s='30' t='s'
v
mergeCells count='3'
mergeCell ref='D3:D7'
mergeCell ref='F3:F7'
mergeCell ref='H3:H7'
Next I compiled my memory-mapped file version of the program and ran into another amusing copying error:
C:\d>gcc expatxmlwin.c -I expatwin\Source\lib -L expatwin\bin -o expatxmlwinm.exe -lexpat -O3
expatxmlwin.c: In function 'main':
expatxmlwin.c:259:67: error: 'ParseRange' has no member named 'start_pos'; did you mean 'start_row'?
  259 |     long start_pos = binary_search_row(xml_data, file_size, range.start_pos);
      |                                                                   ^~~~~~~~~
      |                                                                   start_row
After changing it as the compiler suggested, the program compiled and ran successfully.
C:\d\expatwin\Bin>\d\timer64 \d\expatxmlwinm.exe /d/lineitem/xl/worksheets/sheet1.xml A2:P1000000
CSV宸蹭繚瀛樺埌 /d/lineitem/xl/worksheets/sheet1.csv
Kernel Time = 0.093 = 2%
User Time = 2.937 = 93%
Process Time = 3.031 = 96% Virtual Memory = 515 MB
Global Time = 3.140 = 100% Physical Memory = 639 MB
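The run times of the two programs are in a ratio of roughly 1:2 (1.3 s versus 3.1 s), which matches what I saw on the arm64 platform. The garbled text in the first output line is the message "CSV已保存到" ("CSV saved to"): the source file is UTF-8, while the Windows console code page is cp936.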
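One possible way to avoid the garbled message, shown only as a sketch that I did not apply to the program above, is to switch the console output code page to UTF-8 before printing. The helper name use_utf8_console is hypothetical; SetConsoleOutputCP and CP_UTF8 come from the Windows API. Running chcp 65001 in the console before starting the program should have a similar effect.

```c
#include <stdio.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: returns 1 if console output should now be
   interpreted as UTF-8, 0 if switching the code page failed. */
static int use_utf8_console(void) {
#ifdef _WIN32
    return SetConsoleOutputCP(CP_UTF8) ? 1 : 0;
#else
    /* Assumption: non-Windows terminals already expect UTF-8. */
    return 1;
#endif
}
```

Calling use_utf8_console() once at the start of main, before the first printf, would be enough for a program whose string literals are stored as UTF-8.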
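Under WSL2 the ratio of the two programs is quite different (12.5 s versus 10.7 s of elapsed time), which is clearly caused by abnormal I/O.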
root@6ae32a5ffcde:/par# time ./expatmmap lineitem/xl/worksheets/sheet1.xml A2:H1000000
start_pos=1192
CSV已保存到 lineitem/xl/worksheets/sheet1.csv
real 0m12.528s
user 0m2.027s
sys 0m1.044s
root@6ae32a5ffcde:/par# gcc to-csv.c -o tocsv -O3
root@6ae32a5ffcde:/par# time ./tocsv lineitem/xl/worksheets/sheet1.xml A1:H1000000 out.csv
real 0m10.735s
user 0m0.881s
sys 0m0.689s
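The I/O problem shows up when CPU time is compared with elapsed time. The following small sketch does that arithmetic; cpu_fraction and print_cpu_fraction are hypothetical helper names, and the inputs are simply the real/user/sys figures from the WSL2 runs and the Global/User/Kernel figures from the timer64 runs.

```c
#include <stdio.h>

/* Fraction of the elapsed (wall-clock) time that the process spent on the CPU.
   A value far below 1.0 for a single-threaded program means it was mostly waiting. */
static double cpu_fraction(double real_s, double user_s, double sys_s) {
    return (user_s + sys_s) / real_s;
}

/* Prints one labelled line, for example for a small comparison table. */
static void print_cpu_fraction(const char *label, double real_s,
                               double user_s, double sys_s) {
    printf("%-20s CPU/elapsed = %.1f%%\n", label,
           100.0 * cpu_fraction(real_s, user_s, sys_s));
}
```

With the numbers above, cpu_fraction(12.528, 2.027, 1.044) is about 0.25 for expatmmap on WSL2 and cpu_fraction(10.735, 0.881, 0.689) is about 0.15 for tocsv, while both native Windows runs are above 0.9. On WSL2 the programs spend most of their elapsed time waiting rather than computing.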