当前位置: 首页 > wzjs >正文

网站建设企业企业培训系统

网站建设企业,企业培训系统,网站做排名2015新年,自己做网站的视频最近博主被xml搞得很奔溃,因为在之前的小作业中使用ElementTree(元素树)解析xml文件非常的丝滑,但是当将xml文件改成大约2G的大文件时,他就罢工啦! xml的文件格式如下图所示: 之前的解析代码如下: def lo…

最近博主被xml搞得很奔溃,因为在之前的小作业中使用ElementTree(元素树)解析xml文件非常的丝滑,但是当将xml文件改成大约2G的大文件时,他就罢工啦!
xml的文件格式如下图所示:
在这里插入图片描述
之前的解析代码如下:

def load_xml(self, file):'''function: use the file in XML formatArgs:file (str): path to the xml fileReturns:         '''   # with open(file, 'r', encoding='utf-8') as f:#     xml = f.read()#load the xml file# root = ET.fromstring(xml) #root.tag is rowtree = ET.parse(file)root = tree.getroot()for child in root:docID = child.find('DOCNO').text #get docID# print(docID)content = child.find('TITLE').text +child.find('ARTIST').text +child.find('YEAR').text +child.find('LYRICS').text + child.find('GENRE').text #get content# print("content",content)self.docs_df.loc[docID] = content# use a dataframe to store each doc:docID,content(headline+text)  

之前的报错信息:

Traceback (most recent call last):File "/Users/xiaoxudemacbook/edinburgh/TTDS/cw/cw3/code/index.py", line 115, in <module>index = Index('data.xml', 'englishST.txt')File "/Users/xiaoxudemacbook/edinburgh/TTDS/cw/cw3/code/index.py", line 21, in __init__self.load_xml(xml_path)File "/Users/xiaoxudemacbook/edinburgh/TTDS/cw/cw3/code/index.py", line 34, in load_xmlroot = ET.fromstring(xml) #root.tag is documentFile "/Users/xiaoxudemacbook/opt/anaconda3/lib/python3.9/xml/etree/ElementTree.py", line 1344, in XMLparser.feed(text)
xml.etree.ElementTree.ParseError: out of memory: line 1, column 0

博主搜索过如何修复这个问题的方法,大部分人提到的是用lxml和iterparse,虽然它在google中是排列第一的解决方案,但是博主发现它已经是2014的,而且感觉修改起来比较复杂,博主还在CSDN中发了问答,然后突然想起我们的chatGPT小宝贝。它给出了非常清楚的4种解决方案,博主就贴在下面啦(ps:博主用第一种很简单的方法就解决了,但是根据菜鸟教程中Python XML解析中提供的3中方法,博主感觉好像SAX (simple API for XML )是针对大型xml文件的良方,有空的时候要回头看看,目前还是赶due要紧):

This error message suggests that the XML file being parsed by the load_xml() function is too large to fit in memory. When an XML file is parsed using the ET.fromstring() function, the entire contents of the file are loaded into memory as a string, which can be a problem for very large files.

To resolve this error, you can try one or more of the following solutions:

1. Use the ET.parse() function instead of ET.fromstring() to parse the XML file.

This function takes a file path as an argument, so the XML file is read and parsed incrementally, which can be more memory-efficient. Here’s an example of how you can modify the code:
makefile
Copy code

Replace this line:

root = ET.fromstring(xml)

With this line:

tree = ET.parse(xml_path)
root = tree.getroot()

2. Split the XML file into smaller files before parsing them

If the XML file contains a large number of documents, you can split it into smaller files that each contain a subset of the documents. This way, you can parse each file separately and avoid loading the entire XML file into memory at once.

3. Increase the amount of memory available to the Python process.

If you have enough memory available on your system, you can try increasing the amount of memory allocated to the Python process. One way to do this is by using a virtual environment and setting the PYTHONMALLOC environment variable to malloc. For example, you can try running the following command before running your Python script:

PYTHONMALLOC=malloc python index.py

This will tell Python to use the malloc memory allocator, which can increase the amount of memory available to the process.

4. Use a streaming XML parser.

If the XML file is too large to fit in memory even after trying the above solutions, you can use a streaming XML parser such as xml.sax or lxml.etree. These parsers allow you to process the XML file incrementally, without loading the entire file into memory at once. However, using a streaming parser can be more complex and require more code changes than the other solutions.

附:菜鸟教程Python XML解析:https://www.runoob.com/python/python-xml.html

http://www.dtcms.com/wzjs/81121.html

相关文章:

  • 可视化编辑网站开发高级seo招聘
  • 起点签约的书网站给做封面吗网络推广计划制定步骤
  • 新手用jsp做网站磁力多多
  • 网站建设中效果seo排名赚app最新版本
  • 做网站需要考虑seo吗北京网站设计公司
  • 什么网站可以做报名系统关键词排名优化江苏的团队
  • wordpress 做网课网站适合seo软件
  • 石家庄建站模板外贸seo站
  • 网站制作公司很好 乐云践新合肥网站排名
  • 北大青鸟网站建设课程建网站模板
  • 好看的做地图分析图的网站搜索优化引擎
  • 做外贸网站怎么样友情链接网自动收录
  • 重庆最火的网站正规电商培训学校排名
  • hbuilder网页设计代码seo顾问什么职位
  • 陕西省住房和城乡建设厅官网证件seo整站优化吧
  • 如何用html做网站企业培训机构
  • amazon虚拟机免费做网站网站如何注册
  • 南山电商网站建设南宁seo关键词排名
  • 网站优化简历模板软文发布的平台与板块
  • 做音乐网站需要什么百度seo最成功的优化
  • 鞍山网站制作招聘网长沙网站seo报价
  • 南京市建设工程档案馆网站合肥网站关键词排名
  • 公司网站管理制度百度官方app免费下载
  • 鄞州区卖场设计网站建设seo教程培训
  • 福建住房和城乡建设厅网站百度提交
  • 做app的网站有哪些功能吗大亚湾发布
  • 计算机网站开发论文文献引用陕西seo排名
  • 做微信商城设计网站网站推广优化是什么意思
  • 济南本地网站建设网络营销方案的制定
  • 好学校平台网站模板下载不了今天全国疫情最新消息