
Python project no. 2025821: collecting and processing Chinese-English data

Python column notes:



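Preface

Batch-read the word JSON files → parse out each word with its definitions, example sentences, and phrases → clean the data (strip special characters) → write the updates to the MySQL database.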
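
As a rough illustration of the parsing step, the sketch below pulls the four fields out of a single record. The key names (`headWord`, `trans`, `pos`, `tranCn`, `tranOther`, `sentences`, `sContent`, `sCn`, `phrases`, `pContent`, `pCn`) are the ones the script in this post reads. The function name `parse_record` and the sample record are made up for illustration, and the real files may contain fields not shown here.

```python
import json


def parse_record(line):
    """Parse one JSON line into (word, definitions, sentences, phrases)."""
    record = json.loads(line)
    content = record["content"]["word"]["content"]

    # Optional sections: fall back to an empty list when a key is missing.
    sentences = content.get("sentence", {}).get("sentences", [])
    phrases = content.get("phrase", {}).get("phrases", [])

    # One "pos meaning other" string per definition, skipping absent parts.
    definitions = []
    for t in content["trans"]:
        parts = [t.get("pos"), t["tranCn"], t.get("tranOther")]
        definitions.append(" ".join(p for p in parts if p))

    # Keep at most three sentences and four phrases, as the script does.
    sentence_text = " | ".join(s["sContent"] + s["sCn"] for s in sentences[:3])
    phrase_text = " | ".join(f"{p['pContent']} {p['pCn']}" for p in phrases[:4])
    return record["headWord"], " | ".join(definitions), sentence_text, phrase_text


# Hypothetical sample record, shaped like the fields the script reads.
sample = json.dumps({
    "headWord": "cafe",
    "content": {"word": {"content": {
        "trans": [{"pos": "n", "tranCn": "咖啡馆"}],
        "sentence": {"sentences": [
            {"sContent": "We met at a cafe.", "sCn": "我们在咖啡馆见面。"},
        ]},
    }}},
}, ensure_ascii=False)

print(parse_record(sample))
```

Records without a `sentence` or `phrase` section simply yield an empty string for that field, which matches the fallback the script uses.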

Content

import json
import time

import pymysql

# Connect to MySQL. autocommit is on, so the commit() calls below are not strictly required.
coon = pymysql.connect(host='1.116.27.26', port=3306, user='root', passwd='LzyLyy', charset='utf8mb4', autocommit=True)
coon.select_db('rookie_word')
cur = coon.cursor()


def replaceFran(text):
    # Replace accented French letters with their plain ASCII equivalents.
    fr_en = [['é', 'e'], ['ê', 'e'], ['è', 'e'], ['ë', 'e'], ['à', 'a'], ['â', 'a'], ['ç', 'c'], ['î', 'i'], ['ï', 'i'],
             ['ô', 'o'], ['ù', 'u'], ['û', 'u'], ['ü', 'u'], ['ÿ', 'y']]
    for src, dst in fr_en:
        text = text.replace(src, dst)
    return text


# Each line of the file holds one JSON record.
with open('初高中词汇/GaoZhongluan_2.json', 'r', encoding='utf-8') as file:
    for line in file:
        word_json = json.loads(line.strip())
        content = word_json['content']['word']['content']
        word = word_json['headWord']  # headword

        # Example sentences: keep at most three, joined with ' | '.
        try:
            sentence = content['sentence']['sentences']
            sentence_list = [str(se['sContent']) + str(se['sCn']) for se in sentence]
            sentence_res = ' | '.join(sentence_list[:3])
        except (KeyError, TypeError):
            sentence_res = ''

        # Phonetics (disabled)
        # try:
        #     usphone = '美[' + content['usphone'] + ']'  # American pronunciation
        #     usphone_er = usphone.find('(for')
        #     if usphone_er != -1:
        #         usphone = ''
        # except:
        #     usphone = ''
        # try:
        #     ukphone = '英[' + content['ukphone'] + ']'  # British pronunciation
        #     ukphone_er = ukphone.find('(for')
        #     if ukphone_er != -1:
        #         ukphone = ''
        # except:
        #     ukphone = ''
        # pronounce = (usphone + '|' + ukphone).replace('ˈ', '\'')  # .replace('\'', '-').replace('ˈ', '-')

        # Phrases: keep at most four, joined with ' | '.
        try:
            phrase = content['phrase']['phrases']
            phrase_re = ' | '.join(str(ph['pContent']) + ' ' + str(ph['pCn']) for ph in phrase[:4])
        except (KeyError, TypeError):
            phrase_re = ''

        # Definitions: part of speech, Chinese meaning, and any extra translation.
        trans = content['trans']
        trans_parts = []
        for tr in trans:
            pos = tr.get('pos', '')
            tranCn = tr['tranCn']
            tranOther = tr.get('tranOther', '')
            trans_parts.append(str(pos) + ' ' + str(tranCn) + ' ' + str(tranOther))
        trans_re = ' | '.join(trans_parts)

        # Text cleaning
        trans_re = replaceFran(trans_re)          # definitions
        sentence_res = replaceFran(sentence_res)  # example sentences
        phrase_re = replaceFran(phrase_re)        # phrases

        # print(word_json['wordRank'])
        # print('Word: ' + word)
        # print('Sentences: ' + sentence_res)
        # print('Phonetics: ' + str(pronounce))
        # print('Phrases: ' + phrase_re)
        # print('Definitions: ' + trans_re)

        sql = 'select * from word where word=%s'
        word_res = cur.execute(sql, (word,))
        if word_res != 0:
            print(str(word) + ' already exists')
            sql_l = 'select * from word_label where word=%s and label_id=%s'
            label_res = cur.execute(sql_l, [word, 18])
            if label_res == 0:
                sql_word_label = 'insert into word_label values(%s,%s)'
                cur.execute(sql_word_label, [word, 18])
                coon.commit()
                print(str(word) + ' has been given the new label')
                time.sleep(0.5)
            else:
                print(str(word) + ' already has this label')
        else:
            # Note: this branch runs only when the SELECT above matched no row, so the UPDATE
            # has nothing to change; the commented-out INSERT is what would add the word.
            sql_word = 'UPDATE word SET explain_word= %s,sentence= %s,other= %s WHERE word= %s'
            cur.execute(sql_word, [trans_re, sentence_res, phrase_re, word])
            # sql_word = 'insert into word(word, pronounce, explain_word, sentence, other, word_source) values(%s,%s,%s,%s,%s,%s)'
            # cur.execute(sql_word, [word, pronounce, trans_re, sentence_res, phrase_re, 0])
            coon.commit()
            # sql_label = 'insert into word_label values(%s,%s)'
            # cur.execute(sql_label, [word, 18])
            # coon.commit()
            time.sleep(0.1)
            print('Word: ' + word + ' updated')

cur.close()
coon.close()
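
The accent-stripping step can also be done in a single pass with `str.maketrans` and `str.translate` from the standard library. This is a sketch of an alternative, not the script's own code: the names `ACCENT_TABLE` and `strip_accents` are hypothetical, and the mapping covers the same fourteen lowercase letters as `replaceFran`.

```python
# The fourteen lowercase letters that replaceFran handles, as a translation table.
# Both strings must have the same length; each character maps to the one below it.
ACCENT_TABLE = str.maketrans("éêèëàâçîïôùûüÿ", "eeeeaaciiouuuy")


def strip_accents(text):
    """Replace accented French letters with plain ASCII ones in one pass."""
    return text.translate(ACCENT_TABLE)


print(strip_accents("café, naïve, garçon"))
```

Like the original function, this leaves uppercase accented letters and any character outside the table untouched.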

Key topics

Databases
Text cleaning
JSON data processing
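
For the database topic, the check-then-insert pattern used for labels can be tried without a MySQL server. The sketch below uses the standard library's `sqlite3` as a stand-in, so it is not the script's pymysql code: sqlite3 takes `?` placeholders where pymysql takes `%s`. The table name `word_label` and its two columns follow the script's queries; the column types and the helper name `add_label` are assumptions.

```python
import sqlite3


def add_label(conn, word, label_id):
    """Attach label_id to word unless that pair is already stored.

    Returns True when a new row was inserted, False otherwise.
    """
    cur = conn.execute(
        "SELECT 1 FROM word_label WHERE word = ? AND label_id = ?",
        (word, label_id),
    )
    if cur.fetchone() is not None:
        return False
    conn.execute("INSERT INTO word_label VALUES (?, ?)", (word, label_id))
    conn.commit()
    return True


# In-memory database; the schema here is assumed, not taken from the post.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE word_label (word TEXT, label_id INTEGER)")

print(add_label(conn, "cafe", 18))  # first call inserts the pair
print(add_label(conn, "cafe", 18))  # second call finds it and skips
```

Values are passed as query parameters rather than formatted into the SQL string, which is also how the script calls `cur.execute`.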


Summary

Well... I coasted through another day.

Contact me if you need the JSON files.


Acknowledgments

From a working stiff who runs on coffee: 👍 like, 📁 follow, 💬 comment, 💰 tip.


References

[1] DeepSeek and other AI tools


Previous posts

  • None yet; this is my first post
