
wzjs 2025/9/9 5:38:26


🍨 This post is a study log from the 🔗365-Day Deep Learning Training Camp
🍖 Original author: K同学啊

One-hot encoding

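The basic idea of one-hot encoding is to map each category to a vector in which exactly one element is 1 and all the others are 0. Each category is therefore independent of the others, with no implied order or distance between them. For example, three categories can be encoded as follows:
Category 1: [1, 0, 0]
Category 2: [0, 1, 0]
Category 3: [0, 0, 1]
This representation gives the model an unambiguous way to tell the categories apart. In deep learning, the input layer of a neural network often uses one-hot encoding to represent categorical variables. The encoding avoids implying relationships that do not exist, such as an ordering between category IDs, and it provides a clear input representation, which helps the model learn and generalize.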
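The three-category table above can be reproduced with a few lines of plain Python. This is only a minimal sketch: the helper names `one_hot` and `squared_distance` are made up for illustration, and the standard library is used so the snippet runs without PyTorch. It also checks the "no distance relationship" claim by showing that every pair of distinct one-hot vectors is equally far apart.

```python
from itertools import combinations


def one_hot(index, num_classes):
    """Return a list of length num_classes with a single 1 at position index."""
    if not 0 <= index < num_classes:
        raise ValueError("index out of range")
    vector = [0] * num_classes
    vector[index] = 1
    return vector


def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


categories = ["category 1", "category 2", "category 3"]
encoded = {name: one_hot(i, len(categories)) for i, name in enumerate(categories)}
for name, vector in encoded.items():
    print(name, vector)

# Every pair of distinct categories differs in exactly two positions,
# so all pairwise squared distances are the same.
distances = {squared_distance(u, v) for u, v in combinations(encoded.values(), 2)}
print(distances)  # {2}
```

Because all pairwise distances are equal, the encoding gives the model no reason to treat category 1 as "closer" to category 2 than to category 3.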

For example:

John likes to watch movies. Mary likes too.
John also likes to watch football games.
From the text above we can build a dictionary:
{"John": 1, "likes": 2, "to": 3, "watch": 4, "movies": 5, "also": 6, "football": 7, "games": 8, "Mary": 9, "too": 10}
The one-hot representation of each word is then:

John: [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
likes: [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
...and so on for the remaining words.
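As a rough sketch of how these vectors follow from the dictionary, the snippet below looks up each word's index and sets the matching position to 1. The function name `word_to_one_hot` is hypothetical, and the only assumption beyond the text is that the 1-based dictionary index maps to list position `index - 1`.

```python
vocab = {"John": 1, "likes": 2, "to": 3, "watch": 4, "movies": 5,
         "also": 6, "football": 7, "games": 8, "Mary": 9, "too": 10}


def word_to_one_hot(word, vocab):
    """Build the one-hot vector for a word using a 1-based dictionary."""
    vector = [0] * len(vocab)
    # The dictionary counts from 1, Python lists count from 0.
    vector[vocab[word] - 1] = 1
    return vector


for word in ["John", "likes", "too"]:
    print(f"{word}: {word_to_one_hot(word, vocab)}")
```

A word that is missing from the dictionary raises a `KeyError` here; a real pipeline would usually reserve an index for unknown words.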
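Advantages of one-hot encoding:
It addresses the difficulty classifiers have with discrete data, and it can handle non-continuous (categorical) features.
Disadvantages of one-hot encoding:
As a text representation it has some prominent drawbacks. First, text represented with one-hot vectors is effectively a bag-of-words model: it ignores word order and assumes that words are independent of one another, whereas in most cases words influence each other.
Second, the resulting features are discrete and sparse. Each word's one-hot vector has as many dimensions as the vocabulary has entries, so the vectors are very high-dimensional and almost entirely zeros, which raises the computational cost.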
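Both drawbacks are easy to see in a small experiment. The sketch below is illustrative only: the two sentences, the helper `multi_hot`, and the vocabulary size of 50,000 are assumptions chosen for the demonstration, not values from the original post.

```python
def multi_hot(tokens, vocab):
    """Mark every vocabulary entry that occurs in tokens (word order is discarded)."""
    vector = [0] * len(vocab)
    for token in tokens:
        vector[vocab[token]] = 1
    return vector


sentence_a = "John likes Mary".split()
sentence_b = "Mary likes John".split()
vocab = {word: i for i, word in enumerate(sorted(set(sentence_a + sentence_b)))}

vec_a = multi_hot(sentence_a, vocab)
vec_b = multi_hot(sentence_b, vocab)
# Different meanings, identical representation: the order information is gone.
print(vec_a == vec_b)  # True

# Sparsity: with a hypothetical 50,000-word vocabulary, a single word's
# one-hot vector contains one 1 and 49,999 zeros.
vocab_size = 50_000
zero_fraction = (vocab_size - 1) / vocab_size
print(f"fraction of zeros per word vector: {zero_fraction:.5f}")
```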

import torch

# Sample texts
texts = ['Hello, how are you?', 'I am doing well, thank you!', 'Goodbye.']

# Build the vocabulary (set ordering can differ between runs, so the indices may differ too)
word_index = {}
index_word = {}
for i, word in enumerate(set(" ".join(texts).split())):
    word_index[word] = i
    index_word[i] = word

# Convert each text into a sequence of integer indices
sequences = [[word_index[word] for word in text.split()] for text in texts]

# Vocabulary size
vocab_size = len(word_index)

# Convert the integer sequences to one-hot encoding
# (one row per text, with a 1 for every word that occurs in it)
one_hot_results = torch.zeros(len(texts), vocab_size)
for i, seq in enumerate(sequences):
    one_hot_results[i, seq] = 1

# Print the results
print("Vocabulary:")
print(word_index)
print("\nTexts:")
print(texts)
print("\nText sequences:")
print(sequences)
print("\nOne-hot encoding:")
print(one_hot_results)

Output:

Vocabulary:
{'doing': 0, 'you?': 1, 'am': 2, 'thank': 3, 'how': 4, 'are': 5, 'well,': 6, 'you!': 7, 'I': 8, 'Hello,': 9, 'Goodbye.': 10}

Texts:
['Hello, how are you?', 'I am doing well, thank you!', 'Goodbye.']

Text sequences:
[[9, 4, 5, 1], [8, 2, 0, 6, 3, 7], [10]]

One-hot encoding:
tensor([[0., 1., 0., 0., 1., 1., 0., 0., 0., 1., 0.],
        [1., 0., 1., 1., 0., 0., 1., 1., 1., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]])

import torch

# Sample Chinese texts
texts = ['你好，最近怎么样？', '我过得很好，谢谢！', 'K同学啊']

# Build the vocabulary, treating every character as a token
# (set ordering can differ between runs, so the indices may differ too)
word_index = {}
index_word = {}
for i, word in enumerate(set("".join(texts))):
    word_index[word] = i
    index_word[i] = word

# Convert each text into a sequence of integer indices
sequences = [[word_index[word] for word in text] for text in texts]

# Vocabulary size
vocab_size = len(word_index)

# Convert the integer sequences to one-hot encoding
# (one row per text, with a 1 for every character that occurs in it)
one_hot_results = torch.zeros(len(texts), vocab_size)
for i, seq in enumerate(sequences):
    one_hot_results[i, seq] = 1

# Print the results
print("Vocabulary:")
print(word_index)
print("\nTexts:")
print(texts)
print("\nText sequences:")
print(sequences)
print("\nOne-hot encoding:")
print(one_hot_results)

Output:

Vocabulary:
{'好': 0, '很': 1, '啊': 2, '同': 3, '过': 4, '最': 5, '样': 6, '，': 7, '学': 8, '我': 9, 'K': 10, '谢': 11, '怎': 12, '？': 13, '近': 14, '！': 15, '你': 16, '得': 17, '么': 18}

Texts:
['你好，最近怎么样？', '我过得很好，谢谢！', 'K同学啊']

Text sequences:
[[16, 0, 7, 5, 14, 12, 18, 6, 13], [9, 4, 17, 1, 0, 7, 11, 11, 15], [10, 3, 8, 2]]

One-hot encoding:
tensor([[1., 0., 0., 0., 0., 1., 1., 1., 0., 0., 0., 0., 1., 1., 1., 0., 1., 0.,
         1.],
        [1., 1., 0., 0., 1., 0., 0., 1., 0., 1., 0., 1., 0., 0., 0., 1., 0., 1.,
         0.],
        [0., 0., 1., 1., 0., 0., 0., 0., 1., 0., 1., 0., 0., 0., 0., 0., 0., 0.,
         0.]])

import torch
import jieba

# Sample Chinese texts
texts = ['你好，最近怎么样？', '我过得很好，谢谢！', '再见。']

# Segment each text into words with jieba
tokenized_texts = [list(jieba.cut(text)) for text in texts]

# Build the vocabulary (set ordering can differ between runs, so the indices may differ too)
word_index = {}
index_word = {}
for i, word in enumerate(set([word for text in tokenized_texts for word in text])):
    word_index[word] = i
    index_word[i] = word

# Convert each text into a sequence of integer indices
sequences = [[word_index[word] for word in text] for text in tokenized_texts]

# Vocabulary size
vocab_size = len(word_index)

# Convert the integer sequences to one-hot encoding
# (one row per text, with a 1 for every word that occurs in it)
one_hot_results = torch.zeros(len(texts), vocab_size)
for i, seq in enumerate(sequences):
    one_hot_results[i, seq] = 1

# Print the results
print("Vocabulary:")
print(word_index)
print("\nTexts:")
print(texts)
print("\nTokenization result:")
print(tokenized_texts)
print("\nText sequences:")
print(sequences)
print("\nOne-hot encoding:")
print(one_hot_results)

Output:

Building prefix dict from the default dictionary ...
Dumping model to file cache C:\Users\11054\AppData\Local\Temp\jieba.cache
Loading model cost 0.619 seconds.
Prefix dict has been built successfully.

Vocabulary:
{'？': 0, '再见': 1, '！': 2, '怎么样': 3, '好': 4, '得': 5, '很': 6, '。': 7, '你好': 8, '我过': 9, '，': 10, '最近': 11, '谢谢': 12}

Texts:
['你好，最近怎么样？', '我过得很好，谢谢！', '再见。']

Tokenization result:
[['你好', '，', '最近', '怎么样', '？'], ['我过', '得', '很', '好', '，', '谢谢', '！'], ['再见', '。']]

Text sequences:
[[8, 10, 11, 3, 0], [9, 5, 6, 4, 10, 12, 2], [1, 7]]

One-hot encoding:
tensor([[1., 0., 0., 1., 0., 0., 0., 0., 1., 0., 1., 1., 0.],
        [0., 0., 1., 0., 1., 1., 1., 0., 0., 1., 1., 0., 1.],
        [0., 1., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0.]])
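Each row in the scripts above covers a whole text, with a 1 for every token that occurs in it, so word order within the text is not kept. If one vector per token is wanted instead, a possible sketch looks like the following. The helper name `one_hot_sequence` is hypothetical, plain Python lists stand in for tensors, and the sequence `[1, 7]` and vocabulary size 13 are taken from the recorded output for '再见。'.

```python
def one_hot_sequence(sequence, vocab_size):
    """Return one one-hot row per token index, preserving token order."""
    rows = []
    for index in sequence:
        row = [0] * vocab_size
        row[index] = 1
        rows.append(row)
    return rows


sequence = [1, 7]   # indices of '再见' and '。' in the recorded vocabulary
vocab_size = 13
matrix = one_hot_sequence(sequence, vocab_size)
for row in matrix:
    print(row)

# Taking the column-wise maximum collapses the per-token rows back into
# the single per-text row that the scripts above produce.
collapsed = [max(column) for column in zip(*matrix)]
print(collapsed)
```

The per-token matrix has shape (sequence length, vocabulary size) and keeps token order, at the cost of one full vocabulary-sized row per token.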