
[CSP] 202403-1 Word Frequency Statistics

news 2025/7/9 5:15:33

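Contents

Algorithm approach
1. Choosing data structures
2. Input handling
3. Counting the articles a word appears in
4. Output

Code example
Code optimization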

Sample input

4 3
5 1 2 3 2 1
1 1
3 2 2 2
2 3 2

Sample output

2 3
3 6
2 2

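Algorithm approach

1. Choosing data structures

vector<int>: stores the word list of each article (may contain duplicates).
unordered_set<int>: collects the distinct words of each article (duplicates are dropped automatically).
Two counting arrays:
- totalCount[i]: the total number of times word i occurs across all articles.
- articleCount[i]: the number of articles in which word i appears.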

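2. Input handling

Read the number of articles n and the largest word ID m; these fix the loop bound and the size of the counting arrays.
Process the articles one at a time:
- Read the article length l.
- Read l words and store them in the words array.
- Walk through words and add each occurrence to totalCount.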

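3. Counting the articles a word appears in

Deduplicate with a set:
- Insert the words from the words array into an unordered_set, which drops duplicates automatically.
- For every word in the set, increment its articleCount by 1 (each article is counted only once per word).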

4. Output

For each word ID from 1 to m, in order, print its articleCount followed by its totalCount.

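Code example

#include<iostream>
#include<vector>
#include<unordered_set>
using namespace std;

int main(){
	int n,m;//n articles; word IDs range from 1 to m
	cin>>n>>m;
	vector<int> totalCount(m+1,0);//total occurrences of word i across all articles
	vector<int> articleCount(m+1,0);//number of articles containing word i

	//process each article
	for(int i=0;i<n;i++){
		int l;//number of words in the current article
		cin>>l;

		//store all words of the current article
		vector<int> words(l);
		for(int j=0;j<l;++j){
			cin>>words[j];//read one word
			//update the total count: add 1 per occurrence
			totalCount[words[j]]++;
		}

		//collect the distinct words of the current article (the set deduplicates)
		unordered_set<int> seen;
		for(int word:words){
			seen.insert(word);//inserting an existing word has no effect
		}

		//for each distinct word, count this article once
		for(int word:seen){
			articleCount[word]++;//each article counts only once
		}
	}

	//output: one line per word ID, from 1 to m
	for(int i=1;i<=m;++i){
		cout<<articleCount[i]<<" "<<totalCount[i]<<endl;
	}

	return 0;
}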

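Code optimization

Avoid the unnecessary vector

The original code uses vector<int> words(l) to store all the words of each article. The counts can instead be updated directly as each word is read, so no extra storage is needed and memory use drops.

Avoid the extra pass over the set

When counting distinct words, check while reading each word whether it is already in the set, and insert it (and increment articleCount) only on its first appearance in the article. This removes the redundant insertions and the separate loop over the set.

[Code example]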

#include <iostream>
#include <vector>
#include <unordered_set>
using namespace std;

int main() {
    int n, m;
    cin >> n >> m;

    vector<int> totalCount(m + 1, 0);   // total occurrences (1-based)
    vector<int> articleCount(m + 1, 0); // number of articles containing the word (1-based)

    for (int i = 0; i < n; ++i) {
        int l;
        cin >> l;  // read the article length

        unordered_set<int> seen;  // words already seen in this article
        for (int j = 0; j < l; ++j) {
            int word;
            cin >> word;
            totalCount[word]++;  // add to the total count
            if (seen.find(word) == seen.end()) {
                seen.insert(word);
                articleCount[word]++;  // first appearance in this article: count the article
            }
        }
    }

    // output (1-based)
    for (int i = 1; i <= m; ++i) {
        cout << articleCount[i] << " " << totalCount[i] << endl;
    }

    return 0;
}