ClickHouse Official Documentation Study Notes
Table of Contents
- What is ClickHouse?
- Data Replication and Integrity
- Approximate calculation
- Superior query performance
- Quick Start
What is ClickHouse?
ClickHouse® is a high-performance, column-oriented SQL database management system (DBMS) for online analytical processing (OLAP).
Keywords: high-performance, column-oriented, OLAP
Data Replication and Integrity
ClickHouse uses an asynchronous multi-master replication scheme to ensure that data is stored redundantly on multiple nodes. After being written to any available replica, all the remaining replicas retrieve their copy in the background. The system maintains identical data on different replicas. Recovery after most failures is performed automatically, or semi-automatically in complex cases.
This is still the sharding-plus-replica approach, the same as most clustered systems.
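To make the "write to any replica, others catch up in the background" idea concrete, here is a toy Python simulation. It is only a sketch: the `ToyCluster` and `Replica` classes, the shared log, and the part names are all hypothetical and do not reflect how ClickHouse actually implements replication.

```python
class Replica:
    """A toy replica that holds a set of data-part IDs (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.parts = set()


class ToyCluster:
    """Toy model: replicas share a log of inserted parts and fetch from it lazily."""

    def __init__(self, names):
        self.replicas = {name: Replica(name) for name in names}
        self.log = []  # ordered list of part IDs accepted by any replica

    def insert(self, replica_name, part_id):
        # Multi-master: any replica may accept the write.
        # Asynchronous: the other replicas are NOT updated here.
        self.replicas[replica_name].parts.add(part_id)
        self.log.append(part_id)

    def background_sync(self):
        # Later, every replica fetches whatever parts it is missing.
        for replica in self.replicas.values():
            for part_id in self.log:
                replica.parts.add(part_id)


cluster = ToyCluster(["r1", "r2", "r3"])
cluster.insert("r1", "part_1")
cluster.insert("r2", "part_2")

# Snapshot before and after the background catch-up.
before = {name: sorted(r.parts) for name, r in cluster.replicas.items()}
cluster.background_sync()
after = {name: sorted(r.parts) for name, r in cluster.replicas.items()}
```

Before the sync, each replica only holds what was written to it directly; after the sync, all three hold identical data, which is the property the quoted paragraph describes.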
Approximate calculation
ClickHouse provides ways to trade accuracy for performance. For example, some of its aggregate functions calculate the distinct value count, the median, and quantiles approximately. Also, queries can be run on a sample of the data to compute an approximate result quickly. Finally, aggregations can be run with a limited number of keys instead of for all keys. Depending on how skewed the distribution of the keys is, this can provide a reasonably accurate result that uses far fewer resources than an exact calculation.
Approximate and sampled computation really is something new.
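The accuracy-for-performance trade can be illustrated outside ClickHouse with a small Python sketch that estimates a median from a random sample instead of the full data set. The function name `approx_median`, the synthetic data, and the 1% sample size are assumptions chosen for illustration; this is not ClickHouse's algorithm.

```python
import random
import statistics


def approx_median(values, sample_size, seed=0):
    """Estimate the median from a uniform random sample of the values."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    sample = rng.sample(values, min(sample_size, len(values)))
    return statistics.median(sample)


# Synthetic data: 100,000 values spread evenly over 0..999.
data = [i % 1000 for i in range(100_000)]

exact = statistics.median(data)                    # looks at every value
approx = approx_median(data, sample_size=1_000)    # looks at 1% of the values
error = abs(approx - exact)
```

The sampled estimate touches only a fraction of the data, and for a distribution like this one it typically lands close to the exact answer. How close depends on the sample size and on how skewed the data is.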
Superior query performance
ClickHouse is well known for having extremely fast query performance. To learn why ClickHouse is so fast, see the Why is ClickHouse fast? guide.
Here is the biggest selling point: it is fast.
Quick Start
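Skipped.
The only thing worth noting is probably to use bulk inserts wherever possible. Data can first be written to a local Postgres instance or to S3 and then imported in one go.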
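A minimal Python sketch of the batching idea: group rows into large chunks and issue one insert per chunk rather than one per row. The `insert_batch` callback is a hypothetical stand-in for whatever client call actually sends the rows, and the batch size of 10,000 is an arbitrary assumption.

```python
from itertools import islice


def chunked(rows, batch_size):
    """Yield successive lists of at most batch_size rows."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def bulk_load(rows, insert_batch, batch_size=10_000):
    """Send rows in large batches; return the number of insert calls made."""
    calls = 0
    for batch in chunked(rows, batch_size):
        insert_batch(batch)  # hypothetical: one INSERT statement per batch
        calls += 1
    return calls


# Usage: collect batches in a list instead of talking to a real database.
received = []
calls = bulk_load(range(25_000), received.append, batch_size=10_000)
batch_sizes = [len(batch) for batch in received]
```

Here 25,000 rows result in three insert calls instead of 25,000, which is the point of staging data first and loading it in bulk.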
Done on 2025-06-22