Deploy StarRocks with Docker
Official documentation: Deploy StarRocks with Docker | StarRocks
If the image pull stalls at "Downloading", stop it and start it again.
# Start StarRocks
docker run -p 9030:9030 -p 8030:8030 -p 8040:8040 -itd --name quickstart starrocks/allin1-ubuntu
# Download the datasets
curl -O https://raw.githubusercontent.com/StarRocks/demo/master/documentation-samples/quickstart/datasets/72505394728.csv
curl -O https://raw.githubusercontent.com/StarRocks/demo/master/documentation-samples/quickstart/datasets/NYPD_Crash_Data.csv
# Connect with the MySQL client
docker exec -it quickstart mysql -P 9030 -h 127.0.0.1 -u root --prompt="StarRocks > "
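## About --prompt: an option of the MySQL client that customizes the command-line prompt.
Connection succeeded:
Alternatively, connect with any MySQL client tool as user root; the password is empty.
Create the database and tables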
CREATE DATABASE IF NOT EXISTS quickstart;
USE quickstart;
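Error: curl: (3) URL using bad/illegal format or missing URL
Cause: the command was entered interactively in PowerShell, one line at a time, which easily causes curl to fail to parse its arguments. When a command is typed line by line, PowerShell runs each line as soon as it is entered, so curl.exe receives only the fragment it has "seen" so far instead of the complete command.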
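If the failing command was pasted as several lines ending in `\` (the POSIX shell continuation character, which PowerShell does not treat as a continuation), one option is to collapse it into a single line before pasting. The following is a minimal sketch under that assumption; `join_continuation_lines` is a hypothetical helper, and the sample command is shortened for illustration.

```python
def join_continuation_lines(command: str) -> str:
    """Collapse a backslash-continued multi-line command into a single line."""
    parts = []
    for line in command.splitlines():
        piece = line.strip()
        if piece.endswith("\\"):
            piece = piece[:-1].rstrip()  # drop the trailing continuation backslash
        if piece:
            parts.append(piece)
    return " ".join(parts)


# Shortened sample command, for illustration only.
multi_line = (
    "curl --location-trusted -u root \\\n"
    "    -T ./NYPD_Crash_Data.csv \\\n"
    '    -H "label:crashdata-0"'
)
print(join_continuation_lines(multi_line))
```

In Windows PowerShell, `curl` can resolve to an alias for `Invoke-WebRequest`, so it may be necessary to call `curl.exe` explicitly. Headers that contain escaped double quotes may still need their quoting adjusted by hand.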
Workaround: load the data with a Python script
import requests
from requests.auth import HTTPBasicAuth
import os

# Configuration
STARROCKS_URL = "http://localhost:8030/api/quickstart/crashdata/_stream_load"
CSV_FILE_PATH = "./NYPD_Crash_Data.csv"

HEADERS = {
    "label": "crashdata-0",
    "column_separator": ",",
    "skip_header": "1",
    "enclose": '"',
    "max_filter_ratio": "1",
    # Adjacent string literals are concatenated into one header value.
    "columns": (
        "tmp_CRASH_DATE, tmp_CRASH_TIME, "
        "CRASH_DATE=str_to_date(concat_ws(' ', tmp_CRASH_DATE, tmp_CRASH_TIME), '%m/%d/%Y %H:%i'),"
        "BOROUGH,ZIP_CODE,LATITUDE,LONGITUDE,LOCATION,"
        "ON_STREET_NAME,CROSS_STREET_NAME,OFF_STREET_NAME,"
        "NUMBER_OF_PERSONS_INJURED,NUMBER_OF_PERSONS_KILLED,"
        "NUMBER_OF_PEDESTRIANS_INJURED,NUMBER_OF_PEDESTRIANS_KILLED,"
        "NUMBER_OF_CYCLIST_INJURED,NUMBER_OF_CYCLIST_KILLED,"
        "NUMBER_OF_MOTORIST_INJURED,NUMBER_OF_MOTORIST_KILLED,"
        "CONTRIBUTING_FACTOR_VEHICLE_1,CONTRIBUTING_FACTOR_VEHICLE_2,"
        "CONTRIBUTING_FACTOR_VEHICLE_3,CONTRIBUTING_FACTOR_VEHICLE_4,"
        "CONTRIBUTING_FACTOR_VEHICLE_5,COLLISION_ID,"
        "VEHICLE_TYPE_CODE_1,VEHICLE_TYPE_CODE_2,VEHICLE_TYPE_CODE_3,"
        "VEHICLE_TYPE_CODE_4,VEHICLE_TYPE_CODE_5"
    ),
    "Expect": "100-continue"
}

USER = "root"
PASSWORD = ""  # Fill in if a password has been set (e.g. 'your_password')


def upload_to_starrocks():
    if not os.path.exists(CSV_FILE_PATH):
        print(f"❌ File {CSV_FILE_PATH} does not exist")
        return

    print("⏳ Uploading file...")
    with open(CSV_FILE_PATH, "rb") as f:
        try:
            response = requests.put(
                STARROCKS_URL,
                auth=HTTPBasicAuth(USER, PASSWORD),
                headers=HEADERS,
                data=f,
                timeout=6000  # Maximum wait time, in seconds
            )
        except requests.exceptions.Timeout:
            print("❌ Request timed out; check the network and whether StarRocks is running")
            return
        except Exception as e:
            print(f"❌ An exception occurred: {e}")
            return

    print("✅ Response status code:", response.status_code)
    try:
        print("📄 Response body:\n", response.json())
    except Exception:
        print("📄 Raw response body:\n", response.text)


if __name__ == "__main__":
    upload_to_starrocks()
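One thing to watch when re-running a load: the label is fixed at `crashdata-0`, and StarRocks rejects a Stream Load whose label was already used by a successful job in the same database. A minimal sketch of generating a fresh label per run follows; `make_load_label` is a hypothetical helper, not part of the script above or of any StarRocks client.

```python
import uuid
from datetime import datetime, timezone


def make_load_label(prefix: str) -> str:
    """Return a label of the form '<prefix>-<UTC timestamp>-<random hex>'."""
    timestamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S")
    suffix = uuid.uuid4().hex[:8]  # random part so labels made in the same second still differ
    return f"{prefix}-{timestamp}-{suffix}"


print(make_load_label("crashdata"))
```

With the script above, this could be applied as `HEADERS["label"] = make_load_label("crashdata")` before calling `upload_to_starrocks()`.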
The Python script turned out to be very slow. Copying the file into the container and running the load from there succeeded:
docker cp ../weather/output/isd_lite_2021_china_with_station_info.csv quickstart:/data/tmp
root@46b4bd1c3a6a:/data/tmp# curl --location-trusted -u root \
> -T ./isd_lite_2021_china_with_station_info.csv \
> -H "label:gz-weather-0" \
> -H "column_separator:," \
> -H "skip_header:1" \
> -H "enclose:\"" \
> -H "max_filter_ratio:1" \
> -H "columns:year,month,day,hour,temp,dew_point,slp,wind_dir,wind_speed,sky_cover,precip_1hr,precip_6hr,station_id,station_name,country,latitude,longitude,elevation,datetime" \
Successfully imported 20 million rows, and the load was extremely fast.
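Because the loads above set `max_filter_ratio` to 1, a job can report success even when many rows were filtered out, so it is worth checking the row counts in the response. The sketch below assumes the Stream Load JSON response carries `Status`, `NumberLoadedRows`, and `NumberFilteredRows` fields; `summarize_stream_load` is a hypothetical helper, and the sample values are made up for illustration rather than taken from the load described here.

```python
def summarize_stream_load(result: dict) -> str:
    """Summarize a Stream Load response; the field names are assumptions."""
    status = result.get("Status", "unknown")
    loaded = int(result.get("NumberLoadedRows", 0))
    filtered = int(result.get("NumberFilteredRows", 0))
    total = loaded + filtered
    ratio = filtered / total if total else 0.0  # avoid dividing by zero on an empty response
    return f"status={status} loaded={loaded} filtered={filtered} filtered_ratio={ratio:.2%}"


# Made-up sample response, for illustration only.
sample = {"Status": "Success", "NumberLoadedRows": 90, "NumberFilteredRows": 10}
print(summarize_stream_load(sample))
```

In the Python script, the same helper could be given `response.json()`; with curl, the JSON printed to the terminal would need to be saved and parsed first.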