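Advanced SQL: From Basic Syntax to Practical Techniques
1. Introduction to SQL
Definition and Purpose
Structured Query Language (SQL) is the standard language for managing relational database systems. It was first developed at IBM in the 1970s and has since been standardized by both ANSI and ISO. Beyond querying data, SQL covers data definition, data manipulation, and access control, which makes it an essential skill for database administrators and developers.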
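Common Database Management Systems
- Open-source databases:
  - MySQL: one of the most popular open-source relational databases, widely used in web applications
  - PostgreSQL: a powerful open-source object-relational database with support for JSON and geospatial data
  - MariaDB: a fork of MySQL that stays highly compatible with it
- Commercial databases:
  - Oracle Database: an enterprise-grade database, feature-rich but expensive
  - Microsoft SQL Server: a common choice on Windows, tightly integrated with the .NET ecosystem
  - IBM Db2: aimed at large enterprise applications, with support for several data models
- Embedded databases:
  - SQLite: a lightweight, zero-configuration database, widely used in mobile apps and for local storage
- Cloud databases:
  - AWS RDS: Amazon's managed relational database service
  - Google Cloud SQL: a fully managed database service on Google Cloud Platform
  - Azure SQL Database: Microsoft Azure's PaaS database offering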
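Categories of SQL Statements
- Data Definition Language (DDL): defines and manages database objects
  - CREATE: create a database object
  - ALTER: change an object's structure
  - DROP: remove an object
  - Example:
    CREATE TABLE employees (id INT PRIMARY KEY, name VARCHAR(100));
- Data Manipulation Language (DML): works with the data itself
  - SELECT: query data
  - INSERT: add data
  - UPDATE: modify data
  - DELETE: remove data
  - Example:
    UPDATE employees SET salary = 5000 WHERE id = 101;
- Data Control Language (DCL): controls privileges
  - GRANT: grant a privilege
  - REVOKE: revoke a privilege
  - Example:
    GRANT SELECT ON employees TO user1;
- Transaction Control Language (TCL): manages transactions
  - COMMIT: commit a transaction
  - ROLLBACK: roll back a transaction
  - SAVEPOINT: set a savepoint
  - Example:
    BEGIN TRANSACTION; ... COMMIT;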
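To see the categories working together, here is a minimal sketch that runs one DDL statement, two DML statements inside a TCL-managed transaction, and a query. It assumes Python's built-in sqlite3 module with an in-memory database purely to stay self-contained. SQLite has no GRANT or REVOKE, so DCL is left out, and the sample row and the salary column are made up for illustration.

```python
import sqlite3

# In-memory SQLite database; isolation_level=None means we issue BEGIN/COMMIT ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)

# DDL: define a table (salary column added for the UPDATE below).
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, salary REAL)")

# TCL + DML: change data inside an explicit transaction.
conn.execute("BEGIN")
conn.execute("INSERT INTO employees (id, name, salary) VALUES (101, 'Zhang San', 4000)")
conn.execute("UPDATE employees SET salary = 5000 WHERE id = 101")
conn.execute("COMMIT")

# DML (query): read the committed row back.
row = conn.execute("SELECT id, name, salary FROM employees WHERE id = 101").fetchone()
print(row)
```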
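2. Data Definition Language (DDL)
The CREATE Command in Detail
Creating a database (MySQL syntax):
CREATE DATABASE company_db
CHARACTER SET utf8mb4
COLLATE utf8mb4_unicode_ci;
Parameters: CHARACTER SET sets the character set; COLLATE sets the collation, that is, the comparison and sort rules.
Creating a table (MySQL syntax; the departments table must already exist):
CREATE TABLE employees (
    emp_id INT PRIMARY KEY AUTO_INCREMENT,
    name VARCHAR(50) NOT NULL,
    email VARCHAR(100) UNIQUE,
    hire_date DATE DEFAULT (CURRENT_DATE),   -- expression defaults need parentheses (MySQL 8.0.13+)
    salary DECIMAL(10,2) CHECK (salary > 0), -- CHECK is enforced from MySQL 8.0.16
    dept_id INT,
    CONSTRAINT fk_dept FOREIGN KEY (dept_id) REFERENCES departments(dept_id)
);
Constraint types: PRIMARY KEY, FOREIGN KEY, UNIQUE, CHECK, NOT NULL, DEFAULT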
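The effect of these constraints is easiest to see by trying to break them. The sketch below is an illustration under assumptions, not the article's schema: it recreates a trimmed employees table in SQLite's dialect (via Python's sqlite3 module) and reports which sample rows the database accepts. The helper name try_insert and the sample data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite dialect of the employees table, trimmed to the constrained columns.
conn.execute("""
    CREATE TABLE employees (
        emp_id INTEGER PRIMARY KEY,            -- auto-assigned when omitted
        name   TEXT NOT NULL,
        email  TEXT UNIQUE,
        salary NUMERIC CHECK (salary > 0)
    )
""")

def try_insert(name, email, salary):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO employees (name, email, salary) VALUES (?, ?, ?)",
            (name, email, salary),
        )
        return True
    except sqlite3.IntegrityError:
        return False

results = {
    "valid row": try_insert("Ann", "ann@example.com", 5000),
    "NULL name": try_insert(None, "bob@example.com", 5000),         # violates NOT NULL
    "duplicate email": try_insert("Cid", "ann@example.com", 5000),  # violates UNIQUE
    "negative salary": try_insert("Dee", "dee@example.com", -1),    # violates CHECK
}
print(results)
```

Only the first row should be stored; the other three are rejected by the database itself, whatever the application code does.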
Creating indexes:
-- Regular index
CREATE INDEX idx_name ON employees(name);
-- Unique index
CREATE UNIQUE INDEX idx_email ON employees(email);
-- Composite index
CREATE INDEX idx_dept_salary ON employees(dept_id, salary DESC);
Creating a view:
CREATE VIEW emp_dept_view AS
SELECT e.emp_id, e.name, d.dept_name, e.salary
FROM employees e
JOIN departments d ON e.dept_id = d.dept_id
WHERE e.salary > 5000;
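Common Uses of ALTER
Adding a column (AFTER is a MySQL extension):
ALTER TABLE employees ADD COLUMN phone VARCHAR(20) AFTER email;
Changing a column definition (MySQL syntax):
ALTER TABLE employees MODIFY COLUMN salary DECIMAL(12,2) NOT NULL;
Renaming a table:
ALTER TABLE old_emp RENAME TO employees;
Things to Watch for with DROP
-- Drop a table only if it exists
DROP TABLE IF EXISTS temp_employees;
-- Oracle syntax: also drops the foreign key constraints in other tables that reference this table
DROP TABLE departments CASCADE CONSTRAINTS;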
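TRUNCATE vs. DELETE
Feature | TRUNCATE | DELETE |
---|---|---|
Logging | Minimal (individual rows are not logged) | Every deleted row is logged |
Triggers | DELETE triggers do not fire | DELETE triggers fire |
Auto-increment ID | Reset in MySQL and SQL Server; PostgreSQL needs RESTART IDENTITY | Not reset |
Rollback | Depends on the DBMS: implicit commit in MySQL and Oracle, can be rolled back inside a transaction in PostgreSQL and SQL Server | Can be rolled back |
Performance | Faster | Slower |
Conditional deletion | Not supported | Supported via a WHERE clause |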
3. Data Manipulation Language (DML)
Advanced SELECT Queries
Basic queries:
-- Select specific columns
SELECT emp_id, name, salary FROM employees;
-- Use column aliases
SELECT emp_id AS "Employee ID", name "Full Name" FROM employees;
Filtering with conditions:
-- Comparison operators
SELECT * FROM employees WHERE salary > 5000;
-- Logical operators
SELECT * FROM employees
WHERE dept_id = 10 AND salary BETWEEN 4000 AND 8000;
-- NULL handling
SELECT * FROM employees WHERE phone IS NOT NULL;
Sorting and pagination:
-- Sort by multiple columns
SELECT * FROM employees
ORDER BY dept_id ASC, salary DESC;
-- Paginated query (ORDER BY keeps the page order stable)
SELECT * FROM employees
ORDER BY emp_id
LIMIT 10 OFFSET 20; -- page 3 at 10 rows per page
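The OFFSET for a given page is (page - 1) * page_size, which is how "page 3 at 10 rows per page" becomes OFFSET 20. A minimal sketch of that arithmetic, assuming Python's sqlite3 module and 35 made-up rows; fetch_page is a hypothetical helper name:

```python
import sqlite3

def fetch_page(conn, page, page_size=10):
    """Return the rows of a 1-based page, ordered by emp_id so pages are stable."""
    offset = (page - 1) * page_size
    return conn.execute(
        "SELECT emp_id, name FROM employees ORDER BY emp_id LIMIT ? OFFSET ?",
        (page_size, offset),
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (emp_id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO employees (emp_id, name) VALUES (?, ?)",
    [(i, f"emp{i}") for i in range(1, 36)],  # 35 sample rows
)

page3 = fetch_page(conn, 3)   # rows 21..30
page4 = fetch_page(conn, 4)   # the last, partial page
print(page3[0], page3[-1], len(page4))
```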
INSERT Variants
Bulk insert:
INSERT INTO employees (name, email, dept_id) VALUES
('张三', 'zhang@example.com', 10),
('李四', 'li@example.com', 20),
('王五', 'wang@example.com', 10);
Inserting the result of a query:
INSERT INTO manager_employees (emp_id, name, salary)
SELECT emp_id, name, salary FROM employees
WHERE dept_id = 10 AND salary > 7000;
Handling conflicts:
-- MySQL syntax (VALUES() in this position is deprecated as of MySQL 8.0.20 in favor of a row alias)
INSERT INTO employees (emp_id, name)
VALUES (101, '张三')
ON DUPLICATE KEY UPDATE name = VALUES(name);
-- PostgreSQL syntax
INSERT INTO employees (emp_id, name)
VALUES (101, '张三')
ON CONFLICT (emp_id) DO UPDATE SET name = EXCLUDED.name;
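The upsert behavior can be checked end to end with SQLite, which accepts the same ON CONFLICT ... DO UPDATE form as PostgreSQL (SQLite 3.24 or later is assumed). This is a sketch with made-up names, run through Python's sqlite3 module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (emp_id INTEGER PRIMARY KEY, name TEXT)")

# excluded.name refers to the value the failed INSERT tried to write.
upsert = """
    INSERT INTO employees (emp_id, name) VALUES (?, ?)
    ON CONFLICT (emp_id) DO UPDATE SET name = excluded.name
"""
conn.execute(upsert, (101, "Zhang San"))      # no conflict: plain insert
conn.execute(upsert, (101, "Zhang San Jr."))  # conflict on emp_id: update in place

rows = conn.execute("SELECT emp_id, name FROM employees").fetchall()
print(rows)
```

The table ends up with a single row for emp_id 101 carrying the second name, rather than a duplicate-key error.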
Advanced UPDATE Usage
Conditional update with CASE:
UPDATE employees
SET salary = CASE
    WHEN performance = 'A' THEN salary * 1.2
    WHEN performance = 'B' THEN salary * 1.1
    ELSE salary * 1.05
END
WHERE dept_id = 10;
Multi-table update:
-- MySQL syntax
UPDATE employees e, departments d
SET e.salary = e.salary * 1.1
WHERE e.dept_id = d.dept_id AND d.location = '北京';
-- Standard SQL syntax
UPDATE employees
SET salary = salary * 1.1
WHERE dept_id IN (SELECT dept_id FROM departments WHERE location = '北京');
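Things to Watch for with DELETE
Safe deletion practices:
- Run a SELECT first to verify the delete condition
- Wrap the delete in a transaction so it can be rolled back
- Consider the effect of foreign key constraints
BEGIN TRANSACTION;
-- Check the affected rows first
SELECT * FROM employees WHERE hire_date < '2000-01-01';
-- Run the delete once the result looks right
DELETE FROM employees WHERE hire_date < '2000-01-01';
-- Roll back instead of committing if anything looks wrong
-- ROLLBACK;
COMMIT;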
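The same verify-then-commit routine can be automated: delete inside a transaction, compare the affected row count with what was expected, and roll back on a mismatch. The sketch below assumes Python's sqlite3 module; delete_hired_before and its expected_count parameter are hypothetical names, and the three rows are sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # manual transaction control
conn.execute("CREATE TABLE employees (emp_id INTEGER PRIMARY KEY, hire_date TEXT)")
conn.executemany(
    "INSERT INTO employees (emp_id, hire_date) VALUES (?, ?)",
    [(1, "1998-05-01"), (2, "1999-12-31"), (3, "2005-03-15")],
)

def delete_hired_before(conn, cutoff, expected_count):
    """Delete rows hired before `cutoff`; keep the change only if exactly
    `expected_count` rows were affected, otherwise roll back."""
    conn.execute("BEGIN")
    deleted = conn.execute(
        "DELETE FROM employees WHERE hire_date < ?", (cutoff,)
    ).rowcount
    if deleted != expected_count:
        conn.execute("ROLLBACK")  # unexpected row count: undo everything
        return False
    conn.execute("COMMIT")
    return True

def count_rows(conn):
    return conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]

wrong_guess = delete_hired_before(conn, "2000-01-01", expected_count=5)
after_rollback = count_rows(conn)   # nothing was deleted
right_guess = delete_hired_before(conn, "2000-01-01", expected_count=2)
after_commit = count_rows(conn)     # the two pre-2000 rows are gone
print(wrong_guess, after_rollback, right_guess, after_commit)
```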
4. Advanced Data Querying
Multi-Table Joins in Practice
INNER JOIN:
SELECT e.name, d.dept_name
FROM employees e
INNER JOIN departments d ON e.dept_id = d.dept_id;
LEFT JOIN:
SELECT d.dept_name, COUNT(e.emp_id) AS emp_count
FROM departments d
LEFT JOIN employees e ON d.dept_id = e.dept_id
GROUP BY d.dept_name;
Self-join (each employee with their manager):
SELECT e.name AS employee, m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.emp_id;
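The difference between the three joins shows up clearly on a tiny data set. The sketch below runs the same queries (with an ORDER BY added so the output is deterministic) against three made-up departments and three made-up employees, using Python's sqlite3 module as an assumption. Note the department with no staff and the employee with no manager.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE employees (
        emp_id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER, manager_id INTEGER
    );
    INSERT INTO departments VALUES (10, 'Engineering'), (20, 'Sales'), (30, 'Legal');
    INSERT INTO employees VALUES
        (1, 'Ann', 10, NULL), (2, 'Bob', 10, 1), (3, 'Cid', 20, 1);
""")

# INNER JOIN: only employees with a matching department.
inner_rows = conn.execute("""
    SELECT e.name, d.dept_name
    FROM employees e
    INNER JOIN departments d ON e.dept_id = d.dept_id
    ORDER BY e.emp_id
""").fetchall()

# LEFT JOIN: every department, including the one with no employees (count 0).
dept_counts = conn.execute("""
    SELECT d.dept_name, COUNT(e.emp_id) AS emp_count
    FROM departments d
    LEFT JOIN employees e ON d.dept_id = e.dept_id
    GROUP BY d.dept_name
    ORDER BY d.dept_name
""").fetchall()

# Self-join: the LEFT JOIN keeps the employee who has no manager.
managers = conn.execute("""
    SELECT e.name AS employee, m.name AS manager
    FROM employees e
    LEFT JOIN employees m ON e.manager_id = m.emp_id
    ORDER BY e.emp_id
""").fetchall()

print(inner_rows, dept_counts, managers, sep="\n")
```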
Using Subqueries
Scalar subquery:
SELECT name, salary, (SELECT AVG(salary) FROM employees) AS avg_salary
FROM employees;
EXISTS subquery:
SELECT d.dept_name
FROM departments d
WHERE EXISTS (
    SELECT 1 FROM employees e
    WHERE e.dept_id = d.dept_id AND e.salary > 8000
);
Aggregate Functions and GROUP BY
Basic aggregation:
SELECT dept_id,
       COUNT(*) AS emp_count,
       AVG(salary) AS avg_salary,
       MAX(salary) AS max_salary
FROM employees
GROUP BY dept_id;
Filtering groups with HAVING:
SELECT dept_id, AVG(salary) AS avg_salary
FROM employees
GROUP BY dept_id
HAVING AVG(salary) > 5000;
Window Functions in Detail
ROW_NUMBER():
SELECT name, salary,
       ROW_NUMBER() OVER (ORDER BY salary DESC) AS salary_rank
FROM employees;
DENSE_RANK():
SELECT dept_id, name, salary,
       DENSE_RANK() OVER (PARTITION BY dept_id ORDER BY salary DESC) AS dept_rank
FROM employees;
Running totals:
SELECT name, hire_date, salary,
       SUM(salary) OVER (ORDER BY hire_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
FROM employees;
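How ROW_NUMBER() and DENSE_RANK() treat ties is the detail that most often surprises people, so the sketch below puts them side by side with a running total on four made-up rows, two of which share a salary. It assumes Python's sqlite3 module backed by SQLite 3.25 or later (the first version with window functions). The name tie-breaker in the ORDER BY is an addition that makes the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Ann", 9000), ("Bob", 8000), ("Cid", 8000), ("Dee", 7000)],
)

rows = conn.execute("""
    SELECT name, salary,
           ROW_NUMBER() OVER (ORDER BY salary DESC, name) AS row_num,
           DENSE_RANK() OVER (ORDER BY salary DESC)       AS dense_rnk,
           SUM(salary)  OVER (ORDER BY salary DESC, name
                              ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
    FROM employees
    ORDER BY salary DESC, name
""").fetchall()

for row in rows:
    print(row)
```

Bob and Cid get different row numbers but the same dense rank, and the rank after the tie continues without a gap.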
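5. Data Control Language (DCL)
Managing Privileges in Practice
Granting table privileges:
-- Grant query access
GRANT SELECT ON employees TO user1;
-- Grant access to specific columns
GRANT SELECT (name, email) ON employees TO user2;
-- Grant all privileges
GRANT ALL PRIVILEGES ON employees TO db_admin;
Role management:
-- Create a role
CREATE ROLE sales_team;
-- Grant privileges to the role (MySQL syntax: every table in the sales schema)
GRANT SELECT, INSERT ON sales.* TO sales_team;
-- Grant the role to a user
GRANT sales_team TO user3;
Revoking Privileges
-- Revoke a specific privilege
REVOKE INSERT ON employees FROM user1;
-- Cascading revoke: removes user2's right to grant SELECT, along with any grants user2 passed on
REVOKE GRANT OPTION FOR SELECT ON departments FROM user2 CASCADE;
-- Revoke all privileges
REVOKE ALL PRIVILEGES ON employees FROM user3;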
6. Transaction Control Language (TCL)
Transaction Management
Basic transaction:
BEGIN TRANSACTION;
INSERT INTO orders (order_id, customer_id) VALUES (1001, 101);
UPDATE inventory SET quantity = quantity - 1 WHERE product_id = 200;
COMMIT;
Using savepoints:
BEGIN;
INSERT INTO log_entries (message) VALUES ('Transaction started');
SAVEPOINT step1;
-- Perform some operations
SAVEPOINT step2;
-- If something goes wrong
ROLLBACK TO SAVEPOINT step1;
-- Carry on with the rest of the transaction
COMMIT;
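A savepoint undoes only the work done after it, while everything before it stays part of the transaction. The following sketch, which assumes Python's sqlite3 module and made-up log messages, shows which inserts survive a ROLLBACK TO SAVEPOINT:

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # manual transaction control
conn.execute("CREATE TABLE log_entries (message TEXT)")

conn.execute("BEGIN")
conn.execute("INSERT INTO log_entries (message) VALUES ('Transaction started')")
conn.execute("SAVEPOINT step1")
conn.execute("INSERT INTO log_entries (message) VALUES ('risky work')")
conn.execute("ROLLBACK TO SAVEPOINT step1")  # undoes only the work after step1
conn.execute("INSERT INTO log_entries (message) VALUES ('retry succeeded')")
conn.execute("COMMIT")

messages = [m for (m,) in conn.execute("SELECT message FROM log_entries ORDER BY rowid")]
print(messages)
```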
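Isolation Levels
The four standard isolation levels (support and defaults vary by database):
- READ UNCOMMITTED: may read uncommitted data (dirty reads)
- READ COMMITTED (the default in PostgreSQL, Oracle, and SQL Server): reads only committed data
- REPEATABLE READ (the default in MySQL InnoDB): repeated reads of the same rows within one transaction return the same result
- SERIALIZABLE: the strictest level; the outcome is equivalent to running the transactions one after another
Setting the isolation level:
-- MySQL
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
-- SQL Server
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;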
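7. Advanced SQL Features
Index Optimization Guide
Creating efficient indexes:
-- Covering index (INCLUDE is supported by SQL Server and PostgreSQL 11+, not by MySQL)
CREATE INDEX idx_emp_dept ON employees (dept_id, salary) INCLUDE (name);
-- Expression index (PostgreSQL)
CREATE INDEX idx_name_lower ON employees (LOWER(name));
-- Partial index (PostgreSQL; SQL Server calls this a filtered index)
CREATE INDEX idx_high_salary ON employees (salary) WHERE salary > 8000;
Index usage principles:
- Create indexes for frequently used query conditions
- Design composite indexes around the leftmost-prefix rule
- Avoid over-indexing (every index adds write overhead)
- Review index usage regularly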
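The leftmost-prefix rule can be observed directly in a query plan. The sketch below assumes SQLite (through Python's sqlite3 module) and its EXPLAIN QUERY PLAN statement; other databases word their plans differently, and the plan helper is a hypothetical convenience. A composite index on (dept_id, salary) is used when the filter includes dept_id and ignored when the filter starts at salary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (emp_id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER, salary INTEGER)"
)
conn.execute("CREATE INDEX idx_dept_salary ON employees (dept_id, salary)")

def plan(sql):
    """Return SQLite's query plan for `sql` as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the plan text

# The filter starts at the leftmost index column: the index can be searched.
uses_prefix = plan("SELECT * FROM employees WHERE dept_id = 10 AND salary > 5000")
# The filter skips dept_id: the composite index does not help.
skips_prefix = plan("SELECT * FROM employees WHERE salary > 5000")

print(uses_prefix)
print(skips_prefix)
```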
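Developing Stored Procedures
-- MySQL syntax; in the mysql client, change the statement delimiter (DELIMITER //) around the definition
CREATE PROCEDURE increase_salary(
    IN p_dept_id INT,        -- prefixed so it cannot shadow the dept_id column
    IN rate DECIMAL(5,2),    -- raise in percent, e.g. 10 for 10%
    OUT affected_rows INT
)
BEGIN
    -- On any SQL error, undo the transaction and report -1
    DECLARE EXIT HANDLER FOR SQLEXCEPTION
    BEGIN
        ROLLBACK;
        SET affected_rows = -1;
    END;

    START TRANSACTION;
    UPDATE employees
    SET salary = salary * (1 + rate / 100)
    WHERE dept_id = p_dept_id;
    SET affected_rows = ROW_COUNT();
    COMMIT;
END;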
Using Triggers
-- MySQL syntax: record every update to an employee row in an audit table
CREATE TRIGGER update_employee_audit
AFTER UPDATE ON employees
FOR EACH ROW
BEGIN
    INSERT INTO employee_audit (emp_id, changed_at, old_salary, new_salary, changed_by)
    VALUES (NEW.emp_id, NOW(), OLD.salary, NEW.salary, CURRENT_USER());
END;
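To watch an audit trigger fire, the sketch below ports the idea to SQLite's dialect and runs it through Python's sqlite3 module. The port is an assumption made for runnability: datetime('now') stands in for NOW(), and the changed_by column is dropped because SQLite has no user accounts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (emp_id INTEGER PRIMARY KEY, name TEXT, salary INTEGER);
    CREATE TABLE employee_audit (
        emp_id INTEGER, changed_at TEXT, old_salary INTEGER, new_salary INTEGER
    );
    CREATE TRIGGER update_employee_audit
    AFTER UPDATE ON employees
    FOR EACH ROW
    BEGIN
        INSERT INTO employee_audit (emp_id, changed_at, old_salary, new_salary)
        VALUES (NEW.emp_id, datetime('now'), OLD.salary, NEW.salary);
    END;
    INSERT INTO employees VALUES (1, 'Ann', 5000);
""")

# The application only issues the UPDATE; the trigger writes the audit row.
conn.execute("UPDATE employees SET salary = 6000 WHERE emp_id = 1")

audit = conn.execute(
    "SELECT emp_id, old_salary, new_salary FROM employee_audit"
).fetchall()
print(audit)
```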
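8. SQL Performance Optimization
Query Optimization Techniques
Avoiding full table scans:
-- Poor: wrapping the column in a function prevents use of an index on hire_date
SELECT * FROM employees WHERE YEAR(hire_date) = 2020;
-- Better: a half-open range on the bare column, which also stays correct if hire_date carries a time part
SELECT * FROM employees
WHERE hire_date >= '2020-01-01' AND hire_date < '2021-01-01';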
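The effect of wrapping an indexed column in a function can be checked in a query plan. The sketch below assumes SQLite via Python's sqlite3 module, so strftime('%Y', ...) stands in for MySQL's YEAR(), and plan is a hypothetical helper. The function-wrapped filter falls back to a scan, while the range filter searches the index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (emp_id INTEGER PRIMARY KEY, name TEXT, hire_date TEXT)")
conn.execute("CREATE INDEX idx_hire_date ON employees (hire_date)")

def plan(sql):
    """Return SQLite's query plan for `sql` as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

# Function applied to the column: the index on hire_date cannot be searched.
wrapped = plan("SELECT * FROM employees WHERE strftime('%Y', hire_date) = '2020'")
# Range on the bare column: the index can be searched.
ranged = plan(
    "SELECT * FROM employees "
    "WHERE hire_date >= '2020-01-01' AND hire_date < '2021-01-01'"
)

print(wrapped)
print(ranged)
```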
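Using EXISTS in place of IN:
-- Can help when the subquery returns many rows; many modern optimizers plan both forms the same way, so measure first
SELECT * FROM departments d
WHERE EXISTS (
    SELECT 1 FROM employees e
    WHERE e.dept_id = d.dept_id AND e.salary > 8000
);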
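Analyzing Execution Plans
-- EXPLAIN ANALYZE runs the query and reports actual timings (PostgreSQL; MySQL 8.0.18+)
EXPLAIN ANALYZE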
SELECT e.name, d.dept_name
FROM employees e
JOIN departments d ON e.dept_id = d.dept_id
WHERE e.salary > 5000;
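Key points when reading an execution plan:
- Look for full table scans (shown as Seq Scan in PostgreSQL)
- Check that the expected index is used
- Watch for gaps between estimated and actual row counts
- Review the cost of JOIN operations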
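Database Normalization in Practice
Normalization examples:
- First normal form (1NF): every column holds atomic values
  - Incorrect: a tags column that stores "java,python,sql"
  - Correct: split the values out into a separate relation table
- Second normal form (2NF): eliminate partial dependencies
  - When the primary key is composite, every non-key column must depend on the whole key
- Third normal form (3NF): eliminate transitive dependencies
  - Non-key columns should not depend on other non-key columns
When to denormalize:
- Aggregates that are queried frequently
- Reporting systems that would otherwise join many tables
- Read-heavy, write-light workloads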
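The 1NF example above can be made concrete: the sketch below moves a comma-separated tags column into a relation table with one row per (article, tag) pair, after which "find everything tagged sql" becomes an ordinary equality filter. The table names articles_raw and article_tags are hypothetical, and Python's sqlite3 module is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Violates 1NF: several values packed into one column.
    CREATE TABLE articles_raw (article_id INTEGER PRIMARY KEY, tags TEXT);
    INSERT INTO articles_raw VALUES (1, 'java,python,sql'), (2, 'sql');

    -- 1NF-compliant: one row per (article, tag) pair.
    CREATE TABLE article_tags (
        article_id INTEGER NOT NULL,
        tag        TEXT    NOT NULL,
        PRIMARY KEY (article_id, tag)
    );
""")

# Split each packed value and insert one row per tag.
for article_id, tags in conn.execute("SELECT article_id, tags FROM articles_raw").fetchall():
    conn.executemany(
        "INSERT INTO article_tags (article_id, tag) VALUES (?, ?)",
        [(article_id, tag.strip()) for tag in tags.split(",")],
    )

sql_articles = [
    a for (a,) in conn.execute(
        "SELECT article_id FROM article_tags WHERE tag = 'sql' ORDER BY article_id"
    )
]
print(sql_articles)
```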
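9. Common SQL Problems and Solutions
Protecting Against SQL Injection
Secure coding practices:
- Use parameterized queries:
  // Java example
  String sql = "SELECT * FROM users WHERE username = ? AND password = ?";
  PreparedStatement stmt = connection.prepareStatement(sql);
  stmt.setString(1, username);
  stmt.setString(2, password);
- Validate input:
  -- Validate against a regular expression (MySQL REGEXP); this complements parameterized queries and does not replace them
  SELECT * FROM products WHERE product_name REGEXP '^[a-zA-Z0-9 ]+$';
- Apply the principle of least privilege:
  -- Create a dedicated user for the application
  CREATE USER 'app_user'@'%' IDENTIFIED BY 'secure_password';
  GRANT SELECT, INSERT ON app_db.* TO 'app_user'@'%';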
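Why parameterized queries matter is clearest when the same malicious input is sent down both paths. The sketch below is a deliberately simplified illustration using Python's sqlite3 module. The function names are hypothetical, and the plaintext password column only mirrors the Java snippet above; real systems should store password hashes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")

def login_unsafe(username, password):
    # VULNERABLE: user input is spliced into the SQL text.
    sql = (
        "SELECT COUNT(*) FROM users "
        f"WHERE username = '{username}' AND password = '{password}'"
    )
    return conn.execute(sql).fetchone()[0] > 0

def login_safe(username, password):
    # Parameters are sent as data and never parsed as SQL.
    sql = "SELECT COUNT(*) FROM users WHERE username = ? AND password = ?"
    return conn.execute(sql, (username, password)).fetchone()[0] > 0

payload = "' OR '1'='1"  # turns the WHERE clause into a condition that is always true
unsafe_result = login_unsafe("alice", payload)
safe_result = login_safe("alice", payload)
print(unsafe_result, safe_result)
```

The concatenated query lets the attacker in without knowing the password, while the parameterized one treats the payload as an ordinary (wrong) password.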
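Deadlock Handling Strategies
Common deadlock scenarios:
- Transaction 1 locks table A and waits for table B while transaction 2 locks table B and waits for table A
- Concurrent transactions lock the same resources in different orders
Solutions:
-- Set a lock wait timeout (SQL Server); this bounds how long a statement waits for a lock
SET LOCK_TIMEOUT 5000; -- 5 seconds
-- Retry mechanism
BEGIN TRY
    BEGIN TRANSACTION;
    -- business logic
    COMMIT;
END TRY
BEGIN CATCH
    IF ERROR_NUMBER() = 1205 -- deadlock victim error code
    BEGIN
        IF @@TRANCOUNT > 0 ROLLBACK; -- make sure nothing is left open before retrying
        WAITFOR DELAY '00:00:00.1';
        -- retry logic
    END
    ELSE
    BEGIN
        IF @@TRANCOUNT > 0 ROLLBACK;
        -- other error handling
    END
END CATCH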
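The retry logic that the T-SQL above leaves as a comment is often easier to put in application code. The sketch below shows one possible shape, retrying with exponential backoff and a little jitter. DeadlockError is a stand-in for whatever exception a real driver raises for a deadlock (an assumption, not a real driver class), and flaky_transaction only simulates a unit of work that is chosen as the deadlock victim twice.

```python
import random
import time

class DeadlockError(Exception):
    """Stand-in for the driver-specific deadlock error (e.g. SQL Server error 1205)."""

def run_with_retry(work, max_attempts=3, base_delay=0.1):
    """Call work(); on DeadlockError wait and retry, re-raising after max_attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return work()
        except DeadlockError:
            if attempt == max_attempts:
                raise
            # Exponential backoff plus jitter so competing transactions desynchronize.
            time.sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, base_delay))

attempts = {"count": 0}

def flaky_transaction():
    """Simulated unit of work: loses the deadlock twice, then succeeds."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise DeadlockError("simulated deadlock")
    return "committed"

outcome = run_with_retry(flaky_transaction, max_attempts=3, base_delay=0.01)
print(outcome, attempts["count"])
```

The whole transaction must be re-run on retry, not just the failed statement, because the database has discarded the victim's earlier work.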
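Handling Large Data Volumes
Pagination optimization:
-- Traditional pagination (slow at large offsets because the skipped rows are still read)
SELECT * FROM large_table ORDER BY id LIMIT 10 OFFSET 10000;
-- Optimized pagination (seek to the starting id through the index; MySQL syntax)
SELECT * FROM large_table
WHERE id >= (SELECT id FROM large_table ORDER BY id LIMIT 10000, 1)
ORDER BY id LIMIT 10;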
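A related technique, named here plainly because it is a variation rather than the query above, is keyset pagination: the client remembers the last id it saw and asks for rows after it, so no offset is needed at all. A minimal sketch, assuming Python's sqlite3 module, 100 made-up rows, and a hypothetical next_page helper:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE large_table (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO large_table VALUES (?, ?)",
    [(i, f"row{i}") for i in range(1, 101)],
)

def next_page(last_seen_id, page_size=10):
    """Return the page that follows `last_seen_id` (use 0 for the first page)."""
    return conn.execute(
        "SELECT id, payload FROM large_table WHERE id > ? ORDER BY id LIMIT ?",
        (last_seen_id, page_size),
    ).fetchall()

first = next_page(0)
second = next_page(first[-1][0])  # continue from the last id of the previous page
print(first[0], second[0])
```

The trade-off is that keyset pagination only supports "next page" style navigation; jumping straight to page N still needs an offset or a lookup like the subquery shown earlier.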
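Batch processing examples:
-- Bulk insert
INSERT INTO target_table (col1, col2)
SELECT src_col1, src_col2 FROM source_table
WHERE batch_condition = true;
-- Batched update, 1000 rows at a time (MySQL; WHILE is only valid inside a stored program)
WHILE EXISTS (SELECT 1 FROM temp_table WHERE processed = 0) DO
    -- ORDER BY id makes both statements pick the same 1000 rows
    UPDATE main_table m
    JOIN (
        SELECT id, new_value FROM temp_table
        WHERE processed = 0 ORDER BY id LIMIT 1000
    ) t ON m.id = t.id
    SET m.value = t.new_value;
    -- MySQL rejects LIMIT inside an IN subquery, so the limit goes on the UPDATE itself
    UPDATE temp_table SET processed = 1
    WHERE processed = 0 ORDER BY id LIMIT 1000;
END WHILE;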
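The same batching loop can also be driven from application code, which avoids the stored-program requirement. The sketch below is one way to do it under assumptions: Python's sqlite3 module, the table layout implied by the SQL above, and a hypothetical apply_in_batches helper. Each batch is committed on its own so locks are held only briefly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE main_table (id INTEGER PRIMARY KEY, value INTEGER);
    CREATE TABLE temp_table (
        id INTEGER PRIMARY KEY, new_value INTEGER, processed INTEGER DEFAULT 0
    );
""")
conn.executemany("INSERT INTO main_table VALUES (?, 0)", [(i,) for i in range(1, 26)])
conn.executemany(
    "INSERT INTO temp_table (id, new_value) VALUES (?, ?)",
    [(i, i * 10) for i in range(1, 26)],
)
conn.commit()

def apply_in_batches(conn, batch_size=1000):
    """Copy new_value into main_table batch by batch; return the rows processed."""
    total = 0
    while True:
        ids = [row[0] for row in conn.execute(
            "SELECT id FROM temp_table WHERE processed = 0 ORDER BY id LIMIT ?",
            (batch_size,),
        )]
        if not ids:
            return total
        lo, hi = ids[0], ids[-1]  # the batch is exactly the unprocessed ids in [lo, hi]
        with conn:  # commits the batch, or rolls it back if an exception escapes
            conn.execute(
                """UPDATE main_table
                   SET value = (SELECT new_value FROM temp_table t
                                WHERE t.id = main_table.id)
                   WHERE id IN (SELECT id FROM temp_table
                                WHERE processed = 0 AND id BETWEEN ? AND ?)""",
                (lo, hi),
            )
            conn.execute(
                "UPDATE temp_table SET processed = 1 "
                "WHERE processed = 0 AND id BETWEEN ? AND ?",
                (lo, hi),
            )
        total += len(ids)

processed_rows = apply_in_batches(conn, batch_size=10)  # 25 rows in batches of 10, 10, 5
print(processed_rows)
```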
10. Practical Application Examples
E-commerce System Queries
Product search:
-- Full-text search (MySQL; requires a FULLTEXT index on (name, description))
-- The search term '手机' means "mobile phone"
SELECT product_id, name,
       MATCH(name, description) AGAINST('手机' IN NATURAL LANGUAGE MODE) AS relevance
FROM products
WHERE MATCH(name, description) AGAINST('手机' IN NATURAL LANGUAGE MODE)
ORDER BY relevance DESC
LIMIT 10;
Sales analysis:
-- Monthly sales trend
SELECT DATE_FORMAT(order_date, '%Y-%m') AS month,
       COUNT(DISTINCT order_id) AS order_count,
       SUM(amount) AS total_sales,
       SUM(amount) / COUNT(DISTINCT order_id) AS avg_order_value
FROM orders
GROUP BY DATE_FORMAT(order_date, '%Y-%m')
ORDER BY month;
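To sanity-check the monthly aggregation, the sketch below runs an equivalent query on three made-up orders. It assumes SQLite through Python's sqlite3 module, so strftime('%Y-%m', ...) replaces MySQL's DATE_FORMAT; the rest of the query is unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_date TEXT, amount REAL);
    INSERT INTO orders VALUES
        (1, '2024-01-05', 100), (2, '2024-01-20', 300), (3, '2024-02-01', 50);
""")

monthly = conn.execute("""
    SELECT strftime('%Y-%m', order_date) AS month,
           COUNT(DISTINCT order_id) AS order_count,
           SUM(amount) AS total_sales,
           SUM(amount) / COUNT(DISTINCT order_id) AS avg_order_value
    FROM orders
    GROUP BY strftime('%Y-%m', order_date)
    ORDER BY month
""").fetchall()

for row in monthly:
    print(row)
```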
Data Analysis Applications
User retention analysis (MySQL syntax):
WITH user_first_activity AS (
    SELECT user_id, MIN(activity_date) AS first_date
    FROM user_activities
    GROUP BY user_id
),
retention_data AS (
    SELECT
        first_date,
        COUNT(DISTINCT a.user_id) AS cohort_size,
        COUNT(DISTINCT CASE WHEN DATEDIFF(a.activity_date, f.first_date) = 1 THEN a.user_id END) AS day1_retained,
        COUNT(DISTINCT CASE WHEN DATEDIFF(a.activity_date, f.first_date) = 7 THEN a.user_id END) AS day7_retained
    FROM user_activities a
    JOIN user_first_activity f ON a.user_id = f.user_id
    GROUP BY first_date
)
SELECT
    first_date,
    cohort_size,
    ROUND(day1_retained * 100.0 / cohort_size, 2) AS day1_retention_rate,
    ROUND(day7_retained * 100.0 / cohort_size, 2) AS day7_retention_rate
FROM retention_data
ORDER BY first_date;
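A small worked example helps confirm the cohort logic. The sketch below ports the query to SQLite (an assumption made for runnability: a julianday() difference replaces MySQL's DATEDIFF) and runs it on four made-up users through Python's sqlite3 module. Three users start on 2024-01-01; two of them return the next day and one returns a week later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_activities (user_id INTEGER, activity_date TEXT)")
conn.executemany(
    "INSERT INTO user_activities VALUES (?, ?)",
    [
        (1, "2024-01-01"), (1, "2024-01-02"), (1, "2024-01-08"),
        (2, "2024-01-01"), (2, "2024-01-02"),
        (3, "2024-01-01"),
        (4, "2024-01-02"), (4, "2024-01-03"),
    ],
)

retention = conn.execute("""
    WITH user_first_activity AS (
        SELECT user_id, MIN(activity_date) AS first_date
        FROM user_activities
        GROUP BY user_id
    ),
    retention_data AS (
        SELECT
            f.first_date,
            COUNT(DISTINCT a.user_id) AS cohort_size,
            COUNT(DISTINCT CASE
                WHEN julianday(a.activity_date) - julianday(f.first_date) = 1
                THEN a.user_id END) AS day1_retained,
            COUNT(DISTINCT CASE
                WHEN julianday(a.activity_date) - julianday(f.first_date) = 7
                THEN a.user_id END) AS day7_retained
        FROM user_activities a
        JOIN user_first_activity f ON a.user_id = f.user_id
        GROUP BY f.first_date
    )
    SELECT first_date, cohort_size,
           ROUND(day1_retained * 100.0 / cohort_size, 2) AS day1_retention_rate,
           ROUND(day7_retained * 100.0 / cohort_size, 2) AS day7_retention_rate
    FROM retention_data
    ORDER BY first_date
""").fetchall()

for row in retention:
    print(row)
```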
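11. Summary and Learning Resources
Key Takeaways
- SQL fundamentals: master the basic operations SELECT, INSERT, UPDATE, and DELETE
- Performance optimization: understand indexes, execution plans, and query rewriting
- Transaction management: ACID properties, isolation levels, and locking
- Security practices: guard against SQL injection and grant the minimum privileges needed
- Advanced features: window functions, CTEs, JSON handling, and other modern SQL capabilities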
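Recommended Learning Path
- Beginners:
  - Books: 《SQL必知必会》 (Sams Teach Yourself SQL in 10 Minutes), 《MySQL入门很简单》
  - Online practice: SQLZoo, easy-level LeetCode problems
- Intermediate developers:
  - Books: 《SQL进阶教程》, 《高性能MySQL》 (High Performance MySQL)
  - Practice: design complex reports, optimize slow queries
- Advanced experts:
  - Books: 《SQL编程风格》 (Joe Celko's SQL Programming Style), 《数据库系统概念》 (Database System Concepts)
  - Deeper study: execution plan optimization, distributed SQL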
Useful Resources
- Official documentation:
  - MySQL 8.0 Reference Manual
  - PostgreSQL Documentation
- Online courses:
  - Coursera: SQL for Data Science
  - Udemy: The Complete SQL Bootcamp
- Community support:
  - The sql tag on Stack Overflow
  - DBA Stack Exchange, a Q&A site for database administration