SQL basic commands
这里写自定义目录标题
- Lesson 1
- Some of the most common data types are:
- Create table
- Insert
- Select
- Alter
- Update
- Delete
- Constraints
- Lesson 2: SQLite
- As
- Distinct
- Where
- Like
- Like II
- Is Null & Is not Null
- Between
- And
- Or
- Order by
- Limit
- Case
- Lesson 3
- Count
- Sum
- Max/Min
- Average
- Round
- Group By I
- Group By II
- Having
- Lesson 4
- Join
- Left Join
- Primary Key vs Foreign Key
- Cross Join
- Union
- With
Lesson 1
Line breaks don’t mean anything specific in SQL!!
Some of the most common data types are:
INTEGER, a positive or negative whole number
TEXT, a text string
DATE, the date formatted as YYYY-MM-DD
REAL, a decimal value
Create table
CREATE TABLE table_name (
column_1 data_type,
column_2 data_type,
column_3 data_type
);
Insert
INSERT INTO celebs (id, name, age)
VALUES (1, 'Justin Bieber', 29);
INSERT INTO is a clause that adds the specified row or rows.
celebs is the table the row is added to.
(id, name, age) is a parameter identifying the columns that data will be inserted into.
VALUES is a clause that indicates the data being inserted.
(1, ‘Justin Bieber’, 29) is a parameter identifying the values being inserted.
1: an integer that will be added to id column
‘Justin Bieber’: text that will be added to name column
29: an integer that will be added to age column
Select
SELECT name FROM celebs;
SELECT * FROM celebs;
Alter
ALTER TABLE celebs
ADD COLUMN twitter_handle TEXT;
- ALTER TABLE is a clause that lets you make the specified changes.
- celebs is the name of the table that is being changed.
- ADD COLUMN is a clause that lets you add a new column to a table:
twitter_handle is the name of the new column being added
TEXT is the data type for the new column
4. NULL is a special value in SQL that represents missing or unknown data. Here, the rows that existed before the column was added have NULL (∅) values for twitter_handle.
Update
UPDATE celebs
SET twitter_handle = '@taylorswift13'
WHERE id = 4;
- UPDATE is a clause that edits a row in the table.
- celebs is the name of the table.
- SET is a clause that indicates the column to edit.
twitter_handle is the name of the column that is going to be updated
@taylorswift13 is the new value that is going to be inserted into the twitter_handle column.
4. WHERE is a clause that indicates which row(s) to update with the new column value. Here the row with a 4 in the id column is the row that will have the twitter_handle updated to @taylorswift13.
Delete
DELETE FROM celebs
WHERE twitter_handle IS NULL;
Constraints
CREATE TABLE celebs (
id INTEGER PRIMARY KEY,
name TEXT UNIQUE,
date_of_birth TEXT NOT NULL,
date_of_death TEXT DEFAULT 'Not Applicable'
);
-
PRIMARY KEY columns can be used to uniquely identify the row. Attempts to insert a row with an identical value to a row already in the table will result in a constraint violation which will not allow you to insert the new row.
-
UNIQUE columns have a different value for every row. This is similar to PRIMARY KEY except a table can have many different UNIQUE columns.
-
NOT NULL columns must have a value. Attempts to insert a row without a value for a NOT NULL column will result in a constraint violation and the new row will not be inserted.
-
DEFAULT columns take an additional argument that will be the assumed value for an inserted row if the new row does not specify a value for that column.
Lesson 2: SQLite
Good: In SQLite, a database is stored in a single file — a trait that distinguishes it from other database engines. This fact allows for a great deal of accessibility: copying a database is no more complicated than copying the file that stores the data, sharing a database can mean sending an email attachment.
Bad: SQLite’s signature portability unfortunately makes it a poor choice when many different users are updating the table at the same time (to maintain integrity of data, only one user can write to the file at a time). It also may require some more work to ensure the security of private data due to the same features that make SQLite accessible. Furthermore, SQLite does not offer the same exact functionality as many other database systems, limiting some advanced features other relational database systems offer. Lastly, SQLite does not validate data types. Where many other database software would reject data that does not conform to a table’s schema, SQLite allows users to store data of any type into any column.
As
SELECT name AS 'Titles'
FROM movies;
AS is a keyword in SQL that allows you to rename a column or table using an alias.
Notice:
Although it’s not always necessary, it is considered best practice to surround your aliases with single quotes.
Note that this practice is specific to SQLite, the RDBMS used in this exercise. When you work with other RDBMSs, notably PostgreSQL, no quotes or double quotes may be required in place of single quotes.
When using AS, the columns are not being renamed in the table. The aliases only appear in the result.
Distinct
DISTINCT
is used to return unique values in the output. It filters out all duplicate values in the specified column(s).
For instance,
SELECT tools
FROM inventory;
may produce:
By adding DISTINCT before the column name,
SELECT DISTINCT tools
FROM inventory;
the result would be:
Where
SELECT *
FROM movies
WHERE imdb_rating > 8;
-
The WHERE clause filters the result set to only include rows where the following condition is true.
-
imdb_rating > 8 is the condition. Here, only rows with a value greater than 8 in the imdb_rating column will be returned.
The > is an operator. Operators create a condition that can be evaluated as either true or false.
Comparison operators used with the WHERE clause are:
= equal to
!= not equal to
greater than
< less than
= greater than or equal to
<= less than or equal to
Like
How could we select all movies that start with ‘Se’ and end with ‘en’ and have exactly one character in the middle?
SELECT *
FROM movies
WHERE name LIKE 'Se_en';
The _ means you can substitute any individual character here without breaking the pattern. The names Seven and Se7en both match this pattern.
Like II
The percentage sign % is another wildcard character that can be used with LIKE.
This statement below filters the result set to only include movies with names that begin with the letter ‘A’:
SELECT *
FROM movies
WHERE name LIKE 'A%';
% is a wildcard character that matches zero or more missing characters in the pattern. For example:
A% matches all movies with names that begin with letter ‘A’
%a matches all movies that end with ‘a’
Is Null & Is not Null
To filter for all movies with an IMDb rating:
SELECT name
FROM movies
WHERE imdb_rating IS NOT NULL;
SELECT name
FROM movies
WHERE imdb_rating IS NULL;
Between
The BETWEEN
operator is used in a WHERE clause to filter the result set within a certain range. It accepts two values that are either numbers, text or dates.
For example, this statement filters the result set to only include movies with years from 1990 up to, and including 1999.
SELECT *
FROM movies
WHERE year BETWEEN 1990 AND 1999;
When the values are text, BETWEEN filters the result set for within the alphabetical range.
In this statement, BETWEEN filters the result set to only include movies with names that begin with the letter ‘A’ up to, but not including ones that begin with ‘J’.
SELECT *
FROM movies
WHERE name BETWEEN 'A' AND 'J';
And
Sometimes we want to combine multiple conditions in a WHERE clause to make the result set more specific and useful.
One way of doing this is to use the AND operator. Here, we use the AND operator to only return 90’s romance movies.
SELECT *
FROM movies
WHERE year BETWEEN 1990 AND 1999
AND genre = 'romance';
Or
SELECT *
FROM movies
WHERE year > 2014
OR genre = 'action';
Order by
We can sort the results using ORDER BY, either alphabetically or numerically. Sorting the results often makes the data more useful and easier to analyze.
For example, if we want to sort everything by the movie’s title from A through Z:
SELECT *
FROM movies
ORDER BY name;
ORDER BY is a clause that indicates you want to sort the result set by a particular column.
name is the specified column.
SELECT *
FROM movies
WHERE imdb_rating > 8
ORDER BY year DESC;
DESC is a keyword used in ORDER BY to sort the results in descending order (high to low or Z-A).
ASC is a keyword used in ORDER BY to sort the results in ascending order (low to high or A-Z).
Limit
SELECT *
FROM movies
LIMIT 10;
LIMIT is a clause that lets you specify the maximum number of rows the result set will have. This saves space on our screen and makes our queries run faster.
LIMIT always goes at the very end of the query. Also, it is not supported in all SQL databases.
Case
A CASE statement allows us to create different outputs (usually in the SELECT statement). It is SQL’s way of handling if-then logic.
Suppose we want to condense the ratings in movies to three levels:
- If the rating is above 8, then it is Fantastic.
- If the rating is above 6, then it is Poorly Received.
- Else, Avoid at All Costs.
SELECT name,
CASE
WHEN imdb_rating > 8 THEN 'Fantastic'
WHEN imdb_rating > 6 THEN 'Poorly Received'
ELSE 'Avoid at All Costs'
END
FROM movies;
- Each WHEN tests a condition and the following THEN gives us the string if the condition is true.
- The ELSE gives us the string if all the above conditions are false.
- The CASE statement must end with END.
In the result, you have to scroll right because the column name is very long. To shorten it, we can rename the column to ‘Review’ using AS:
SELECT name,
CASE
WHEN imdb_rating > 8 THEN 'Fantastic'
WHEN imdb_rating > 6 THEN 'Poorly Received'
ELSE 'Avoid at All Costs'
END AS 'Review'
FROM movies;
Lesson 3
COUNT(): count the number of rows
SUM(): the sum of the values in a column
MAX()/MIN(): the largest/smallest value
AVG(): the average of the values in a column
ROUND(): round the values in the column
Count
COUNT() is a function that takes the name of a column as an argument and counts the number of non-empty values in that column.
SELECT COUNT(*)
FROM table_name;
Sum
SUM() is a function that takes the name of a column as an argument and returns the sum of all the values in that column.
SELECT SUM(downloads)
FROM fake_apps;
Max/Min
SELECT MAX(downloads)
FROM fake_apps;
Average
SELECT AVG(downloads)
FROM fake_apps;
Round
ROUND() function takes two arguments inside the parenthesis:
- a column name
- an integer
It rounds the values in the column to the number of decimal places specified by the integer.
SELECT name, ROUND(price, 0)
FROM fake_apps;
SELECT ROUND(AVG(price),2)
FROM fake_apps;
Here, we pass the column price and integer 0 as arguments. SQL rounds the values in the column to 0 decimal places in the output.
Group By I
SELECT year,
AVG(imdb_rating)
FROM movies
GROUP BY year
ORDER BY year;
Group By II
SQL lets us use column reference(s) in our GROUP BY that will make our lives easier.
- 1 is the first column selected
- 2 is the second column selected
- 3 is the third column selected
and so on.
SELECT ROUND(imdb_rating),
COUNT(name)
FROM movies
GROUP BY 1
ORDER BY 1;
SELECT category,
price,
AVG(downloads)
FROM fake_apps
GROUP BY category, price;
Having
HAVING is very similar to WHERE. In fact, all types of WHERE clauses you learned about thus far can be used with HAVING. HAVING is used for filtering groups.
SELECT year,
genre,
COUNT(name)
FROM movies
GROUP BY 1, 2
HAVING COUNT(name) > 10;
- When we want to limit the results of a query based on values of the individual rows, use WHERE.
- When we want to limit the results of a query based on an aggregate property, use HAVING.
- HAVING statement always comes after GROUP BY, but before ORDER BY and LIMIT.
Lesson 4
Join
SELECT *
FROM orders
JOIN customers
ON orders.customer_id = customers.customer_id;
- The first line selects all columns from our combined table. If we only want to select certain columns, we can specify which ones we want.
- The second line specifies the first table that we want to look in, orders
- The third line uses JOIN to say that we want to combine information from orders with customers.
- The fourth line tells us how to combine the two tables. We want to match orders table’s customer_id column with customers table’s customer_id column.
Left Join
A left join will keep all rows from the first table, regardless of whether there is a matching row in the second table. If unmatch, it will omit the unmatched row from the 2nd table.
SELECT *
FROM table1
LEFT JOIN table2
ON table1.c2 = table2.c2;
SELECT *
FROM newspaper
LEFT JOIN online
ON newspaper.id = online.id
WHERE online.id IS NULL;
Primary Key vs Foreign Key
we had three tables: orders, subscriptions, and customers.
Each of these tables has a column that uniquely identifies each row of that table:
- order_id for orders
- subscription_id for subscriptions
- customer_id for customers
These special columns are called primary keys.
Primary keys have a few requirements:
- None of the values can be NULL.
- Each value must be unique (i.e., you can’t have two customers with the same customer_id in the customers table).
- A table can not have more than one primary key column.
When the primary key for one table appears in a different table, it is called a foreign key.
Generally, the primary key will just be called id. Foreign keys will have more descriptive names.
Cross Join
Sometimes, we just want to combine all rows of one table with all rows of another table.
For instance, if we had a table of shirts and a table of pants, we might want to know all the possible combinations to create different outfits.
SELECT shirts.shirt_color,
pants.pants_color
FROM shirts
CROSS JOIN pants;
- The first two lines select the columns shirt_color and pants_color.
- The third line pulls data from the table shirts.
- The fourth line performs a CROSS JOIN with pants.
Notice that cross joins don’t require an ON statement.
Union
Sometimes we just want to stack one dataset on top of the other. Well, the UNION operator allows us to do that.
SELECT *
FROM table1
UNION
SELECT *
FROM table2;
SQL has strict rules for appending data:
- Tables must have the same number of columns.
- The columns must have the same data types in the same order as the first table.
With
WITH previous_results AS (
SELECT ...
...
...
...
)
SELECT *
FROM previous_results
JOIN customers
ON _____ = _____;
- The WITH statement allows us to perform a separate query (such as aggregating customer’s subscriptions)
- previous_results is the alias that we will use to reference any columns from the query inside of the WITH clause
- We can then go on to do whatever we want with this temporary table (such as join the temporary table with another table)
Essentially, we are putting a whole first query inside the parentheses () and giving it a name. After that, we can use this name as if it’s a table and write a new query using the first query.