3.1 Overview of the SQL Query Language
The original version of SQL, called Sequel, was developed by IBM in the early 1970s. It has evolved since then and changed its name to SQL (Structured Query Language).
The SQL language has several important parts
- Data-definition language (DDL)
- The SQL DDL provides commands for defining relation schemas, deleting relations, modifying relation schemas.
- Data-manipulation language (DML)
- The SQL DML provides the ability to query information from DB, insert tules into, delete tuples from, and modify tuples in the database.
- Integrity
- The SQL DDL includes commands for specifying integrity constraints that the data in the database must satisfy.
- Updates that violate integrity constraints are disallowed.
- View definition
- The SQL DDL includes commands for defining views.
- Transaction control
- Commands for specifying the beginning and end-points of transactions.
- Embedded SQL and Dynamic SQL
- How SQL statements can be embedded within general-purpose programming languages
- Authorization
- The SQL DDL includes commands for specifying access rights to relations and views
3.2 SQL Data Definition
The set of relations in a database are specified using a data-definition language (DDL).
The SQL DDL not only allows specification of a set of relations, but also information about each relation, including
- Schema for each relation
- Types of values associated with each attribute
- Integrity constraints
- Set of indices to be maintained for each relation
- Security and authorization information for each relation
- Physical storage structure of each relation on disk
Basic Types
THe SQL standard supports built-in types, including
char(n)
- A fixed-length character string with length
n
.
- A fixed-length character string with length
varchar(n)
: A variable-length character string with user-specified maximum lengthn
.int
- An integer
smallint
- A small integer (machine-dependent subset of int)
numeric(p, d)
- A fixed-point number with
p
digits (plus a sign). d
ofp
digits are to the right of the decimal points.- Ex)
numeric(3,1)
allows44.5
but does not allow444.5
or0.32
- A fixed-point number with
real, double precision
- Floating-point and double-precision floating-point numbers
float(n)
- A floating-point number with precision of at least
n
digits
- A floating-point number with precision of at least
Each type may include a special value called
null
value indicating an absent value.
When storing a
char
value with the less length than the specified lengthn
then the spaces are appended to match the length.When comparing two chars, the shorter one is appended with spaces to match the length.
When char and varchar are compared, extra spaces may be added to the varchar type to match the length. However, it may vary depending on the database system.
Hence, Always use
varchar
instead ofchar
type to avoid these problems.
Basic Schema Definition
- Define an SQL relation by
CREATE TABLE
command.1 2 3 4 5 6
CREATE TABLE department ( dept_name VARCHAR(20), building VARCHAR(15), budget NUMERIC(12, 2), PRIMARY KEY (dept_name) );
- The general form of the
CREATE TABLE
is1 2 3 4 5 6 7 8 9
CREATE TABLE r ( A1 D1, A2 D2, ..., An Dn, <integrity-constraint 1> ..., <integrity-constraint k> );
r
: Name of the relation- Each
Ai
is the name of an attribute in the schema of relationr
Di
is the domain of attribute Ai (type of attribute) with optional constraints that restrict the domain of Ai.
- Some of integrity constraints commands
- PRIMARY KEY $(A_{J_1}, A_{j_2},…,A_{j_m})$
- Attributes$(A_{J_1}, A_{j_2},…,A_{j_m})$ form the primary key for the relation.
- The primary-key attributes MUST be NON-NULL and UNIQUE.
- FOREIGN KEY $(A_{J_1}, A_{j_2},…,A_{j_m})$ REFERENCES $s$
- Attributes $(A_{J_1}, A_{j_2},…,A_{j_m})$ for any tuple in the relation must correspond to values of the primary key attributes of some tuple in the relation $s$.
1 2 3 4 5 6 7 8 9 10 11 12 13
CREATE TABLE COURSE ( course_id VARCHAR(7), title VARCHAR(50), dept_name VARCHAR(20), PRIMARY KEY (course_id), FOREIGN KEY (dept_name) REFERENCES department(dept_name) ); CREATE TABLE department ( dept_name VARCHAR(20), building VARCHAR(15), PRIMARY KEY (dept_name) );
FOREIGN KEY (dept_name) REFERENCES department(dept_name)
specifies that for each course tuple,dept_name
must exist in the primary key attributedept_name
of department relation.- Some database systems allow foreign-key constrain definition without specifying referenced attributes explicitly:
FOREIGN KEY (dept_name) REFERENCES department
- NOT NULL
- NULL value is NOT allowed for the specified attributes.
- PRIMARY KEY $(A_{J_1}, A_{j_2},…,A_{j_m})$
A newly created relation is initially empty.
Data Manipulation Statements
- Remove a relation from an SQL database
DROP TABLE r;
- Deletes not only all tuples of
r
but also the schema forr
.
- Deletes not only all tuples of
DELETE FROM r;
- Just deletes all tuples in
r
.
- Just deletes all tuples in
- Alter an existing relation
ALTER TABLE r ADD A D;
r
: Name of existing relationA
: Name of the attribute to be addedD
: Type of the added attribute1
ALTER TABLE course ADD course_tag VARCHAR(255);
ALTER TABLE r DROP A;
- Many database systems do not support dropping of attributes
A
: Name of an attribute to be dropped.
3.3 Basic Structure of SQL Queries
The basic structure of an SQL query consists of three clauses
select
from
where
3.3.1 Queries on a Single Relation
SELECT
SELECT dept_name FROM instructor;
SELECT DISTINCT dept_name FROM instructor;
- remove duplicatesSELECT dept_name FROM instructor;
- allow duplicates- The select may contain arithmetic expressions
+, -, *, /
SELECT ID, name, dept_name, salary * 1.1 FROM instructor;
WHERE
SELECT name FROM instructor WHERE dept_name = 'Comp. Sci';
3.3.2 Queries on Multiple Relations
We often query against multiple relations.
1
2
3
SELECT A1, A2, ..., An
FROM r1, r2, ..., rm
where P;
SELECT
: list the attributes desired in the result of a queryFROM
: list of the relations to be accessed- It defines a Cartesian product of the relations
- The result relation has all attributes from all the relationis in the FROM clause. So the resulted relation might be extremely big. So we usually restrict with WHERE clause.
- If the same attribute name appears in both
r1
andr2
, we prefix the name of the relation. (ex,instructor.ID
,instructor.dept_name
, etc) - For those attributes that appear only one of the schemas, we can drop the relation-name prefix
WHERE
: Predicate involving attributes of the relation in the FROM clause.
In general, the meaning of an SQL query can be understood as
- Generate a Cartesian product of the relations listed in the
FROM
clause. - Apply the predicates specified in the
WHERE
clause - For each tuple in Step (2), output the attributes specified in the
SELECT
clause.
This explains “what” the result of an SQL query, not “how” it should be executed.
- A real implementation optimizes evaluation by generating only elements that satisfy the WHERE clause (as far as possible)
3.4 Additional Basic Operations
Rename operation: AS
- Renaming attributes
1 2 3
SELECT name as instructor_name, course_id FROM instructor, teaches WHERE instructor.ID = teaches.ID;
- Renaming relations
1 2 3
SELECT T.name, S.course_id FROM instructor as T, teaches as S WHERE T.ID = S.ID;
- An identifier such as
T
andS
is referred to as a correlation name or table alias, correlation variable, or tuple variable
String operations
LIKE
for pattern matching%
: matches any substring_
: matches any character- Pattern matching is case-sensitive
1 2 3
SELECT dept_name FROM department WHERE building LIKE '%Watson%';
- Offers
NOT LIKE
for mismatches \
for escape characterLIKE 'ab\%cd%'
matches all strings beginning with “ab%cd”
Attribute Specification in the SELECT Clause
The asterisk *
can be used in SELECT to denote all attributes.
1
2
3
SELECT instructor.*
FROM instructor, teaches
WHERE instructor.ID = teaches.ID;
SELECT * FROM ...
indicates all attributes of the result relation are selected
Ordering the Display of Tuples
SQL offers ORDER BY
clause to sort the result of a query.
1
2
3
4
SELECT name
FROM instructor
WHERE dept_name = 'Physics'
ORDER BY name;
- By default,
ORDER BY
sorts in ascending order - To specify the order, use
DESC
orASC
WHERE-clause Predicates
between
: comparison operator for range1 2 3
SELECT name FROM instructor WHERE salary between 90000 and 100000;
- Similarly for
NOT BETWEEN
- Similarly for
Set Operations
SQL supports the following set operations.
UNION
INTERSECT
EXCEPT
Suppose we have two sets of courses.
1
2
3
SELECT course_id
FROM section
WHERE semester = 'Fall' and year = 2017;
1
2
3
SELECT course_id
FROM section
WHERE semester = 'Spring' and year = 2018;
Union
- Automatically eliminates duplicates. (use
UNION ALL
to retain duplicates)1 2 3 4 5 6 7
(SELECT course_id FROM section WHERE semester = 'Fall' and year = 2017;) UNION (SELECT course_id FROM section WHERE semester = 'Spring' and year = 2018);
- Automatically eliminates duplicates. (use
INTERSECT
- Automatically eliminates duplicates. (use
INTERSECTION ALL
to retain duplicates)1 2 3 4 5 6 7
(SELECT course_id FROM section WHERE semester = 'Fall' and year = 2017;) INTERSECT (SELECT course_id FROM section WHERE semester = 'Spring' and year = 2018);
- Automatically eliminates duplicates. (use
EXCEPT
- Automatically eliminates duplicates. (use
EXCEPT ALL
to retain duplicates) - To find all courses taught in Fall 2017 but NOT in Spring 2018 semester.
1 2 3 4 5 6 7
(SELECT course_id FROM section WHERE semester = 'Fall' and year = 2017;) INTERSECT (SELECT course_id FROM section WHERE semester = 'Spring' and year = 2018);
- Automatically eliminates duplicates. (use