[Database System Concepts / Ch3] Introduction to SQL

3.1 Overview of the SQL Query Language

The original version of SQL, called Sequel, was developed by IBM in the early 1970s. It has evolved since then and changed its name to SQL (Structured Query Language).

The SQL language has several important parts

Data-definition language (DDL)
- The SQL DDL provides commands for defining relation schemas, deleting relations, modifying relation schemas.
Data-manipulation language (DML)
- The SQL DML provides the ability to query information from DB, insert tules into, delete tuples from, and modify tuples in the database.
Integrity
- The SQL DDL includes commands for specifying integrity constraints that the data in the database must satisfy.
- Updates that violate integrity constraints are disallowed.
View definition
- The SQL DDL includes commands for defining views.
Transaction control
- Commands for specifying the beginning and end-points of transactions.
Embedded SQL and Dynamic SQL
- How SQL statements can be embedded within general-purpose programming languages
Authorization
- The SQL DDL includes commands for specifying access rights to relations and views

3.2 SQL Data Definition

The set of relations in a database are specified using a data-definition language (DDL).

The SQL DDL not only allows specification of a set of relations, but also information about each relation, including

Schema for each relation
Types of values associated with each attribute
Integrity constraints
Set of indices to be maintained for each relation
Security and authorization information for each relation
Physical storage structure of each relation on disk

Basic Types

THe SQL standard supports built-in types, including

char(n)
- A fixed-length character string with length n.
varchar(n): A variable-length character string with user-specified maximum length n.
int
- An integer
smallint
- A small integer (machine-dependent subset of int)
numeric(p, d)
- A fixed-point number with p digits (plus a sign).
- d of p digits are to the right of the decimal points.
- Ex) numeric(3,1) allows 44.5 but does not allow 444.5 or 0.32
real, double precision
- Floating-point and double-precision floating-point numbers
float(n)
- A floating-point number with precision of at least n digits

Each type may include a special value called null value indicating an absent value.

When storing a char value with the less length than the specified length n then the spaces are appended to match the length.
When comparing two chars, the shorter one is appended with spaces to match the length.
When char and varchar are compared, extra spaces may be added to the varchar type to match the length. However, it may vary depending on the database system.
Hence, Always use varchar instead of char type to avoid these problems.

Basic Schema Definition

Define an SQL relation by CREATE TABLE command.

  
  CREATE TABLE department (
      dept_name VARCHAR(20),
      building VARCHAR(15),
      budget NUMERIC(12, 2),
      PRIMARY KEY (dept_name)
  );

The general form of the CREATE TABLE is

  
  CREATE TABLE r (
      A1 D1,
      A2 D2,
      ...,
      An Dn,
      <integrity-constraint 1>
      ...,
      <integrity-constraint k>
  );

r: Name of the relation
Each Ai is the name of an attribute in the schema of relation r
Di is the domain of attribute Ai (type of attribute) with optional constraints that restrict the domain of Ai.

Some of integrity constraints commands

PRIMARY KEY $(A_{J_1}, A_{j_2},…,A_{j_m})$
- Attributes$(A_{J_1}, A_{j_2},…,A_{j_m})$ form the primary key for the relation.
- The primary-key attributes MUST be NON-NULL and UNIQUE.

FOREIGN KEY $(A_{J_1}, A_{j_2},…,A_{j_m})$ REFERENCES $s$

Attributes $(A_{J_1}, A_{j_2},…,A_{j_m})$ for any tuple in the relation must correspond to values of the primary key attributes of some tuple in the relation $s$.

  
CREATE TABLE COURSE (
    course_id VARCHAR(7),
    title VARCHAR(50),
    dept_name VARCHAR(20),
    PRIMARY KEY (course_id),
    FOREIGN KEY (dept_name) REFERENCES department(dept_name)
);

CREATE TABLE department (
    dept_name VARCHAR(20),
    building VARCHAR(15),
    PRIMARY KEY (dept_name)
);

FOREIGN KEY (dept_name) REFERENCES department(dept_name) specifies that for each course tuple, dept_name must exist in the primary key attribute dept_name of department relation.
Some database systems allow foreign-key constrain definition without specifying referenced attributes explicitly: FOREIGN KEY (dept_name) REFERENCES department

NOT NULL
- NULL value is NOT allowed for the specified attributes.

A newly created relation is initially empty.

Data Manipulation Statements

Remove a relation from an SQL database
- DROP TABLE r;
  - Deletes not only all tuples of r but also the schema for r.
- DELETE FROM r;
  - Just deletes all tuples in r.
Alter an existing relation
- ALTER TABLE r ADD A D;
  - r: Name of existing relation
  - A: Name of the attribute to be added
  - D: Type of the added attribute
    1 ALTER TABLE course ADD course_tag VARCHAR(255);
- ALTER TABLE r DROP A;
  - Many database systems do not support dropping of attributes
  - A: Name of an attribute to be dropped.

3.3 Basic Structure of SQL Queries

The basic structure of an SQL query consists of three clauses

select
from
where

3.3.1 Queries on a Single Relation

SELECT

SELECT dept_name FROM instructor;
- SELECT DISTINCT dept_name FROM instructor; - remove duplicates
- SELECT dept_name FROM instructor; - allow duplicates
- The select may contain arithmetic expressions +, -, *, /
  - SELECT ID, name, dept_name, salary * 1.1 FROM instructor;

WHERE

SELECT name FROM instructor WHERE dept_name = 'Comp. Sci';

3.3.2 Queries on Multiple Relations

We often query against multiple relations.

  
SELECT A1, A2, ..., An
FROM r1, r2, ..., rm
where P;

SELECT: list the attributes desired in the result of a query
FROM: list of the relations to be accessed
- It defines a Cartesian product of the relations
- The result relation has all attributes from all the relationis in the FROM clause. So the resulted relation might be extremely big. So we usually restrict with WHERE clause.
- If the same attribute name appears in both r1 and r2, we prefix the name of the relation. (ex, instructor.ID, instructor.dept_name, etc)
- For those attributes that appear only one of the schemas, we can drop the relation-name prefix
WHERE: Predicate involving attributes of the relation in the FROM clause.

In general, the meaning of an SQL query can be understood as

Generate a Cartesian product of the relations listed in the FROM clause.
Apply the predicates specified in the WHERE clause
For each tuple in Step (2), output the attributes specified in the SELECT clause.

This explains “what” the result of an SQL query, not “how” it should be executed.

A real implementation optimizes evaluation by generating only elements that satisfy the WHERE clause (as far as possible)

3.4 Additional Basic Operations

Rename operation: `AS`

Renaming attributes

  
SELECT name as instructor_name, course_id
FROM instructor, teaches
WHERE instructor.ID = teaches.ID;

Renaming relations

  
SELECT T.name, S.course_id
FROM instructor as T, teaches as S
WHERE T.ID = S.ID;

An identifier such as T and S is referred to as a correlation name or table alias, correlation variable, or tuple variable

String operations

LIKE for pattern matching
%: matches any substring
_: matches any character

Pattern matching is case-sensitive

  
SELECT dept_name
FROM department
WHERE building LIKE '%Watson%';

Offers NOT LIKE for mismatches
\ for escape character
LIKE 'ab\%cd%' matches all strings beginning with “ab%cd”

Attribute Specification in the SELECT Clause

The asterisk * can be used in SELECT to denote all attributes.

  
SELECT instructor.*
FROM instructor, teaches
WHERE instructor.ID = teaches.ID;

SELECT * FROM ... indicates all attributes of the result relation are selected

Ordering the Display of Tuples

SQL offers ORDER BY clause to sort the result of a query.

  
SELECT name
FROM instructor
WHERE dept_name = 'Physics'
ORDER BY name;

By default, ORDER BY sorts in ascending order
To specify the order, use DESC or ASC

WHERE-clause Predicates

between: comparison operator for range

  
SELECT name
FROM instructor
WHERE salary between 90000 and 100000;

Similarly for NOT BETWEEN

Set Operations

SQL supports the following set operations.

UNION
INTERSECT
EXCEPT

Suppose we have two sets of courses.

  
SELECT course_id
FROM section
WHERE semester = 'Fall' and year = 2017;

  
SELECT course_id
FROM section
WHERE semester = 'Spring' and year = 2018;

Union

Automatically eliminates duplicates. (use UNION ALL to retain duplicates)

  
(SELECT course_id
FROM section
WHERE semester = 'Fall' and year = 2017;)
UNION
(SELECT course_id
FROM section
WHERE semester = 'Spring' and year = 2018);

INTERSECT

Automatically eliminates duplicates. (use INTERSECTION ALL to retain duplicates)

  
(SELECT course_id
FROM section
WHERE semester = 'Fall' and year = 2017;)
INTERSECT
(SELECT course_id
FROM section
WHERE semester = 'Spring' and year = 2018);

EXCEPT

Automatically eliminates duplicates. (use EXCEPT ALL to retain duplicates)

To find all courses taught in Fall 2017 but NOT in Spring 2018 semester.

  
(SELECT course_id
FROM section
WHERE semester = 'Fall' and year = 2017;)
INTERSECT
(SELECT course_id
FROM section
WHERE semester = 'Spring' and year = 2018);