Home [Database System Concepts / Ch3] Introduction to SQL
Post
Cancel

[Database System Concepts / Ch3] Introduction to SQL

3.1 Overview of the SQL Query Language

The original version of SQL, called Sequel, was developed by IBM in the early 1970s. It has evolved since then and changed its name to SQL (Structured Query Language).

The SQL language has several important parts

  1. Data-definition language (DDL)
    • The SQL DDL provides commands for defining relation schemas, deleting relations, modifying relation schemas.
  2. Data-manipulation language (DML)
    • The SQL DML provides the ability to query information from DB, insert tules into, delete tuples from, and modify tuples in the database.
  3. Integrity
    • The SQL DDL includes commands for specifying integrity constraints that the data in the database must satisfy.
    • Updates that violate integrity constraints are disallowed.
  4. View definition
    • The SQL DDL includes commands for defining views.
  5. Transaction control
    • Commands for specifying the beginning and end-points of transactions.
  6. Embedded SQL and Dynamic SQL
    • How SQL statements can be embedded within general-purpose programming languages
  7. Authorization
    • The SQL DDL includes commands for specifying access rights to relations and views

3.2 SQL Data Definition

The set of relations in a database are specified using a data-definition language (DDL).

The SQL DDL not only allows specification of a set of relations, but also information about each relation, including

  1. Schema for each relation
  2. Types of values associated with each attribute
  3. Integrity constraints
  4. Set of indices to be maintained for each relation
  5. Security and authorization information for each relation
  6. Physical storage structure of each relation on disk

Basic Types

THe SQL standard supports built-in types, including

  1. char(n)
    • A fixed-length character string with length n.
  2. varchar(n): A variable-length character string with user-specified maximum length n.
  3. int
    • An integer
  4. smallint
    • A small integer (machine-dependent subset of int)
  5. numeric(p, d)
    • A fixed-point number with p digits (plus a sign).
    • d of p digits are to the right of the decimal points.
    • Ex) numeric(3,1) allows 44.5 but does not allow 444.5 or 0.32
  6. real, double precision
    • Floating-point and double-precision floating-point numbers
  7. float(n)
    • A floating-point number with precision of at least n digits

Each type may include a special value called null value indicating an absent value.

When storing a char value with the less length than the specified length n then the spaces are appended to match the length.

When comparing two chars, the shorter one is appended with spaces to match the length.

When char and varchar are compared, extra spaces may be added to the varchar type to match the length. However, it may vary depending on the database system.

Hence, Always use varchar instead of char type to avoid these problems.

Basic Schema Definition

  • Define an SQL relation by CREATE TABLE command.
    1
    2
    3
    4
    5
    6
    
      CREATE TABLE department (
          dept_name VARCHAR(20),
          building VARCHAR(15),
          budget NUMERIC(12, 2),
          PRIMARY KEY (dept_name)
      );
    
  • The general form of the CREATE TABLE is
    1
    2
    3
    4
    5
    6
    7
    8
    9
    
      CREATE TABLE r (
          A1 D1,
          A2 D2,
          ...,
          An Dn,
          <integrity-constraint 1>
          ...,
          <integrity-constraint k>
      );
    
    • r: Name of the relation
    • Each Ai is the name of an attribute in the schema of relation r
    • Di is the domain of attribute Ai (type of attribute) with optional constraints that restrict the domain of Ai.
  • Some of integrity constraints commands
    • PRIMARY KEY $(A_{J_1}, A_{j_2},…,A_{j_m})$
      • Attributes$(A_{J_1}, A_{j_2},…,A_{j_m})$ form the primary key for the relation.
      • The primary-key attributes MUST be NON-NULL and UNIQUE.
    • FOREIGN KEY $(A_{J_1}, A_{j_2},…,A_{j_m})$ REFERENCES $s$
      • Attributes $(A_{J_1}, A_{j_2},…,A_{j_m})$ for any tuple in the relation must correspond to values of the primary key attributes of some tuple in the relation $s$.
      1
      2
      3
      4
      5
      6
      7
      8
      9
      10
      11
      12
      13
      
      CREATE TABLE COURSE (
          course_id VARCHAR(7),
          title VARCHAR(50),
          dept_name VARCHAR(20),
          PRIMARY KEY (course_id),
          FOREIGN KEY (dept_name) REFERENCES department(dept_name)
      );
      
      CREATE TABLE department (
          dept_name VARCHAR(20),
          building VARCHAR(15),
          PRIMARY KEY (dept_name)
      );
      
      • FOREIGN KEY (dept_name) REFERENCES department(dept_name) specifies that for each course tuple, dept_name must exist in the primary key attribute dept_name of department relation.
      • Some database systems allow foreign-key constrain definition without specifying referenced attributes explicitly: FOREIGN KEY (dept_name) REFERENCES department
    • NOT NULL
      • NULL value is NOT allowed for the specified attributes.

A newly created relation is initially empty.

Data Manipulation Statements

  • Remove a relation from an SQL database
    • DROP TABLE r;
      • Deletes not only all tuples of r but also the schema for r.
    • DELETE FROM r;
      • Just deletes all tuples in r.
  • Alter an existing relation
    • ALTER TABLE r ADD A D;
      • r: Name of existing relation
      • A: Name of the attribute to be added
      • D: Type of the added attribute
        1
        
        ALTER TABLE course ADD course_tag VARCHAR(255);
        
    • ALTER TABLE r DROP A;
      • Many database systems do not support dropping of attributes
      • A: Name of an attribute to be dropped.

3.3 Basic Structure of SQL Queries

The basic structure of an SQL query consists of three clauses

  1. select
  2. from
  3. where

3.3.1 Queries on a Single Relation

SELECT

  • SELECT dept_name FROM instructor;
    • SELECT DISTINCT dept_name FROM instructor; - remove duplicates
    • SELECT dept_name FROM instructor; - allow duplicates
    • The select may contain arithmetic expressions +, -, *, /
      • SELECT ID, name, dept_name, salary * 1.1 FROM instructor;

WHERE

  • SELECT name FROM instructor WHERE dept_name = 'Comp. Sci';

3.3.2 Queries on Multiple Relations

We often query against multiple relations.

1
2
3
SELECT A1, A2, ..., An
FROM r1, r2, ..., rm
where P;
  • SELECT: list the attributes desired in the result of a query
  • FROM: list of the relations to be accessed
    • It defines a Cartesian product of the relations
    • The result relation has all attributes from all the relationis in the FROM clause. So the resulted relation might be extremely big. So we usually restrict with WHERE clause.
    • If the same attribute name appears in both r1 and r2, we prefix the name of the relation. (ex, instructor.ID, instructor.dept_name, etc)
    • For those attributes that appear only one of the schemas, we can drop the relation-name prefix
  • WHERE: Predicate involving attributes of the relation in the FROM clause.

In general, the meaning of an SQL query can be understood as

  1. Generate a Cartesian product of the relations listed in the FROM clause.
  2. Apply the predicates specified in the WHERE clause
  3. For each tuple in Step (2), output the attributes specified in the SELECT clause.

This explains “what” the result of an SQL query, not “how” it should be executed.

  • A real implementation optimizes evaluation by generating only elements that satisfy the WHERE clause (as far as possible)

3.4 Additional Basic Operations

Rename operation: AS

  • Renaming attributes
    1
    2
    3
    
    SELECT name as instructor_name, course_id
    FROM instructor, teaches
    WHERE instructor.ID = teaches.ID;
    
  • Renaming relations
    1
    2
    3
    
    SELECT T.name, S.course_id
    FROM instructor as T, teaches as S
    WHERE T.ID = S.ID;
    
  • An identifier such as T and S is referred to as a correlation name or table alias, correlation variable, or tuple variable

String operations

  • LIKE for pattern matching
  • %: matches any substring
  • _: matches any character
  • Pattern matching is case-sensitive
    1
    2
    3
    
    SELECT dept_name
    FROM department
    WHERE building LIKE '%Watson%';
    
  • Offers NOT LIKE for mismatches
  • \ for escape character
  • LIKE 'ab\%cd%' matches all strings beginning with “ab%cd”

Attribute Specification in the SELECT Clause

The asterisk * can be used in SELECT to denote all attributes.

1
2
3
SELECT instructor.*
FROM instructor, teaches
WHERE instructor.ID = teaches.ID;

SELECT * FROM ... indicates all attributes of the result relation are selected

Ordering the Display of Tuples

SQL offers ORDER BY clause to sort the result of a query.

1
2
3
4
SELECT name
FROM instructor
WHERE dept_name = 'Physics'
ORDER BY name;
  • By default, ORDER BY sorts in ascending order
  • To specify the order, use DESC or ASC

WHERE-clause Predicates

  • between: comparison operator for range
    1
    2
    3
    
    SELECT name
    FROM instructor
    WHERE salary between 90000 and 100000;
    
    • Similarly for NOT BETWEEN

Set Operations

SQL supports the following set operations.

  • UNION
  • INTERSECT
  • EXCEPT

Suppose we have two sets of courses.

1
2
3
SELECT course_id
FROM section
WHERE semester = 'Fall' and year = 2017;
1
2
3
SELECT course_id
FROM section
WHERE semester = 'Spring' and year = 2018;
  • Union
    • Automatically eliminates duplicates. (use UNION ALL to retain duplicates)
      1
      2
      3
      4
      5
      6
      7
      
      (SELECT course_id
      FROM section
      WHERE semester = 'Fall' and year = 2017;)
      UNION
      (SELECT course_id
      FROM section
      WHERE semester = 'Spring' and year = 2018);
      
  • INTERSECT
    • Automatically eliminates duplicates. (use INTERSECTION ALL to retain duplicates)
      1
      2
      3
      4
      5
      6
      7
      
      (SELECT course_id
      FROM section
      WHERE semester = 'Fall' and year = 2017;)
      INTERSECT
      (SELECT course_id
      FROM section
      WHERE semester = 'Spring' and year = 2018);
      
  • EXCEPT
    • Automatically eliminates duplicates. (use EXCEPT ALL to retain duplicates)
    • To find all courses taught in Fall 2017 but NOT in Spring 2018 semester.
      1
      2
      3
      4
      5
      6
      7
      
      (SELECT course_id
      FROM section
      WHERE semester = 'Fall' and year = 2017;)
      INTERSECT
      (SELECT course_id
      FROM section
      WHERE semester = 'Spring' and year = 2018);
      
This post is licensed under CC BY 4.0 by the author.
Trending Tags