[Database System Concepts / Ch2] Introduction to the Relational Model

이 포스트는 Database System Concepts 7th Edition의 제 2장을 읽고 정리한 문서입니다.

2.1 Structure of Relational Databases
2.2 Database Schema
- Database Schema vs Database Instance
- Relation vs Relation Schema vs Relation Instance
2.3 Keys

2.1 Structure of Relational Databases

Relational Database는 여러개의 column을 가지고 있는 table의 집합으로 구성된다. 또한, 각 table은 여러개의 row로 구성된다.

Instructor Table

instructor_id	name	dept_name	salary
10101	Srinivasan	Comp. Sci.	65000
12121	Wu	Finance	90000
15151	Mozart	Music	400000
123123	Jason	Mechanics	909090

Course Table

course_id	title	dept_name	credits
BIO101	Intro. to Biology	Biology	4
COMP3234	Database	Comp. Sci.	4
CS101	Game Design	Comp. Sci.	3
EE181	Image Processing	Comp. Sci.	4

Prereq Table

course_id	prereq_id
BIO101	COMP3234
CS101	EE181
COMP3234	EE181

일반적으로 하나의 row는 정보들의 집합사이의 relationship을 나타낸다. Prereq table은 course_id를 가지고있는 과목을 들으려면 prereq_id를 가지고있는 과목을 들어야 한다는 일종의 “관계”를 나타내고 있는것이다. 따라서 이 경우, prereq table의 row는 두개의 과목들간의 관계를 나타내고 있는 것이다. Table은 결국 row의 집합이기 때문에 “table”이라는 개념과 수학적인 의미에서의 “relation”은 깊은 관련이 있다. N개의 값들사이의 관계는 수학적으로 n-tuple로 나타낼 수 있다. 즉, n개의 값을 가지고있는 tuple은 table에서의 row를 나타낸다.

Relational model에서 relation은 table을 의미하고 tuple은 row를 가리키며 attribute은 table의 column을 나타낸다.

relation = table
tuple = row
attribute = column

따라서 위에서의 instructor이란 “relation”은 4개의 “attribute” (ID, name, dept_name, salary)를 가지고있는 것이다.

Relation Instance

Relation instance는 어떠한 relation의 특정 instance를 가리킨다. 예를들어, 위의 course table은 course란 relation의 relation instance인 것이다.

Row의 순서는 상관 없다

Relation은 tuple의 집합(set)을 의미하기 때문에 어떤 순서로 row가 등장하는지는 relation의 속성에 영향을 주지 않는다.

Domain

Relation의 각 attribute이 가질 수 있는 범위를 domain이라고 부른다. 예를들어 instructor table에서 salary란 attribute의 domain은 자연수(ex. USD)일 것이다.

Atomicity of Domain (원자성)

모든 relation r에 대해서, r의 모든 attribute은 atomic 이여야 한다.

원자란 뜻을 가진 atom에서 유추할수 있듯이, atomicity는 더 이상 쪼개질수 없는 unit을 지칭한다.

따라서 domain이 atomic하다는 말은 domain에 속한 모든 원소가 더이상 쪼개질 수 없는 가장 최소한의 단위여야 한다는 말이다. 예를들어 instructor table이 휴대폰 번호의 리스트인 phone_numbers이란 attribute을 가진다고 해보자. 이 경우에 phone_number의 domain은 atomic하다고 볼 수 없다. 만약 단 한개의 핸드폰 번호를 나타내느 phone_number이란 attribute이 있다면 atomic하다고 말할 수 있다.

여기서 중요한 것은 domain 자체가 뭘 나타내는 것보다, 우리가 어떻게 domain을 바라보는지이다. 만약 핸드폰 번호가 항상 국가 코드와 지역 코드로 분리 될 수 있다면, 같은 휴대폰 번호라도 더 이상 atomic하지 않을 것이다.

Null value

Null value는 특정 값이 unknown이거나 존재하지 않는다는 것을 나타내는 특수한 값이다. 만약 위의 예시에서 instructor이 핸드폰 자체가 없다면 phone_number = NULL이 될 것이다.

2.2 Database Schema

Database Schema vs Database Instance

Database Schema: 데이터베이스의 logical design
Database Instance: 특정 시점에서 데이터베이스의 스냅샷

Relation vs Relation Schema vs Relation Instance

Relation: 프로그래밍 언어에서의 변수
Relation Schema: 변수 type
Relation Instance: 변수 값

프로그래밍 언어에서의 변수의 값은 항상 바뀌는것처럼 relation instance역시 항상 바뀐다 (ADD, DELETE, etc). 한편 변수의 type은 바뀌지 않듯이 (파이썬같은 dynamically-typed language를 제외하면) relation schema 역시 일반적으로 바뀌지 않는다.

위의 instructor relation에서 relation schema는 instructor (instructor_id, name, dept_name, salary)가 된다.

2.3 Keys

We need a way to uniquely identify each tuple in a relation.

Superkey

A set of one or more attributes that uniquely identify a tuple in the relation
Formally, let $R$ denote the set of attributes in the schema of relation $r$. If we say a subset $K$ of $R$ is a superkey for $r$, we’re restricting consideration to instances of relations $r$ in which no two distinct tuple have the same values on all attributes in $K$.
- That is, if $t_1$ and $t_2$ are in $r$ and $t_1 \neq t_2$, then $t_1 \cdot K \neq t_2 \cdot K$
A superkey may contain extraneous attributes. That means if $K$ is a superkey, then so is any superset of $K$.
- For example, if tuples in the relation Student could be uniquely identified by an attribute student_id. Then, any set of attributes containing the student id is also a superkey (ex. student_id and name).

Candidate Keys

We’re usually interested in superkeys for which no proper subset is a superkey. That means we’re interested in a “minimal” super key that does not contain any unhelpful and extraneous attributes.
It’s possible that several sets of attributes could serve as a candidate key.
- Suppose student_name and department_name are sufficient to uniquely identify a student. Then, both {student_id} and {student_name, department_name} are candidate keys.
- However, {student_id, student_name} is NOT a candidate key since student_id could solely identify a student uniquely.

Primary Key

On the other hand, a primary key is a candidate key that is chosen by the database designer as the principal means of identifying tuples within a relation.
primary key constraints
- The designation of a key represents a constraint in the real-world enterprise being modeled.
- Thus, primary keys are also referred to as primary key constraints.