226 - Database (SQL)
Lec5 : SQL
Structured Query Language (SQL) :
SQL = DDL + DML
DDL is Data Definition Language : it is used to define data structures.
For example, with SQL, it would be instructions such as create table
, alter table
, drop table
DML is Data Manipulation Language : it is used to manipulate data itself.
For example, with SQL, it would be instructions such as select
, insert
, update
, delete
ERD | Relational Model | RDBMS | Object |
---|---|---|---|
Entity or Entity Type | Relation | Table | Class |
Attribute | Attribute | Column | Property |
Entity instance | Tuple | Row | Object |
Schema:
– A container object
– Elements within the container
• Tables, constraints, views, stored procedures, domains, etc.
Catalog (or “database” in MS SQL, MySQL)
– Collection of schemas
CREATE DATABASE COMPANY;
– INFORMATION_SCHEMA: special schema per catalog
• Information on all schemas in catalog: element descriptions
• self-describing
MySQL: Database == Schema
- Base tables – Created through the CREATE TABLE statement – Rows: persistent data storage
- Virtual tables –Created through the CREATE VIEW statement ★ May or may not have persistent data storage
- Derived tables A derived table is obtained from one or more other tables as the result of a subquery. **–Functionally equivalent to nested query –Temporarily exists in FROM** of a query ★ Not a DB object
★ Some foreign keys may cause error due to ① circular references ② referencing to non-existent tables ===== > CREATE TABLE w/o FK and then ALTER TABLE later
Column Data Types
Data Type | Content |
---|---|
Numeric | INTEGER, INT, and SMALLINT |
Floating-point (real) | FLOAT or REAL, and DOUBLE PRECISION |
Character-string | CHAR(n), CHARACTER(n), VARCHAR(n), CHAR VARYING(n),CLOB |
Bit-string | BIT(n), BIT VARYING(n), BLOB |
Boolean | TRUE or FALSE or NULL |
DATE | YYYY-MM-DD |
TIME | HH:MM:SS |
Timestamp | DATE and TIME |
INTERVAL | Specifies a relative value, used to increment or decrement an absolute value of a data, time, or timestamp |
Domain (PostgreSQL) or Type (MS SQL, Oracle)
▻ Used as if it is a data type ▻ Improves schema readability: easier to change the data type for a domain that is used by numerous columns
CREATE DOMAIN SSN_TYPE AS CHAR(9);
CREATE TABLE (…, Ssn SSN_TYPE NOT NULL, …);
MySQL does not support this
Constraints and Defaults
● [NOT NULL | NULL] : missing means NULL; Column level only
● Default value :Column level only
DEFAULT
<value>
● CHECK clause: apply to individual row; Column level or table level
Dnumber INT NOT NULL CHECK(Dnumber>0 AND Dnumber<21);
CHECK(Dept_create_date<= Mgr_start_date);
**CHECK** clause within Domain
CREATE DOMAIN DNUM AS INTEGER CHECK(D_NUM>0 AND D_NUM<10);
● PRIMARY KEY clause: one or more columns (composite PK)
–Column level or table level; ★At most one PK per table
Dnumber INT PRIMARY KEY
● UNIQUE clause: alternate (secondary) keys
–Column level or table level; 0 or more per table
Dname VARCHAR(15) UNIQUE;
CREATE TABLE X (…, UNIQUE(colA, ColB), …);
Referential Integrity Constraints
● FOREIGN KEY clause
★ Value of FK col in referencing table == value of PK in referenced table
★ Value of FK col can be NULL if col is nullable(no referential check)
–Default operation: reject insert/update/delete on violation
–Optional: referential triggered action clause
★ ON DELETE and ON UPDATE
• Options include SET NULL, CASCADE, and SET DEFAULT
★ CASCADE option suitable for “relationship” tables
[CONSTRAINT [symbol]] FOREIGN KEY
[index_name] (col_name, ...)
REFERENCES tbl_name (col_name,...)
[ON DELETE reference_option]
[ON UPDATE reference_option]
reference_option:
RESTRICT | CASCADE | SET NULL | NO ACTION | SET DEFAULT
Constraint name: useful for altering later
If the FOREIGN KEY
clause included a CONSTRAINT
name when you created the foreign key, you can refer to that name to drop the foreign key. Otherwise, the fk_symbol
value is generated internally when the foreign key is created.
SQL allows a table to have two or more rows that are identical in all their column values ★ No PK is OK • Unlike relational model
SELECT-FROM-WHERE
Retrieve some columns from rows satisfying a condition
SELECT <column list>
SELECT <column list>
FROM <table list>
[ WHERE <condition> ] ;
<column list>
: a list of columns (from <table list>) whose values are to be retrieved by the query, **i.e., projection<table list>
: a list of the tables required to process the query<condition>
: a conditional (Boolean) expression (selection condition**)
•May include logical comparison operators: =, <, <=, >, >=,and <>
–w/o WHERE: return all rows
Logical query processing steps:
★ FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER
★ **Actual query processing steps - query optimization
AS qualifier
Same column names may be used in different tables
☞ must qualify the column name with the table name
Column ambiguity can arise when queries that refer to the same table twice
☞ Alias is required
Use qualifier AS followed by desired new name
–Rename any column in the result of a query
–Logical rename for the duration of the query execution
Derived table
SELECT Y.c, Y.d
FROM (SELECT a AS c, b AS d FROM X) AS Y
WHERE ...
Unspecified WHERE Clause
Missing WHERE clause
–Indicates no condition on row selection
CROSS PRODUCT
–All possible row combinations among tables in FROM
SELECT ssn, dname
FROM company.employee, company.department;
Use of the Asterisk
SELECT * FROM …
–Retrieve all the column values of the selected rows
Table as Multi-set
★ duplicate rows can appear more than once
★ SQL does not automatically eliminate duplicate rows
SELECT DISTINCT: Only distinct rows remain
UNION, INTERSECT, EXCEPT
Set : no duplicates | Multi-set : may have dup |
---|---|
s1 UNION s2 | s1 UNION ALL s2 |
s1 INTERSECT s2 | s1 INTERSECT ALL s2 |
s1 EXCEPT s2 | s1 EXCEPT ALL s2 |
Substring Pattern Matching
● LIKE comparison operator
–String pattern matching
–%: arbitrary number of zero or more characters
–underscore (_) : a single character
–% and _ can be escaped : ‘AB\_CD\%EF’ ESCAPE ‘\’
select fname,lname
from employee
where address LIKE '%Houston, TX%';
/* Not Like */
where Address not LIKE '%Huston,TX%';
Arithmetic Operators, Between
● Standard arithmetic operators: +, -, *, /
Show the resulting salaries if every employee working on the ProductX project is given a 10% raise.
select fname,lname,1.1*salary as increased_sal
from employee,works_on,project
where ssn=essn and pno=pnumber and pname='ProductX';
● BETWEEN
Retrieve all employees in department 5 whose salary is between $30,000 and $40,000
select *
from employee
where dno=5 and (salary between 30000 and 40000);
/*OR*/
where salary >= 30000 AND salary <= 40000
Ordering of Query Results -- ORDER
- DESC: result set in descending order
- ASC: result set in ascending order (deafult)
ORDER BY D.Dname DESC, E.Lname ASC, E.Fname ASC
Retrieve a list of employees and the projects they are working on, ordered by department and, within each department, ordered alphabetically by last name, first name
select dname,lname,fname,pnamef
from department,employee,works_on,project
where dnumber=dno and ssn=essn and pno=pnumber
order by dname,lname,fname;
INSERT
Specify a table name and a list of values for the row
- w/o column names : the same order as CREATE TABLE
- w/ column name and Columns not specified in INSERT ➜ DEFAULT or NULL
INSERT INTO company.EMPLOYEE (fname, lname, Dno, Ssn)
VALUES('Richard', 'MArani',4,'43142');
Insertion rejected when violating any constraint
UPDATE
Modify column values of existing rows in a table
–SET: specifies columns to be modified and new values
- w/o WHERE: update all rows
- w/ WHERE: update only selected rows (nested query, etc)
UPDATE EMPLOYEE
SET Salary = Salary * 1.1
WHERE Dno = 5;
Update rejected when violating any constraint, unless “ON UPDATE …” option specified in FK.
Violation possible? | INSERT | UPDATE | DELETE |
---|---|---|---|
Domain constraint | Y | Y | N |
Key constraint | Y | Y | N |
Entity integrity constraint | Y | Y | N |
Referential integrity constraint | Y | Y | Y |
Lec 6 : MORE SQL
NULL
NULL: unknown, not applicable
Three-values logic:
Check if a column value is NULL
–ColName IS NULL
–ColName IS NOT NULL
– NOT (ColName IS NULL)
–Using =, <>, !=with NULL is not correct
–Not correct: ColName IS 'DAVID'
Retrieve the names of all employees who do not have a supervisor
select fname,lname
from employee
where superssn is null;
Nested queries
(or sub-queries)
–Inner query: a separate select-from-where block within WHERE clause of another (outer) query
★ May reference outer query’s column(s) (correlated nested query)
–Outer query
★ Can not reference inner query’s column
–Glue between inner query and outer query: IN, EXISTS
• Can be used in SELECT, UPDATE, DELETE
–Can be nested into multiple levels
IN
SELECT … WHERE v IN (Q)
Compares expression v with the result set of Q
–Evaluates to TRUE if v is a member in the result set of Q
–Q can be a complete SELECT statement
SELECT ...
FROM ...
WHERE Pho IN (SELECT proj_no FROM...);
–Q can be explicit set of values
SELECT … FROM … WHERE Name IN('Bob', 'David');
UPDATE T SET a = a + 10 WHERE b IN(Q)
DELETE T WHERE b IN(Q)
★ C IN (v1, v2) is equivalent to (C = v1 OR C = v2)
★ C NOT IN (v1, v2)is equivalent to (C != v1 AND C != v2)
SELECT … WHERE v = (Q)
–Returns TRUE if v is equal to the single value from Q
SELECT … WHERE v = SOME (Q)
–Returns TRUE if v equals to some value in the result set Q
–SOME is equivalent to ANY
Retrieve the name of each employee who has a dependent with the same first name and same sex as the employee
select e.fname,e.lname
from employee as e
where e.ssn in (select essn
from dependent
where e.fname=dependent_name and e.sex=sex);
★ Def: Inner query references columns from outer query
★ Logically evaluated once for each row in the outer query
EXISTS
SELECT … WHERE EXISTS(Q)
–Check if the result of query Q is empty or not
★ Q returns at least one row ➜ TRUE
★ Empty result set from Q ➜ FALSE
★ Inner query usually references outer query’s columns
• Else there is no correlation between inner and outer queries
SELECT … WHERE NOT EXISTS(Q)
UPDATE T SET a = a + 10 WHERE EXISTS(Q)
DELETE T WHERE EXISTS(Q)
List the names of managers who have at least one dependent.
select fname,lname
from employee
where exists (select * from department where ssn=mgrssn) and
exists (select * from dependent where ssn=essn);
Retrieve the names of each employee who works in all projects controlled by department 5
select fname,lname
from employee
where NOT exists ((select Pnumber from PROJECT where Dnum=5)
EXCEPT (select Pno from WORKS_ON where Ssn = Essn));
JOIN
Join: Treat > 1 tables in SELECT as a single joined table
★ Inner Join: find “matching” rows
SELECT …
FROM T1 [INNER] JOIN T2 ON <join condition>
WHERE …
NATURAL JOIN on two tables R and S : same column combine together
★ No join condition
–Implicit EQUIJOIN condition for each pair of columns with same name from R and S
• Data type of those columns should match
–No duplicated column names in result set
INNER JOIN v.s. NATURAL JOIN
TableA TableB
Column1 Column2 Column1 Column3
1 2 1 3
The INNER JOIN of TableA and TableB on Column1 will return
a.column1 a.column2 b.column1 b.column3
1 2 1 3
SELECT * FROM TableA INNER JOIN TableB USING (Column1)
SELECT * FROM TableA INNER JOIN TableB ON TableA.Column1 = TableB.Column1
The NATURAL JOIN of TableA and TableB on Column1 will return:
column1 column2 column3
1 2 3
SELECT * FROM TableA NATURAL JOIN TableB
CROSS Join, Multiway join
•CROSS JOIN
★ No join condition
•Multiway join
SELECT …
FROM T1 JOIN T2 ON T1.a = T2.b
JOIN T3 ON T1.c = T3.d
Outer Joins
SELECT …
FROM R { LEFT | RIGHT | FULL } [OUTER] JOIN S ON <condition>
WHERE ...
R LEFT [OUTER] JOIN S
• Return all rows from left table R
• Rows from R may be padded with NULL (columns of S)
R RIGHT [OUTER] JOIN S
• Return all rows from right table S
• Rows from S may be padded with NULL (columns of R)
R FULL [OUTER] JOIN S
•Return all rows from both left R and right table S
•Rows from R and S may be padded with NULL
INNER JOIN ↔ nested query
SELECT ... FROM X INNER JOIN Y ON X.a = Y.b WHERE ...
SELECT ... FROM X WHERE X.a IN (SELECT ... FROM Y WHERE ...)
SELECT ... FROM X WHERE EXIST (SELECT ... FROM Y WHERE X.a = Y.b ...)
Left join ↔ right join
SELECT … FROM T1 LEFT JOIN T2 ON …
SELECT … FROM T2 RIGHT JOIN T1 ON …
Filter in join condition (ON clause) or in WHERE clause?
–INNER JOIN: no difference
–OUTER JOIN:
•filter for right table in LEFT JOIN, or left table in RIGHT JOIN: join condition
–In WHERE would cause issue with NULL value
•filter for left table in LEFT JOIN, or right table in RIGHT JOIN: no difference
Aggregate Functions
★ Summarize info from multiple rows into a single-row value
•Built-in aggregate functions: COUNT,SUM, MAX, MIN, AVG
•used in the SELECT clause or in a HAVING clause
–Ex. SELECT SUM(expr) FROM …
•Discard NULL values from expr(e.g., a particular column)
–COUNT(col) returns # of non-NULL col
–COUNT(DISTINCT col) returns # of unique non-NULL col
★ COUNT(*) returns # of rows including NULL
When empty table, or no row matches WHERE condition
★ COUNT(*), COUNT(col): return 0
★ Other aggregation functions: return NULL
GROUP BY Clause
SELECT … FROM … [WHERE …] GROUP BY <grouping cols> …
• **Partition relation into subsets of rows
–Based on grouping column(s)**
–Provide summary info for each group
•GROUP BY clause
–Specifies grouping columns (>1 is OK)
–Grouping columns may (not must) appear in SELECT clause
•SELECT column list (except columns in aggregation function) is a subset (including empty set) of GROUP BY columns
★ non-grouping columns cannot appear in SELECT clause, unless in aggregation functions
★ SELECT column list: summary information on per group basis
• If NULLs exist in grouping column
–Separate group created for all rows with a NULL value in grouping column
For each project, retrieve the project number, the project name, --and the number of employees from department 5 who work on the project.
select pnumber,pname,count(*)
from project,works_on,employee
where pnumber=pno and ssn=essn and dno=5
group by pnumber,pname;
HAVING Clause
SELECT … FROM … [WHERE …] GROUP BY <grouping cols> … HAVING ...
★ Filter groups (based on group summary info): usually aggregation function
★ vs WHERE which filters rows
For each project on which more than two employees work, retrieve the project number, the project name, and the number of employees who work on the project.
select pnumber,pname,count(*)
from project,works_on
where pnumber=pno
group by pnumber,pname
having count(*)>2;
For each department that has more than five employees, retrieve the department number and the number of its employees who are making more than $40,000.
select dnumber,count(*)
from department,employee
where dnumber=dno and salary>40000 and
dno in (select dno from employee group by dno having count(*)>5)
group by dnumber;
MySQL ISSUE
- CREATE SCHEMA is a synonym for CREATE DATABASE
- EXCEPT not supported
- Work around: NOT IN, LEFT JOIN/IS NULL
- (SELECT a FROM X) EXCEPT (SELECT b FROM Y)
- SELECT a FROM X WHERE a NOT IN (SELECT b FROM Y)
- SELECT a FROM X LEFT JOIN Y ON X.a= Y.bWHERE Y.b IS NULL
- INTERSECT not supported
- Work around: IN, DISTINCT/JOIN
- ★ Faulty implementation (avoid these)
- In GROUP BY, column not in GROUP BY allowed in SELECT
- SELECT a, b, COUNT(*) FROM … GROUP BY a
- Renamed column from SELECT allowed in HAVING
- SELECT a, COUNT(…) AS CNT FROM … GROUP BY … HAVING CNT > 0
- In GROUP BY, column not in GROUP BY allowed in SELECT
Schema Change Statements
DROP
–Used to drop named DB obj, such as tables, views, or constraint
DROP DATABASE [IF EXISTS] dbName;
DROP TABLE [IF EXISTS] tableName;
Alter
- Adding a column
ALTER TABLE COMPANY.EMPLOYEE ADD COLUMN Job VARCHAR(12);
- Dropping a column
ALTER TABLE COMPANY.EMPLOYEE DROP COLUMN Address;
- Changing a column definition
ALTER TABLE COMPANY.DEPARTMENT
ALTER COLUMN Mgr_ssn SET DEFAULT ’333445555’;
- Adding or dropping table constraints
ALTER TABLE COMPANY.EMPLOYEE
ADD CONSTRAINT FK_DnoFOREIGN KEY(Dno)
REFERENCES Department(Dnumber);
Views (Virtual Tables)
– Single table derived from other tables (base tables, views)
–DB object
–Under a schema (container): [schenaName.]viewName
★Virtual table: Not necessarily has persistent data in physical form
Why view?
–To restrict the use of particular columns and/or rows of tables
–To hide the details of complicated queries
–To provide a backward compatible interface
–To restrict the updated/inserted value (WITH CHECK Option)
★ View always up-to-date
–No-persistent data, persistent data
–Responsibility of the DBMS and not the user
•Types
– Regular view: w/o persistent data
Views are virtual only and run the query definition each time they are accessed.
★ Materialized view: w/ persistent data
Materialized views are disk based and are updated periodically based upon the query definition.
Lec 7 : ER-to-Relational Mapping
Step 1: Regular Entity
Create a relation R **that include all simple attributes
–PK: one of key attributes
–R is called entity relation**
• Each tuple represents an entity instance
Tuple − A single row of a table, which contains a single record for that relation is called a tuple.
Relation instance − A finite set of tuples in the relational database system represents relation instance.
Step 2: Weak Entity
Create a relation R that includes all simple attributes
–PK: {PK of owner entity, partial key of weak entity}
**• PK of owner as FK attributes of R
– FK: CASCADE option for ON DELETE or ON UPDATE
If PK of T is (a,b), then FK: S.(a,b) to T.(a,b); separate FKs are not correct
Step 3: Binary 1:1 Relationship
Approach #1: FK approach
–Modify the relation S that is total participation in R
•Include PK of T as FK in S
•Include simple attributes of R in S
Approach #2: merged relation
–If both S and T are total participation, merge S, T, R to one relation
•Approach #3: cross-reference or relationship relation
–Create a relationship relation R (or look up table)
• Include PKs of S and T, as FKs in R
• PK of R: one of the two FKs, and the other is unique
• Cons: extra relation, and JOIN is needed for retrieval
Step 4: Binary 1:N Relationship
★ This cannot be used to map identifying relationship
Approach #1: modify the relation S (N-side of R)
–Include PK of T as FK in S
–Include simple attributes of R as attributes of S
Approach #2: create relationship relation R
–Include PKs of S and T, as FKs in R
Step 5: Binary M:N Relationship
Create a new relationship relation R
–Include PKs of S and T as FKs in R
•FK propagate (CASCADE) option for ON UPDATE, ON DELETE
–PK of R (composite PK): PK of S, PK of T
–Include any simple attributes of M:N relationship type
–1:1 and 1:N relationships can be mapped to this
Step 6: Multivalued Attributes
Create a new relation R for multivalued attr S.A
–Include attr of A, and PK of S
–PK of S as FK in R
•FK propagate (CASCADE) option for ON UPDATE, ON DELETE
–PK of R: {A, PK of S}
–If the multivalued attribute is composite, include its simple components
Step 7: N-ary Relationship Types
• Create a new relation S **for each n-ary relationship R
–Include PKs of participating entity types as FKs in S
–PK of S: {PKs of participating entity types}
–Include any simple attributes as attributes