226 - Database Mid
Lec1 : Big data
Hadoop
Distributed computing framework
Built-in resiliency, fault tolerance
NoSQL
- Schema-less data stored as some form of key-value pair DB
- Scalability: scale out
- (usually) availability instead of consistency: CAP (Consistency, Availability, Partition)
- Eventual consistency (BASE)(Basically Available, Soft State, Eventual Consistency)
RDMS scale up > scale out
NOSQL scale out > scale up
Lec2 : RDBMS Concept and Architecture
Database management system (DBMS)
- Defining a DB: data types, constraints, meta-data
- Constructing a DB: storing data on storage medium, relationship among records
- Manipulating a DB: query/update, report
- Sharing a DB: concurrent user access
- Query: retrieve data
- Transaction: read/write data from/to DB
- Protection
- System protection: against hardware/software malfunction
- Security protection: against unauthorized or malicious access
- Maintenance: Allow the system to evolve as requirements change over time
Characteristics of DB approach
- Self-describing
- Insulation between programs and data, data abstraction
- Support multiple views of data
- Sharing of data and multiuser transaction processing
Advantage of DB approach
- Controlling redundancy
- data normalization
- denormalization
- Providing persistent storage
- Providing storage structures and search techniques for efficient query processing
- Providing backup and recovery
- Enforcing data integrity constraints
Data abstraction
- Suppression of details of data organization and storage
- Highlighting of the essential features for an improved understanding of data
- different users can view data at preferred level of detail
Data model
- Collection of concepts that describe the structure of a database
- Data types, relationships, and constraints
- Provides means to achieve data abstraction
- Express business requirements and capabilities
- Basic operations: retrievals and updates
- Dynamic aspect or behavior: define valid operations
Data modeling steps
–Business requirements collection: what is needed?
• Talk to right people & get right information
–Conceptual data model
• Map requirements to data models (ERDs or UMLs)
–Logical data model
• Map ERDs to DB tables/columns, etc (DDLs) –DBMS and platform specific
–Physical data model
• Physical aspects of implementation DBs
• Performance requirements, storage/data distribution, access mechanism
Database schema
- Description of a database
- Less likely to change
- Schema diagram
- Displays selected aspects of schema
- Schema construct: an object in the schema
Database state or snapshot
- Data in database at a particular moment in time
- Any data modification changes DB state
- Likely to change
Three-Schema Architecture
**External level**
logical data independence
Conceptual level
Physical data independence
Internal level
Data independence
- Capacity to change the DB schema at one level w/o changing the schema at the next higher level
- mapping may be changed
- logical data independence
- E.g., add/remove record type/data field
- Physical data independence
- E.g., add new access structure (index), move DB file
DBMS Languages
- Structured Query Language (SQL)
- Data definition language (DDL)
- CREATE, ALTER, DROP
- Data manipulation language (DML)
- SELECT, INSERT, UPDATE, DELETE
- set-oriented
- Data definition language (DDL)
DBMS Architecture: Two-Tier Client/Server
- Server ---network ---client
- DB Server
- DB Client
- UI/app + DB access modules
- DB accessing modules: DB API
DBMS Architecture: Three-Tier Web Applications
- App server or Web server
- Adds intermediate layer between client and the DB server
- Runs apps programs and stores business rules
Lec3 : Entity-Relationship (ER) Model
Steps for Database Design
- Requirements collection and Analysis Talk to the right people to collect/analyze requirements
- Conceptual design Map data requirements to Entity Relationship Diagram (ERD)
- Logical design Map ERDs to DB tables and constraints (DDL)
Physical design
Internal storage structures, file organizations, indexes, access paths, and physical design parameters for the database files specified
Miniworld:
- Purpose: why
- Actors: who –roles: what –interactions with other actors: how, where, when, whom
- Operations: data requirements, functional requirements
ER Diagram Notation
ER model describes data as
- Entities
- Objects
- Thing in real world with independent existence
- Object w/ physical existence, e.g., car, person, house, employee
- Object w/ conceptual existence, e.g., job, course, company
- Attributes
- Properties of objects
- Particular properties that describe entity
- Each attribute has value(s)
- Relationships
- Data association among objects
Single-valued vs multi-valued attributes
- single-valued: age, b-day (Oval)
- multi-valued: {phone}, {college-degree} (Double-lined oval)
- Stored vs derived attributes
- E.g., b-day -> age
- NULL value: Unknown, N/A, missing
- Complex Attributes
- Composite and multi-valued attributes can be nested
Key attribute or uniqueness constraint
- Attribute w/ distinct value for each individual entity in entity set
- Key attribute in ERD: underlined oval
- A entity type may have more than one key attribute
Relationship
- Data association : one entity type refers to another entity type
- Relationship type R among n entity types E1, E2, ..., En
- Relationship set R == set of relationship instances
★ Attr is not needed if data association has been defined by relationship
Role and Recursive Relationship
- Role: Help to explain the relationship
- WORKS_FOR relationship:
- EMPLOYEE: role - employee or worker
- DEPARTMENT: role - employer or department
- Role name
- defines the role that an entity type plays in relationship
- Optional but essential for recursive relationship
- Recursive relationship
- same entity type participates more than once in a relationship type in different roles.
Cardinality ratio for a binary relationship
- Specifies max number of relationship instances that entity can participate in
- Types: 1:1, 1:N, N:1, M:N
- ER diagram: display 1, M, or N on diamond
- Explicit value if known
Participation constraint (minimum cardinality constraint)
- Specifies whether existence of entity depends on its being related to another entity
- Types: total and partial
Total participation (existence dependency)
- every employee must work for a department
- ERD: double line connecting entity type and relationship type
Partial participation
- Some, not all, employees manage some departments
- ERD: single line connecting entity type and relationship type
Migrating attributes of relationship to entity type
- 1:1 relationship type : Can be migrated to either one entity type
- 1:N relationship type : Can be migrated only to entity type on N-side of relationship
- M:N relationship type: Must be specified as relationship attributes
Weak Entity Types
- Do not have key attributes of their own
- Usually have partial key
- Uniquely distinguish weak entities related to the same owner entity
- May have duplicated value among all (weak) entities
- w/ one or more Identifying (or owner parent) entity type
- Identifying relationship
- Relates a weak entity type to its owner (or parent) entity type(s)
- Cardinality ratio: usually 1:1 or 1:N (where 1 is on owner entity side)
- ★ Always has a total participation constraint (existence dependency) b/w identifying relationship and weak entity
- ★ Not every existence dependency results in a weak entity type
- ER diagram:
- Weak entity type: double-lined rectangle
- Identifying relationship: double-lined diamond
- Partial key: dashed underline or dotted underline
- Alternative to weak entity types: complex (composited, multi-value) attrs
MinMax Notation
(min, max) notation on relationship type
- Replaces cardinality ratio (1:1, 1:N, M:N) and single/double line (participation constraints)
- Conversion between these two notations
- Expressive capability of requirements vs regular ERD
- ERD: single consistent notation (regular or min-max)
Relationship Types of Degree
- Degree = # of participating entity types
- 3 - way relationship in (a) = subset of 3 binary relationship in (b)
- Designer decides based on semantics or meaning
Some tools permit only binary relationships
–Ternary relationship => weak entity type
In general three binary relationships cannot replace a ternary relationship
weak entity w/ a ternary (or n-ary) identifying relationship type
=> two owner entity types
★ ERD specifies data requirements/aspect, not operations
- ERD provides data model upon which operations are operated
- No need to draw operations as relationships in ERD
Lec4 : Relational Model and Constraints
- uses the concept of a mathematical relation as building block
- has its theoretical basis in set theory and first-order predicate logic
Tuple : Represents a collection of related data values
Attribute : The name of a data value in a tuple
Relation : Each may have multiple tuples
ERD | Relational Model | RDBMS | Object |
---|---|---|---|
Entity or Entity Type | Relation | Table | Class |
Attribute | Attribute | Column | Property |
Entity instance | Tuple | Row | Object |
Domain
- specify all possible values for an attribute
- Set of atomic values: indivisible value
Relation schema
R(A1,A2, ...,An): R ➜ relation name, A1,A2,..., An ➜ list of attributes
Tuples
- A set of tuples => no order among tuples in a relation
- Attribute in tuples: atomic value
- Flat relational model
- values are atomic, not structures or lists
- Composite and multivalued attributes not allowed (1st Normal form)
- Multivalued attributes ➜ separate relations
- Composite attributes ➜ simple component attributes
NULL value
- Value unknown
- Value exists but is not available
- Attribute does not apply to this tuple
Relational Model Constraints
- Inherent model-based constraints or implicit constraints
- Inherent in the data model
- Schema-based constraints or explicit constraints
- Can be directly expressed in schemas of the data model
- Specified by Data Definition Language (DDL)
- Application-based or semantic constraints or business rules
- Cannot be directly expressed in schemas
- Enforced by application program
Domain Constraints
within a tuple the value of each attribute must be an element from the domain of that attribute, like Numeric data types, Boolean, Subrange range of values from a data type
Key Constraints
Unique value: no two tuples w/ the same value
- Superkey –Set of attributes that no two distinct tuples in any state r of R can have the same value for SK
- ★ A superkeymay not be a key. A key is always a superkey.
- Key
- Two distinct tuples **in any state of relation cannot have identical values for (all) attributes in key
- A key K is a minimal superkey
- A key may consist of one or more attributes
- Candidate key
- Relation schema may have more than one key
- Primary key (PK)
- Designated among candidate keys
- Other candidate keys are designated as unique keys
- Each relation: only one PK (underlined)
- PK cannot be NULL
Integrity Constraints
- Entity integrity constraint: PK cannot be NULL
- Referential integrity constraint
- Data association b/w two relations
- Consistency among tuples in two relations
- Foreign key rules:
- FK attributes: same domain(s) as PK attributes
- FK value in a tuple t1of the current state r1(R1) is
- PK value in some tuple t2 of the current state r2(R2), t1[FK] = t2[PK]
- NULL, i.e., ignore foreign key rules; subject to attribute nullability
on delete cascade
It will delete all the child records when parent record is deleted, so that there will be no child record when parent is deleted.
on update cascade
It will update the child records parent_id when parent record id that change. In rare case we use on update cascade
Operation for relational Model
Violation possible? | INSERT | UPDATE | DELETE |
---|---|---|---|
Domain constraint | Y | Y | N |
Key constraint | Y | Y | N |
Entity integrity constraint | Y | Y | N |
Referential integrity constraint | Y | Y | Y |