226 - Database Final 2
Lec12: Distributed database
- share no physical components
- independent of each other
- Transaction may access data at one or more sites
Replication – multiple copies of data, stored in different sites, for faster retrieval and fault tolerance
Fragmentation – Relation is partitioned into several fragments stored in distinct sites
Data replication
- Full replication of a relation: relation stored at all sites
- Fully redundant databases: entire DB in every site
- Pros
- Availability: site failure is OK if replicas exist
- Parallelism: queries processed by several nodes in parallel
- Reduced data transfer: relation available locally at site w/ a replica
- Cons
- Increased cost of updates: update each replica
- Increased complexity of concurrency control: concurrent updates to distinct replicas
- One solution: choose one copy as primary copy
Horizontal fragmentation
–Partition a table horizontally
–A fragment == a subset of rows == shard
–A table is divided into one or more horizontal fragments
Vertical fragmentation
– Partition a table vertically
–A fragment == a subset of columns
–Ex: 2 vertical fragments of employee:
•(ssn, name, bdate, addr, sex), (ssn, salary, super_ssn, Dno)
–All fragments must contain a common candidate key (or superkey, or tuple-id)
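A minimal Python sketch (not from the lecture, with made-up employee rows) showing both styles on the employee example above: horizontal fragments are subsets of rows, vertical fragments are subsets of columns that each keep the candidate key ssn.

```python
# Toy fragmentation sketch; the employee rows below are made up.
employees = [
    {"ssn": 1, "name": "Ann", "bdate": "1990-01-01", "addr": "A St", "sex": "F",
     "salary": 50000, "super_ssn": None, "Dno": 4},
    {"ssn": 2, "name": "Bob", "bdate": "1985-05-05", "addr": "B St", "sex": "M",
     "salary": 60000, "super_ssn": 1, "Dno": 5},
]

# Horizontal fragmentation (sharding): each fragment is a subset of rows,
# here split by department number and stored at different sites.
shard_dno4 = [row for row in employees if row["Dno"] == 4]
shard_dno5 = [row for row in employees if row["Dno"] == 5]

# Vertical fragmentation: each fragment is a subset of columns; both keep
# the candidate key ssn so rows can be reassembled by a join on ssn.
frag1_cols = ("ssn", "name", "bdate", "addr", "sex")
frag2_cols = ("ssn", "salary", "super_ssn", "Dno")
frag1 = [{c: row[c] for c in frag1_cols} for row in employees]
frag2 = [{c: row[c] for c in frag2_cols} for row in employees]
```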
Data Fragmentation
- Advantages:
- parallel processing on fragments of a relation
- Locality
Data transparency
User is unaware of details of how and where data is stored
• Fragmentation transparency
• Replication transparency
• Location transparency
Transaction coordinator:
- Start execution of txns that originate at the site
- Distribute sub-txns at sites for execution
- Coordinate termination of txn that originates at the site (commit or abort at all sites)
Transaction Manager
- Maintain log for recovery purposes
- Coordinate concurrent txn executing at this site
System Failure Modes
Network partition : A network is said to be partitioned when it has been split into two or more subsystems that lack any connection between them
-- Network partitioning and site failures are generally indistinguishable
Commit Protocol
Ensure atomicity across sites: a txn executing at multiple sites either be committed at all the sites, or aborted at all the sites
Two-phase commit (2PC) protocol: widely used
Phase 1 : Get a decision
Ci asks all participants to prepare to commit txn T
–Ci adds <prepare T> to (local) log and forces log to stable storage
–sends prepare T msg to all sites at which T executed
Upon receiving msg, txn manager at each site determines if it can commit T
–if not, add <no T> to log and send abort T msg to Ci
–if yes
•add <ready T> to log : A promise by a site to commit T or abort T
•force all logs for T to stable storage
•send ready T msg to Ci
Phase 2 : Record a decision
Ci waits to receive responses from all participating sites, or until timeout
• If Ci received a ready T msg from all participating sites, commit T else abort T
• Receiving just one abort T msg: good enough for Ci to abort T
Ci adds <commit T> or <abort T> to (local) log and forces log to stable storage
–Once on stable storage it is irrevocable (even if failures occur)
Ci sends the decision to each participant
–Each participant records <commit T> or <abort T> to log
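A simplified Python sketch of the two phases above; the Log and Participant classes are hypothetical stand-ins for stable storage and real message passing, not part of the lecture.

```python
# Simplified 2PC sketch; Log and Participant are toy stand-ins.
class Log:
    def __init__(self):
        self.records = []
    def force(self, rec):                  # stand-in for forcing to stable storage
        self.records.append(rec)

class Participant:
    def __init__(self, name, can_commit=True):
        self.name, self.can_commit, self.log = name, can_commit, Log()
    def prepare(self, T):                  # phase 1: vote
        if self.can_commit:
            self.log.force(f"<ready {T}>")     # a promise to commit or abort T later
            return "ready"
        self.log.force(f"<no {T}>")
        return "abort"
    def finish(self, T, decision):         # phase 2: record the coordinator's decision
        self.log.force(f"<{decision} {T}>")

def two_phase_commit(coordinator_log, participants, T):
    # Phase 1: ask every site at which T executed to prepare
    coordinator_log.force(f"<prepare {T}>")
    votes = [p.prepare(T) for p in participants]
    # Phase 2: commit only if every vote was "ready"; one abort vote is enough to abort
    decision = "commit" if all(v == "ready" for v in votes) else "abort"
    coordinator_log.force(f"<{decision} {T}>")   # irrevocable once on stable storage
    for p in participants:
        p.finish(T, decision)
    return decision

# two_phase_commit(Log(), [Participant("S1"), Participant("S2", can_commit=False)], "T1") -> "abort"
```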
Failure Handling
2PC Cons - coordinator failure may result in blocking, where a decision either to commit or to abort T may have to be postponed until Ci recovers
Site Failure
- Ci detects that a participant site has failed
- If the site failed before responding w/ a ready T msg: Ci assumes an abort T msg
- If the site failed after Ci received its ready T msg: Ci ignores the failure ➜ Ci already has a promise
When site Sk recovers, it examines its log to determine fate of txns active at the time of the failure
–Log contains <commit T> record: redo (T)
–Log contains <abort T> record: undo (T)
–Log contains <ready T> record: consult Ci for the fate of T
–Log contains no control records (commit, abort, ready) concerning T:
➜ Sk failed before responding to the prepare T msg from Ci; since the failure of Sk precludes sending such a response, Ci must abort T ➜ Sk must execute undo (T)
Coordinator Failure
If coordinator Ci fails while the commit protocol for T is executing, then participating sites must decide on T's fate:
- If an active site contains <commit T> in log ➜ commit T
- If an active site contains <abort T> in log ➜ abort T
- If some active participating site does not contain <ready T> in its log, then the failed coordinator Ci cannot have decided to commit T: can abort T; such a site must reject any subsequent <prepare T> message from Ci
- Otherwise, all active sites must have <ready T> in their logs but no additional control records (such as <abort T> or <commit T>) ➜ active sites must wait for Ci to recover to learn the decision ➜ blocking problem
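A small sketch of these decision rules, assuming each active site's control records for T are collected into a set; names are hypothetical.

```python
# Sketch of the coordinator-failure rules: given the control records that the
# active participating sites hold for T, decide T's fate (or block).
def decide_without_coordinator(site_logs, T):
    all_records = set().union(*site_logs)       # site_logs: one set of records per active site
    if f"<commit {T}>" in all_records:
        return "commit"
    if f"<abort {T}>" in all_records:
        return "abort"
    if any(f"<ready {T}>" not in log for log in site_logs):
        return "abort"                          # Ci cannot have decided to commit
    return "block"                              # all ready, no decision recorded: wait for Ci

# decide_without_coordinator([{"<ready T1>"}, {"<ready T1>"}], "T1") -> "block"
```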
Single lock manager
– a single lock manager resides in a single chosen site, say Si
– a txn sends a lock request to Si, and the lock manager determines whether the lock can be granted
- Txn can read data from any site w/ a replica of the data
- Writes must be performed on all replicas of a data item
- Pros: simple implementation, simple deadlock handling
- Cons: bottleneck, vulnerability
Distributed lock manager
Local lock manager at each site; control access to local data items
- Pros: Simple implementation, reduce bottleneck, reduce vulnerability
- Cons: deadlock detection is harder (deadlocks may span multiple sites)
Primary copy: choose one replica of data item to be the primary copy
–Site containing the replica is called the primary site for that data item
- When a txn needs to lock a data item Q, it requests a lock at the primary site of Q
- Pros: simple implementation
- Cons: If the primary site of Q fails, Q is inaccessible even though other sites containing a replica may be accessible
Quorum Consensus Protocol :
- Each site is assigned a weight; S = the total of all site weights ➜ a more reliable site can get a higher weight
Choose two values: read quorum QR and write quorum QW
–Rules: QR + QW > S and 2*QW > S
- Read: must lock enough replicas that sum(site weights) >= QR
- Write: must lock enough replicas that sum(site weights) >= QW
Pros: the cost of either read or write locking can be selectively reduced
•Optimize read: smaller QR, larger QW; optimize write: larger QR, smaller QW
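A quick sketch of quorum selection under these rules, with made-up site weights (S = 8, QR = 3, QW = 6).

```python
# Quorum consensus sketch: pick replicas until their weights reach the quorum.
weights = {"S1": 3, "S2": 2, "S3": 2, "S4": 1}       # S = 8
S = sum(weights.values())
QR, QW = 3, 6                                        # QR + QW > S and 2*QW > S

def pick_quorum(available_sites, quorum):
    chosen, total = [], 0
    for site in available_sites:
        chosen.append(site)
        total += weights[site]
        if total >= quorum:
            return chosen                            # enough weight locked
    return None                                      # quorum not reachable

print(pick_quorum(["S1", "S2"], QR))        # ['S1'] -- weight 3 already reaches QR
print(pick_quorum(["S2", "S3", "S4"], QW))  # None -- only weight 5 available, write blocked
```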
Timestamp
Each transaction must be given a unique timestamp (TS)
– Each site generates a unique local TS using either a logical counter or the local clock
– Global unique TS: {unique local TS, unique site identifier}
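A minimal sketch of generating such global timestamps with a logical counter per site; class and variable names are made up.

```python
# Globally unique timestamp = (unique local TS, unique site id); comparing
# the pair lexicographically keeps timestamps from different sites ordered.
import itertools

class SiteClock:
    def __init__(self, site_id):
        self.site_id = site_id
        self._counter = itertools.count(1)           # logical counter (could be the local clock)
    def new_timestamp(self):
        return (next(self._counter), self.site_id)   # {local TS, site identifier}

s1, s2 = SiteClock(1), SiteClock(2)
print(sorted([s2.new_timestamp(), s1.new_timestamp(), s1.new_timestamp()]))
# [(1, 1), (1, 2), (2, 1)] -- unique and totally ordered
```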
Replication with Weak Consistency
- Replication of data with weak degrees of consistency (w/o guarantee of serializability)
- i.e., QR + QW ≤ S or 2*QW ≤ S
- Usually only when not enough sites are available to ensure quorum
- Tradeoffs: consistency, availability, latency/performance
- Key issues:
- Reads may get old versions
- Writes may occur in parallel, leading to inconsistent versions
Master-slave Replication
- txn updates: at a single “master” site, and propagated to “slave” sites
Update propagation – not part of txn
•immediately (after txn commits), or async (lazy, periodic)
–txn may read from slave sites, but not update them
–Replicas see a transaction-consistent snapshot of the DB at the primary site
Multi-master replication
–txn updates done at any replica: update the local copy, and the system propagates changes to the other replicas transparently; the txn is unaware of data replication
Lazy replication
–Txn updates: at one site (primary or any)
–Update propagation: lazy (not on critical path, after txn commits)
Deadlock detection: a distributed deadlock cannot be detected locally at any single site
➜ Centralized detection: A global wait-for graph is constructed and maintained in a single site
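A sketch of centralized detection: the coordinator site unions the local wait-for graphs and runs an ordinary cycle check; transaction names are made up.

```python
# Centralized deadlock detection: merge local wait-for graphs into one global
# graph and look for a cycle (a deadlock).
def has_deadlock(local_graphs):
    # Union the local edges (txn -> txns it waits for) into a global graph
    global_graph = {}
    for g in local_graphs:
        for txn, waits_for in g.items():
            global_graph.setdefault(txn, set()).update(waits_for)

    # Standard DFS cycle detection on the global wait-for graph
    visiting, done = set(), set()
    def dfs(txn):
        visiting.add(txn)
        for nxt in global_graph.get(txn, ()):
            if nxt in visiting or (nxt not in done and dfs(nxt)):
                return True
        visiting.discard(txn)
        done.add(txn)
        return False
    return any(dfs(t) for t in list(global_graph) if t not in done)

# Site 1 sees T1 -> T2, site 2 sees T2 -> T1: no local cycle, but a global one.
print(has_deadlock([{"T1": {"T2"}}, {"T2": {"T1"}}]))   # True
```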
CAP
Availability: to be robust (ability of the system to function even during failures)
➜ Detect failures, Reconfigure system, Recovery/Reintegration
Consistency:
● Strong consistency: serializable schedule
● Weak consistency: master-slave, multi-master, lazy replication, etc.
Trade-off between A and C
–Most distributed consensus algorithms will block during partitions to ensure consistency
• Paxos does not block as long as a majority of participants are alive
–Many apps require continued operation even during network partition
• Even at the cost of consistency
Consistency: All copies have the same value
Availability: System can run even if parts have failed
Partition-tolerant: Survive network partitioning
★ It is impossible to achieve all three ➜ Any two can be achieved
★ CAP theorem only matters when there is a partition
Eventual Consistency
when no updates occur for a long period of time, eventually all updates will propagate through the system and all the nodes will be consistent
– You may not know how long it may take
• Known as BASE (Basically Available, Soft state, Eventual consistency)
Soft state: copies of a data item may be inconsistent
• Used by most NoSQL
Lec13: NoSQL
- Schema-less data stored as some form of key-value pair DB
- Scalability: scale out
- (usually) availability instead of consistency: CAP
- Eventual consistency (BASE)
| | RDBMS | NoSQL |
|---|---|---|
| Type of data | structured data | semi- and un-structured data |
| Organization of data | schema: tables/rows/columns | schema-less: key-value pairs |
| Design steps | requirements ➜ data model (ERD, UML) ➜ tables ➜ … | store data first ➜ (transform data) ➜ get/put |
| Data association | referential integrity, foreign keys | you are on your own |
| Functionality | complex | simple |
| Properties | ACID | BASE |
| Pros | (strong) consistency, transactions, joins | schema-less; scalability, availability; performance |
| Cons | scalability, availability | eventual consistency, tricky joins, no transactions |
NoSQL – Key-value
- Simple architecture
- a unique identifier (key) maps to an obj (value)
- DB itself does not care about the obj
- Good for
- Simple data model
- Scalability
- Bad for : “Complex” datasets
- Examples: AWS DynamoDB, Azure Tables, Google Cloud Datastore, Riak, Redis, Azure CosmosDB
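A minimal get/put sketch assuming a local Redis server and the redis-py client; the key and value are made up, and the DB treats the value as an opaque blob.

```python
# Key-value model sketch, assuming a Redis server on localhost:6379.
import json
import redis

r = redis.Redis(host="localhost", port=6379)

# The DB only sees an opaque value under a unique key; the application,
# not the store, interprets the object.
r.set("user:42", json.dumps({"name": "Ann", "plan": "pro"}))
value = json.loads(r.get("user:42"))
print(value["plan"])   # 'pro'
```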
NoSQL – Column-family / BigTable
- Key-value pair + grouping
- Key – {a set of values}
- E.g., time series data from sensors, languages of web page
- Good for
- More complex data set (than simple key-value pair)
- Scalability
- Examples : Cassandra, HBase, Hypertable, Azure CosmosDB
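A toy in-memory sketch of the column-family idea (row key ➜ a group of columns), using made-up sensor time-series data; real stores like Cassandra or HBase persist and distribute this structure.

```python
# Column-family sketch: each row key maps to a group of columns, and
# different rows may have different (sparse) columns.
sensor_data = {
    "sensor-17": {                      # row key
        "2024-01-01T10:00": 21.4,       # column name -> value
        "2024-01-01T10:05": 21.9,
    },
    "sensor-18": {
        "2024-01-01T10:00": 19.8,       # rows need not share the same columns
    },
}
print(sensor_data["sensor-17"]["2024-01-01T10:05"])   # 21.9
```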
NoSQL – Document
- DB knows the nature of the data
- Document – JSON
- Need index to improve performance
- Scalability – clustering, replication, some w/ partition tolerance
- Good for
- Systems already using document-style data
- Scalability
- Examples: MongoDB, CouchDB, AWS DynamoDB, Azure CosmosDB
NoSQL – Graph
- Focuses on how data is related and what calculations are to be performed
- Usually also has “transaction”
- Good for
- Interconnected Data and non-tabular
- Geospatial problems, recommendation engine, network analysis, bioinformatics
- “Closeness” of relationship
- How vulnerable a company is to “bad news” for another company
- Bad for: Simple tabular data
- Examples: Neo4j, Azure CosmosDB
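A toy sketch of the "closeness" idea: hop distance between two companies over made-up supplier edges, the kind of traversal a graph DB answers natively.

```python
# Hop-distance ("closeness") sketch over a made-up supplier graph.
from collections import deque

supplies = {                        # company -> companies it supplies
    "ChipCo": ["PhoneCo"],
    "PhoneCo": ["RetailCo"],
    "RetailCo": [],
}

def hops(graph, start, goal):
    queue, seen = deque([(start, 0)]), {start}
    while queue:
        node, d = queue.popleft()
        if node == goal:
            return d
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, d + 1))
    return None

print(hops(supplies, "ChipCo", "RetailCo"))   # 2: bad news at ChipCo is 2 hops away
```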
Cassandra
- Scalable High Availability NoSQL, key-value pair + column family (≈ table in RDBMS)
- Data model: a partitioned row store
- Rows are organized into tables
- Partition: consistent hashing - decides storage node (≈ AWS DynamoDB)
- adding/removing nodes in the cluster only reshuffles keys among neighboring nodes (see the hashing sketch after this list)
- Tunable consistency: decided by client apps, per operation
- Consistency level: 1, 2, 3, quorum, all, any, etc. (quorum ≈ AWS DynamoDB)
- W: sent to all replicas; R: decided by consistency level
- Replication and multi-DC replication
- asynchronous masterless (peer-to-peer) replication
- Cassandra Query Language (CQL): SQL like
- No joins and subqueries
- materialized views (i.e. pre-computed results of queries)
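A toy consistent-hashing sketch illustrating why adding a node only moves the keys next to it; real Cassandra uses a partitioner (e.g. Murmur3) with virtual nodes, so the node/key names and md5 here are just placeholders.

```python
# Toy consistent-hashing ring: node positions and keys are hashed onto the same
# ring, and each key is stored on the first node clockwise from its position.
import bisect
import hashlib

def ring_pos(s):
    return int(hashlib.md5(s.encode()).hexdigest(), 16) % (2 ** 32)

class Ring:
    def __init__(self, nodes):
        self.ring = sorted((ring_pos(n), n) for n in nodes)
    def node_for(self, key):
        positions = [p for p, _ in self.ring]
        i = bisect.bisect(positions, ring_pos(key)) % len(self.ring)
        return self.ring[i][1]

before = Ring(["node-a", "node-b", "node-c"])
after = Ring(["node-a", "node-b", "node-c", "node-d"])
keys = [f"user:{i}" for i in range(1000)]
moved = sum(before.node_for(k) != after.node_for(k) for k in keys)
print(f"{moved} of {len(keys)} keys moved")   # only keys adjacent to node-d's position move
```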
HBase
- Column-oriented DB based on Google BigTable
- Distributed, (timestamp) versioned, NoSQL DB on top of Hadoop and HDFS
- Strongly consistent reads and writes: not eventual consistency
- All operations are atomic at the row level, regardless of # of columns
- Write ahead logging (WAL): protect from server failure
- Region-based storage architecture: a set of rows, [starting-row-key, ending-row-key)
- Auto sharding of tables
- HBase Master: master node
- Assign regions to region servers
MongoDB
- Scalable distributed document DB
- Nested JSON document (binary form of JSON – BSON)
- A document: a JSON obj w/ key-value pairs ≈ row in RDBMS
- CRUD: JavaScript, ACID at document level
- No joins: still allows retrieving data via relationships ➜ $lookup
- Replica Sets
- One server as primary (master), others as secondary (w/ replicated data)
- Client only reads from or writes to master
- Auto sharding – range-based, hash-based, user-defined
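A minimal pymongo sketch assuming a local mongod; the database, collections, and fields are made up, and $lookup stands in for a join at query time.

```python
# Document-model sketch with pymongo, assuming mongod on localhost:27017.
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["shop"]

# A document is a (nested) JSON object with key-value pairs
db.customers.insert_one({"_id": 1, "name": "Ann", "addr": {"city": "Taipei"}})
db.orders.insert_one({"_id": 100, "cust_id": 1, "total": 42.0})

# No joins, but $lookup pulls in related documents at query time
result = db.orders.aggregate([
    {"$lookup": {"from": "customers", "localField": "cust_id",
                 "foreignField": "_id", "as": "customer"}}
])
print(list(result)[0]["customer"][0]["name"])   # 'Ann'
```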
CouchDB
- JSON- and REST-based document-oriented DB
- Each document
- Unique immutable _id: auto-generated or explicitly assigned per doc
- _rev: revision per doc, starting from 1, auto-modified after the doc changes
- Each update/delete must specify both _id and _rev
- CRUD: Futon web interface
- No transaction, No locking
- View: content generated incrementally by map/reduce functions
- Changes API
- Multi-master replication: HA
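A sketch of the _id/_rev rule over CouchDB's HTTP API using the requests library, assuming a local CouchDB with an existing database named notes and placeholder credentials.

```python
# CouchDB _id/_rev sketch; URL, database name, and credentials are placeholders.
import requests

base = "http://localhost:5984/notes"
auth = ("admin", "password")            # placeholder credentials

# Create: the chosen _id is part of the URL; CouchDB returns the first _rev
doc = {"title": "shopping", "items": ["milk"]}
rev = requests.put(f"{base}/list-1", json=doc, auth=auth).json()["rev"]

# Update: must carry the current _rev, or CouchDB rejects the write (409 conflict)
doc["items"].append("eggs")
doc["_rev"] = rev
requests.put(f"{base}/list-1", json=doc, auth=auth)
```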
NoSQL Pros and Cons
Pros:
–schema-less
–scalability (scale out), availability
–performance
–simpler change management (“agile”, faster dev)
Cons:
–eventual consistency
–tricky (server-side) join
–no transaction
–schema-less
Trade-offs (w/ RDBMS): functionality, availability, consistency, durability, scalability, performance