What have we learnt from CSCI585? - Next Show
Chapter 1 Data vs Info Data Have not yet been processed to reveal their meaning to the end user Building blocks of information Generation, storage, and retrieval of data Info Produced by processing raw data to reveal its meaning Requires context Should be accurate, relevant, and timely to enable good decision making Bedrock of knowledge Database End user data meta data DBMS program manage database structure control access data Role data to share between database & user view application request hide db internal complexity Advantage Data integrity and less data inconsistency improve data security/sharing/access/decision maker data quality Type of Database general purpose database variety Discipline specific database focused on specific subject area Operational database Designed to day-to-day ops Analytical database store historical data data warehouse online analytical processing OLAP tool for retrieving processing and modeling for data warehouse Business intelligence capture and process biz data to generate info Unstructured data exist in their original state Structured result from formatting semistructured data Extensible Markup Language XML Database Evolution Manual File System Computerized File System DP specialist´╝îTrack data & produce report File System Redux(Excel) Term Data Field Record File Problem with file system lengthy dev times difficulty of getting quick answer complex system admin lack of security and limited data share extensive programming Structure dependence(exist)&independence(access) Data dependence(access storage change) & independence(is changed without affecting ability to access) Practical significant of data independence is diff between logical and physical format Data redundancy poor data security data inconsistency increased likelihood of data-entry error data anomaly update/insertion/deletion anomalies Data system store data structure and relationship define store and manage all access path and component DBMS Functions data dictionary data storage data transformation and presentation security management multiuser access control backup and recovery data integrity management database access language and api SQL DDL DML database communication interface Disadvantage Increased cost Management complexity Maintaining currency Vendor dependence Frequent upgrade/replacement cycle Chapter 2 Data Models Modeling vs Model Data Modeling create a specific data model for problem domain Data models simple representations of complex real world data structure Model Abstract of real object Entity vs Relationship entity is an object to collect and store data relationship one to many many to many one to one Constraint Business Rules brief precise and unambiguous define the basic building blocks describe main and distinguishing characteristics Hierarchical Model vs Network Model H large amount of data for complex manufacturing project tree which contains segment 1:M N 1:M & M:M represent complex data relationship effectively improve database preformance allow a record to have more than one parent Schema & Subschema Relation or table: matrix composed of intersecting tuple and attribute Tuple: Row Attribute: Column RDBMS H&N DBMS relational model ERD use graphic to model database component Connectivity label relationship types Entity instance Row in the relational table Graph Chen Notation Crow's Foot Notation UML OODBMS object-oriented abstraction of real world entity Attributes Describe properties of an object Class collection of similar objects class hierarchy(each class has only one parent) Inheritance inherit methods and attributes of parent UML ERDM & XML Extended relation data model support OO Object and relational dbms XML unstructured data for efficient and efficient exchange of all data type. Big data find new better way to manage large amount of web and sensor-generated data provide high performance and scalability at a reasonable cost Volumne Velocity Variety Hadoop\HDFS\MapReduce\NoSQL NoSQL distributed database architecture provide high scalability high availability and fault tolerance large amount of sparse data gear toward performance rather than transaction consistency key-value store Evolution of data models Hierarchy Model Pros Promote data sharing Parent/child relationship promote database security is provided 1:M efficient Cons physical data storage navigational system require knowledge of h path change in structure require change in all application programs implementation limitaion no data definition lack of standards Network Model Pros Conceptual simplicity Handle more relationship types Data access is flexible data owner conformance to standard DDL DML Cons System complexity limit navigational system yield complex implementation structural changes require changes in all application Relational Model Pros Structural independence Tabular view ad hoc query based on sql isolates the end user from improves implementaton Cons require substantial hardware conceptual simplicity give untrained people the tools to use a good system poorly may promote info problems Entity Relationship Model Visual modeling yield conceptual simplicity Visual representation makes it an effective communication tool Is integrated with the dominant relational model Cons Limited constraint Limited relationship No data manipulation language loss of info content occurs when attribute removed OOM Semantic content is added Visual representation includes semantic content Inheritance promotes data integrity Cons Slow development of standards caused vendor to supply their own enhancement complex nav system learning curve is steep high system overhead slow transactions No SQL High scalalility, avalibility Use low-cost commodity hardware support big data k-v Cons complex programming is required no relationship support no transaction integrity support data consistency eventually consistent model Models level External schema(abstraction high)(user view) specific representation of an external view Text Conceptual schema(ER diagram) high level description of the main data objects logical design task of creating a cs Tables Internal schema (sql, specific database) use database construct supported by the chosen database logical independence (sql) physical model (storage) Chapter 3 Relation Database Model Logical view of data facilitated by the creation of data relationships based on a logical construct called a relation Keys Primary Key Dependencies Functional Dependence( value of one or more attr determine other) Determinant attribute who determine Dependent Full functional Dependence Entire collection of attributes in determinant Composite Key (AreaNo. Phone) Key attribute Entity integrity PK unique PK is not null, even part of it Referential integrity null or primary key in a table Secondary key key used strictly for data retrieval purpose Relvar variable holds a relation heading contain the names of the attribute and the body contains the relation Closure embed Relational Set Op select project union intersect difference yield all row not in the other table union campatible product A table * B table join with a foreign key, common attribute natural Join equiljoin thetajoin inner join(only return matched from table being joined) outer join(match pairs retain and unmatched) right outer right weizhong left outer left weizhong divide 2 column/1 column output is 1 column that contain all values from the second column of the dividend that associated with every row in the divisor. A(a, b)/B(b') = C(a'), a' only has a's b include all b' Data dictionary System catalog Relationships 1:M 1:1 M:N Composite entity, bridge entity, associate entity show in register has redundancy serve crucial info purpose preserve the historical accuracy of the data Indexes Index Key index reference point that lead to data location identified by the key Unique Index index key can have only one pointer value associated with it Each index is associated with only one table Dr Codd'12 rules infomation guaranteed access systematic treatment of nulls dynamic online catalog based on the relational model comprehensive data sub lang view update high level insert update delete physical data independence logical data independence integrity independence distribution independence nonsubversion rule zero a database must use its relational facilities exclusively for management Chapter 4 ER Modeling ERD & ERM Attribute Optional attribute can be null required attribute can not be empty Domain possible value Identifiers one or more attr uniquely identify each entity instance Attributes Composite identifier Composite attribute simple attribute single valued attribute multivalued attribute deprived attribute Relationship Both direction Participant Connectivity Cardinality min or max number of entity occurence Weak & Strong weak (non-identifying) primary key of related entity not contain primary key component of parent entity Condition existence-dependent Database designer determine whether or not based on business rules dot line strong (identifying relationship) contain solid line Participation optional participation o-| 0≡ mandatory participation |≡ || Relationship Degree Unary relationship within a single entity Recursive relationship Binary relationship two entities are associated Ternary relationship Three entities are associated Database design challenges must conform to design standard need for high processing speed may limit the number and complexity of logically desirable relationship need for maximum info generation may lead to loss of clean design structure and high transaction speed Chapter 5 Advance data modeling EERD extended entity relationship model Entity supertypes and subtypes supertype common characteristics subtype unique characteristics of each entity subtype Specialization Hierarchy (Employee ->(engineer, manager, designer)) depicts arrangement of higher-level entity supertype and low-level entity subtype is-a relationship subtype exists within the context of a supertype every subtype has one supertype to which it's directly related supertype can have many subtypes Inheritance enable an entity subtype to inherit attribute and relationship of supertype inherit primary key 1:1 super:sub inherit all relationship lower-level subtype inherit all attribute Subtype discriminator controller disjoint and overlapping disjoint subtypes subset of supertype entity set nonoverlapping subset overlapping subset nonunique Completeness Constrain partial total completeness every subtype must be a member of any Specialization and Generalization specialization top-down process generalization bottom up Entity Cluster Virtual entity type Primary Key non inteligent no change over time preferable single-attribue preferable numeric security compliant Composite Primary Key Surrogate Primary Key Time-variant data Fan Trap Redundant relationship Chapter 7 Normalization 1NF Table format No repeating groups PK identified 2NF 1NF no partial dependencies determinant is only part of pk 3NF 2NF and no transitive dependencies 4NF & BCNF BCNF every determinant is a candidate key 4NF 3NF and no independent multivalued dependencies Partial dependency functional dependence in which the determinant is only part of pk Transitive dependency an attribute functionally depends on another nonkey attribute Functional dependence each value determine one and only one value of B Repeating group group of multiple entries of same type can exist for any single key All relational table satisfy 1NF requirement Improving the Design atomic attribute further divided atomicity granularity level of detail represented by the value store in row Surrogate Key numeric value auto_incremented Denormalization creation of normalized relations processing requirements and speed Joining a large number of table Defects less efficient indexing is more cumbersome Case redundant data derived data preaggregated data information requirements Chapter 8 SQL Check create table with default maybe. check(xx in ('s','d')) create view/drop view create index create index p_indx on table(column) create unique index name_index on table(column, column) Constraint constraint_name unique(xx,xx) Special Between is null like % _ in exist alter table Group group by order by having Chapter 8 Advance SQL JOIN cross join select * form T1,T2; inner join select * from T1,T2 where T1.C1 = T2.C1; select * from T1 natural join T2; select * from T1 JOIN T2 Using (C1); select * from T1 JOIN T2 ON (T1.C1=T2.C1); outer join select * from T1 left outer join T2 on t1.c1=t2.c1; select * from T1 right outer join T2 on t1.c1=t2.c1; select * from T1 full outer join T2 on t1.c1=t2.c1; Operator IN ALL ANY EXISTS EXCEPT MINUS UNION INTERSECT VIEW create view view_name as select query Chapter 10 Transaction Management & Concurrency Control Transaction Logical unit of work that must be entirely completed or aborted Consistent database state Data request = 1 sql Properties Atomicity must be completed if not, transaction is aborted Consistency Isolation data used in first transaction cannot be used in second Durability once transaction is commited can not be undone or lost Serializability consistent result Transaction will end when commit rollback end of program is reached program is abnormally terminated Log keep track of all transaction use for rollback abnormal termination system failure Log Table TRL_ID Transaction Log Record ID TRX_NUM Transaction Number PRT Pointer to a transaction record id Concurrency Control simultaneous transaction ensure the serializability of transaction Problems lost update uncommitted data Inconsistent retrievals Lock or scheduler Lock method Pessimistic locking confliction is likely Lock manager responsible for assigning and policing the locks Level Granularity database-level table page-level row-level field-level Type binary lock exclusive lock exists when access is reserved for transaction that locked the object shared lock exists when concurrent transaction are granted read access on the basis of a common lock Deadlocks To-Phase Locking 2pL not prevent deadlock Growing phase acquire Shrinking phase release without obtain new lock Governing rules two transaction can not have conflicting locks no unlock operation can precede a lock operation in the same transation no data are affected until all locks are obtained deadly embrace prevention detection avoidance Time stamping uniqueness monotonictiy always increase Optimistic Method majority of op don't conflict without restriction phase read read uncommited only allow phantom reads and non-repeatable read read commited do not allow dirty read repeatable read serializable all not allowed validation will not infect the integrity of consistency of database write Data recovery restore database from a given state atomic transaction property Concepts Write ahead log protocol logs are always written before data are updated Redundant transaction log physical disk failure Buffers temporary storage area in primary memory checkpoints allow dbms to write all its updated buffer from mem to disk Write-procedure deferred write & deferred update only transaction log is updated write-through database is updated by transaction operation during transaction's executed Chapter 11 Data Performance Tuning and Query Optimization Database performance tuning DBMS performance tuning response client's request SQL performance tuning generate SQL query return correct answer in least amount of time DBMS architecture data files extends table space file group data cache SQL cache or procedure cache Database query optimization timing static compiled by dbms dynamic Query processin parsing execution fetching SQL Parsing phase broken down into small units Query optimizer find most efficient way to access data Access plans dbms-specific and translate sql into complex I/O operation if access plan exists in sql cache resue it SQL execution I/O operations locks are acquired data are retrieved and placed in data cache SQL Fetching Phase DBMS use temp table space to store temp data movement of result set rows from server cache to client cache Bottleneck IO Index and Query optimization Index help speed up data access facilitate searching sorting and using aggregate function joint operation ordered set of value that contain the index key and pointers more efficient than a full table scan Data sparsity number of diff values a column could have hash indexes B-tree Bitmap DBMS determine best type of index to use Choices Rule-based opt Cost-based opt affect optimizer make decision base on existing statistics, which might be old might choose less-efficient decisions Optimizer hints special instruction for optimizer embedded in sql ALL_ROWS select /*+ ALL_ROWS */* from table; minimize the overall exe time batch mode processes FIRST_ROW select /*+FIRST_ROW */* from table; minimize the time needed to process the first set of rows minimize the time needed to return only the first set interactive mode processes INDEX(name) select /*+ INDEX(P_QOH_NDX) */* from product; force the optimizer to use P_QOH_NDX index to process this query SQL Performance most dbms automatic query opt at the server end most sql performance opt techniques are dbms-specific and thus rarely portable mainly because poorly written SQL` Index function-based index based on specific sql funtion indexes can not always be used to improve Conditional expression expressed with where or having clause use simple columns or literals as operands numeric field comparisons are faster than character date and null comparison equality comparison are faster than inequality transform conditional exp to use literal write equality condition first when using multiple conditional exp when using AND conditions, write the condition most likely to be false first OR, true first Avoid the use of NOT Query formulation identify what columns and computations are required identify source tables determine how to join table determine selection criteria order to display DBMS Tuning Data cache SQL cache Sort cache optimizer mode in-memory database store large portion Physical RAID to provide a balance between performance improvement and fault tolerance disk contention high usage table in own table space assign separate data files in separate storage volumes for indexes, system and high-usage table index organized table clustered index table store computed and aggregate attribute in table Chapter 12 Distributed Database Management DDBMS data and processing function are distributed among several sites Centralized dbms in biz environment Rapid ad hoc data access distributed data access mobile/internet/application as a service/big data Distributed processing logical processing is shared among two or more physical independent sites via network Distributed database store logically related database over two or more physically independent sites via network DDBMS component Computer workstation network hardware communication media Transaction processor TM AP Data processor DP aka DM Each TP can access data on any DP, and each DP handle all requests for local data from any TP Single-Site Processing, single site data SPSD processing is done on a single host data store on host processing restricted on end user side DBMS is accessed by dumb termials Centralized MPSD shared a single data repo accessed through LA CS architecture MPMD homogeneous integrate multiple instance of same dbms over a network heterogeneous different type of dbms over fully heterogeneous different dbms and different data model Distributed transparency distribution transparency Fragmentation location and local mapping unique fragment Distributed data dictionary DDD distributed global schema transaction will maintain integrity and consistency completed require complex mechanisms to manage transaction distributed request remote request single sql statement access remote transaction single remote site composed of several requests distributed transaction request data from several diff remote site on network distributed request single sql reference data at several DP site 2Phase commit protocol 2PC if a portion of a transaction op cannot be committed, all changes made at the other sites will be undone require that DP's transaction log entry be written before database fragment is updated Do-undo-redo protocol roll back forward with the help of log Write ahead protocol log before exe Define operation between coordinator and subordinates Preparation final commit failure ensure system will operate in case of network failure minimize total cost replica network latency network partitioning performance Heterogeneity Data fragmentation horizontal fragmentation division of a relation into fragments of rows vertical fragmentation columns mixed fragmentation combine Data allocation centralized data allocation partitioned data allocation replicated data allocation The CAP Theorem consistency availability partition tolerance trade off between consistency and availability is basically available, soft state, eventually consistent propagate Chapter 13 Business Intelligence and Data warehouse OLAP Rollup drilldown CUBE Star schema Chapter 15 Database Connectivity and Web Technology Database middleware UDA ODBC mysqli DAO RDO PDO XML/XSLT DTD Chapter 16 Database Administration and Security Decision Making Cycle data->info->knowledge->decision making->action->data Top of management level strategic decision making middle management deliver the data requirement operation level Information system IS application dev database operation place of DBA higher than db op and app dev DBA's function planning design implementation operation training DSO Disaster management full backup incremental backup concurrent backup Security confidentiality compliance integrity availability Identifies security vulnerabilities Identifies measures to protect the system security threat security breach Authorization audit log Data dictionary type integrated standalone 3rd party active data dictionary passive data dictionary Store description of all objects that interact with the database Metadata is the basis for monitoring database use and for assigning access rights to users DBA uses data dictionary to support data analysis and design CASE tool Workbench Type of table space system users temp undotbs recovery info Initialization Parameters trusted_log_flag