All things Database

What are the concepts and characteristics of transactions?

Concept: Transaction is a sequence of operations, an indivisible unit of work, starting with BEGIN TRANSACTION and ending with ROLLBACK/COMMIT

Features (ACID): -Atomicity (Atomicity): Logically, it is an indivisible unit of operation. All operations of the transaction are either submitted successfully or all failed and rolled back (implemented with a rollback log, and the operations in the log are executed in reverse); -Consistency (Consistency): The execution of the transaction must keep the database in a consistent state. In the consistent state, the results of reading a data for all transactions are the same; -Isolation (Isolation): The modification made by a firm is invisible to other transactions before it is finally submitted (concurrently executed transactions cannot affect each other); -Durability (Durability): Once the transaction is submitted successfully, the modification of the data is permanent

What concurrency consistency issues will appear?

-Missing Modifications: One transaction modifies the data, before the transaction is committed, another transaction modifies the same data, overwriting the previous modification; -Dirty Read (Dirty Read): A transaction reads data modified by another transaction but not committed (rolled back), resulting in inconsistent data obtained by the two transactions; -Nonrepeatable Read (Nonrepeatable Read): In the same transaction, a query operation reads a row of data at one time and reads the row of data at a later time, and found that the data has been modified (for **update **operating); -Phantom Read: When the same query is executed multiple times, because other transactions perform insert operations in this data range, different result sets will be returned each time (the difference between non-repeatable reads: It is for a whole data/range; and for insert/delete operations)

Four isolation levels of the database?

-Read Uncommited: Before a transaction is committed, its execution results are also visible to other transactions. Will lead to dirty reads, non-repeatable reads, and phantom reads; -Read Commited: A transaction can only see the changes made by the committed transaction. Can avoid dirty reading problem; -Repeatable Read: It can ensure that the same transaction gets the same result when the same data is read multiple times. (MySQL's default isolation level). Can avoid non-repeatable reading; -Serializable (Serializable): Forcing transactions to be executed serially, making it impossible to conflict with each other, thereby solving the problem of phantom reading. It may cause a lot of timeouts and lock contention, which is rarely used in practice.

What are optimistic locking and pessimistic locking?

-Pessimistic lock: Think that the data will be modified at any time, so every time the data is read, it will be locked to prevent other transactions from reading or modifying the data; it is used in scenarios where data is updated frequently. -Optimistic lock: When operating data, it will not be locked, but during the update, it will be judged whether there is another transaction to update the data during this period. If it has been updated, it will fail and retry; suitable for read more and less write Scenes. The implementation of optimistic locking is as follows:

-Add a version number or timestamp field, and update this field every time the data is updated;

-First read the field or all fields you want to update, compare it when updating, and update only if the field has not changed

Common types of blocking?

The intention lock is automatically added by InnoDB without user intervention. For UPDATE, DELETE and INSERT statements, InnoDB Will automatically add an exclusive lock (X) to the involved data set; For ordinary SELECT statements, InnoDB will not add any locks; The transaction can explicitly add a shared lock or exclusive lock to the record set through the following statement: Shared lock (S): SELECT

FROM table_name WHERE ... LOCK IN SHARE MODE. Other sessions can still query the record, and can also add a share mode shared lock to the record. But if the current transaction needs to update the record, it is likely to cause a deadlock. Exclusive lock (X): SELECT
FROM table_name WHERE ... FOR UPDATE. Other sessions can query the record, but cannot add a shared lock or exclusive lock to the record, but wait for the lock to be obtained -Exclusive Lock (Exclusive Lock) / X lock: When a transaction adds an X lock to the data, only this transaction is allowed to read and modify this data, and other transactions cannot add any lock to the data; -Shared Lock (Shared Lock)/S lock: After the S lock is added, the transaction can only read the data but not modify it, and other transactions can only add S locks, not X locks -Intention Locks:

-Before a transaction can obtain the S lock of a data row object, it must first obtain the IS lock of the entire table or a stronger lock;

-Before a transaction can obtain the X lock of a certain data row object, it must first obtain the IX lock of the entire table;

-IS/IX locks are compatible;

-Benefits: If a transaction wants to add X locks to the entire table, it needs to first detect whether other transactions lock the table or a row in the table. This detection is very time-consuming. With the intention lock, you only need to check whether the entire table has an IX/IS/X/S lock.

The role of the lock: used to manage concurrent access to shared resources to ensure the integrity and consistency of the database The concept of blocking granularity

MySQL provides two blocking granularities: row-level locks and table-level locks.

Small block size:

-Benefits: The less the amount of locked data, the smaller the possibility of lock contention, and the higher the concurrency of the system; -Disadvantages: ** system overhead** is large (locking, releasing locks, checking lock status all require resources)

MySQL locks


SELECT ... LOCK In SHARE MODE; SELECT ... FOR UPDATE;

What is a three-level lockout agreement?

-Level 1 Locking Protocol: The transaction must add X lock to the data before modifying the data, and will not release it until the transaction ends. Can solve the problem of lost modification (two transactions cannot add X locks to one data at the same time, to avoid modification being overwritten); -Second-level lockout protocol: On the first-level basis, the transaction must add S lock before reading the data, and release it after reading it. It can solve the dirty read problem (if there is already a transaction modifying the data, it means that the X lock has been added. At this time, the transaction that wants to read the data cannot add the S lock, and cannot read it, avoiding dirty reading data); -Three-level lockout protocol: On the basis of the second level, the transaction must first add an S lock before reading the data, and the transaction can not be released until the end of the transaction. Can solve the problem of non-repeatable reads (avoid other transactions to add X locks to modify the data before the end of the transaction, to ensure that the data will not be updated by other transactions during the transaction)

What is a two-stage lock protocol?

The transaction must be strictly divided into two phases to lock and unlock the data. The first phase is locked and the second phase is unlocked. That is to say, once the lock is released in a transaction, no new locks can be applied for.

Serializable scheduling refers to, through concurrency control, the results of concurrently executed transactions are the same as the results of a serially executed transaction. The transaction follows the two-stage lock protocol is a sufficient condition to ensure serializable scheduling.

What is MVCC?

Multi-Version Concurrency Control (MVCC), MVCC saves two hidden columns after each row of records to store create version number and delete version number.

-Creation version number: the transaction version number when creating a data row (transaction version number: the system version number at the beginning of the transaction; system version number: every time a new transaction is started, the system version number will automatically increase) ； -Delete version number: the transaction version number during the delete operation; -Various operations:

-When inserting operation, record the creation version number;

-When deleting, record the deleted version number;

-During the update operation, first record the deleted version number, and then add a new line to create the version number;

-In the query operation, the following conditions must be met to be queried: the delete version number is not defined or greater than the current transaction version number (the delete operation is done after the current transaction is started); the creation version number is less than or equal to the current transaction version number (create The operation is the transaction completed or completed before the transaction started)

The version number reduces lock contention and improves system performance; two isolation levels of committed read and repeatable read can be achieved, and MVCC is not required for uncommitted reads

Snapshot read and current read

Use MVCC to read the data in the snapshot, which can reduce the overhead caused by locking:


select

* from table ...;

The current read is the latest data, which needs to be locked. The first statement below requires an S lock, and all other statements require an X lock:


select

* from table where? lock in share mode; select

* from table where? for update; insert; update; delete;

The paradigm of the database?

-First Normal Form (1NF, Normal Form): Attributes should not be separable. Example: If "phone" is used as an attribute (a column), it does not conform to 1NF, because the attribute of telephone can be decomposed into home phones and mobile phones... If "mobile phone" is used as an attribute, it conforms to 1NF; -Second Normal Form 2NF: Each non-primary attribute completely depends on the primary attribute set (candidate key set);

-B is completely dependent on A, which means that all the attributes in A uniquely determine B. If there are fewer attributes, they cannot be uniquely determined. If there are more attributes, there will be redundancy (it is called dependency, not complete dependency). For example: (student ID, course name) this main attribute set can uniquely determine grades, but for the attribute of student name, (student ID, course name) this attribute set is redundant, so the student name does not completely depend on (student ID) , Course name) this attribute set;

-Primary attribute set/candidate code set: A certain set of attributes can uniquely determine other attributes (the primary key is a key selected from the candidate key set), but its subset cannot. The attributes in such an attribute group are the primary attributes; The attributes in the candidate code set become non-primary attributes;

-2NF can be satisfied by decomposition: make (student ID, course name, grade) into one table; (student ID, student name) into another table, to avoid a lot of data redundancy;

After satisfying 1NF, all columns in the table are required to depend on the primary key, and no column can be unrelated to the primary key, which means that a table only describes one thing; -Third Normal Form 3NF: On the basis of 2NF, non-primary attributes*

No transitive dependence* on the main attribute

-Transitive dependency: if C depends on B, and B depends on A, then C depends on A transitively;

-On the basis of 2NF, 3NF eliminates the dependence between non-primary attributes; for example, in a table, the primary attribute has (student number) and the non-primary attributes have (name, department, dean’s name), you can see the college The non-primary attribute of the long name depends on the department, and the transmission depends on the student number. The way to eliminate it is decomposition. Must first meet the second normal form (2NF), the requirement: each column in the table is only directly related to the primary key, not indirectly, (each column in the table can only depend on the primary key); What are the exceptions that do not meet the paradigm?

-Redundant data: some of the same data appears multiple times (such as student names); -Modification exception: the information in one record is modified, but the same information in the other record is not modified; -Deletion exception: If you delete a message, you will also lose other information (delete a course and lose a student's information); -Insertion exception: unable to insert (insert a student who does not have course information)

List several table connection methods?

-Inner Join: Only the rows that meet the join conditions in the two tables are combined as a result set

-Natural connection: only consider tuple pairs with the same attributes;

-Equivalent connection: query given conditions -Outer Join

-Left connection: All data in the table on the left is displayed, and the data in the table on the right only shows the part that is common, and there is no corresponding part to fill NULL;

-Right connection: opposite to left connection;

-Full Outer Join: Query all data in the left and right tables, but remove duplicate data in the two tables -Cross Join: Returns the Cartesian product of two tables (for tables containing m and n data respectively, returns the result of m*n)

What is a stored procedure?

What are the advantages and disadvantages?

A stored procedure is a collection of SQL statements that have been compiled and stored in the database in advance. When you want to implement the corresponding function, you only need to call this stored procedure (similar to a function, the input has output parameters).

advantage: -Pre-compilation, instead of compiling every time it runs, it improves the efficiency of database execution; -Encapsulates a series of operations. For some operations with more data interactions, compared to executing SQL statements alone, it can reduce the amount of network communication; -It has reusability, which reduces the workload of database development; -High security, allowing users without permission to indirectly manipulate the database through stored procedures; -More easy to maintain

Disadvantages: -Poor portability, the stored procedure binds the application to the database; -Complicated development and debugging: No good IDE; -Modifications are complicated, need to be recompiled, and sometimes need to update the code in the program to update the call

The difference between Drop/Delete/Truncate?

-Delete is used to delete all or part of the data of the table. After the delete is executed, the user needs to submit before it will be executed, which will trigger the DELETE** trigger** on the table (including An OLD virtual table can read-only access to the deleted data), the table structure is still after DELETE, deletion is very slow, delete line by line, because the log is recorded, you can use the log to restore the data; -Truncate deletes all data in the table, this operation cannot be rolled back, and will not trigger the trigger on this table. The operation is much faster than DELETE (drop the table directly, and then create a new table, the deleted data cannot be retrieved). If there is an AUTO_INCREMENT column in the table, reset it to 1; -Drop command to delete a table from the database, all data rows, indexes and constraints will be deleted; it cannot be rolled back, and triggers will not be triggered;

What is a trigger?

Trigger (TRIGGER) is an operation that is triggered by an event (such as INSERT/UPDATE/DELETE) (it cannot be called directly and cannot receive parameters). It is stored as an independent object in the database to ensure data integrity (for example, data can be verified or converted).

What are the types of constraints?

Constraint types: Primary Key constraints, Unique constraints, Check constraints, Non-empty constraints, Foreign Key constraints.

What is a view?

What is a cursor?

-View: A virtual table composed of data selected by query from the basic table of the database (the definition of the view is stored in the database). It can be added/deleted/modified/checked and other operations. A view is a reference to a number of basic tables, a virtual table, the result of the query statement execution, and does not store specific data (the basic table data has changed, the view will also change); it can be added, deleted, and modified just like the basic table Check operation (ps: Additions, deletions, and modifications are conditionally restricted); if the view generated by querying the tables cannot be performed, the addition, deletion, and modification of the view will affect the data of the original table. benefit:

-Ensure the security of the data by only giving users the permission to access the view;

-Simplify complex SQL operations and hide the complexity of data (such as complex connections); -Cursor: Used to locate a specific row in the result set returned by the query to operate on a specific row. Use the cursor to easily move and traverse the result set, scroll as needed or browse/modify the data in any row. Mainly used for interactive applications.

The realization principle of database index (B+ tree)

Comparison of using B-tree and B+ tree

InnoDB's index is implemented using B+ tree. The advantages of B+ tree compared to B tree:

-Few IO times: The middle node of the B+ tree only stores the index, and the data is stored in the leaf nodes, so the middle node can store more data, making the index tree more squat; -Range query is more efficient: B tree needs to traverse the entire tree in order, only B+ tree needs to traverse the linked list in the leaf node; -The query efficiency is more stable: each query needs to go from the root node to the leaf node, and the path length is the same, so the efficiency of each query is almost the same

Comparison of using B-tree index and hash index

Hash index can be searched in O(1) time, but only supports exact search, cannot be used for partial search and range search, cannot be used for sorting and grouping; B-tree index supports greater than or equal to search, range search. When the hash index encounters a large number of equal hash values, the search efficiency will be reduced. Hash index does not support data sorting.

Advantages of using indexes

-Greatly speed up the retrieving speed of data; -It can significantly reduce the time of grouping and sorting in the query; -By creating a unique index, the uniqueness of each row of data in the database table can be guaranteed; -Change random I/O to sequential I/O (B+Tree index is ordered, and adjacent data will be stored together)

Disadvantages: It takes time and space to build and maintain indexes, and it is very slow to update indexes.

Under what circumstances will the index become invalid?

-LIKE statements beginning with "% (representing any 0 or more characters)"; -The index is not used at the same time before and after the OR statement; -Implicit conversion of the data type (such as varchar without single quotes, it may be automatically converted to int type); -For multi-column indexes, the leftmost matching principle/leftmost prefix principle must be satisfied (leftmost first, eg: multi-column indexes col1, col2, and col3, then the index is valid for col1 or col1, col2 or col1 ,Col2,col3); -If MySQL estimates that full table scans are faster than indexes, then indexes are not used (such as very small tables)

Where should I create an index?

-A certain column is often used as the maximum and minimum values; -Fields that are frequently queried; -Fields often used as table joins; -Fields that often appear after ORDER BY/GROUP BY/DISDINCT

What should I pay attention to when creating an index?

Expand

-It should only be built on small fields, not on large text or pictures (the more data stored on a page, the greater the data obtained by one IO operation, the higher the efficiency); -Indexed fields should be non-null. In MySQL, columns with null values are difficult to optimize for queries because they make indexes, index statistics, and comparison operations more complicated. Should use 0, a special value or an empty string instead of NULL; -Select fields with high data density (a large percentage of unique values in the total) for indexing

Classification of the index?

-Normal index -Unique index UNIQUE: The value of the index column must be unique, but null values are allowed; -Primary key index PRIMARY KEY: must be unique, no null values are not allowed (it is a special unique index; MySQL creates a primary key by default as a clustered index, but the primary key can also be a non-clustered index); -Single-column index and multi-column index/composite index (Composite): the number of indexed columns; -Covering index: The index contains all the data needed to satisfy the query. When querying, you only need to read the index without going back to the table to read the data; -Clustered index/non-clustered index: an index that reorganizes the physical addresses of data stored on the disk so that the data is sorted according to a specified rule (the physical order of the data is consistent with the order of the index). Therefore, only one clustered index can be created for each table (because the physical storage order needs to be changed). The advantage is that the query speed is fast, because the physical address of the required data can be obtained directly in sequence. The disadvantage is that the speed of making changes is slow. It is effective for values that need to be searched frequently. Non-clustered index only records the logical order, does not change the physical order; -Partition index (?) -Virtual index (Virtual): simulate the existence of an index without actually creating an index, which is used to quickly test the impact of creating an index on the execution plan. No relevant index segment, no increase in storage space usage

What is the difference between InnoDB and MyISAM, the two storage engines of MySQL?

-InnoDB support transaction, Commit and Rollback can be performed; -MyISAM only supports table-level locks, while InnoDB also supports row-level locks, which improves the performance of concurrent operations; -InnoDB supports foreign keys; -MyISAM crash has a much higher probability of damage than InnoDB, and recovery speed is also slower; -MyISAM supports compression tables and spatial data indexes, InnoDB needs more memory and storage; -InnoDB supports online hot backup

application scenarios

-MyISAM manages non-transactional tables. It provides high-speed storage and retrieval (MyISAM emphasizes performance, each query is atomic, and its execution speed is faster than InnoDB), as well as full-text search capabilities. If the table is relatively small, or read-only data (with a large number of SELECTs), you can still use MyISAM; -InnoDB supports transactions, has very good performance under concurrent conditions, and can basically replace MyISAM

hot backup and cold backup

-Hot backup: a method of backing up when the database is running. Advantages: It can be backed up by table or user. The database can still be used during backup and can be restored to any point in time. But can't go wrong -Cold backup: After the database is normally closed, the key files are copied to another location. Advantages: simple and fast operation, simple recovery

How to optimize the database?

SQL statement optimization

Analyze the slow query log: record the SQL statements whose response time exceeds the threshold long_query_time in MySQL, and use the log to find out the SQL with large IO and the SQL with missed indexes

Use Explain for analysis: Through the explain command, you can get the read order of the table, the operation type of the data read operation, which indexes can be used, which indexes are actually used, references between tables, and scanned The number of rows and other issues;

-Should try to avoid using !=, <, > operators in the where clause or null value judgment on the field, otherwise the engine will abandon the use of the index And perform a full table scan; -Only return the necessary columns: it is best not to use the SELECT

statement; -Return only necessary rows: use the LIMIT statement to limit the returned data; -Decompose a large join query into a single-table query for each table, and then associate in the application, so The benefits are:

-Make the cache more efficient. For connected queries, if one of the tables changes, the entire query cache cannot be used. For multiple queries after decomposition, even if one of the tables changes, the query cache for other tables can still be used;

-Decompose into multiple single-table queries, the cached results of these single-table queries are more likely to be used by other queries, thereby reducing redundant queries;

-Reduce lock contention

Optimization of index

Pay attention to the conditions that will cause the index to fail, and create an index where appropriate

Optimization of database table structure

-Follow the Three Paradigms when designing the watch; -Choose the appropriate data type: Do not store NULL fields as much as possible; use simple data types (int, varchar/ text); -Table Horizontal Sharding (Sharding): split the records in the same table into multiple tables with the same structure (strategy: hash modulo; divide according to ID range). When the data of a table continues to increase, Sharding is an inevitable choice. It can distribute the data to different nodes in the cluster, thereby alleviating the pressure of a single database; -Table Vertical Segmentation: Split a table into multiple tables by column. You can put infrequently used fields in the same table separately; put large fields in a table independently; or put frequently used fields (closely related) in a table. After the vertical segmentation, the business is clearer, the integration or expansion between systems is easy, and the data maintenance is simple

Optimization of system configuration

-Operating system: increase the number of queues supported by TCP; -MySQL configuration file optimization: buffer pool size and number settings

Optimization of hardware

-Disk performance: solid state drive; -CPU: multi-core and high frequency; -Memory: Increase the memory

What is master-slave replication?

What is the realization principle?

Master-slave replication (Replication) means that data can be replicated from a MySQL database master server to one or more slave servers, and the slave servers can replicate all databases or specific databases or specific tables in the master server. The asynchronous mode is adopted by default.

Implementation principle: -Primary server binary log dump thread: Write the data change (addition, deletion, and modification) log of the primary server into the Binary log; -Slave server I/O thread: Responsible for reading the binary log from the main server and writing to the local Relay log; -Slave server SQL thread: Responsible for reading the Relay log, analyzing the data changes that the master server has executed, and replaying it in the slave server (Replay) to ensure the consistency of the master and slave data

Why do you want master-slave replication?

-Read and write separation: the master server is responsible for writing, and the slave server is responsible for reading

-The contention of the lock is alleviated, even if the lock is added to the main server, the read operation can still be performed;

-The slave server can use MyISAM to improve query performance and save system overhead;

-Increase redundancy and improve availability -Real-time data backup, when a node in the system fails, you can easily failover -Reduce the frequency of disk I/O access of a single server and improve the I/O performance of a single machine