Tuesday, 10 November 2015

MySQL Indexes

For some time now we have been looking at the MySQL database. Today we will continue to dig deeper into MySQL, this time discussing MySQL indexes.

Indexes allow a MySQL database to be searched much more quickly. A MySQL database can still be searched without an index, but as the database grows large an index becomes necessary to locate rows in a table efficiently, which keeps searches smooth and fast.

Types Of Indexes

INDEX
PRIMARY KEY
FULLTEXT

Creating an Index

The way to achieve fast searches is to add an index, either when creating a table or at any time afterwards. But the decision is not so simple. You must decide which columns require an index, a judgement that requires you to predict whether you will be searching any of the data in those columns. Indexes can also get complicated, because you can combine multiple columns in one index. And even when you've gotten to grips with all of that, you still have the option of reducing index size by limiting how much of each column is indexed. You can add an index to an existing table with the command below:


ALTER TABLE staff ADD INDEX(employee(20));
You can also define indexes in the CREATE TABLE statement when creating a table, or add them later with the standalone CREATE INDEX statement. The options are equivalent, except that CREATE INDEX cannot be used to create an index of type PRIMARY KEY.


CREATE TABLE staff (
employee VARCHAR(65),
position VARCHAR(65),
department VARCHAR(50),
INDEX(employee(20)),
INDEX(position(16))) ENGINE MyISAM;

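For completeness, this is roughly what the equivalent standalone CREATE INDEX statement would look like (CREATE INDEX requires an index name, so position_idx here is just an illustrative choice):

CREATE INDEX position_idx ON staff (position(16));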
Primary key

The PRIMARY KEY constraint uniquely identifies each record in a database table. Primary keys must contain unique values, and a primary key column cannot contain NULL values. Most tables should have a primary key, and each table can have only ONE primary key. You can add a primary key to a table using either of the commands below.


ALTER TABLE staff ADD position VARCHAR(20) PRIMARY KEY;

OR


CREATE TABLE staff (
employee VARCHAR(65),
position VARCHAR(65),
department VARCHAR(50),
PRIMARY KEY (position(20))) ENGINE MyISAM;

FULLTEXT

Unlike a regular index, a FULLTEXT index in MySQL allows super-fast searches of entire columns of text. What it does is store every word in every data string in a special index that you can search using “natural language,” in a similar manner to using a search engine.

Below are some things that you should know about FULLTEXT indexes:

FULLTEXT indexes were traditionally available only with MyISAM tables; InnoDB supports them from MySQL 5.6 onwards (MySQL supports at least 10 different storage engines in total). If you need to convert a table to MyISAM, you can usually use the MySQL command ALTER TABLE tablename ENGINE = MyISAM; .
FULLTEXT indexes can be created for CHAR, VARCHAR, and TEXT columns only.
A FULLTEXT index definition can be given in the CREATE TABLE statement when a table is created, or added later using ALTER TABLE (or CREATE INDEX).
For large data sets, it is much faster to load your data into a table that has no FULLTEXT index and then create the index than it is to load data into a table that has an existing FULLTEXT index.
You can create a FULLTEXT index with the command below:

ALTER TABLE staff ADD FULLTEXT(employee,position);
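Once the index is in place, you can search it with MATCH ... AGAINST. As a rough sketch (the search term 'manager' is just an example, and the column list in MATCH must match the columns of the FULLTEXT index):

SELECT employee, position FROM staff
WHERE MATCH(employee,position) AGAINST('manager');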

MySQL - The future of ALTER IGNORE TABLE syntax

"IGNORE is a MySQL extension to standard SQL. It controls how ALTER
TABLE works if there are duplicates on unique keys in the new table
or if warnings occur when strict mode is enabled. If IGNORE is not
specified, the copy is aborted and rolled back if duplicate-key
errors occur. If IGNORE is specified, only the first row is used of
rows with duplicates on a unique key. The other conflicting rows are
deleted. Incorrect values are truncated to the closest matching
acceptable value."
This creates several issues for the MySQL server team:
  1. IGNORE could remove rows from a parent table when using a foreign key relationship.
  2. IGNORE makes it impossible to use InnoDB Online DDL for several operations, for example adding a PRIMARY KEY or UNIQUE INDEX.
  3. IGNORE has some strange side-effects for replication. For example: DDL is always replicated via statement-based replication, and since SQL does not imply ordering, it is not clear which rows will be deleted as part of the ignore step. I also see cross-version replication as problematic if future MySQL versions were to introduce more strictness, since a slave may de-duplicate more rows.

The most common case

We believe that the most common use case for IGNORE is to be able to add a UNIQUE INDEX on a table which currently has duplicate values present, e.g.:
ALTER IGNORE TABLE users ADD UNIQUE INDEX (emailaddress);
In this scenario, a novice user manages to avoid auditing each entry in the users table, and simply lets MySQL pick a row to be kept, with all duplicates automatically removed.
There are two other ways to be able to do that:

Hand removal

Using the same example as above, return a list of email addresses and PRIMARY KEY values for records that conflict:
SELECT GROUP_CONCAT(id), emailaddress, count(*) as count FROM users 
GROUP BY emailaddress HAVING count >= 2;

/* delete or merge duplicate from above query */

ALTER TABLE users ADD UNIQUE INDEX (emailaddress);
Note: This method will be the fastest way, since when not using IGNORE, MySQL is able to use InnoDB's Online DDL.

New table + INSERT IGNORE

While this method looks very similar, internally its semantics are quite different:
CREATE TABLE users_new LIKE users;
ALTER TABLE users_new ADD UNIQUE INDEX (emailaddress);
INSERT IGNORE INTO users_new SELECT * FROM users;
DROP TABLE users;
RENAME TABLE users_new TO users;
By creating a table first, the MySQL server will not have to manage rows in a foreign key relationship. The rows will also be re-sent to the slave using row-based replication, so issue (3) I mentioned above does not come into play.

Switching off referential integrity in MySQL

If you are loading very large quantities of data into MySQL, it sometimes makes sense to switch off foreign key constraints to enable the data to load faster. There are two commands for doing this:


mysql> SET foreign_key_checks = 0;

That will switch off foreign key checks for the current MySQL session, and:


mysql> SET GLOBAL foreign_key_checks = 0;

That will switch off foreign key checks at a MySQL server level.

Finally, to check the current setting of foreign_key_checks use this command:

mysql> SHOW Variables WHERE Variable_name='foreign_key_checks';
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| foreign_key_checks | OFF   |
+--------------------+-------+
1 row in set (0.00 sec)
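Remember to switch the checks back on once your data load has finished. A typical session might look something like this (the dump file name is only an example):

mysql> SET foreign_key_checks = 0;
mysql> SOURCE /tmp/bigload.sql
mysql> SET foreign_key_checks = 1;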

Killing all running queries on MySQL

On occasion you might need to kill all currently running queries against your MySQL server, without having to restart the service.

I do a lot of work on ETL (Extract Transform Load) applications, where you may have very long-running queries hanging around locking resources required by your data loader, which you will want to kill before starting a new data load.

Another scenario is where you have an application that is misbehaving by issuing many queries that are impacting other users of the MySQL server: so long as each application is connecting using a different MySQL user account (this should be a given), you can kill the queries for just that user.

From MySQL 5.1 onwards, the server exposes the ID of each client connection running a query through the information_schema database. We can query this database, then build up a series of KILL statements dynamically to kill each running query. Here is the main statement:


mysql> SELECT GROUP_CONCAT(CONCAT('KILL QUERY ',id,';') SEPARATOR ' ') FROM information_schema.processlist WHERE user <> 'system user' INTO OUTFILE '/tmp/killqueries.sql';

Once you run that, check the contents of /tmp/killqueries.sql in another terminal; you should see something like this:


-bash-4.1$ cat /tmp/killqueries.sql
KILL QUERY 7; KILL QUERY 6;

Back at your MySQL prompt, you can now run that script directly:


mysql> SOURCE /tmp/killqueries.sql

Finally, if you only want to kill the queries belonging to a specific MySQL user account, you can modify the original query like so:


mysql> SELECT GROUP_CONCAT(CONCAT('KILL QUERY ',id,';') SEPARATOR ' ') FROM information_schema.processlist WHERE user = 'baduser' INTO OUTFILE '/tmp/killqueries.sql';

Technical screening questions for a MySQL DBA

In the past year, I have interviewed dozens of DBAs with a view to hiring a MySQL DBA. I have found that while many DBAs from other backgrounds, for example MS SQL or Oracle, claim to also have MySQL experience, a lot of them fail the screening once we get into the technical nuances specific to MySQL.
For example, here are a few typical screening questions I ask, along with the approximate answers I expect to get back:

What are the two main storage engines in MySQL, and when would you use either?

MyISAM and InnoDB. InnoDB is transaction-safe, has row-level locking (MyISAM has table-level locking), and supports foreign keys. MyISAM does not support foreign keys and is not ACID compliant, so you should only consider using MyISAM for new projects if you have specific reasons to do so (e.g. to use MyISAM's full-text support). This is basic MySQL stuff, but I have had candidates struggle to list even a few of these differences.

What kind of replication can you have with MySQL?

Here I am looking for experience in setting up master-slave and master-master configurations, along with using load balancers. It would be good if a candidate also knows what types of replication there are (row-based versus statement-based, both recorded in the binary log). Another gotcha: troubleshooting replication lag.
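As a follow-up, I like to hear how a candidate would actually check for lag on a slave. The usual starting point is something along these lines, looking at the Seconds_Behind_Master field in the output:

mysql> SHOW SLAVE STATUS\G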

What are the different transaction isolation levels? What are they trying to prevent?

There are four levels, ranging from READ UNCOMMITTED (not safe as it allows dirty reads) to SERIALIZABLE. READ COMMITTED prevents dirty reads, REPEATABLE READ prevents non-repeatable reads, and SERIALIZABLE prevents phantom reads. A candidate should know what these are.
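It is a bonus if they also know how to inspect and change the level. On a MySQL 5.x server that looks something like this (tx_isolation is the 5.x variable name, so this sketch assumes a 5.x server):

mysql> SELECT @@tx_isolation;
mysql> SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;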

How would you make a default MySQL install secure?

Set the root user password, remove remote root user access, remove test database, remove anonymous user, set up new application-specific accounts with strong passwords and tight host and grant access, enable SSL etc.
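Most of this is automated by the bundled mysql_secure_installation script, but I like candidates to know the underlying statements too. A rough sketch of the kind of thing I have in mind (the account names, hosts and passwords are only examples):

SET PASSWORD FOR 'root'@'localhost' = PASSWORD('a-strong-password');
DELETE FROM mysql.user WHERE User = '';                      -- anonymous accounts
DELETE FROM mysql.user WHERE User = 'root' AND Host NOT IN ('localhost', '127.0.0.1', '::1');
DROP DATABASE IF EXISTS test;
GRANT SELECT, INSERT, UPDATE, DELETE ON appdb.* TO 'appuser'@'10.0.0.%' IDENTIFIED BY 'another-strong-password';
FLUSH PRIVILEGES;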
I have a longer list that I work through, but this is a good sample. To be honest, given the widespread use of MySQL and the amount of highly-paid work available, I am surprised that there are not more DBAs building a career around it. If you are interested in pursuing this career, however, the above areas are well worth studying.

Converting a MySQL database from latin1 to utf8

Introduction
When you create a new database on MySQL, the default behaviour is to create a database supporting the latin1 character set. This is fine for most use cases, however if your application needs to support natural languages that do not use the Latin alphabet (Greek, Japanese, Arabic etc.), then you will need to convert your database to use UTF-8 instead.

In this tutorial, I will show you how to convert an existing database and its tables from latin1 to the utf8 character set. Note that I am using MySQL 5.5.34.

Creating a test database
Firstly we are going to create a test database for testing the migration process. Note that if you are going to migrate a real database, you should run this procedure against an offline backup, not against a live production system.

Let's begin by creating a new database with a simple table:


mysql> create database testmigration;
mysql> use testmigration;
mysql> create table test (sometext varchar(50));

The new database and table will have the latin1 character set by default on a stock MySQL installation; let's confirm this:

 mysql> show full columns from test;
+----------+-------------+-------------------+------+-----+---------+-------+---------------------------------+---------+
| Field    | Type        | Collation         | Null | Key | Default | Extra | Privileges                      | Comment |
+----------+-------------+-------------------+------+-----+---------+-------+---------------------------------+---------+
| sometext | varchar(50) | latin1_swedish_ci | YES  |     | NULL    |       | select,insert,update,references |         |
+----------+-------------+-------------------+------+-----+---------+-------+---------------------------------+---------+
1 row in set (0.00 sec)

mysql> select default_character_set_name from information_schema.schemata where schema_name = "testmigration";
+----------------------------+
| default_character_set_name |
+----------------------------+
| latin1                     |
+----------------------------+
1 row in set (0.00 sec)

If I try to store some non-Latin characters in the test.sometext field, it will become garbled.
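You can see this for yourself before migrating; depending on your client's character set settings, the value will come back mangled or as question marks (the Greek word below is just an example):

mysql> insert into test (sometext) values ('ψάρι');
mysql> select sometext from test;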

Doing the migration to UTF-8
To begin with, we will alter the default character set on the new database to be utf8, which will ensure that any new tables in this database will use this by default:

mysql> alter database testmigration character set utf8 collate utf8_general_ci;

Query OK, 1 row affected (0.00 sec)

mysql> select default_character_set_name from information_schema.schemata where schema_name = "testmigration";

+----------------------------+
| default_character_set_name |
+----------------------------+
| utf8                       |
+----------------------------+
1 row in set (0.00 sec)

That takes care of new tables, but for existing tables we have to do something a little more complex. Drop back to your bash terminal, and run the following command:

$ mysqldump --add-drop-table -u root --password='password' testmigration | sed -e 's/CHARSET\=latin1/CHARSET\=utf8\ COLLATE\=utf8_general_ci/g' | iconv -f latin1 -t utf8 | mysql -u root --password='password' testmigration

This pipeline uses mysqldump to dump the database to standard out, sed to replace latin1 with utf8 in the dump, and iconv to convert the dump itself from latin1 to utf8 encoding; finally the mysql command restores the resulting backup to the database server.
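As an aside, if you are confident the existing data really is valid latin1, a single table can also be converted in place with something like the statement below; note this is not a substitute for the dump-and-reload approach if your data has already been stored with the wrong encoding:

mysql> alter table test convert to character set utf8 collate utf8_general_ci;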

Testing
To confirm that this worked, log back in again and inspect the table as before:


mysql> use testmigration;
mysql> show full columns from test;
+----------+-------------+-----------------+------+-----+---------+-------+---------------------------------+---------+
| Field    | Type        | Collation       | Null | Key | Default | Extra | Privileges                      | Comment |
+----------+-------------+-----------------+------+-----+---------+-------+---------------------------------+---------+
| sometext | varchar(50) | utf8_general_ci | YES  |     | NULL    |       | select,insert,update,references |         |
+----------+-------------+-----------------+------+-----+---------+-------+---------------------------------+---------+
1 row in set (0.00 sec)

mysql> show create table test;
+-------+--------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                     |
+-------+--------------------------------------------------------------------------------------------------+
| test  | CREATE TABLE `test` (
  `sometext` varchar(50) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 |
+-------+--------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

You should now be able to safely insert non-Latin characters from your clients.

Finding the table containing a column name in MySQL

Sometimes when you are working with a legacy database schema that you are unfamiliar with, you are confronted with column names in error messages but not the table name that contains the column in question. An example of this is the 1048 error in MySQL, where some code has attempted to write a record to the database with a null value for a column that contains a NOT NULL constraint:
SQLSTATE[23000]: Integrity constraint violation: 1048 Column 'ColumnName' cannot be null
Great, but what table contains the column "ColumnName"? Answering that question can be particularly hard if there are many tables in the database schema.
Luckily you can query the MySQL meta data to find out:

SELECT DISTINCT TABLE_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE COLUMN_NAME IN ('ColumnName')
    AND TABLE_SCHEMA='DatabaseName';
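If you only know part of the column name, a LIKE pattern against the same meta data works just as well:

SELECT TABLE_NAME, COLUMN_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE COLUMN_NAME LIKE '%ColumnName%'
    AND TABLE_SCHEMA='DatabaseName';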

Adding a read-only MySQL user

Sometimes you may need to let a user have access to your MySQL database, for example for viewing data directly during testing or for running reports, but you do not want these users to update data or make schema changes.
It is actually quite easy to set up a read-only account in MySQL using the GRANT statement, which is ideal for these purposes.

Adding the new MySQL user

Connect to your database as root, then add your new user like so:
CREATE USER 'tester'@'%' IDENTIFIED BY 'password';
 
The % here means the user 'tester' connecting from any host; you can place a network hostname here instead if you want to restrict access further. Naturally you will also want to substitute password with something a little stronger ;-)
Now run the following to grant the SELECT privilege to the new user on all databases:

GRANT SELECT ON *.* TO 'tester'@'%';

Or if you want to restrict access to only one database:

GRANT SELECT ON database.* TO 'tester'@'%';

Confirming that it worked
 
Connect as the new user:
mysql -u tester -p

And now see if you can update a record: 
mysql> use alpha;
mysql> update Person set version_num = 5 where OID = 1;
ERROR 1142 (42000): UPDATE command denied to user 'tester'@'localhost' for table 'Person'
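You can also double-check exactly what the account has been granted:

mysql> SHOW GRANTS FOR 'tester'@'%';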




Adding a composite unique key to an existing MySQL table

Introduction

Composite unique keys are very useful when you want to prevent duplicate data across a number of database fields in a table. For example, suppose we have the following sample table called "user_settings" to store user settings for our application:
Field name      Field type
id              int (PK)
user_id         int (FK)
setting_key     varchar
setting_value   varchar
For each user of the system, we only want one row for each possible setting_key. A composite unique key is a good way to enforce this: if user_id and setting_key must be unique in combination, no user can have the same setting stored more than once.
This is easy to achieve when creating a new table, but what if the table already exists and contains data?

Checking for existing violations

Before we can add the new unique constraint, we must first check to see if we have any violations and remove them. If we don't do this, MySQL will prevent us from adding the new constraint.
The following query will find duplicated pairs for the constraint target fields:
SELECT user_id, setting_key, COUNT(*) c FROM user_settings GROUP BY user_id, setting_key HAVING c > 1;
If that gives you duplicates back, then you will need to remove them before moving onto the next step.
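How you remove them depends on whether you need to merge any data first, but if you simply want to keep one row per pair, a self-join DELETE along these lines will do it (this sketch keeps the row with the lowest id for each user_id/setting_key pair; take a backup before running anything like it):

DELETE a FROM user_settings a
JOIN user_settings b
    ON a.user_id = b.user_id
    AND a.setting_key = b.setting_key
    AND a.id > b.id;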

Adding the new composite unique key

To add the missing constraint:
ALTER TABLE user_settings ADD UNIQUE KEY `uk_user_settings` (user_id, setting_key);
By convention, I like to prefix unique key names with "uk_" but you can use whatever name you like here. Now to confirm that the key is in place on the table, try:
mysql> SHOW INDEXES FROM user_settings;
+------------------+------------+-----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table            | Non_unique | Key_name              | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+------------------+------------+-----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| user_settings    |          0 | PRIMARY               |            1 | id          | A         |      133856 |     NULL | NULL   |      | BTREE      |         |               |
| user_settings    |          0 | uk_user_settings      |            1 | user_id     | A         |         200 |     NULL | NULL   | YES  | BTREE      |         |               |
| user_settings    |          0 | uk_user_settings      |            2 | setting_key | A         |         200 |     NULL | NULL   |      | BTREE      |         |               |
| user_settings    |          1 | fk_user_settings_1    |            1 | user_id     | A         |       66928 |     NULL | NULL   | YES  | BTREE      |         |               |
+------------------+------------+-----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
4 rows in set (0.01 sec)

Monday, 9 November 2015

How to use ON DUPLICATE KEY for UPDATE

First a sample table
mysql> drop database if exists ali;
Query OK, 1 row affected (0.10 sec)

mysql> create database ali;
Query OK, 1 row affected (0.00 sec)

mysql> use ali;
Database changed
mysql> CREATE TABLE test
    -> (
    -> id int(11) unsigned NOT NULL AUTO_INCREMENT,
    -> external_id int(11),
    -> number smallint(5),
    -> value varchar(255),
    -> UNIQUE INDEX (external_id, number),
    -> PRIMARY KEY(id)
    -> ) ENGINE=InnoDB;
Query OK, 0 rows affected (0.09 sec)

mysql>
Let's insert an initial row and SELECT it back
mysql> INSERT INTO test (external_id,number,value)
    -> VALUES (200,15,'Y')
    -> ON DUPLICATE KEY UPDATE number = number + 1;SELECT * FROM test;
Query OK, 1 row affected (0.06 sec)

+----+-------------+--------+-------+
| id | external_id | number | value |
+----+-------------+--------+-------+
|  1 |         200 |     15 | Y     |
+----+-------------+--------+-------+
1 row in set (0.00 sec)

mysql>
Let's insert the same thing again
mysql> INSERT INTO test (external_id,number,value)
    -> VALUES (200,15,'Y')
    -> ON DUPLICATE KEY UPDATE number = number + 1;SELECT * FROM test;
Query OK, 2 rows affected (0.10 sec)

+----+-------------+--------+-------+
| id | external_id | number | value |
+----+-------------+--------+-------+
|  1 |         200 |     16 | Y     |
+----+-------------+--------+-------+
1 row in set (0.00 sec)

mysql>
OK, let's try setting the value column at the same time
mysql> INSERT INTO test (external_id,number,value)
    -> VALUES (201,15,'X')
    -> ON DUPLICATE KEY UPDATE
    -> number = number + 1,value = VALUES(value);
Query OK, 1 row affected (0.07 sec)

mysql> SELECT * FROM test;
+----+-------------+--------+-------+
| id | external_id | number | value |
+----+-------------+--------+-------+
|  1 |         200 |     16 | Y     |
|  5 |         201 |     15 | X     |
+----+-------------+--------+-------+
2 rows in set (0.00 sec)

mysql> INSERT INTO test (external_id,number,value)
    -> VALUES (201,15,'X')
    -> ON DUPLICATE KEY UPDATE
    -> number = number + 1,value = VALUES(value);
Query OK, 2 rows affected (0.06 sec)

mysql> SELECT * FROM test;
+----+-------------+--------+-------+
| id | external_id | number | value |
+----+-------------+--------+-------+
|  1 |         200 |     16 | Y     |
|  5 |         201 |     16 | X     |
+----+-------------+--------+-------+
2 rows in set (0.00 sec)

mysql>
This time change value and number
mysql> INSERT INTO test (external_id,number,value)
    -> VALUES (202,15,'Z')
    -> ON DUPLICATE KEY UPDATE
    -> number = number + 1,
    -> value = VALUES(value);
Query OK, 1 row affected (0.07 sec)

mysql> SELECT * FROM test;
+----+-------------+--------+-------+
| id | external_id | number | value |
+----+-------------+--------+-------+
|  1 |         200 |     16 | Y     |
|  5 |         201 |     16 | X     |
|  8 |         202 |     15 | Z     |
+----+-------------+--------+-------+
3 rows in set (0.00 sec)

mysql> INSERT INTO test (external_id,number,value)
    -> VALUES (202,15,'A')
    -> ON DUPLICATE KEY UPDATE
    -> number = number + 1,
    -> value = VALUES(value);
Query OK, 2 rows affected (0.07 sec)

mysql> SELECT * FROM test;
+----+-------------+--------+-------+
| id | external_id | number | value |
+----+-------------+--------+-------+
|  1 |         200 |     16 | Y     |
|  5 |         201 |     16 | X     |
|  8 |         202 |     16 | A     |
+----+-------------+--------+-------+
3 rows in set (0.00 sec)
The problem is that ON DUPLICATE KEY UPDATE is not recursive. In your last INSERT, if a row with number 16 already exists for the same external_id, it will not change 15 to 17 (the next available value). It will only try 16, and will fail with a duplicate key error. For example, with this data:
id   external_id   number   value
 1   200           16       Y
 3   201           16       X
 5   202           16       Z
 6   202           15       Z
INSERT INTO test (external_id,number,value)
    VALUES (202,15,'Z')
    ON DUPLICATE KEY UPDATE
    number = number + 1,
    value = VALUES(value);
This fails with the error:
#1062 - Duplicate entry '202-16' for key 'external_id'
If you use the INSERT IGNORE keyword instead, the error is not raised and the conflicting record is simply skipped:
INSERT IGNORE INTO test (external_id,number,value)
    VALUES (202,15,'Z')
    ON DUPLICATE KEY UPDATE
    number = number + 1,
    value = VALUES(value);
To use the INSERT ... ON DUPLICATE KEY UPDATE syntax you must have a unique index covering the columns in which you want to prevent duplicates. If, say, five specific columns must not contain duplicate values in combination, create a unique index over those five columns, for example UNIQUE(col3,col6,col22,col34,col45). Individual columns (or any subset of the five) can still contain repeated values, but no row may duplicate all five values of an existing row. When a new row would do so, the statement updates the existing row instead of inserting.