Today was tutorial day at the MySQL Users Conference. I got there early to register and get some breakfast before the first session.
The tutorial day is different from the rest of the conference in that there are only two sessions, each lasting three hours. The extra length means the speakers can go into much more detail than in a normal 45-minute session.
In the morning I attended a session given by Brian Aker called MySQL 5.1 in Depth. Brian is the Director of Architecture for MySQL and a very good speaker. He also has some amazing hair, which you can only begin to appreciate from his photo.
The next version of MySQL promises some exciting features. First up, partitioning. This has the potential to improve performance because only the partitions a query actually needs are touched; the rest are pruned away. For example, you could partition a table by date range:
CREATE TABLE range_by_date (
    customer_number INT NOT NULL,
    customer_order VARCHAR(50) NOT NULL,
    customer_order_date DATETIME NOT NULL
)
PARTITION BY RANGE (YEAR(customer_order_date)) (
    PARTITION p1 VALUES LESS THAN (2000),
    PARTITION p2 VALUES LESS THAN (2003),
    PARTITION p3 VALUES LESS THAN (2005),
    PARTITION p4 VALUES LESS THAN MAXVALUE
);
Here we have partitioned the table into four partitions. Now a query selecting data from 2001 would only ever need to touch a single partition (p2).
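You can check the pruning for yourself with 5.1’s new EXPLAIN PARTITIONS. A quick sketch against the table above; the partitions column of the output should list only p2:

EXPLAIN PARTITIONS
SELECT * FROM range_by_date
WHERE customer_order_date BETWEEN '2001-01-01' AND '2001-12-31';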
There are other ways to partition a table as well. For example, ‘hash’ partitioning is good for spreading data across a number of spindles:
CREATE TABLE hash_example (col1 INT, col2 CHAR(5), col3 DATE)
PARTITION BY HASH (col1) (
    PARTITION p1 DATA DIRECTORY = '/var/lib/mysql/data1',
    PARTITION p2 DATA DIRECTORY = '/var/lib/mysql/data2',
    PARTITION p3 DATA DIRECTORY = '/var/lib/mysql/data3',
    PARTITION p4 DATA DIRECTORY = '/var/lib/mysql/data4'
);
If we put each of these data files on a separate disk, we reduce I/O contention.
You can even have subpartitions (composite partitioning):
CREATE TABLE sub_example (
    customer_number INT NOT NULL,
    customer_order VARCHAR(50) NOT NULL,
    customer_order_date DATETIME NOT NULL,
    customer_service_region INT NOT NULL
)
PARTITION BY RANGE (YEAR(customer_order_date))
SUBPARTITION BY HASH (customer_service_region) (
    PARTITION p1 VALUES LESS THAN (2000) (SUBPARTITION s0, SUBPARTITION s1),
    PARTITION p2 VALUES LESS THAN (2003) (SUBPARTITION s2, SUBPARTITION s3),
    PARTITION p3 VALUES LESS THAN (2005) (SUBPARTITION s4, SUBPARTITION s5),
    PARTITION p4 VALUES LESS THAN MAXVALUE (SUBPARTITION s6, SUBPARTITION s7)
);
There are a few things to keep in mind:
- Any storage engine may be partitioned
- There is a limit of 1024 partitions (including any subpartitions)
- If a value doesn’t match any partitioning rule, it goes into the MAXVALUE partition if you have defined one; otherwise the insert simply fails (see the sketch after this list)
- Hash partitioning can give you high insertion rates if you put each partition on a separate spindle. However, it doesn’t make much sense for InnoDB tables, as InnoDB uses one big tablespace anyway.
- You don’t necessarily want to create hundreds of partitions in an attempt to future-proof your database design, as this becomes a problem with engines like MyISAM, which would then have to open hundreds of file descriptors.
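To illustrate that rule-matching point: if the range_by_date table above had been created without its MAXVALUE partition, an out-of-range insert is rejected rather than silently stored. A sketch (the row values are made up):

INSERT INTO range_by_date VALUES (1, 'ORD-1', '2007-06-01');
-- ERROR 1526 (HY000): Table has no partition for value 2007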
Apparently ALTER TABLE queries will cause a physical rebuild of the table, although Brian mentioned that some engines are now being smarter about this and may not necessarily require a rebuild for every ALTER TABLE query.
MySQL 5.1 also finally gets disk-based clustering, so you no longer need twice your database’s size in RAM to use MySQL Cluster (NDB). Indexes still need to be kept in memory, however.
You create the tablespace using SQL rather than my.cnf configuration.
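The tablespace refers to a logfile group (used for undo logging), which has to exist first. A minimal sketch, with the undo file name and size assumed:

CREATE LOGFILE GROUP lg1
ADD UNDOFILE 'undofile.dat'
INITIAL_SIZE 16M
ENGINE NDB;

With the logfile group in place, the tablespace and a disk-based table look like this: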
CREATE TABLESPACE ts1
ADD DATAFILE 'datafile.dat'
USE LOGFILE GROUP lg1
INITIAL_SIZE 12M
ENGINE NDB;
CREATE TABLE t1 (
    pk1 INT NOT NULL PRIMARY KEY,
    b INT NOT NULL,
    c INT NOT NULL
)
TABLESPACE ts1 STORAGE DISK
ENGINE NDB;
A cool feature, especially for admins, is the task scheduler. This is basically cron inside the database. There is a new object called an ‘event’ that can be used to create one-time or recurring tasks. An event body is written just like a stored procedure, e.g.:
DELIMITER //
CREATE EVENT optimise_tables
ON SCHEDULE EVERY 1 WEEK
STARTS '2006-03-05 01:00:00'
ON COMPLETION PRESERVE
DO
BEGIN
    OPTIMIZE TABLE test.table1;
    OPTIMIZE TABLE test.table2;
END
//
DELIMITER ;
I like the sound of these. They let you incorporate regular maintenance tasks into the database rather than relying on external scripts. Also, since an event body is just stored-procedure code, you can make them as smart as you want.
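One gotcha worth noting (this comes from the 5.1 documentation rather than the talk): events don’t fire until you switch the scheduler itself on:

SET GLOBAL event_scheduler = ON;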
MySQL 5.0 had mysqlimport, but in 5.1 it has a new option, --use-threads, that lets you specify multiple threads. It is basically a wrapper around LOAD DATA INFILE. Apparently it is even advantageous to use two threads on a single-core machine.
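For example, something like this (the database and file names are hypothetical) would load the two files in parallel, one thread each:

mysqlimport --local --use-threads=2 test /tmp/table1.txt /tmp/table2.txt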
As I mentioned yesterday, the General and Slow Query Logs are now system tables. This is cool! It means switching on the general query log is as simple as set global general_log=1;. Before 5.1, switching the general query log on and off required a database restart! No more staying up late or working weekends just so you can switch the general query log on and off for a few hours. 🙂
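A minimal sketch of the table-based setup (log_output and the mysql.general_log columns are from the 5.1 documentation):

SET GLOBAL log_output = 'TABLE';  -- write logs to tables rather than files
SET GLOBAL general_log = 1;       -- start logging

-- Peek at the most recent queries:
SELECT event_time, user_host, argument
FROM mysql.general_log
ORDER BY event_time DESC
LIMIT 5;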
Here is what is recorded in the logs:
mysql> desc slow_log;
+----------------+--------------+------+-----+-------------------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------------+--------------+------+-----+-------------------+-------+
| start_time | timestamp | NO | | CURRENT_TIMESTAMP | |
| user_host | mediumtext | NO | | | |
| query_time | time | NO | | | |
| lock_time | time | NO | | | |
| rows_sent | int(11) | NO | | | |
| rows_examined | int(11) | NO | | | |
| db | varchar(512) | YES | | NULL | |
| last_insert_id | int(11) | YES | | NULL | |
| insert_id | int(11) | YES | | NULL | |
| server_id | int(11) | YES | | NULL | |
| sql_text | mediumtext | NO | | | |
+----------------+--------------+------+-----+-------------------+-------+
11 rows in set (0.00 sec)
Much more readable than the current slow query log. What’s even cooler is that these tables use the CSV engine, which is both human- and Excel-readable, so you can either tail the table file or load it straight into Excel. You could also use the task scheduler to create an event that truncates the log tables every day.
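A daily cleanup event might look something like this (a sketch; TRUNCATE TABLE is the documented way to clear the log tables):

CREATE EVENT truncate_general_log
ON SCHEDULE EVERY 1 DAY
STARTS '2006-03-05 04:00:00'
DO
TRUNCATE TABLE mysql.general_log;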
mysqlslap is a new command-line load-testing tool that can simulate many concurrent connections to the database and repeatedly run load tests. It is kind of like Apache’s ab, but for databases.
It has a few cool features, like auto-generation of SQL, and it can test different engines sequentially. For example, you could pass a list of engines like this:
--engine=innodb,falcon,myisam
and compare the performance across the different storage engines.
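Put together, a run might look something like this (a sketch; the concurrency and iteration numbers are made up, and the engines need to be available on the server):

mysqlslap --auto-generate-sql --concurrency=50 --iterations=10 \
          --engine=innodb,myisam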
Because this is a MySQL client utility you can run it against any version of MySQL >= 3.23.
There are a few new storage engines for 5.1, including PBXT, an engine that promises the speed of MyISAM with transactions (although not fully ACID). PBXT is built on 5.1’s new plugin architecture, which means you can drop the storage engine directly into a running database. There is also the crazy HTML storage engine, implemented with GETs and PUTs, and the AWS storage engine, which Mark Atwood is giving a session on this Wednesday.
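Dropping a plugin engine into a running server is a single statement. A sketch, with the PBXT plugin and library names assumed rather than taken from the talk:

INSTALL PLUGIN pbxt SONAME 'libpbxt.so';

-- then use it like any other engine:
CREATE TABLE plugin_test (id INT) ENGINE = PBXT;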
For the second session, I went to hear Domas Mituzas talk about what goes on behind the scenes at Wikipedia. Domas was a funny guy and it was interesting to hear how Wikipedia operates with mostly volunteer sysadmins and cheap commodity hardware.
They use a lot of custom code to handle things like database load balancing and caching. They use PowerDNS to handle geo-distribution.
A lot of effort goes into caching: not just ensuring that as much as possible is cached, but also that the cache can easily be invalidated when an object is updated. Basically, Wikipedia’s policy is to cache as much as possible inside their network, but once content leaves the Wikipedia network the private cache-control bit is set so clients will always request a fresh copy of the data.
Wikipedia does about 30,000 requests per second, so logging to files is out of the question. They send their logs out onto the network, where anyone who is interested in them can pick them up.
Some of the customisations to the standard LAMP stack are:
- PHP’s strtr() function has been replaced with FastStringSearch
- ImageMagick is used for thumbnailing rather than PHP’s GD
- They use APC as a bytecode cache; Domas thinks eAccelerator is too buggy
- There is a TeX filter to render formulae and other scientific content that is written in OCaml, and no-one knows how to change it!
Wikipedia has real-time profiling written into its code. They also use Xdebug combined with KCachegrind, which produces some nice output showing a tree of calls and execution times. This is something that could be useful to us at the moment, as we are working on profiling our own code.
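If you want to try that combination yourself, something like this works (a sketch; the settings are Xdebug’s standard profiler options, and the script name and paths are made up):

# profile one script run, then browse the call tree in KCachegrind
php -d xdebug.profiler_enable=1 \
    -d xdebug.profiler_output_dir=/tmp/profiles \
    myscript.php
kcachegrind /tmp/profiles/cachegrind.out.*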
All their database servers run RAID 0. They expect servers to fail, so they put their redundancy into the servers themselves rather than the disks: it is easier for them to promote a slave database to master than to run a server on a degraded RAID array or wait for its database to recover after a crash.
The text of Wikipedia articles is not actually stored in the database tables. The tables only contain metadata which points to external storage (more cheap disks) where the text of articles is stored (in compressed format).
Many of the Wikipedia tables hold more index data than row data, reflecting the high ratio of reads to writes.
It was also interesting to hear about Wikipedia’s rather different culture. It is mostly run by volunteers on cheap hardware, and their attitude to uptime is what you might call relaxed, but it seems to work for them. As Domas said, they don’t mind too much if they have a little downtime, because when Wikipedia goes down it displays the “please donate” screen. 🙂