Category: Dev

  • MySQL Users Conference 2007 – Day 2

    MySQL Users Conference 2007 – Day 2

    Today was the first day of the main conference and at breakfast I met Daniel Bernholc from TradeDoubler‘s Stockholm office. It was interesting to hear about the TradeDoubler architecture. They do 25,000 RPS using only about 8 Sun Opteron web servers. All their application code is written in Java and they use a custom written Apache module for delivery.

    Guy Kawasaki gave a very entertaining keynote called “The Art of Innovation” and Brian Stevens from Red Hat talked about the OLPC project.

    The first session I went to was Rasmus talking about PHP performance and security. There is a new MySQL driver for PHP called MySQL ND (Native Driver) which should, in theory, give better performance but it seems in real applications this is not the case.

    Rasmus used an example of a single Drupal page view to demonstrate performance tuning. Like Domas Mituzas, he also recommends using xdebug with kcachegrind for profiling. He also demonstrated using callgrind running via valgrind to provide input to the kcachegrind visualizer. He started apache under vallgrind:

    valgrind --tool=callgrind --dump-instr=yes -v /usr/sbin/httpd -X
    chmod a+rw callgrind.out.XXXX

    A good tip when analysing kcachegrind output is if there are lots of boxes, that is generally a good thing. It means you are “distributing” your processing and no one part of your code is taking all the CPU time.

    Also like Domas, Rasmus recommends the APC opcode cache.

    The second part of Rasmus’s session was on PHP security. I found this very interesting as he went over the different types of XSS attacks and how to avoid them. You can check if a page is vulnerable to XSS by seeing if you can run alert(). If you can run an alert() you can run any javascript in the user’s browser. Basically, to avoid XSS problems you just need to make sure you do not display user data back to the user unfiltered. To help with this there is the filter PHP extension. This comes as standard in PHP 5.2 but can be used with 5.0/5.1 using PECL. XSS vulnerabilities are very common on the web and he showed us several examples of pages that were vulnerable including the IBM Developerworks article on how to prevent XSS atacks!

    Rasmus then moved on to Cross Site Request Forgery (XSRF). Basically, this is when data is submitted to a page from an unauthorised source. The way around this kind of attack is to use what Rasmus calls “crumbs”. These are bits of data in the submission form that contain user-specific data.

    The slides from his talk are here.

    The session by two guys from digg.com about their architecture was interesting. They scale by having many web servers behind a load balancer talking to many MySQL slave instances. Each web server opens connections to a random DB server. A memcached server farm is used to only cache chunks of content, not whole pages. A lot of people seem to be talking about memcached at this conference and it is something I had heard of but never really looked at. It looks pretty neat but most of the people talking about it here have far more reads than writes to their database so I’m not sure how useful it would be for us.

    Sharding is another technique that a lot of people seem to be implementing. Sharding is basically partitioning your database over a number of machines, breaking a big DB into smaller ones. But unlike, say, cluster, you need to manage the shards yourself in your application code. This can give performance improvements but it comes at the expense of more complicated code and you can’t do SQL joins across shards. Those are some pretty big downsides so it won’t suit everyone.

    Again, they recommend using the APC opcode cache. I was surprised to hear that Digg only have about 30GB of data.

    The SQL Antipatterns talk by Bill Karwin from Zend was also good. It was a quick run through common mistakes people make with their SQL:

    Comma separated lists
    This means storing a list of values in a varchar column and has a number of problems:

    • You can’t enforce referential integrity
    • It is hard to add a value to a list
    • It is difficult to search
    • You can’t even be sure the values in the list are integers

    If you have this problem you probably have a many to many relationship which can be solved using an intersection table.

    Ambiguous GROUP BY
    This is the old problem of MySQL not enforcing the SQL standard for GROUP BY. You can use non-aggregated columns or calculations in the SELECT list that do not appear in the GROUP BY clause. This can cause some unexpected results, as the manual says:

    Do not use this feature if the columns you omit from the GROUP BY part are not constant in the group. The server is free to return any value from the group, so the results are indeterminate unless all values are the same.

    You can solve this problem by either following the SQL standard or using the GROUP_CONCAT function.

    Entity-Attribute-Value Tables
    This is a common way to store things like user attributes or preferences but there are some problems with this approach:

    • There is no way to declare an attribute to be NOT NULL
    • You cannot enforce referential integrity
    • It is expensive to reconstruct the attributes in a single row

    These problems can be solved by:

    • Only use it as a last resort
    • Enforcing constraints on attribute values in the application logic
    • Don’t fetch all the data as a single row causing big joins

    Other tips were not letting users run their own ad-hoc SQL queries on the DB(!) and avoiding SQL injection by using prepared statements where values are replaced after the SQL has been parsed.

    The performance tuning sessions are always popular and Peter Zaitsev is a smart guy. His session on server performance tuning was standing room only. He explained that you can get the most benefit out of tuning just a few parameters like innodb_buffer_pool_size.

    Another interesting session was given by Farhan Mashraqi from Fotolog. According to Alexa this is one of the top 25 sites on the web with millions of members. It is different from something like flickr because users may only upload a single image per day (Gold members get 6).

    Their site runs on about 30 Solaris servers divided into four clusters (there’s that sharding again). Each shard has a number of tables that were originally partitioned by username but this gave uneven distribution (which seems obvious if you think about it.) They use Akamai and I was interested to hear that they rewrote their entire site from PHP into Java. The reason the CTO gave was that the developers hated PHP…

    Originally their site ran on MyISAM tables which predictably caused concurrency problems as their traffic grew but I was interested to hear that InnoDB will also create table locks when inserting an autoincrement primary key.

    Farhan then went into some detail about how they managed to lower their I/O by increasing the number and reordering the columns referenced in their primary key. I want to go over his presentation again when it is posted on the conference site as he seemed to gain a huge performance increase from this but I suspect it is quite specific to their data.

  • MySQL Users Conference 2007 – Tutorial Day

    MySQL Users Conference 2007 – Tutorial Day

    Today was tutorial day at the MySQL Users Conference. I got there early to register and get some breakfast before the first session.

    The tutorial day is different from the rest of the conference in that there are only two sessions, each of three hours. The length of these sessions means the speakers can go into more detail than in a normal 45 minute session.

    In the morning session I attended a session given by Brian Aker called MySQL 5.1 in Depth. Brian is the director of architecture for MySQL and a very good speaker. He also has some amazing hair which you can only begin to appreciate from his photo.

    The next version of MySQL promises some exciting features. First up, partitioning. This has the potential to increase database performance because only those partitions that are needed are actually accessed for a query. For example, you could partition a table by date range:

    CREATE TABLE RANGE_BY_DATE
    (
        customer_number INT NOT NULL,
        customer_order VARCHAR(50) NOT NULL,
        customer_order_date DATETIME NOT NULL)
    PARTITION BY RANGE(YEAR(CUSTOMER_ORDER_DATE))
    (
    PARTITION P1 VALUES LESS THAN (2000),
    PARTITION P2 VALUES LESS THAN (2003),
    PARTITION P3 VALUES LESS THAN (2005),
    PARTITION P4 VALUES LESS THAN MAXVALUE
    )
    ;
    

    Here we have partitioned the table into four partitions. Now a query selecting data from 2001 would only ever need to touch a single table.

    There are other ways to partition a table as well, for example, ‘hash’ partitioning is good for spreading out data across a number of spindles:

    CREATE TABLE HASH_EXAMPLE (col1 INT, col2 CHAR(5), col3 DATE)
    PARTITION BY HASH(col1)
    (
    PARTITION P1 DATA DIRECTORY = '/var/lib/mysql/data1',
    PARTITION P2 DATA DIRECTORY = '/var/lib/mysql/data2',
    PARTITION P3 DATA DIRECTORY = '/var/lib/mysql/data3',
    PARTITION P4 DATA DIRECTORY = '/var/lib/mysql/data4',
    )
    ;
    

    If we put each of these data files on separate disks we can gain a reduction in I/O contention.

    You can even have sub or composite partitions:

    CREATE TABLE sub_example
    (
        customer_number INT NOT NULL,
        customer_order VARCHAR(50) NOT NULL,
        customer_order_date DATETIME NOT NULL,
        customer_service_region INT NOT NULL)
    PARTITION BY RANGE(YEAR(customer_order_date))
    SUBPARTITION BY HASH(customer_service_region)
    (
    PARTITION P1 VALUES LESS THAN (2000) (SUBPARTITION S0, SUBPARTITION S1),
    PARTITION P2 VALUES LESS THAN (2003) (SUBPARTITION S2, SUBPARTITION S3),
    PARTITION P3 VALUES LESS THAN (2005) (SUBPARTITION S4, SUBPARTITION S5),
    PARTITION P4 VALUES LESS THAN MAXVALUE (SUBPARTITION S6, SUBPARTITION S7)
    )
    ;
    

    There are a few things to keep in mind:

    • Any storage engine may be partitioned
    • There is a limit of 1024 partitions (including any subpartitions)
    • If a value doesn’t match a ruleset it will be placed in the final partition
    • Hash partitioning can give you high insertion rates if you put each partition on a separate spindle. However, it doesn’t make much sense for InnoDB tables as InnoDB only uses one big table space anyway.
    • You don’t necessarily want to create hundreds of partitions in an attempt at future-proofing your database design as this will be a problem when using engines like MyISAM that will then have to open up hundreds of file descriptors.

    Aparently ALTER TABLE queries will case a physical rebuild of the table although Brian mentioned that some engines are now being smarter about this and may not necessarily require a rebuild for every ALTER TABLE query.

    MySQL 5.1 also finally gets disk-based clustering so you no longer need twice your database’s size of RAM to use cluster (NDB). Indexes still need to be kept in memory however.

    You create the tablespace using SQL rather than my.cnf configurations:

    CREATE TABLESPACE ts1
    ADD DATAFILE 'datafile.dat'
    USE LOGFILE GROUP lg1
    INITIAL_SIZE 12M
    ENGINE NDB;
    
    CREATE TABLE t1
    (pk1 INT NOT NULL PRIMARY KEY,
    b INT NOT NULL,
    c INT NOT NULL
    )
    TABLESPACE ts1 STORAGE DISK
    ENGINE NDB;
    

    A cool feature, especially for admins, is the task scheduler. This is basically cron inside the database. There is a new object called an ‘event’ that can be used to create one-time or recurring tasks. These are just stored procedures, eg:

    DELIMITER //
    CREATE EVENT optimise_tables
    ON SCHEDULE EVERY 1 WEEK
    STARTS '2006-03-05 1:00:00'
    ON COMPLETION PRESERVE
    DO 
    BEGIN
    OPTIMIZE TABLE test.table1;
    OPTIMIZE TABLE test.table2;
    END
    //
    

    I like the sound of these. They let you incorporate regular maintenance tasks into the database rather than relying on external scripts. Also, being stored procedures, you can make them as smart as you want.

    MySQL 5.0 had mysqlimport but in 5.1 it is has a new option “–use-threads” to let you specify multiple threads. This is basically a wrapper around LOAD DATA INFILE. Apparently it is even advantageous to use two threads on a single core machine.

    As I mentioned yesterday, the General and Slow Query Log are now system tables. This is cool! This means switching on the general query log is as simple as set global general_log=1;. Before 5.1, switching the general query log on and off required a database restart! No more staying up late or working weekends just so you can switch the general query log on and off for a few hours. 🙂

    Here is what is recorded in the logs:

    mysql> desc slow_log;
    +----------------+--------------+------+-----+-------------------+-------+
    | Field          | Type         | Null | Key | Default           | Extra |
    +----------------+--------------+------+-----+-------------------+-------+
    | start_time     | timestamp    | NO   |     | CURRENT_TIMESTAMP |       | 
    | user_host      | mediumtext   | NO   |     |                   |       | 
    | query_time     | time         | NO   |     |                   |       | 
    | lock_time      | time         | NO   |     |                   |       | 
    | rows_sent      | int(11)      | NO   |     |                   |       | 
    | rows_examined  | int(11)      | NO   |     |                   |       | 
    | db             | varchar(512) | YES  |     | NULL              |       | 
    | last_insert_id | int(11)      | YES  |     | NULL              |       | 
    | insert_id      | int(11)      | YES  |     | NULL              |       | 
    | server_id      | int(11)      | YES  |     | NULL              |       | 
    | sql_text       | mediumtext   | NO   |     |                   |       | 
    +----------------+--------------+------+-----+-------------------+-------+
    11 rows in set (0.00 sec)
    

    Much more readable than the current slow query log. What’s even cooler is that these tables use the CSV engine which is both human and MS Excel readable so you can either tail the table file or load it straight into Excel. You could also use the task scheduler to create an event that truncates the log tables every day.

    mysqlslap is a new command-line load-testing tool that can simulate many concurrent connections to the database and repetetively run load tests. It is kind of like Apache’s ab but for databases.

    It has a few cool features like autogeneration of SQL and can sequentially test different engines. For example you could pass a list of engines like this:

    --engine=innodb, falcon, myisam

    and compare the performance across the different storage engines.

    Because this is a MySQL client utility you can run it against any version of MySQL >= 3.23.

    There are a few new storage engines for 5.1 including PBXT which is an engine that promises the speed if myisam with transactions (although not fully ACID). This is implemented with 5.1’s new plugin architecture which means you can drop this storage engine directly into your running database. There is the crazy HTML storage engine implemented with GETs and PUTs and the AWS storage engine which Mark Atwood is giving a session on on Wednesday.

    For the second session, I went to hear Domas Mituzas talk about what goes on behind the scenes at Wikipedia. Domas was a funny guy and it was interesting to hear how Wikipedia operates with mostly volunteer sysadmins and cheap commodity hardware.

    They use a lot of custom code to handle things like database load balancing and caching. They use PowerDNS to handle geo-distribution.

    A lot of effort goes into caching; not just ensuring as much is cached as possible but that the cache can easily be invalidated when an object is updated. Basically, Wikipedia’s policy is to cache as much as possible inside their network but once it leaves the Wikipedia network the private cache-control bit is set so clients will always request a new copy of the data.

    Wikipedia do about 30,000 requests per second so logging to files is out of the question. They send their logs to the network where anyone that is interested in them can pick them up.

    Some of the customisations to the standard LAMP stack are:

    • PHP’s strtr() function has been replaced with FastStringSearch
    • Imagemagick is used for thumbnailing rather than PHP GD
    • They use APC as a bytecode cache; Domas thinks eaccelerator is too buggy
    • There is a Tex filter to render formulae and other scientific content that is written in ocaml and no-one knows how to change it!

    Wikipedia has realtime profiling written into their code. Also, they use xdebug combined with kcachegrind which creates some nice output showing a tree of calls and execution times. This is something that could be useful to us at the moment as we are working on profiling our code.

    All their database servers run RAID0 – they expect servers to fail and so they put their redundancy into the servers themselves rather than the disks. It is easier for them to promote a slave database to master rather than run a server on a degraded RAID array or wait for its database to recover after a crash.

    The text of Wikipedia articles is not actually stored in the database tables. The tables only contain metadata which points to external storage (more cheap disks) where the text of articles is stored (in compressed format).

    Many of the Wikipedia tables have more indexing than data reflecting the high number of reads compared to writes.

    It was also interesting to hear about the different culture of Wikipedia. It is mostly run by volunteers on cheap hardware and their attitude to uptime is what you might call relaxed but it seems to work for them. As Domas said, they don’t mind too much if they have a little downtime because when Wikipedia goes down it displays the “please donate” screen. 🙂

  • Santa Clara

    Santa Clara

    I have arrived in Silicon Valley where I will be spending the next week at the MySQL Conference & Expo 2007. I’m pretty excited about it.

    It turns out California is quite a long way from London. After a ten hour flight I caught the big double-decker Caltrain train south along the Bay to Mountain View where I transferred to the Valley Transportation Authority Light Rail. I was soon rolling past the offices of Yahoo, Net App and Lucent, the huge hangars and radar dishes of NASA Ames and Lockheed Martin and the deliciously named Chubby’s Broiler.

    The train carried on down W Java Dr and it wasn’t long before I reached the Santa Clara Convention Center and my hotel (the Hilton) directly opposite. My room is pretty nice and from my window I can see the roller coasters of the Great America theme park.

    Now I just need to plan what sessions to attend but looking at the schedule I have the usual problem of there only being one of me. Tomorrow is tutorial day and although my registration is sessions-only at the moment, apparently I can upgrade it to the tutorials tomorrow. Unfortunately Scaling and High Availability Architectures is sold out so I might go along to hear about all the new stuff in MySQL 5.1 in the morning.

    5.1 has some cool new features like partitioning (which could allow us to eliminate tables split by date in our application) and logs on demand. In fact there is another session just on the new logging capabilities at the conference. Check out this post for some examples of what you can do when the logs are stored in the database itself.

    Learning how to write your own storage engine sounds intersting but not something I see myself doing any time soon so I reckon I see how Wikipedia operates in the afternoon.

  • Mobile phones

    Mobile phones

    I’ve been pretty busy with my project for my MSc in Computer Science. It is a metronome/instrument tuner that runs on a mobile phone. All modern mobiles have Java so I didn’t think it was such a crazy idea when I was planning this. What I didn’t know was that if you want to use some advanced features of the J2ME Multimedia API then you are in a whole world of pain.

    Basically, the problem boils down to this: there is a standard API with all sorts of cool stuff like MIDI and audio capture, but the mobile devices don’t necessarily implement it fully. Even if they say they support JSR 135, that doesn’t automatically mean you will be able to capture audio in the proper format or be able to access the needed MIDI controls.

    Device support wasn’t the only problem. Not only do the devices differ from each other but each manufacturer supplies emulators for their different phones so you can test your programs against them while developing. This is a great idea. At least it would be if the emulator bore any resemblance to the actual device. I don’t mean how it looks. I mean code that runs on the emulator will crash your phone and vice versa. Aargh!

    So I have spent waaay more time than I wanted faffing around just trying to get the damn thing to run without crashing my phone. I have also spent way more money than I wanted acquiring phones to test with. Not including old phones we had lying around, we now have four new mobiles in our house from three different manufacturers.

    All this has given me some insight into the different phones and I have been able to compare the phones themselves along with how they work for a developer. These are the phones I have:

    • Motorola Razr V3i
    • Sony Ericsson W300i
    • Sony Ericsson M600i
    • Nokia E60

    Motorola make some very stylish phones and the Razr is a very thin, very nice looking phone, but I have never liked the user interface. It was also the least fun to develop with. Motorola make it really hard to install Java MIDlets by USB cable rather than downloading over the air (and paying for it). It took me ages and loads of Googling to figure out how to hack my phone so I could install MIDlets using the cable. Once this was done however the Motorola MIDway tool isn’t bad for installing MIDlets and getting debug info while they are running. That said, the V3i (what’s up with all the i’s?) had the least success in running my app. It can only capture audio in AMR format and none of the MIDI stuff worked, however, the Motorola emulator isn’t too bad.

    Kathryn upgraded her old T610 to a new W300i (again with the i) Walkman phone. This is actually a nice little phone and I know Kathryn likes it. I figured, seeing as it is a Walkman, it should have pretty good audio support. In a way all these phones have pretty good audio support, but the problem is with audio capture. Once again the W300i is limited to capturing in compressed AMR format; no good for audio processing. I didn’t actually look to see if there was an emulator for the W300i but installing apps is a lot easier than with the Motorola (well it couldn’t be any harder). The Sony Ericsson File Manager lets you browse the contents of your phone and then it is a simple matter of dragging your JAR/JAD files onto the phone and selecting install. The MIDI bit worked pretty flawlessly although the MIDI sounds on this phone all have a vibrato you could drive a tank through.

    I was starting to get desparate now, so I decided to change tack a little and try using JavaME CDC instead of CLDC. Basically, the CDC is what PDAs and smartphones run and the CLDC is for smaller devices like mobiles. Unfortunately the CDC doesn’t contain the javax.sound package and the MMAPI implementation on the Sony Ericsson M600i I bought was one of the most limited of the bunch. Neither the MIDI nor the audio capture worked properly. Aside from those rather fundamental problems, the phone itself is really nice. It has a nice touch screen as well as a full QWERTY keyboard. It is 3G, has a good email client and you can easily edit MS Excel, Word and Powerpoint files. It can run Blackberry and other push email software and is generally a really sweet piece of kit. The Symbian OS it runs is nice and because it has a full Java CDC implementation you can get loads of cool software for it like IM clients. It serves as my phone/PDA now. The emulator is pretty good too, and installing apps is the same as the little W300i.

    Today I got the Nokia E60. It isn’t just the fact that this phone doesn’t have an ‘i’ in its name that sets it apart. This one actually worked right away. Both the MIDI and the audio capture ran without a hitch. The MIDI instruments even sound quite good! The emulator is a bit limited in that it doesn’t support some of the MIDI controls that the actual device does, as well as being dog slow, but installing programs is a breeze; the Nokia PC suite makes it easy with a dedicated button for installing apps. The phone is really nice too. It is part of their executive range so it doesn’t have a camera, but it is nicely built with a fantastic screen and is absolutely loaded with features including 3G and even wifi built in! In fact, it can do almost everything my Sony M600i can do, incuding editing Word and Excel files and handle push email, although the PIM stuff is a bit more limited and it isn’t as nice entering text on the standard mobile keypad rather than the full QWERTY or stylus of the Sony. If I hadn’t already bought the Sony, I would be really pleased with this phone. The other great thing about it is that you can actually use it without a SIM card inserted! All the other phones won’t let you do a thing until a SIM is inserted whereas the Nokia will do everything apart from make calls. I’ve never had a Nokia before but I’m impressed!

    Now all I need to do is finish the damn project…