Archive for the ‘Databases (RDBS)’ Category

MySQL 5.6.25 sql-bench results on Linux 4.2.3 kernel and SSD RAID 1

Friday, October 23rd, 2015

This post provides fresh sql-bench results for the test that was performed a while back.

HP DL380 G5 drive write cache (BBWC)

Friday, June 19th, 2009

In one of my previous articles I wrote about the tools I use for benchmarking database performance and especially for discovering system bottlenecks. In this article I will show how large an impact a drive write cache can have on filesystem performance.

HP DL360 G4 server

HP's line of DL servers is well respected among IT engineers. In my experience it is a reliable class of servers: well built, easy to maintain, and shipped with excellent server management software called Integrated Lights-Out (iLO). The SCSI and SAS storage subsystem of a DL server is usually driven by a Smart Array controller, which may or may not be integrated onto the motherboard. A new HP customer gets all the nifty RAID levels to play with and is usually satisfied. But what a new customer usually DOES NOT KNOW is that Smart Array write performance really sucks if the write cache is not enabled. And to enable it, the user needs to install a special module with an attached battery, called BBWC (Battery-Backed Write Cache). So, HP, if you happen to be reading this, please do notify your customers about such things.

HP DL380 G5 server

This is the other server on which I ran the benchmark. Unlike the DL360 G4, it already came with BBWC installed. To simulate the absence of a write cache I disabled it with the command-line tool hpacucli (HP Array Configuration Utility Command Line Interface).

Benchmark methodology

For this benchmark I used two tools from the sql-bench suite that heavily stress the filesystem with lots of file creations and deletions: test-alter-table and test-create. The former is faster and gives only a rough figure. The latter creates and deletes around 50,000 MyISAM tables, which results in around 150,000 files being created and deleted (each MyISAM table consists of three files: .frm, .MYD and .MYI). I executed both benchmarks first without and then with the write cache enabled.
One of the machines (the DL380) was already in production, but it was benchmarked at night, when usage is negligible.
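
When comparing several runs, it can help to pull the wallclock figures out of the sql-bench output automatically. The following is a minimal sketch, not part of sql-bench itself: the function name `parse_sql_bench` and the regular expression are assumptions based only on the output format shown in this article, and the sample lines are taken from the DL380 G5 results.

```python
import re

# Matches timing lines such as
#   Time for alter_table_add (100): 17 wallclock secs ( 0.00 usr ...
#   Total time: 37 wallclock secs ( 0.03 usr ...
# The colon is optional because some lines in the output omit it.
LINE_RE = re.compile(
    r"^(?:Time (?:for|to) (?P<name>\S+) \(\d+\)|Total time):?\s+"
    r"(?P<secs>\d+) wallclock secs"
)


def parse_sql_bench(output):
    """Return {step name: wallclock seconds}; the summary line is stored as 'total'."""
    results = {}
    for line in output.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            name = match.group("name") or "total"
            results[name] = int(match.group("secs"))
    return results


sample = """\
Testing of ALTER TABLE
Time for insert (1000) 0 wallclock secs ( 0.02 usr 0.00 sys + 0.00 cusr 0.00 csys = 0.02 CPU)
Time for alter_table_add (100): 17 wallclock secs ( 0.00 usr 0.00 sys + 0.00 cusr 0.00 csys = 0.00 CPU)
Total time: 37 wallclock secs ( 0.03 usr 0.00 sys + 0.00 cusr 0.00 csys = 0.03 CPU)
"""

timings = parse_sql_bench(sample)
print(timings)
```

Lines that are not timing lines, such as the "Testing of ALTER TABLE" banner, are simply skipped.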

Test systems

HP DL360 G4

  • Controller: Smart Array 6i
  • Filesystem: ufs
  • OS: FreeBSD 6.0
  • MySQL: mysql-5.0.41-freebsd6.0-i386

HP DL380 G5

  • Controller: Smart Array P400i
  • Filesystem: ext3
  • OS: Slackware 12.2
  • MySQL: mysql-5.0.77 compiled from source

Results on HP DL380 G5

Drive write cache DISABLED

Testing of ALTER TABLE
Time for insert (1000) 0 wallclock secs ( 0.02 usr 0.00 sys + 0.00 cusr 0.00 csys = 0.02 CPU)
Time for alter_table_add (100): 17 wallclock secs ( 0.00 usr 0.00 sys + 0.00 cusr 0.00 csys = 0.00 CPU)
Time for create_index (8): 2 wallclock secs ( 0.00 usr 0.00 sys + 0.00 cusr 0.00 csys = 0.00 CPU)
Time for drop_index (8): 2 wallclock secs ( 0.00 usr 0.00 sys + 0.00 cusr 0.00 csys = 0.00 CPU)
Time for alter_table_drop (91): 16 wallclock secs ( 0.01 usr 0.00 sys + 0.00 cusr 0.00 csys = 0.01 CPU)
Total time: 37 wallclock secs ( 0.03 usr 0.00 sys + 0.00 cusr 0.00 csys = 0.03 CPU)

Testing the speed of creating and dropping tables
Time for create_MANY_tables (10000): 253 wallclock secs ( 0.26 usr 0.06 sys + 0.00 cusr 0.00 csys = 0.32 CPU)
Time to select_group_when_MANY_tables (10000): 1 wallclock secs ( 0.09 usr 0.07 sys + 0.00 cusr 0.00 csys = 0.16 CPU)
Time for drop_table_when_MANY_tables (10000): 1 wallclock secs ( 0.09 usr 0.03 sys + 0.00 cusr 0.00 csys = 0.12 CPU)
Time for create+drop (10000): 259 wallclock secs ( 0.24 usr 0.18 sys + 0.00 cusr 0.00 csys = 0.42 CPU)
Time for create_key+drop (10000): 255 wallclock secs ( 0.41 usr 0.11 sys + 0.00 cusr 0.00 csys = 0.52 CPU)
Total time: 769 wallclock secs ( 1.09 usr 0.45 sys + 0.00 cusr 0.00 csys = 1.54 CPU)

Drive write cache ENABLED

Testing of ALTER TABLE
Time for insert (1000) 0 wallclock secs ( 0.02 usr 0.00 sys + 0.00 cusr 0.00 csys = 0.02 CPU)
Time for alter_table_add (100): 3 wallclock secs ( 0.01 usr 0.01 sys + 0.00 cusr 0.00 csys = 0.02 CPU)
Time for create_index (8): 1 wallclock secs ( 0.00 usr 0.00 sys + 0.00 cusr 0.00 csys = 0.00 CPU)
Time for drop_index (8): 0 wallclock secs ( 0.00 usr 0.00 sys + 0.00 cusr 0.00 csys = 0.00 CPU)
Time for alter_table_drop (91): 4 wallclock secs ( 0.01 usr 0.00 sys + 0.00 cusr 0.00 csys = 0.01 CPU)
Total time: 8 wallclock secs ( 0.04 usr 0.01 sys + 0.00 cusr 0.00 csys = 0.05 CPU)

Testing the speed of creating and dropping tables
Time for create_MANY_tables (10000): 9 wallclock secs ( 0.34 usr 0.06 sys + 0.00 cusr 0.00 csys = 0.40 CPU)
Time to select_group_when_MANY_tables (10000): 1 wallclock secs ( 0.06 usr 0.03 sys + 0.00 cusr 0.00 csys = 0.09 CPU)
Time for drop_table_when_MANY_tables (10000): 1 wallclock secs ( 0.10 usr 0.04 sys + 0.00 cusr 0.00 csys = 0.14 CPU)
Time for create+drop (10000): 9 wallclock secs ( 0.44 usr 0.10 sys + 0.00 cusr 0.00 csys = 0.54 CPU)
Time for create_key+drop (10000): 10 wallclock secs ( 0.35 usr 0.10 sys + 0.00 cusr 0.00 csys = 0.45 CPU)
Total time: 30 wallclock secs ( 1.29 usr 0.33 sys + 0.00 cusr 0.00 csys = 1.62 CPU)

Drive write cache             DISABLED   ENABLED   Relative difference
sql-bench: test-alter-table   37 s       8 s       462%
sql-bench: test-create        769 s      30 s      2563%

Results on HP DL360 G4

Drive write cache DISABLED

Testing of ALTER TABLE
Time for insert (1000) 0 wallclock secs ( 0.02 usr 0.02 sys + 0.00 cusr 0.00 csys = 0.04 CPU)
Time for alter_table_add (100): 33 wallclock secs ( 0.01 usr 0.01 sys + 0.00 cusr 0.00 csys = 0.02 CPU)
Time for create_index (8): 4 wallclock secs ( 0.00 usr 0.00 sys + 0.00 cusr 0.00 csys = 0.00 CPU)
Time for drop_index (8): 3 wallclock secs ( 0.00 usr 0.00 sys + 0.00 cusr 0.00 csys = 0.00 CPU)
Time for alter_table_drop (91): 33 wallclock secs ( 0.01 usr 0.01 sys + 0.00 cusr 0.00 csys = 0.02 CPU)
Total time: 75 wallclock secs ( 0.04 usr 0.03 sys + 0.00 cusr 0.00 csys = 0.07 CPU)

Testing the speed of creating and dropping tables
Testing with 10000 tables and 10000 loop count
Time for create_MANY_tables (10000): 1035 wallclock secs ( 1.27 usr 0.25 sys + 0.00 cusr 0.00 csys = 1.52 CPU)
Time to select_group_when_MANY_tables (10000): 83 wallclock secs ( 0.63 usr 0.16 sys + 0.00 cusr 0.00 csys = 0.79 CPU)
Time for drop_table_when_MANY_tables (10000): 493 wallclock secs ( 0.50 usr 0.19 sys + 0.00 cusr 0.00 csys = 0.69 CPU)
Time for create+drop (10000): 958 wallclock secs ( 1.59 usr 0.38 sys + 0.00 cusr 0.00 csys = 1.97 CPU)
(NOTICE: I could not wait for this test to finish, because the machine had to go back into production,)
(so I assume the remaining part would take around 900 seconds, just to be on the safe side; it would probably take longer.)
Total time calculated: around 3400 seconds
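
The estimated total can be reproduced from the figures above. This is a small sketch of the arithmetic only. It assumes that the 900-second guess stands in for the one step missing from the output, create_key+drop, which is an inference from the other test-create runs in this article.

```python
# Wallclock seconds measured on the DL360 G4 without write cache.
measured = {
    "create_MANY_tables": 1035,
    "select_group_when_MANY_tables": 83,
    "drop_table_when_MANY_tables": 493,
    "create+drop": 958,
}

# Assumption: the interrupted step was create_key+drop, and 900 s is the
# author's conservative guess for it, not a measurement.
assumed_remaining_secs = 900

estimated_total = sum(measured.values()) + assumed_remaining_secs
print(estimated_total)  # 3469, which the article rounds to "around 3400"
```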

Drive write cache ENABLED

Testing of ALTER TABLE
Time for insert (1000) 0 wallclock secs ( 0.02 usr 0.01 sys + 0.00 cusr 0.00 csys = 0.02 CPU)
Time for alter_table_add (100): 8 wallclock secs ( 0.02 usr 0.00 sys + 0.00 cusr 0.00 csys = 0.02 CPU)
Time for create_index (8): 1 wallclock secs ( 0.00 usr 0.00 sys + 0.00 cusr 0.00 csys = 0.00 CPU)
Time for drop_index (8): 1 wallclock secs ( 0.00 usr 0.00 sys + 0.00 cusr 0.00 csys = 0.00 CPU)
Time for alter_table_drop (91): 8 wallclock secs ( 0.01 usr 0.01 sys + 0.00 cusr 0.00 csys = 0.02 CPU)
Total time: 18 wallclock secs ( 0.05 usr 0.02 sys + 0.00 cusr 0.00 csys = 0.06 CPU)

Testing the speed of creating and dropping tables
Time for create_MANY_tables (10000): 104 wallclock secs ( 1.07 usr 0.24 sys + 0.00 cusr 0.00 csys = 1.31 CPU)
Time to select_group_when_MANY_tables (10000): 27 wallclock secs ( 0.48 usr 0.18 sys + 0.00 cusr 0.00 csys = 0.66 CPU)
Time for drop_table_when_MANY_tables (10000): 53 wallclock secs ( 0.32 usr 0.09 sys + 0.00 cusr 0.00 csys = 0.41 CPU)
Time for create+drop (10000): 143 wallclock secs ( 1.31 usr 0.30 sys + 0.00 cusr 0.00 csys = 1.62 CPU)
Time for create_key+drop (10000): 164 wallclock secs ( 1.56 usr 0.36 sys + 0.00 cusr 0.00 csys = 1.92 CPU)
Total time: 491 wallclock secs ( 4.74 usr 1.18 sys + 0.00 cusr 0.00 csys = 5.92 CPU)

Drive write cache             ABSENT     ENABLED   Relative difference
sql-bench: test-alter-table   75 s       18 s      416%
sql-bench: test-create        3400 s     491 s     692%
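
The "Relative difference" column in both tables is the time without write cache expressed as a percentage of the time with it. The sketch below reproduces the published figures. The helper name `relative_difference` is hypothetical, and truncating to a whole percent (rather than rounding) is an assumption inferred from the numbers: 75 / 18 is about 4.167, and the table shows 416%.

```python
def relative_difference(slow_secs, fast_secs):
    """Slow time as a percentage of fast time, truncated to a whole percent."""
    return slow_secs * 100 // fast_secs


rows = [
    ("DL380 G5 test-alter-table", 37, 8),
    ("DL380 G5 test-create", 769, 30),
    ("DL360 G4 test-alter-table", 75, 18),
    ("DL360 G4 test-create", 3400, 491),
]

for label, slow, fast in rows:
    print(f"{label}: {relative_difference(slow, fast)}%")
```

Read this way, 462% means the test ran roughly 4.6 times faster with the write cache enabled.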

Analysis

The test-alter-table results seem fine: slightly over 400%, meaning the test runs about four times faster on both machines. What bothers me is the test-create difference. I expected the HP DL360 G4 to improve more and to finish this test in under 100 seconds; heck, I actually expected under 50 seconds. It is true that this machine uses a different operating system and filesystem, but 500 seconds still seems too much to me, especially when the HP DL380 G5 excels at 30 seconds. If someone knows the answer, please drop it in the comments.

Conclusion

I believe this article has clearly shown why one must run even such synthetic tests before deploying systems to a production environment, and indeed before any “real-world benchmarks” are conducted. By “real-world benchmark” I mean a comparative benchmark of a certain application on the existing production systems and on the ones still in the testing phase. Hardware often goes without an upgrade for quite some time, which means that the new hardware is a few generations younger than the existing one. The new hardware is far more powerful, so one easily misses a not-so-innocent bottleneck if the “real-world benchmark” shows some improvement. Thus, I believe, newer systems MUST perform better than older ones in every synthetic benchmark (if the systems are comparable, of course), and only then can we start conducting “real-world benchmarks”.

How did I discover this “issue”?

Back in 2004 I deployed such an HP server to a colocation facility and only later discovered that it was performing worse than some old test machine lying under my desk. After a couple of hours of googling I assumed that the lack of BBWC was our problem. I had to order one and then pull the server from colocation, because I also wanted to upgrade all the firmware, just in case. On top of that, I still had to figure out how to install ‘hpacucli’ on a non-Red Hat Linux. After a long weekend the machine was back in production and never caused a single problem again.

MySQL sql-bench results

Tuesday, June 16th, 2009

UPDATE: There was a newer test performed, with MySQL 5.6.25 on Linux 4.2.3 64bit, on almost the same hardware. You can see it here.

This is a follow-up article to the MySQL Super Smack benchmark results. Results from sql-bench benchmark suite can easily pinpoint some of the potential system bottlenecks. I find it especially useful for discovering filesystem performance or – better – slowness.

Results
Total execution time is: 562 seconds

# run-all-tests
alter-table: Total time: 8 wallclock secs ( 0.02 usr 0.01 sys + 0.00 cusr 0.00 csys = 0.03 CPU)
ATIS: Total time: 2 wallclock secs ( 1.20 usr 0.09 sys + 0.00 cusr 0.00 csys = 1.29 CPU)
big-tables: Total time: 5 wallclock secs ( 2.45 usr 0.08 sys + 0.00 cusr 0.00 csys = 2.53 CPU)
connect: Total time: 50 wallclock secs (12.74 usr 4.50 sys + 0.00 cusr 0.00 csys = 17.24 CPU)
create: Total time: 31 wallclock secs ( 1.20 usr 0.44 sys + 0.00 cusr 0.00 csys = 1.64 CPU)
insert: Total time: 397 wallclock secs (97.95 usr 13.61 sys + 0.00 cusr 0.00 csys = 111.56 CPU)
select: Total time: 44 wallclock secs ( 8.71 usr 0.88 sys + 0.00 cusr 0.00 csys = 9.59 CPU)
transactions: Test skipped because the database doesn’t support transactions
wisconsin: Total time: 3 wallclock secs ( 0.91 usr 0.23 sys + 0.00 cusr 0.00 csys = 1.14 CPU)
TOTALS 562.00 123.77 19.82 143.59 3425950

System specification can be found here.

ReiserFS vs others
In the age of Linux kernel 2.4.x we used ReiserFS v3 as the filesystem of choice. Given the available options of ReiserFS (journaling, performance), ext2 (stable but slow) and ext3 (probably stable, but not as speedy as ReiserFS), the choice was obvious. I then skipped a few years, and this year I tried ReiserFS again with Linux 2.6.29.1, but it turned out to be even slower than ext2 was in the old days. Googling around for an answer gave some hints that ReiserFS has an issue with something called BIG_KERNEL_LOCK on 2.6 kernels. I didn’t really investigate further, but went down the ext3 way.

Comments on test-create
If test-create takes much more than, say, 30-60 seconds, then you definitely have a problem with filesystem write performance. On the HP DL360 and DL380 classes of servers this correlates with the presence and activation of BBWC (Battery-Backed Write Cache enabler kit). Without BBWC, and hence without the write cache enabled, this test took more than 10 minutes to complete. Thus, if you are purchasing new HP servers, be sure to order BBWCs as well.
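
The rule of thumb above can be written down as a quick check. This is only a sketch: the function name `assess_test_create` and the choice of 60 seconds as the cut-off (the upper end of the 30-60 second range) are assumptions for illustration, and the sample timings are the DL380 G5 test-create totals from the write cache article.

```python
def assess_test_create(wallclock_secs, healthy_limit_secs=60):
    """Classify a test-create total against the 30-60 s rule of thumb."""
    if wallclock_secs <= healthy_limit_secs:
        return "ok"
    return "suspect filesystem write performance"


# DL380 G5 test-create totals with and without write cache.
for label, secs in [("write cache enabled", 30), ("write cache disabled", 769)]:
    print(f"{label}: {secs} s -> {assess_test_create(secs)}")
```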

Question about test-insert
Looking at the test times, the test-insert result really stands out. Again, I do not have any other data to compare it to, but somewhere deep down in my memory I seem to remember that the total time for all the tests used to be around 300 seconds. If that is right, the test-insert result is the bad guy here. Can someone comment on this result, or paste their own in the comments? Thanks.
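
To put a number on how much test-insert dominates, the per-test wallclock totals from the run-all-tests output can be compared with the reported total. This is a sketch using only the figures listed in this article; the variable names are arbitrary.

```python
# Wallclock seconds per test, copied from the run-all-tests output.
wallclock = {
    "alter-table": 8,
    "ATIS": 2,
    "big-tables": 5,
    "connect": 50,
    "create": 31,
    "insert": 397,
    "select": 44,
    "wisconsin": 3,
}
reported_total = 562  # from the TOTALS line

slowest = max(wallclock, key=wallclock.get)
slowest_share = round(100 * wallclock[slowest] / reported_total, 1)
listed_sum = sum(wallclock.values())

print(f"slowest test: {slowest} ({slowest_share}% of the reported total)")
print(f"sum of listed tests: {listed_sum} s of {reported_total} s")
```

With these figures test-insert accounts for roughly 70% of the reported total. The listed tests add up to 540 seconds, so about 22 seconds of the reported 562 do not appear in the per-test lines above.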

Feedback
If you have any questions, recommendations or benchmark results to compare, do not hesitate to leave a comment.

UPDATE1: 2014-09-09
I forgot to mention explicitly that this system is running a 32-bit version of Slackware.

UPDATE2: 2014-09-09
Fortunately this system is still up and running. During these five years only the storage has been expanded, with 300GB 10K SAS drives in a RAID 1 configuration. Software has been upgraded regularly; the system is currently on MySQL 5.5.39, with an upgrade to 5.6.20 pending. I retested test-create today and the result was 85 wall-clock seconds, almost 3x worse than the initial 31 seconds. The server is currently lightly loaded.