HP DL380 G5 drive write cache (BBWC)

June 19th, 2009 by Boštjan Škufca

In one of my previous articles I wrote about the tools that I use for benchmarking database performance and especially for discovering system bottlenecks. In this article I will show how large an impact a drive write cache may have on filesystem performance.

HP DL360 G4 server

HP's line of DL servers is well respected among IT engineers. It is (at least in my experience) a reliable class of servers: well built, easy to maintain, and shipped with excellent server management software called Integrated Lights-Out (iLO). The SCSI and SAS storage subsystem in DL servers is usually driven by controllers called Smart Array, which may or may not be integrated onto the motherboard. A new HP customer gets all the nifty RAID levels to play with and is usually satisfied. But what a new customer usually DOES NOT KNOW is that Smart Array write performance really sucks if the write cache is not enabled. And to enable it, the user needs to install a special module with an attached battery, called BBWC (battery-backed write cache). So, HP, if you happen to be reading this, please do notify your customers about such things.

HP DL380 G5 server

This is the other server on which I ran the benchmark; unlike the DL360 G4, it came with BBWC already installed. To simulate the absence of write cache I disabled it with the command-line tool hpacucli (HP Array Configuration Utility Command Line Interface).
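As a rough illustration of what toggling the cache from the command line can look like, here is a small Python sketch that only builds hpacucli command lines and never runs them. The slot number and logical drive number are hypothetical, and so is the choice of settings: the text above does not say which exact hpacucli command was used. The `arrayaccelerator` and `dwc` keywords are assumed from HP's ACU CLI syntax and should be verified against `hpacucli help` on your own system before use.

```python
# Sketch only: builds hpacucli command lines, never executes them.
# SLOT and LOGICAL_DRIVE are hypothetical values; look up the real ones
# with "hpacucli ctrl all show config" on the machine in question.

SLOT = 0
LOGICAL_DRIVE = 1


def array_accelerator_cmd(enable, slot=SLOT, ld=LOGICAL_DRIVE):
    """Command line that switches the controller cache for one logical drive."""
    state = "enable" if enable else "disable"
    return ["hpacucli", "ctrl", f"slot={slot}", "ld", str(ld),
            "modify", f"arrayaccelerator={state}"]


def drive_write_cache_cmd(enable, slot=SLOT):
    """Command line that switches the write cache of the physical drives."""
    state = "enable" if enable else "disable"
    return ["hpacucli", "ctrl", f"slot={slot}", "modify", f"dwc={state}"]


if __name__ == "__main__":
    # Print the commands so they can be reviewed before being run by hand.
    print(" ".join(array_accelerator_cmd(False)))
    print(" ".join(drive_write_cache_cmd(False)))
```

Printing instead of executing is deliberate: changing cache settings on a production controller is something to review first.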

Benchmark methodology

For this benchmark I used two tools from the sql-bench suite which heavily stress the filesystem with lots of file creations and deletions: test-alter-table and test-create. The former is faster and gives only a rough figure. The latter creates and deletes around 50,000 MyISAM tables, which results in 150,000 files created and deleted. I executed both benchmarks first without and then with write cache enabled. One of the machines (the DL380) was already in production, but it was benchmarked during the night, when usage is negligible.
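To get a feel for this kind of load without installing MySQL, a minimal Python sketch can create and delete many small files and time the whole thing. This is a stand-in and not sql-bench itself: the function name `churn`, the file names, the 1 KB payload and the per-file fsync are all assumptions made for illustration, and no claim is made that MySQL syncs files in this way.

```python
import os
import tempfile
import time


def churn(directory, count, payload=b"x" * 1024):
    """Create `count` small files (each fsynced), then delete them all.

    Returns the elapsed wall-clock time in seconds.
    """
    start = time.perf_counter()
    paths = []
    for i in range(count):
        path = os.path.join(directory, f"t{i}.dat")
        with open(path, "wb") as fh:
            fh.write(payload)
            fh.flush()
            os.fsync(fh.fileno())  # ask the OS to push the data to storage
        paths.append(path)
    for path in paths:
        os.remove(path)
    return time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(f"{churn(tmp, 1000):.2f} s for 1000 files")
```

Run it against a directory on the array under test, once with and once without write cache, and compare the two timings.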

Test systems

HP DL360 G4

  • Controller: Smart Array 6i
  • Filesystem: ufs
  • OS: FreeBSD 6.0
  • MySQL: mysql-5.0.41-freebsd6.0-i386

HP DL380 G5

  • Controller: Smart Array P400i
  • Filesystem: ext3
  • OS: Slackware 12.2
  • MySQL: mysql-5.0.77 compiled from source

Results on HP DL380 G5

Drive write cache DISABLED

    Testing of ALTER TABLE
    Time for insert (1000) 0 wallclock secs ( 0.02 usr 0.00 sys + 0.00 cusr 0.00 csys = 0.02 CPU)
    Time for alter_table_add (100): 17 wallclock secs ( 0.00 usr 0.00 sys + 0.00 cusr 0.00 csys = 0.00 CPU)
    Time for create_index (8): 2 wallclock secs ( 0.00 usr 0.00 sys + 0.00 cusr 0.00 csys = 0.00 CPU)
    Time for drop_index (8): 2 wallclock secs ( 0.00 usr 0.00 sys + 0.00 cusr 0.00 csys = 0.00 CPU)
    Time for alter_table_drop (91): 16 wallclock secs ( 0.01 usr 0.00 sys + 0.00 cusr 0.00 csys = 0.01 CPU)
    Total time: 37 wallclock secs ( 0.03 usr 0.00 sys + 0.00 cusr 0.00 csys = 0.03 CPU)

    Testing the speed of creating and dropping tables
    Time for create_MANY_tables (10000): 253 wallclock secs ( 0.26 usr 0.06 sys + 0.00 cusr 0.00 csys = 0.32 CPU)
    Time to select_group_when_MANY_tables (10000): 1 wallclock secs ( 0.09 usr 0.07 sys + 0.00 cusr 0.00 csys = 0.16 CPU)
    Time for drop_table_when_MANY_tables (10000): 1 wallclock secs ( 0.09 usr 0.03 sys + 0.00 cusr 0.00 csys = 0.12 CPU)
    Time for create+drop (10000): 259 wallclock secs ( 0.24 usr 0.18 sys + 0.00 cusr 0.00 csys = 0.42 CPU)
    Time for create_key+drop (10000): 255 wallclock secs ( 0.41 usr 0.11 sys + 0.00 cusr 0.00 csys = 0.52 CPU)
    Total time: 769 wallclock secs ( 1.09 usr 0.45 sys + 0.00 cusr 0.00 csys = 1.54 CPU)

Drive write cache ENABLED

    Testing of ALTER TABLE
    Time for insert (1000) 0 wallclock secs ( 0.02 usr 0.00 sys + 0.00 cusr 0.00 csys = 0.02 CPU)
    Time for alter_table_add (100): 3 wallclock secs ( 0.01 usr 0.01 sys + 0.00 cusr 0.00 csys = 0.02 CPU)
    Time for create_index (8): 1 wallclock secs ( 0.00 usr 0.00 sys + 0.00 cusr 0.00 csys = 0.00 CPU)
    Time for drop_index (8): 0 wallclock secs ( 0.00 usr 0.00 sys + 0.00 cusr 0.00 csys = 0.00 CPU)
    Time for alter_table_drop (91): 4 wallclock secs ( 0.01 usr 0.00 sys + 0.00 cusr 0.00 csys = 0.01 CPU)
    Total time: 8 wallclock secs ( 0.04 usr 0.01 sys + 0.00 cusr 0.00 csys = 0.05 CPU)

    Testing the speed of creating and dropping tables
    Time for create_MANY_tables (10000): 9 wallclock secs ( 0.34 usr 0.06 sys + 0.00 cusr 0.00 csys = 0.40 CPU)
    Time to select_group_when_MANY_tables (10000): 1 wallclock secs ( 0.06 usr 0.03 sys + 0.00 cusr 0.00 csys = 0.09 CPU)
    Time for drop_table_when_MANY_tables (10000): 1 wallclock secs ( 0.10 usr 0.04 sys + 0.00 cusr 0.00 csys = 0.14 CPU)
    Time for create+drop (10000): 9 wallclock secs ( 0.44 usr 0.10 sys + 0.00 cusr 0.00 csys = 0.54 CPU)
    Time for create_key+drop (10000): 10 wallclock secs ( 0.35 usr 0.10 sys + 0.00 cusr 0.00 csys = 0.45 CPU)
    Total time: 30 wallclock secs ( 1.29 usr 0.33 sys + 0.00 cusr 0.00 csys = 1.62 CPU)
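Output like the above is easy to pull apart with a small script when comparing runs. The following Python sketch extracts the wall-clock seconds per step; the regular expression and the function name `wallclock_by_step` are my own assumptions, fitted only to the lines shown in this article and not to every sql-bench test. The sample is the test-create part of the cache-enabled run on the DL380 G5.

```python
import re

# Matches e.g. "Time for create+drop (10000): 9 wallclock secs ( ... )".
# The colon is optional because the "insert" line in this article has none.
LINE_RE = re.compile(r"^Time (?:for|to) (\S+) \((\d+)\):?\s+(\d+) wallclock secs")


def wallclock_by_step(output):
    """Map each step name to its wall-clock seconds."""
    result = {}
    for line in output.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            result[match.group(1)] = int(match.group(3))
    return result


SAMPLE = """\
Time for create_MANY_tables (10000): 9 wallclock secs ( 0.34 usr 0.06 sys + 0.00 cusr 0.00 csys = 0.40 CPU)
Time to select_group_when_MANY_tables (10000): 1 wallclock secs ( 0.06 usr 0.03 sys + 0.00 cusr 0.00 csys = 0.09 CPU)
Time for drop_table_when_MANY_tables (10000): 1 wallclock secs ( 0.10 usr 0.04 sys + 0.00 cusr 0.00 csys = 0.14 CPU)
Time for create+drop (10000): 9 wallclock secs ( 0.44 usr 0.10 sys + 0.00 cusr 0.00 csys = 0.54 CPU)
Time for create_key+drop (10000): 10 wallclock secs ( 0.35 usr 0.10 sys + 0.00 cusr 0.00 csys = 0.45 CPU)
"""

if __name__ == "__main__":
    steps = wallclock_by_step(SAMPLE)
    for name, seconds in steps.items():
        print(f"{name}: {seconds} s")
    print("sum:", sum(steps.values()), "s")
```

The sum of the five steps matches the reported total of 30 seconds for this run.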

Drive write cache             DISABLED   ENABLED   Relative difference
sql-bench: test-alter-table   37 s       8 s       462%
sql-bench: test-create        769 s      30 s      2563%

Results on HP DL360 G4

Drive write cache DISABLED

    Testing of ALTER TABLE
    Time for insert (1000) 0 wallclock secs ( 0.02 usr 0.02 sys + 0.00 cusr 0.00 csys = 0.04 CPU)
    Time for alter_table_add (100): 33 wallclock secs ( 0.01 usr 0.01 sys + 0.00 cusr 0.00 csys = 0.02 CPU)
    Time for create_index (8): 4 wallclock secs ( 0.00 usr 0.00 sys + 0.00 cusr 0.00 csys = 0.00 CPU)
    Time for drop_index (8): 3 wallclock secs ( 0.00 usr 0.00 sys + 0.00 cusr 0.00 csys = 0.00 CPU)
    Time for alter_table_drop (91): 33 wallclock secs ( 0.01 usr 0.01 sys + 0.00 cusr 0.00 csys = 0.02 CPU)
    Total time: 75 wallclock secs ( 0.04 usr 0.03 sys + 0.00 cusr 0.00 csys = 0.07 CPU)

    Testing the speed of creating and dropping tables
    Testing with 10000 tables and 10000 loop count
    Time for create_MANY_tables (10000): 1035 wallclock secs ( 1.27 usr 0.25 sys + 0.00 cusr 0.00 csys = 1.52 CPU)
    Time to select_group_when_MANY_tables (10000): 83 wallclock secs ( 0.63 usr 0.16 sys + 0.00 cusr 0.00 csys = 0.79 CPU)
    Time for drop_table_when_MANY_tables (10000): 493 wallclock secs ( 0.50 usr 0.19 sys + 0.00 cusr 0.00 csys = 0.69 CPU)
    Time for create+drop (10000): 958 wallclock secs ( 1.59 usr 0.38 sys + 0.00 cusr 0.00 csys = 1.97 CPU)
    (NOTICE: I could not wait for the remaining create_key+drop test to finish, because the machine had to go back into production. I therefore assume around 900 seconds for it, just to be on the safe side; it would probably have taken more.)
    Total time calculated: around 3400 seconds
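The estimated total can be reproduced with a few lines of Python. The four step times are the measured ones from the output above; the 900 seconds is the assumed figure from the notice, and attributing it to the create_key+drop step is an inference from the other runs, where that step follows create+drop.

```python
# Measured wall-clock seconds from the DL360 G4 run without write cache.
measured = {
    "create_MANY_tables": 1035,
    "select_group_when_MANY_tables": 83,
    "drop_table_when_MANY_tables": 493,
    "create+drop": 958,
}

# Not measured: the run was interrupted, so this is the assumed value.
ASSUMED_CREATE_KEY_DROP = 900


def estimated_total(measured_steps, assumed_seconds):
    """Sum of the measured steps plus the assumed time for the missing one."""
    return sum(measured_steps.values()) + assumed_seconds


if __name__ == "__main__":
    print(estimated_total(measured, ASSUMED_CREATE_KEY_DROP), "s")
```

The sum comes to 3469 seconds, which the article rounds to "around 3400".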

Drive write cache ENABLED

    Testing of ALTER TABLE
    Time for insert (1000) 0 wallclock secs ( 0.02 usr 0.01 sys + 0.00 cusr 0.00 csys = 0.02 CPU)
    Time for alter_table_add (100): 8 wallclock secs ( 0.02 usr 0.00 sys + 0.00 cusr 0.00 csys = 0.02 CPU)
    Time for create_index (8): 1 wallclock secs ( 0.00 usr 0.00 sys + 0.00 cusr 0.00 csys = 0.00 CPU)
    Time for drop_index (8): 1 wallclock secs ( 0.00 usr 0.00 sys + 0.00 cusr 0.00 csys = 0.00 CPU)
    Time for alter_table_drop (91): 8 wallclock secs ( 0.01 usr 0.01 sys + 0.00 cusr 0.00 csys = 0.02 CPU)
    Total time: 18 wallclock secs ( 0.05 usr 0.02 sys + 0.00 cusr 0.00 csys = 0.06 CPU)

    Testing the speed of creating and dropping tables
    Time for create_MANY_tables (10000): 104 wallclock secs ( 1.07 usr 0.24 sys + 0.00 cusr 0.00 csys = 1.31 CPU)
    Time to select_group_when_MANY_tables (10000): 27 wallclock secs ( 0.48 usr 0.18 sys + 0.00 cusr 0.00 csys = 0.66 CPU)
    Time for drop_table_when_MANY_tables (10000): 53 wallclock secs ( 0.32 usr 0.09 sys + 0.00 cusr 0.00 csys = 0.41 CPU)
    Time for create+drop (10000): 143 wallclock secs ( 1.31 usr 0.30 sys + 0.00 cusr 0.00 csys = 1.62 CPU)
    Time for create_key+drop (10000): 164 wallclock secs ( 1.56 usr 0.36 sys + 0.00 cusr 0.00 csys = 1.92 CPU)
    Total time: 491 wallclock secs ( 4.74 usr 1.18 sys + 0.00 cusr 0.00 csys = 5.92 CPU)

Drive write cache             ABSENT     ENABLED   Relative difference
sql-bench: test-alter-table   75 s       18 s      416%
sql-bench: test-create        3400 s     491 s     692%
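The "Relative difference" column in both tables is the slower time expressed as a percentage of the faster one. A short Python sketch of that calculation follows; truncating to whole percent is an assumption on my part, chosen because it reproduces all four published figures.

```python
def relative_difference(slow_seconds, fast_seconds):
    """Slow time as a whole-number percentage of the fast time (truncated)."""
    return slow_seconds * 100 // fast_seconds


if __name__ == "__main__":
    runs = [
        ("DL380 G5 test-alter-table", 37, 8),
        ("DL380 G5 test-create", 769, 30),
        ("DL360 G4 test-alter-table", 75, 18),
        ("DL360 G4 test-create", 3400, 491),
    ]
    for name, slow, fast in runs:
        print(f"{name}: {relative_difference(slow, fast)}%")
```

Read this way, 462% means the run without cache took about 4.6 times as long as the run with cache.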

Analysis

The test-alter-table results seem fine: both machines ran slightly more than four times faster with the write cache. But what bothers me is the test-create difference. I expected the HP DL360 G4 to improve more and finish this test below the 100-second barrier; heck, I actually expected it below 50 seconds. It is true that this machine uses a different operating system and filesystem. But 500 seconds still seems too much to me, especially when the HP DL380 G5 excels at 30 seconds. If someone knows the answer, please drop it in the comments.

Conclusion

I believe this article has clearly shown why one must conduct even such synthetic tests before deploying systems to a production environment, and indeed before any “real-world benchmarks” are conducted. The phrase “real-world benchmark” signifies a comparative benchmark of a certain application on existing production systems and on those still in the testing phase. It often happens that hardware is not upgraded for quite some time, which means that the new hardware is a few generations younger than the existing one. The new hardware is far more powerful, so one easily misses some not-so-innocent bottleneck if the “real-world benchmark” shows some improvement. Thus, I believe, newer systems MUST perform better than older ones in every synthetic benchmark (if the systems are comparable, of course), and only then can we start conducting “real-world benchmarks”.

How did I discover this “issue”?

It happened to me back in 2004: I deployed such an HP server to a colocation facility and only later discovered that it was performing worse than some old test machine lying under my desk. After a couple of hours of googling I assumed that the lack of BBWC was our problem. I had to order it and then remove the server from colocation, because I also wanted to upgrade all the firmware, just in case. On top of that, I still had to figure out how to install ‘hpacucli’ on non-Red Hat Linux. After a long weekend the machine was back in production and never caused a single problem again.


4 Responses to “HP DL380 G5 drive write cache (BBWC)”

  1. I have been looking around for this kind of information. Will you post some more in future? I’ll be grateful if you will.

  2. LnddMiles says:

    The best information i have found exactly here. Keep going Thank you

  3. […] BBWC module contains 512MB of cache, all of which I normally allocate to writes, as disk contents are already cached in main memory, which is far larger and cheaper. I used to utilise 25%/75% cache divide for reads and writes, but not anymore. […]
