Making MySQL Use More Memory

Unlike a lot of database servers, MySQL is strangely conservative (by default) about how much memory it will allocate. If you’re not careful, you can have 16GB of RAM on your machine with MySQL using only 50MB, leading to extremely poor performance under heavy load. I know firsthand that navigating MySQL configuration guides can be a daunting task, so I’ve prepared this post for those looking for a ‘quick fix’ to encourage MySQL to use a healthier amount of memory.

Which database storage engine do you use primarily?

Many beginners may not realize this, but with MySQL you have a choice of which storage engine implementation your database runs on. This is where performance tuning begins to get complicated, as you have to set the configuration variables that correspond to the storage engine you are using! You can see which engine you are relying on by opening MySQL Administrator and viewing your schema under the Catalogs tab. Each table should have an engine associated with it, which will likely say either MyISAM or InnoDB.

In this post, I’ll cover how to increase general memory usage for InnoDB:

Set “innodb_buffer_pool_size” to up to 80% of RAM, or at least a few hundred megabytes. The default is 8MB, and I imagine anyone running a MySQL server these days can spare at least 200 megabytes of memory, so at a minimum it should look like this:

innodb_buffer_pool_size=200M

Again, the default is 8M! So, if you’re not setting this variable, you’re choking your database. Feel free to give it 1-2GB if you have the available memory, but the most gain will be made by just going above the default.

There are more InnoDB settings you could set, but their benefits pale in comparison with the value you’ll gain by increasing this from the default of 8M.
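To make this concrete, here is a minimal sketch of where the setting lives. The file location and the 200M value are just the common defaults from above; verify the path for your platform:

```
# my.cnf (often /etc/my.cnf or /etc/mysql/my.cnf -- location varies by platform)
[mysqld]
innodb_buffer_pool_size=200M
```

Restart MySQL after editing, then confirm the new value took effect with SHOW VARIABLES LIKE 'innodb_buffer_pool_size'; (the result is reported in bytes, so expect 209715200 for 200M).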

javaranch/jforum – nailing down a production problem

JavaRanch successfully cut over to our new forum software yesterday.  Unsurprisingly, a variety of small problems cropped up.  One of them was that anonymous users were able to post.

Below follows the thought process and steps involved in fixing it.  And then in fixing the problems caused by the fix!

  • A moderator noticed anonymous users can post – that’s not good
  • I looked at the “general” group and saw “deny anonymous posts” was selected for all forums.  This looks right.  Hmm.  What’s the fastest thing I can change to prevent anonymous users from posting?
  • I know!  I’ll create a new group, put anonymous in it and give that group read only access.
  • Seems to work.
  • I logged a bug that when not logged in we see “new topics” in one stray place.  (At the time this doesn’t seem related.)
  • Things seem stable.  Time to get off the computer and enjoy Saturday night.
  • Another moderator asked why that was a bug since he gets redirected to login at that point. (This moderator is in Europe and had a good chunk of Sunday before I woke up.  It’s nice how things get done around the world.)
  • Chat session with this moderator – we realized he saw one behavior on our sandbox and I saw the error in production.  We discussed the behavior on test is correct/desired.  I need to figure out why it doesn’t work that way in production.  End chat session.
  • I wonder if the database is out of synch due to our data migration yesterday.
  • I edited the new group to force a database update.  If the database wasn’t out of synch, this doesn’t hurt anything.
  • I opened a new browser and tried accessing when logged out.  I get the expected behavior now.  I see the button and am prompted to login when I click it.
  • Now to tie up loose ends – I made the same “change” (edit without changing anything) to the “general” group, moved anonymous back there and got rid of the group created yesterday.
  • Marking it off on the bug list and all done.

My query is too slow – what do I do?

After posting Tuning JForum for JavaRanch, a couple people asked me offline about what I look for when tuning queries.

Before
The very first thing I do is identify the slow query and benchmark a “before” number.  This proves that there is something to optimize and lets me see if my tuning helped matters.  If I’m tuning a web app, this may start out as a slow page.  If I have a slow web page that uses multiple queries, I’ll add debug statements or use a profiler to see which one(s) are the slowest.  Ultimately by the time I am done with this step, I have a specific query that I want to tune.    I find this step important because it focuses the problem on something attainable.  “Tune one query” is a lot easier than “the website is slow.”  Granted once I’ve tuned the query I am likely to need to identify the next slow query and repeat the process.  But it’s steady progress.
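If you’re on MySQL and don’t yet know which queries are slow, the slow query log is a low-effort way to get that “before” picture. A minimal sketch follows; the log file path is a made-up example, and the option names can differ on older MySQL versions:

```
# my.cnf -- log any query taking longer than 1 second
[mysqld]
slow_query_log = 1
slow_query_log_file = /var/log/mysql/slow.log   # hypothetical path; pick your own
long_query_time = 1
```

The bundled mysqldumpslow tool can then summarize the log, grouping similar queries so the worst offenders stand out.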

During
Some “tuning opportunities” come up repeatedly during query optimization.  Here are some common ones I’ve encountered (not necessarily in the order I would look for them):

  • Missing index – The first thing I do with a slow query is run a database explain and look at the output.  My previous blog entry has an example.  Table/heap scans are a big red flag for a tuning opportunity.  I look at what fields are in the where clause and make sure there is a matching index.  If not, I’ll add one or adjust the query to take advantage of existing indexes.
  • Inefficient query – Sometimes the explain plan shows nested queries where they aren’t necessary or expected.  If that’s the case, I’ll rewrite the query to avoid the repetitive work.
  • Returning too many fields – If a query uses “select *” or returns a lot of fields, I’ll examine the calling code to make sure they are all used.  If not, I’ll adjust the select clause to return only the fields that are actually needed.  If the database is on a different machine than the application code, network traffic is being wasted transferring unneeded data; even on a local machine, it’s still unnecessary work to read data the caller does not need.  The network effect adds up quickly, though.  Suppose you return just one unneeded column: a query returning 1000 records has now wasted a kilobyte or more of network traffic that could have been spent on data you really did want.
  • Too many roundtrips – A lot of small queries isn’t necessarily better than one large one.  There are two reasons for this.  One is the time for network roundtrips: your work isn’t getting done while the application and database are waiting for communication to go back and forth.  The other is that the database does redundant work parsing and planning each query.  If the queries can be merged or turned into a stored procedure, the application is likely to perform faster.  For example, one query that fetches 1000 rows by primary key is going to be faster than 1000 separate single-row requests.
  • Check the algorithm – Sometimes the algorithm in the application is just plain inefficient.  There’s not much to say here because you need knowledge of an application to identify cases like this.
  • Can it be cached – Some queries just take time.  It may be a 100 millisecond query run many times a minute.  Or it may be a 1 second query run once every 5 minutes.  The easiest way to make such a query take less time is to cache the results; after all, if the database doesn’t need to run the query at all, it will be faster.  The trade-off is that the caller must be able to tolerate slightly stale results.

Note: all of this advice assumes we are dealing with a large table.  Of course, if the table were small, the query would likely not be slow in the first place.
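As a concrete illustration of the missing-index and roundtrip cases, here is a sketch against a hypothetical posts table.  The table and column names are invented for the example, not taken from JForum:

```sql
-- The where clause has no matching index, so the explain shows a full scan
EXPLAIN SELECT post_id, post_title
FROM posts
WHERE topic_id = 42;

-- Add an index matching the where clause, then re-run the explain;
-- the plan should now show an index lookup instead of a table scan
CREATE INDEX idx_posts_topic ON posts (topic_id);

-- Fewer roundtrips: one IN query instead of many single-row lookups
SELECT post_id, post_title
FROM posts
WHERE post_id IN (1, 2, 3);
```

The before/after explain output is also a handy artifact to keep alongside the before/after timing numbers.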

After
How much did it help?  If my change made a significant improvement, I’ll look to see whether any other queries have the same problem.  I still gather before/after numbers for those, but it goes faster since I already know a likely fix before I start.