MySQL foreign key data types

I’ve just made some changes that have massively improved the database performance of Carsurvey.org.

Back in 2000, when I added comments support to Carsurvey.org, I made the very silly decision to set the foreign key in the comments table to be a varchar (I thought I might want some flexibility above and beyond integer ids), and this varchar foreign key referenced an existing integer id field in the reviews table.

All seemed well, but for the last week I’ve been puzzling over why some of my join queries between comments and reviews were quite slow, and weren’t using the indices that I thought they should. After several days, I decided to change the comments foreign key datatype to integer, so it would match the reviews id datatype. Suddenly indices started being used in the way I was expecting, and I got a big performance increase.

So there are two lessons to be learnt here:

1. Unless you have a very very good reason, make your foreign key datatype match the datatype of the unique key in the reference table. It’s common sense for most people, but apparently not me circa 2000.

2. MySQL (version 4.1.20 anyway) doesn’t appear to optimise joins well where it has to match columns with different datatypes. This is presumably because of how it does type conversions.

Hopefully my confession will help anyone else who’s made a similar mistake, and is trying to work out why their query isn’t using the index they expect.

3 thoughts on “MySQL foreign key data types

  1. Ian Chilton

    Hi,

    lol – how could you have flexibility beyond integer id’s if the primary key is an integer 🙂

    Oh well… – most things you wrote in the past can be improved if you look back over them!!

    Ian

  2. Steven Post author

    Hi Ian,

    I thought that I might use the foreign key to refer to several different tables (reviews, but maybe also articles and images tables that I hadn’t created yet), and then differentiate between them using a type field. Perhaps the other tables (articles, images etc) would not have numeric primary keys, so I hedged my bets and went with a varchar.

    Not a great idea in hindsight, but I didn’t realise it would have such a terrible impact on performance. Basically direct lookups were fine, but large joins weren’t using indices that were available. So nasty full or partial table scans were happening, rather than using nice efficient index lookups. Once I switched the foreign key to an integer, the indices came into play, and performance was transformed.

    None of this was obvious initially, as a few table scans don’t matter too much with a few thousand rows, but once you’re approaching 200,000 rows, it becomes more of an issue.

Leave a Reply