MySQL Table Order
Here's the situation. You have User_Table — which is a table of users, and a Content_Table — which is a table of content. They have a column in common called u_id which is the user's unique id whom submitted the content.
What we want to do is get a list of users where for each user we show their latest piece of content. We want to show every user once, even if they haven't submitted anything, and we want the user list organized by date of last submitted content. You could do this with two queries, the challenge is to do it with one.
So here's a SQL query:
"SELECT User_Table.u_id, User_Table.username, Content_Table.c_id FROM User_Table LEFT JOIN Content_Table ON User_Table.u_id = Content_Table.u_id GROUP BY User_Table.u_id HAVING COUNT(User_Table.u_id) >= 2 ORDER BY Content_Table.date_added DESC"
Let's just run through the query. We're doing a SELECT on the user's id and username, and the content's id. We want a list of users so we get it FROM the User_Table, and do a LEFT JOIN with the Content_Table based on the u_id column. This multiplies the two tables together and you end up with this big imaginary table so we GROUP BY u_id. We don't want to return the entire big multiplied table, we only want 1 entry for each user and only HAVING more than two entries in the big multiplied table. Because the big multiplied table includes the User_Table there will be at least one row for each user. If there's 2 rows or more it means there's at least 1 row in the Content_Table by that user. We then run an ORDER BY on the resulting smaller imaginary table which will organize the list of users based on the date_added column on the Content_Table.
If you run this query you likely won't get the results you expected. What you get is a list of users who have submitted at least one content item. The list of users is sorted correctly - by the last content item submitted. So if UserG submitted content after UserD it'll show them in the appropriate order, but if a given user has submitted more than one content item, the c_id fetched will be for the first content item they submitted. This might seem weird because the list of users is correctly sorted based on the last content item they submitted.
What's happening is MySQL processes the query in the order it's written, so it's running GROUP BY before it runs ORDER BY. What we'd like to happen is for the big imaginary table to be sorted by Content_Table.date_uploaded and then grouped by u_id. There are a number of ways to broach this issue that involve intensely convoluted queries; things like nesting, aliasing, and referencing the same table several times in different ways. That shit is hard to read and not compatible with older MySQL servers or different server configurations.
Now you could run the following query just before you run the one above:
"ALTER TABLE Content_Table ORDER BY date_added DESC"
and it'll physically resort the table. The problem with doing this is that if you later INSERT, DELETE, or UPDATE(in some cases) the table, it'll mess with the ordering and reordering the table on the physical disk, and running the ALTER TABLE query can be resource intensive - especially in cases where you have a very large frequently used table.
When I drafted this post a couple weeks ago I was on a tight deadline working on the new DearDrum.org and decided to de-prioritize the functionality until I had a chance to figure out a better way to do this. Still dealing with deadlines I figured I'd just post it and see if anyone else has a solution, or at least open it up for discussion.
Leave a Comment | Oct 13, 2010