Greenguy's Board


Old 2008-12-22, 12:37 AM   #24
cd34
a.k.a. Sparky
 
 
Join Date: Sep 2004
Location: West Palm Beach, FL, USA
Posts: 2,396
Briefly, on each page that is composed, WordPress runs the following query:

Code:
SELECT SQL_CALC_FOUND_ROWS wp_posts.*
FROM wp_posts
INNER JOIN wp_term_relationships
    ON (wp_posts.ID = wp_term_relationships.object_id)
INNER JOIN wp_term_taxonomy
    ON (wp_term_relationships.term_taxonomy_id = wp_term_taxonomy.term_taxonomy_id)
WHERE 1=1
    AND wp_term_taxonomy.taxonomy = 'category'
    AND wp_term_taxonomy.term_id IN ('5')
    AND wp_posts.post_type = 'post'
    AND (wp_posts.post_status = 'publish')
GROUP BY wp_posts.ID
ORDER BY wp_posts.post_date DESC
LIMIT 0, 25;

The query is dynamically generated and is almost unindexable in its general form, but that's not where the issue comes into play. You can see that it picks the 25 most recent posts, but SQL_CALC_FOUND_ROWS forces MySQL to keep scanning past those 25 rows and count every matching row so that WordPress can work out pagination.
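
To make the cost concrete: the page itself only needs 25 rows, while the total row count is a separate question. The sketch below is a rough illustration, not WordPress's actual code. It uses Python with an in-memory SQLite table as a stand-in for MySQL, and the `posts` table, the integer `post_date` column, and the `build_demo_db`, `fetch_page` and `count_pages` names are all assumptions made up for the example. It fetches one page with a plain LIMIT query and gets the total from a separate COUNT(*), so the paging query can stop after 25 rows.

```python
import math
import sqlite3

PER_PAGE = 25  # same page size as the LIMIT 0, 25 in the query above


def build_demo_db(num_posts):
    """Create a simplified, hypothetical stand-in for wp_posts in SQLite."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE posts ("
        "id INTEGER PRIMARY KEY, post_date INTEGER, post_status TEXT)"
    )
    # An index covering the filter and the sort column (an assumption,
    # not something the original query is known to have).
    conn.execute("CREATE INDEX idx_status_date ON posts (post_status, post_date)")
    rows = [(i, i, "publish") for i in range(1, num_posts + 1)]
    conn.executemany("INSERT INTO posts VALUES (?, ?, ?)", rows)
    return conn


def fetch_page(conn, page):
    """Return the post ids for one page, newest first."""
    offset = (page - 1) * PER_PAGE
    cur = conn.execute(
        "SELECT id FROM posts WHERE post_status = 'publish' "
        "ORDER BY post_date DESC LIMIT ? OFFSET ?",
        (PER_PAGE, offset),
    )
    return [row[0] for row in cur.fetchall()]


def count_pages(conn):
    """Count matching rows in a separate query and derive the page count."""
    (total,) = conn.execute(
        "SELECT COUNT(*) FROM posts WHERE post_status = 'publish'"
    ).fetchone()
    return math.ceil(total / PER_PAGE)
```

Because the count is a separate query, it could also be cached and refreshed only when a post is published. Whether the split is actually faster on a given MySQL setup depends on the indexes and would need measuring.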

Hit the front page of a WordPress site and that query will generally be cached. The archive pages are different: each time one of them is generated, the query has to run. The query holds a lock on the tables while it runs, so when Googlebot requests the first page, that request holds the lock; the second request must wait until the first finishes before it gets its own lock; the third waits on both of the previous requests; and the saga continues.

When you have 100 posts on your blog, it's not really much of an issue, but with 2000 posts it creates a bit of a backlog. With the introduction of 2.6, WordPress got the brilliant idea to hold revisions in the main table (their beta held revisions in a separate table). Now, when you get a feed post in from a sponsor, edit it to fix a quote mark or something else, then spot another mistake and edit that too, your 2000-post blog quickly grows to 6000 records that it must sort through.

Put 80 blogs on a server, and when Googlebot starts hitting them you've got a mess.

Joomla/Mambo is not without its faults. Even version 1.5 does its own processing of images to check permissions and will crumble under load -- especially with the SEO plugin.

Any of them can handle a high-traffic site with enough load balancers, caching servers, MySQL pools, etc.

Now, the one thing that kills WordPress/Joomla/Mambo/Drupal is the fact that they insert themselves into the 404 process. Because they handle 404s themselves, all of the processing required to build a page is executed every time Firefox asks for favicon.ico and it isn't there -- or for any other file that 404s. A missing stylesheet, thumbnail image, icon, etc. can wreak havoc on the site in addition to its normal load.
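
The idea of keeping the application out of the 404 path for static files can be sketched as a small routing decision. This is only an illustration in Python, not how any of these packages or any web server actually implement it; in practice this would more likely live in the web server configuration. The `route` function, the `STATIC_EXTENSIONS` set and the handler labels are hypothetical names chosen for the example.

```python
import posixpath

# File types assumed to be static assets; an illustrative list, not exhaustive.
STATIC_EXTENSIONS = {".ico", ".css", ".js", ".png", ".jpg", ".jpeg", ".gif"}


def route(path, existing_files):
    """Decide which handler should answer a request path.

    Returns "file" when the file exists on disk, "plain-404" when a
    missing static asset can be rejected cheaply, and "cms" when the
    request should be handed to the application.
    """
    if path in existing_files:
        return "file"
    extension = posixpath.splitext(path)[1].lower()
    if extension in STATIC_EXTENSIONS:
        # Missing favicon, stylesheet, thumbnail, etc.: answer without
        # running any of the page-building code.
        return "plain-404"
    return "cms"
```

With this split, a request for a missing `/favicon.ico` never reaches the page-building code, while a permalink such as `/2008/12/some-post/` still goes to the application. The sketch ignores query strings and assumes `existing_files` is a set of known paths.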

It all boils down to people who write software for simplicity rather than performance. The age-old example is the picture of the day script.

How many times a day do you need to decide which picture to display? I don't know how many scripts I've seen that count the number of images in a directory and do a modulo based on the Julian date to figure out which one to display on each pageload. Or the script takes the day of the month and the images are named 01.jpg, 02.jpg, etc.
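
A minimal sketch of the alternative, deciding once per day instead of on every pageload, might look like the following. It is written in Python for illustration only; the `DailyPicture` class, the `pick_index` helper and the `list_images` callback are hypothetical, and `date.toordinal()` is used as a stand-in for the Julian date those scripts rely on. It assumes the image list is never empty.

```python
import datetime


def pick_index(day, image_count):
    """Map a date to an image index with a simple modulo."""
    return day.toordinal() % image_count


class DailyPicture:
    """Remember the day's choice so the directory is scanned once per day."""

    def __init__(self, list_images):
        self._list_images = list_images  # callable returning image names
        self._day = None
        self._choice = None
        self.scans = 0  # how many times the image list was actually read

    def current(self, today):
        if today != self._day:
            # Only the first request of a new day pays for the scan.
            images = sorted(self._list_images())
            self.scans += 1
            self._choice = images[pick_index(today, len(images))]
            self._day = today
        return self._choice
```

Every later request on the same day returns the remembered name without touching the directory. A cron job that writes the day's choice to a fixed filename would get the same effect with no per-request code at all.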

Engineering to run a blog network like that is not without its challenges.
__________________
SnapReplay.com a different way to share photos - iPhone & Android