Update: http://support.microsoft.com/kb/2483219 has been issued to address this scenario.
Consider the following scenario:
In this scenario, the new link loads very slowly.
I recently came up against a pretty interesting issue. The environment was SharePoint MOSS with SP1 and some CUs applied, content databases ranging from just a few gigs to over 500 gigs, a few front-end and app servers, and clustered SQL. The move to SP2, and right after that to the August 2010 CU, was made over a weekend. Everything seemed fine, but there was no load on the servers at the time, so no issue was noticed and the assumption was made that there was no problem with the updates.
Come Monday, at the first hint of load on the farm, SQL CPU spiked through the roof and four specific pages would time out or just plain die. After the IIS timeout was raised to over 5 minutes the pages did load, but to the tune of 4-minute load times. Moving the databases to a much beefier box brought the load time down to about 3 minutes. Something was up.
When “show all subsites” was unchecked for the sites that were running slow, the pages loaded in under 5 seconds. I started digging into navigation and the way the taxonomy was set up, but that ended up being only a very small percentage of the actual problem. I did a TON of digging to find the culprit: pssdiag, SQL processes, hardware and network perf checks. And when I say a TON, I mean totally exhausting.
The issue actually came from the navigation querying all the sites it had to display (in order to build itself) and cross-referencing the user's permissions against each of those sites to decide whether it needed to be security trimmed. Because permission inheritance had been broken many times in the environment, SharePoint had to go through all those one-off sets of permissions (some as large as the original parent permission schema) to work out which sites to show in the navigation.
Long story short, there were a few culprits, all of which go to promote “best practices”.
1- Huge database – needed to be split up into databases of <100 gigs
2- Site topology was crazy out of whack. Navigation had hundreds of subsites that were being queried and built out
3- Permissions – all the groups were managed entirely in SharePoint: 50,000+ users and 5,000+ groups.
Remember, the 100 gig limit for database sizes is best practice not only because it helps you restore those DBs faster. The larger the databases get without the server-side performance to back them up, the more you WILL SEE performance degradation in the SharePoint environment.
A quick win to get load times back to normal if you are in this situation is to uncheck the “Show all sub sites” box in the navigation section of the site settings.
If you are worried about users checking this box again, you can also change the DynamicChildLimit in the web.config, if you have access AND YOU KNOW WHAT YOU ARE DOING: http://msdn.microsoft.com/en-us/library/microsoft.sharepoint.publishing.navigation.portalsitemapprovider.dynamicchildlimit.aspx
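For reference, here is a rough sketch of what that web.config change can look like. DynamicChildLimit is set as an attribute on the PortalSiteMapProvider entries under the siteMap providers section. The provider name and type string below are what a stock MOSS 2007 web.config typically contains, and the value 20 is purely an illustrative assumption, not a recommendation. Keep your existing entry exactly as it is and only add the attribute, after backing up the file, and repeat on every front end.

```xml
<!-- Sketch only: inside <system.web><siteMap><providers> of the web application's web.config. -->
<!-- Leave your existing attributes untouched; only DynamicChildLimit is added here. -->
<!-- The value 20 is an illustrative assumption; pick a cap that fits your navigation. -->
<add name="CurrentNavSiteMapProvider"
     description="CMS provider for Current navigation"
     type="Microsoft.SharePoint.Publishing.Navigation.PortalSiteMapProvider, Microsoft.SharePoint.Publishing, Version=12.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c"
     NavigationType="Current"
     EncodeOutput="true"
     DynamicChildLimit="20" />
```

The same attribute can be added to the other PortalSiteMapProvider entries (for example the global navigation provider) if those menus are also pulling in large numbers of subsites.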
Hope that helps. If there ends up being an end-all, be-all fix to this issue beyond the three items I listed above, I’ll post it.