Thursday, March 08, 2012

A Lesson Learned From A SharePoint Patch Installation

A busy morning started from tons of reported issues on our SharePoint 2010 portal: my sites, audience, and some navigations were not working...Open up SharePoint Central Admin we first noticed saw an error "User Profile Application's connection is currently not available.." in the user profile service:


Inside profile synchronization service we saw the "An error has occurred while accessing the SQL Server database or the SharePoint Server Search service. If this is the first time you have seen this message, try again later. If this problem persists, contact your administrator.":


It looked like something wrong with the database connection. But there's no recent change on SQL database and service accounts. From the SharePoint log we noticed many unexpected errors from User Profile Web Services:

w3wp.exe (0x0B3C) 0x0268 SharePoint Portal Server User Profiles eh0u Unexpected ProfilePropertyService.GetProfileProperties Exception:
System.MissingMethodException: Method not found: 'Microsoft.SharePoint.Administration.SPIdentifierType Microsoft.SharePoint.Administration.SPAce`1.get_BinaryIdType()'. at
Microsoft.Office.Server.Administration.SPAclFormatter.Serialize[TRights](XmlWriter xmlWriter, SPAcl`1 acl) at
Microsoft.Office.Server.Administration.SPAclFormatter.Serialize[TRights](SPAcl`1 acl) at
Microsoft.Office.Server.Administration.UserProfileApplication.get_SerializedAdministratorAcl() at
Microsoft.Office.Server.Administration.UserProfileApplication.GetProperties() at
Microsoft.Office.Server.UserProfiles.ProfilePropertyService.GetProfileProperties() 4507cc64-e9bd-43d3-9191-5e10b9509083

"Method not found" implies that the service calls and web services were not matching and they had different method signature. What on earth was happening? We recalled that a SharePoint security patch KB2597124 was installed last night (described in Microsoft Security Bulletin). Would it be the culprit? Not likely at the first glance since we a few developers had already installed that patch and didn't find any issue on it.

We traced down the SharePoint update history and found the security patch was installed successfully. The only difference between development and production machines is that production servers didn't have SharePoint Oct. 2011 cumulative updates installed but development machines did:
Production environment: SharePoint2010_SP1(14.0.4763.1000) + KB_2597124_Patch
Dev environment: SharePoint2010_SP1(14.0.4763.1000) + Oct_2011_CU(14.0.6112.5000) + KB_2597124_Patch

Would that difference caused the problem? We read through the patch instruction and it clearly showed it can be applied to SharePoint 2010 or SharePoint 2010 SP1 server without any other prerequisite information.

Anyway we did some tests. We installed KB2597124 Patch in a SharePoint 2010 SP1 test machine (14.0.4763.1000 without any CU installed). Bamm, we then saw all those issues in the test machine!

The conclusion was clear. That the security patch has dependency on previous cumulative updates (Oct. 2011 CU or latest Dec. 2011 CU). But this dependency information is not included in patch’s document, and security patch doesn’t do any prerequisite check.

Those "Method not found" errors now are understandable. Those CU updates modify some user profile web services, and the security patch makes some changes on top of those updated web services. I guess Microsoft found the security vulnerability. They thought it's critical so they teamed up some developers to work on it quickly. What those developers did was grab the latest code including the CU build, fix the issue, test it and make it public. But should Microsoft be a little bit more responsible for that? At least do some testing for different environments before the release right?

The security patch can not be uninstalled. The only option for us was to apply the CU after the security patch which is not the right order. We installed Oct. 2011 CU in that test machine with security patch installed. The CU install was smooth but it failed in SharePoint configuration wizard after machine reboot as prompted:

A few errors can be found in the log file:

03/08/2012 16:08:49 14 ERR The exclusive inplace upgrader timer job failed.
03/08/2012 16:08:49 14 ERR Task upgrade has failed with a PostSetupConfigurationTaskException An exception of type Microsoft.SharePoint.PostSetupConfiguration.PostSetupConfigurationTaskException was thrown. Additional exception information: Failed to upgrade SharePoint Products.
03/08/2012 16:08:49 14 ERR An exception of type Microsoft.SharePoint.PostSetupConfiguration.PostSetupConfigurationTaskException was thrown. Additional exception information: Failed to upgrade SharePoint Products.
Microsoft.SharePoint.PostSetupConfiguration.PostSetupConfigurationTaskException: Exception of type 'Microsoft.SharePoint.PostSetupConfiguration.PostSetupConfigurationTaskException' was thrown.
at Microsoft.SharePoint.PostSetupConfiguration.UpgradeTask.Run()
at Microsoft.SharePoint.PostSetupConfiguration.TaskThread.ExecuteTask()
03/08/2012 16:08:49 14 ERR Task upgrade has failed
03/08/2012 16:08:49 1 ERR Task upgrade has stopped and failed. Total failed is now 1
03/08/2012 16:08:49 8 ERR Task upgrade SharePoint Products failed, so stopping execution of the engine
03/08/2012 16:08:49 8 ERR Failed to upgrade SharePoint Products.
An exception of type Microsoft.SharePoint.PostSetupConfiguration.PostSetupConfigurationTaskException was thrown. Additional exception information: Failed to upgrade SharePoint Products.
Microsoft.SharePoint.PostSetupConfiguration.PostSetupConfigurationTaskException: Exception of type 'Microsoft.SharePoint.PostSetupConfiguration.PostSetupConfigurationTaskException' was thrown.
at Microsoft.SharePoint.PostSetupConfiguration.UpgradeTask.Run()
at Microsoft.SharePoint.PostSetupConfiguration.TaskThread.ExecuteTask()
03/08/2012 16:08:49 8 ERR One or more configuration tasks has failed or some tasks were not run
03/08/2012 16:08:49 8 ERR Configuration of SharePoint Products failed. Configuration must be performed in order for this product to operate properly. To diagnose the problem, review the extended error information located at C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\14\LOGS\PSCDiagnostics_3_8_2012_15_42_40_120_1586599544.log, fix the problem, and run this configuration wizard again.

We could see there were a few configuration tasks errors during the configuration wizard, which may because "Configuration must be performed in order for this product to operate properly" (message from log error). Fortunately, we found the site was back to normal after the CU installation: user profile service was up, mysites and navigation were all working properly. Reapplying the security patch was not implemented and we just didn't have enough time to test all that.

Lesson learned: make sure the test machine has the exact same configuration, fully test before applying any SharePoint patch, and don't take Microsoft for granted.