Wednesday, June 4, 2008

The Server is not Operational in Event Log on SharePoint Web Servers

Wow, this one was a tough one. Thanks to one of my colleagues for figuring it out.

We have a SharePoint farm consisting of 3 front-ends, 1 index server, and a clustered SQL Server 2005 back end. The 3 front-ends are load balanced with Microsoft NLB (WHICH IS EVIL BY THE WAY)! Any time usage on the front-ends got high, the Application event log was slammed with "The server is not operational" errors. Since the SharePoint site uses Forms Authentication, we originally thought it was a problem with the LDAP provider, but we couldn't make the case for that. After extensive searching, my colleague found an older article about the TCP TIME_WAIT delay. Here is the link to that article:
http://www.port80software.com/200ok/archive/2004/12/07/205.aspx

It appears that all the network flooding MS NLB does was causing problems with the connections to Active Directory (AD) during periods of heavy use. My colleague made this change on each front-end:

You must add the TcpTimedWaitDelay REG_DWORD value to the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters registry subkey. Then set the delay to the number of seconds (in decimal form):

Value Type: REG_DWORD (time in seconds)
Valid Range: 30-300 (decimal)
Default: 0xF0 (240 decimal)

http://windowsitpro.com/article/articleid/23276/the-time_wait-states-effect-on-iis-performance.html

Setting the value to 120 decreased the TIME_WAIT delay from the default 4 minutes to 2 minutes. After we monitored the system for a couple of days: voilà! No more errors. I sure hope this helps someone else out there!
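For anyone who would rather script the registry change than click through regedit on every front-end, here is a minimal sketch in Python using the standard `winreg` module. It assumes you run it on the Windows server itself with administrator rights; the helper names `validate_delay` and `set_timed_wait_delay` are my own, not part of any Microsoft tool. It only writes the TcpTimedWaitDelay value at the key and range described above.

```python
import sys

# Registry location and value name described in the post.
PARAMETERS_SUBKEY = r"SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"
VALUE_NAME = "TcpTimedWaitDelay"

# Documented limits for TcpTimedWaitDelay, in seconds.
MIN_SECONDS = 30
MAX_SECONDS = 300
DEFAULT_SECONDS = 0xF0  # 240 seconds = 4 minutes


def validate_delay(seconds: int) -> int:
    """Hypothetical helper: return the DWORD to store, or raise if out of range."""
    if not isinstance(seconds, int) or not MIN_SECONDS <= seconds <= MAX_SECONDS:
        raise ValueError(
            f"{VALUE_NAME} must be {MIN_SECONDS}-{MAX_SECONDS} seconds, got {seconds!r}"
        )
    return seconds


def set_timed_wait_delay(seconds: int) -> None:
    """Hypothetical helper: write TcpTimedWaitDelay on the local machine.

    Windows only; needs administrator rights to open HKLM for writing.
    """
    value = validate_delay(seconds)
    if sys.platform != "win32":
        raise OSError(f"{VALUE_NAME} can only be set on Windows")
    import winreg  # standard library, but only available on Windows

    # The Tcpip\Parameters key already exists, so open it for value writes.
    with winreg.OpenKey(
        winreg.HKEY_LOCAL_MACHINE, PARAMETERS_SUBKEY, 0, winreg.KEY_SET_VALUE
    ) as key:
        winreg.SetValueEx(key, VALUE_NAME, 0, winreg.REG_DWORD, value)
```

Calling `set_timed_wait_delay(120)` on each front-end stores 120 (0x78), the 2-minute value from this post. The range check is there so a typo such as 1200 never reaches the registry.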
