Wednesday, December 8, 2010

WMI leaks memory on Server 2008 R2 monitored agents

Here is something that a customer brought to my attention, and it is probably affecting you already.

They noticed that WMI on some of their Server 2008 R2 monitored agents was consuming a large amount of memory, and that the amount kept increasing. To check, I started tracking this in SCOM by writing a rule to collect the Process\Private Bytes counter for all WMI processes (WmiPrvSE*).

Sure enough, a handful (but, strangely, not all) of my Windows 2008 R2 monitored servers are exhibiting this behavior. Graphing the collected data shows that most processes are consuming ~20MB or less, but some are steadily increasing, consuming 400MB of RAM or more.
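The check I was doing by eye on the graph can be sketched in a few lines. This is only a rough illustration, not what the SCOM rule does: it assumes the Private Bytes readings have already been exported into a plain dictionary keyed by process instance name, and the function name, the 100MB growth threshold, and the sample layout are all made up for the example.

```python
from fnmatch import fnmatchcase

MB = 1024 * 1024


def find_growing_processes(samples, pattern="WmiPrvSE*", min_growth_bytes=100 * MB):
    """Return {instance_name: growth_in_bytes} for processes that look like they leak.

    samples maps a process instance name (e.g. "WmiPrvSE#1") to a list of
    Private Bytes readings in time order. This layout is an assumption made
    for the sketch, not a SCOM export format.
    """
    suspects = {}
    for name, readings in samples.items():
        # Compare case-insensitively, since counter instance names may vary in case.
        if not fnmatchcase(name.lower(), pattern.lower()):
            continue
        if len(readings) < 2:
            continue  # a single reading cannot show a trend
        # "Steadily increasing": no reading is lower than the one before it.
        rising = all(later >= earlier for earlier, later in zip(readings, readings[1:]))
        growth = readings[-1] - readings[0]
        if rising and growth >= min_growth_bytes:
            suspects[name] = growth
    return suspects
```

A process that sits around 20MB is ignored, while one that climbs from 50MB to 400MB is reported with its total growth. The threshold is arbitrary, so tune it to whatever "normal" looks like in your environment.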


These are some signs that this might be impacting you in OpsMgr:

You might get some alerts in the console like the following:

Workflow Runtime: Failed to run a process or script

The process started at 1:22:12 AM failed to create System.Discovery.Data. Errors found in output:

C:\Program Files\System Center Operations Manager 2007\Health Service State\Monitoring Host Temporary Files 5\4235\AlertUpdateConnectorDiscovery.vbs(16, 1) SWbemObjectSet: No more threads can be created in the system.

Command executed: "C:\Windows\system32\cscript.exe" /nologo "AlertUpdateConnectorDiscovery.vbs" {A7504CAE-3EA5-5B1F-CDA4-A4593E4D85FD} {F8AEF188-D663-9719-3FD8-94B2AF6F0726} SQL2V1.opsmgr.net
Working Directory: C:\Program Files\System Center Operations Manager 2007\Health Service State\Monitoring Host Temporary Files 5\4235\

One or more workflows were affected by this.

Workflow name: AlertUpdateConnector.ConnectorDiscovery
Instance name: SQL2V1.opsmgr.net
Instance ID: {F8AEF188-D663-9719-3FD8-94B2AF6F0726}
Management group: PROD1

Or:

Workflow Runtime: Failed to run a WMI query

Object enumeration failed

Query: 'SELECT DisplayName, Name, StartMode FROM Win32_Service WHERE Name="ClusSvc" and StartMode!="Disabled"'
HRESULT: 0x80041006
Details: Out of memory

One or more workflows were affected by this.

Workflow name: Microsoft.Windows.Cluster.Service.Discovery
Instance name: SQL2CLN1.opsmgr.net
Instance ID: {90476733-8FA9-1718-152C-932FF9AB9BC6}
Management group: PROD1

Or:

Workflow Runtime: Failed to run a WMI query

Object enumeration failed


Query: 'SELECT NumberOfProcessors FROM Win32_ComputerSystem WHERE DomainRole >1'
HRESULT: 0x800705af
Details: The paging file is too small for this operation to complete.

One or more workflows were affected by this.

Workflow name: System.Mom.BackwardCompatibility.Computer.Server.DiscoveryRule
Instance name: SQLDB1.opsmgr.net
Instance ID: {AF7C2749-FF52-E354-EEAE-8CFCA3541607}
Management group: PROD1


The details of the script, discovery, or workflow are irrelevant. What is relevant here is seeing the messages “No more threads can be created in the system”, “Out of memory”, and “The paging file is too small for this operation to complete”.

Those are tell-tale signs of a memory leak or memory pressure, and in this case WMI is the cause.
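If you have a lot of alert text to sift through, a simple scan for those three messages is enough to find the affected agents. The following is a minimal sketch that assumes the alert descriptions are already available as plain strings; the function name is hypothetical and nothing here talks to OpsMgr itself.

```python
# The three messages quoted in the alerts above.
MEMORY_PRESSURE_SIGNS = (
    "No more threads can be created in the system",
    "Out of memory",
    "The paging file is too small for this operation to complete",
)


def memory_pressure_signs(alert_text):
    """Return the tell-tale messages found in one alert description."""
    lowered = alert_text.lower()
    # Case-insensitive substring match; the order follows MEMORY_PRESSURE_SIGNS.
    return [sign for sign in MEMORY_PRESSURE_SIGNS if sign.lower() in lowered]
```

An empty list means the alert shows none of the three messages. Any hit is worth a look at the WmiPrvSE memory usage on that agent.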

Sure enough – when I check this system, I can easily see there is an issue:

image

It turns out there is a hotfix for Windows 2008 R2 that addresses a possible leak when an application queries the Win32_Service class frequently. A monitoring tool does exactly that, so OpsMgr can accelerate this leak in the OS.

If you are running Server 2008 R2 on ANY monitored system, it is highly likely that you need to apply this hotfix:

http://support.microsoft.com/kb/981314
