Here is something that a customer brought to my attention, and it is probably impacting you already.
They noticed that WMI on some of their Server 2008R2 monitored agents was consuming a large amount of memory – and continually increasing. I started tracking this in SCOM by writing a rule to collect the Process\Private Bytes of all WMI processes (WmiPrvSE*) to check.
Sure enough – a handful (but, strangely, not all) of my Windows 2008 R2 monitored servers are exhibiting this behavior. Below is a graph where you can see most processes are consuming ~20MB or less, but some are steadily increasing – consuming 400MB of RAM or more.
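If you export the collected Private Bytes samples, a quick way to pick out the steadily climbing processes is to look for instances that both grow a lot overall and rise on most consecutive samples. The snippet below is only a rough sketch of that idea in Python: the function name, the thresholds (100MB of growth, 80% of samples rising) and the sample numbers are all made up for illustration and are not part of any SCOM rule.

```python
from typing import Dict, List

MB = 1024 * 1024


def find_growing_processes(
    samples: Dict[str, List[int]],
    min_growth_bytes: int = 100 * MB,      # assumed threshold, tune to taste
    min_rising_fraction: float = 0.8,      # assumed threshold, tune to taste
) -> List[str]:
    """Return instance names whose Private Bytes samples keep climbing.

    `samples` maps a process instance name (e.g. "WmiPrvSE#1") to its
    Private Bytes readings in chronological order.
    """
    suspects = []
    for instance, values in samples.items():
        if len(values) < 3:
            continue  # too few points to call it a trend
        steps = list(zip(values, values[1:]))
        rising = sum(1 for prev, cur in steps if cur > prev)
        growth = values[-1] - values[0]
        if growth >= min_growth_bytes and rising / len(steps) >= min_rising_fraction:
            suspects.append(instance)
    return sorted(suspects)


# Hypothetical readings: one flat process around 20MB, one climbing past 400MB.
samples = {
    "WmiPrvSE": [18 * MB, 19 * MB, 18 * MB, 20 * MB, 19 * MB],
    "WmiPrvSE#1": [40 * MB, 120 * MB, 210 * MB, 330 * MB, 410 * MB],
}
print(find_growing_processes(samples))
```

Requiring both conditions keeps a process that is large but stable, or one that merely spikes once, from being flagged.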
These are some signs that this might be impacting you in OpsMgr:
You might get some alerts in the console like the following:
Workflow Runtime: Failed to run a process or script
The process started at 1:22:12 AM failed to create System.Discovery.Data. Errors found in output:
C:\Program Files\System Center Operations Manager 2007\Health Service State\Monitoring Host Temporary Files 5\4235\AlertUpdateConnectorDiscovery.vbs(16, 1) SWbemObjectSet: No more threads can be created in the system.
Command executed: "C:\Windows\system32\cscript.exe" /nologo "AlertUpdateConnectorDiscovery.vbs" {A7504CAE-3EA5-5B1F-CDA4-A4593E4D85FD} {F8AEF188-D663-9719-3FD8-94B2AF6F0726} SQL2V1.opsmgr.net
Working Directory: C:\Program Files\System Center Operations Manager 2007\Health Service State\Monitoring Host Temporary Files 5\4235\
One or more workflows were affected by this.
Workflow name: AlertUpdateConnector.ConnectorDiscovery
Instance name: SQL2V1.opsmgr.net
Instance ID: {F8AEF188-D663-9719-3FD8-94B2AF6F0726}
Management group: PROD1
Or:
Workflow Runtime: Failed to run a WMI query
Object enumeration failed
Query: 'SELECT DisplayName, Name, StartMode FROM Win32_Service WHERE Name="ClusSvc" and StartMode!="Disabled"'
HRESULT: 0x80041006
Details: Out of memory
One or more workflows were affected by this.
Workflow name: Microsoft.Windows.Cluster.Service.Discovery
Instance name: SQL2CLN1.opsmgr.net
Instance ID: {90476733-8FA9-1718-152C-932FF9AB9BC6}
Management group: PROD1
Or:
Workflow Runtime: Failed to run a WMI query
Object enumeration failed
Query: 'SELECT NumberOfProcessors FROM Win32_ComputerSystem WHERE DomainRole >1'
HRESULT: 0x800705af
Details: The paging file is too small for this operation to complete.
One or more workflows were affected by this.
Workflow name: System.Mom.BackwardCompatibility.Computer.Server.DiscoveryRule
Instance name: SQLDB1.opsmgr.net
Instance ID: {AF7C2749-FF52-E354-EEAE-8CFCA3541607}
Management group: PROD1
The details of the script or discovery or workflow are irrelevant. What is relevant here is seeing the messages “No more threads can be created in the system” and “Out of memory” and “The paging file is too small for this operation to complete”.
Those are tell-tale signs of a memory leak or memory pressure, and in this case caused by WMI.
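If you have a pile of alert descriptions exported from the console, you can scan them for these three messages rather than reading each one. This is a minimal sketch, assuming the alert text is already available as a plain string. The function name and the sample alert are illustrative only, and the match is a simple case-insensitive substring check.

```python
from typing import List

# The three messages called out above as signs of memory pressure.
MEMORY_PRESSURE_MARKERS = (
    "No more threads can be created in the system",
    "Out of memory",
    "The paging file is too small for this operation to complete",
)


def memory_pressure_signs(alert_text: str) -> List[str]:
    """Return the tell-tale messages that appear in an alert description."""
    lowered = alert_text.lower()
    return [m for m in MEMORY_PRESSURE_MARKERS if m.lower() in lowered]


# Sample text modeled on the second alert shown earlier.
alert = (
    "Workflow Runtime: Failed to run a WMI query\n"
    "Object enumeration failed\n"
    "HRESULT: 0x80041006\n"
    "Details: Out of memory\n"
)
print(memory_pressure_signs(alert))
```

An empty list means none of the three messages were found, which says nothing either way about other kinds of failures.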
Sure enough – when I check this system, I can easily see there is an issue:
It turns out there is a hotfix for Windows 2008 R2 that addresses a possible leak when an application queries the Win32_Service class frequently. A monitoring tool does exactly that, so OpsMgr can accelerate this leak in the OS.
If you are running Server 2008 R2 on ANY monitored system, it is highly likely that you need to apply this hotfix.