Oracle® Enterprise Manager Framework, Host, and Third-Party Metric Reference Manual 10g Release 2 (10.2) Part Number B16230-01
The host metrics provide description, collection statistics, data source, multiple thresholds (where applicable), and user action information for each metric.
This metric provides data on aggregate resource usage on a per-project basis.
This metric is available only on Solaris version 9 and later.
The following table lists the metrics and their descriptions.
Note:
For all target versions, the collection frequency for each metric is every 15 minutes. The data source for these metrics is the Solaris CIM Object Manager.
Table 3-1 Aggregate Resource Usage Statistics (By Project)
Metric | Description |
---|---|
Cumulative CPU Wait Time (Seconds) | Cumulative number of seconds that this process has spent Waiting for CPU over its lifetime |
Cumulative Data Page Fault Sleep Time (Seconds) | Cumulative number of seconds that this process has spent sleeping in Data Page Faults over its lifetime |
Cumulative Major Page Faults | Cumulative number of Major Page Faults engendered by the process over its lifetime |
Cumulative Minor Page Faults | Cumulative number of Minor Page Faults engendered by the process over its lifetime |
Cumulative Number Character IO (bytes) Read and Written | Cumulative number of character I/O bytes Read and Written by the process over its lifetime |
Cumulative Number of Blocks Read | Cumulative number of blocks Read by the process over its lifetime |
Cumulative Number of Blocks Written | Cumulative number of blocks Written by the process over its lifetime |
Cumulative Number of Involuntary Context Switches | Cumulative number of Involuntary Context Switches made by the process over its lifetime |
Cumulative Number of Messages Received | Cumulative number of Messages Received by the process over its lifetime |
Cumulative Number of Messages Sent | Cumulative number of Messages Sent by the process over its lifetime |
Cumulative Number of Signals Received | Cumulative number of Signals taken by the process over its lifetime |
Cumulative Number of System Calls Made | Cumulative number of system calls made by the process over its lifetime |
Cumulative Number of Voluntary Context Switches | Cumulative number of Voluntary Context Switches made by the process over its lifetime |
Cumulative Project Lock-Wait Sleep Time (Seconds) | Cumulative number of seconds that this process has spent sleeping on User Lock Waits over its lifetime |
Cumulative Project Other Sleep Time (Seconds) | Cumulative number of seconds that this process has spent sleeping in all other ways over its lifetime |
Cumulative Stop Time (Seconds) | Cumulative number of seconds that this process has spent Stopped over its lifetime |
Cumulative Swap Operations | Cumulative number of swap operations engendered by the process over its lifetime |
Cumulative System Mode Time (Seconds) | Cumulative number of seconds that this process has spent in System mode over its lifetime |
Cumulative System Page Fault Sleep Time (Seconds) | Cumulative number of seconds that this process has spent sleeping in System Page Faults over its lifetime |
Cumulative System Trap Time (Seconds) | Cumulative number of seconds that this process has spent in System Traps over its lifetime |
Cumulative Text Page Fault Sleep Time (Seconds) | Cumulative number of seconds that this process has spent sleeping in Text Page Faults over its lifetime |
Cumulative User Mode Time (Seconds) | Cumulative number of seconds that this process has spent in User mode over its lifetime |
Number of Processes Owned by Project | Number of processes owned by the project measured in the aggregate |
Project CPU Time (%) | Percent CPU time used by the process |
Project Process Memory Size (%) | Ratio of the process resident set size to physical memory |
Project's Total Process Heap Size (KiloBytes) | Total number of kilobytes of memory consumed by the process heap at the time that it is sampled |
Project's Total Process Resident Set Size (KiloBytes) | Resident set size of the process in kilobytes |
Project's Total Process Virtual Memory Size (KiloBytes) | Size of the process virtual address space in kilobytes |
Total Number of Threads in Project's Processes | Number of threads active in the current process |
This metric provides data on aggregate resource usage on a per-user basis.
This metric is available only on Solaris version 9 and later.
The following table lists the metrics and their descriptions.
Note:
For all target versions, the collection frequency for each metric is every 15 minutes. The data source for these metrics is the Solaris CIM Object Manager.
Table 3-2 Aggregate Resource Usage Statistics (By User)
Metric | Description |
---|---|
Cumulative CPU Wait Time (Seconds) | Cumulative number of seconds that this process has spent Waiting for CPU over its lifetime |
Cumulative Data Page Fault Sleep Time (Seconds) | Cumulative number of seconds that this process has spent sleeping in Data Page Faults over its lifetime |
Cumulative Major Page Faults | Cumulative number of Major Page Faults engendered by the process over its lifetime |
Cumulative Minor Page Faults | Cumulative number of Minor Page Faults engendered by the process over its lifetime |
Cumulative Number Character IO (Bytes) Read and Written | Cumulative number of character I/O bytes Read and Written by the process over its lifetime |
Cumulative Number of Blocks Read | Cumulative number of blocks Read by the process over its lifetime |
Cumulative Number of Blocks Written | Cumulative number of blocks Written by the process over its lifetime |
Cumulative Number of Involuntary Context Switches | Cumulative number of Involuntary Context Switches made by the process over its lifetime |
Cumulative Number of Messages Received | Cumulative number of Messages Received by the process over its lifetime |
Cumulative Number of Messages Sent | Cumulative number of Messages Sent by the process over its lifetime |
Cumulative Number of Signals Received | Cumulative number of Signals taken by the process over its lifetime |
Cumulative Number of System Calls Made | Cumulative number of system calls made by the process over its lifetime |
Cumulative Number of Voluntary Context Switches | Cumulative number of Voluntary Context Switches made by the process over its lifetime |
Cumulative Stop Time (Seconds) | Cumulative number of seconds that this process has spent Stopped over its lifetime |
Cumulative Swap Operations | Cumulative number of Swap Operations engendered by the process over its lifetime |
Cumulative System Mode Time (Seconds) | Cumulative number of seconds that this process has spent in System mode over its lifetime |
Cumulative System Page Fault Sleep Time (Seconds) | Cumulative number of seconds that this process has spent sleeping in System Page Faults over its lifetime |
Cumulative System Trap Time (Seconds) | Cumulative number of seconds that this process has spent in System Traps over its lifetime |
Cumulative Text Page Fault Sleep Time (Seconds) | Cumulative number of seconds that this process has spent sleeping in Text Page Faults over its lifetime |
Cumulative User Lock-Wait Sleep Time (Seconds) | Cumulative number of seconds that this process has spent sleeping on User Lock Waits over its lifetime |
Cumulative User Mode Time (Seconds) | Cumulative number of seconds that this process has spent in User mode over its lifetime |
Cumulative User Other Sleep Time (Seconds) | Cumulative number of seconds that this process has spent sleeping in all other ways over its lifetime |
Number of Processes Owned by User | Number of processes owned by the user measured in the aggregate |
Total Number of Threads in User's Processes | Number of threads active in the current process |
User CPU Time (%) | Percent CPU time used by the process |
User Process Memory Size (%) | Ratio of the process resident set size to physical memory |
User's Total Process Heap Size (KiloBytes) | Total number of kilobytes of memory consumed by the process heap at the time that it is sampled |
User's Total Process Resident Set Size (KiloBytes) | Resident set size of the process in kilobytes |
User's Total Process Virtual Memory Size (KiloBytes) | Size of the process virtual address space in kilobytes |
The Buffer Activity metric provides information about OS memory buffer usage. This metric reports buffer activity for transfers, accesses, and cache (kernel block buffer cache) hit ratios per second.
The data sources for this metric category include the following:
Host | Data Source |
---|---|
Solaris | sar command |
HP | sar command |
Linux | not available |
HP Tru64 | table() system call |
IBM AIX | sar command |
Windows | not available |
The following table lists the metrics and their descriptions.
Table 3-3 Buffer Activity Metrics
Metric | Description |
---|---|
Buffer Cache Read Hit Ratio (%) | Number of reads from block devices to buffer cache as a percentage of all buffer reads |
Buffer Cache Reads (per second) | Number of reads performed on the buffer cache per second. Note: This metric is not available on HP Tru64. |
Buffer Cache Write Hit Ratio (%) | Number of writes from block devices to buffer cache as a percentage of all buffer writes |
Buffer Cache Writes (per second) | Number of writes performed on the buffer cache per second. Note: This metric is not available on HP Tru64. |
Physical I/O Reads (per second) | Number of reads per second from character devices using physical I/O mechanisms |
Physical I/O Writes (per second) | Number of writes per second from character devices using physical I/O mechanisms |
Physical Reads (per second) | Number of reads performed per second from block devices to the system buffer cache |
Physical Writes (per second) | Number of physical writes performed per second from block devices to the system buffer cache |
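As a rough illustration of how a hit ratio relates to the per-second counters in this table, the following minimal sketch assumes the convention used by the `sar -b` report on Solaris, where the cache hit ratio is `(1 - physical / logical)` expressed as a percentage. The function name and the sample numbers are hypothetical, and the agent's actual calculation may differ.

```python
def cache_hit_ratio(physical_per_sec: float, logical_per_sec: float) -> float:
    """Return a cache hit ratio as a percentage.

    Assumption: follows the sar -b convention, (1 - physical/logical) * 100.
    Returns 0.0 when there was no logical activity (an arbitrary choice).
    """
    if logical_per_sec <= 0:
        return 0.0
    ratio = (1.0 - physical_per_sec / logical_per_sec) * 100.0
    # Clamp at zero in case physical transfers exceed logical accesses.
    return max(ratio, 0.0)


# Hypothetical sample: 5 physical reads/s against 100 buffer cache reads/s.
read_hit = cache_hit_ratio(physical_per_sec=5.0, logical_per_sec=100.0)
print(f"Buffer Cache Read Hit Ratio: {read_hit:.1f}%")
```

A low ratio under this convention means most buffer reads had to go to the block device.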
The CPU Usage metric provides information about the percentage of time the CPU was in various states, for example, idle state and wait state. The metric also provides information about the percentage of CPU time spent in user and system mode. All data is per-CPU in a multi-CPU system.
On HP Tru64, this information is available only as a cumulative total across all CPUs, not for each CPU individually, and that total is monitored in the Load metric. Hence, this metric is not available on HP Tru64.
Note:
For all target versions, the collection frequency for each metric is every 15 minutes. The data sources for this metric category include the following:
Host | Data Source |
---|---|
Solaris | kernel statistics (class cpu_stat) |
HP | pstat_getprocessor() system call |
Linux | /proc/stat |
HP Tru64 | not available |
IBM AIX | oracle_kstat() system call |
Windows | performance data counters |
The following table lists the metrics and their descriptions.
Table 3-4 CPU Usage Metrics
Metric | Description |
---|---|
CPU Idle Time (%) | Represents the percentage of time that the CPU was idle and the system did not have an outstanding disk I/O request. This metric checks the percentage of processor time in idle mode for the CPU(s) specified by the Host CPU parameter, such as cpu_stat0, CPU0, or * (for all CPUs on the system). |
CPU Interrupt Time (%) | See Section 3.4.1, "CPU Interrupt Time (%)". Note: This metric is available only on Windows. |
CPU System Time (%) | Represents the percentage of time that the CPU is running in system mode (kernel). This metric checks the percentage of processor time in system mode for the CPU(s) specified by the Host CPU parameter, such as cpu_stat0, CPU0, or * (for all CPUs on the system). |
CPU User Time (%) | Represents the percentage of time that the CPU is running in user mode. This metric checks the percentage of processor time in user mode for the CPU(s) specified by the Host CPU parameter, such as cpu_stat0, CPU0, or * (for all CPUs on the system). |
CPU Wait Time (%) | Represents the percentage of time that the CPU was idle during which the system had an outstanding disk I/O request. This metric checks the percentage of processor time in wait mode for the CPU(s) specified by the Host CPU parameter, such as cpu_stat0, CPU0, or * (for all CPUs on the system). Note: This metric is not available on Solaris and HP Tru64. |
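On Linux, where the data source is /proc/stat, per-CPU percentages of this kind can be derived from the difference between two snapshots of the tick counters. The sketch below is only an illustration of that idea and is not the agent's implementation. The function names and the two sample snapshots are made up, and it treats the first eight fields (user, nice, system, idle, iowait, irq, softirq, steal) as the total.

```python
def parse_cpu_lines(stat_text):
    """Map each per-CPU label (cpu0, cpu1, ...) to its list of tick counters."""
    cpus = {}
    for line in stat_text.splitlines():
        fields = line.split()
        # Skip the aggregate "cpu" line and any non-CPU lines.
        if fields and fields[0].startswith("cpu") and fields[0] != "cpu":
            cpus[fields[0]] = [int(v) for v in fields[1:]]
    return cpus


def cpu_percentages(before, after):
    """Percent of time in user, system, idle, and I/O wait between two samples."""
    delta = [b - a for a, b in zip(before, after)]
    # Only the first eight fields are summed; guest time is already part of user.
    total = sum(delta[:8])
    if total <= 0:
        return {"user": 0.0, "system": 0.0, "idle": 0.0, "wait": 0.0}
    # Field order in /proc/stat: user nice system idle iowait irq softirq ...
    user, nice, system, idle, iowait = delta[:5]
    return {
        "user": 100.0 * (user + nice) / total,
        "system": 100.0 * system / total,
        "idle": 100.0 * idle / total,
        "wait": 100.0 * iowait / total,
    }


# Hypothetical snapshots taken some seconds apart (not real measurements).
SAMPLE_1 = "cpu  100 0 50 800 50 0 0\ncpu0 100 0 50 800 50 0 0\n"
SAMPLE_2 = "cpu  150 0 70 1500 80 0 0\ncpu0 150 0 70 1500 80 0 0\n"

first = parse_cpu_lines(SAMPLE_1)
second = parse_cpu_lines(SAMPLE_2)
for name in first:
    print(name, cpu_percentages(first[name], second[name]))
```

Working from deltas rather than raw counters matters because the counters in /proc/stat are cumulative since boot.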
Represents the percentage of time that the CPU spends receiving and servicing hardware interrupts during representative intervals. This metric checks the percentage of processor time in interrupt mode for the CPU(s) specified by the Host CPU parameter, such as cpu_stat0, CPU0, or * (for all CPUs on the system).
This metric is available only on Windows.
Multiple Thresholds
For this metric you can set different warning and critical threshold values for each "CPU Number" object.
If warning or critical threshold values are currently set for any "CPU Number" object, those thresholds can be viewed on the Metric Detail page for this metric.
To specify or change warning or critical threshold values for each "CPU Number" object, use the Edit Thresholds page. See the Editing Thresholds topic in the Enterprise Manager online help for information on accessing the Edit Thresholds page.
Data Source
The data sources for this metric are Performance Data counters.
This metric collects certain Cluster Ready Services (CRS) error messages and issues either WARNING or CRITICAL alerts based on the error codes.
Shows the name and full path of the Cluster Ready Services (CRS) alert log.
Metric Summary
The following table shows how often the metric's value is collected.
Target Version | Collection Frequency |
---|---|
All Versions | Every 5 Minutes |
Collects CRS-1012, CRS-1201, CRS-1202, CRS-1401, CRS-1402, CRS-1602, and CRS-1603 messages in the Cluster Ready Services (CRS) alert log at the host level.
CRS-1201, CRS-1401, CRS-1012 alert log messages trigger warning alerts.
CRS-1202, CRS-1402, CRS-1602 and CRS-1603 alert log messages trigger critical alerts.
Metric Summary
The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.
Table 3-5 Metric Summary Table
Target Version | Evaluation and Collection Frequency | Upload Frequency | Operator | Default Warning Threshold | Default Critical Threshold | Consecutive Number of Occurrences Preceding Notification | Alert Text |
---|---|---|---|---|---|---|---|
All Versions | Every 5 Minutes | After Every Sample | MATCH | CRS-(1201\|1401\|1012) | CRS-(1202\|1402\|1602\|1603) | 1* | %clusterwareErrStack% See %alertLogName% for details. |
* Once an alert is triggered for this metric, it must be manually cleared.
Multiple Thresholds
For this metric you can set different warning and critical threshold values for each "Time/Line Number" object.
If warning or critical threshold values are currently set for any "Time/Line Number" object, those thresholds can be viewed on the Metric Detail page for this metric.
To specify or change warning or critical threshold values for each "Time/Line Number" object, use the Edit Thresholds page.
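The default thresholds above are regular-expression patterns applied with the MATCH operator. A minimal sketch of that classification follows. It assumes MATCH behaves like a regular-expression search over each log line and that the critical pattern is checked first. The function name and the sample lines are hypothetical placeholders, not real alert log output.

```python
import re

# Default threshold patterns taken from the Metric Summary table.
WARNING_PATTERN = re.compile(r"CRS-(1201|1401|1012)")
CRITICAL_PATTERN = re.compile(r"CRS-(1202|1402|1602|1603)")


def classify(line):
    """Return 'CRITICAL', 'WARNING', or None for one alert log line."""
    if CRITICAL_PATTERN.search(line):
        return "CRITICAL"
    if WARNING_PATTERN.search(line):
        return "WARNING"
    return None


# Placeholder lines; the message text is invented for illustration only.
sample_lines = [
    "CRS-1201: <message text>",
    "CRS-1603: <message text>",
    "CRS-1009: <message text>",
]
for line in sample_lines:
    print(classify(line), "<-", line)
```

In this sketch a CRS-1009 line is not classified, because that code belongs to the separate OCR alert log metric.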
Collects CRS-1203, CRS-1205 and CRS-1206 messages in the Cluster Ready Services (CRS) alert log at the host level and issues 'CRS Resource Alert Log Error' alerts at critical level.
Metric Summary
The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.
Table 3-6 Metric Summary Table
Target Version | Evaluation and Collection Frequency | Upload Frequency | Operator | Default Warning Threshold | Default Critical Threshold | Consecutive Number of Occurrences Preceding Notification | Alert Text |
---|---|---|---|---|---|---|---|
All Versions | Every 5 Minutes | After Every Sample | MATCH | Not Defined | CRS-120(3\|5\|6) | 1* | %resourceErrStack% See %alertLogName% for details. |
* Once an alert is triggered for this metric, it must be manually cleared.
Multiple Thresholds
For this metric you can set different warning and critical threshold values for each "Time/Line Number" object.
If warning or critical threshold values are currently set for any "Time/Line Number" object, those thresholds can be viewed on the Metric Detail page for this metric.
To specify or change warning or critical threshold values for each "Time/Line Number" object, use the Edit Thresholds page.
Collects CRS-1009 messages in the Cluster Ready Services (CRS) alert log at the host level and issues 'OCR Alert Log Error' type alerts. OCR refers to Oracle Cluster Registry.
Metric Summary
The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.
Table 3-7 Metric Summary Table
Target Version | Evaluation and Collection Frequency | Upload Frequency | Operator | Default Warning Threshold | Default Critical Threshold | Consecutive Number of Occurrences Preceding Notification | Alert Text |
---|---|---|---|---|---|---|---|
All Versions | Every 5 Minutes | After Every Sample | MATCH | Not Defined | CRS-1009 | 1* | %ocrErrStack% See %alertLogName% for details. |
* Once an alert is triggered for this metric, it must be manually cleared.
Multiple Thresholds
For this metric you can set different warning and critical threshold values for each "Time/Line Number" object.
If warning or critical threshold values are currently set for any "Time/Line Number" object, those thresholds can be viewed on the Metric Detail page for this metric.
To specify or change warning or critical threshold values for each "Time/Line Number" object, use the Edit Thresholds page.
This metric monitors the status of the following: Node Applications (nodeapps), Virtual Internet Protocol (IP), Global Services Daemon (GSD), and Oracle Notification System (ONS).
Monitors the status of the following: Node Applications (nodeapps), Virtual Internet Protocol (IP), Global Services Daemon (GSD), and Oracle Notification System (ONS). A critical alert is raised for the nodeapp if its status is 'OFFLINE NOT RESTARTING'. A warning alert is raised for the nodeapp if its status is either 'UNKNOWN' or 'OFFLINE'.
Metric Summary
The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.
Table 3-8 Metric Summary Table
Target Version | Evaluation and Collection Frequency | Upload Frequency | Operator | Default Warning Threshold | Default Critical Threshold | Consecutive Number of Occurrences Preceding Notification | Alert Text |
---|---|---|---|---|---|---|---|
All Versions | Every 5 Minutes | After Every Sample | MATCH | UNKNOWN\|OFFLINE | OFFLINE NOT RESTARTING | 1 | CRS resource %nodeapps% is %status% |
Multiple Thresholds
For this metric you can set different warning and critical threshold values for each "Nodeapp" object.
If warning or critical threshold values are currently set for any "Nodeapp" object, those thresholds can be viewed on the Metric Detail page for this metric.
To specify or change warning or critical threshold values for each "Nodeapp" object, use the Edit Thresholds page.
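Because the status 'OFFLINE NOT RESTARTING' also contains the word OFFLINE, the order in which the two default thresholds are evaluated matters. The sketch below assumes the critical threshold is evaluated before the warning threshold and that MATCH acts as a regular-expression search. Both are assumptions, and the function name is hypothetical.

```python
import re

# Default thresholds from the Metric Summary table.
CRITICAL_STATUS = re.compile(r"OFFLINE NOT RESTARTING")
WARNING_STATUS = re.compile(r"UNKNOWN|OFFLINE")


def nodeapp_severity(status):
    """Return the alert severity for a nodeapp status string, or None."""
    # Check critical first: its text would also satisfy the warning pattern.
    if CRITICAL_STATUS.search(status):
        return "CRITICAL"
    if WARNING_STATUS.search(status):
        return "WARNING"
    return None


for status in ("ONLINE", "UNKNOWN", "OFFLINE", "OFFLINE NOT RESTARTING"):
    print(f"{status!r}: {nodeapp_severity(status)}")
```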
User Action
Refer to the Real Application Clusters Administration and Deployment Guide for Node Applications startup and troubleshooting information.
This metric monitors whether there is a Virtual Internet Protocol (IP) relocation taking place. When a Virtual IP is relocated from the host (node) on which it was originally configured, a critical alert is generated.
Shows the current host (node) on which the Virtual Internet Protocol (IP) is configured.
Metric Summary
The following table shows how often the metric's value is collected.
Target Version | Collection Frequency |
---|---|
All Versions | Every 5 Minutes |
Shows whether the Virtual Internet Protocol (IP) has relocated from the host (node) where it was originally configured. The value is TRUE if relocation happened. Otherwise it is FALSE. When the value is TRUE, a critical alert is raised.
Metric Summary
The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.
Table 3-9 Metric Summary Table
Target Version | Evaluation and Collection Frequency | Upload Frequency | Operator | Default Warning Threshold | Default Critical Threshold | Consecutive Number of Occurrences Preceding Notification | Alert Text |
---|---|---|---|---|---|---|---|
All Versions | Every 5 Minutes | After Every Sample | = | Not Defined | TRUE | 1 | CRS resource %vip% was relocated to %current_node% |
Multiple Thresholds
For this metric you can set different warning and critical threshold values for each "Virtual IP Name" object.
If warning or critical threshold values are currently set for any "Virtual IP Name" object, those thresholds can be viewed on the Metric Detail page for this metric.
To specify or change warning or critical threshold values for each "Virtual IP Name" object, use the Edit Thresholds page.
The Disk Activity metric monitors the hard disk activity on the target being monitored. For each device on the system, this metric provides information about access to the device. This information includes: device name, disk utilization, write statistics, and read statistics for the device.
Note:
For all target versions, the collection frequency for each metric is every 15 minutes. The data sources for this metric category include the following:
Host | Data Source |
---|---|
Solaris | kernel statistics (class kstat_io) |
HP | pstat_getdisk system call |
Linux | iostat command |
HP Tru64 | table() system call |
IBM AIX | oracle_kstat() system call |
Windows | performance data counters |
The following table lists the metrics and their descriptions.
Table 3-10 Disk Activity Metrics
Metric | Description |
---|---|
Average Disk I/O Service Time (ms) | Represents the sum of average wait time and average run time. |
Average Disk I/O Wait Time (ms) | See Section 3.8.2, "Average Disk I/O Wait Time (ms)". Note: This metric is not available on Linux. |
Average Outstanding Disk I/O Requests | Represents the average number of commands waiting for service (queue length). Note: This metric is not available on Linux. |
Average Run Time (ms) | Represents the average time spent by the command on the active queue waiting for its execution to be completed. Note: This metric is not available on Linux. |
Disk Block Writes (per second) | Represents the number of blocks (512 bytes) written per second. Note: This metric is not available on HP. |
Disk Block Reads (per second) | Represents the number of blocks (512 bytes) read per second. Note: On HPUNIX, this metric is named Disk Blocks Transferred (per second). |
Disk Device Busy (%) | See Section 3.8.3, "Disk Device Busy (%)". Note: On HPUNIX, this metric is named Device Busy (%). |
Disk Reads (per second) | Represents the disk reads per second for the specified disk device. Note: This metric is not available on HP. |
Disk Writes (per second) | Represents the disk writes per second for the specified disk device. Note: This metric is not available on HP. |
Represents the sum of average wait time and average run time.
Metric Summary
The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.
Table 3-11 Metric Summary Table
Target Version | Evaluation and Collection Frequency | Upload Frequency | Operator | Default Warning Threshold | Default Critical Threshold | Consecutive Number of Occurrences Preceding Notification | Alert Text |
---|---|---|---|---|---|---|---|
All Versions | Every 15 Minutes | After Every Sample | > | Not Defined | Not Defined | 6 | Average service time for disk %keyvalue% is %value% ms, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold. |
Multiple Thresholds
For this metric you can set different warning and critical threshold values for each "Disk Device" object.
If warning or critical threshold values are currently set for any "Disk Device" object, those thresholds can be viewed on the Metric Detail page for this metric.
To specify or change warning or critical threshold values for each "Disk Device" object, use the Edit Thresholds page.
User Action
This number should be low. A high number can indicate a disk that is slow due to excessive load or hardware issues. See also the CPU in IO-Wait (%) metric.
Represents the average time that a command spends waiting in the queue before it is executed.
Multiple Thresholds
For this metric you can set different warning and critical threshold values for each "Disk Device" object.
If warning or critical threshold values are currently set for any "Disk Device" object, those thresholds can be viewed on the Metric Detail page for this metric.
To specify or change warning or critical threshold values for each "Disk Device" object, use the Edit Thresholds page.
User Action
A high figure indicates a slow disk. Use the OS iostat -xn command to check wait time and service time for local disks and NFS-mounted file systems. See also the CPU in IO-Wait (%) metric.
Represents the amount of disk space utilization as a percentage of capacity.
Note: On HPUNIX, this metric is named Device Busy (%).
Metric Summary
The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.
Table 3-12 Metric Summary Table
Target Version | Evaluation and Collection Frequency | Upload Frequency | Operator | Default Warning Threshold | Default Critical Threshold | Consecutive Number of Occurrences Preceding Notification | Alert Text |
---|---|---|---|---|---|---|---|
All Versions | Every 15 Minutes | After Every Sample | > | 80 | 95 | 6 | Disk Device %keyValue% is %value%%% busy. |
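The interaction between the thresholds and the 'Consecutive Number of Occurrences Preceding Notification' column can be sketched as follows, using the defaults from this table (operator >, warning 80, critical 95, 6 occurrences). The function name, the state labels, and the sample readings are hypothetical, and the way Enterprise Manager clears alerts is not modeled here.

```python
def evaluate(samples, warning, critical, occurrences):
    """Return the alert state after each sample.

    A severity is reported only once its comparison has held TRUE for
    `occurrences` consecutive samples (assumed semantics).
    """
    crit_run = 0
    warn_run = 0
    states = []
    for value in samples:
        # Count how many samples in a row have exceeded each threshold.
        crit_run = crit_run + 1 if value > critical else 0
        warn_run = warn_run + 1 if value > warning else 0
        if crit_run >= occurrences:
            states.append("CRITICAL")
        elif warn_run >= occurrences:
            states.append("WARNING")
        else:
            states.append("CLEAR")
    return states


# Hypothetical Disk Device Busy (%) readings, one per 15-minute collection.
busy = [85] * 5 + [70] + [85] * 6 + [97] * 6
states = evaluate(busy, warning=80, critical=95, occurrences=6)
for reading, state in zip(busy, states):
    print(reading, state)
```

With a 15-minute collection interval and 6 required occurrences, a threshold in this sketch has to be exceeded for six collections in a row before any alert is reported, and a single reading below the threshold restarts the count.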
The Disk Device Errors metric provides the number of errors on the disk device.
These metrics are available only on Solaris.
Note:
For all target versions, the collection frequency for each metric is every 72 hours. The data source for these metrics is the Solaris iostat -e command.
Table 3-13 Disk Device Errors Metrics
Metric | Description |
---|---|
Hard Errors | Represents the error count of hard errors encountered while accessing the disk. Hard errors are considered serious and may be traced to misconfigured or bad disk devices. |
Soft Errors | Represents the error count of soft errors encountered while accessing the disk. Soft errors are synonymous with warnings. |
Total | Represents the sum of all errors on the particular device. |
Transport Errors | Represents the error count of network errors encountered. This generally indicates a problem with the network layer. |
The Fans metric monitors the status of various fans present in the system.
This metric is available only on Dell PowerEdge Linux Systems.
Represents the status of the fan.
This metric is available only on Dell PowerEdge Linux Systems.
The following table lists the possible values for this metric and their meaning.
Metric Value | Meaning (per SNMP MIB) |
---|---|
1 | Other (not one of the following) |
2 | Unknown |
3 | Normal |
4 | Warning |
5 | Critical |
6 | Non-Recoverable |
Metric Summary
The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.
Table 3-14 Metric Summary Table
Target Version | Evaluation and Collection Frequency | Upload Frequency | Operator | Default Warning Threshold | Default Critical Threshold | Consecutive Number of Occurrences Preceding Notification | Alert Text |
---|---|---|---|---|---|---|---|
All Versions | Every 15 Minutes | Not Uploaded | >= | 4 | 5 | 1 | Status of Fan at device %FanIndex% in chassis %ChassisIndex% is %value%, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold. |
Multiple Thresholds
For this metric you can set different warning and critical threshold values for each unique combination of "Chassis Index" and "Fan Index" objects.
If warning or critical threshold values are currently set for any unique combination of "Chassis Index" and "Fan Index" objects, those thresholds can be viewed on the Metric Detail page for this metric.
To specify or change warning or critical threshold values for each unique combination of "Chassis Index" and "Fan Index" objects, use the Edit Thresholds page.
Data Source
SNMP MIB object: coolingDeviceStatus (1.3.6.1.4.1.674.10892.1.700.12.1.5)
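The status values and default thresholds for this metric can be summarized in a short sketch. The enumeration mirrors the value table above and the comparison uses the default >= operator with warning 4 and critical 5. The class and function names are hypothetical, and retrieving the value over SNMP is outside the scope of the example.

```python
from enum import IntEnum


class FanStatus(IntEnum):
    """Status values as listed in the metric value table."""
    OTHER = 1
    UNKNOWN = 2
    NORMAL = 3
    WARNING = 4
    CRITICAL = 5
    NON_RECOVERABLE = 6


def fan_alert(value, warning=4, critical=5):
    """Apply the default >= thresholds to a status value."""
    if value >= critical:
        return "CRITICAL"
    if value >= warning:
        return "WARNING"
    return None


for status in FanStatus:
    print(status.value, status.name, fan_alert(status))
```

Because the operator is >=, a Non-Recoverable status (6) also raises a critical alert under the default thresholds.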
Provides a description of the location of the fan. Example values are "CPU Fan", "PCI Fan", and "Memory Fan".
This metric is available only on Dell PowerEdge Linux Systems.
Metric Summary
The following table shows how often the metric's value is collected.
Target Version | Collection Frequency |
---|---|
All Versions | Every 15 Minutes |
Data Source
SNMP MIB object: coolingDeviceLocationName (1.3.6.1.4.1.674.10892.1.700.12.1.8)
The File Access System Calls metric provides information about the usage of file access system calls.
This metric is available on Solaris, HP, and IBM AIX.
Represents the number of file system blocks read per second performing direct lookup.
Data Source
The data sources for this metric include the following:
Host | Data Source |
---|---|
Solaris | sar command |
HP | sar command |
IBM AIX | sar command |
The OS sar command is used to sample cumulative activity counters maintained by the OS. The data is obtained by sampling system counters once in a five-second interval. The results are essentially the number of lookuppn() calls made over this five-second period divided by five.
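The per-second figure described above is the change in a cumulative counter over the sampling interval divided by the interval length. A minimal sketch of that arithmetic, with a hypothetical function name and made-up counter values:

```python
def per_second_rate(count_start, count_end, interval_seconds=5.0):
    """Average number of events per second between two counter readings."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return (count_end - count_start) / interval_seconds


# Hypothetical cumulative counter values read five seconds apart.
rate = per_second_rate(count_start=1000, count_end=1250)
print(f"{rate:.1f} calls per second")
```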
Represents the number of system iget() calls made per second. iget is a file access system routine.
Data Source
The data sources for this metric include the following:
Host | Data Source |
---|---|
Solaris | kernel memory structure (class cpu_vminfo) |
HP | sar command |
IBM AIX | kernel memory structure (class cpu_vminfo) |
User Action
This data is obtained using the OS sar command, which is used to sample cumulative activity counters maintained by the OS. The data is obtained by sampling system counters once in a five-second interval. The results are essentially the number of iget() calls made over this five-second period divided by five.
Represents the number of file system lookuppn() (pathname translation) calls made per second.
Data Source
The data sources for this metric include the following:
Host | Data Source |
---|---|
Solaris | sar command |
HP | sar command |
IBM AIX | sar command |
The OS sar command is used to sample cumulative activity counters maintained by the OS. The data is obtained by sampling system counters once in a five-second interval. The results are essentially the number of lookuppn() calls made over this five-second period divided by five.
The File and Directory Monitoring metric monitors various attributes of specific files and directories. Monitoring starts only when thresholds are set for a specific key value: setting them triggers monitoring of the file or directory named in that key value. The operator must therefore specify key-value-specific thresholds to monitor any file or directory.
The data sources for this metric include the following:
Host | Data Source |
---|---|
Solaris | perl stat command for files; df for directories that are file system mount points; du for directories that are not file system mount points |
HP | perl stat command for files; df for directories that are file system mount points; du for directories that are not file system mount points |
Linux | perl stat command for files; df for directories that are file system mount points; du for directories that are not file system mount points |
HP Tru64 | not available |
IBM AIX | perl stat command for files; df for directories that are file system mount points; du for directories that are not file system mount points |
Windows | not available |
Reports issues encountered in fetching the attributes of the file or directory. Errors encountered in monitoring the files and directories specified by the key value based thresholds are reported.
Note: This metric is not available on IBM AIX.
Metric Summary
The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.
Table 3-15 Metric Summary Table
Target Version | Evaluation and Collection Frequency | Upload Frequency | Operator | Default Warning Threshold | Default Critical Threshold | Consecutive Number of Occurrences Preceding Notification | Alert Text |
---|---|---|---|---|---|---|---|
All Versions | Every 15 Minutes | After Every Sample | != | Not Defined | 0 | 1 | %file_attribute_not_found% . |
Multiple Thresholds
For this metric you can set different warning and critical threshold values for each "File or Directory Name" object.
If warning or critical threshold values are currently set for any "File or Directory Name" object, those thresholds can be viewed on the Metric Detail page for this metric.
To specify or change warning or critical threshold values for each "File or Directory Name" object, use the Edit Thresholds page.
Fetches the octal value of file permissions on the different variations of UNIX operating systems including Linux. Setting a key value specific warning or critical threshold value against this metric would result in the monitoring of a critical file or directory. For example, to monitor the file permissions for file name /etc/passwd, you should set a threshold for /etc/passwd.
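As an illustration of what an octal permission value looks like and how the `!=` comparison against a threshold behaves, here is a minimal Python sketch. The agent itself uses a Perl stat call according to the data source table; the function names below are hypothetical.

```python
import os
import stat


def octal_permissions(path: str) -> str:
    """Return the permission bits of path as an octal string such as '644'."""
    mode = os.stat(path).st_mode
    # S_IMODE keeps only the permission bits (including setuid/setgid/sticky).
    return format(stat.S_IMODE(mode), "o")


def permissions_differ(path: str, expected: str) -> bool:
    """Mirror the metric's != operator: True when permissions differ from expected."""
    return octal_permissions(path) != expected
```

For example, `permissions_differ("/etc/passwd", "644")` would return `True` if the file's mode had been changed from 644, which is the condition that the alert text for this metric reports.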
Metric Summary
The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.
Table 3-16 Metric Summary Table
Target Version | Evaluation and Collection Frequency | Upload Frequency | Operator | Default Warning Threshold | Default Critical Threshold | Consecutive Number of Occurrences Preceding Notification | Alert Text |
---|---|---|---|---|---|---|---|
All Versions | Every 15 Minutes | After Every Sample | != | Not Defined | Not Defined | 1 | Current permissions for %file_name% are %file_permissions%, different from warning (%warning_threshold%) or critical (%critical_threshold%) threshold. |
Multiple Thresholds
For this metric you can set different warning and critical threshold values for each "File or Directory Name" object.
If warning or critical threshold values are currently set for any "File or Directory Name" object, those thresholds can be viewed on the Metric Detail page for this metric.
To specify or change warning or critical threshold values for each "File or Directory Name" object, use the Edit Thresholds page.
Fetches the current size of the given file or directory in megabytes. Setting a key value specific warning or critical threshold value against this metric would result in monitoring of a critical file or directory. For example, to monitor the size of the directory /absolute_directory_path, you should set a threshold for /absolute_directory_path.
Metric Summary
The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.
Table 3-17 Metric Summary Table
Target Version | Evaluation and Collection Frequency | Upload Frequency | Operator | Default Warning Threshold | Default Critical Threshold | Consecutive Number of Occurrences Preceding Notification | Alert Text |
---|---|---|---|---|---|---|---|
All Versions | Every 15 Minutes | After Every Sample | > | Not Defined | Not Defined | 1 | Size of %file_name% is %file_size% MB, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold. |
Multiple Thresholds
For this metric you can set different warning and critical threshold values for each "File or Directory Name" object.
If warning or critical threshold values are currently set for any "File or Directory Name" object, those thresholds can be viewed on the Metric Detail page for this metric.
To specify or change warning or critical threshold values for each "File or Directory Name" object, use the Edit Thresholds page.
Data Source
The data sources for this metric include the following:
Provides the value for the rate at which the file's size is changing. Setting a key value specific warning or critical threshold value against this metric would result in monitoring of the critical file or directory. For example, to monitor the file change rate for the file name /absolute_file_path, the operator should set a threshold for /absolute_file_path.
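The alert text for this metric reports the rate in KB/hour. A rate in that unit can be derived from two size samples, as in the following minimal sketch. The function name is hypothetical, and the use of 1 KB = 1024 bytes is an assumption; the source does not state how the agent performs the calculation.

```python
def size_change_rate_kb_per_hour(previous_bytes: int, current_bytes: int,
                                 elapsed_seconds: float) -> float:
    """Size change between two samples, expressed in KB per hour."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    # Assumption: 1 KB is 1024 bytes.
    delta_kb = (current_bytes - previous_bytes) / 1024
    return delta_kb * 3600 / elapsed_seconds


# Hypothetical samples 15 minutes (900 seconds) apart: the file grew by 512 KB.
print(size_change_rate_kb_per_hour(1_048_576, 1_572_864, 900))  # 2048.0
```

A shrinking file yields a negative rate, which would never cross a threshold evaluated with the `>` operator.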
Metric Summary
The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.
Table 3-18 Metric Summary Table
Target Version | Evaluation and Collection Frequency | Upload Frequency | Operator | Default Warning Threshold | Default Critical Threshold | Consecutive Number of Occurrences Preceding Notification | Alert Text |
---|---|---|---|---|---|---|---|
All Versions | Every 15 Minutes | After Every Sample | > | Not Defined | Not Defined | 1 | %file_name% is growing at the rate of %file_sizechangerate% (KB/hour), crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold. |
Multiple Thresholds
For this metric you can set different warning and critical threshold values for each "File or Directory Name" object.
If warning or critical threshold values are currently set for any "File or Directory Name" object, those thresholds can be viewed on the Metric Detail page for this metric.
To specify or change warning or critical threshold values for each "File or Directory Name" object, use the Edit Thresholds page.
The Filesystems metrics provide information about local file systems on the computer.
Represents the name of the disk device resource.
Metric Summary
The following table shows how often the metric's value is collected.
Target Version | Collection Frequency |
---|---|
All Versions | Every 15 Minutes |
Data Source
The data sources for this metric include the following:
Host | Data Source |
---|---|
Solaris | /etc/mnttab file entries |
HP | bdf command |
Linux | df command |
HP Tru64 | df command |
IBM AIX | /etc/mnttab file entries |
Windows | not available |
Represents the total space (in megabytes) allocated in the file system.
Metric Summary
The following table shows how often the metric's value is collected.
Target Version | Collection Frequency |
---|---|
All Versions | Every 15 Minutes |
Data Source
The data sources for this metric include the following:
Host | Data Source |
---|---|
Solaris | vminfo system |
HP | bdf command |
Linux | df command |
HP Tru64 | df command |
IBM AIX | statvfs() system call |
Windows | not available |
Represents the percentage of free space available in the file system.
Metric Summary
The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.
Table 3-19 Metric Summary Table
Target Version | Evaluation and Collection Frequency | Upload Frequency | Operator | Default Warning Threshold | Default Critical Threshold | Consecutive Number of Occurrences Preceding Notification | Alert Text |
---|---|---|---|---|---|---|---|
All Versions | Every 15 Minutes | After Every 24 Samples | < | 20 | 5 | 1 | Filesystem %keyValue% has %value%%% available space, fallen below warning (%warning_threshold%) or critical (%critical_threshold%) threshold. |
Multiple Thresholds
For this metric you can set different warning and critical threshold values for each "Mount Point" object.
If warning or critical threshold values are currently set for any "Mount Point" object, those thresholds can be viewed on the Metric Detail page for this metric.
To specify or change warning or critical threshold values for each "Mount Point" object, use the Edit Thresholds page.
Data Source
The data sources for this metric include the following:
Host | Data Source |
---|---|
Solaris | statvfs() system call |
HP | bdf command |
Linux | df command |
HP Tru64 | df command |
IBM AIX | statvfs() system call |
Windows | Windows API |
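To illustrate how a percentage of available space can be derived from a statvfs() call and compared against the default thresholds (warning below 20, critical below 5), here is a minimal sketch. It assumes a UNIX-like host, because Python exposes `os.statvfs` only there. The choice of `f_bavail` (blocks available to unprivileged users) rather than `f_bfree` is an assumption, and the function names are hypothetical.

```python
import os


def percent_available(mount_point: str) -> float:
    """Percentage of the file system's blocks available to unprivileged users."""
    st = os.statvfs(mount_point)
    if st.f_blocks == 0:
        return 0.0  # pseudo file systems can report zero total blocks
    return 100.0 * st.f_bavail / st.f_blocks


def severity(value: float, warning: float = 20.0, critical: float = 5.0) -> str:
    """Apply the '<' operator against the critical and warning thresholds."""
    if value < critical:
        return "critical"
    if value < warning:
        return "warning"
    return "clear"


if __name__ == "__main__":
    available = percent_available("/")
    print(f"/ has {available:.1f}% available space: {severity(available)}")
```

Checking the critical threshold first ensures that a value below both thresholds is reported at the higher severity.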
User Action
Use the OS du -k command to check which directories are taking up the most space (du -k|sort -rn).
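Where a scripted equivalent of `du -k|sort -rn` is more convenient, the same idea can be sketched in Python. This is an approximation under stated assumptions: it sums apparent file sizes rather than allocated disk blocks, so its totals can differ from what du reports, and the function name is hypothetical.

```python
import os


def directory_usage_kb(root: str) -> list:
    """Return (size_kb, path) for each immediate subdirectory of root, largest first."""
    results = []
    for entry in os.scandir(root):
        if not entry.is_dir(follow_symlinks=False):
            continue
        total_bytes = 0
        # os.walk does not follow symbolic links to directories by default.
        for dirpath, _dirnames, filenames in os.walk(entry.path):
            for name in filenames:
                try:
                    total_bytes += os.lstat(os.path.join(dirpath, name)).st_size
                except OSError:
                    # The file vanished or is unreadable; skip it.
                    pass
        results.append((total_bytes // 1024, entry.path))
    return sorted(results, reverse=True)
```

Calling `directory_usage_kb("/var")`, for example, would list the subdirectories of /var in descending order of size, so the largest consumers of space appear first.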
The Inventory metric is used for periodic collection of host configuration information. By default, host configuration is collected every 24 hours.
The Kernel Memory metric provides information on kernel memory allocation (KMA) activities.
This metric is available only on Solaris. The data source is the sar command. The data is obtained by sampling system counters once in a five-second interval.
The following table lists the metrics and their descriptions.
Table 3-20 Kernel Memory Metrics
Metric | Description |
---|---|
Failed Requests for Large Kernel Memory | Number of requests for large memory that failed, that is, requests that were not satisfied |
Failed Requests for Oversize Kernel Memory | Number of oversized requests made that could not be satisfied. Oversized memory requests are allocated dynamically so there is no pool for such requests |
Failed Requests for Small Kernel Memory | Number of requests for small memory that failed, that is, requests that were not satisfied |
KMA Available for Large Memory Requests (Bytes) | Amount of memory, in bytes, the kernel memory allocation (KMA) has for the large pool; the pool used for allocating and reserving large memory requests. |
KMA for Oversize Memory Requests (Bytes) | Amount of memory allocated for oversized memory requests |
KMA for Small Memory Requests | Amount of memory, in bytes, the Kernel Memory Allocation has for the small pool; the pool used for allocating and reserving small memory requests |
Memory Allocated for Large Memory Requests (Bytes) | Amount of memory, in bytes, the kernel allocated to satisfy large memory requests |
Memory Allocated for Small Memory Requests (Bytes) | Amount of memory, in bytes, the kernel allocated to satisfy small memory requests |
The Load metric provides information about the number of runnable processes on the system run queue. If this number is greater than the number of CPUs on the system, then excess load exists.
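The comparison between run queue length and CPU count can be sketched as follows. This is a minimal illustration, not the agent's logic: the function names are hypothetical, and `os.getloadavg()` is available only on UNIX-like systems.

```python
import os
from typing import Optional


def excess_load(run_queue_length: float, cpu_count: int) -> float:
    """How far the run queue exceeds the CPU count (0.0 when it does not)."""
    return max(0.0, run_queue_length - cpu_count)


def current_excess_load() -> Optional[float]:
    """Excess load based on the 1-minute load average, or None if unavailable."""
    try:
        one_minute, _five_minute, _fifteen_minute = os.getloadavg()
    except (AttributeError, OSError):
        return None  # the platform does not expose load averages
    return excess_load(one_minute, os.cpu_count() or 1)


print(excess_load(6.0, 4))  # 2.0
```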
Note:
For all target versions, the collection frequency for each metric is every 5 minutes.
The data sources for this metric category include the following:
Host | Data Source |
---|---|
Solaris | kernel statistics |
HP | pstat_getdynamic(), pstat_getprocessor(), pstat_getproc(), pstat_getstatic(), getutent(), pstat_getvminfo() system calls |
Linux | uptime, free, getconf, ps, iostat, sar, w OS commands; /proc/stat |
HP Tru64 | table() system call, uptime, vmstat, psrinfo, ps, who, swapon OS commands |
IBM AIX | oracle_kstat(), getutent(), getproc(), sysconf() system calls |
Windows | performance data counters (unless otherwise noted) |
The following table lists the metrics and their descriptions.
Table 3-21 Load Metrics
Metric | Description |
---|---|
CPU in IO-Wait (%) | |
CPU in System Mode (%) | For UNIX-based platforms, this metric represents the amount of CPU being used in SYSTEM mode as a percentage of total CPU processing power. For Windows, this metric represents the percentage of time the process threads spent executing code in privileged mode. |
CPU in User Mode (%) | For UNIX-based platforms, this metric represents the amount of CPU being used in USER mode as a percentage of total CPU processing power. For Windows, this metric represents the percentage of time the processor spends in the user mode. This metric displays the average busy time as a percentage of the sample time. |
CPU Interrupt Time (%) | See Section 3.16.2, "CPU Interrupt Time (%)". Note: This metric is available only on Windows. |
CPU Queue Length | See Section 3.16.3, "CPU Queue Length". Note: This metric is available only on Windows. |
CPU Utilization (%) | |
Free Memory (%) | Amount of free memory as a percentage of total memory. The data source for Windows host is Windows API. |
Longest Service Time (ms) | Maximum of the average service time of all disks. Units are represented in milliseconds. Note: This metric is not available on Windows. |
Memory Page Scan Rate (per second) | |
Memory Utilization (%) | |
Page Transfers Rate | See Section 3.16.7, "Page Transfers Rate". Note: This metric is available only on Windows. |
Run Queue Length (1 minute average) | See Section 3.16.8, "Run Queue Length (1 minute average)". Note: This metric is not available on Windows. |
Run Queue Length (5 minute average) | See Section 3.16.10, "Run Queue Length (5 minute average)". Note: This metric is not available on Windows. |
Run Queue Length (15 minute average) | See Section 3.16.9, "Run Queue Length (15 minute average)". Note: This metric is not available on Windows. |
Swap Utilization (%) | |
Total Disk I/O Per Second | Rate of I/O (read and write) operations, calculated from all disks. Note: This metric is not available on Windows. |
Total Processes | Total number of processes currently running on the system. |
Total Swap, Kilobytes | Total amount of page file space available to be allocated by processes. Paging files are shared by all processes and the lack of space in paging files can prevent processes from allocating memory. Note: This metric is available only on Windows. The data sources for this metric are Performance Data counters and Windows API GlobalMemoryStatusEx. |
Total Users | Represents the total number of users currently logged into the system. This metric checks the number of users running on the system. Note: This metric is not available on Windows. |
Used Swap, Kilobytes | Size in kilobytes of the page file instance used. Note: This metric is available only on Windows. The data sources for this metric are Performance Data counters and Windows API GlobalMemoryStatusEx. |
Represents the average number of jobs waiting for I/O in the last interval.
Metric Summary
The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.
Table 3-22 Metric Summary Table
Target Version | Evaluation and Collection Frequency | Upload Frequency | Operator | Default Warning Threshold | Default Critical Threshold | Consecutive Number of Occurrences Preceding Notification | Alert Text |
---|---|---|---|---|---|---|---|
All Versions | Every 5 Minutes | After Every Sample | > | 40 | 80 | 6 | CPU I/O Wait is %value%%%, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold. |
User Action
A high percentage of I/O wait can indicate a hardware problem, a slow NFS server, or poor load-balancing among local file systems and disks. Check the system messages log for any hardware errors. Use the iostat -xn command or the nfsstat -c (NFS client-side statistics) command or both to determine which disks or file systems are slow to respond. Check to see if the problem is with one or more swap partitions, as lack of swap or poor disk load balancing can cause these to become overloaded. Depending on the specific problem, fixes may include: NFS client or server tuning, hardware replacement, moving applications to other file systems, adding swap space, or restructuring a file system for better performance.
Represents the percentage of time the processor spends receiving and servicing hardware interrupts during sample intervals. This value is an indirect indicator of the activity of devices that generate interrupts, such as the system clock, the mouse, disk drivers, data communication lines, network interface cards, and other peripheral devices. These devices normally interrupt the processor when they have completed a task or require attention. Normal thread execution is suspended during interrupts. Most system clocks interrupt the processor every 10 milliseconds, creating a background of interrupt activity.
This metric is available only on Windows.
Data Source
The data sources for this metric are Performance Data counters.
Processor Queue Length is the number of ready threads in the processor queue. There is a single queue for processor time even on computers with multiple processors. A sustained processor queue of less than 10 threads per processor is normally acceptable, dependent on the workload.
This metric is available only on Windows.
Data Source
The data sources for this metric are Performance Data counters.
User Action
A consistently high value indicates a number of CPU-bound tasks. This information should be correlated with other metrics such as Page Transfers Rate. Tuning the system, accompanied by additional memory, should help.
For UNIX-based platforms, this metric represents the amount of CPU utilization as a percentage of total CPU processing power available.
For Windows, this metric represents the percentage of time the CPU spends executing a non-idle thread. CPU Utilization (%) is the primary indicator of processor activity.
Metric Summary
The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.
Table 3-23 Metric Summary Table
Target Version | Evaluation and Collection Frequency | Upload Frequency | Operator | Default Warning Threshold | Default Critical Threshold | Consecutive Number of Occurrences Preceding Notification | Alert Text |
---|---|---|---|---|---|---|---|
All Versions | Every 5 Minutes | After Every Sample | > | 80 | 95 | 6 | CPU Utilization is %value%%%, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold. |
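The role of the 'Consecutive Number of Occurrences Preceding Notification' column can be illustrated with a small evaluator that uses the defaults from the table above (operator `>`, warning 80, critical 95, 6 occurrences). This is a sketch under assumptions: the class name and the way streaks are tracked per severity are hypothetical, and the manual does not describe how Enterprise Manager implements the check internally.

```python
import operator
from typing import Callable, Optional


class ThresholdEvaluator:
    """Report a severity only after the comparison holds N consecutive times."""

    def __init__(self, compare: Callable[[float, float], bool],
                 warning: Optional[float], critical: Optional[float],
                 occurrences: int) -> None:
        self.compare = compare
        self.warning = warning      # None stands for "Not Defined"
        self.critical = critical
        self.occurrences = occurrences
        self.warning_streak = 0
        self.critical_streak = 0

    def _next_streak(self, streak: int, value: float,
                     threshold: Optional[float]) -> int:
        # A streak grows while the comparison holds and resets when it fails.
        if threshold is not None and self.compare(value, threshold):
            return streak + 1
        return 0

    def evaluate(self, value: float) -> str:
        self.warning_streak = self._next_streak(
            self.warning_streak, value, self.warning)
        self.critical_streak = self._next_streak(
            self.critical_streak, value, self.critical)
        if self.critical_streak >= self.occurrences:
            return "critical"
        if self.warning_streak >= self.occurrences:
            return "warning"
        return "clear"


cpu = ThresholdEvaluator(operator.gt, warning=80, critical=95, occurrences=6)
samples = [85, 90, 88, 91, 86, 84, 70]  # hypothetical CPU utilization readings
print([cpu.evaluate(sample) for sample in samples])
# ['clear', 'clear', 'clear', 'clear', 'clear', 'warning', 'clear']
```

With a collection every 5 minutes and 6 required occurrences, a warning in this sketch is raised only after roughly half an hour of sustained utilization above the threshold, which filters out short spikes.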
For UNIX-based systems, this metric represents the number of pages per second scanned by the page stealing daemon.
For Windows, this metric represents the rate at which pages are read from or written to disk to resolve hard page faults. The metric is a primary indicator of the kinds of faults that cause system-wide delays.
Metric Summary
The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.
Table 3-24 Metric Summary Table
Target Version | Evaluation and Collection Frequency | Upload Frequency | Operator | Default Warning Threshold | Default Critical Threshold | Consecutive Number of Occurrences Preceding Notification | Alert Text |
---|---|---|---|---|---|---|---|
All Versions | Every 5 Minutes | After Every Sample | > | Not Defined | Not Defined | 6 | Page scan rate is %value% /sec, crossed warning (%warning_threshold% /sec) or critical (%critical_threshold% /sec) threshold. |
User Action
If this number is zero or close to zero, the system has sufficient memory. If the scan rate is consistently high, adding memory should help.
Represents the amount of memory in use as a percentage of total memory.
Metric Summary
The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.
Table 3-25 Metric Summary Table
Target Version | Evaluation and Collection Frequency | Upload Frequency | Operator | Default Warning Threshold | Default Critical Threshold | Consecutive Number of Occurrences Preceding Notification | Alert Text |
---|---|---|---|---|---|---|---|
All Versions | Every 5 Minutes | After Every Sample | > | 99 | Not Defined | 6 | Memory Utilization is %value%%%, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold. |
Data Source
For the Windows host, the data source is the Windows API.
Indicates the rate at which pages are read from or written to disk to resolve hard page faults. It is a primary indicator of the kinds of faults that cause systemwide delays. It is counted in numbers of pages. It includes pages retrieved to satisfy faults in the file system cache (usually requested by applications) and in non-cached mapped memory files.
This metric is available only on Windows.
Data Source
The data sources for this metric are Windows Performance counters.
User Action
High transfer rates indicate memory contention. Adding memory would help.
Represents the average number of processes in memory and subject to be run in the last interval. This metric checks the run queue.
This metric is not available on Windows.
User Action
Check the load on the system using the UNIX uptime or top commands. Also, check for processes using too much CPU time by using the top and ps -ef commands. Note that the issue may be a large number of instances of one or more processes, rather than a few processes each taking up a large amount of CPU time. Kill processes using excessive CPU time.
Represents the average number of processes in memory and subject to be run in the last interval. This metric checks the run queue.
This metric is not available on Windows.
User Action
Check the load on the system using the UNIX uptime or top commands. Also, check for processes using too much CPU time by using the top and ps -ef commands. Note that the issue may be a large number of instances of one or more processes, rather than a few processes each taking up a large amount of CPU time. Kill processes using excessive CPU time.
Represents the average number of processes in memory and subject to be run in the last interval. This metric checks the run queue.
This metric is not available on Windows.
Metric Summary
The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.
Table 3-26 Metric Summary Table
Target Version | Evaluation and Collection Frequency | Upload Frequency | Operator | Default Warning Threshold | Default Critical Threshold | Consecutive Number of Occurrences Preceding Notification | Alert Text |
---|---|---|---|---|---|---|---|
All Versions | Every 5 Minutes | After Every Sample | > | 10 | 20 | 6 | CPU Load is %value%, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold. |
User Action
Check the load on the system using the UNIX uptime or top commands. Also, check for processes using too much CPU time by using the top and ps -ef commands. Note that the issue may be a large number of instances of one or more processes, rather than a few processes each taking up a large amount of CPU time. Kill processes using excessive CPU time.
For UNIX-based platforms, this metric represents the percentage of swapped memory in use for the last interval.
For Windows, this metric represents the percentage of page file instance used.
Metric Summary
The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.
Table 3-27 Metric Summary Table
Target Version | Evaluation and Collection Frequency | Upload Frequency | Operator | Default Warning Threshold | Default Critical Threshold | Consecutive Number of Occurrences Preceding Notification | Alert Text |
---|---|---|---|---|---|---|---|
All Versions | Every 5 Minutes | After Every Sample | > | 80 | 95 | 6 | Swap Utilization is %value%%%, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold. |
Data Source
The data sources for the Windows host are Windows API and performance data counters.
User Action
For UNIX-based platforms, check the swap usage using the UNIX top command or the Solaris swap -l command. Additional swap can be added to an existing file system by creating a swap file and then adding the file to the system swap pool. (See the documentation for your UNIX OS.) If swap is mounted on /tmp, space can be freed by removing any junk files in /tmp. If it is not possible to add file system swap or free up enough space, additional swap will have to be added by adding a raw disk partition to the swap pool. See the UNIX documentation for procedures.
For Windows, check the page file usage and add an additional page file if current limits are insufficient.
The Log File Monitoring metric allows the operator to monitor one or more log files for the occurrence of one or more Perl patterns in the content. In addition, the operator can specify a Perl pattern to be ignored for the log file. Periodic scanning is performed against new content added since the last scan. Lines matching the ignore pattern are discarded first; lines matching the specified match patterns then result in one record being uploaded to the repository for each pattern. The user can set a threshold against the number of lines matching the given pattern. File rotation will be handled within the given file.
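The scanning order described above (read only new content, drop ignored lines, then count matches per pattern) can be sketched as follows. This is an illustration under assumptions, not the Oracle-provided program: it uses Python's `re` module, whose syntax overlaps with but is not identical to Perl patterns, the function names are hypothetical, and treating a shrunken file as rotated is an assumed policy.

```python
import os
import re


def read_new_content(path: str, last_offset: int):
    """Return (lines added since last_offset, new offset)."""
    if os.path.getsize(path) < last_offset:
        # Assumption: a smaller file means it was rotated or truncated,
        # so scanning restarts from the beginning.
        last_offset = 0
    with open(path, "rb") as handle:
        handle.seek(last_offset)
        data = handle.read()
    lines = data.decode("utf-8", errors="replace").splitlines()
    return lines, last_offset + len(data)


def scan_new_lines(lines, match_patterns, ignore_pattern=None):
    """Count matching lines per pattern after discarding ignored lines."""
    ignore = re.compile(ignore_pattern) if ignore_pattern else None
    compiled = {pattern: re.compile(pattern) for pattern in match_patterns}
    counts = {pattern: 0 for pattern in match_patterns}
    for line in lines:
        if ignore and ignore.search(line):
            continue  # the ignore pattern takes precedence
        for pattern, regex in compiled.items():
            if regex.search(line):
                counts[pattern] += 1
    return counts


sample = ["ERROR disk full", "ERROR self-test", "WARN low space", "INFO ok"]
print(scan_new_lines(sample, ["ERROR", "WARN"], ignore_pattern="self-test"))
# {'ERROR': 1, 'WARN': 1}
```

Keeping the byte offset between scans is what limits each pass to content added since the previous one.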
Returns the actual content if the given file has been specifically registered for content uploading, else it will return the count of lines that matched the pattern specified.
The operator can list the names of files or directories that must never be monitored in the <EMDROOT>/sysman/config/lfm_efiles file. The operator can list the names of the files or directories whose contents can be uploaded into Oracle Management Repository in the <EMDROOT>/sysman/config/lfm_ifiles file.
Metric Summary
The following table shows how often the metric's value is collected.
Target Version | Collection Frequency |
---|---|
All Versions | Every 15 Minutes |
Data Source
An Oracle-provided Perl program that scans files for the occurrence of user-specified Perl patterns.
Returns the number of lines matching the pattern specified in the given file. Setting warning or critical thresholds against this column for a specific {log file name, match pattern in perl, ignore pattern in perl} triggers the monitoring of specified criteria against the given log file.
Metric Summary
The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.
Table 3-28 Metric Summary Table
Target Version | Evaluation and Collection Frequency | Upload Frequency | Operator | Default Warning Threshold | Default Critical Threshold | Consecutive Number of Occurrences Preceding Notification | Alert Text |
---|---|---|---|---|---|---|---|
All Versions | Every 15 Minutes | After Every Sample | > | 0 | Not Defined | 1* | %log_file_message% Crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold. |
* Once an alert is triggered for this metric, it must be manually cleared.
Multiple Thresholds
For this metric you can set different warning and critical threshold values for each unique combination of "Log File Name", "Match Pattern in Perl", "Ignore Pattern in Perl", and "Time Stamp" objects.
If warning or critical threshold values are currently set for any unique combination of "Log File Name", "Match Pattern in Perl", "Ignore Pattern in Perl", and "Time Stamp" objects, those thresholds can be viewed on the Metric Detail page for this metric.
To specify or change warning or critical threshold values for each unique combination of "Log File Name", "Match Pattern in Perl", "Ignore Pattern in Perl", and "Time Stamp" objects, use the Edit Thresholds page.
Data Source
An Oracle-supplied Perl program monitors the log files for user-specified criteria.
The Memory Devices metric monitors the status of memory devices configured in the system.
This metric is available only on Dell PowerEdge Linux systems.
The following table lists the metrics, descriptions, and data sources.
Note:
For all target versions, the collection frequency for each metric is every 15 minutes.
Table 3-29 Memory Devices Metrics
Metric | Description | Data Source (SNMP MIB Object) |
---|---|---|
Bank Location | Bank location name of the memory device, when applicable | memoryDeviceBankLocationName (1.3.6.1.4.1.674.10892.1.1100.50.1.10) |
Location | Location name of the memory device, for example, "DIMM A". | memoryDeviceLocationName (1.3.6.1.4.1.674.10892.1.1100.50.1.8) |
Memory | Section 3.18.1, "Memory Status" | |
Size (MB) | Size, in kilobytes, of the memory device | memoryDeviceSize (1.3.6.1.4.1.674.10892.1.1100.50.1.14) |
Type | Type of the memory device | memoryDeviceType (1.3.6.1.4.1.674.10892.1.1100.50.1.7) |
Represents the status of the memory device.
This metric is available only on Dell PowerEdge Linux systems.
The following table lists the possible values for this metric and their meaning.
Metric Value | Meaning (per SNMP MIB) |
---|---|
1 | Other (not one of the following) |
2 | Unknown |
3 | Normal |
4 | Warning |
5 | Critical |
6 | Non-Recoverable |
Metric Summary
The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.
Table 3-30 Metric Summary Table
Target Version | Evaluation and Collection Frequency | Upload Frequency | Operator | Default Warning Threshold | Default Critical Threshold | Consecutive Number of Occurrences Preceding Notification | Alert Text |
---|---|---|---|---|---|---|---|
All Versions | Every 15 Minutes | Not Uploaded | >= | 4 | 5 | 1 | Status of Memory at bank location %MemoryBankLocation% and location %MemoryLocation% is %value%, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold. |
Multiple Thresholds
For this metric you can set different warning and critical threshold values for each unique combination of "Chassis" and "Index" objects.
If warning or critical threshold values are currently set for any unique combination of "Chassis" and "Index" objects, those thresholds can be viewed on the Metric Detail page for this metric.
To specify or change warning or critical threshold values for each unique combination of "Chassis" and "Index" objects, use the Edit Thresholds page.
Data Source
SNMP MIB object: memoryDeviceStatus (1.3.6.1.4.1.674.10892.1.1100.50.1.5)
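As an illustration of how the status values and default thresholds above fit together, the following Python sketch maps a status value to its meaning and applies the `>=` operator with the default warning (4) and critical (5) thresholds. The function name `classify_status` and the severity label "clear" are hypothetical and are not part of Enterprise Manager.

```python
# Meanings of the status values, as listed in the table above.
STATUS_MEANINGS = {
    1: "Other",
    2: "Unknown",
    3: "Normal",
    4: "Warning",
    5: "Critical",
    6: "Non-Recoverable",
}

WARNING_THRESHOLD = 4   # default warning threshold from the Metric Summary table
CRITICAL_THRESHOLD = 5  # default critical threshold from the Metric Summary table


def classify_status(value):
    """Return (meaning, severity) for a status value, comparing with >=.

    "clear" is a hypothetical label for values below both thresholds.
    """
    meaning = STATUS_MEANINGS.get(value, "Undefined")
    if value >= CRITICAL_THRESHOLD:
        severity = "critical"
    elif value >= WARNING_THRESHOLD:
        severity = "warning"
    else:
        severity = "clear"
    return meaning, severity


if __name__ == "__main__":
    for status in sorted(STATUS_MEANINGS):
        print(status, classify_status(status))
```

Because the operator is `>=`, a value of 6 (Non-Recoverable) also crosses the critical threshold in this sketch.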
The Message and Semaphore Activity metric provides information about the message and semaphore activity of the host system being monitored.
The data sources for this metric include the following:
Host | Data Source |
---|---|
Solaris | sar command |
HP | sar command |
Linux | not available |
HP Tru64 | ipcs command |
IBM AIX | sar command |
Windows | not available |
The following table lists the metrics and their descriptions.
Table 3-31 Message and Semaphore Activity
Metric | Description |
---|---|
msgrcv() System Calls (per second) | Number of msgrcv system calls made per second. The msgrcv system call reads a message from a message queue into a user-supplied buffer. |
semop() System Calls (per second) | Number of semop system calls made per second. The semop system call performs semaphore operations on a set of semaphores. |
The Network Interfaces metric includes input errors and interface collisions on the network interface. The following network interfaces are supported: le, hme, qfe, ge, and fddi.
Note:
For all target versions, the collection frequency for each metric is every 5 minutes.
Data Source
The data sources for the metrics in this category include the following:
Host | Data Source |
---|---|
Solaris | kernel memory structures (kstat) |
HP | netstat, lanscan, and lanadmin commands |
Linux | netstat command and /proc/net/dev |
HP Tru64 | netstat command |
IBM AIX | oracle_kstat() system call |
Windows | not available |
User Action
Use the OS netstat -i command to check the performance of the interface. Also, check the system messages file for messages relating to duplex setting by using the OS grep -i command and searching for the word 'duplex'.
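On Linux, where /proc/net/dev is one of the listed data sources, the per-interface error and collision counters can also be read directly. The following Python sketch parses text in the /proc/net/dev layout. The function name `parse_proc_net_dev` and the sample data are hypothetical, and the sketch only illustrates the file format; it is not how the Management Agent collects the metric.

```python
def parse_proc_net_dev(text):
    """Parse the contents of /proc/net/dev into per-interface counters."""
    stats = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        name, counters = line.split(":", 1)
        fields = counters.split()
        if len(fields) < 16:            # 8 receive columns + 8 transmit columns
            continue
        stats[name.strip()] = {
            "rx_bytes": int(fields[0]),
            "rx_errs": int(fields[2]),
            "tx_bytes": int(fields[8]),
            "tx_errs": int(fields[10]),
            "collisions": int(fields[13]),
        }
    return stats


# Hypothetical sample in the /proc/net/dev layout.
SAMPLE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:  1000      10    0    0    0     0          0         0     1000      10    0    0    0     0       0          0
  eth0: 52000     400    3    0    0     0          0         5    31000     250    1    0    0     2       0          0
"""

if __name__ == "__main__":
    for iface, counters in parse_proc_net_dev(SAMPLE).items():
        print(iface, counters)
```

On a live Linux host, the same function could be given the text read from /proc/net/dev. The counters are cumulative, so two samples are needed to derive a rate.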
Metrics and Descriptions
The following table lists the metrics and their descriptions.
Table 3-32 Network Interfaces Metrics
Metric | Description |
---|---|
Network Interface Input Errors (%) | Number of input errors per second encountered on the device, that is, receptions that failed because of hardware or network errors. This metric checks the rate of input errors on the network interface specified by the network device names parameter, such as le0 or * (for all network interfaces). |
Network Interface Collisions (%) | Number of collisions per second. This metric checks the rate of collisions on the network interface specified by the network device names parameter, such as le0 or * (for all network interfaces). |
Network Interface Combined Utilization (%) | See Section 3.20.1, "Network Interface Combined Utilization (%)" |
Network Interface Output Errors (%) | Number of output errors per second. This metric checks the rate of output errors on the network interface specified by the network device names parameter, such as le0 or * (for all network interfaces). |
Network Interface Read (MB/s) | Number of megabytes read per second from the specific interface |
Network Interface Read Utilization (%) | Network bandwidth being used for reading from the network, as a percentage of total read capacity |
Network Interface Total Error Rate (%) | See Section 3.20.2, "Network Interface Total Error Rate (%)" |
Network Interface Total I/O Rate (MB/sec) | See Section 3.20.3, "Network Interface Total I/O Rate (MB/sec)" |
Network Interface Write (MB/s) | Number of megabytes written per second to the specific interface |
Network Interface Write Utilization (%) | Network bandwidth being used for writing to the network, as a percentage of total write capacity |
Represents the percentage of network bandwidth being used by reading and writing from and to the network for full-duplex network connections.
Metric Summary
The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.
Table 3-33 Metric Summary Table
Target Version | Evaluation and Collection Frequency | Upload Frequency | Operator | Default Warning Threshold | Default Critical Threshold | Consecutive Number of Occurrences Preceding Notification | Alert Text |
---|---|---|---|---|---|---|---|
All Versions | Every 5 Minutes | After Every Sample | > | Not Defined | Not Defined | 6 | Network utilization for %keyvalue% is %value%%%, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold. |
Multiple Thresholds
For this metric you can set different warning and critical threshold values for each "Network Interface Name" object.
If warning or critical threshold values are currently set for any "Network Interface Name" object, those thresholds can be viewed on the Metric Detail page for this metric.
To specify or change warning or critical threshold values for each "Network Interface Name" object, use the Edit Thresholds page.
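To illustrate the 'Consecutive Number of Occurrences Preceding Notification' column, the following Python sketch reports an alert only after the configured number of consecutive samples exceed a threshold. The function name `should_alert` and the threshold of 80 are hypothetical (no default thresholds are defined for this metric), and the sketch is an interpretation of the column description, not Enterprise Manager's implementation.

```python
def should_alert(samples, threshold, occurrences):
    """Return True once `occurrences` consecutive samples exceed `threshold`."""
    streak = 0
    for value in samples:
        if value > threshold:      # the '>' operator from the summary table
            streak += 1
            if streak >= occurrences:
                return True
        else:
            streak = 0             # a sample within bounds resets the count
    return False


if __name__ == "__main__":
    # Hypothetical utilization samples (%), one per 5-minute collection.
    steady_high = [90, 91, 92, 93, 94, 95]
    interrupted = [90, 90, 90, 70, 90, 90, 90, 90, 90]
    print(should_alert(steady_high, threshold=80, occurrences=6))
    print(should_alert(interrupted, threshold=80, occurrences=6))
```

With a 5-minute collection frequency and 6 consecutive occurrences, a condition would need to persist for roughly 30 minutes before an alert is raised under this interpretation.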
Represents the number of total errors per second, encountered on the network interface. It is the rate of read and write errors encountered on the network interface.
Metric Summary
The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.
Table 3-34 Metric Summary Table
Target Version | Evaluation and Collection Frequency | Upload Frequency | Operator | Default Warning Threshold | Default Critical Threshold | Consecutive Number of Occurrences Preceding Notification | Alert Text |
---|---|---|---|---|---|---|---|
All Versions | Every 5 Minutes | After Every Sample | > | Not Defined | Not Defined | 6 | Network Error Rate for %keyvalue% is %value%%%, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold. |
Multiple Thresholds
For this metric you can set different warning and critical threshold values for each "Network Interface Name" object.
If warning or critical threshold values are currently set for any "Network Interface Name" object, those thresholds can be viewed on the Metric Detail page for this metric.
To specify or change warning or critical threshold values for each "Network Interface Name" object, use the Edit Thresholds page.
Data Source
It is computed as the sum of Network Interface Input Errors (%) and Network Interface Output Errors (%).
Represents the total I/O rate on the network interface. It is measured as the sum of Network Interface Read (MB/s) and Network Interface Write (MB/s).
Metric Summary
The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.
Table 3-35 Metric Summary Table
Target Version | Evaluation and Collection Frequency | Upload Frequency | Operator | Default Warning Threshold | Default Critical Threshold | Consecutive Number of Occurrences Preceding Notification | Alert Text |
---|---|---|---|---|---|---|---|
All Versions | Every 5 Minutes | After Every Sample | > | Not Defined | Not Defined | 6 | Network I/O Rate for %keyvalue% is %value%MB/Sec, crossed warning (%warning_threshold%MB/Sec) or critical (%critical_threshold%MB/Sec) threshold. |
Multiple Thresholds
For this metric you can set different warning and critical threshold values for each "Network Interface Name" object.
If warning or critical threshold values are currently set for any "Network Interface Name" object, those thresholds can be viewed on the Metric Detail page for this metric.
To specify or change warning or critical threshold values for each "Network Interface Name" object, use the Edit Thresholds page.
Data Source
It is computed as the sum of Network Interface Read (MB/s) and Network Interface Write (MB/s).
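The sum described above can be sketched in Python as follows. The helper `rate_mb_per_sec`, which derives a rate from two cumulative byte counters, is an assumption added for illustration, as is the choice of 1 MB = 2**20 bytes; the source states only that the total is the sum of the read and write rates.

```python
MB = 1024 * 1024  # assumption: 1 MB = 2**20 bytes


def rate_mb_per_sec(prev_bytes, curr_bytes, interval_sec):
    """Convert two cumulative byte counters into a rate in MB per second."""
    return (curr_bytes - prev_bytes) / MB / interval_sec


def total_io_rate(read_mb_per_sec, write_mb_per_sec):
    """Total I/O rate: sum of the read rate and the write rate (MB/s)."""
    return read_mb_per_sec + write_mb_per_sec


if __name__ == "__main__":
    interval = 300  # seconds, matching the 5-minute collection frequency
    read_rate = rate_mb_per_sec(0, 300 * MB, interval)
    write_rate = rate_mb_per_sec(0, 150 * MB, interval)
    print(total_io_rate(read_rate, write_rate))
```

The Network Interface Total Error Rate (%) metric is derived the same way, as the sum of the input and output error metrics.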
The Paging Activity metric provides the amount of paging activity on the system.
Note:
For all target versions, the collection frequency for each metric is every 15 minutes.
Data Source
The data sources for this metric category include the following:
Host | Data Source |
---|---|
Solaris | kernel statistics (class misc cpu_stat) |
HP | pstat_getvminfo() system call |
Linux | sar command |
HP Tru64 | table() system call and vmstat command |
IBM AIX | oracle_kstat() system call |
Windows | performance data counters |
Metrics and Descriptions
The following table lists the metrics and their descriptions:
Table 3-36 Paging Activity Metrics
Metric | Description |
---|---|
Address Translation Page Faults (per second) |
Minor page faults by way of hat_fault() per second. This metric checks the number of faults for the CPU(s) specified by the Host CPU(s) parameter, such as cpu_stat0 or * (for all CPUs on the system). Note: This metric is not available on Linux and Windows. |
Cache Faults |
Rate at which faults occur when a page sought in the file system cache is not found and must be retrieved from elsewhere in memory (a soft fault) or from disk (a hard fault). The file system cache is an area of physical memory that stores recently used pages of data for applications. Cache activity is a reliable indicator of most application I/O operations. This metric shows the number of faults, without regard for the number of pages faulted in each operation. Note: This metric is available only on Windows. |
Copy-on-write Faults (per second) |
Rate at which page faults are caused by attempts to write that have been satisfied by copying the page from elsewhere in physical memory. This is an economical way of sharing data because pages are copied only when they are written to; otherwise, the page is shared. This metric shows the number of copies, without regard for the number of pages copied in each operation. Note: This metric is available only on Windows. |
Demand Zero Faults (per second) |
Rate at which a zeroed page is required to satisfy the fault. Zeroed pages, pages emptied of previously stored data and filled with zeros, are a security feature of Windows that prevent processes from seeing data stored by earlier processes that used the memory space. Windows maintains a list of zeroed pages to accelerate this process. This metric shows the number of faults, without regard to the number of pages retrieved to satisfy the fault. Note: This metric is available only on Windows. |
igets with Page Flushes (%) |
Represents the percentage of UFS inodes taken off the freelist by iget which had reusable pages associated with them. These pages are flushed and cannot be reclaimed by processes. Note: This metric is available on Solaris, HP, and IBM AIX. |
Page Faults (per second) |
Average number of pages faulted per second. Because only one page is faulted in each fault operation, this value also equals the number of page fault operations per second. This metric includes both hard faults (those that require disk access) and soft faults (where the faulted page is found elsewhere in physical memory). Most processors can handle large numbers of soft faults without significant consequence. However, hard faults, which require disk access, can cause significant delays. Note: This metric is available only on Windows. |
Page Faults from Software Lock Requests |
Represents the number of protection faults per second. These faults occur when a program attempts to access memory it should not access, receives a segmentation violation signal, and dumps a core file. This metric checks the number of faults for the CPU(s) specified by the Host CPU(s) parameter, such as cpu_stat0 or * (for all CPUs on the system). Note: This metric is not available on Linux or Windows. |
Page-in Requests (per second) |
For UNIX-based systems, represents the number of page read-ins per second (reads from disk to resolve faulted memory references) by the virtual memory manager. Along with Page Outs, this statistic represents the amount of real I/O initiated by the virtual memory manager. This metric checks the number of page read-ins for the CPU(s) specified by the Host CPU(s) parameter, such as cpu_stat0 or * (for all CPUs on the system). For Windows, this metric is the rate at which the disk was read to resolve hard page faults. It shows the number of read operations, without regard to the number of pages retrieved in each operation. Hard page faults occur when a process references a page in virtual memory that is not in its working set or elsewhere in physical memory, and must be retrieved from disk. This metric is a primary indicator of the kinds of faults that cause systemwide delays. It includes read operations to satisfy faults in the file system cache (usually requested by applications) and in non-cached mapped memory files. Note: This metric is not available on Linux. |
Page-out Requests (per second) |
For UNIX-based systems, represents the number of page write outs to disk per second. This metric checks the number of page write outs for the CPU(s) specified by the Host CPU(s) parameter, such as cpu_stat0 or * (for all CPUs on the system). For Windows, this metric is the rate at which pages are written to disk to free up space in physical memory. Pages are written to disk only if they are changed while in physical memory, so they are likely to hold data, not code. This metric shows write operations, without regard to the number of pages written in each operation. Note: This metric is not available on Linux. |
Pages Paged-in (per second) |
For UNIX-based systems, represents the number of pages paged in (read from disk to resolve faulted memory references) per second. This metric checks the number of pages paged in for the CPU(s) specified by the Host CPU(s) parameter, such as cpu_stat0 or * (for all CPUs on the system). For Windows, this metric is the rate at which pages are read from disk to resolve hard page faults. Hard page faults occur when a process refers to a page in virtual memory that is not in its working set or elsewhere in physical memory, and must be retrieved from disk. When a page is faulted, the system tries to read multiple contiguous pages into memory to maximize the benefit of the read operation. |
Pages Paged-out (per second) |
For UNIX-based systems, represents the number of pages written out (per second) by the virtual memory manager. Along with Page Outs, this statistic represents the amount of real I/O initiated by the virtual memory manager. This metric checks the number of pages paged out for the CPU(s) specified by the Host CPU(s) parameter, such as cpu_stat0 or * (for all CPUs on the system). For Windows, this metric is the rate at which pages are written to disk to free up space in physical memory. Pages are written back to disk only if they are changed in physical memory, so they are likely to hold data, not code. A high rate of pages output might indicate a memory shortage. Windows writes more pages back to disk to free up space when physical memory is in short supply. |
Pages Put on Freelist by Page Stealing Daemon (per second) |
Number of pages per second that the pageout daemon (also called the page stealing daemon) determines to be unused and puts on the list of free pages. Note: This metric is not available on Linux or Windows. |
Pages Scanned by Page Stealing Daemon (per second) |
Represents the scan rate, that is, the number of pages per second scanned by the page stealing daemon. If this number is zero or close to zero, the system has sufficient memory. If the number is consistently high, adding memory is likely to help. Note: This metric is not available on Linux or Windows. |
Transition Faults (per second) |
Rate at which page faults are resolved by recovering pages that were being used by another process sharing the page, or were on the modified page list or the standby list, or were being written to disk at the time of the page fault. The pages were recovered without additional disk activity. Transition faults are counted in numbers of faults; because only one page is faulted in each operation, it is also equal to the number of pages faulted. Note: This metric is available only on Windows. |
The Peripheral Component Interconnect (PCI) Devices metric monitors the status of PCI devices.
This metric is available only on Dell PowerEdge Linux systems.
Note:
For all target versions, the collection frequency for each metric is every 15 minutes.
The following table lists the metrics, their descriptions, and data sources.
Table 3-37 PCI Devices Metrics
Metric | Description | Data Source (SNMP MIB Object) |
---|---|---|
Description | Descriptive name of the Dell Peripheral Component Interconnect (PCI) Device | pCIDeviceDescriptionName (1.3.6.1.4.1.674.10892.1.1100.80.1.9) |
Manufacturer | Name of the Dell Peripheral Component Interconnect (PCI) Device manufacturer | pCIDeviceManufacturerName (1.3.6.1.4.1.674.10892.1.1100.80.1.8) |
PCI Device Status |
Represents the status of the Dell Peripheral Component Interconnect (PCI) Device.
This metric is available only on Dell PowerEdge Linux systems.
The following table lists the possible values for this metric and their meaning.
Metric Value | Meaning (per SNMP MIB) |
---|---|
1 | Other (not one of the following) |
2 | Unknown |
3 | Normal |
4 | Warning |
5 | Critical |
6 | Non-Recoverable |
Metric Summary
The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.
Table 3-38 Metric Summary Table
Target Version | Evaluation and Collection Frequency | Upload Frequency | Operator | Default Warning Threshold | Default Critical Threshold | Consecutive Number of Occurrences Preceding Notification | Alert Text |
---|---|---|---|---|---|---|---|
All Versions | Every 15 Minutes | Not Uploaded | >= | 4 | 5 | 1 | Status of PCIDevice %PCIDeviceIndex% in chassis %ChassisIndex% is %value%, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold. |
Multiple Thresholds
For this metric you can set different warning and critical threshold values for each unique combination of "Chassis Index", "PCI Device Index", and "System Slot Index" objects.
If warning or critical threshold values are currently set for any unique combination of "Chassis Index", "PCI Device Index", and "System Slot Index" objects, those thresholds can be viewed on the Metric Detail page for this metric.
To specify or change warning or critical threshold values for each unique combination of "Chassis Index", "PCI Device Index", and "System Slot Index" objects, use the Edit Thresholds page.
Data Source
SNMP MIB object: pCIDeviceStatus (1.3.6.1.4.1.674.10892.1.1100.80.1.5)
The Power Supplies metric monitors the status of various power supplies present in the host system.
This metric is available only on Dell PowerEdge Linux systems.
Note:
For all target versions, the collection frequency for each metric is every 15 minutes.
The following table lists the metrics, their descriptions, and data sources.
Table 3-39 Power Supplies Metrics
Metric | Description | Data Source (SNMP MIB Object) |
---|---|---|
Location | Location name of the power supply | powerSupplyLocationName (1.3.6.1.4.1.674.10892.1.600.12.1.8) |
Output (Tenths of Watts) | Maximum sustained output wattage of the power supply, in tenths of watts | powerSupplyOutputWatts (1.3.6.1.4.1.674.10892.1.600.12.1.6) |
Power Supply Status |
Represents the status of the power supply.
This metric is available only on Dell PowerEdge Linux systems.
The following table lists the possible values for this metric and their meaning.
Metric Value | Meaning (per SNMP MIB) |
---|---|
1 | Other (not one of the following) |
2 | Unknown |
3 | Normal |
4 | Warning |
5 | Critical |
6 | Non-Recoverable |
Metric Summary
The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.
Table 3-40 Metric Summary Table
Target Version | Evaluation and Collection Frequency | Upload Frequency | Operator | Default Warning Threshold | Default Critical Threshold | Consecutive Number of Occurrences Preceding Notification | Alert Text |
---|---|---|---|---|---|---|---|
All Versions | Every 15 Minutes | Not Uploaded | >= | 4 | 5 | 1 | Status of Power Supply %PSIndex% in chassis %ChassisIndex% is %value%, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold. |
Multiple Thresholds
For this metric you can set different warning and critical threshold values for each unique combination of "Chassis Index" and "Power Supply Index" objects.
If warning or critical threshold values are currently set for any unique combination of "Chassis Index" and "Power Supply Index" objects, those thresholds can be viewed on the Metric Detail page for this metric.
To specify or change warning or critical threshold values for each unique combination of "Chassis Index" and "Power Supply Index" objects, use the Edit Thresholds page.
Data Source
SNMP MIB object: powerSupplyStatus (1.3.6.1.4.1.674.10892.1.600.12.1.5)
The Process, Inode, File Tables Stats metric provides information about the process, inode, and file tables status.
Data Source
The data sources for this metric category include the following:
Host | Data Source |
---|---|
Solaris | sar command |
HP | sar command |
Linux | sar command, for example, sar -v |
HP Tru64 | table() system call |
IBM AIX | sar command |
Windows | not available |
The OS sar command is used to sample cumulative activity counters maintained by the OS. The data is obtained by sampling system counters once in a five-second interval.
Metrics and Descriptions
The following table lists the metrics and their descriptions.
Table 3-41 Process, Inode, File Tables Statistics Metrics
Metric | Description |
---|---|
File Table Overflow Occurrences |
Number of times the system file table overflowed, that is, the number of times that the OS could not find any available entries in the table in the sampling period chosen to collect the data. Note: This metric is not available on Linux or Windows. |
Inode Table Overflow Occurrences |
Number of times the inode table overflowed, that is, the number of times the OS could not find any available inode table entries. Note: This metric is not available on Linux or Windows. |
Maximum Size of Inode Table |
Maximum size of the inode table. Note: This metric is not available on Linux or Windows. |
Maximum Size of Process Table |
Maximum size of the process table. Note: This metric is not available on Linux or Windows. |
Maximum Size of System File Table |
Maximum size of the system file table. Note: This metric is not available on Linux or Windows. |
Number of Allocated Disk Quota Entries |
Number of allocated disk quota entries. Note: This metric is available only on Linux. |
Number of Queued RT Signals |
Number of queued RT signals. Note: This metric is available only on Linux. |
Number of Super Block Handlers Allocated |
Number of allocated super block handlers. Note: This metric is available only on Linux. |
Number of Used File Handles |
Current size of the system file table. |
Percentage of Allocated Disk Quota Entries |
Percentage of allocated disk quota entries against the maximum number of cached disk quota entries that can be allocated. Note: This metric is available only on Linux. |
Percentage of Allocated Super Block Handlers |
Percentage of allocated super block handlers against the maximum number of super block handlers that Linux can allocate. Note: This metric is available only on Linux. |
Percentage of Queued RT Signals |
Percentage of queued RT signals. Note: This metric is available only on Linux. |
Percentage of Used File Handles |
Percentage of used file handles against the maximum number of file handles that the Linux kernel can allocate. Note: This metric is available only on Linux. |
Process Table Overflow Occurrences |
Number of times the process table overflowed, that is, the number of times the OS could not find any available process table entries in a five-second interval. Note: This metric is not available on Linux or Windows. |
Size of Inode Table |
Current size of the inode table. |
Size of Process Table |
Current size of the process table. Note: This metric is not available on Linux or Windows. |
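The percentage metrics in this table compare a current count against a kernel maximum. As a rough illustration for Percentage of Used File Handles, the following Python sketch computes the ratio from text in the layout of the Linux /proc/sys/fs/file-nr file (allocated handles, unused allocated handles, maximum handles). Reading this file is an assumption made for the example; the documented data source on Linux is the sar command, and the function name `percent_used_file_handles` is hypothetical.

```python
def percent_used_file_handles(file_nr_text):
    """Compute used file handles as a percentage of the kernel maximum.

    `file_nr_text` is expected in the /proc/sys/fs/file-nr layout:
    "<allocated> <allocated but unused> <maximum>".
    """
    allocated, unused, maximum = (int(x) for x in file_nr_text.split()[:3])
    used = allocated - unused
    return 100.0 * used / maximum


if __name__ == "__main__":
    # Hypothetical sample values.
    print(percent_used_file_handles("2048 0 8192\n"))
```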
The Processors metric monitors the state of each CPU in the host.
This metric is available only on Dell PowerEdge Linux systems.
Note:
For all target versions, the collection frequency for each metric is every 15 minutes.
The following table lists the metrics, descriptions, and data sources.
Table 3-42 Processors Metrics
Metric | Description | Data Source (SNMP MIB Object) |
---|---|---|
Family | Family of the Dell processor device | processorDeviceFamily (1.3.6.1.4.1.674.10892.1.1100.30.1.10) |
Manufacturer | Name of the manufacturer of the Dell processor | processorDeviceManufacturerName (1.3.6.1.4.1.674.10892.1.1100.30.1.8) |
Processor Status | | |
Speed (MHz) | Current speed of the Dell processor device in megahertz (MHz). A value of zero indicates that the speed is unknown. | processorDeviceCurrentSpeed (1.3.6.1.4.1.674.10892.1.1100.30.1.12) |
Version | Version of the Dell processor | processorDeviceVersionName (1.3.6.1.4.1.674.10892.1.1100.30.1.16) |
Represents the status of the Dell processor device.
This metric is available only on Dell PowerEdge Linux systems.
The following table lists the possible values for this metric and their meaning.
Metric Value | Meaning (per SNMP MIB) |
---|---|
1 | Other (not one of the following) |
2 | Unknown |
3 | Normal |
4 | Warning |
5 | Critical |
6 | Non-Recoverable |
Metric Summary
The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.
Table 3-43 Metric Summary Table
Target Version | Evaluation and Collection Frequency | Upload Frequency | Operator | Default Warning Threshold | Default Critical Threshold | Consecutive Number of Occurrences Preceding Notification | Alert Text |
---|---|---|---|---|---|---|---|
All Versions | Every 15 Minutes | Not Uploaded | >= | 4 | 5 | 1 | Status of Processor %ProcessorIndex% in chassis %ChassisIndex% is %value%, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold. |
Multiple Thresholds
For this metric you can set different warning and critical threshold values for each unique combination of "Chassis Index" and "Processor Index" objects.
If warning or critical threshold values are currently set for any unique combination of "Chassis Index" and "Processor Index" objects, those thresholds can be viewed on the Metric Detail page for this metric.
To specify or change warning or critical threshold values for each unique combination of "Chassis Index" and "Processor Index" objects, use the Edit Thresholds page.
Data Source
SNMP MIB object: processorDeviceStatus (1.3.6.1.4.1.674.10892.1.1100.30.1.5)
The Program Resource Utilization metric provides flexible resource monitoring functionality. The operator must specify the criteria for the programs to be monitored by specifying key value specific thresholds. Values for the key value columns {program name, owner} define the unique criteria to be monitored for resource utilization in the system.
By default, no programs will be tracked by this metric. Key Values entered as part of a key value specific threshold setting define the criteria for monitoring and tracking.
Note:
For all target versions, the collection frequency for each metric is every 5 minutes.
The data sources for this metric category include the following:
Host | Data Source |
---|---|
Solaris | ps command |
HP | ps command |
Linux | ps command |
HP Tru64 | ps command |
IBM AIX | ps command |
Windows | performance data counters |
The following table lists the metrics and their descriptions.
Table 3-44 Program Resource Utilization Metrics
Metric | Description |
---|---|
List of PIDs |
This metric is only available on Solaris. |
Program's Max CPU Time Accumulated (Minutes) |
See Section 3.26.1, "Program's Max CPU Time Accumulated (Minutes)" |
Program's Max CPU Time Accumulated PID |
Identifier of the process that has accumulated the most CPU time matching the {program name, owner} key value criteria |
Program's Max CPU Utilization (%) |
|
Program's Max CPU Utilization PID |
Identifier of the process with the maximum percentage of CPU utilized matching the {program name, owner} key value criteria since last scan |
Program's Max Process Count |
|
Program's Max Resident Memory (MB) |
|
Program's Max Resident Memory PID |
Identifier of the process with the maximum resident memory occupied by a single process matching the {program name, owner} key value criteria |
Program's Min Process Count |
|
Program's Total CPU Time Accumulated (Minutes) |
See Section 3.26.6, "Program's Total CPU Time Accumulated (Minutes)" |
Program's Total CPU Utilization (%) |
Represents the maximum CPU time accumulated by the most active process matching the {program name, owner} key value criteria.
Metric Summary
The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.
Table 3-45 Metric Summary Table
Target Version | Evaluation and Collection Frequency | Upload Frequency | Operator | Default Warning Threshold | Default Critical Threshold | Consecutive Number of Occurrences Preceding Notification | Alert Text |
---|---|---|---|---|---|---|---|
All Versions | Every 5 Minutes | After Every Sample | > | Not Defined | Not Defined | 3 | %prog_max_cpu_time_pid% process running program %prog_name% has accumulated %prog_max_cpu_time% minutes of cpu time. This duration crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold. |
Multiple Thresholds
For this metric you can set different warning and critical threshold values for each unique combination of "Program Name" and "Owner" objects.
If warning or critical threshold values are currently set for any unique combination of "Program Name" and "Owner" objects, those thresholds can be viewed on the Metric Detail page for this metric.
To specify or change warning or critical threshold values for each unique combination of "Program Name" and "Owner" objects, use the Edit Thresholds page.
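To make the {program name, owner} key value criteria concrete, the following Python sketch selects the processes that match one key and reports the process that has accumulated the most CPU time, together with the count of matching processes. The `Process` record, the function name `max_cpu_time`, and the sample data are hypothetical; in practice the values come from the ps command or Windows performance counters, and this is not the agent's actual logic.

```python
from collections import namedtuple

# Hypothetical process record; cpu_minutes is accumulated CPU time in minutes.
Process = namedtuple("Process", "pid program owner cpu_minutes")


def max_cpu_time(processes, program, owner):
    """Summarize the processes matching one {program name, owner} key."""
    matching = [p for p in processes
                if p.program == program and p.owner == owner]
    if not matching:
        return None  # nothing matches the key value criteria
    top = max(matching, key=lambda p: p.cpu_minutes)
    return {
        "process_count": len(matching),
        "max_cpu_minutes": top.cpu_minutes,
        "max_cpu_pid": top.pid,
    }


if __name__ == "__main__":
    sample = [
        Process(101, "java", "oracle", 12.5),
        Process(102, "java", "oracle", 47.0),
        Process(103, "java", "root", 90.0),
        Process(104, "perl", "oracle", 3.0),
    ]
    print(max_cpu_time(sample, "java", "oracle"))
```

In this sketch, the process owned by root is ignored for the key {java, oracle} even though it has accumulated more CPU time, because both parts of the key must match.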
Represents the maximum percentage of CPU utilized by a single process matching the {program name, owner} key value criteria since last scan.
Metric Summary
The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.
Table 3-46 Metric Summary Table
Target Version | Evaluation and Collection Frequency | Upload Frequency | Operator | Default Warning Threshold | Default Critical Threshold | Consecutive Number of Occurrences Preceding Notification | Alert Text |
---|---|---|---|---|---|---|---|
All Versions | Every 5 Minutes | After Every Sample | > | Not Defined | Not Defined | 3 | Process %prog_max_cpu_util_pid% running program %prog_name% is utilizing %prog_max_cpu_util%%% cpu. This percentage crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold. |
Multiple Thresholds
For this metric you can set different warning and critical threshold values for each unique combination of "Program Name" and "Owner" objects.
If warning or critical threshold values are currently set for any unique combination of "Program Name" and "Owner" objects, those thresholds can be viewed on the Metric Detail page for this metric.
To specify or change warning or critical threshold values for each unique combination of "Program Name" and "Owner" objects, use the Edit Thresholds page.
Fetches the current number of processes matching the {program name, owner} key value criteria. Use it to set warning or critical thresholds that alert when the number of processes matching a given {program name, owner} key exceeds a maximum.
Metric Summary
The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.
Table 3-47 Metric Summary Table
Target Version | Evaluation and Collection Frequency | Upload Frequency | Operator | Default Warning Threshold | Default Critical Threshold | Consecutive Number of Occurrences Preceding Notification | Alert Text |
---|---|---|---|---|---|---|---|
All Versions | Every 5 Minutes | After Every Sample | > | Not Defined | Not Defined | 3 | %prog_max_process_count% processes are running program %prog_name% owned by [%owner%], crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold. |
Multiple Thresholds
For this metric you can set different warning and critical threshold values for each unique combination of "Program Name" and "Owner" objects.
If warning or critical threshold values are currently set for any unique combination of "Program Name" and "Owner" objects, those thresholds can be viewed on the Metric Detail page for this metric.
To specify or change warning or critical threshold values for each unique combination of "Program Name" and "Owner" objects, use the Edit Thresholds page.
Represents the maximum resident memory occupied by a single process matching the {program name, owner} key value criteria. Use it to set warning or critical thresholds that alert when the resident memory of a process matching a given {program name, owner} key exceeds a maximum.
Metric Summary
The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.
Table 3-48 Metric Summary Table
Target Version | Evaluation and Collection Frequency | Upload Frequency | Operator | Default Warning Threshold | Default Critical Threshold | Consecutive Number of Occurrences Preceding Notification | Alert Text |
---|---|---|---|---|---|---|---|
All Versions | Every 5 Minutes | After Every Sample | > | Not Defined | Not Defined | 3 | %prog_max_rss_pid% process running program %prog_name% is utilizing %prog_max_rss% (MB) of resident memory. This percentage crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold. |
Multiple Thresholds
For this metric you can set different warning and critical threshold values for each unique combination of "Program Name" and "Owner" objects.
If warning or critical threshold values are currently set for any unique combination of "Program Name" and "Owner" objects, those thresholds can be viewed on the Metric Detail page for this metric.
To specify or change warning or critical threshold values for each unique combination of "Program Name" and "Owner" objects, use the Edit Thresholds page.
Fetches the current number of processes matching the {program name, owner} key value criteria. Use it to set warning or critical thresholds that alert when the number of processes matching a given {program name, owner} key falls below a minimum.
Metric Summary
The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.
Table 3-49 Metric Summary Table
Target Version | Evaluation and Collection Frequency | Upload Frequency | Operator | Default Warning Threshold | Default Critical Threshold | Consecutive Number of Occurrences Preceding Notification | Alert Text |
---|---|---|---|---|---|---|---|
All Versions | Every 5 Minutes | After Every Sample | < | Not Defined | Not Defined | 3 | %prog_max_process_count% processes are running program %prog_name% owned by [%owner%], fallen below warning (%warning_threshold%) or critical (%critical_threshold%) threshold. |
Multiple Thresholds
For this metric you can set different warning and critical threshold values for each unique combination of "Program Name" and "Owner" objects.
If warning or critical threshold values are currently set for any unique combination of "Program Name" and "Owner" objects, those thresholds can be viewed on the Metric Detail page for this metric.
To specify or change warning or critical threshold values for each unique combination of "Program Name" and "Owner" objects, use the Edit Thresholds page.
Represents the total CPU time accumulated by all active processes matching the {program name, owner} key value criteria.
Metric Summary
The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.
Table 3-50 Metric Summary Table
Target Version | Evaluation and Collection Frequency | Upload Frequency | Operator | Default Warning Threshold | Default Critical Threshold | Consecutive Number of Occurrences Preceding Notification | Alert Text |
---|---|---|---|---|---|---|---|
All Versions | Every 5 Minutes | After Every Sample | > | Not Defined | Not Defined | 3 | %prog_max_count% processes running program %prog_name% owned by [%owner%] have accumulated %prog_total_cpu_time% minutes of cpu time. This duration crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold. |
Multiple Thresholds
For this metric you can set different warning and critical threshold values for each unique combination of "Program Name" and "Owner" objects.
If warning or critical threshold values are currently set for any unique combination of "Program Name" and "Owner" objects, those thresholds can be viewed on the Metric Detail page for this metric.
To specify or change warning or critical threshold values for each unique combination of "Program Name" and "Owner" objects, use the Edit Thresholds page.
Represents the percentage of CPU time utilized by all active processes matching the {program name, owner} key value criteria since the last collection.
Metric Summary
The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.
Table 3-51 Metric Summary Table
Target Version | Evaluation and Collection Frequency | Upload Frequency | Operator | Default Warning Threshold | Default Critical Threshold | Consecutive Number of Occurrences Preceding Notification | Alert Text |
---|---|---|---|---|---|---|---|
All Versions | Every 5 Minutes | After Every Sample | > | Not Defined | Not Defined | 3 | %prog_max_count% processes running program %prog_name% owned by [%owner%] are utilizing %prog_total_cpu_util%%% cpu. This percentage crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold. |
Multiple Thresholds
For this metric you can set different warning and critical threshold values for each unique combination of "Program Name" and "Owner" objects.
If warning or critical threshold values are currently set for any unique combination of "Program Name" and "Owner" objects, those thresholds can be viewed on the Metric Detail page for this metric.
To specify or change warning or critical threshold values for each unique combination of "Program Name" and "Owner" objects, use the Edit Thresholds page.
The Remote Access Card metric monitors the status of the Remote Access Card.
This metric is available only on Dell Poweredge Linux Systems.
Note:
For all target versions, the collection frequency for each metric is every 15 minutes.
The following table lists the metrics, their descriptions, and data sources.
Table 3-52 Remote Access Card Metrics
Metric | Description | Data Source (SNMP MIB Object) |
---|---|---|
DHCP Settings | Determines whether the dynamic host configuration protocol (DHCP) was used to obtain the network interface card (NIC) information. | remoteAccessNICCurrentInfoFromDHCP (1.3.6.1.4.1.674.10892.1.1700.10.1.33) |
Gateway Address | Represents the IP address for the gateway currently being used by the onboard network interface card (NIC) provided by the remote access (RAC) hardware. | remoteAccessNICCurrentGatewayAddress (1.3.6.1.4.1.674.10892.1.1700.10.1.32) |
IP Address | Provides the internet protocol (IP) address currently being used by the onboard network interface card (NIC) provided by the remote access (RAC) hardware. | remoteAccessNICCurrentIPAddress (1.3.6.1.4.1.674.10892.1.1700.10.1.30) |
LAN Settings | Represents the local area network (LAN) settings of the remote access hardware. | remoteAccessLANSettings (1.3.6.1.4.1.674.10892.1.1700.10.1.15) |
Network Mask Address | Represents the subnet mask currently being used by the onboard network interface card (NIC) provided by the remote access (RAC) hardware. | remoteAccessNICCurrentNetmaskAddress (1.3.6.1.4.1.674.10892.1.1700.10.1.31) |
Product Name | Represents the name of the product providing the remote access (RAC) functionality. | remoteAccessProductInfoName (1.3.6.1.4.1.674.10892.1.1700.10.1.7) |
Remote Access Card State | Represents the state of the remote access (RAC) hardware. | remoteAccessStateSettings (1.3.6.1.4.1.674.10892.1.1700.10.1.5) |
Remote Access Card Status | | |
Version | Represents the version of the product providing the remote access (RAC) functionality. | remoteAccessVersionInfoName (1.3.6.1.4.1.674.10892.1.1700.10.1.9) |
Represents the status of the remote access (RAC) hardware.
This metric is available only on Dell Poweredge Linux Systems.
The following table lists the possible values for this metric and their meaning.
Metric Value | Meaning (per SNMP MIB) |
---|---|
1 | Other (not one of the following) |
2 | Unknown |
3 | Normal |
4 | Warning |
5 | Critical |
6 | Non-Recoverable |
Metric Summary
The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.
Table 3-53 Metric Summary Table
Target Version | Evaluation and Collection Frequency | Upload Frequency | Operator | Default Warning Threshold | Default Critical Threshold | Consecutive Number of Occurrences Preceding Notification | Alert Text |
---|---|---|---|---|---|---|---|
All Versions | Every 15 Minutes | Not Uploaded | >= | 4 | 5 | 1 | Status of Remote Access Card is %value%, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold. |
Data Source
SNMP MIB object: remoteAccessStatus (1.3.6.1.4.1.674.10892.1.1700.10.1.6)
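As an illustration of how the status values and default thresholds above combine, the following sketch maps a status code to its meaning and to the severity implied by the >= operator. The function and dictionary names are hypothetical and do not come from Enterprise Manager.

```python
# Status values and meanings as listed in the table above (per SNMP MIB).
STATUS_MEANING = {
    1: "Other",
    2: "Unknown",
    3: "Normal",
    4: "Warning",
    5: "Critical",
    6: "Non-Recoverable",
}


def classify_status(value, warning=4, critical=5):
    """Return (meaning, severity) using '>=' and the default thresholds 4 and 5."""
    meaning = STATUS_MEANING.get(value, "Undocumented value")
    if value >= critical:
        return meaning, "critical"
    if value >= warning:
        return meaning, "warning"
    return meaning, "clear"
```

Because the operator is >=, a status of 6 (Non-Recoverable) also crosses the critical threshold of 5.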
The Storage Summary metrics collectively represent the summary of storage data on a host target. These metrics are derived from the various metrics collected and uploaded into the Oracle Management Repository by the Management Agent. They are computed every time the Management Agent populates the Management Repository with storage data. This collection is also triggered automatically whenever the user manually refreshes the host storage data from the Storage Details page.
These metrics are available on the Linux and Solaris hosts.
Note:
For target versions 3.0 and higher, the collection frequency for each metric is every 24 hours or when the user manually refreshes storage data from the Storage Details page.
For more details on how these metrics are computed, see the "About Storage Computation Formulas" topic in the Enterprise Manager online help. The online help also provides information about ASM, databases, disks, file systems, volumes, and storage details.
The following table lists the metrics and their descriptions.
Table 3-54 Storage Summary Metrics
Metric | Description |
---|---|
ASM Storage Allocated (GB) | Total storage allocated to Oracle databases from Automatic Storage Management (ASM) instances on the host |
ASM Storage Metric Collection Errors | Number of metric collection errors attributed to the storage related metrics of the Automatic Storage Management (ASM) targets on the host |
ASM Storage Overhead (GB) | Storage overhead of Automatic Storage Management (ASM) targets on the host |
ASM Storage Unallocated (GB) | Storage available in Automatic Storage Management (ASM) targets on the host for allocating to databases |
Databases Storage Free (GB) | Total free storage available in the databases on the host |
Databases Storage Metric Collection Errors | Metric collection errors of storage related metrics of databases on the host |
Databases Storage Used (GB) | Total storage used in the databases on the host |
Disk Storage Allocated (GB) | Storage allocated from the total disk storage available on the host |
Disk Storage Unallocated (GB) | Storage that is available for allocation in disks on the host |
Host Storage Metric Collection Errors | Total number of storage related metric collection errors of the host target |
Hosts Summarized | The possible values for this metric are: |
Local File Systems Storage Free (GB) | Total free storage in all distinct local file systems on the host |
Local File Systems Storage Used (GB) | Total used space in all distinct local file systems on the host |
Number of ASM Instances Summarized | Total number of Automatic Storage Management (ASM) instances, the storage data of which was used in computing storage summary of this host |
Number of Databases Summarized | Total number of databases, the storage data of which was used in computing storage summary of this host |
Other Mapping Errors | Storage metric mapping issues on the host excluding the unmonitored server mapping errors |
Total Number of ASM Instances | Total number of Automatic Storage Management (ASM) instances on the host |
Total Number of Databases | Total number of databases on the host |
Total Storage Allocated (GB) | Total storage allocated from the host-visible storage available on the host |
Total Storage Free (GB) | Free storage available from the total allocated storage on the host |
Total Storage Overhead (GB) | Overhead associated with storage on the host |
Total Storage Unallocated (GB) | Total unallocated storage on the host |
Total Storage Used (GB) | Total storage used in the file systems and databases on the host |
Unmonitored NFS Server Mapping Errors | Total number of storage mapping issues that result from unmonitored Network File Systems (NFS) servers |
Volumes Storage Allocated (GB) | Total storage allocated from the volumes available on the host |
Volumes Storage Overhead (GB) | Storage overhead in the volumes on the host |
Volumes Storage Unallocated (GB) | Storage available for allocation in the volumes on the host |
Writeable NFS Storage Free (GB) | Total free space available in all distinct writeable NFS mounts on the host |
Writeable NFS Storage Used (GB) | Storage used in all writeable NFS mounts on the host |
The Swap Area Status metric provides the status of the swap memory on the system.
The data sources for this metric category include the following:
Host | Data Source |
---|---|
Solaris | swap |
HP | swapinfo |
Linux | /proc/swaps |
HP Tru64 | swapon |
IBM AIX | lsps |
Windows | not available |
Represents the number of 1K blocks in the swap area that are not allocated.
Metric Summary
The following table shows how often the metric's value is collected.
Target Version | Collection Frequency |
---|---|
All Versions | Every 24 Hours |
User Action
Check swap usage with the UNIX top command or the Solaris swap -l command. To add swap to an existing file system, create a swap file and add it to the system swap pool (see the documentation for your UNIX OS). If swap is mounted on /tmp, free space by removing unneeded files from /tmp. If you cannot add file system swap or free enough space, add a raw disk partition to the swap pool. See your UNIX documentation for the procedures.
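On Linux, where the data source is /proc/swaps, unallocated swap can also be checked with a short script. The sketch below assumes the usual /proc/swaps layout (a header row followed by Filename, Type, Size, Used, and Priority columns, with Size and Used in kilobytes). The function names and the sample text are illustrative only.

```python
# Illustrative /proc/swaps content; real values differ per host.
SAMPLE = """Filename                Type        Size      Used    Priority
/dev/sda2               partition   2097148   1024    -2
/swapfile               file        524284    0       -3
"""


def parse_proc_swaps(text):
    """Return a list of (filename, size_kb, used_kb) tuples from /proc/swaps text."""
    areas = []
    for line in text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue  # ignore blank or malformed lines
        areas.append((fields[0], int(fields[2]), int(fields[3])))
    return areas


def free_swap_kb(text):
    """Sum the unused 1K blocks across all swap areas."""
    return sum(size - used for _, size, used in parse_proc_swaps(text))


def read_free_swap_kb(path="/proc/swaps"):
    """Read the live file on a Linux host and return the unused swap in KB."""
    with open(path) as handle:
        return free_swap_kb(handle.read())
```

Calling read_free_swap_kb() on a Linux host returns the unused swap in 1K blocks, the same unit this metric uses.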
The Switch/Swap Activity metric reports on system switching and swapping activity.
Data Source
The data sources for this metric category, unless otherwise stated, include the following:
Host | Data Source |
---|---|
Solaris | sar command |
HP | sar command |
Linux | sar command |
HP Tru64 | not available |
IBM AIX | sar command |
Windows | not available |
The OS sar command is used to sample cumulative activity counters maintained by the OS. The data is obtained by sampling system counters once in a five-second interval. Each result is essentially the change in a counter over this period divided by five; for example, the number of processes swapped in over the five-second period divided by five.
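The per-second calculation described above amounts to taking the difference of a cumulative counter across the interval and dividing by the interval length. A minimal sketch follows; the read_counter callable is a hypothetical stand-in for whatever reads the OS counter and is not part of sar.

```python
import time


def per_second_rate(counter_start, counter_end, interval_seconds=5.0):
    """Convert the change in a cumulative counter into a per-second rate."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return (counter_end - counter_start) / interval_seconds


def sample_rate(read_counter, interval_seconds=5.0, sleep=time.sleep):
    """Read a cumulative counter twice, interval_seconds apart, and return the rate."""
    first = read_counter()
    sleep(interval_seconds)
    second = read_counter()
    return per_second_rate(first, second, interval_seconds)
```

For instance, a swapin counter that moves from 1000 to 1040 over five seconds corresponds to 8 swapins per second.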
Metrics and Descriptions
The following table lists the metrics and their descriptions.
Table 3-55 Switch/Swap Activity Metrics
Metric | Description |
---|---|
Process Context Switches (per second) | Number of process context switches per second. Note: This metric is available on Solaris, HP, and IBM AIX. |
Swapins Transfers (per second) | Number of 512-byte units transferred for swapins per second. Note: This metric is not available on HP Tru64. |
Swapout Transfers (per second) | Number of 512-byte units transferred for swapouts per second. Note: This metric is not available on HP Tru64. |
System Swapins (per second) | Number of process swapins per second. Note: This metric is not available on HP Tru64. |
System Swapouts (per second) | Number of process swapouts per second. Note: This metric is not available on HP Tru64. |
The System BIOS (Basic Input/Output System) metric monitors the BIOS status for Dell Poweredge Linux systems.
This metric is available only on Dell Poweredge Linux Systems.
Note:
For all target versions, the collection frequency for each metric is every 15 minutes.
The following table lists the metrics, their descriptions, and data sources.
Table 3-56 System BIOS Metrics
Metric | Description | Data Source (SNMP MIB Object) |
---|---|---|
Manufacturer | Manufacturer's name of the System BIOS (Basic Input/Output System) | systemBIOSManufacturerName (1.3.6.1.4.1.674.10892.1.300.50.1.11) |
Size | Image size of the System BIOS (Basic Input/Output System) in kilobytes. A value of zero indicates that the size is unknown. | systemBIOSSize (1.3.6.1.4.1.674.10892.1.300.50.1.6) |
System BIOS Status | | |
Version | Version name of the System BIOS (Basic Input/Output System) | systemBIOSVersionName (1.3.6.1.4.1.674.10892.1.300.50.1.8) |
Represents the status of the System BIOS (Basic Input/Output System) in this chassis.
This metric is available only on Dell Poweredge Linux Systems.
The following table lists the possible values for this metric and their meaning.
Metric Value | Meaning (per SNMP MIB) |
---|---|
1 | Other (not one of the following) |
2 | Unknown |
3 | Normal |
4 | Warning |
5 | Critical |
6 | Non-Recoverable |
Metric Summary
The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.
Table 3-57 Metric Summary Table
Target Version | Evaluation and Collection Frequency | Upload Frequency | Operator | Default Warning Threshold | Default Critical Threshold | Consecutive Number of Occurrences Preceding Notification | Alert Text |
---|---|---|---|---|---|---|---|
All Versions | Every 15 Minutes | Not Uploaded | >= | 4 | 5 | 1 | Status of BIOS %BiosIndex% in chassis %ChassisIndex% is %value%, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold. |
Multiple Thresholds
For this metric you can set different warning and critical threshold values for each unique combination of "Chassis Index" and "System BIOS Index" objects.
If warning or critical threshold values are currently set for any unique combination of "Chassis Index" and "System BIOS Index" objects, those thresholds can be viewed on the Metric Detail page for this metric.
To specify or change warning or critical threshold values for each unique combination of "Chassis Index" and "System BIOS Index" objects, use the Edit Thresholds page.
Data Source
SNMP MIB object: systemBIOSStatus (1.3.6.1.4.1.674.10892.1.300.50.1.5)
The System Calls metric provides statistics about the system calls made over a five-second interval.
Data Source
The data sources for this metric category, unless otherwise stated, include the following:
Host | Data Source |
---|---|
Solaris | sar command |
HP | sar command |
Linux | not available |
HP Tru64 | table() system call |
IBM AIX | sar command |
Windows | not available |
The OS sar command is used to sample cumulative activity counters maintained by the OS. The data is obtained by sampling system counters once in a five-second interval. The results are essentially the number of system calls made over this period divided by the period.
Metrics and Descriptions
The following table lists the metrics and their descriptions.
Table 3-58 System Calls Metrics
Metric | Description |
---|---|
Characters Transferred by Read System Calls (per second) | Number of characters transferred by read system calls (block devices only) per second |
Characters Transferred by Write System Calls (per second) | Number of characters transferred by write system calls (block devices only) per second |
exec() System Calls (per second) | Number of exec() system calls made per second |
fork() System Calls (per second) | Number of fork() system calls made per second |
read() System Calls (per second) | Number of read() system calls made per second |
System Calls (per second) | Number of system calls made per second. This includes system calls of all types. |
write() System Calls (per second) | Number of write() system calls made per second |
The Temperature metric monitors the temperature reported by the temperature probe.
This metric is available only on Dell Poweredge Linux Systems.
Note:
For all target versions, the collection frequency for each metric is every 15 minutes.
The following table lists the metrics, their descriptions, and data sources.
Table 3-59 Temperature Metrics
Metric | Description | Data Source (SNMP MIB Object) |
---|---|---|
Current Temperature | Current reading of the temperature probe. The value represents the temperature in tenths of a degree Centigrade. | temperatureProbeReading (1.3.6.1.4.1.674.10892.1.700.20.1.6) |
Location | Description of the location name of the temperature probe. Examples of values are: "CPU Temp" and "System Temp". | temperatureProbeLocationName (1.3.6.1.4.1.674.10892.1.700.20.1.8) |
Temperature Probe Status | | |
Represents the status of the temperature probe.
This metric is available only on Dell Poweredge Linux Systems.
The following table lists the possible values for this metric and their meaning.
Metric Value | Meaning (per SNMP MIB) |
---|---|
1 | Other (not one of the following) |
2 | Unknown |
3 | Normal |
4 | Warning |
5 | Critical |
6 | Non-Recoverable |
Metric Summary
The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.
Table 3-60 Metric Summary Table
Target Version | Evaluation and Collection Frequency | Upload Frequency | Operator | Default Warning Threshold | Default Critical Threshold | Consecutive Number of Occurrences Preceding Notification | Alert Text |
---|---|---|---|---|---|---|---|
All Versions | Every 15 Minutes | Not Uploaded | >= | 4 | 5 | 1 | Temperature at probe %ProbeIndex% in chassis %ChassisIndex% is %TemperatureReading% (C). Status is %value%, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold. |
Multiple Thresholds
For this metric you can set different warning and critical threshold values for each unique combination of "Chassis Index" and "Temperature Probe Index" objects.
If warning or critical threshold values are currently set for any unique combination of "Chassis Index" and "Temperature Probe Index" objects, those thresholds can be viewed on the Metric Detail page for this metric.
To specify or change warning or critical threshold values for each unique combination of "Chassis Index" and "Temperature Probe Index" objects, use the Edit Thresholds page.
Data Source
SNMP MIB object: temperatureProbeStatus (1.3.6.1.4.1.674.10892.1.700.20.1.5)
The Top Processes metric lists up to 20 processes: the 10 processes consuming the largest percentage of memory and the 10 processes consuming the largest percentage of CPU time. The processes are listed in order of memory consumption.
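The selection described above can be sketched as follows: take the top 10 processes by memory and the top 10 by CPU, merge the two lists, and order the result by memory consumption. The Proc type is hypothetical, and the descending sort order is an assumption, since the text does not state the direction.

```python
from typing import NamedTuple


class Proc(NamedTuple):
    """Hypothetical record for one process sample."""
    pid: int
    cpu_pct: float
    mem_pct: float


def top_processes(procs, n=10):
    """Return up to 2*n processes: the top n by memory plus the top n by CPU."""
    by_mem = sorted(procs, key=lambda p: p.mem_pct, reverse=True)[:n]
    by_cpu = sorted(procs, key=lambda p: p.cpu_pct, reverse=True)[:n]
    # A process that appears in both lists is kept only once.
    selected = {p.pid: p for p in by_mem + by_cpu}
    return sorted(selected.values(), key=lambda p: p.mem_pct, reverse=True)
```

The list can be shorter than 20 entries when the same process ranks in both the memory and the CPU top 10, which is consistent with the "up to 20" wording.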
The data sources for this metric category include the following:
Host | Data Source |
---|---|
Solaris | ps command |
HP | ps command |
Linux | ps command |
HP Tru64 | ps command |
IBM AIX | ps command |
Windows | performance data counters |
The following table lists the metrics and descriptions.
Table 3-61 Top Processes Metrics
Metric | Description |
---|---|
Command and Arguments | Command and all its arguments |
CPU Time for Top Processes | CPU utilization time in seconds |
CPU Utilization for Top Processes (%) | Percentage of CPU time consumed by the process. For UNIX-based platforms, check the load on the system using the UNIX uptime or top commands. Also, check for processes using too much CPU time by using the top and ps -ef commands. Note that the issue may be a large number of instances of one or more processes, rather than a few processes each taking up a large amount of CPU time. Kill processes using excessive CPU time. |
Memory Utilization for Top Processes (%) | Percentage of memory consumed by the process |
Physical Memory Utilization (KB) | Number of kilobytes of physical memory being used. For Solaris and IBM AIX hosts, the data source is kernel memory structure (struct vminfo). |
Process User ID | User name that owns the process, that is, the user ID of the process being reported on. For the Windows host, the data source is the Windows API. |
Virtual Memory Utilization (KB) | Total size of the process in virtual memory in kilobytes (KB). For the Windows host, the data source is the Windows API. |
This metric reports tty device activity.
The data sources for this metric include the following:
Host | Data Source |
---|---|
Solaris | sar command |
HP | sar command |
Linux | not available |
HP Tru64 | table() system call |
IBM AIX | sar command |
Windows | not available |
The OS sar command is used to sample cumulative activity counters maintained by the OS. The data is obtained by sampling system counters once in a five-second interval.
The following table lists the metrics and their descriptions.
Table 3-62 TTY Activity Metrics
Metric | Description |
---|---|
Incoming Character Interrupts (per second) | Number of received incoming character interrupts per second |
Input Characters Processed by canon() | Input characters processed by canon() per second |
Modem Interrupt Rate (per second) | Modem interrupt rate |
Outgoing Character Interrupts (per second) | Number of transmit outgoing character interrupts per second |
TTY Output Characters (per second) | Number of output characters per second |
TTY Raw Input (chars/s) | Raw input characters per second |
The UDM metric allows you to execute your own scripts. The data returned by these scripts can be compared against thresholds and generate severity alerts similar to alerts in predefined metrics. UDM is similar to the Oracle9i Management Agent's UDE functionality.
The data source for these metrics is the User Defined Script.
The following table lists the metrics and their descriptions.
Table 3-63 User Defined Metrics
Metric | Description |
---|---|
User Defined Numeric Metric | Contains a value if the value type is NUMBER. If the value type is STRING, the value is "". |
User Defined String Metric | Contains a value if the value type is STRING. If the value type is NUMBER, the value is "". |
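A user-defined script can be as simple as the following sketch, which reports file system usage as a numeric value. It assumes the em_result= and em_message= standard-output convention that the Enterprise Manager documentation describes for OS user-defined metrics; that convention is not stated in this manual, so verify it for your release. The function names and the choice of measurement are illustrative.

```python
import shutil


def format_udm_output(value, message=None):
    """Build the lines the script writes to standard output.

    Assumption: the agent reads 'em_result=<value>' and, optionally,
    'em_message=<text>' from the script's standard output.
    """
    lines = [f"em_result={value}"]
    if message:
        lines.append(f"em_message={message}")
    return "\n".join(lines)


def disk_used_percent(path="/"):
    """Return the percentage of the file system at 'path' that is in use."""
    usage = shutil.disk_usage(path)
    return round(100.0 * usage.used / usage.total, 1)


if __name__ == "__main__":
    pct = disk_used_percent("/")
    print(format_udm_output(pct, f"Root file system is {pct}% full"))
```

Because the script returns a number, it would be registered with the NUMBER value type, and the returned value could then be compared against warning and critical thresholds like any predefined metric.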
The Users metric provides information about the users currently on the system being monitored.
Represents the number of times a user with a certain user name is logged on to the host target.
Data Source
For Solaris, HP, Linux, HP Tru64, and IBM AIX, the number of times a user is logged on is obtained from the OS w command.
For Windows, the source of information is Windows API.
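On UNIX hosts, the count per user name can be derived from w output by tallying the first column. The sketch below assumes header-free output, such as that produced by w -h on Linux and Solaris, with the user name as the first field; the function name and sample text are illustrative.

```python
from collections import Counter

# Illustrative header-free w output; columns after the user name vary by platform.
SAMPLE_W_OUTPUT = """oracle   pts/0    10.0.0.5    09:12    1.00s  0.05s  0.01s  w
oracle   pts/1    10.0.0.5    09:40    5:00   0.02s  0.02s  -bash
root     console  -           08:00    2:10m  0.10s  0.10s  -sh
"""


def logon_counts(w_output):
    """Count how many sessions each user name has in w-style output."""
    return Counter(
        line.split()[0] for line in w_output.splitlines() if line.strip()
    )
```

With the sample above, the user oracle is counted twice and root once.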
The purpose of this metric is to collect those entries from all available Windows NT event log files whose type is either Error or Warning. A critical or a warning alert is raised only for System and Security Event log file entries.
Note: Because log files continue to grow, this metric outputs only the log events written to the log file after the last collection time, that is, the records whose timeGenerated (the time when the event was generated) is later than the last collection time, up to the last record of the log file. When this metric is collected for the first time, only the events generated on the current date are output.
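The selection rule in this note can be sketched as a filter over event records. The dictionary keys ("type", "time_generated") are hypothetical stand-ins for the corresponding event log fields, and the code does not query WMI.

```python
from datetime import date, datetime

ALERT_TYPES = ("Error", "Warning")  # only these event types are collected


def select_events(events, last_collection=None, today=None):
    """Return the events this collection would output.

    events: iterable of dicts with 'type' (str) and 'time_generated' (datetime).
    last_collection: datetime of the previous collection, or None on the first run.
    """
    today = today or date.today()
    selected = []
    for event in events:
        if event["type"] not in ALERT_TYPES:
            continue
        generated = event["time_generated"]
        if last_collection is None:
            keep = generated.date() == today  # first run: current date only
        else:
            keep = generated > last_collection  # later runs: new records only
        if keep:
            selected.append(event)
    return selected
```

The strict greater-than comparison is a design choice in this sketch: it avoids reporting a record twice when its timestamp equals the previous collection time.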
This metric is available only on Windows.
Note:
For all target versions, the collection frequency for each metric is every 15 minutes.
The data source for these metrics is WMI Operating System Classes.
The following table lists the metrics and their descriptions.
Table 3-64 Windows Events Log Metrics
Metric | Description |
---|---|
Category | Subcategory for this event. This subcategory is source-specific. |
Date-Time | Date and time when the Source generated the event. |
Description | Event message as it appears in the Windows event log. |
Event ID | Identifier of the event |
Log Name | Name of the Windows event log file |
Record Number | Identifies the event within the Windows event log file |
Source | Name of the source (application, service, driver, subsystem) that generated the entry |
User | Name of the logged-on user when the event occurred. If the user name cannot be determined, the user name is NULL. |
Windows Event Severity | |
The seriousness of the event. Possible values are: Warning and Error.
This metric is available only on Windows.
Metric Summary
The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.
Table 3-65 Metric Summary Table
Target Version | Key | Evaluation and Collection Frequency | Upload Frequency | Operator | Default Warning Threshold | Default Critical Threshold | Consecutive Number of Occurrences Preceding Notification | Alert Text |
---|---|---|---|---|---|---|---|---|
All Versions | logfile: "system" | Every 15 Minutes | After Every Sample | = | warning | error | 1* | X1User[%user%]:Category[%categorystring%]:Description[%message%] |
* Once an alert is triggered for this metric, it must be manually cleared.
Multiple Thresholds
For this metric you can set different warning and critical threshold values for each unique combination of "Log Name", "Source", and "Event ID" objects.
If warning or critical threshold values are currently set for any unique combination of "Log Name", "Source", and "Event ID" objects, those thresholds can be viewed on the Metric Detail page for this metric.
To specify or change warning or critical threshold values for each unique combination of "Log Name", "Source", and "Event ID" objects, use the Edit Thresholds page.
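As an illustration of per-key thresholds, the following sketch looks up the threshold pair for one combination of "Log Name", "Source", and "Event ID" and compares the event severity against it with the `=` operator from the Metric Summary table. The function name `classify_event`, the dictionary layout, and the case-insensitive comparison are assumptions made for this example and do not describe how Enterprise Manager stores or evaluates thresholds.

```python
def classify_event(event, thresholds):
    """Return "critical", "warning", or None for a single event.

    event      -- dict with hypothetical keys "log_name", "source",
                  "event_id", and "severity"
    thresholds -- dict mapping (log_name, source, event_id) to a dict
                  with "warning" and "critical" threshold values
    """
    key = (event["log_name"], event["source"], event["event_id"])
    limits = thresholds.get(key)
    if limits is None:
        # No thresholds set for this combination: no alert in this sketch.
        return None
    severity = event["severity"].lower()
    # Equality comparison; the critical threshold is checked first.
    if severity == limits.get("critical"):
        return "critical"
    if severity == limits.get("warning"):
        return "warning"
    return None
```

Checking the critical threshold before the warning threshold ensures that an event matching both is reported at the higher severity.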
Data Source
WMI Operating System Classes
The Zombie Processes metric monitors orphaned processes on the different variants of UNIX.
Represents the percentage of all processes running on the system that are currently in zombie state.
Metric Summary
The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.
Table 3-66 Metric Summary Table
Target Version | Evaluation and Collection Frequency | Upload Frequency | Operator | Default Warning Threshold | Default Critical Threshold | Consecutive Number of Occurrences Preceding Notification | Alert Text |
---|---|---|---|---|---|---|---|
All Versions | Every 15 Minutes | After Every 60 Samples | > | 35 | 50 | 1 | %value%%% of all processes are in zombie state, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold. |
Data Source
The data sources for this metric include the following:
Host | Data Source |
---|---|
Solaris | ps command |
HP | ps command |
Linux | ps command |
HP Tru64 | not available |
IBM AIX | not available |
Windows | not available |
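The following sketch shows one way to derive a zombie percentage from ps output and compare it with the default thresholds (35 and 50, operator `>`). It assumes the input contains one process state per line, such as the output of `ps -e -o s=` on Solaris or `ps -eo stat=` on Linux. The manual does not state which ps options the agent uses, and the function names are hypothetical.

```python
def zombie_percentage(ps_states):
    """Compute the percentage of processes whose state starts with "Z".

    ps_states -- text with one process state per line and no header
    """
    states = [line.strip() for line in ps_states.splitlines() if line.strip()]
    if not states:
        return 0.0
    zombies = sum(1 for state in states if state.startswith("Z"))
    return 100.0 * zombies / len(states)


def zombie_severity(value, warning=35, critical=50):
    """Apply the ">" operator against the warning and critical thresholds."""
    if value > critical:
        return "critical"
    if value > warning:
        return "warning"
    return "clear"
```

Because the operator is `>`, a value exactly equal to a threshold does not cross it: 50 percent raises a warning alert, not a critical one.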