Thursday, January 14, 2010

Adding an Oracle home to an agent inventory

When an Oracle inventory is saved in a non-standard location, the Oracle Grid Control agent can be unable to enumerate the software that is in that Oracle Home. This is true even when it can find the home itself.
In OEM, you’ll run into an error like that below when you click on the home in the Targets list:

Error Could not find Oracle Home <ORACLE_HOME> in the inventory collected for <hostname>

We’ll use TESTSRV2 as a troubleshooting example.

Locate the oraInst.loc file

The oraInst.loc file contains the inventory for all of the Oracle software. Normally, Oracle maintains a single copy of this file, but when one is saved in a non-standard directory, it can get left out.

Monday, January 4, 2010

Oracle Database loses its OEM configuration after a Cold Backup

This was an annoying problem that took awhile to track down. In short: after our scheduled cold backups, an Oracle (11g) database would lose its configuration in Oracle Enterprise Manager. It would present a "Metric Collection Error" that would go away after reconfiguring the database. The fix, as it turned out, was pretty simple, but it took awhile to tease out. The problem was that the trace directory (bdump in 10g) was too full. Specifically, the metric collection (the process by which OEM gathers data about the database) was timing out. We ruled out performance problems on the database side; the system is not utilized much at all. Instead, we discovered that because there were a lot of files in the trace directory (> 31k), it was taking a long time for the OEM agent to get to the alert log, which is one of the metrics that it collects. This was hinted at in the emagent.trc file:
2010-01-04 13:01:53,087 Thread-47647632 ERROR TargetManager: TIMEOUT reached in computing dynamic properties for target TESTDB,
oracle_database::compute timings were [decideIncludeDB:0-0] [SystemTablespaceNumber:0-0] [SysauxTablespaceNumber:0-0 ...
[DeduceAlertLogFile:1-1] [GetCPUCount:1-1] [EnabledFeatures:1-1] [GetOSMInstance:1-1] [GetNLSParam:1-1] [GetAdrBase:1-(1)]
So you can see above that one of the things it was trying to do was get at the alert log. It took a long time to enumerate all of the small files in the trace directory, so we shut down the instance, cleared out the trace directory, and restarted the instance. That took care of the problem. In troubleshooting this problem, we also increased the dynamic properties timeout setting (dynamicPropsComputeTimeout_oracle_database) in the emd.properties file (in [agent_home]/sysman/config), changing the value from the default (120) to a larger setting (240). That did not help, though it's a good troubleshooting step, should you run into a similar problem.