Sunday, September 13, 2009

Troubleshooting Grid Control / Grid Log Files

http://oracledbasupport.co.uk

When troubleshooting problems in the Grid Control framework, remember that Grid Control is a J2EE application deployed on an Oracle Application Server 10g J2EE and Web Cache installation, with an Oracle database as its repository.

When faced with a problem, first isolate it to the affected component:

1. Troubleshooting the OMR
2. Troubleshooting the OMS
3. Troubleshooting the OMA

1. Troubleshooting the OMR (Repository/Database)

Connectivity: The OMR is accessed through the database listener. The listener log file is in ORACLE_HOME/network/log and records every connection and connection request the listener receives. Listener errors appear in this log in TNS-xxxxx format. If you need more diagnostic information, enable listener tracing. Listener trace files are written to ORACLE_HOME/network/trace.
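A small shell function gives a quick first pass over a large listener log by showing which TNS- errors occur and how often. This is a sketch that assumes bash and the classic default log name listener.log in ORACLE_HOME/network/log. Your listener may be configured to log elsewhere, so pass the path explicitly. The function name tns_error_summary is made up for illustration.

```shell
#!/usr/bin/env bash
# Summarise TNS-xxxxx errors in a listener log, most frequent first (sketch).
# Usage: tns_error_summary [logfile]
# Assumption: the default path below is the classic 10g location; adjust it for your listener.
tns_error_summary() {
  local log="${1:-$ORACLE_HOME/network/log/listener.log}"
  if [ ! -r "$log" ]; then
    echo "cannot read $log" >&2
    return 1
  fi
  # Extract every TNS-<digits> code, count each distinct code, sort by count descending.
  grep -oE 'TNS-[0-9]+' "$log" | sort | uniq -c | sort -rn
}
```

The output is one "count code" line per distinct error. Look up the most frequent codes first.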

Availability: If the OMR database is unavailable for some other reason, check the database alert log and trace files to determine the root cause.

Space usage: Space problems can occur in the OMR database when the OMR tablespaces cannot accept new data because their data files have no free space left.

Performance: OMR database performance problems will normally trigger Grid Control alerts as metric threshold values are crossed.

2. Troubleshooting the OMS (Management Service)

Four OMS subcomponents produce log files: 1> Oracle Web Cache, 2> Oracle HTTP Server (OHS), 3> Oracle Application Server Containers for J2EE (OC4J), and 4> Oracle Process Monitor and Notification (OPMN).

1> Oracle Web Cache log files are in ORACLE_HOME/webcache/logs. Web Cache has two key log files: the access_log, which records every connection to Web Cache, and the event_log, which records Web Cache availability events and errors.

2> Oracle HTTP Server writes its access_log and error_log to ORACLE_HOME/Apache/Apache/logs. As with Web Cache, the access_log can grow very large, so it is rotated every 12 hours. Older logs are kept in the same directory.

3> Oracle Application Server Containers for J2EE (OC4J) log files for Grid Control are in ORACLE_HOME/j2ee/OC4J_EM/log/OC4J_EM_default_island_1. OC4J generates several log files that provide diagnostic information, including:

default-web-access.log: Contains information about each request received by the component. Information includes the IP address of the requester, date and time of the request, the URL that was specified in the request, and the result code. All requests should come from the OHS; all result codes should indicate success (200). This information is valuable when troubleshooting connection difficulties between the OHS and OC4J.

em-application.log: Contains information about all events, errors, and exceptions associated with the EM application. This is excellent information for troubleshooting Java errors.

global-application.log: Contains information about events, errors, and exceptions relating to the OC4J JVM that are not specific to the EM application. This is also a good source of information for troubleshooting Java errors.

server.log: Includes availability information for the OC4J_EM component, including start and stop times.

4> The Oracle Process Monitor and Notification (OPMN) system provides logs for each of the OMS components in ORACLE_HOME/opmn/logs. Key log files include:

HTTP_Server: In cases where the OHS will not start, this log file often contains pertinent error messages that can help diagnose the problem.

OC4J~OC4J_EM~default_island~1: Contains any errors received while starting the OC4J_EM component. This can be helpful in troubleshooting Java errors and global configuration problems.
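Two small filters can save time when reading the OC4J logs described in 3> above:

- The first lists requests in default-web-access.log whose result code is not 200.
- The second counts the Java exception and error class names in em-application.log and global-application.log.

Both are sketches with hypothetical function names. The first assumes the access log uses the common log format, with the status code right after the quoted request. Confirm that against your own log lines before relying on it.

```shell
#!/usr/bin/env bash
# Helpers for the OC4J_EM log files (sketch; function names are hypothetical).

# List requests in default-web-access.log whose result code is not 200.
# Assumption: lines use the common log format, e.g.
#   10.0.0.5 - - [13/Sep/2009:10:00:00 +0100] "GET /em/console HTTP/1.1" 200 512
# so the status code is the first word after the quoted request string.
non_200_requests() {
  awk -F'"' '
    NF >= 3 {
      split($3, rest, " ")          # rest[1] = status code, rest[2] = bytes sent
      if (rest[1] != "200") print    # keep only requests that did not succeed
    }' "$1"
}

# Count fully qualified Java exception/error class names (e.g. java.sql.SQLException)
# across one or more logs, most frequent first. \b is a GNU grep extension.
# Usage: java_exception_summary em-application.log global-application.log
java_exception_summary() {
  grep -hoE '([A-Za-z_][A-Za-z0-9_$]*\.)+[A-Za-z_][A-Za-z0-9_$]*(Exception|Error)\b' "$@" \
    | sort | uniq -c | sort -rn
}
```

Requests from hosts other than the OHS, or any non-200 results, are the first lines to investigate.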

3. Troubleshooting the OMA (Management Agent)

Connectivity between the OMA and OMS: When a single management agent is unable to connect to the OMS, the problem will normally be found on the OMA’s server. If multiple agents are unable to connect, the problem may lie with the OMS or underlying network. Check AGENT_HOME/sysman/config/emd.properties and verify that the repository URL is correct.
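The sketch below checks the repository URL quickly. It reads REPOSITORY_URL from emd.properties and splits out the host and port you would then test. It assumes a single-line KEY=value entry as in a standard Java properties file. The function name and the AGENT_HOME default are illustrative.

```shell
#!/usr/bin/env bash
# Print REPOSITORY_URL from an agent's emd.properties, with host and port split out (sketch).
# Usage: show_repository_url [path-to-emd.properties]
# Assumption: the property is a single KEY=value line, e.g.
#   REPOSITORY_URL=https://omshost.example.com:1159/em/upload
show_repository_url() {
  local props="${1:-$AGENT_HOME/sysman/config/emd.properties}"
  local url hostport port
  # Ignore commented lines; if the key appears twice, use the last one (later entries win in Java properties).
  url=$(grep -E '^[[:space:]]*REPOSITORY_URL[[:space:]]*=' "$props" | tail -n 1 | cut -d= -f2- | tr -d '[:space:]')
  if [ -z "$url" ]; then
    echo "REPOSITORY_URL not found in $props" >&2
    return 1
  fi
  hostport=${url#*://}      # drop the scheme (http:// or https://)
  hostport=${hostport%%/*}  # drop the path
  case $hostport in
    *:*) port=${hostport##*:} ;;
    *)   port="" ;;          # no explicit port in the URL
  esac
  echo "url=$url"
  echo "host=${hostport%%:*}"
  echo "port=$port"
}
```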

Make sure you can ping the host named in the repository URL, then try to telnet to the OMS host on the port given in that URL.
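Telnet is not always installed. As an alternative, bash can open a TCP connection itself through its /dev/tcp redirection. This sketch assumes bash and the GNU coreutils timeout command, neither of which is part of the Grid Control tooling. check_port is a hypothetical name. Take the host and port from the repository URL.

```shell
#!/usr/bin/env bash
# Test whether a TCP connection to host:port can be opened, like a telnet check (sketch).
# Usage: check_port <host> <port> [timeout-seconds]
check_port() {
  local host="$1" port="$2" secs="${3:-5}"
  # The inner bash opens fd 3 on /dev/tcp/<host>/<port>; exec fails (non-zero exit) if the connect fails.
  if timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
    echo "OK: $host:$port is reachable"
  else
    echo "FAIL: cannot connect to $host:$port" >&2
    return 1
  fi
}
```

If this fails for one agent only, look at that agent's server and network path. If it fails from many agents, suspect the OMS or the network, as described above.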

Upload throughput (the OMA reports metric data for its targets through the OMS to the OMR): OMA logs are in AGENT_HOME/sysman/log. Upload errors are recorded in AGENT_HOME/sysman/log/emdctl.trc.

Target discovery as new targets are added to a server: Use emctl config listtargets or check AGENT_HOME/sysman/emd/targets.xml to determine which targets are monitored by the agent. Remember to make a backup copy of the targets.xml file prior to any modifications. Errors with target discovery will be reported in AGENT_HOME/sysman/log/agentca.log.
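The backup advice above, plus a quick listing of what targets.xml contains, can be sketched in shell. The listing assumes each target is written as a <Target TYPE="..." NAME="..."> element with TYPE before NAME. emctl config listtargets remains the authoritative view. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Timestamped backup of targets.xml before editing it (sketch). Prints the backup path.
backup_targets_xml() {
  local f="${1:-$AGENT_HOME/sysman/emd/targets.xml}"
  local copy="$f.$(date +%Y%m%d%H%M%S).bak"
  cp -p "$f" "$copy" || return 1
  echo "$copy"
}

# Print "TYPE NAME" for each <Target> element (sketch).
# Assumption: TYPE="..." precedes NAME="..." on the same line. The leading space in " NAME="
# stops attributes such as DISPLAY_NAME from being picked up.
list_targets() {
  local f="${1:-$AGENT_HOME/sysman/emd/targets.xml}"
  sed -n 's/.*<Target[^>]* TYPE="\([^"]*\)"[^>]* NAME="\([^"]*\)".*/\1 \2/p' "$f"
}
```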


Oracle Enterprise Manager 10g Grid Control comprises three major components:
Oracle Management Repository (OMR)
Oracle Management Service (OMS)
Oracle Management Agent (OMA)

Key configuration files for the OMA (paths relative to AGENT_HOME) include:
./sysman/emd/targets.xml
./sysman/config/emd.properties
./sysman/config/emagentlogging.properties

To start the Grid Control framework, do the following:
1. Start the OMR database listener
2. Start the OMR database
3. Start all OMSs
4. Start the OMA on the OMS/OMR server
5. Start the OMA on managed servers

To stop the Grid Control framework:
1. Stop the OMA on managed servers (optional).
2. Stop the OMA on the OMS/OMR server.
3. Stop all OMSs.
4. Stop the OMR database.
5. Stop the OMR database listener.
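The start and stop orders above can be wrapped in a script for the server that hosts the OMR and OMS. This is a sketch under these assumptions:

- DB_HOME, OMS_HOME, and AGENT_HOME are hypothetical variables for the database, OMS, and agent homes.
- ORACLE_SID must already point at the repository database.
- Agents on other managed servers are left out.

Setting DRY_RUN=1 prints the commands instead of running them. Each function stops at the first command that exits non-zero.

```shell
#!/usr/bin/env bash
# Start or stop the Grid Control components on the OMR/OMS server in the documented order (sketch).
# Hypothetical variables: DB_HOME (OMR database home), OMS_HOME (OMS home),
# AGENT_HOME (agent home on this server). ORACLE_SID must already name the OMR database.
# Agents on other managed servers are not handled. DRY_RUN=1 prints commands instead of running them.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Send one SQL*Plus command to the repository database as SYSDBA.
# Note: sqlplus may exit 0 even if the SQL itself fails, so check the database afterwards.
sql_as_sysdba() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "sqlplus / as sysdba: $1"
  else
    echo "$1" | "$DB_HOME/bin/sqlplus" -s "/ as sysdba"
  fi
}

grid_start() {
  run "$DB_HOME/bin/lsnrctl" start || return 1        # 1. OMR database listener
  sql_as_sysdba "startup" || return 1                 # 2. OMR database
  run "$OMS_HOME/bin/emctl" start oms || return 1     # 3. OMS on this server
  run "$AGENT_HOME/bin/emctl" start agent || return 1 # 4. OMA on the OMS/OMR server
}

grid_stop() {
  run "$AGENT_HOME/bin/emctl" stop agent || return 1  # OMA on the OMS/OMR server
  run "$OMS_HOME/bin/emctl" stop oms || return 1      # OMS on this server
  sql_as_sysdba "shutdown immediate" || return 1      # OMR database
  run "$DB_HOME/bin/lsnrctl" stop                     # OMR database listener
}
```

A dry run shows the sequence before anything is touched, for example: DRY_RUN=1 DB_HOME=/db OMS_HOME=/oms AGENT_HOME=/agent grid_start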

Oracle Management Agent:
  • Must be installed on each managed host
  • Must be in its own ORACLE_HOME
  • Communicates with the OMS via HTTP or HTTPS

Oracle Management Service:
  • Includes the member components Oracle HTTP Server (OHS), Oracle Application Server Containers for J2EE (OC4J), and OracleAS Web Cache
  • Is a J2EE application deployed on Oracle Application Server 10g
  • Connects to the OMR by using Java Database Connectivity (JDBC)

Oracle Management Repository:
  • Resides in an Oracle database
  • Includes schema objects belonging to SYSMAN
  • Can be installed in a preexisting database
  • Can be configured to contain other management data, including the Oracle Application Server infrastructure database and the Oracle Recovery Manager catalog

Oracle Process Monitor and Notification Control utility (opmnctl)

$ opmnctl startall
$ opmnctl stopall
$ opmnctl startproc ias-component=OC4J
$ opmnctl stopproc process-type=OC4J_EM
$ opmnctl status -l

Enterprise Manager Control Utility (emctl)
$ emctl start oms
$ emctl stop oms
$ emctl status oms
$ emctl start iasconsole
$ emctl stop iasconsole

Distributed Configuration Manager Control (dcmctl)
$ dcmctl start
$ dcmctl start -ct WebCache
$ dcmctl stop
$ dcmctl getstate
$ dcmctl listcomponents

Example Output of Commands

$ /opt/oracle/product/oms10g/opmn/bin/opmnctl status -l

Processes in Instance: EnterpriseManager0.test
-------------------+--------------------+---------+----------+------------+----------+-----------+------
ias-component | process-type | pid | status | uid | memused | uptime | ports
-------------------+--------------------+---------+----------+------------+----------+-----------+------
DSA | DSA | N/A | Down | N/A | N/A | N/A | N/A
LogLoader | logloaderd | N/A | Down | N/A | N/A | N/A | N/A
HTTP_Server | HTTP_Server | 749 | Alive | 1325924656 | 194208 | 123:54:07 | http1:7778,http2:7200,https1:4444,https2:1159,http3:4889
dcm-daemon | dcm-daemon | 629 | Alive | 1325924655 | 24956 | 123:54:39 | N/A
OC4J | home | 23906 | Alive | 1325924658 | 33148 | 121:24:21 | ajp:12502,rmi:12402,jms:12602
OC4J | OC4J_EMPROV | 23907 | Alive | 1325924659 | 57724 | 121:24:21 | ajp:12503,rmi:12403,jms:12603
OC4J | OC4J_EM | 12150 | Alive | 1325924667 | 242044 | 1193:01:~ | ajp:12501,rmi:12401,jms:12601
WebCache | WebCache | 23908 | Alive | 1325924660 | 106924 | 121:24:21 | http:7777,invalidation:9401,statistics:9402
WebCache | WebCacheAdmin | 23909 | Alive | 1325924661 | 15652 | 121:24:21 | administration:9400

$ /opt/oracle/product/oms10g/opmn/bin/opmnctl status

Processes in Instance: EnterpriseManager0.test
-------------------+--------------------+---------+---------
ias-component | process-type | pid | status
-------------------+--------------------+---------+---------
DSA | DSA | N/A | Down
LogLoader | logloaderd | N/A | Down
HTTP_Server | HTTP_Server | 749 | Alive
dcm-daemon | dcm-daemon | 629 | Alive
OC4J | home | 23906 | Alive
OC4J | OC4J_EMPROV | 23907 | Alive
OC4J | OC4J_EM | 12150 | Alive
WebCache | WebCache | 23908 | Alive
WebCache | WebCacheAdmin | 23909 | Alive
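To spot stopped processes without reading the whole table, filter the status output for rows whose status is not Alive. The sketch assumes the pipe-separated layout shown above, with status in the fourth column. The same layout holds for opmnctl status -l. opmn_not_alive is a hypothetical name.

```shell
#!/usr/bin/env bash
# Report processes that are not "Alive" in opmnctl status output read from stdin (sketch).
# Usage: /opt/oracle/product/oms10g/opmn/bin/opmnctl status | opmn_not_alive
opmn_not_alive() {
  awk -F'|' '
    NF >= 4 {
      for (i = 1; i <= NF; i++) gsub(/^[ \t]+|[ \t]+$/, "", $i)   # trim each column
      if ($1 == "ias-component") next                             # skip the header row
      if ($4 != "Alive") print $1 "/" $2 " is " $4
    }'
}
```

Fed the example output above, it would print "DSA/DSA is Down" and "LogLoader/logloaderd is Down".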

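To start or stop an individual IAS component, use opmnctl startproc or stopproc with ias-component, for example opmnctl startproc ias-component=OC4J (see the ias-component column above).

To start or stop an individual process, use opmnctl startproc or stopproc with process-type, for example opmnctl startproc process-type=OC4J_EMPROV (see the process-type column above).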

$ /opt/oracle/product/oms10g/opmn/bin/opmnctl stopall
opmnctl: stopping opmn and all managed processes...
================================================================================
opmn id=test:6201
5 of 6 processes stopped. ias-instance id=EnterpriseManager0.test
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
--------------------------------------------------------------------------------
ias-component/process-type/process-set:
OC4J/OC4J_EM/default_island

Error
--> Process (pid=12150)
time out while waiting for a managed process to stop
Log:
/opt/oracle/product/oms10g/opmn/logs/OC4J~OC4J_EM~default_island~1

opmnctl: graceful stop of processes failed, trying forceful shutdown...

$ /opt/oracle/product/oms10g/opmn/bin/opmnctl status
Unable to connect to opmn.
Opmn may not be up.


Connectivity logs
Log File      Location
prereq.log    $OMS_HOME/sysman/prov/agentpush//prereqs/local
prereq.out    $OMS_HOME/sysman/prov/agentpush//prereqs/local
prereq.err    $OMS_HOME/sysman/prov/agentpush//prereqs/local

Prerequisite logs
Log File      Location
prereq.log    $OMS_HOME/sysman/prov/agentpush//prereqs/
prereq.out    $OMS_HOME/sysman/prov/agentpush//prereqs/
prereq.err    $OMS_HOME/sysman/prov/agentpush//prereqs/

Agent Deploy logs (log file, location, contents)
EMAgentPush.log (/sysman/prov/agentpush/logs/): Agent Deploy application logs.
remoteInterfaces.log (/sysman/prov/agentpush/logs/): Logs of the remote interfaces layer.
install.log/.err (/sysman/prov/agentpush//logs//): Log or error output of a new agent installation or new cluster agent installation.
upgrade.log/.err (/sysman/prov/agentpush//logs//): Log or error output of an upgrade performed with Agent Deploy.
nfsinstall.log/.err (/sysman/prov/agentpush//logs//): Log or error output of an agent installation using the Shared Agent Home option in Agent Deploy.
clusterUpgrade.log/.err (/sysman/prov/agentpush//logs//): Log or error output of a cluster upgrade performed with Agent Deploy.
sharedClusterUpgradeConfig.log/.err (/sysman/prov/agentpush//logs//): Log or error output of the config operation during an upgrade on a shared cluster.
config.log/.err (/sysman/prov/agentpush//logs//): Log or error output of the shared cluster configuration during an agent installation on a shared cluster.
preinstallscript.log/.err (/sysman/prov/agentpush//logs//): Log or error output of the preinstallation script, if one was specified.
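The run-specific subdirectories in the paths above vary from one deployment to the next, so it can be easier to search the agentpush tree than to guess them. This sketch lists every .log, .err, and .out file under $OMS_HOME/sysman/prov/agentpush, newest first. It relies on GNU find's -printf and uses a hypothetical function name.

```shell
#!/usr/bin/env bash
# List Agent Deploy log/error/output files under OMS_HOME, newest first (sketch).
# Usage: agentpush_logs [OMS_HOME]
agentpush_logs() {
  local base="${1:-$OMS_HOME}/sysman/prov/agentpush"
  [ -d "$base" ] || { echo "no such directory: $base" >&2; return 1; }
  # GNU find -printf: "<mtime in epoch seconds> <path>"; sort by time, then drop the time column.
  find "$base" -type f \( -name '*.log' -o -name '*.err' -o -name '*.out' \) \
    -printf '%T@ %p\n' | sort -rn | cut -d' ' -f2-
}
```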


If you have a working Grid Control agent installation, you can tar the agent home and clone it onto other nodes:

1. Tar the working agent home on the source node:

$ cd /opt/oracle/product

$ ls -l
total 8
drwxr-xr-x 65 oracle9 oinstall 2048 Oct 10 2006 9.2.0
drwxrwx--- 3 oracle9 oinstall 96 Nov 19 13:29 agent_10g

$ tar -cvf agent.tar agent_10g

2. Copy agent.tar to the target node and extract it under /opt/oracle/product.

3. Set ORACLE_HOME to /opt/oracle/product/agent10g:

$ export ORACLE_HOME=/opt/oracle/product/agent10g

4. Run the installer in clone mode:

$ cd $ORACLE_HOME/oui/bin
(or, equivalently: $ cd /opt/oracle/product/agent10g/oui/bin)

$ ./runInstaller -clone -forceClone ORACLE_HOME=$ORACLE_HOME ORACLE_HOME_NAME=agent10g -noconfig -silent

5. Run root.sh as the root user:

/opt/oracle/product/agent10g/root.sh

6. Amend the REPOSITORY_URL and emdWalletSrcUrl parameters in the sysman/config/emd.properties file under the agent ORACLE_HOME so that the agent points to the correct Grid Control server.

7. Run the agent configuration assistant:

$ cd $ORACLE_HOME/bin
(or, equivalently: $ cd /opt/oracle/product/agent10g/bin)

$ ./agentca -f

8. Secure the agent:

$ ./emctl secure agent

9. Start the agent, if it is not already running:

$ ./emctl start agent
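Step 6 can be done with sed instead of an editor. The sketch below assumes both keys already exist as KEY=value lines in emd.properties and keeps a .orig copy first. The function name is hypothetical. The example URLs are placeholders whose ports (1159 and 4889) are taken from the opmnctl output shown earlier. Replace them with the values your own OMS uses.

```shell
#!/usr/bin/env bash
# Rewrite REPOSITORY_URL and emdWalletSrcUrl in a cloned agent's emd.properties (sketch).
# Usage (placeholder URLs):
#   set_agent_oms_urls "$ORACLE_HOME/sysman/config/emd.properties" \
#     "https://newoms.example.com:1159/em/upload" \
#     "http://newoms.example.com:4889/em/wallets/emd"
# Only existing KEY=value lines are changed; missing keys are not added.
# The URLs must not contain "|" or "&", which are special in the sed expressions below.
set_agent_oms_urls() {
  local props="$1" repo_url="$2" wallet_url="$3"
  cp -p "$props" "$props.orig" || return 1     # keep an untouched copy
  sed -i.tmp \
    -e "s|^REPOSITORY_URL=.*|REPOSITORY_URL=$repo_url|" \
    -e "s|^emdWalletSrcUrl=.*|emdWalletSrcUrl=$wallet_url|" \
    "$props" && rm -f "$props.tmp"               # remove sed's own backup file
}
```

Run it before the agent configuration assistant so agentca picks up the new OMS.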
