Resource usage check on HPI

To monitor the resource usage of critical Grid processes, create the following file, res_monitor.cfg, and run the command “glance -aos res_monitor.cfg -j 3600”. The information is then collected every hour (3600 seconds).
###### res_monitor.cfg ######
headersprinted=0
if headersprinted == 0 then {
print "====== time ====== PROC PID VSZ RSS CPU"
headersprinted = 1
}
process loop {
if ((proc_proc_name=="crsd.bin")
or (proc_proc_name=="ohasd.bin")
or (proc_proc_name=="oraagent.bin")
or (proc_proc_name=="ocssd.bin")
or (proc_proc_name=="gipcd.bin")
or (proc_proc_name=="gpnpd.bin")
or (proc_proc_name=="mdnsd.bin")
or (proc_proc_name=="gnsd.bin")
or (proc_proc_name=="cssdagent")
or (proc_proc_name=="evmd.bin")
or (proc_proc_name=="evmlogger.info")
or (proc_proc_name=="orarootagent.bin")
or (proc_proc_name=="octssd.bin")
or (proc_proc_name=="cssdmonitor")
or (proc_proc_name=="diskmon.bin"))
then {
print GBL_STATDATE,"|",GBL_STATTIME,"|",proc_proc_name,"|",proc_proc_id,"|",proc_mem_virt,"|",proc_mem_res,"|",proc_cpu_last_used,"|"
}
}
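The adviser script only prints raw samples. If its output is redirected to a file, a short script can pick out the peak resident memory of each process over the whole collection period. The Python sketch below is not part of the original procedure; it assumes every data line has exactly the layout produced by the print statement above (date|time|name|pid|virt|res|cpu|) and that the RSS column is a plain number in whatever unit glance reports for proc_mem_res.

```python
from collections import defaultdict


def peak_rss(lines):
    """Return {process name: highest RSS value seen} from res_monitor.cfg output.

    Assumed data line layout (from the print statement in res_monitor.cfg):
        date|time|name|pid|virt|res|cpu|
    Header lines and lines that do not parse are skipped.
    """
    peaks = defaultdict(float)
    for line in lines:
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 7:
            continue  # header line or blank line
        name, rss = fields[2], fields[5]
        try:
            value = float(rss)
        except ValueError:
            continue  # RSS column is not numeric; skip the line
        peaks[name] = max(peaks[name], value)
    return dict(peaks)
```

Passing an open file object (for example the hypothetical res_monitor.out) works too, since iterating over a file yields its lines; the result is a dict keyed by process name.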

Posted in Daily work

EM 12c configuration records

The OMS server runs on Linux; the agent is installed and configured on Windows.
1.Apply the patch that adds support for the Windows platform.

2.Setup->Add Target->Add Targets Manually->Add Host Targets
Because the OMS uses the commands under c:/cygwin/bin by default, I changed the following file to use the commands in c:/mksnt/mksnt or c:/mksnt/bin instead:
/opt/ora/middleware/oms/oui/prov/resources/ssPaths_msplats.properties
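A minimal sketch of this kind of path rewrite, assuming ssPaths_msplats.properties is a plain key=value list whose values are full command paths written with forward slashes. The key names in the example are hypothetical, and which commands actually live in c:/mksnt/mksnt versus c:/mksnt/bin has to be checked on the agent host. Keep a copy of the original file before rewriting it.

```python
def rewrite_paths(text, old_prefix, new_prefix):
    """Replace old_prefix at the start of property values in key=value text.

    Comment lines ('#') and lines without '=' are kept unchanged. The prefix
    match is case-insensitive (Windows paths) and must end at a path separator,
    so "c:/cygwin/bin" does not match "c:/cygwin/binx".
    """
    out = []
    for line in text.splitlines():
        if line.lstrip().startswith("#") or "=" not in line:
            out.append(line)
            continue
        key, value = line.split("=", 1)
        rest = value[len(old_prefix):]
        if value.lower().startswith(old_prefix.lower()) and (rest == "" or rest[0] in "/\\"):
            value = new_prefix + rest
        out.append(f"{key}={value}")
    result = "\n".join(out)
    # Keep a trailing newline if the input had one.
    return result + "\n" if text.endswith("\n") else result
```

For example, rewrite_paths(text, "c:/cygwin/bin", "c:/mksnt/mksnt") turns a line such as MV_PATH=c:/cygwin/bin/mv.exe into MV_PATH=c:/mksnt/mksnt/mv.exe.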

If the configuration fails, delete the old service on the agent node and run the deployment command there manually:
(1)sc delete Oracleagent12c1Agent
(2)c:/app/agent12c/ADATMP_2014-03-02_17-45-13-PM/agentDeploy.bat -ignorePrereqs ORACLE_HOSTNAME=****** AGENT_BASE_DIR=c:/app/agent12c OMS_HOST=****** EM_UPLOAD_PORT=4900 AGENT_INSTANCE_HOME=c:/app/agent12c/agent_inst b_doDiscovery=false b_startAgent=false b_forceInstCheck=true AGENT_PORT=3872 AGENT_REGISTRATION_PASSWORD=******
(3)c:/app/agent12c/core/12.1.0.3.0/bin/emctl start agent

3.Setup->Add Target->Auto Discovery Results
Select the agent and click “Promote”.

4.At this point only the agent was online in the OMS; the host and other targets were missing. On the agent node, run the following command and then refresh the console:
C:\app\agent12c\core\12.1.0.3.0\bin>emctl config agent addinternaltargets

5.Setup->Extensibility->Plug-ins
Databases->Oracle Database(Right click)->Deploy On->Management Agent
Note: This step seems to be completed automatically when the cluster is added.

6.Add the cluster.
Setup->Add Target->Add Targets Manually->Add Targets Using Guided Process (Also Adds Related Targets)
Choose “Target Types” as “Oracle Cluster and High Availability Service”.

7.Add database instances.
Choose “Target Types” as “Oracle Database, Listener and Automatic Storage Management”.

8.Enable QoS.
(1)Change the password of the “qosadmin” user. On the agent node:
srvctl stop oc4j
qosctl qosadmin -setpasswd qosadmin
srvctl start oc4j
(2)Enable QoS for the cluster.
Cluster name->Administration->Quality of Service Management->Create Policy Set
(3)Enable QoS for the database.

9.If you add new services or databases, add the corresponding performance classes to the cluster's QoS configuration.

Some problems and solutions:
SCENARIO 1:
C:\app\agent_12c\core\12.1.0.3.0\bin>emctl pingOMS
Oracle Enterprise Manager Cloud Control 12c Release 3
Copyright (c) 1996, 2013 Oracle Corporation. All rights reserved.
—————————————————————
EMD pingOMS error: OMS sent an invalid response: “ERROR- Failed to update Target type Metadata”

In C:/app/agent_12c/agent_inst/sysman/log/gcagent.log:
2014-02-28 02:02:04,173 [109:9255CBDA] INFO – attempting initial heartbeat
2014-02-28 02:02:04,219 [109:9255CBDA] WARN – improper ping interval (EM_PING_NOTIF_RESPONSE: ERROR- Failed to update Target type Metadata)
2014-02-28 02:02:04,221 [109:9255CBDA] WARN – Ping protocol error
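When gcagent.log is long, it helps to filter out only the warnings and errors before reading. The sketch below assumes the line layout seen in these excerpts, "<date> <time> [<thread>] <LEVEL> - <message>"; the excerpts here show an en dash before the message, so both dash forms are accepted. It is a reading aid, not an Oracle tool.

```python
import re

# Assumed gcagent.log layout: "<date> <time> [<thread>] <LEVEL> - <message>".
# Both a hyphen and an en dash (U+2013) are accepted before the message.
LOG_LINE = re.compile(r"^(\S+ \S+) \[([^\]]*)\] (\w+) [-\u2013] (.*)$")


def problems(lines, levels=("WARN", "ERROR")):
    """Return (timestamp, level, message) for every line at one of `levels`."""
    found = []
    for line in lines:
        m = LOG_LINE.match(line.rstrip("\n"))
        if m and m.group(3) in levels:
            found.append((m.group(1), m.group(3), m.group(4)))
    return found
```

Continuation lines of a wrapped message, and stack-trace lines, do not match the pattern and are skipped.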

In /opt/ora/gc_inst/em/EMGC_OMS1/sysman/log/emoms_pbs.log
2014-02-27 19:44:31,182 [[ACTIVE] ExecuteThread: ’12’ for queue: ‘weblogic.kernel.Default (self-tuning)’] ERROR gcloader.Receiver logp.251 – Error occurred while accessing to repository
java.sql.SQLException: ORA-20238: Target ***:3872:oracle_emd is currently in the process of being deleted
ORA-06512: at “SYSMAN.EM_TARGET”, line 3049
ORA-06512: at “SYSMAN.EM_TARGET”, line 3317
ORA-06512: at “SYSMAN.EM_TARGET”, line 3992
ORA-06512: at line 1

OPERATION:
$ ./emcli login -username=SYSMAN
$ ./emcli get_targets -targets='***:3872:oracle_emd'
$ ./emcli delete_target -name='***:3872' -type='oracle_emd' -delete_monitored_targets -async
$ ./emcli sync
$ ./emcli logout

SCENARIO 2:
The agent reports a KEY_MISMATCH error.
In emoms_pbs.log:
2014-02-27 00:17:44,971 [[ACTIVE] ExecuteThread: ’23’ for queue: ‘weblogic.kernel.Default (self-tuning)’] ERROR receiver.AbstractOMSHandshake logp.251 – OMSHandshake failed.(AGENT URL = https://***:3872/emd/main/)(ERROR = KEY_MISMATCH as Last AgentURL is null and the Agent Key in repos is not same as that sent by the Agent)

In C:/app/agent_12c/agent_inst/sysman/log/gcagent.log:
2014-02-27 02:05:06,446 [148:HTTP Listener-148 – /emd/lifecycle/main/] ERROR – unspecified request header Client-Type
oracle.sysman.gcagent.comm.ProtocolException: unspecified request header Client-Type

OPERATION:
emctl stop agent
emctl unsecure agent (maybe not needed)
emctl secure agent

SCENARIO 3:
In emdctlj.log:
2014-03-06 00:35:30,598 [main] INFO – unable to connect to http server at https://***:3872/emd/lifecycle/main/. [peer not authenticated]
oracle.sysman.emSDK.agent.comm.exception.VerifyConnectionException: unable to connect to http server at https://***:3872/emd/lifecycle/main/. [peer not authenticated]
at oracle.sysman.gcagent.comm.http.ClientConnection.verifySecureConnection(ClientConnection.java:858)
at oracle.sysman.gcagent.comm.http.ClientConnection.makeConnection(ClientConnection.java:829)
at oracle.sysman.gcagent.comm.oms.http.TMClientConnection.<init>(TMClientConnection.java:88)
at oracle.sysman.gcagent.comm.oms.http.HTTPClientTerminus.connect(HTTPClientTerminus.java:239)
at oracle.sysman.gcagent.oms.TMRemoteClientFactory.getCommunicatingClient(TMRemoteClientFactory.java:162)
at oracle.sysman.gcagent.util.clients.emdctlj.EmdCtlClientFactory.createEmdCtlClient(EmdCtlClientFactory.java:55)
at oracle.sysman.gcagent.clients.emdctlj.commands.EmdCtlAgentStatusCommand.execute(EmdCtlAgentStatusCommand.java:98)
at oracle.sysman.gcagent.clients.emdctlj.EmdCtlParsedCommandHandler.executeCommand(EmdCtlParsedCommandHandler.java:88)
at oracle.sysman.gcagent.clients.emdctlj.EmdCtl.parseAndExecute(EmdCtl.java:160)
at oracle.sysman.gcagent.clients.emdctlj.EmdCtl.main(EmdCtl.java:399)
Caused by: javax.net.ssl.SSLPeerUnverifiedException: peer not authenticated
at com.sun.net.ssl.internal.ssl.SSLSessionImpl.getPeerCertificateChain(SSLSessionImpl.java:401)
at oracle.sysman.gcagent.comm.http.BaseHttpsContext.verifyConnection(BaseHttpsContext.java:119)
at oracle.sysman.gcagent.comm.http.ClientConnection.verifySecureConnection(ClientConnection.java:854)
… 9 more

In emoms_pbs.log:
2014-03-06 00:49:28,887 [Job Critical Pool:JobWorker Step 14458741] WARN emSDK.comm setInstanceReadTimeout.10608 – unable to set the instance read timeout: unable to connect to http server at https://***:3872/emd/main/. [peer not authenticated]

In gcagent.log:
2014-03-06 00:40:55,826 [224:980F9148] INFO – attempting another heartbeat
2014-03-06 00:41:54,388 [230:GC.SysExecutor.3 (AgentSystemMonitorTask)] ERROR – 313.035 seconds have passed without any incoming (status) requests; the HTTP listener is not functioning properly
2014-03-06 00:41:54,388 [230:EA402D23:GC.SysExecutor.3 (AgentSystemMonitorTask)] INFO – agent status is being changed to ABORT_AGENT

OPERATION:
(1)emctl stop agent
(2)Edit the file emd.properties and change AgentListenOnAllNICs=TRUE to AgentListenOnAllNICs=FALSE.
With this setting, the agent's HTTP listener listens only on the agent's own hostname and port instead of on all NICs.
(3)emctl start agent
(4)Resync the agent on OMS.
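Step (2) can also be scripted. The sketch below rewrites one key in Java-properties-style text and appends the key if it is missing; it assumes emd.properties consists of simple key=value lines, and the EMD_URL value in the test data is made up. Apply it only while the agent is stopped, and keep a backup of the file.

```python
import re


def set_property(text, key, value):
    """Set key=value in Java-properties-style text, appending it if absent."""
    pattern = re.compile(rf"^{re.escape(key)}[ \t]*=.*$", re.MULTILINE)
    if pattern.search(text):
        # A function replacement avoids backslash escapes in `value`.
        return pattern.sub(lambda _m: f"{key}={value}", text)
    sep = "" if not text or text.endswith("\n") else "\n"
    return f"{text}{sep}{key}={value}\n"
```

For this scenario the call would be set_property(text, "AgentListenOnAllNICs", "FALSE") on the file's contents, followed by writing the result back.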

REFERENCE:
EM12c Cloud Console Agent: Agent Heartbeat Status “OMS is unreachable bad response” “ORA 20500 Multiple agents detected for” (Doc ID 1586918.1)
EM 12c Emctl Start Agent Fails With Error :HTTP Listener failed at Startup (Doc ID 1513571.1)

Posted in Oracle

Remove gns and reconfigure SCAN/vip

BACKGROUND:
Remove GNS and reconfigure the node VIPs and SCAN VIPs in 11gR2.

OPERATION:
1.srvctl stop gns
2.srvctl remove gns -f
3.srvctl modify network -w mixed
4.srvctl stop listener -n node1/2
srvctl stop vip -n node1/2
5.srvctl modify nodeapps -n node1 -A 10.211.163.80/255.255.240.0/lan0
srvctl modify nodeapps -n node2 -A 10.211.163.81/255.255.240.0/lan0
6.srvctl modify network -w static
7.srvctl start vip -n node1/2
8.srvctl stop scan_listener
srvctl stop scan
9.srvctl modify scan -n node12-scn
srvctl start scan
srvctl start scan_listener
The SCAN listeners then went into INTERMEDIATE state:
ora.LISTENER_SCAN1.lsnr
1 ONLINE INTERMEDIATE node1 Not All Endpoints Registered
ora.LISTENER_SCAN2.lsnr
1 ONLINE INTERMEDIATE node1 Not All Endpoints Registered
ora.LISTENER_SCAN3.lsnr
1 ONLINE INTERMEDIATE node2 Not All Endpoints Registered

In crsd oraagent log:
2013-08-16 03:14:07.022: [ora.LISTENER_SCAN3.lsnr][89]{1:57738:175} [check] LsnrAgent::isDynamicEndpointOK can’t locate endpoint (ADDRESS=(PROTOCOL=TCP)(HOST=144.25.79.251)(PORT=1521))
==> The SCAN listener is still looking for an endpoint on the old SCAN VIP
2013-08-16 03:14:07.022: [ora.LISTENER_SCAN3.lsnr][89]{1:57738:175} [check] clsnUtils::error Exception type=2 string=CRS-5020: Not all endpoints are registered for listener LISTENER_SCAN3

In the output of scan vip resource attributes:
$ crsctl stat res ora.scan3.vip -p | grep GEN_USR_ORA_VIP
GEN_USR_ORA_VIP=144.25.79.251
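To check every SCAN VIP for the same stale address, the -p output of each ora.scanN.vip resource can be parsed. This sketch assumes the output consists of NAME=VALUE lines as above, and flags GEN_USR_ORA_VIP when it holds something other than the new SCAN addresses, which you supply yourself (the address in the test data is hypothetical).

```python
def parse_attrs(output):
    """Turn `crsctl stat res <name> -p` output (NAME=VALUE lines) into a dict."""
    attrs = {}
    for line in output.splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def stale_vip(output, expected_ips):
    """Return GEN_USR_ORA_VIP if it is set to an address not in expected_ips, else None."""
    vip = parse_attrs(output).get("GEN_USR_ORA_VIP", "")
    return vip if vip and vip not in expected_ips else None
```

An empty GEN_USR_ORA_VIP, the state the following commands produce, is treated as not stale.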

# crsctl modify res ora.scan2.vip -attr "GEN_USR_ORA_VIP="
$ srvctl stop scan_listener
$ srvctl start scan_listener

NOTE:
1.Make an OCR backup before these operations. (ocrconfig -manualbackup)
2.If there are databases, modify the initialization parameter “REMOTE_LISTENER” accordingly.
REFERENCE:How To Convert an 11gR2 GNS Configuration To A Standard Configuration Using DNS Only (Doc ID 1489121.1)

Posted in Apache, Oracle

Block multicast packets by using IPFilter

BACKGROUND:
OS: HP-UX B.11.31 U ia64
The goal is to disable multicast on the private NIC. On Linux, iptables can block all multicast packets:
/sbin/iptables -A OUTPUT -m pkttype --pkt-type multicast -o eth1 -j DROP
/sbin/iptables -A INPUT -m pkttype --pkt-type multicast -i eth1 -j DROP

OPERATION:
1. Enable IPFilter.
Change the setting in /etc/rc.config.d/ipfconf:
# Load the ipfilter module ?
# 1 = Start, 0 = Do not start
#
IPF_START=1
Restart ipf:
# /sbin/init.d/ipfboot stop
# /sbin/init.d/ipfboot start
Check whether the module has been loaded. If it has not, reboot the node:
# ipfstat -ioh
IPFilter is enabled but not filtering, module is not present in stack
# reboot

2.Configure the rules in the IPFilter configuration file. /etc/rc.config.d/ipfconf shows that the packet filtering configuration file is /etc/opt/ipf/ipf.conf:
IPF_CONFDIR=/etc/opt/ipf
#
# Packet filtering configuration file
#
IPF_CONF=${IPF_CONFDIR}/ipf.conf

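Change ipf.conf as follows to block the multicast addresses 230.0.1.0 and 224.0.0.251 on the private NIC (lan1). Note that a multicast group address only ever appears as a packet's destination, so to match incoming multicast traffic the “block in” rules probably need to read “from any to <group>”: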
block in quick on lan1 from 230.0.1.0 to any
block out quick on lan1 from any to 230.0.1.0
block in quick on lan1 from 224.0.0.251 to any
block out quick on lan1 from any to 224.0.0.251
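If the rule set has to cover several NICs or groups, it can be generated. The sketch below is an alternative to the hand-written rules, not what was loaded on this system: it matches the group as the destination in both directions, for the reason given above, and rejects anything that is not a multicast address. Test the generated file on one node before rolling it out.

```python
import ipaddress


def multicast_block_rules(interfaces, groups):
    """Build IPFilter rules that drop traffic sent to the given multicast groups.

    A multicast group address only ever appears as a packet's destination,
    so the inbound and outbound rules both match on "to <group>".
    """
    for group in groups:
        if not ipaddress.ip_address(group).is_multicast:
            raise ValueError(f"{group} is not a multicast address")
    rules = []
    for nic in interfaces:
        for group in groups:
            for direction in ("in", "out"):
                rules.append(f"block {direction} quick on {nic} from any to {group}")
    return rules
```

For example, "\n".join(multicast_block_rules(["lan1"], ["230.0.1.0", "224.0.0.251"])) yields four rules, the first being "block in quick on lan1 from any to 230.0.1.0".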

3.Load the ruleset file.
Flush all the internal rule tables:
# ipf -Fa
Load the rules from the configuration file:
# ipf -f /etc/opt/ipf/ipf.conf
Check the packet filtering status:
# ipfstat -ioh
0 block out quick on lan1 from any to 230.0.1.0/32
0 block out quick on lan1 from any to 224.0.0.251/32
0 block out quick on lan2 from any to 230.0.1.0/32
0 block out quick on lan2 from any to 224.0.0.251/32
0 block in quick on lan1 from 230.0.1.0/32 to any
0 block in quick on lan1 from 224.0.0.251/32 to any
0 block in quick on lan2 from 230.0.1.0/32 to any
0 block in quick on lan2 from 224.0.0.251/32 to any
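With -h, ipfstat prints a hit counter in front of each rule (the leading 0 above). Comparing two snapshots taken a few minutes apart shows whether a rule is actually matching packets. This sketch assumes the "<count> <rule>" line format shown above and ignores status messages such as the one printed when the module is not loaded.

```python
def rule_hits(output):
    """Map each rule in `ipfstat -ioh` output to its hit counter (first column)."""
    hits = {}
    for line in output.splitlines():
        parts = line.strip().split(None, 1)
        # Rule lines start with the numeric hit count; status messages do not.
        if len(parts) == 2 and parts[0].isdigit():
            hits[parts[1]] = int(parts[0])
    return hits
```

A rule whose counter stays at 0 while multicast traffic is known to be flowing is a sign that it does not match as intended.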

Posted in Uncategorized

Restore ocr and votedisk

BACKGROUND:
The OCR and voting disk were wiped with dd:
$dd if=/dev/zero of=/dev/rdisk/oradisk1 bs=1024000 count=200
An OCR backup was stored in an ASM disk group:
#ocrconfig -manualbackup
node1 2013/08/12 23:39:55 +ocrdg:/cluster/ocrbackup/backup_20130812_233955.ocr.256.823304397

OPERATION:
1.Stop CRS on all nodes and start CRS in exclusive mode on one node.
#crsctl stop crs -f #on all nodes
#crsctl start crs -excl -nocrs #on one node
2.Create a new disk group with the same name as the old OCR disk group, and mount the disk group that holds the OCR backup.
$export ORACLE_SID=+ASM1
$export ORACLE_HOME=/scratch/app/11.2.0/grid
$sqlplus / as sysasm
SQL> create diskgroup vf external redundancy disk '/dev/rdisk/oradisk1'
  attribute 'au_size'='4M','compatible.asm'='11.2.0.0','compatible.rdbms'='11.2.0.0';
SQL> alter diskgroup ocrdg mount; ==> the disk group that holds the OCR backup

3.#ocrconfig -restore +ocrdg:/cluster/ocrbackup/backup_20130812_233955.ocr.256.823304397
#ocrcheck #Make sure the OCR is not corrupted

4.$crsctl replace votedisk +vf
CRS-4602: Failed 27 to add voting file 4ef6b63ac64f4fdcbf1d126eed744946.
Failed to replace voting disk group with +vf.
CRS-4000: Command Replace failed, or completed with errors.

The GPnP profile (profile.xml) shows that “DiscoveryString” and “SPFile” are empty:
id="asm" DiscoveryString="" SPFile=""
Set the values in the ASM instance as follows:
SQL> alter system set asm_diskstring='/dev/rdisk/ora*';
SQL> create spfile='+VF/cluster/spfile' from memory;
==> By default the spfile is stored in crs_home/dbs, which may not be appropriate.
SQL> startup force mount;
SQL> show parameter spfile
NAME TYPE VALUE
———————————— ———– ——————————
spfile string +VF/cluster/spfile

$crsctl replace votedisk +vf
Successful addition of voting disk a12cbe47e3504f20bf66a7d5446863ed.
Successfully replaced voting disk group with +vf.
CRS-4266: Voting file(s) successfully replaced

5.#crsctl stop crs -f
6.#crsctl start crs #On all nodes

Posted in Oracle

vipca and nodeapps in grid10g

BACKGROUND:
GRID/RAC:10.2.0.1 => 10.2.0.4

SCENARIO:
1.When installing Grid 10.2.0.1, root.sh reported the following error:

Oracle CRS stack installed and running under init(1M)
Running vipca(silent) for configuring nodeapps
The given interface(s), “lan0” is not public. Public interfaces should be used to configure virtual IPs.
Because the public NIC uses an IP address in 10.211.*.*, which vipca considers a private network, add the nodeapps manually:
# /app/grid/bin/srvctl add nodeapps -n node1 -o /app/grid -A 10.211.161.130/255.255.255.0/lan0
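The vipca complaint comes from the address range, not from the NIC itself: 10.0.0.0/8 is one of the RFC 1918 private ranges. Python's ipaddress module shows the same classification. This is only an illustration; vipca's own check may differ in detail.

```python
import ipaddress


def looks_private(addr):
    """True if addr falls in a private range such as 10.0.0.0/8 (RFC 1918)."""
    return ipaddress.ip_address(addr).is_private
```

looks_private("10.211.161.130") returns True, which matches vipca's refusal to treat lan0 as public.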

2. Unfortunately I used the wrong netmask and, again, the public physical IP instead of a virtual IP, so the VIP, listener and nodeapps were offline. Modify the nodeapps and restart them:
# /app/grid/bin/srvctl modify nodeapps -n node1 -A 10.211.163.80/255.255.240.0/lan0
# /app/grid/bin/srvctl start nodeapps -n node1

To display the nodeapps configuration, in particular the VIP, run:
# /app/grid/bin/srvctl config nodeapps -n node1 -a -l -g -s
VIP exists.: /node1-v.us.oracle.com/10.211.163.80/255.255.240.0/lan0
GSD exists.
ONS daemon exists.
Listener exists.

3. Then another error occurred:
CRS-1006: No more members to consider
CRS-0215: Could not start resource ‘ora.node1.vip’.

Stop the nodeapps, edit the script $ORA_CRS_HOME/bin/racgvip to set FAIL_WHEN_DEFAULTGW_NOT_FOUND=0, and then start the nodeapps again.

4.The listener still could not start:
TNS-12542: TNS:address already in use
TNS-12560: TNS:protocol adapter error
TNS-00512: Address already in use
HPUX Error: 226: Address already in use

Use $ORA_RAC_HOME/bin/netca to delete the listener and add it again.
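TNS-12542 (HPUX Error 226) means another socket already holds the address the listener tries to bind. Before recreating the listener, a quick bind test shows whether something is still using the port. This is a generic sketch, not an Oracle tool: pass the host and port from the listener's address (1521 is only the usual default), and binding a port below 1024 may require root.

```python
import socket


def port_in_use(port, host="0.0.0.0"):
    """Return True if a TCP socket cannot bind host:port (address already in use)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            return True
        return False
```

The test socket is closed again immediately, so the check does not keep the port occupied.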

5.All the resources are now online:
# /app/grid/bin/crs_stat -t
Name Type Target State Host
————————————————————
ora….SM1.asm application ONLINE ONLINE node1
ora….1.lsnr application ONLINE ONLINE node1
ora….1.gsd application ONLINE ONLINE node1
ora….1.ons application ONLINE ONLINE node1
ora….1.vip application ONLINE ONLINE node1
ora….SM2.asm application ONLINE ONLINE node2
ora….2.lsnr application ONLINE ONLINE node2
ora….2.gsd application ONLINE ONLINE node2
ora….2.ons application ONLINE ONLINE node2
ora….2.vip application ONLINE ONLINE node2
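On larger clusters it is easier to let a script report the exceptions. This sketch assumes the crs_stat -t column layout shown above (Name, Type, Target, State, Host) and lists every application resource whose Target or State is not ONLINE.

```python
def not_online(output):
    """List resource names from `crs_stat -t` whose Target or State is not ONLINE."""
    bad = []
    for line in output.splitlines():
        cols = line.split()
        # Data rows: Name Type Target State [Host]; header and ruler rows are skipped.
        if len(cols) >= 4 and cols[1] == "application":
            if cols[2] != "ONLINE" or cols[3] != "ONLINE":
                bad.append(cols[0])
    return bad
```

An empty list means every resource is ONLINE, as in the output above.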

Note that all of these steps must be done on every node in the cluster.

REFERENCE:
VIPCA Fails Complaining That Interface Is Not Public (Doc ID 316583.1)
CRS-0215: Could not start resource ‘ora..vip’ (Doc ID 356535.1)

Posted in Oracle

choice

So tired! Am I right to choose this?

Posted in Gossip

Farewell to former company and colleagues

I left my former company last month and have now joined a new one. I left Wuhan after living there for about nine years. In the new city many things are different, but I think change brings more challenges and also more opportunities.

Posted in Gossip

SL500 drive error

More than a month ago, a drive in the SL500 stopped working normally. The following error appeared in /var/adm/messages:
Feb 27 08:21:00 ******** bptm[17486]: [ID 507448 daemon.warning] TapeAlert Code: 0x27, Type: Warning, Flag: DIAGNOSTICS REQ., from drive IBM.ULTRIUM-TD3.002 (index 2), Media Id 000097
We replaced the drive, the bezel, the SFP and even the fiber to the drive, and also downgraded the firmware, but the error persisted.
Then we swapped the positions of the failed drive and a good drive. The errors followed the failed drive, so we replaced that drive again and the error disappeared.
While the drives were being swapped, the drive paths no longer mapped one-to-one to the tape devices: in the first luxadm probe output below, /dev/rmt/1n and /dev/rmt/2n both appear under node WWN 500104f0009d2e01. Unconfiguring and reconfiguring controller c3 with cfgadm restored the mapping.
# luxadm -e port
# luxadm probe
……
Node WWN:500104f0009d2dfe Device Type:Tape device
Logical Path:/dev/rmt/0n
Node WWN:500104f0009d2e01 Device Type:Tape device
Logical Path:/dev/rmt/1n
Logical Path:/dev/rmt/2n
Node WWN:500104f0009d2e0a Device Type:Tape device
Logical Path:/dev/rmt/3n
Node WWN:500104f0009d2e04 Device Type:Tape device
Logical Path:/dev/rmt/4n
Node WWN:500104f0009d2e07 Device Type:Tape device
Logical Path:/dev/rmt/5n
# luxadm display 500104f0009d2e01
DEVICE PROPERTIES for tape: 500104f0009d2e01
Vendor: IBM
Product ID: ULTRIUM-TD3
Revision: 73P5
Serial Num: 9210259391
Device Type: Tape device
Path(s):
/dev/rmt/1n
/devices/pci@8,600000/SUNW,qlc@1/fp@0,0/st@w500104f0009d2e02,0:n
LUN path port WWN: 500104f0009d2e02
Host controller port WWN: 210000e08b1eaf21
Path status: Not Ready
/dev/rmt/2n
/devices/pci@8,600000/SUNW,qlc@1/fp@0,0/st@w500104f0009d2dfc,0:n
LUN path port WWN: 500104f0009d2dfc
Host controller port WWN: 210000e08b1eaf21
Path status: Not Ready
# cfgadm -al
……
c2 fc-fabric connected configured unknown
c2::500104f0009d2e05 tape connected configured unknown
c2::500104f0009d2e08 tape connected configured unknown
c2::500104f0009d2e0b tape connected configured unknown
c3 fc-fabric connected configured unknown
c3::500104f0009d2df2 med-changer connected configured unusable
c3::500104f0009d2dfc tape connected configured unknown
c3::500104f0009d2dff tape connected configured unknown
c3::500104f0009d2e02 tape connected configured unknown
……
# cfgadm -c unconfigure c3
# cfgadm -c configure c3
# luxadm probe
……
Node WWN:500104f0009d2dfe Device Type:Tape device
Logical Path:/dev/rmt/0n
Node WWN:500104f0009d2e01 Device Type:Tape device
Logical Path:/dev/rmt/1n
Node WWN:500104f0009d2dfb Device Type:Tape device
Logical Path:/dev/rmt/2n
Node WWN:500104f0009d2e0a Device Type:Tape device
Logical Path:/dev/rmt/3n
Node WWN:500104f0009d2e04 Device Type:Tape device
Logical Path:/dev/rmt/4n
Node WWN:500104f0009d2e07 Device Type:Tape device
Logical Path:/dev/rmt/5n
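The duplicated mapping in the first luxadm probe output (two logical paths under one node WWN) can be detected automatically, for example before and after running cfgadm. This sketch assumes the "Node WWN:" and "Logical Path:" line format shown above and returns only the WWNs that carry more than one logical path.

```python
def paths_by_wwn(probe_output):
    """Group `luxadm probe` logical paths under the preceding Node WWN line."""
    groups = {}
    current = None
    for raw in probe_output.splitlines():
        line = raw.strip()
        if line.startswith("Node WWN:"):
            # e.g. "Node WWN:500104f0009d2dfe Device Type:Tape device"
            current = line.split()[1].split(":", 1)[1]
            groups.setdefault(current, [])
        elif line.startswith("Logical Path:") and current is not None:
            groups[current].append(line.split(":", 1)[1])
    return groups


def shared_wwns(probe_output):
    """Return only the WWNs that have more than one logical path."""
    return {wwn: paths for wwn, paths in paths_by_wwn(probe_output).items() if len(paths) > 1}
```

On the first probe output above this reports 500104f0009d2e01 with /dev/rmt/1n and /dev/rmt/2n; on the output after cfgadm it returns an empty dict.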

Posted in Daily work

RMAN-20242

BACKGROUND:
1.Move the archived logs (sequences 11330-11339) out of the archive log directory.
2.RMAN> backup archivelog sequence between 11320 and 11350 not backed up 1 times;
3.RMAN> crosscheck archivelog all;
RMAN> resync catalog;
4.Move the logs (sequences 11330-11339) back into the archive log directory and back them up with the following command:
RMAN> backup archivelog sequence between 11330 and 11339;
The following error occurs:
RMAN-06004: ORACLE error from recovery catalog database:
RMAN-20242: specification does not match any archive log in the recovery catalog
OPERATION:
In this scenario, the following commands return no records:
RMAN> list backup of archivelog sequence between 11330 and 11339;
RMAN> report need backup;
RMAN retention policy will be applied to the command
RMAN retention policy is set to redundancy 1
Report of files with less than 1 redundant backups
File #bkps Name
—-

Posted in Oracle