Veritas Cluster Troubleshooting

Submitted by Igor on February 27, 2008, 1:26 am


Initial Notes

Veritas Cluster Server is high-availability clustering software: when
one server fails, the processes it manages switch to another server.
All database processes are run through the cluster, so it needs
to run smoothly.
Note that the oracle process should actually be running only
on the server which is active.
On monitoring tools, the procs light for whichever box is
secondary should be yellow, because oracle is not running there.
The cluster itself, however, is running on both systems.

Cluster Not Up — HELP

The normal debugging steps are:

checking status, restarting if there are no faults, checking licenses, clearing faults if needed, and checking logs.

To find out Current Status:

/opt/VRTSvcs/bin/hastatus -summary

This gives the general status of each machine and its processes.

/opt/VRTSvcs/bin/hares -display

This gives much more detail, down to the resource level.

If hastatus fails on both machines (it reports that the cluster is not up, or returns nothing), try to start the cluster:

/opt/VRTSvcs/bin/hastart

/opt/VRTSvcs/bin/hastatus -summary

This will tell you whether the processes started properly.
hastart will NOT start processes on a FAULTED system.

Starting Single System NOT Faulted

If the system is NOT FAULTED and only one system is up, the cluster probably needs to have gabconfig started manually.

Do this by running:

/sbin/gabconfig -c -x

/opt/VRTSvcs/bin/hastart

/opt/VRTSvcs/bin/hastatus -summary

If the system is faulted, check licenses and clear the faults as described next.

To check licenses:

vxlicense -p

Make sure all licenses are current and NOT expired! If they are
expired, that is your problem. Call Veritas to get temporary licenses.

There is a BUG with Veritas licenses: Veritas will not run if
there are ANY expired licenses, even if you have the valid ones you
need. To get Veritas to run, you will need to MOVE the expired
licenses out of the way. [Note: from what I understand, at a minimum
the VxFS, VxVM and RAID licenses must NOT be expired.]

vxlicense -p

Note the NUMBER after the license name (e.g. Feature name: DATABASE_EDITION [100])

cd /etc/vx/elm

mkdir old

mv lic.number old [do this for all expired licenses]

vxlicense -p [Make sure there are no expired licenses AND your good licenses are there]

hastart

If it still fails, call Veritas for temporary licenses.
Otherwise, be certain to do the same on your second machine.
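The move step above can be sketched as a small shell function. This is only an illustration: move_expired_licenses is a hypothetical helper name, not a Veritas tool, and it assumes the license files are named lic.NUMBER as described above. You still have to read the expired numbers off vxlicense -p yourself.

```shell
# Hypothetical helper (not part of Veritas): moves expired license files aside.
# Usage: move_expired_licenses LICENSE_DIR NUMBER [NUMBER...]
# Assumes files are named lic.NUMBER inside LICENSE_DIR (e.g. /etc/vx/elm).
move_expired_licenses() {
    lic_dir=$1
    shift
    # Keep the expired files in an "old" subdirectory rather than deleting them.
    mkdir -p "$lic_dir/old" || return 1
    for num in "$@"; do
        if [ -f "$lic_dir/lic.$num" ]; then
            mv "$lic_dir/lic.$num" "$lic_dir/old/" && echo "moved lic.$num"
        else
            echo "lic.$num not found in $lic_dir" >&2
        fi
    done
}
```

For example, move_expired_licenses /etc/vx/elm 100 would move lic.100 into /etc/vx/elm/old. Run vxlicense -p afterwards to confirm the good licenses are still listed.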

To clear FAULTS:

hares -display

For each resource that is faulted run:

hares -clear resource-name -sys faulted-system
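When many resources are faulted, building the list of clear commands by hand gets tedious. The sketch below reads hares -display output and prints one hares -clear command per faulted resource, without executing anything. faulted_clear_commands is a hypothetical name, and the function assumes the four-column layout (resource, attribute, system, value) and that a faulted State value contains the word FAULTED. Check both against your own output first.

```shell
# Hypothetical helper: reads `hares -display` output on stdin and prints
# (does not run) a `hares -clear` command for each faulted resource/system pair.
# Assumed column layout: resource, attribute, system, value.
faulted_clear_commands() {
    awk '$2 == "State" && $4 ~ /FAULTED/ {
        print "hares -clear " $1 " -sys " $3
    }'
}
```

Typical use would be hares -display | faulted_clear_commands, reviewing the printed commands before running any of them.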

If all of these clear, run hastatus -summary and make sure
that the faults are gone. If some don't clear, you MAY be able to clear them
at the group level. Only do this as a last resort:

hagrp -disableresources groupname

hagrp -flush group -sys sysname

hagrp -enableresources groupname

To get a group to go online:

hagrp -online group -sys desired-system

If it did NOT clear, did you check licenses?

Bringing up Machines when fault will NOT clear:

System has the following EXACT status:

gedb002# hastatus -summary

-- SYSTEM STATE
-- System               State                Frozen

A  gedb001              RUNNING              0
A  gedb002              RUNNING              0

-- GROUP STATE
-- Group           System               Probed     AutoDisabled    State

B  oragrp          gedb001              Y          N               OFFLINE
B  oragrp          gedb002              Y          N               OFFLINE

gedb002# hares -display | grep ONLINE
nic-qfe3  State           gedb001   ONLINE
nic-qfe3  State           gedb002   ONLINE

gedb002# vxdg list
NAME         STATE           ID
rootdg       enabled  957265489.1025.gedb002

gedb001# vxdg list
NAME         STATE           ID
rootdg       enabled  957266358.1025.gedb001
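The status above (every system RUNNING, every group OFFLINE) can be recognised with a short awk filter. This is a sketch under assumptions: all_groups_offline is a hypothetical name, and it expects hastatus -summary to print one record per line, with system lines starting with A and group lines starting with B.

```shell
# Hypothetical helper: reads `hastatus -summary` output on stdin.
# Succeeds (exit 0) only if every system is RUNNING and every group is OFFLINE.
# Assumes "A" lines are: A system state frozen
# and "B" lines are:     B group system probed autodisabled state
all_groups_offline() {
    awk '
        $1 == "A" { systems++; if ($3 != "RUNNING") bad = 1 }
        $1 == "B" { groups++;  if ($6 != "OFFLINE") bad = 1 }
        END {
            # Treat empty or unparseable input as "not this condition".
            if (bad || systems == 0 || groups == 0) exit 1
            exit 0
        }
    '
}
```

It could be used as hastatus -summary | all_groups_offline && echo "matches the stuck state".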

Recovery Commands:

hastop -all

on one machine hastart

wait a few minutes

on other machine hastart

Reviewing Log Files:

If you are still having troubles, look at the logs in /var/VRTSvcs/log.
Look at the most recent ones for debugging purposes (ls -ltr).
Here is a short description of the logs in /var/VRTSvcs/log:

hashadow-log_A: hashadow checks to see if the ha cluster daemon (had) is up and restarts it if needed. This is the log of that process.

engine.log_A: primary log, usually what you will be reading for debugging

Oracle_A: oracle process log (related to cluster only)

Sqlnet_A: sqlnet process log (related to cluster only)

IP_A: related to shared IP

Volume_A: related to Volume manager

Mount_A: related to mounting the actual filesystems

DiskGroup_A: related to Volume Manager/Cluster Server

NIC_A: related to actual network device

By looking at the most recent logs, you can tell what failed last
(or most recently). You can also tell what did NOT run, which may be just
as much of a clue.
Of course, if none of this helps, open a call with Veritas tech
support.
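As a convenience, the "most recent logs" check can be wrapped in a tiny function. This is just a sketch: recent_logs is a hypothetical name, and it lists newest first (the reverse of ls -ltr, which puts the newest at the bottom).

```shell
# Hypothetical helper: print the N most recently modified files in a log
# directory, newest first.
# Usage: recent_logs [DIR] [COUNT]   (defaults: /var/VRTSvcs/log, 5)
recent_logs() {
    log_dir=${1:-/var/VRTSvcs/log}
    count=${2:-5}
    ls -t "$log_dir" | head -n "$count"
}
```

Running recent_logs with no arguments shows the five most recently written logs, which is usually where the last failure was recorded.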

Calling Tech Support:

If you have tried the previously described debugging methods,
call Veritas tech support: 800-634-4747.
Your company needs to have a Veritas support contract.

Restarting Services:

If a system is shut down gracefully while it is running oracle or
other high availability services, the cluster will NOT transfer them. It only
transfers services when the system crashes or has an error.

hastart

hastatus -summary

will tell you if processes started properly. It will NOT start
processes on a FAULTED system. If the system is faulted, clear the
faults as described above.

Doing Maintenance on DBs:

BEFORE working on DB

Run hastop -all -force

AFTER working on DBs:

You MUST bring up oracle on the same machine it was running on before

Once Oracle is up, run:

hastart on the same machine you started the work on (the system with oracle running)

wait 3-5 minutes

then run hastart on the other system

If you need the instance to run on the other system, you can run: hagrp -switch oragrp -to othersystem

Shutting down db machines:

If you shut down the machine that is running the Veritas cluster services,
they will NOT start on the other machine. The cluster only fails over if the
machine crashes. You need to switch the services manually
if you shut down the machine.
To switch processes:

Find out groups to transfer over

hagrp -display

Switch over each group

hagrp -switch group-to-move -to new-system

Then shut down the machine as desired. When it is rebooted, it will start the cluster daemon automatically.
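The switch commands for a machine you are about to shut down can be generated rather than typed. The sketch below is an assumption-laden illustration: switch_commands is a hypothetical name, and it reads hastatus -summary output (whose group lines start with B, as shown earlier) instead of hagrp -display. It only prints the commands; nothing is executed.

```shell
# Hypothetical helper: reads `hastatus -summary` output on stdin and prints
# (does not run) a `hagrp -switch` command for each group ONLINE on FROM_SYSTEM.
# Usage: hastatus -summary | switch_commands FROM_SYSTEM TO_SYSTEM
# Assumes group lines are: B group system probed autodisabled state
switch_commands() {
    awk -v from="$1" -v to="$2" '
        $1 == "B" && $3 == from && $6 == "ONLINE" {
            print "hagrp -switch " $2 " -to " to
        }'
}
```

For example, hastatus -summary | switch_commands gedb001 gedb002 would print one hagrp -switch line per group currently online on gedb001.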

Doing Maintenance on Admin Network:

If the admin network that the Veritas cluster uses is brought down,
Veritas WILL fault both machines AND bring down oracle (nicely).
You will need to do the following to recover:

hastop -all

On ONE machine: hastart

wait 5 minutes

On other machine: hastart

Manual start/stop WITHOUT veritas cluster:

THIS IS ONLY USED WHEN THERE ARE DB FAILURES

If possible, use the section on DB Maintenance.
Only use this if the system fails on coming up AND you KNOW that it is due to a db configuration error. If you start up filesystems/oracle manually, shut them down manually as well, and restart using hastart when done.

To startup:

Make sure ONLY the rootdg disk group is active on BOTH nodes
[i.e. oradg or xxoradg is NOT present]. This is EXTREMELY important:
if the oracle disk group is active on both nodes, corruption occurs.

vxdg list

hastatus (stop on both as you are faulted on both machines )

hastop -all (if either was active make sure you are truly shutdown!)
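Because importing the oracle disk group on two nodes at once causes corruption, it may be worth checking the vxdg list output mechanically on each node. The function below is a sketch: only_rootdg_imported is a hypothetical name, and it assumes the NAME / STATE / ID layout shown earlier in this article.

```shell
# Hypothetical helper: reads `vxdg list` output on stdin.
# Succeeds (exit 0) only if rootdg is the sole disk group listed;
# otherwise prints each unexpected disk group and exits 1.
only_rootdg_imported() {
    awk '
        $1 == "NAME" || NF == 0 { next }   # skip the header and blank lines
        $1 != "rootdg" { print "unexpected disk group: " $1; extra = 1 }
        END { if (extra) exit 1 }
    '
}
```

On each node, vxdg list | only_rootdg_imported should succeed before you import oradg anywhere.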

Once you have confirmed that the oracle disk group is not active on either node, do the following on ONE machine:

vxdg import oradg

[this may be xxoradg where xx is the client 2 char code]

vxvol -g oradg startall

mount -F vxfs /dev/vx/dsk/oradg/name /mountpoint [Find volumes and mount points in /etc/VRTSvcs/conf/config/main.cf]

Let DBAs do their stuff

To shutdown:

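umount /mountpoint [repeat for each mount point]

vxvol -g oradg stopall

vxdg deport oradg

[Stop the volumes before deporting: once the disk group is deported, vxvol can no longer reach it.]

clear faults; start cluster as described above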
