Wednesday, September 12, 2007

Monitoring a system using Solaris commands

Monitoring CPU Usage

Method: 1
We can use the sar command to monitor CPU usage. sar -u gives you a quick snapshot of how heavily the CPU is bogged down. The arguments 5 5 take five samples at 5-second intervals.
# sar -u 5 5
SunOS flex-prod 5.10 Generic_118833-23 sun4u 09/12/2007

12:53:16 %usr %sys %wio %idle
12:53:21 33 2 0 66
12:53:26 33 1 0 66
12:53:31 36 2 0 63
12:53:36 55 4 0 40
12:53:41 59 7 0 34

Average 43 3 0 54
The columns tell you:
%usr Percentage of time the CPU is running in user mode
%sys Percentage of time the CPU is running in system (kernel) mode
%wio Percentage of time the CPU is idle with a process waiting for block I/O
%idle Percentage of time the CPU is otherwise idle
Tips: Use the sar -u command to see how heavily the CPU is bogged down. Low CPU idle time is not always a CPU problem. It can be caused by an I/O issue, which shows up as a high %wio.
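
The check on the Average line can be scripted. This is a minimal sketch, not a standard tool. The function name sar_u_verdict and the 20% idle threshold are illustrative assumptions. Treating %wio larger than %usr + %sys as a sign of an I/O problem is a rough heuristic based on the tip above.

```shell
# Hypothetical helper: summarise the "Average" line of `sar -u` output on stdin.
# $1 = minimum acceptable %idle (default 20, an assumed threshold).
sar_u_verdict() {
  awk -v min_idle="${1:-20}" '
    $1 == "Average" {
      usr = $2 + 0; sys = $3 + 0; wio = $4 + 0; idle = $5 + 0
      verdict = "ok"
      if (idle < min_idle + 0) {
        # Low idle time: blame I/O when waiting on I/O outweighs real work.
        verdict = (wio > usr + sys) ? "check-io" : "cpu-bound"
      }
      printf "usr=%d sys=%d wio=%d idle=%d %s\n", usr, sys, wio, idle, verdict
    }'
}
```

Usage: `sar -u 5 5 | sar_u_verdict`. For the sample above it prints `usr=43 sys=3 wio=0 idle=54 ok`.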

Method: 2
# vmstat 5 5
procs memory page disk faults cpu
r b w avm fre re at pi po fr de sr d0 s1 d2 s3 in sy cs us sy id
0 0 0 0 1088 0 2 2 0 1 0 0 0 0 0 0 26 72 24 0 1 98
Note: The CPU is spending most of its time in idle mode (id = 98), so it is not being heavily used at all. There are no kernel threads waiting in the run queue (r), blocked waiting for resources such as I/O (b), or swapped out (w).
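
The same reading can be scripted. This shell sketch uses a hypothetical function, vmstat_runq, that picks r, b and w (the first three fields) and id (the last field) from the most recent sample. The CPU-count argument and the rule of thumb that a run queue longer than the number of CPUs means saturation are assumptions, not part of the note above.

```shell
# Hypothetical helper: read `vmstat` output on stdin and report r, b, w and id
# from the last sample (vmstat's first data line is an average since boot).
# $1 = number of CPUs; r above this is flagged (an assumed rule of thumb).
vmstat_runq() {
  awk -v ncpu="${1:-1}" '
    $1 ~ /^[0-9]+$/ { r = $1 + 0; b = $2 + 0; w = $3 + 0; id = $NF + 0; seen = 1 }
    END {
      if (!seen) exit 1
      state = (r > ncpu + 0) ? "cpu-saturated" : "ok"
      printf "r=%d b=%d w=%d id=%d %s\n", r, b, w, id, state
    }'
}
```

Usage: `vmstat 5 5 | vmstat_runq 4` on a four-CPU machine. On Solaris, `psrinfo | wc -l` counts the processors. For the sample above with one CPU it prints `r=0 b=0 w=0 id=98 ok`.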

Method: 3
# sar -qu 5 5
Note: The -q option adds run queue statistics (runq-sz, %runocc) to the CPU utilization report from -u. In this run, the CPU spent most (94%) of its time in idle mode, so it was not heavily used at all. If the CPU is heavily used, there are two solutions:
1. Obtain a faster processor.
2. Use more CPUs.

Monitoring I/O Problems
Method: 1
Use the following command to monitor I/O:
# sar -d 5 2
Note: This command takes two samples 5 seconds apart. For each disk it lists %busy, avque (average queue length), r+w/s (reads plus writes per second), blks/s (number of 512-byte blocks transferred per second), and avwait and avserv (average wait and service times in milliseconds).
Tips: A high %busy together with a high avque indicates a disk I/O bottleneck. If this condition persists, analyze the disk usage and move data from the heavily loaded disk to a less used one.
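
To apply the tip to every disk at once, the sar -d report can be filtered. This is a sketch under stated assumptions. The function name sar_d_hot is hypothetical. The limits of 60 %busy and 1.0 avque are illustrative, not Solaris rules. The parsing assumes the usual layout, where the first device row of each interval starts with a timestamp and later rows start with the device name.

```shell
# Hypothetical helper: read `sar -d` output on stdin and list each disk whose
# peak %busy and peak avque both exceed the limits ($1 default 60 %busy,
# $2 default 1.0 avque; both values are assumptions).
sar_d_hot() {
  awk -v max_busy="${1:-60}" -v max_avque="${2:-1}" '
    {
      if (NF == 8) off = 1          # row starts with a timestamp or "Average"
      else if (NF == 7) off = 0     # continuation row: device name comes first
      else next                     # banner or blank line
      dev = $(off + 1)
      if (dev == "device") next     # column header
      busy = $(off + 2) + 0; q = $(off + 3) + 0
      if (!(dev in b) || busy > b[dev]) b[dev] = busy
      if (!(dev in a) || q > a[dev]) a[dev] = q
    }
    END {
      for (d in b)
        if (b[d] > max_busy + 0 && a[d] > max_avque + 0)
          printf "%s busy=%d avque=%.1f\n", d, b[d], a[d]
    }' | sort
}
```

Usage: `sar -d 5 2 | sar_d_hot`. Each disk that is reported is a candidate for moving data to a less used disk.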

Method: 2
# iostat -d 5 5
Note: For each disk, iostat -d displays the kilobytes transferred per second (kps), the number of transfers per second (tps), and the average service time in milliseconds (serv).
Tips: A kps rate over 30 indicates heavy usage of a particular disk. If only one disk shows heavy usage, consider moving some of your datafiles off it or striping your data across several disks.
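
The kps rule can be checked across all disks. This minimal sketch uses a hypothetical function, iostat_hot. It assumes that the first line of input is the row of disk names and that each disk then has three columns (kps, tps, serv). It skips the first data line, which iostat reports as a since-boot summary.

```shell
# Hypothetical helper: read `iostat -d` output on stdin and print every disk
# whose kps exceeds $1 (default 30, from the tip above) in any sample after
# the first one.
iostat_hot() {
  awk -v limit="${1:-30}" '
    NR == 1 {
      n = NF
      for (i = 1; i <= NF; i++) { disk[i] = $i; peak[i] = 0 }
      next
    }
    $1 ~ /^[0-9.]+$/ {
      if (++rows == 1) next            # skip the since-boot summary line
      for (i = 1; i <= n; i++) {
        kps = $(3 * (i - 1) + 1) + 0   # each disk has kps, tps, serv columns
        if (kps > peak[i]) peak[i] = kps
      }
    }
    END {
      for (i = 1; i <= n; i++)
        if (peak[i] > limit + 0) printf "%s kps=%d\n", disk[i], peak[i]
    }'
}
```

Usage: `iostat -d 5 5 | iostat_hot`. If only one disk is listed, that disk is the one to relieve.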

Method: 3
# sar -b 5 5
15:52:57 bread/s lread/s %rcache bwrit/s lwrit/s %wcache pread/s pwrit/s
15:53:12 0 2 90 1 2 38 0 0
Note: The -b option reports buffer cache activity, which gives a good indication of the overall health of the I/O subsystem.


Tips: The %rcache should be greater than 90% and %wcache should be greater than 60%. If this is not the case, your system may be bound by disk IO. The sum of bread, bwrit, pread, and pwrit gives a good indicator of how well your file subsystem is doing. The sum should not be greater than 40 for 2 drives and 60 for 4-8 drives. If you exceed these values, your system may be IO bound.
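
The rules of thumb above can be applied to the last sar -b row (usually the Average line). In this sketch the function name sar_b_check is hypothetical, and the 90% and 60% figures are treated as minimum acceptable values. The sum limit is passed in: 40 for 2 drives or 60 for 4-8 drives, following the tip.

```shell
# Hypothetical helper: check %rcache, %wcache and
# bread/s + bwrit/s + pread/s + pwrit/s from `sar -b` output on stdin.
# $1 = limit for the sum (default 40, the 2-drive figure).
sar_b_check() {
  awk -v limit="${1:-40}" '
    NF == 9 && $2 ~ /^[0-9.]+$/ {
      rc = $4 + 0; wc = $7 + 0; sum = $2 + $5 + $8 + $9; seen = 1
    }
    END {
      if (!seen) exit 1
      flags = ""
      if (rc < 90) flags = flags " low-rcache"
      if (wc < 60) flags = flags " low-wcache"
      if (sum > limit + 0) flags = flags " io-bound"
      if (flags == "") flags = " ok"
      printf "rcache=%d wcache=%d sum=%g%s\n", rc, wc, sum, flags
    }'
}
```

For the sample row above, it prints `rcache=90 wcache=38 sum=1 low-wcache`. The write cache hit rate is below the 60% guideline, even though the total buffer traffic is tiny.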

When analyzing disk I/O, make sure that the load is balanced across your disks. Here are some tips for designing a disk layout for Oracle:

• Make sure that your logfiles and archived logfiles are NOT on the same disk as your datafiles. This is a basic safety precaution against disk failure.
• Allocate one disk for the User Data Tablespace.
• Place Rollback, Index, and System Tablespaces on separate disks.

Monitoring the Processes Using the Most CPU
Use the following command:
# ps -e -o pcpu,pid,user,args | sort -k 1,1 -rn

This command lists the %CPU used, PID, user and command of every process, with the heaviest CPU consumers first. If the top process belongs to the Oracle user, you need to find out which Oracle session that process is serving.
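
If you only want the few busiest processes, the ranking can be wrapped in a small filter. This is a sketch. The function name top_cpu and the default of 5 lines are arbitrary choices, not part of the original command. The filter drops the ps header line before sorting.

```shell
# Hypothetical helper: rank `ps -e -o pcpu,pid,user,args` output read on stdin
# by %CPU (first column, numeric, descending) and keep the top N lines.
# $1 = number of processes to show (default 5, an arbitrary choice).
top_cpu() {
  sed 1d | sort -k 1,1 -rn | head -n "${1:-5}"
}
```

Usage: `ps -e -o pcpu,pid,user,args | top_cpu 1 | awk '{print $2}'` prints just the PID of the busiest process. That is the value to enter for &pid in the query that follows.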

Use the following query, which maps the operating system process ID (SPID) to the Oracle session:
SQL> select a.username,a.osuser,a.program, spid,sid,a.serial# from v$session a,v$process b where a.paddr=b.addr and spid='&pid' ;
Enter value for pid: 17929
old 3: and spid='&pid'
new 3: and spid='17929'

USERNAME OSUSER PROGRAM SPID SID SERIAL#
APPS applprod 17929 17 28394

APPS applprod 17929 192 20763

APPS applprod 17929 108 17788

SQL> select b.username,a.sql_text from v$sql a,v$session b where b.sql_address = a.address and b.sql_hash_value = a.hash_value and b.sid = '&SID' and b.serial# = '&PROCESS';

Tips: When prompted, enter the SID and SERIAL# values returned by the first query (the SERIAL# goes into PROCESS).
SQL> select b.username,a.sql_text from v$open_cursor a, v$session b where b.sql_address = a.address and b.sql_hash_value = a.hash_value and b.sid = '&SID' and b.serial# = '&PROCESS';

Tips: The v$open_cursor query is useful if you have a problem with ad hoc query users. Their problem queries will show up regularly in this result.


Identify CPU Bottlenecks
Use the following command:

# mpstat 10 5
The mpstat command reports per-processor statistics in tabular form. Each row of the table represents the activity of one processor.
Pay attention to the smtx column. It counts spins on mutexes, that is, the number of times the CPU failed to obtain a mutual exclusion lock on the first try.

Tips: If the smtx value for a CPU in the mpstat output is greater than 200, you are heading toward CPU bottleneck problems.
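
The smtx tip can be checked for every processor. This is a minimal sketch with a hypothetical function name, mpstat_smtx. It finds the smtx column by name in each "CPU ..." header line rather than hard-coding its position. It also skips the first report, on the assumption that the first report, as with vmstat, is a since-boot summary.

```shell
# Hypothetical helper: scan `mpstat` output on stdin and print every CPU whose
# smtx value exceeds $1 (default 200, from the tip above).
mpstat_smtx() {
  awk -v limit="${1:-200}" '
    $1 == "CPU" {
      block++                          # each header starts a new report
      for (i = 1; i <= NF; i++) if ($i == "smtx") col = i
      next
    }
    block > 1 && col > 0 && $1 ~ /^[0-9]+$/ && $col + 0 > limit + 0 {
      printf "cpu %s smtx=%s\n", $1, $col
    }'
}
```

Usage: `mpstat 10 5 | mpstat_smtx`. No output means that no CPU crossed the limit in any interval after the first.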

Monitoring Paging/Swapping

One of the most common problems when running large numbers of concurrent users on UNIX machines is lack of memory. In this case, a quick review of memory management is useful to see what effect a lack of RAM can have on performance.

When analyzing your machine, make sure that the machine is not swapping at all and at worst paging lightly. This indicates a system with a healthy amount of memory available. To analyze paging and swapping, use the following commands.


$ vmstat 5 5
procs memory page disk faults cpu
r b w avm fre re at pi po fr de sr d0 s1 d2 s3 in sy cs us sy id
0 0 0 0 1088 0 2 2 0 1 0 0 0 0 0 0 26 72 24 0 1 98

Note: There are no pageouts (po) occurring on this system. There are also 1088 free pages of RAM available. At 4 KB per page, that is 1088 × 4 KB = 4352 KB, or about 4.25 MB. Some page-out (po) activity is OK and normal. You should get worried when the number of page-ins (pi) starts rising, because this indicates that your system is starting to page.
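
The arithmetic in the note can be scripted. This sketch uses a hypothetical function, vmstat_mem. It finds the fre, pi and po columns by name in the header line. It assumes the 4 KB page size used in the note (pass a different size as the argument). The column names follow the sample output above, so adjust them if your vmstat header differs.

```shell
# Hypothetical helper: report fre (converted from pages to MB), pi and po for
# the last vmstat sample read on stdin. $1 = page size in KB (default 4).
vmstat_mem() {
  awk -v page_kb="${1:-4}" '
    $1 == "r" { for (i = 1; i <= NF; i++) col[$i] = i; next }   # header line
    ("fre" in col) && $1 ~ /^[0-9]+$/ {
      fre = $(col["fre"]); pi = $(col["pi"]); po = $(col["po"]); seen = 1
    }
    END {
      if (!seen) exit 1
      printf "free=%.2fMB pi=%d po=%d\n", fre * page_kb / 1024, pi, po
    }'
}
```

For the sample above, it prints `free=4.25MB pi=2 po=0`.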

$ sar -wpg 5 5
09:54:29 swpin/s pswin/s swpot/s pswot/s pswch/s
         atch/s pgin/s ppgin/s pflt/s vflt/s slock/s
         pgout/s ppgout/s pgfree/s pgscan/s %ufs_ipf
09:54:34 0.00 0.0 0.00 0.0 12
         0.00 0.22 0.22 0.65 3.90 0.87
         0.00 0.00 0.00 0.00 0.00


Note: There is no swapping at all (swpin/s and swpot/s are 0) and practically no paging (ppgin/s is only 0.22 and ppgout/s is 0).
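
With several options, sar wraps each option's columns onto its own line, so matching names to values by eye is error-prone. This sketch uses a hypothetical function, sar_fields, that pairs every column name with its value from the last sample and prints one name=value pair per line. It assumes that each sample starts with a timestamp (or "Average") and that the header is printed once before the data.

```shell
# Hypothetical helper: flatten multi-option `sar` output (e.g. `sar -wpg`)
# read on stdin into name=value lines for the last sample.
sar_fields() {
  awk '
    $1 == "SunOS" || NF == 0 { next }               # banner and blank lines
    { start = ($1 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ || $1 == "Average") ? 2 : 1 }
    $start ~ /^[0-9.]+$/ {                          # a line of values
      if (start == 2) nv = 0                        # timestamp: new sample
      for (i = start; i <= NF; i++) val[++nv] = $i
      next
    }
    nv == 0 { for (i = start; i <= NF; i++) name[++nn] = $i }  # header lines
    END { for (i = 1; i <= nn && i <= nv; i++) print name[i] "=" val[i] }'
}
```

Usage: `sar -wpg 5 5 | sar_fields | grep -E '^(swpin|swpot|ppgin|ppgout)/s='` shows just the swap and paging counters named in the note.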

$ sar -r 5 5
10:10:22 freemem freeswp
10:10:27 790 5862


This will give you a good indication of how much free swap and RAM you have on your machine. There are 790 pages of memory available and 5862 disk blocks of SWAP available.
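
The page and block counts are easier to judge in megabytes. This sketch uses a hypothetical function, sar_r_mb. It assumes that freeswap is counted in 512-byte disk blocks. It also needs the page size in bytes: the default of 4096 matches the 4k pages assumed earlier, and on Solaris the pagesize command prints the real value.

```shell
# Hypothetical helper: convert the last `sar -r` sample read on stdin to MB.
# $1 = page size in bytes (default 4096); freeswap assumed in 512-byte blocks.
sar_r_mb() {
  awk -v page_bytes="${1:-4096}" '
    NF == 3 && $2 ~ /^[0-9]+$/ { mem = $2; swp = $3; seen = 1 }
    END {
      if (!seen) exit 1
      printf "freemem=%.2fMB freeswap=%.2fMB\n", mem * page_bytes / 1048576, swp * 512 / 1048576
    }'
}
```

For the sample above, `sar -r 5 5 | sar_r_mb` prints `freemem=3.09MB freeswap=2.86MB`. Use `sar_r_mb $(pagesize)` to apply the machine's actual page size.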
