Thursday, April 28, 2005

Pesky style sheet format.....

Sorry that some of the lines are wrapped and others run off the edge. I'm still figuring out how to make the composition tools and template do what I want, rather than what they want....

Automated rotating exdump

I've added an option that automates converting exacct data to text files and rotates the logs. It is hard-wired to create each new log file in /var/adm/exacct (an empty directory created as standard during Solaris install), and the new file name includes the hostname and the date and time.

The previous log name is used as the base of the output file, which is written to a specified directory with a .txt extension.
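The naming scheme can be sketched in C roughly as follows. The exact layout (type.hostname.YYYYMMDD-HHMMSS) and the helper names new_log_name and text_output_path are assumptions made for illustration; the real exdump code may well differ.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/*
 * Build a name for the next binary log, for example
 * /var/adm/exacct/proc.myhost.20050428-170000
 * ASSUMPTION: the field order and separators are illustrative only.
 * Returns 0 on success, -1 if the buffer is too small.
 */
int
new_log_name(char *buf, size_t len, const char *kind, const char *host,
    const struct tm *when)
{
	char stamp[32];
	int n;

	if (strftime(stamp, sizeof (stamp), "%Y%m%d-%H%M%S", when) == 0)
		return (-1);
	n = snprintf(buf, len, "/var/adm/exacct/%s.%s.%s", kind, host, stamp);
	return ((n < 0 || (size_t)n >= len) ? -1 : 0);
}

/*
 * Derive the text output path from the previous log name: keep only
 * the base name, place it in outdir and append ".txt".
 * Returns 0 on success, -1 if the buffer is too small.
 */
int
text_output_path(char *buf, size_t len, const char *outdir,
    const char *oldlog)
{
	const char *base = strrchr(oldlog, '/');
	int n;

	base = (base != NULL) ? base + 1 : oldlog;
	n = snprintf(buf, len, "%s/%s.txt", outdir, base);
	return ((n < 0 || (size_t)n >= len) ? -1 : 0);
}
```

Both helpers only build strings, so they can be tried out without touching the accounting system.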

The file /etc/acctadm.conf is maintained by the acctadm command. I read the log file names from it, and if no log file is configured I don't start one; that is, you need to start accounting manually with acctadm the first time to choose which logs you want to generate. The command syntax now looks like this:

# ./exdump
Usage: exdump [-vwr] [ file | -a dir ]
-v: verbose
-w: wracct all processes first
-r: rotate logs
-a dir: use acctadm.conf to get input logs, and write output files to dir
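
Picking the log names out of /etc/acctadm.conf could look something like the sketch below. It assumes the file holds KEY=VALUE lines such as ACCTADM_PROC_FILE=/var/adm/exacct/proc, with the value "none" when no log is active; check that against the file on your own system. The function name conf_log_file is made up for this example.

```c
#include <stdio.h>
#include <string.h>

/*
 * Extract the log file configured for one accounting type ("PROC",
 * "TASK" or "FLOW") from a single line of acctadm.conf.
 * ASSUMPTION: lines look like ACCTADM_PROC_FILE=/path, and the value
 * "none" means no log is active.
 * Returns 1 and fills path if a log file was found, otherwise 0.
 */
int
conf_log_file(const char *line, const char *type, char *path, size_t len)
{
	char key[64];
	const char *val;
	size_t vlen;
	int klen;

	klen = snprintf(key, sizeof (key), "ACCTADM_%s_FILE=", type);
	if (klen < 0 || (size_t)klen >= sizeof (key))
		return (0);
	if (strncmp(line, key, (size_t)klen) != 0)
		return (0);		/* some other setting */
	val = line + klen;
	vlen = strcspn(val, "\r\n");	/* ignore the trailing newline */
	if (vlen == 0 || vlen >= len)
		return (0);
	if (vlen == 4 && strncmp(val, "none", 4) == 0)
		return (0);		/* accounting type has no log */
	memcpy(path, val, vlen);
	path[vlen] = '\0';
	return (1);
}
```

A caller would read the file line by line with fgets() and try each line against the accounting types of interest, skipping any type for which no log is found.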

The usual way to run the command will be from cron, as shown:

0 * * * * /opt/exdump/exdump -war /var/tmp/exacct > /dev/null 2>&1
2 * * * * /bin/find /var/adm/exacct -ctime +7 -exec rm {} \;

This also shows how to clean up old log files. I only delete the binary files in this example, and I created /var/tmp/exacct to hold the text files. The process data in the text file looks like this:

timestamp locltime duration procid ppid uid usr sys majf rwKB vcxK icxK sigK sycK arMB mrMB command
1114734370 17:26:10 0.0027 16527 16526 0 0.000 0.002 0 0.53 0.00 0.00 0.00 0.1 0.7 28.9 acctadm
1114734370 17:26:10 0.0045 16526 16525 0 0.000 0.001 0 0.00 0.00 0.00 0.00 0.1 1.1 28.9 sh
1114734370 17:26:10 0.0114 16525 8020 0 0.001 0.005 0 1.71 0.00 0.00 0.00 0.3 1.0 28.9 exdump
1109786952 10:09:12 -1.0000 0 0 0 0.012 118.883 131 0.00 0.33 0.00 0.00 0.0 0.0 0.0 sched
1109786952 10:09:12 -1.0000 0 0 0 0.012 118.883 131 0.00 0.33 0.00 0.00 0.0 0.0 0.0 sched
1109786952 10:09:12 -1.0000 0 0 0 0.012 118.883 131 0.00 0.33 0.00 0.00 0.0 0.0 0.0 sched
1109786959 10:09:19 -1.0000 1 0 0 4.311 3.066 96 47504.69 49.85 0.18 0.34 456.2 0.9 1.0 init
1109786959 10:09:19 -1.0000 2 0 0 0.000 0.000 0 0.00 0.00 0.00 0.00 0.0 0.0 0.0 pageout

Since the -w option forces long-running processes to write an accounting record before they have finished, I chose to mark these records by showing the duration as -1. One odd thing shown here is that the sched process generated three identical accounting records; this needs some more investigation.
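
For later processing steps, telling forced records apart from normal exit records only takes a check on the duration column. Here is a minimal sketch that reads the leading columns of one text line, assuming the column order shown in the header above; the struct and function names are invented for the example.

```c
#include <stdio.h>

/* Leading columns of one text record; the remaining columns are skipped. */
struct proc_rec {
	long timestamp;		/* start time, seconds since the epoch */
	char locltime[16];	/* start time as HH:MM:SS */
	double duration;	/* seconds, or -1 if still running */
	long pid;
	long ppid;
	long uid;
};

/*
 * Parse the first six columns of a record line.
 * Returns 1 on success, 0 for anything else (such as the header line).
 */
int
parse_proc_rec(const char *line, struct proc_rec *r)
{
	return (sscanf(line, "%ld %15s %lf %ld %ld %ld", &r->timestamp,
	    r->locltime, &r->duration, &r->pid, &r->ppid, &r->uid) == 6);
}

/* A negative duration marks a record forced out by -w before exit. */
int
is_partial(const struct proc_rec *r)
{
	return (r->duration < 0.0);
}
```

Because the header line does not start with a number, parse_proc_rec() rejects it, which makes it easy to skip.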

I'm going to run this for a while, and install it on a few other machines including SPARC Solaris 10 and Solaris 8 to make sure it's robust.



Monday, April 18, 2005

Flow accounting printout

I went to edit the code and realized that I hadn't finished some previous work, so I added a printout of the flow accounting data. An example is shown below.

The timestamp is the start of the flow; I translate it to the local timezone and show that as well. I difference the two timestamps to get the duration of the flow in seconds; add the uid and project id (which are only present for outbound flows); show the protocol; decode the IP addresses and look them up as names if possible, along with their port numbers; list the number of packets and bytes in the flow; and end with the filter name from the ipqos definition.

timestamp  locltime dur uid proj prot srcip:port       dstip:port       npkts nbytes aname
1113850483 11:54:43  13   0    0 tcp  66.102.15.100:80 crun:55783          21   4148 acct
1113850483 11:54:43  13 100   10 tcp  crun:55783       66.102.15.100:80    13  15082 acct
1113850483 11:54:43  13   0    0 tcp  crun:55783       66.102.15.100:80     6    240 acct
1113850480 11:54:40  16 100   10 tcp  crun:55782       66.102.15.101:80     3   1253 acct
1113850480 11:54:40  16   0    0 tcp  crun:55782       66.102.15.101:80     7    280 acct
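
The locltime and dur columns come from simple time arithmetic. A minimal C sketch follows, assuming each flow record carries a start time and an end time as seconds since the epoch; the function names are hypothetical.

```c
#include <stddef.h>
#include <time.h>

/* Duration of a flow in whole seconds, from its two timestamps. */
long
flow_duration(time_t start, time_t end)
{
	return ((long)difftime(end, start));
}

/*
 * Format a timestamp as HH:MM:SS in the local timezone.
 * Returns 0 on success, -1 if the conversion fails or buf is too small.
 */
int
local_clock(time_t t, char *buf, size_t len)
{
	struct tm *tm = localtime(&t);

	if (tm == NULL || strftime(buf, len, "%H:%M:%S", tm) == 0)
		return (-1);
	return (0);
}
```

The string produced by local_clock() depends on the TZ setting of the machine that runs it, so the same timestamp prints differently on hosts in different timezones.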

Thursday, April 14, 2005

Sequencing exacct logs

I decided to simplify the next step and to just get raw data logged in a useful manner. To do this there needs to be a periodic process that generates readable log files. After thinking about this for a while, I think I have a simple and effective way to do it.

I will modify the exdump code to add another option "-a". This will invoke acctadm to switch to a new set of log files, with a datestamped name. Before it does the switch, it will need to cause all current processes to write accounting entries, so "-a" will normally be used in conjunction with the "-w" option I added already. After the log switch, the old log files will be processed from their binary form to a text file with one record per line, ready for consumption by further processing steps.
When "-a" is specified, a directory can be given on the command line to hold the output files, and the input filename does not need to be specified as it was for the original version. This makes it easy to invoke exdump directly from cron without needing a wrapper script.

The sequence is:

Obtain existing log files by running acctadm or reading /etc/acctadm.conf
Note which logs are being generated and their filenames
Run wracct on all existing processes
Run acctadm to switch to new log files with date-time-stamped names
Process existing log files to text form in the current directory or a specified directory using same base filenames
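
The log switch step in the sequence above could be driven by a command string built like this. This is only a sketch: it assumes acctadm accepts "-f filename" followed by the accounting type, as described in its manual page, and the helper name switch_log_cmd is invented here.

```c
#include <stdio.h>
#include <string.h>

/*
 * Build the acctadm command that points one accounting type
 * ("process", "task" or "flow") at a new log file.
 * ASSUMPTION: "acctadm -f <file> <type>" performs the switch.
 * Returns 0 on success, -1 for an unknown type or a short buffer.
 */
int
switch_log_cmd(char *buf, size_t len, const char *type, const char *newlog)
{
	int n;

	if (strcmp(type, "process") != 0 && strcmp(type, "task") != 0 &&
	    strcmp(type, "flow") != 0)
		return (-1);
	n = snprintf(buf, len, "/usr/sbin/acctadm -f %s %s", newlog, type);
	return ((n < 0 || (size_t)n >= len) ? -1 : 0);
}
```

The resulting string could be handed to system(), which is reasonable here only because the file name is generated by the program itself rather than taken from untrusted input.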

To start exacct, the user needs to run acctadm manually once to enable the logs of interest (flow, task or proc); exdump will then generate the output logs at the cron interval. A find command can be used from cron to remove old logs and keep disk space managed.

Since many commands run on the hour, it might be worth starting exdump a few seconds early and having the file timestamp round up to the next minute, so that any command sequences started from cron on the hour are less likely to get split up.
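
Rounding the timestamp up is a one-liner. This sketch assumes time_t is an integer count of seconds, as it is on POSIX systems, and ignores leap seconds; the function name is made up.

```c
#include <time.h>

/*
 * Round a timestamp up to the next whole minute, so that a run started
 * at 11:59:55 stamps its new log file as 12:00:00. Times already on a
 * minute boundary are returned unchanged.
 */
time_t
round_up_minute(time_t t)
{
	time_t rem = t % 60;

	return (rem == 0 ? t : t + (60 - rem));
}
```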

This still leaves the problem of processing individual accounting records into useful data, but at least the one line per record text form with interval data will be more tractable than the binary file.

Friday, April 01, 2005

Writing accounting records at time intervals

A major new feature of the exacct system is the ability to get an accounting record logged without terminating the process. There are two forms of this. For tasks, you can get the record to dump the delta since the last record was logged; somehow the task remembers the data each time it cuts a record so it can do the differencing. This seems to be too much overhead at the process level, so the other option is to cut a record that logs the same data as if the process had just exited, and this option is available for both tasks and processes.

The command that causes a record to be written is "wracct". It takes a list of process or task IDs and makes a system call to cause the record to be written. You have to be root to do this. The wracct command line syntax is a pain if you want to get it to dump multiple processes, as shown in this example from the manpage:

# /usr/sbin/wracct -i "`pgrep sendmail`" process

I want to make every process cut a record, and if you attempt to do this with wracct you need to form a list of every process ID on the system. I tried to do this by listing all the entries in the /proc filesystem, but if any of the PIDs exit before wracct gets to them, it gets an error and quits. This is stupid, because if a process has exited, it has already cut an accounting record! The wracct command should have a "-a" option that writes all records and ignores errors.

I modified the exdump command to have a "-w" option that loops over all processes and forces an accounting record to be written before it reads the accounting file. If you aren't root, it has no effect. The code looks like this:


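/* Also requires #include <ctype.h> for isdigit(). */
if (wflag) {
    DIR *dirp;
    struct dirent *dp;

    /* Walk /proc and ask each process to cut a partial record. */
    if ((dirp = opendir("/proc")) != NULL) {
        while ((dp = readdir(dirp)) != NULL) {
            /* Skip "." and "..": atoi() would map them to pid 0 (sched). */
            if (!isdigit((unsigned char)dp->d_name[0]))
                continue;
            /* Ignore errors: the process may already have exited. */
            (void) wracct(P_PID, atoi(dp->d_name), EW_PARTIAL);
        }
        (void) closedir(dirp);
    }
}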


The next step is a bit more complex. Both for flow accounting and interval process accounting, I need code that remembers previous records and matches them with subsequent ones so that they can be accumulated (flows) or differenced (process intervals).
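
For the process interval case, the differencing might start out like the sketch below. It keeps the last cumulative usr and sys times per process in a fixed-size table and returns the change since the previous record. The table size, the use of the start timestamp to guard against pid reuse, and all the names are assumptions for illustration, not a finished design.

```c
#include <stddef.h>

#define	MAX_TRACKED	1024

/* Last cumulative CPU times seen for one process. */
struct prev_rec {
	long pid;
	long start;	/* start timestamp, guards against pid reuse */
	double usr;
	double sys;
	int used;
};

static struct prev_rec table[MAX_TRACKED];

/*
 * Remember the cumulative usr/sys times for a process and report, via
 * d_usr and d_sys, how much was consumed since its previous record.
 * The first record seen for a process yields its full totals.
 * Returns 0 on success, -1 if the table is full.
 */
int
interval_delta(long pid, long start, double usr, double sys,
    double *d_usr, double *d_sys)
{
	struct prev_rec *slot = NULL;
	size_t i;

	for (i = 0; i < MAX_TRACKED; i++) {
		if (table[i].used && table[i].pid == pid &&
		    table[i].start == start) {
			slot = &table[i];	/* seen this process before */
			break;
		}
		if (!table[i].used && slot == NULL)
			slot = &table[i];	/* remember first free slot */
	}
	if (slot == NULL)
		return (-1);
	*d_usr = usr - (slot->used ? slot->usr : 0.0);
	*d_sys = sys - (slot->used ? slot->sys : 0.0);
	slot->pid = pid;
	slot->start = start;
	slot->usr = usr;
	slot->sys = sys;
	slot->used = 1;
	return (0);
}
```

A real version would also need to free the slot when the final exit record for a process arrives, and a hash table would scale better than a linear scan.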