This document is intended for someone who is installing components of PBS, or who has administrative tasks, such as placing a machine online/offline. Other users should refer to the user's guide linked above.
On the "cartoon network" machines we use a queue submission manager named "PBS". A queue manager lets us control how jobs are executed. It has three components: the pbs server, the pbs client, and a scheduler. The pbs server and the scheduler run on a single machine, the server machine.
The pbs server (pbs_server) controls the status of hosts, and
keeps track of job states. It communicates with the client and the
scheduler, and is the component that commands jobs to start on the chosen host.
The scheduler is responsible for deciding the order in which jobs are
executed (based on each user's usage, etc.) and the host where they will be
executed, and communicates these decisions to the pbs server. TORQUE
has its own scheduler (pbs_sched), but it is very slow,
taking minutes to start jobs as machines free up. We therefore use
maui as the scheduler.
The pbs client (pbs_mom) runs on every machine that
executes jobs, and controls job execution on that host.
The queue server is "redwood", and the queues we currently have are:
workq - the default queue.
s4 - queue used for regression tests. Jobs submitted here will only get executed on the machine george.
calo - queue dedicated to the CALO project.
pbs_server)
yum install lapack
/afs/cs.cmu.edu/user/sphinx/archive/TORQUE/. On the server, you will need to install at least the torque, torque-server, torque-client, torque-devel, torque-mom, and torque-docs packages.
Do NOT install the torque-sched package.
./configure --prefix=/usr --mandir=/usr/share/man --libdir=/usr/lib \
    --includedir=/usr/include --with-server-home=/var/spool/torque \
    --with-pam=/lib/security \
    --with-default-server=redwood.speech.cs.cmu.edu --disable-gui \
    --enable-syslog --with-tcl --enable-rpp --with-rcp=scp CC=gcc \
    CFLAGS="-g -O2 -D_LARGEFILE64_SOURCE" LDFLAGS=
/usr/bin/qmgr, and create the queues. You can cut and paste the lines in the box below into your qmgr prompt. This will recreate the queues as they are now. It is important to set the mail_from field to an existing account, since the server reports problems by sending messages to this account.
#
# Create queues and set their attributes.
#
#
# Create and define queue x86_64
#
create queue x86_64
set queue x86_64 queue_type = Execution
set queue x86_64 resources_default.neednodes = x86_64
set queue x86_64 enabled = True
set queue x86_64 started = True
#
# Create and define queue s4
#
create queue s4
set queue s4 queue_type = Execution
set queue s4 acl_host_enable = False
set queue s4 acl_hosts = george.speech.cs.cmu.edu
set queue s4 keep_completed = 5
set queue s4 enabled = True
set queue s4 started = True
#
# Create and define queue workq
#
create queue workq
set queue workq queue_type = Execution
set queue workq keep_completed = 5
set queue workq enabled = True
set queue workq started = True
#
# Create and define queue i686
#
create queue i686
set queue i686 queue_type = Execution
set queue i686 resources_default.neednodes = i686
set queue i686 enabled = True
set queue i686 started = True
#
# Create and define queue slow
#
create queue slow
set queue slow queue_type = Execution
set queue slow resources_default.neednodes = slow
set queue slow enabled = True
set queue slow started = True
#
# Set server attributes.
#
set server scheduling = True
set server acl_host_enable = True
set server acl_hosts = *.speech.cs.cmu.edu
set server managers = dhuggins@*.speech.cs.cmu.edu
set server managers += egouvea@*.speech.cs.cmu.edu
set server managers += root@*.speech.cs.cmu.edu
set server default_queue = workq
set server log_events = 127
set server mail_from = sphinx
set server query_other_jobs = True
set server resources_default.ncpus = 1
set server scheduler_iteration = 600
set server node_check_rate = 150
set server tcp_timeout = 6
set server poll_jobs = True
set server pbs_version = 2.1.2
set server allow_node_submit = True
/etc/init.d/pbs_server restart. This will restart any job that is running through the queue, so first check the current jobs to see whether any of them have been running for several hours.
/usr0/spool/torque/server_priv so that it is readable by anyone. This will allow tracejob to provide more useful information for those trying to find out what happened to their jobs.
maui)
./configure --prefix=/usr --with-spooldir=/var/spool/maui
include/msched.h, change 4096 to 32768 in the following lines:
#define MMAX_JOB 4096
#define MAX_MJOB MMAX_JOB
#define MAX_MJOB_TRACE 4096
include/msched-common.h, change 4096 to 16384 in the following line:
# define MAX_MTASK 4096
-rwx------ the maui code has to be local to the server, since it has to be readable by root on the server. Also, maui does not include an init script to start it automatically at boot time, so we have provided one in /afs/cs.cmu.edu/user/sphinx/archive/TORQUE/maui.init.sh:
#!/bin/sh
cp /afs/cs.cmu.edu/user/sphinx/archive/TORQUE/maui.init.sh /etc/init.d/maui
/sbin/chkconfig maui on
.speech.cs.cmu.edu in the server, then it should
not be included here either.
# maui.cfg 3.2.6p16
SERVERHOST redwood.speech.cs.cmu.edu
# primary admin must be first in list
ADMIN1 root
ADMIN2 egouvea
ADMIN3 dhuggins
# Resource Manager Definition
RMCFG[base] TYPE=PBS
# Allocation Manager Definition
AMCFG[bank] TYPE=NONE
# full parameter docs at http://supercluster.org/mauidocs/a.fparameters.html
# use the 'schedctl -l' command to display current configuration
RMPOLLINTERVAL 00:00:30
SERVERPORT 42559
SERVERMODE NORMAL
# Admin: http://supercluster.org/mauidocs/a.esecurity.html
LOGFILE maui.log
LOGFILEMAXSIZE 10000000
LOGLEVEL 3
# Job Priority: http://supercluster.org/mauidocs/5.1jobprioritization.html
QUEUETIMEWEIGHT 1
FSWEIGHT 1
FSUSERWEIGHT 20
CREDWEIGHT 1
CLASSWEIGHT 1
# FairShare: http://supercluster.org/mauidocs/6.3fairshare.html
FSPOLICY DEDICATEDPES
FSDEPTH 7
FSINTERVAL 86400
FSDECAY 0.80
# Throttling Policies: http://supercluster.org/mauidocs/6.2throttlingpolicies.html
# NONE SPECIFIED
# Backfill: http://supercluster.org/mauidocs/8.2backfill.html
BACKFILLPOLICY FIRSTFIT
RESERVATIONPOLICY CURRENTHIGHEST
RESERVATIONDEPTH 5
# Node Allocation: http://supercluster.org/mauidocs/5.2nodeallocation.html
# NODEALLOCATIONPOLICY FASTEST
NODEALLOCATIONPOLICY PRIORITY
NODECFG[DEFAULT] PRIORITYF='SPEED + CPROCS - JOBCOUNT'
JOBNODEMATCHPOLICY EXACTNODE
# QOS: http://supercluster.org/mauidocs/7.3qos.html
# QOSCFG[hi] PRIORITY=100 XFTARGET=100 FLAGS=PREEMPTOR:IGNMAXJOB
# QOSCFG[low] PRIORITY=-1000 FLAGS=PREEMPTEE
# Standing Reservations: http://supercluster.org/mauidocs/7.1.3standingreservations.html
# SRSTARTTIME[test] 8:00:00
# SRENDTIME[test] 17:00:00
# SRDAYS[test] MON TUE WED THU FRI
# SRTASKCOUNT[test] 20
# SRMAXTIME[test] 0:30:00
# Creds: http://supercluster.org/mauidocs/6.1fairnessoverview.html
USERCFG[DEFAULT] FSTARGET=100.0
# USERCFG[DEFAULT] FSTARGET=25.0
# USERCFG[john] PRIORITY=100 FSTARGET=10.0-
# GROUPCFG[staff] PRIORITY=1000 QLIST=hi:low QDEF=hi
# CLASSCFG[batch] FLAGS=PREEMPTEE
# CLASSCFG[interactive] FLAGS=PREEMPTOR
CLASSCFG[s4] FSTARGET=100.0 PRIORITY=1.0
CLASSCFG[workq] FSTARGET=100.0 PRIORITY=1.0
CLASSCFG[x86_64] FSTARGET=100.0 PRIORITY=1.0
CLASSCFG[slow] FSTARGET=100.0 PRIORITY=1.0
CLASSCFG[i686] FSTARGET=100.0 PRIORITY=1.0
# Nodes - valid only with maui3.0.7 or higher
NODECFG[alder] SPEED=3.0
NODECFG[astro] SPEED=2.2
NODECFG[batman] SPEED=3.0
NODECFG[beaker] SPEED=1.0
NODECFG[bert] SPEED=2.4
NODECFG[betty] SPEED=3.0
NODECFG[bigbird] SPEED=3.2
NODECFG[blossom] SPEED=0.75
NODECFG[bubbler] SPEED=0.75
NODECFG[buckeye] SPEED=3.0
NODECFG[bunsen] SPEED=1.0
NODECFG[buttercup] SPEED=0.75
NODECFG[catalpa] SPEED=3.0
NODECFG[daphne] SPEED=2.2
NODECFG[dogwood] SPEED=3.0
NODECFG[dumbo] SPEED=1.0
NODECFG[elroy] SPEED=2.2
NODECFG[ernie] SPEED=1.0
NODECFG[eucalyptus] SPEED=3.0
NODECFG[facloan-1850-1] SPEED=3.6
NODECFG[facloan-1850-2] SPEED=3.6
NODECFG[filbert] SPEED=4.0
NODECFG[fozzie] SPEED=1.0
NODECFG[fred] SPEED=1.0
NODECFG[george] SPEED=2.2
NODECFG[ginkgo] SPEED=4.0
NODECFG[gonzo] SPEED=1.0
NODECFG[goofy] SPEED=1.0
NODECFG[jane] SPEED=2.2
NODECFG[judy] SPEED=2.2
NODECFG[karybdis] SPEED=2.8
NODECFG[kermit] SPEED=2.3
NODECFG[mafalda] SPEED=3.0
NODECFG[mickey] SPEED=1.7
NODECFG[muttley] SPEED=3.0
NODECFG[piggy] SPEED=2.8
NODECFG[redwood] SPEED=2.6
NODECFG[scooby] SPEED=2.2
NODECFG[scrappy] SPEED=2.2
NODECFG[scylla] SPEED=2.8
NODECFG[shaggy] SPEED=2.2
NODECFG[spacely] SPEED=3.0
NODECFG[utonium] SPEED=0.75
NODECFG[velma] SPEED=2.2
NODECFG[wilma] SPEED=1.0
pbs_mom)
nfs and amd) on the new machine.
export all partitions
Add "/usr0 *.speech.cs.cmu.edu(rw)" to the file /etc/exports to export partition /usr0
Add "/usr0 *.fac.cs.cmu.edu(rw)" to the file /etc/exports to export partition /usr0 if there are machines named *.fac.cs.cmu.edu in the queue.
nfs daemon.
/etc/rc.d/rc3.d/. The links are named so that the scripts start when the machine enters runlevel 3 (i.e. multiuser mode with a network), and are numbered so that they start before the PBS daemon.
ln -s /etc/rc.d/init.d/amd /etc/rc.d/rc3.d/S87amd
ln -s /etc/rc.d/init.d/nfs /etc/rc.d/rc3.d/S87nfs
/etc/rc.d/rc.local
/etc/rc.d/init.d/amd start
/etc/rc.d/init.d/nfs start
yum install lapack
% cat ~sphinx/archive/pbs/install_torque_client.sh
#!/bin/bash
/etc/init.d/pbs stop
TORQUE_ROOT=/afs/cs.cmu.edu/user/sphinx/archive/TORQUE/
cd $TORQUE_ROOT/torque-2.1.2/
make install_mom install_clients
cp $TORQUE_ROOT/pbs_mom /etc/init.d/
ln -s /etc/init.d/pbs_mom /etc/rc3.d/S95pbs_mom
ln -s /etc/init.d/pbs_mom /etc/rc3.d/K15pbs_mom
# The following needs to be fixed
# SYS=`uname -i`
# TORQUE_ROOT=/afs/cs.cmu.edu/user/sphinx/archive/TORQUE/$SYSr
# rpm -ivh $TORQUE_ROOT/torque-2.1.2-1cri.$SYS.rpm
# rpm -ivh $TORQUE_ROOT/torque-client-2.1.2-1cri.$SYS.rpm
# rpm -ivh $TORQUE_ROOT/torque-docs-2.1.2-1cri.$SYS.rpm
# rpm -ivh $TORQUE_ROOT/torque-mom-2.1.2-1cri.$SYS.rpm
~sphinx/archive/pbs/pbs_housekeeping.sh
% ~sphinx/archive/pbs/install_torque_client.sh
pbs_housekeeping.sh,
called in the last line above, is presented below.
#!/bin/bash
# Count the CPUs on this machine. From this number we estimate the
# thresholds for when a machine is considered 'busy' based on cpu
# average load.
CPU=`grep -c '^processor' /proc/cpuinfo`
# The "ideal" number is the load level at which the machine is considered
# "free" again. It is the larger of 70% of the CPUs and the number of
# CPUs minus 0.9.
ideal=`echo $CPU | awk '{lim = 0.7 * $1};{inf = $1 - .9};{printf "%.1f", ((lim < inf) ? inf : lim)}'`
# The "max" number is the load level at which the machine is considered
# "busy". It is the larger of 90% of the CPUs and the number of CPUs
# minus 0.3, so it is above "ideal" and slightly below the CPU count.
max=`echo $CPU | awk '{lim = 0.9 * $1};{inf = $1 - .3};{printf "%.1f", ((lim < inf) ? inf : lim)}'`
# The pbs_mom config file, pointing to the pbs server, allowing copies
# via 'cp' over NFS, and defining the thresholds for cpu load average.
# The *.fac.cs.cmu.edu was added when facilities loaned some machines
# to the speech group. It can be removed if the machines are all named
# .speech.cs.cmu.edu.
PBS_CONFIG=/var/spool/torque/mom_priv/config
cat > $PBS_CONFIG<<EOF
\$clienthost redwood.speech.cs.cmu.edu
\$clienthost redwood
\$restricted *.cs.cmu.edu
\$usecp *.speech.cs.cmu.edu:/ /
\$usecp *.SPEECH.CS.CMU.EDU:/ /
\$usecp *.fac.cs.cmu.edu:/ /
\$usecp *.FAC.CS.CMU.EDU:/ /
\$ideal_load $ideal
\$max_load $max
EOF
# Cron job, running automatically every day, cleaning up log files. We
# remove the log files that are more than a month old.
CRON=/etc/cron.daily/pbs_log_cleanup.cron
cat > $CRON <<EOF
#!/bin/sh
# These are log files locations for TORQUE
find /var/spool/torque/mom_logs -mtime +31 -exec rm {} \;
find /var/spool/torque/spool -mtime +31 -exec rm {} \;
find /var/spool/torque/undelivered -mtime +31 -exec rm {} \;
EOF
chmod +x $CRON
# Links to server log files, so that tracejob works from any machine
ln -s /net/redwood/usr0/spool/torque/server_priv /var/spool/torque/
ln -s /net/redwood/usr0/spool/torque/server_logs /var/spool/torque/
/bin/rm -rf /usr/pbs
mkdir -p /usr/pbs/bin
# Links to binaries so that people's old scripts don't break
ln -s /usr/bin/chk_tree \
/usr/bin/hostn \
/usr/bin/nqs2pbs \
/usr/bin/pbs_tclsh \
/usr/bin/pbsdsh \
/usr/bin/pbsnodes \
/usr/bin/printjob \
/usr/bin/printtracking \
/usr/bin/qalter \
/usr/bin/qdel \
/usr/bin/qdisable \
/usr/bin/qenable \
/usr/bin/qhold \
/usr/bin/qmgr \
/usr/bin/qmove \
/usr/bin/qmsg \
/usr/bin/qorder \
/usr/bin/qrerun \
/usr/bin/qrls \
/usr/bin/qrun \
/usr/bin/qselect \
/usr/bin/qsig \
/usr/bin/qstart \
/usr/bin/qstat \
/usr/bin/qstop \
/usr/bin/qsub \
/usr/bin/qterm \
/usr/bin/tracejob \
/usr/pbs/bin
cp ~sphinx/archive/pbs/pbshosts.pl /usr/pbs/bin/
ln -s /usr/pbs/bin/pbshosts.pl /usr/pbs/bin/pbshosts
# Stop the daemon preparing for directory changes
/etc/init.d/pbs_mom stop
# Move the spool area to a partition with more space.
# We chose /usr0, since this partition exists in all machines.
if ! test -h /var/spool/torque; then
    # Make sure the destination's parent directory exists before moving.
    mkdir -p /usr0/spool
    /bin/rm -rf /usr0/spool/torque
    mv /var/spool/torque /usr0/spool
    # Create links, so PBS will not get lost.
    ln -s /usr0/spool/torque /var/spool
fi
echo "redwood.speech.cs.cmu.edu" > /var/spool/torque/server_name
# Restart the queue
/etc/init.d/pbs_mom start
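The two load thresholds written to the pbs_mom config can be checked on their own before running the whole script. The sketch below repeats the awk arithmetic from pbs_housekeeping.sh in a helper function. The name load_thresholds is made up for this example and is not part of the script.

```shell
#!/bin/bash
# Hypothetical helper (not part of pbs_housekeeping.sh): print the
# "ideal" and "max" load thresholds for a given number of CPUs, using
# the same formulas as the script above.
load_thresholds() {
    echo "$1" | awk '{
        ideal = (0.7 * $1 < $1 - 0.9) ? $1 - 0.9 : 0.7 * $1
        max   = (0.9 * $1 < $1 - 0.3) ? $1 - 0.3 : 0.9 * $1
        printf "%.1f %.1f\n", ideal, max
    }'
}
```

For example, load_thresholds 4 prints "3.1 3.7": a four-CPU machine is considered busy at a load of 3.7 and free again at 3.1. On a single-CPU machine the percentages win instead, giving "0.7 0.9".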
pbshosts. This is a script created in house. If it is not installed, use the commands below.
/bin/rm /usr/pbs/bin/pbshosts
cp ~sphinx/archive/pbs/pbshosts.pl /usr/pbs/bin/
ln -s /usr/pbs/bin/pbshosts.pl /usr/pbs/bin/pbshosts
pbshosts itself is a perl script presented below.
#!/usr/bin/perl
#
# Assumptions:
# -if there's no "=", the line contains node name
# -pbs server is named "redwood.speech.cs.cmu.edu"
# -info is presented as "name = value"
#
# Modified for torque dhuggins@cs, 16 Aug 2006
#
# Created egouvea@cs.cmu.edu, 27 Sept. 2002
#
use strict;
my %list = ();
my ($machine, $state, $cpu, $jobs);
open(PBS, "pbsnodes -a |") or die "Can't run pbsnodes";
while (<PBS>) {
    chomp();
    s/\.redwood\.speech\.cs\.cmu\.edu//gi;
    if (/^$/) {
        $list{$machine} = "$cpu|$state|$jobs";
        $machine = "";
        $state = "";
        $cpu = "";
        $jobs = "";
        next;
    }
    if (!/=/) {
        $machine = $_;
        next;
    }
    $state = $1 if (/state = (.*)/);
    $cpu = $1 if (/np = (.*)/);
    $jobs = $1 if (/jobs = (.*)/);
}
my $format = "%-15s%2s %-15s %s\n";
printf $format, "Node", "np", "state", "jobs";
print "================================================================\n";
foreach my $host (sort keys %list) {
    ($cpu, $state, $jobs) = split /\|/, $list{$host};
    printf $format, $host, $cpu, $state, $jobs;
}
print "\n";
print "http://www.cs.cmu.edu/~robust/machines.html for info about machines\n";
print "\n";
~robust/script/diskSpace/nodes)
# Make sure we're sending this to someone in the group
MAILTO=sphinx+web
# robust's disk space monitoring, per user data
0 7 * * 6 /afs/cs.cmu.edu/user/robust/script/diskSpace/collectDuInMachine.sh
# air's cluster monitoring
# [200607] (air)
# run daily; mostly to keep track of the disk space situation
56 05 * * * /afs/cs/user/air/src/linuxstat/daily.sh
/etc
scp sphinx@redwood:archive/TORQUE/smbmount.credentials /etc
scp sphinx@redwood:archive/TORQUE/smbgale.credentials /etc
LDC, CALO_DATA, and GALE, used in the step below.
mkdir /LDC /CALO_DATA /GALE
/etc/fstab file, so the machines automatically mount data
on bar.speech. Older machines will require smbfs instead of cifs below.
//bar/LDC /LDC cifs ro,credentials=/etc/smbmount.credentials 0 0
//bar/CALO_DATA /CALO_DATA cifs ro,credentials=/etc/smbmount.credentials 0 0
//bar/GALE /GALE cifs rw,credentials=/etc/smbgale.credentials,file_mode=0664,dir_mode=0775,gid=100 0 0
mount -a
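After mount -a, it may be worth confirming that all three shares are actually mounted. This is a small sketch, assuming a Linux machine where /proc/mounts lists the mount point in the second field. The function name missing_mounts is made up for this example.

```shell
#!/bin/bash
# Hypothetical helper: print each expected mount point that does not
# appear in the mounts table (default: /proc/mounts).
missing_mounts() {
    mounts_file=${1:-/proc/mounts}
    for dir in /LDC /CALO_DATA /GALE; do
        # awk exits 0 only if some line has $dir as its mount point
        if ! awk -v d="$dir" '$2 == d { found = 1 } END { exit !found }' "$mounts_file"; then
            echo "$dir"
        fi
    done
}
```

Running missing_mounts with no argument prints nothing when all three directories are mounted, and otherwise prints one line per missing mount point.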
% /usr/bin/qmgr
Qmgr: create node betty np=4
Qmgr: quit
/etc/hosts.equiv on the server. You only need to do this on the server. New machines must be added to this file because the server uses rsh to communicate with other machines, and rsh checks this file for permissions.
/var/spool/PBS/maui/maui.cfg. Speed is relative to the slowest machines. For the machine scrappy, it is a line such as
NODECFG[scrappy] SPEED=2.2
% /etc/init.d/pbs_server restart
% /usr/bin/qterm
% /etc/init.d/pbs_server start
% /etc/init.d/maui restart
% ps ax | grep rpciod
945 ? SW 9:48 [rpciod]
% kill -9 945
tracejob <job id>
/usr/bin/checkjob <job id>
/usr/bin/diagnose -p
/usr/bin/diagnose -f
qstat or qsub does not work, and gives me a message like "server not responding".
The server is possibly down, and needs to be restarted. You can do this by rebooting the machine, or by restarting the server process, as below. Notice that restarting the server this way will restart (i.e., stop and run again from scratch) all running jobs. For a solution that keeps the jobs running, check the How to section.
/etc/init.d/pbs_server restart
Q state, but the jobs do not get executed
The scheduler, maui, is possibly down. You can reboot the machine, or restart the scheduler, as below.
/etc/init.d/maui restart
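Before restarting the scheduler, a quick count of queued jobs shows whether jobs are in fact piling up. This is a minimal sketch, assuming the default qstat listing (two header lines, then one job per line with the state letter in the fifth column). The name count_queued is made up for this example.

```shell
#!/bin/bash
# Hypothetical helper: read a default-format qstat listing on stdin
# and print the number of jobs whose state column is "Q".
count_queued() {
    awk 'NR > 2 && $5 == "Q" { n++ } END { print n + 0 }'
}

# Intended use:
#   qstat | count_queued
```

If the count keeps growing while machines are free, the scheduler is the likely culprit.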
Occasionally a process may get stuck. A process (what you get when you do a ps) is different from a job (what you get when you do a qstat). Most commonly, a stuck process is stuck in the "D" state, the so-called uninterruptible I/O state.
The uninterruptible I/O state occurs when the process is waiting for I/O, either reading from a file or writing to a file. It is normal that a process goes into this state for a couple of seconds. But because of network traffic, the process may fall into a situation where it waits for I/O forever, and the network just does not respond.
You can verify whether a process is in the "D" state by typing ps x and looking at the letter under the column named STAT. Normally, this column will have a letter like "S" or "R", or "D" for a few seconds. If it is always "D", you will need root access, or someone with root access, to get the process out of that state. You do not need to ask facilities to do this; just ask someone around you.
A process in the "D" state cannot be killed, otherwise it would not be "uninterruptible". Instead, look for a process named rpciod (e.g. ps ax | grep rpciod) and send a "KILL" signal to it. This will not kill rpciod, but just sends a signal to it. The processes in the "D" state will then cease waiting for I/O, and may finish, if they were waiting to finish.
% ps ax | grep rpciod
945 ? SW 9:48 [rpciod]
% kill -9 945
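To see which processes are currently in the "D" state without scanning the ps output by eye, a filter like the one below may help. It is only a sketch: it assumes ps is asked for the pid, stat and comm columns, and the name stuck_processes is made up for this example.

```shell
#!/bin/bash
# Hypothetical helper: read "PID STAT COMMAND" lines on stdin (with one
# header line) and print the PID and command of every process whose
# state starts with "D".
stuck_processes() {
    awk 'NR > 1 && $2 ~ /^D/ { print $1, $3 }'
}

# Intended use (Linux procps ps):
#   ps axo pid,stat,comm | stuck_processes
```

A process that shows up once may simply be doing normal I/O, so run the filter a few times before concluding that something is stuck.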
This happens when PBS launches a job on a machine where the user does not have an account. To fix this, find out where the attempted execution happened by looking for the field exec_host in the output of the tracejob command, and create an account for the user on that machine.
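The host can be pulled out of the tracejob output with a small filter. This is a sketch under the assumption that the field appears as exec_host=<value> in the lines tracejob prints. The name exec_hosts is made up for this example.

```shell
#!/bin/bash
# Hypothetical helper: read tracejob output on stdin and print each
# distinct exec_host value found in it (assumes "exec_host=<value>").
exec_hosts() {
    grep -o 'exec_host=[^ ]*' | cut -d= -f2 | sort -u
}

# Intended use:
#   tracejob <job id> | exec_hosts
```

If nothing is printed, the job may never have been started on any host, or the field may be formatted differently in your version.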
Please also check the troubleshooting section in the User's manual.
Page created by Evandro B. Gouvêa on 29 June 2004
Page maintained by Evandro B. Gouvêa and David Huggins-Daines
Last modified: Fri May 18 15:59:33 Eastern Daylight Time 2007