mpirun
Run mpi programs
Description
"mpirun" is a shell script that attempts to hide from the user the differences in starting jobs for various devices. mpirun attempts to determine what kind of machine it is running on and start the required number of jobs on that machine. On workstation clusters, if you are not using Chameleon, you must either supply a default file that lists the machines mpirun can use to run remote jobs, or specify such a file every time you run mpirun with the -machinefile option. The default file is in util/machines/machines.<arch>.
mpirun typically works like this
mpirun -np <number of processes> <program name and arguments>
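As an illustration of the machine file described above, the following sketch writes a small file and composes a matching command line. The host names, the file name my_machines, the program name ./myprog, and the one-host-per-line layout are assumptions made for this example; check the machines.<arch> file shipped with your installation for the exact format.

```shell
# Hypothetical machine file: one host name per line (assumed layout).
machinefile=./my_machines
cat > "$machinefile" <<'EOF'
node1.example.org
node2.example.org
node3.example.org
EOF

# Use one process per listed host; tr strips padding some wc versions add.
nprocs=$(wc -l < "$machinefile" | tr -d ' ')

# Compose the command line; it is only printed here, not executed.
cmd="mpirun -np $nprocs -machinefile $machinefile ./myprog"
echo "$cmd"
```

The sketch only prints the command, so it can be tried on a machine where mpirun is not installed.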
If mpirun cannot determine what kind of machine you are on, and it is supported by the mpi implementation, you can use the -machine and -arch options to tell it what kind of machine you are running on. The current valid values for machine are
    chameleon (including chameleon/pvm, chameleon/p4, etc...)
    meiko (the meiko device on the meiko)
    paragon (the ch_nx device on a paragon not running NQS)
    p4 (the ch_p4 device on a workstation cluster)
    ibmspx (ch_eui for IBM SP2)
    anlspx (ch_eui for ANLs SPx)
    ksr (ch_p4 for KSR 1 and 2)
    sgi_mp (ch_shmem for SGI multiprocessors)
    cray_t3d (t3d for Cray T3D)
    smp (ch_shmem for SMPs)
    execer (a custom script for starting ch_p4 programs without using a procgroup file. This script currently does not work well with interactive jobs)
You should only have to specify mr_arch if mpirun does not recognize your machine, the default value is wrong, or you are using the p4 or execer devices. The full list of options is
Parameters
The options for mpirun must come before the program you want to run and must be spelled out completely (no abbreviations). Unrecognized options will be silently ignored.
mpirun [mpirun_options...] <progname> [options...]
-arch <architecture>
    Specify the architecture (must have matching machines.<arch> file in ${MPIR_HOME}/util/machines) if using the execer
-h
    This help
-machine <machine name>
    Use startup procedure for <machine name>
-machinefile <machine-file name>
    Take the list of possible machines to run on from the file <machine-file name>
-np <np>
    Specify the number of processors to run on
-nolocal
    Do not run on the local machine (only works for p4 and ch_p4 jobs)
-stdin filename
    Use filename as the standard input for the program. This is needed for programs that must be run as batch jobs, such as some IBM SP systems and Intel Paragons using NQS (see -paragontype below).
-t
    Testing - do not actually run, just print what would be executed
-v
    Verbose - throw in some comments
-dbx
    Start the first process under dbx where possible
-gdb
    Start the first process under gdb where possible (on the Meiko, selecting either -dbx or -gdb starts prun under totalview instead)
-xxgdb
    Start the first process under xxgdb where possible (-xdbx does not work)
-tv
    Start under totalview
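Because unrecognized options are silently ignored and abbreviations are not accepted, a misspelled option can go unnoticed. The following is a minimal sketch of a pre-flight check that could be run before calling mpirun. The function name check_mpirun_opts is hypothetical, and the two lists cover only the general options above, not the device-specific ones.

```shell
# General options from this page, split by whether they take an argument.
with_arg="-arch -machine -machinefile -np -stdin"
no_arg="-h -nolocal -t -v -dbx -gdb -xxgdb -tv"

# Hypothetical helper: returns 1 on the first unknown option that appears
# before the program name, 0 otherwise.
check_mpirun_opts() {
    while [ $# -gt 0 ]; do
        case " $with_arg " in
            *" $1 "*)
                shift                      # drop the option
                [ $# -gt 0 ] && shift      # drop its argument, if present
                continue ;;
        esac
        case " $no_arg " in
            *" $1 "*) shift; continue ;;
        esac
        case $1 in
            -*) echo "unrecognized mpirun option: $1" >&2; return 1 ;;
            *)  return 0 ;;   # first non-option word is the program name
        esac
    done
    return 0
}
```

For example, check_mpirun_opts -np 4 ./myprog -x succeeds, since -x belongs to the program, while check_mpirun_opts -n 4 ./myprog fails because -n is an abbreviation.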
Special Options for NEC - CENJU-3
-batch
    Execute program as a batch job (using cjbr)
-stdout filename
    Use filename as the standard output for the program.
-stderr filename
    Use filename as the standard error for the program.
Special Options for Nexus device
-nexuspg filename
    Use the given Nexus startup file instead of creating one. Overrides -np and -nolocal, selects -leave_pg.
-nexusdb filename
    Use the given Nexus resource database.
Special Options for Workstation Clusters
-e
    Use execer to start the program on workstation clusters
-pg
    Use a procgroup file to start the p4 programs, not execer (default)
-leave_pg
    Do not delete the P4 procgroup file after running
-p4pg filename
    Use the given p4 procgroup file instead of creating one. Overrides -np and -nolocal, selects -leave_pg.
-tcppg filename
    Use the given tcp procgroup file instead of creating one. Overrides -np and -nolocal, selects -leave_pg.
-p4ssport num
    Use the p4 secure server with port number num to start the programs. If num is 0, use the value of the environment variable MPI_P4SSPORT. Using the server can speed up process startup. If MPI_USEP4SSPORT as well as MPI_P4SSPORT are set, then that has the effect of giving mpirun the -p4ssport 0 parameters.
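The rule for -p4ssport can be modeled in a few lines of shell. This is only a sketch of the behavior described above, not mpirun's own code; the function name resolve_p4ssport and the port numbers are hypothetical.

```shell
# Hypothetical helper: a value of 0 defers to the MPI_P4SSPORT
# environment variable, any other value is used as given.
resolve_p4ssport() {
    if [ "$1" = "0" ]; then
        printf '%s\n' "${MPI_P4SSPORT:-}"
    else
        printf '%s\n' "$1"
    fi
}

MPI_P4SSPORT=1234        # hypothetical port number
export MPI_P4SSPORT

resolve_p4ssport 0       # prints 1234
resolve_p4ssport 4321    # prints 4321
```

This page does not say what value MPI_USEP4SSPORT should hold, only that it must be set together with MPI_P4SSPORT to get the effect of -p4ssport 0.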
Special Options for Batch Environments
-mvhome
    Move the executable to the home directory. This is needed when all file systems are not cross-mounted. Currently only used by anlspx
-mvback files
    Move the indicated files back to the current directory. Needed only when using -mvhome; has no effect otherwise.
-maxtime min
    Maximum job run time in minutes. Currently used only by anlspx. Default value is 15 minutes
-nopoll
    Do not use a polling-mode communication. Available only on IBM SPx.
-mem value
    This is the per node memory request (in Mbytes). Needed for some CM-5s.
-cpu time
    This is the hard cpu limit used for some CM-5s in minutes.
Special Options for IBM SP2
-cac name
    CAC for ANL scheduler. Currently used only by anlspx. If not provided will choose some valid CAC.
Special Options for Intel Paragon
-paragontype name
    Selects one of default, mkpart, NQS, depending on how you want to submit jobs to a Paragon.
-paragonname name
    Remote shells to name to run the job (using the -sz method) on a Paragon.
-paragonpn name
    Name of partition to run on in a Paragon (using the -pn name command-line argument)
Return value
On exit, mpirun returns a status of zero unless mpirun detected a problem, in which case it returns a non-zero status (currently, all are one, but this may change in the future).
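A calling script can act on this status. The sketch below tests only for zero versus non-zero, since the exact non-zero value may change. The function name run_mpi_job and the MPIRUN variable, which lets another launcher be substituted for testing, are hypothetical conveniences for this example.

```shell
# Hypothetical wrapper: run the job and report a non-zero status.
run_mpi_job() {
    status=0
    "${MPIRUN:-mpirun}" "$@" || status=$?
    if [ "$status" -ne 0 ]; then
        echo "mpirun detected a problem (exit status $status)" >&2
    fi
    return "$status"
}
```

A script might then use run_mpi_job -np 4 ./myprog || exit 1 to stop as soon as a run fails.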
Specifying Heterogeneous Systems
Multiple architectures may be handled by giving multiple -arch and -np arguments. For example, to run a program on 2 sun4s and 3 rs6000s, with the local machine being a sun4, use
mpirun -arch sun4 -np 2 -arch rs6000 -np 3 program
This assumes that program will run on both architectures. If different executables are needed (as in this case), the string %a will be replaced with the arch name. For example, if the programs are program.sun4 and program.rs6000, then the command is
mpirun -arch sun4 -np 2 -arch rs6000 -np 3 program.%a
If instead the executables are in different directories, for example /tmp/me/sun4 and /tmp/me/rs6000, then the command is
mpirun -arch sun4 -np 2 -arch rs6000 -np 3 /tmp/me/%a/program
It is important to specify the architecture with -arch before specifying the number of processors. Also, the first -arch command must refer to the processor on which the job will be started. Specifically, if -nolocal is not specified, then the first -arch must refer to the processor from which mpirun is running.
(You must have machines.<arch> files for each arch that you use in the util/machines directory.)
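To see which executable each architecture would resolve to, the %a substitution can be imitated with sed. This is an illustration of the substitution rule only, not the code mpirun uses; the function name expand_arch is hypothetical.

```shell
# Hypothetical helper: replace every %a in $1 with the arch name in $2.
expand_arch() {
    printf '%s\n' "$1" | sed "s/%a/$2/g"
}

for arch in sun4 rs6000; do
    expand_arch '/tmp/me/%a/program' "$arch"
done
# prints:
#   /tmp/me/sun4/program
#   /tmp/me/rs6000/program
```

Checking that each expanded path exists on the corresponding machines before launching can save a failed startup.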
Another approach that may be used with the ch_p4 device is to create a procgroup file directly. See the MPICH Users Guide for more information.
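As a rough sketch of the procgroup approach, the following writes a file and composes a command that uses it through the -p4pg option. The file layout shown (a "local" line followed by one line per remote host giving host name, process count, and executable path) is an assumption based on common p4 usage and is not defined on this page; the host names and file name are hypothetical. Consult the MPICH Users Guide for the authoritative format.

```shell
# Hypothetical procgroup file; the layout is an assumption (see above).
pgfile=./my_procgroup.pg
cat > "$pgfile" <<'EOF'
local 0
sun2.example.org 1 /tmp/me/sun4/program
rs1.example.org 2 /tmp/me/rs6000/program
EOF

# -p4pg overrides -np and -nolocal, so neither is given here.
# The command is only printed, not executed.
cmd="mpirun -p4pg $pgfile /tmp/me/sun4/program"
echo "$cmd"
```

Since -p4pg also selects -leave_pg, the file is left in place after the run.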
Location:/home/MPI/mansrc/commands