Programs fail at startup




General



    1. Q: On some systems, the program fails at startup with messages such as

        /lib/dld.sl: Bind-on-reference call failed
        /lib/dld.sl: Invalid argument

    (this example is from HP-UX), or

        ld.so: libc.so.2: not found

    (this example is from SunOS 4.1; similar messages appear on other systems).

    A: The problem is that your program uses shared libraries that are not available on some of the machines you are running on. To fix this, relink your program without the shared libraries by adding the appropriate command-line options to the link step. For example, for the HP system that produced the errors above, the fix is to add -Wl,-Bimmediate to the link step. For SunOS, the appropriate option is -Bstatic.
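Before relinking, it can help to confirm which libraries are actually missing on a given machine. The following is a minimal sketch, not part of mpich: it assumes the machine provides an `ldd`-style tool whose output marks unresolved libraries with the text "not found" in the form `name => not found`. The helper name `missing_shared_libs` and the sample listing are hypothetical, and the exact output format varies between systems.

```python
def missing_shared_libs(ldd_output):
    """Return the library names that an ldd-style listing reports as not found.

    Assumes lines of the form "libname => not found"; other formats are ignored.
    """
    missing = []
    for line in ldd_output.splitlines():
        if "not found" not in line:
            continue
        # The library name is the text before "=>".
        name = line.split("=>")[0].strip()
        if name:
            missing.append(name)
    return missing


# Hypothetical listing, pasted from a machine where the program fails to start.
sample = """\
    libc.so.2 => not found
    libm.so.1 => /usr/lib/libm.so.1
"""
print(missing_shared_libs(sample))  # prints ['libc.so.2']
```

An empty result on every machine suggests the startup failure has a different cause than missing shared libraries.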





Workstation Networks



    1.


    2. Q: Not all processes start.

    A: This can happen when using the ch_p4 device on a system that places extremely small limits on the number of remote shells you can have. Some systems using Kerberos (a network security package) allow only three or four remote shells; on these systems, the size of MPI_COMM_WORLD is limited to that same number (plus one if you are also using the local host).

    The only way around this is to use the secure server, which is documented in the mpich installation guide. Note that you will have to start the servers by hand, since the chp4_servs script itself uses a remote shell to start them.
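The limit described in the answer above comes down to simple arithmetic. The sketch below assumes that each remote process consumes exactly one remote shell and that a process on the local host consumes none. The function name `max_world_size` is hypothetical and only illustrates the rule; it is not part of mpich.

```python
def max_world_size(remote_shell_limit, use_local_host=True):
    """Largest MPI_COMM_WORLD size reachable under a remote-shell limit.

    Assumes one remote shell per remote process; the local host, if used,
    adds one process without needing a remote shell.
    """
    if remote_shell_limit < 0:
        raise ValueError("remote_shell_limit must be non-negative")
    return remote_shell_limit + (1 if use_local_host else 0)


# With a limit of three remote shells:
print(max_world_size(3))                        # prints 4
print(max_world_size(3, use_local_host=False))  # prints 3
```

If a run requests more processes than this bound, the symptom described in the question (not all processes start) is what you would expect to see.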


