
How to programmatically detect the number of cores and run an MPI program using all cores

Ask Time: 2017-01-05T23:44:42    Author: Jason


I do not want to use mpiexec -n 4 ./a.out to run my program on my core i7 processor (with 4 cores). Instead, I want to run ./a.out, have it detect the number of cores and fire up MPI to run a process per core.

This SO question and answer, "MPI Number of processors?", led me to use mpiexec.

The reason I want to avoid mpiexec is because my code is destined to be a library inside a larger project I'm working on. The larger project has a GUI and the user will be starting long computations that will call my library, which will in turn use MPI. The integration between the UI and the computation code is not trivial... so launching an external process and communicating via a socket or some other means is not an option. It must be a library call.

Is this possible? How do I do it?

Author: Jason. Reproduced under the CC BY-SA 4.0 license with a link to the original source and this disclaimer.
Link to original article:https://stackoverflow.com/questions/41489038/how-to-programmatically-detect-the-number-of-cores-and-run-an-mpi-program-using
Hristo Iliev:

This is quite a nontrivial thing to achieve in general. Also, there is hardly any portable solution that does not depend on some MPI implementation specifics. What follows is a sample solution that works with Open MPI and possibly with other general MPI implementations (MPICH, Intel MPI, etc.). It involves a second executable, or a means for the original executable to directly call your library when given some special command-line argument. It goes like this.

Assume the original executable was started simply as ./a.out. When your library function is called, it calls MPI_Init(NULL, NULL), which initialises MPI. Since the executable was not started via mpiexec, it falls back to the so-called singleton MPI initialisation, i.e. it creates an MPI job that consists of a single process. To perform distributed computations, you have to start more MPI processes, and that's where things get complicated in the general case.

MPI supports dynamic process management, in which one MPI job can start a second one and communicate with it using intercommunicators. This happens when the first job calls MPI_Comm_spawn or MPI_Comm_spawn_multiple. The first one is used to start simple MPI jobs that use the same executable for all MPI ranks, while the second one can start jobs that mix different executables. Both need information as to where and how to launch the processes. This comes from the so-called MPI universe, which provides information not only about the started processes, but also about the available slots for dynamically started ones. The universe is constructed by mpiexec or by some other launcher mechanism that takes, e.g., a host file with a list of nodes and the number of slots on each node. In the absence of such information, some MPI implementations (Open MPI included) will simply start the executables on the same node as the original process. MPI_Comm_spawn[_multiple] has an MPI_Info argument that can be used to supply a list of key-value pairs with implementation-specific information. Open MPI supports the add-hostfile key that can be used to specify a hostfile to be used when spawning the child job. This is useful for, e.g., allowing the user to specify via the GUI a list of hosts to use for the MPI computation. But let's concentrate on the case where no such information is provided and Open MPI simply runs the child job on the same host.

Assume the worker executable is called worker, or that the original executable can serve as a worker if called with some special command-line option, -worker for example. If you want to perform computation with N processes in total, you need to launch N-1 workers. This is simple:

(separate executable)

    MPI_Comm child_comm;
    MPI_Comm_spawn("./worker", MPI_ARGV_NULL, N-1, MPI_INFO_NULL, 0,
                   MPI_COMM_SELF, &child_comm, MPI_ERRCODES_IGNORE);

(same executable, with an option)

    MPI_Comm child_comm;
    char *argv[] = { "-worker", NULL };
    MPI_Comm_spawn("./a.out", argv, N-1, MPI_INFO_NULL, 0,
                   MPI_COMM_SELF, &child_comm, MPI_ERRCODES_IGNORE);
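As for the question in the title, N itself has to come from somewhere. One common option is to query the number of online logical cores at run time; a minimal sketch, assuming Linux or another system that provides the widespread (but non-standard) _SC_NPROCESSORS_ONLN sysconf name:

    #include <unistd.h>   /* sysconf, _SC_NPROCESSORS_ONLN */

    /* One MPI process per logical core; the parent process is already
       running, so N-1 workers will be spawned afterwards. */
    long ncores = sysconf(_SC_NPROCESSORS_ONLN);
    if (ncores < 1)
        ncores = 1;           /* query failed: fall back to one process */
    int N = (int)ncores;

In C++ code, std::thread::hardware_concurrency() is a portable alternative, and OpenMP builds can use omp_get_num_procs(); the right call is platform-dependent.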
If everything goes well, child_comm will be set to the handle of an intercommunicator that can be used to communicate with the new job. As intercommunicators are kind of tricky to use and the parent-child job division requires complex program logic, one could simply merge the two sides of the intercommunicator into a "big world" communicator that replaces MPI_COMM_WORLD. On the parent's side:

    MPI_Comm bigworld;
    MPI_Intercomm_merge(child_comm, 0, &bigworld);

On the child's side:

    MPI_Comm parent_comm, bigworld;
    MPI_Comm_get_parent(&parent_comm);
    MPI_Intercomm_merge(parent_comm, 1, &bigworld);

After the merge is complete, all processes can communicate using bigworld instead of MPI_COMM_WORLD. Note that child jobs do not share their MPI_COMM_WORLD with the parent job.

To put it all together, here is a complete functioning example with two separate program codes.

main.c

    #include <stdio.h>
    #include <mpi.h>

    int main (void)
    {
        MPI_Init(NULL, NULL);

        printf("[main] Spawning workers...\n");

        MPI_Comm child_comm;
        MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 2, MPI_INFO_NULL, 0,
                       MPI_COMM_SELF, &child_comm, MPI_ERRCODES_IGNORE);

        MPI_Comm bigworld;
        MPI_Intercomm_merge(child_comm, 0, &bigworld);

        int size, rank;
        MPI_Comm_rank(bigworld, &rank);
        MPI_Comm_size(bigworld, &size);
        printf("[main] Big world created with %d ranks\n", size);

        // Perform some computation
        int data = 1, result;
        MPI_Bcast(&data, 1, MPI_INT, 0, bigworld);
        data *= (1 + rank);
        MPI_Reduce(&data, &result, 1, MPI_INT, MPI_SUM, 0, bigworld);
        printf("[main] Result = %d\n", result);

        MPI_Barrier(bigworld);

        MPI_Comm_free(&bigworld);
        MPI_Comm_free(&child_comm);

        MPI_Finalize();
        printf("[main] Shutting down\n");
        return 0;
    }

worker.c

    #include <stdio.h>
    #include <mpi.h>

    int main (void)
    {
        MPI_Init(NULL, NULL);

        MPI_Comm parent_comm;
        MPI_Comm_get_parent(&parent_comm);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        printf("[worker] %d of %d here\n", rank, size);

        MPI_Comm bigworld;
        MPI_Intercomm_merge(parent_comm, 1, &bigworld);

        MPI_Comm_rank(bigworld, &rank);
        MPI_Comm_size(bigworld, &size);
        printf("[worker] %d of %d in big world\n", rank, size);

        // Perform some computation
        int data;
        MPI_Bcast(&data, 1, MPI_INT, 0, bigworld);
        data *= (1 + rank);
        MPI_Reduce(&data, NULL, 1, MPI_INT, MPI_SUM, 0, bigworld);

        printf("[worker] Done\n");
        MPI_Barrier(bigworld);

        MPI_Comm_free(&bigworld);
        MPI_Comm_free(&parent_comm);

        MPI_Finalize();
        return 0;
    }

Here is how it works:

    $ mpicc -o main main.c
    $ mpicc -o worker worker.c
    $ ./main
    [main] Spawning workers...
    [worker] 0 of 2 here
    [worker] 1 of 2 here
    [worker] 1 of 3 in big world
    [worker] 2 of 3 in big world
    [main] Big world created with 3 ranks
    [worker] Done
    [worker] Done
    [main] Result = 6
    [main] Shutting down
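If, as mentioned earlier, the GUI should let the user choose the hosts for the computation, the spawn call could pass Open MPI's add-hostfile key through the MPI_Info argument. A minimal sketch, assuming Open MPI; the hostfile name hosts.txt is only an illustration:

    MPI_Info info;
    MPI_Info_create(&info);
    /* Open MPI-specific key: take the hosts for the spawned job
       from the given hostfile. */
    MPI_Info_set(info, "add-hostfile", "hosts.txt");

    MPI_Comm child_comm;
    MPI_Comm_spawn("./worker", MPI_ARGV_NULL, N-1, info, 0,
                   MPI_COMM_SELF, &child_comm, MPI_ERRCODES_IGNORE);
    MPI_Info_free(&info);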
The child job has to use MPI_Comm_get_parent to obtain the intercommunicator to the parent job. When a process is not part of such a child job, the returned value will be MPI_COMM_NULL. This allows for an easy way to implement both the main program and the worker in the same executable. Here is a hybrid example:

    #include <stdio.h>
    #include <mpi.h>

    MPI_Comm bigworld_comm = MPI_COMM_NULL;
    MPI_Comm other_comm = MPI_COMM_NULL;

    int parlib_init (const char *argv0, int n)
    {
        MPI_Init(NULL, NULL);

        MPI_Comm_get_parent(&other_comm);
        if (other_comm == MPI_COMM_NULL)
        {
            printf("[main] Spawning workers...\n");
            MPI_Comm_spawn(argv0, MPI_ARGV_NULL, n-1, MPI_INFO_NULL, 0,
                           MPI_COMM_SELF, &other_comm, MPI_ERRCODES_IGNORE);
            MPI_Intercomm_merge(other_comm, 0, &bigworld_comm);
            return 0;
        }

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        printf("[worker] %d of %d here\n", rank, size);
        MPI_Intercomm_merge(other_comm, 1, &bigworld_comm);
        return 1;
    }

    int parlib_dowork (void)
    {
        int data = 1, result = -1, size, rank;

        MPI_Comm_rank(bigworld_comm, &rank);
        MPI_Comm_size(bigworld_comm, &size);

        if (rank == 0)
        {
            printf("[main] Doing work with %d processes in total\n", size);
            data = 1;
        }

        MPI_Bcast(&data, 1, MPI_INT, 0, bigworld_comm);
        data *= (1 + rank);
        MPI_Reduce(&data, &result, 1, MPI_INT, MPI_SUM, 0, bigworld_comm);

        return result;
    }

    void parlib_finalize (void)
    {
        MPI_Comm_free(&bigworld_comm);
        MPI_Comm_free(&other_comm);
        MPI_Finalize();
    }

    int main (int argc, char **argv)
    {
        if (parlib_init(argv[0], 4))
        {
            // Worker process
            (void)parlib_dowork();
            printf("[worker] Done\n");
            parlib_finalize();
            return 0;
        }

        // Main process
        // Show GUI, save the world, etc.
        int result = parlib_dowork();
        printf("[main] Result = %d\n", result);
        parlib_finalize();

        printf("[main] Shutting down\n");
        return 0;
    }

And here is an example output:

    $ mpicc -o hybrid hybrid.c
    $ ./hybrid
    [main] Spawning workers...
    [worker] 0 of 3 here
    [worker] 2 of 3 here
    [worker] 1 of 3 here
    [main] Doing work with 4 processes in total
    [worker] Done
    [worker] Done
    [main] Result = 10
    [worker] Done
    [main] Shutting down

Some things to keep in mind when designing such parallel libraries:

- MPI can only be initialised once. If necessary, call MPI_Initialized to check whether MPI has already been initialised.
- MPI can only be finalised once. Again, MPI_Finalized is your friend. It can be used in something like an atexit() handler to implement a universal MPI finalisation on program exit.
- When used in threaded contexts (as is usual when GUIs are involved), MPI must be initialised with support for threads. See MPI_Init_thread.

A sketch combining these checks is given below.
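For instance, a library entry point might guard its MPI setup like this. A minimal sketch, assuming a single GUI thread drives the library (hence MPI_THREAD_FUNNELED) and that finalisation on process exit is acceptable; the helper names are only illustrative:

    #include <stdlib.h>   /* atexit */
    #include <mpi.h>

    static void finalize_mpi_at_exit (void)
    {
        int finalized;
        MPI_Finalized(&finalized);
        if (!finalized)
            MPI_Finalize();
    }

    static void ensure_mpi_initialized (void)
    {
        int initialized;
        MPI_Initialized(&initialized);
        if (!initialized)
        {
            int provided;
            /* MPI_THREAD_FUNNELED: only the thread that initialised MPI
               will make MPI calls; request a higher level if other
               threads must call MPI too. */
            MPI_Init_thread(NULL, NULL, MPI_THREAD_FUNNELED, &provided);
            atexit(finalize_mpi_at_exit);
        }
    }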
2017-01-06T13:01:02