Performance Evaluation

Главная > Boost > Boost 1.36.0

Задать вопрос на форуме...

Home

Libraries

People

FAQ

Message-passing performance is crucial in high-performance distributed computing. To evaluate the performance of Boost.MPI, we modified the standard NetPIPE benchmark (version 3.6.2) to use Boost.MPI and compared its performance against raw MPI. We ran five different variants of the NetPIPE benchmark:

MPI: The unmodified NetPIPE benchmark.
Boost.MPI: NetPIPE modified to use Boost.MPI calls for communication.
MPI (Datatypes): NetPIPE modified to use a derived datatype (which itself contains a single MPI_BYTE) rathan than a fundamental datatype.
Boost.MPI (Datatypes): NetPIPE modified to use a user-defined type Char in place of the fundamental char type. The Char type contains a single char, a serialize() method to make it serializable, and specializes is_mpi_datatype to force Boost.MPI to build a derived MPI data type for it.
Boost.MPI (Serialized): NetPIPE modified to use a user-defined type Char in place of the fundamental char type. This Char type contains a single char and is serializable. Unlike the Datatypes case, is_mpi_datatype is not specialized, forcing Boost.MPI to perform many, many serialization calls.

The actual tests were performed on the Odin cluster in the Department of Computer Science at Indiana University, which contains 128 nodes connected via Infiniband. Each node contains 4GB memory and two AMD Opteron processors. The NetPIPE benchmarks were compiled with Intel's C++ Compiler, version 9.0, Boost 1.35.0 (prerelease), and Open MPI version 1.1. The NetPIPE results follow:

netpipe

There are a some observations we can make about these NetPIPE results. First of all, the top two plots show that Boost.MPI performs on par with MPI for fundamental types. The next two plots show that Boost.MPI performs on par with MPI for derived data types, even though Boost.MPI provides a much more abstract, completely transparent approach to building derived data types than raw MPI. Overall performance for derived data types is significantly worse than for fundamental data types, but the bottleneck is in the underlying MPI implementation itself. Finally, when forcing Boost.MPI to serialize characters individually, performance suffers greatly. This particular instance is the worst possible case for Boost.MPI, because we are serializing millions of individual characters. Overall, the additional abstraction provided by Boost.MPI does not impair its performance.