Sample PETSc Floating Point Performance

  • Single Processor Floating Point Performance
  • Parallel Performance for Euler Solver
  • Scalability for Laplacian
    We provide these floating point performance numbers as a guide to the floating point rates users should expect when using PETSc. We have done our best to provide fair and accurate values, but we do not guarantee any of the numbers presented here.

    See the "Profiling" chapter of the PETSc users manual for instructions on techniques for obtaining accurate performance numbers with PETSc.
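
    As an illustration (not taken from the original page), the sketch below shows one common way to isolate a kernel in PETSc's profiling output: register a logging stage around the region to be timed so that the -log_view report (the successor of the older -log_summary option) lists it separately. It assumes a recent PETSc release.

      #include <petscsys.h>

      int main(int argc, char **argv)
      {
        PetscLogStage stage;

        PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
        /* Register a named stage so -log_view reports the timed region separately */
        PetscCall(PetscLogStageRegister("Timed kernel", &stage));
        PetscCall(PetscLogStagePush(stage));
        /* ... place the operation to be timed here, e.g. repeated MatMult() calls ... */
        PetscCall(PetscLogStagePop());
        PetscCall(PetscFinalize());
        return 0;
      }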


    Single Processor Performance

    In many PDE application codes one must solve systems of linear equations arising from the discretization of multicomponent PDEs; the sparse matrices that arise naturally have a block structure.

    PETSc has special sparse matrix storage formats and routines that take advantage of this block structure to deliver substantially higher (two to three times) floating point computation rates. Below we give the floating point rates for the matrix-vector product for a 1503 by 1503 sparse matrix with a block size of three arising from a simple oil reservoir simulation.

    [Table: floating point rates for the block sparse matrix-vector product]
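
    To give a concrete picture of how the block format is used, here is a minimal sketch (not the benchmark code itself, and assuming a recent PETSc release) that creates a BAIJ matrix with block size three and applies the matrix-vector product whose flop rate is reported above; the actual matrix entries are elided.

      #include <petscmat.h>

      int main(int argc, char **argv)
      {
        Mat      A;
        Vec      x, y;
        PetscInt bs = 3, n = 1503;   /* block size and matrix dimension from the text above */

        PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));

        PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
        PetscCall(MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n));
        PetscCall(MatSetType(A, MATBAIJ));            /* blocked sparse (BAIJ) storage */
        PetscCall(MatSetBlockSize(A, bs));
        PetscCall(MatSetUp(A));
        /* ... insert the 3x3 blocks with MatSetValuesBlocked() here ... */
        PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
        PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));

        PetscCall(MatCreateVecs(A, &x, &y));
        PetscCall(VecSet(x, 1.0));
        PetscCall(MatMult(A, x, y));                  /* the kernel timed in the table above */

        PetscCall(VecDestroy(&x));
        PetscCall(VecDestroy(&y));
        PetscCall(MatDestroy(&A));
        PetscCall(PetscFinalize());
        return 0;
      }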

    The next table depicts performance for the entire linear solve using GMRES(30) and ILU(0) preconditioning.

    [Table: floating point rates for the full linear solve with GMRES(30) and ILU(0)]

    These tests were run using the code src/sles/examples/tutorials/ex10.c with the options

    mpiexec -n 1 ex10 -f0 arco1 -f1 arco1 -pc_type ilu -ksp_gmres_unmodifiedgramschmidt -optionsleft -mat_baij -matload_block_size 3 -log_summary
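
    For reference, the sketch below shows how the same solver configuration, GMRES with a restart of 30 and ILU(0) preconditioning, can be set up through the KSP interface of a recent PETSc release (the test above used the older SLES interface). The function name and the assumption that A and b are already assembled are ours, not part of the benchmark code.

      #include <petscksp.h>

      /* Solve A x = b with GMRES(30) and ILU(0); A and b are assumed assembled. */
      PetscErrorCode SolveWithGMRESILU0(Mat A, Vec b, Vec x)
      {
        KSP ksp;
        PC  pc;

        PetscFunctionBeginUser;
        PetscCall(KSPCreate(PETSC_COMM_WORLD, &ksp));
        PetscCall(KSPSetOperators(ksp, A, A));
        PetscCall(KSPSetType(ksp, KSPGMRES));
        PetscCall(KSPGMRESSetRestart(ksp, 30));   /* GMRES(30) */
        PetscCall(KSPGetPC(ksp, &pc));
        PetscCall(PCSetType(pc, PCILU));          /* incomplete LU */
        PetscCall(PCFactorSetLevels(pc, 0));      /* zero levels of fill: ILU(0) */
        PetscCall(KSPSetFromOptions(ksp));        /* allow -ksp_ and -pc_ options to override */
        PetscCall(KSPSolve(ksp, b, x));
        PetscCall(KSPDestroy(&ksp));
        PetscFunctionReturn(PETSC_SUCCESS);
      }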


    Parallel Performance for Euler Solver

    Scalability for Laplacian

    A typical "model" problem people work with in numerical analysis for PDEs is the Laplacian. Discretization of the Laplacian in two dimensions with finite differences is typically done using the "five point" stencil. This results in a very sparse (at most five nonzeros per row), ill-conditioned matrix.

    Because the matrix is so sparse and has no block structure it is difficult to get very good sequential or parallel floating point performance, especially for small problems. Here we demonstrate scalability of the parallel PETSc matrix-vector product for the five point stencil on two grids. These were run on three machines: an IBM SP2 with the Power2 Super chip and two memory cards at ANL, the Cray T3E at NERSC, and the Origin2000 at NCSA.
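
    The sketch below is our own illustration (not the code used to produce the numbers here, and assuming a recent PETSc release) of the kind of kernel being measured: it assembles the five-point Laplacian on an m-by-m grid in a distributed AIJ matrix and applies repeated matrix-vector products; running it under mpiexec with -log_view reports the parallel flop rates. The -m option and the count of 100 products are our choices.

      #include <petscmat.h>

      int main(int argc, char **argv)
      {
        Mat      A;
        Vec      x, y;
        PetscInt m = 100, N, Istart, Iend, row, i, j, k;

        PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
        PetscCall(PetscOptionsGetInt(NULL, NULL, "-m", &m, NULL));  /* grid is m by m */
        N = m * m;

        PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
        PetscCall(MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, N, N));
        PetscCall(MatSetFromOptions(A));
        PetscCall(MatSeqAIJSetPreallocation(A, 5, NULL));           /* at most 5 nonzeros per row */
        PetscCall(MatMPIAIJSetPreallocation(A, 5, NULL, 5, NULL));
        PetscCall(MatGetOwnershipRange(A, &Istart, &Iend));

        for (row = Istart; row < Iend; row++) {               /* standard five-point stencil */
          i = row / m; j = row % m;
          PetscCall(MatSetValue(A, row, row, 4.0, INSERT_VALUES));
          if (i > 0)     PetscCall(MatSetValue(A, row, row - m, -1.0, INSERT_VALUES));
          if (i < m - 1) PetscCall(MatSetValue(A, row, row + m, -1.0, INSERT_VALUES));
          if (j > 0)     PetscCall(MatSetValue(A, row, row - 1, -1.0, INSERT_VALUES));
          if (j < m - 1) PetscCall(MatSetValue(A, row, row + 1, -1.0, INSERT_VALUES));
        }
        PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
        PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));

        PetscCall(MatCreateVecs(A, &x, &y));
        PetscCall(VecSet(x, 1.0));
        for (k = 0; k < 100; k++) PetscCall(MatMult(A, x, y)); /* the measured kernel */

        PetscCall(VecDestroy(&x));
        PetscCall(VecDestroy(&y));
        PetscCall(MatDestroy(&A));
        PetscCall(PetscFinalize());
        return 0;
      }

    One would run this, for example, as mpiexec -n 4 ./fivepoint -m 1000 -log_view (the executable name is arbitrary).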

    Since PETSc is intended for much more general problems than the Laplacian, we do not consider the Laplacian a particularly important benchmark; we include it because of interest from the community.


    100 by 100 Grid: Absolute Time and Speed-Up

    [Figure: absolute time and speed-up for the 100 by 100 grid]

    Notes: The problem here is simply too small to parallelize effectively on a distributed memory computer.

    1000 by 1000 Grid: Absolute Time and Speed-Up

    [Figure: absolute time and speed-up for the 1000 by 1000 grid]