Prometheus-1.x Users Guide
After installing and building the Prometheus library you are ready to use Prometheus. The library interface is defined below for C, C++, and FORTRAN.
The ./Test directory has the two test files ("ex1.C" and "ex2.c"). The first test problem uses the C++ interface and the second uses the C interface . These test programs are useful to clone for your interface with Prometheus (ex3.c is sample of the FORTRAN interface, written in C, and in HTML format only). ./Test also contains test scripts and a PETSc resource file that should be useful in getting up and running with Prometheus.Running jobsCurrent Prometheus limitations:
Known bugs:
- We currently support only first order hexahedra, tetrahedra, and quadrilateral shell elements when using the gemoetric method. Also, we have not tested mixed meshes with both hexahedra and tetrahedra.
- Only one(1) , three(3) , and six(6) degree of freedom (dof) per vertex problems have been tested with the gemoetric method.
- A maximum of seven(7) degrees of fredom per node is currently supported.
- All vertices must have the same number of degrees of freedom per vertex, though one should be able to simply add "dummy" dofs on vertices to "pad" the dofs on each vertex to the largest number of dofs per vertex in your problems, and "constrain" these dofs so that they do not fill off diagonal entries on the coarse grids.
- Only 3D problems are supported.
- MPI does not guarantee that blocking sends (without a corresponding receive) will return. Prometheus-1.x uses some of these blocking sends (although no blocking self sends) and thus is not guaranteed to work. All of the platforms that we have used work as we only send very small messages without a corresponding receive having been posted, but the MPI standard does not guarantee that this will work. Also, many MPI implementations have an environmental variable to specify the threshold for sending messages immediately (without hand shaking protocols with matching receives), as an environmental variable to specify the buffer size allocated interanally my MPI (if you find that Prometheus has hung on and MPI_Send then try running with fewer unknowns per processor or increasing this buffer size).
Prometheus has the option of using one of two multigrid methods:The only difference between these two methods is in the construction of the restriction operator R (or the prolongation operator P = RT) . We recommend that user start with smoothed aggregation of plain aggregation as our preliminary experiments show that it provides comparable performance (and perhaps a little better - a comprehensive comparison is forthcoming on my web page). plain aggregation it is also a simpler algorithm and we expect it to be more robust on highly complicated (pathological) meshes ("ex2" in the test suit is an example of such a mesh).
- "Geometric" multigrid: command line arguments begin with "" and "geomg_" will activate the geometric method.
- "Algebraic" (smoothed aggregation) multigrid: command line arguments begin with "aggmg_". Unsmoothed aggregation is the default method.
Prometheus methods
(C++ interface)
int Prometheus::Prometheus // simple constructor -- The C++ constructorPrometheus methods (C interface)int Prometheus::~Prometheus() // destructor -- The C++
int Prometheus::InitParameters() // should be called afer the constuctor (needed for the FEI implementation)
int Prometheus::InitLocal // provide local FE data
int Prometheus::InitGlobal // provide global graph data
int Prometheus::NewLevel // construct a new coarse grid
int Prometheus::Finalize // prepare a solution phase
int Prometheus::BuildGrids // calls PromNewLevel and PromFinalize
int Prometheus::UpdateGeometry // updates nodal coordinates for AMG methods
int Prometheus::Getx// Get the solution vector
int Prometheus::Getb// Get the RHS vector
int Prometheus::GetA// Get the stiffness matrix
int Prometheus::ZeroA() // Zero the stiffness matrix
int Prometheus::PreSolve // Construct coarse grid operators
int Prometheus::Solve // Solve for one RHS
int promCreate( void **prom ) // simple constructor -- N.B. user must call PetscInit()Prometheus methods (FORTRAN interface)int promDestroy( void *prom ) // destructor -- N.B. user must call PetscFinilize()
int InitParameters( void *prom )
int InitLocal( void *prom, ... ) // provide local FE data
int InitGlobal( void *prom, ... ) // provide global graph data
int promNewLevel( void *prom, ... ) // construct a new coarse grid
int promFinalize( void *prom, ... ) // prepare a solution phase
int promBuildGrids( void *prom, ... ) // calls PromNewLevel and PromFinalize
int promUpdateGeometry( void *prom, ... ) // updates nodal coordinates for AMG methods
int promGetx( void *prom, ... ) // Get the solution vector
int promGetb( void *prom, ... ) // Get the RHS vector
int promGetA( void *prom, ... ) // Get the stiffness matrix
int promZeroA( void *prom ) // Zero the stiffness matrix
int promPreSolve( void *prom, ... ) // Construct coarse grid operators
int promSolve( void *prom, ... ) // Solve for one RHS
The FORTRAN interface uses the same arguments as the C interface except that an integer*8 is provided instead of a void *. This is a machine pointer to a Prometheus object. Thus with the declarations: "integer*8 prom" and "integer ierr" the arguments are as in the C and C++ interface except "prom" is provided as the first argument and "ierr" is returned in the last argument.
prom_create( prom, ierr) // simple constructor -- N.B. user must call PetscInit()Method definitionsprom_destroy( prom, ierr ) // destructor -- N.B. user must call PetscFinilize()
prom_init_local( prom, ..., ierr) // provide local FE data
prom_init_parameters( prom, ierr)
prom_init_global( prom, ..., ierr ) // provide global graph data
prom_new_level( prom, ..., ierr ) // construct a new coarse grid
prom_finalize( prom, ..., ierr ) // prepare a solution phase
prom_build_grids( prom, ..., ierr ) // calls PromNewLevel and PromFinalize
prom_update_geometry( prom, ..., ierr ) // updates nodal coordinates for AMG methods
prom_get_x( prom, ..., ierr ) // Get the solution vector
prom_get_b( prom, ..., ierr ) // Get the RHS vector
prom_get_a( prom, ..., ierr ) // Get the stiffness matrix
prom_zero_a( prom, ierr ) // Zero the stiffness matrix
prom_pre_solv( prom, ..., ierr ) // Construct coarse grid operators
prom_solv( prom, ..., ierr ) // Solve for one RHS
int Prometheus::Prometheus();Example of Prometheus constructionint Prometheus::InitParameters();
InitParameters constructs internal variables from input parameters in PETSc's data base. This should be called immediately after construction. This is not called from the constructor to accommodate the FEI implementation.
int Prometheus::InitLocal
(
const int numnd, // number of vertices in local FE problem
const double coord[], // coord[numnd][3]: coordinates
const int lid_leq[numnd+1], // local equation number for each node (eg, 0,3,6,.., with three dof per node)
const int eq_id[], // eq_id[numeq] local equation number for local dof (optional for FEI users), negative id for equations with Direchlet boundary conditions, numeq = lid_leq[numnd].
const int numel, // number of elements in local FE problem
const int nen, // number of vertices (max) per element
const int *elems, // elems[numel][nen]: local node numbers for elements
const int matids[numel] // material IDs for elements
);InitLocal provides Prometheus with the "local" finite element problem for each processor p. This assumes that the application has partitioned the original finite element problem so that each vertex is assigned to one, and only one, processor - this partitioning defines a local vertex set for each processor p - VpL . Elements must also be partitioned across the processors to define a local set of elements for each processor p - EpL. Define the extended set of elements EpE as all elements that have at least one vertex partitioned to processor p. We do not explicitly use EpL and will assume that it is a subset of EpE. Finally define the extended vertex set VpE to be the set of vertices in the elements in EpE
Now numnd = |V pE | with - the vertices in VpL order first. And, numel = |VpE|. N.B. if each processor p computes the element stiffness matrix and residuals for each element in EpE, then no communication is required to form the global stiffness matrix (A) and residual (b), at the expense of computing elements redundantly (we find that this very worthwhile in our applications).
elems[numel][nen]: is a 2D array of local, zero(0) based, indices of the vertices for each element in EpE . Note, If some elements have fewer than nen vertices then the row should be padded with zeros.eq_id[] the local one(1) based equation number of the local degrees of freedom on each processor (with a negative number for constrained degrees of freedom). The primary purpose of this is to simply provide a flag for Dirichlet boundary conditions. Our application - Athena - uses eq_id to provide for the mapping of PETSc's local degrees of freedom to the compressed vectors used by our (serial) finite element application FEAP. Prometheus library users only need to provide a negative number for fixed degrees of freedom. For instance, if the problem has three vertices with 3 degrees of freedom per vertex, and the first vertex has its "x" and "y" dof constrained and last vertex has its "z" dof constrained then eq_id could be [-1 -2 1 2 3 4 5 6 -3] or simply [-1 -1 1 1 1 1 1 1 -1] .
int Prometheus::InitGlobal
(
const int npLoc, // = |VpL| from above
const int npGst, // = |VpE| - |VpL| from above
const int *proc_gnode, // proc_gnode[np+1 ] first global (PETSc) ID for each processor, proc_gnode[np]=total number of global vertices
const int *ghost_gnode, // ghost_gnode[npGst] global (PETSc) ID of ghost vertices
const int *ghost_proc=NULL // ghost_proc[npGst] processors ID of ghost vertices -- can be derived from ghost_gnode and proc_gnode
);InitGlobal provides Prometheus with global data to "tie" the whole problem together. There are three(3) types of vertex IDs in Prometheus: the local ID used to index into arrays such as those provided above in InitLocal(), the PETSc global ID which are defined by the (block row) matrix partitioning (i.e. the local vertices on processor 0 are number first, then the local vertices on processor 1, and so on), and the global indices of the original (pre partitioned) problem. Note, the library interface does not explicitly use pnode_fnode, this is just an artifact of our parallel finite element application that is required as we use it for error detection within Prometheus. npLoc+npGst = numnd (from above), and np is the number of processors. pnode_fnode holds the local size information (npLoc) for all of the other processors, and pnode_fnode[np] = the total number of global vertices - thus, pnode_fnode[p+1] - pnode_fnode[p] = npLoc for processor p. ghost_gnode[] provides the global indices of the "ghost" vertices (VpE - VpL), and ghost_proc provides the processor ID of each ghost vertex (note, ghost_proc could be computed with ghost_gnode and proc_gnode).
N.B. This particular interface (InitLocal and InitGlobal arguments) comes from the structure of our parallel finite element code Athena and our serial code FEAP. For the user interface. we have taken care so that after InitGlobal is called the user may "free" the memory of the array arguments (to InitLocal and InitGlobal) above - also InitLocal must be called before InitGlobal.
int Prometheus::NewLevel( int *nnodes );
PromNewLevel creates a new coarse grid and returns the number of global nodes (in nnodes) of the new grid. If *nnodes equals the number of nodes on the last grid, then Prometheus decided that the current "top" grid as coarse as one should coarsen and all further calls to PromNewLevel will not have any effect (the heuristic used for this can be found in Prometheus::NewLevel in ./src/prometheus.C.
PromFinalize should be called after all coarse grids have been created with PromNewLevel. Prometheus is now ready to assemble the stiffness matrix and RHS.
int Prometheus::BuildGrids( int nlevels = 6 );
PromBuildGrids is a wrapper for PromNewLevel and PromFinalize, and takes one argument, with a default value of 6, for the number of coarse grids to construct (again, if Prometheus decides that a current "top" grid is as coarse as one should coarsen then less than "nlevels" grids will be constructed.
int Prometheus::UpdateGeometry
(
const double * const coords, // coords[numnd][3] -- inital coordinates of nodes
const double * const deltas // deltas[numnd][3] -- displacements of nodes
);Updates coordinates of vertices with increment for Prometheus initial prolongators. Set new coordinates with coords[] + deltas[].
int Prometheus::Getx( PromVector **ret );
int Prometheus::Getb( PromVector **ret );
int Prometheus::GetA( PromVector **ret );
Getx, Getb, and GetA provide access to PETSc's objects for x, b, and A, respectively in Ax=b. The user application is responsible for assembling b and A, with the PETSc user interface. The current interface requires that the application does not call the PETSc MatAssemblyBegin() and MatAssemblyEnd() routines - but does finalize the RHS vector (b).
Prometheus::PreSolve
(
const int nstep, // zero based "time step"
const int niter // zero based non-linear (Newton) iteration
);PreSolve is called after A and b have been assembled by the user application and makes the coarse grid operators and prepare to solve Ax=b for x.
int Prometheus::Solve
(
const int niter, // zero based non-linear (Newton) iteration
double *energy // energy (xTb)
);Solve calls the final PETSc assemble routines, and solves the system of equations. Getx can now be used to get the solution.
Sample finite element problem:
Processor 0:
ierr = prometheus->InitLocal(
numnd, /*
= 4 */
coord, //
= [ [0. 0. 0.] [0. 1. 0.] [1. 0. 0.] [0. 0. 1.] ]
lid_leq,
//
= [ 0 3 6 12 ]
eq_id,
//
= [ -1 -2 -3 1 2 -4 3 4 -5 5
6 7 ]
numel, /*
= 1 */ nen, /*
= 4 */ elems, //
= [ 0 2 1 3 ]
matids /*
= [ 1 ] */ ); CHKERRQ(ierr);
ierr = prometheus->InitGlobal(
npLoc,
/*
= 2 */ npGst,
//
= 2
pnode_fnode,
//
= [ 0 2 1 3 ]
proc_gnode, //
= [0 2 4]
ghost_gnode,
//
= [ 2 3 ]
ghost_proc /*
= [ 1 1 ] */); CHKERRQ(ierr);
Processor 1:
ierr = prometheus->InitLocal(
numnd, /*
= 4 */
coord, //
= [ [1. 0. 0.] [0. 0. 1.] [0. 0. 0.] [0. 1. 0.] ]
lid_leq,
//
= [ 0 3 6 12 ]
eq_id, //
= [ 1 2 -1 3 4 5 -2 -3 -4 6 7
-5 ]
numel, /*
= 1 */ nen, /*
= 4 */ eq_id, //
= [ 1 2 -1 3 4 5 -2 -3 -4 6 7
-5 ]
atids /*
= [ 1 ] */ ); CHKERRQ(ierr);
ierr = prometheus->InitGlobal(
npLoc,
/*
= 2 */ npGst,
//
= 2
pnode_fnode,
//
= [ 1 3 0 2 ]
proc_gnode, //
= [0 2 4]
ghost_gnode,
//
= [ 0 1 ]
ghost_proc /*
= [ 0 0 ] */); CHKERRQ(ierr);
Note, tetrahedra are number (in "elems") with a "right hand rule" of numbering, and hexahedra are numbered as shown here.
Dirichlet boundary conditions effect the structure of the stiffness matrix that Prometheus constructs (and that the application needs to fill). For efficiency, Prometheus must preallocate the matrix (by giving PETSc the number of nonzeros per row); to insure that the user and Prometheus are on the same page with regard to global indices we set the PETSc matrix to abort if a new nonzero entry is added to the matrix. Thus, it is essential that the user understand how the nonzero structure of the matrix is effected by Dirichlet boundary conditions. Prometheus will not allocate space for off diagonal terms of vertices that have all Dirichlet boundary conditions (eg, global vertex 0 in the example above). Prometheus will allocate storage for vertices that have only partial Dirichlet boundary conditions (eg, vertices 1 and 2 in the example above) - therefor the user must insure that nonzero values are not inserted into the matrix for degrees of freedom that have been restrained by Dirichlet boundary conditions. The figure below shows the nonzero structure of the global PETSc matrix for our our sample problem. Note, PETSc will have allocated the full 3 by 3 blocks for those blocks with at least one nonzero in them, and so the total number of nonzeros allocated by PETSc will be 90 (ten 3 by 3 blocks).
See test Prometheus programs in Prometheus-1.x/Test/ex1.C and Prometheus-1.x/Test/ex2.c for concrete examples for adding element stiffness matrices the PETSc global stiffness matrix, and the overall structure of a Prometheus program (we suggest using one of these sources to "clone" your application interface). Also see here for a sample of the FORTUNE interface (written in C).