Using MPI with the Trident LightweightExecutor to execute a workflow on an HPC Server cluster

Jul 22, 2011 at 8:41 AM

Hi,

I have a workflow that we run many instances of simultaneously on an HPC Server cluster.  We execute the workflow using the LightweightExecutor application, with arguments that reference the workflow id (the same GUID for all instances) and the workflow parameters file (the XML file).  So we basically run many instances of one workflow, each parameterised slightly differently.  We would now like to communicate between these workflow instances, from within one of the activities, using the Message Passing Interface (MPI).

How should we go about incorporating MPI into our activity, and how can we then execute the workflow with the LightweightExecutor (probably via mpiexec?) while still having all our workflow instances parameterised differently?

 

Jamie

Jul 22, 2011 at 12:06 PM

Hi,

Thanks for showing interest in Trident.

We are looking into this issue, along with possible workarounds. We will keep you posted on updates.

-Regards

Jul 25, 2011 at 12:11 PM

Hi,

Thanks for your patience Jamie.

For the sake of clarity I have divided the response into 3 parts.

1) HPC and Trident

I think you are already clear on how Trident works with HPC and how to run a Trident workflow on the cluster. Please see this small document on Trident HPC

2) Communicating between the workflows from an activity.

You can use MPI for cross-workflow communication from within an activity. There are C# libraries for MPI, namely MPI.NET and Pure MPI.NET.

3) Apart from this, I suggest using DryadLINQ if it meets your criteria. Below are some links to DryadLINQ resources.

Forum (http://social.msdn.microsoft.com/Forums/en-US/dryad/threads)
Sample Program (http://research.microsoft.com/apps/pubs/default.aspx?id=66811)
DryadLINQ research site (http://research.microsoft.com/en-us/projects/dryadlinq/)

Please let me know if you need any further information.

-Regards

 

Jul 26, 2011 at 2:53 AM

Hi, thanks for that.

We have a good understanding of how Trident works with HPC Server for embarrassingly parallel problems, but maybe not so much when data needs to be passed between machines.

I think if we were to use MPI in our workflows we would probably use one of the two libraries you mention above.  What I don't quite understand is how a Trident workflow could actually be executed using MPI.  It would be preferable to have the workflow run with the LightweightExecutor on the compute nodes, so that all libraries etc. are fetched from the central registry and the monitoring is still performed by Trident.  However, to run an MPI-enabled application the program must be launched using the mpiexec application.  It is not clear how a workflow should be invoked using both of these applications (mpiexec and LightweightExecutor).

i.e. we currently run two instances of a workflow in parallel on 2 nodes of a cluster in the following way:

LightweightExecutor.exe -wf aaaaaaaa-bbbb-cccc-dddddddddddd -input parameters1.xml, and

LightweightExecutor.exe -wf aaaaaaaa-bbbb-cccc-dddddddddddd -input parameters2.xml

So if we wanted to communicate between these workflow instances using MPI, I assume that, firstly, the activity doing the communicating would need to be MPI-enabled and, secondly, we would need to launch the workflow instances using the mpiexec application.  I am thinking that launching the workflow using the MPI SPMD (single program, multiple data) approach may not work in this case, since each process needs to be parameterised differently (with a different workflow parameters file) and I don't believe SPMD will support this.  However, an MPMD (multiple program, multiple data) approach may work if each workflow instance is considered a different program?  I guess I am after more clarity on how to use mpiexec and LightweightExecutor together.
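
One way to picture the MPMD idea raised above is a single mpiexec invocation with one colon-separated section per workflow instance, which is the multi-section form that MPICH-style launchers (including the MS-MPI mpiexec) document. The Python sketch below only builds that command line. The helper name build_mpmd_command is hypothetical, the -wf and -input arguments are copied from the two commands in this post, and whether LightweightExecutor can actually host an MPI-enabled activity when started this way is an untested assumption.

```python
def build_mpmd_command(workflow_id, parameter_files,
                       executor="LightweightExecutor.exe"):
    """Build an mpiexec argument list with one section per parameters file.

    Hypothetical helper: each section starts one process (-n 1) running the
    same workflow id with a different parameters file.
    """
    if not parameter_files:
        raise ValueError("at least one parameters file is required")

    command = ["mpiexec"]
    for index, parameters in enumerate(parameter_files):
        if index > 0:
            # A lone ':' separates the sections of an MPMD launch.
            command.append(":")
        command += ["-n", "1", executor,
                    "-wf", workflow_id, "-input", parameters]
    return command


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch is safe
    # to try on a machine without mpiexec or Trident installed.
    cmd = build_mpmd_command("aaaaaaaa-bbbb-cccc-dddddddddddd",
                             ["parameters1.xml", "parameters2.xml"])
    print(" ".join(cmd))
```

With this layout each rank is a separate LightweightExecutor process with its own parameters file, while all ranks still belong to one MPI job and so could, in principle, talk to each other from the MPI-enabled activity.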

 

DryadLINQ looks really interesting... we have been familiar with it for some time and would like to test drive it sometime in the future!

Jul 26, 2011 at 1:21 PM

Hi,

Since this was my first encounter with MPI, I am not very sure how to carry out the execution of the workflows using mpiexec. I need to dig a bit deeper into the MPI details before I can advise further on this issue. It will take some time.

As per my current understanding, you can certainly write a batch file to accomplish your task, if mpiexec supports this.
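
The batch-file idea could look something like the wrapper sketched below: mpiexec starts the same wrapper on every rank, and each copy picks its own parameters file from its rank. This is only a sketch, written in Python rather than as a batch file. It assumes the launcher exports the rank in a PMI_RANK environment variable (MPICH-derived launchers such as MS-MPI are expected to, but please verify on your cluster) and that the files are named parameters1.xml, parameters2.xml and so on, as in the earlier post. The function names are hypothetical.

```python
import os


def parameters_for_rank(environ, pattern="parameters{index}.xml"):
    """Map the MPI rank to a parameters file name (rank 0 -> parameters1.xml).

    Assumption: the launcher sets PMI_RANK in the process environment.
    """
    rank = int(environ["PMI_RANK"])
    return pattern.format(index=rank + 1)


def build_executor_command(workflow_id, environ,
                           executor="LightweightExecutor.exe"):
    """Build the LightweightExecutor command line for the current rank."""
    return [executor, "-wf", workflow_id,
            "-input", parameters_for_rank(environ)]


if __name__ == "__main__":
    env = dict(os.environ)
    # Default to rank 0 so the sketch also runs outside mpiexec.
    env.setdefault("PMI_RANK", "0")
    # Printed rather than executed; a real wrapper would hand this list
    # to subprocess.call(...) to start the workflow instance.
    print(" ".join(build_executor_command(
        "aaaaaaaa-bbbb-cccc-dddddddddddd", env)))
```

Note that a workflow started by such a wrapper is a child of the MPI rank, not the rank itself, so whether the activity inside it can still join the MPI job is something that would need testing.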

Another option is to run a workflow from within an activity using the LightweightExecutor. You can invoke the LightweightExecutor from an activity by providing the necessary command-line inputs, and you can run as many instances as you want using a loop. The main workflow that this activity is part of can then be run using mpiexec. The “Parallel Activity”, which can be downloaded from (Sample for showcasing Parallel activity), can also help you in this scenario. Apart from this, VS2010 provides a parallel activity.
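
The loop described above can be sketched roughly as follows. A real Trident activity would be written in C#, so this Python version only illustrates the shape of the logic: start one LightweightExecutor process per parameters file, then wait for all of them. The names launch_instances and wait_all are hypothetical, and the -wf and -input arguments are taken from the commands quoted earlier in the thread.

```python
import subprocess


def launch_instances(workflow_id, parameter_files,
                     executor="LightweightExecutor.exe",
                     start=subprocess.Popen):
    """Start one executor process per parameters file and return the handles.

    `start` is injectable so the loop can be exercised without Trident.
    """
    processes = []
    for parameters in parameter_files:
        command = [executor, "-wf", workflow_id, "-input", parameters]
        processes.append(start(command))
    return processes


def wait_all(processes):
    """Block until every started process exits; return their exit codes."""
    return [process.wait() for process in processes]


if __name__ == "__main__":
    # Dry run: print each command instead of starting a process.
    launch_instances("aaaaaaaa-bbbb-cccc-dddddddddddd",
                     ["parameters1.xml", "parameters2.xml"],
                     start=lambda command: print(" ".join(command)))
```

Passing the default start=subprocess.Popen runs the instances concurrently on the local node only; spreading them across compute nodes would still be the job of the scheduler or mpiexec.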

Please let me know if this helps you in achieving your goals.

Regards