\documentclass{article}
\usepackage{cgw06}
\usepackage{epsf}


\title{Molecular Dynamics on Parallel Computers}

\author{W. Alda\inst{1}, K. Boryczko\inst{1}, M. Bubak\inst{1}, 
        W. Dzwinel\inst{1}, J. Kitowski\inst{1,2}, 
        J. Mo\'sci\'nski\inst{1,2} and  M. Pogoda\inst{1}}

\institute{Institute of Computer Science,
           AGH, al. Mickiewicza 30, 30-059, Krak\'ow, Poland 
\and
Academic Computer Center  CYFRONET, ul. Nawojki 11, 30-950 Krak\'ow, 
                                                   Poland\\
\emph{email:\/} \texttt{[bubak,kito]@uci.agh.edu.pl}\\
\emph{phone:\/} (+48 12) 617 3964, \ \ \  {\em fax:\/} (+48 12) 338 054 }  


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{document}
\maketitle

%%%%%%%%%%%%%%%%%%%%%%%%%%% abstract  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{abstract}
In the paper we present  experiments  with parallel  molecular  dynamics
algorithms  and programs for  simulation of systems with large number of
particles and for feature extraction in pattern recognition.
In the first two algorithms  Lennard-Jones systems are incorporated.  In
the 3-D  ({\sl  Pipe  Method})  domain  decomposition and  data-parallel
paradigm are applied.
In the second 2-D  algorithm  ({\sl 2-D MD}) domain  decomposition  with
distributed  neighbor (Verlet) and link (Hockney) lists are applied.  In
the third  algorithm  ({\sl FEA}) a kind of the harmonic  potential and
all-pair  interactions are introduced and parallelization is implemented
with both message-passing and data-parallel programming models.
All  programs  are  equipped  with  dynamic  load  balancing   based  on
adaptive   repartitioning   of  the   computational   box  during  the
simulation.
\end{abstract}

%%%%%%%%%%%%%%%%%%%%%%%%%% introduction %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section{Introduction}

One of the fast developing approaches to material science is application
of particle  simulation,  mainly molecular dynamics (MD) method to solve
the  problems of origin of  different  instabilities  and others, out of
reach of classical  models based on flow  continuity of matter,  momenta
and   energy.   Some   early   results   of   such   simulations   (e.g.
\cite{rap1}-\cite{rap3})  have been  followed by other works  concerning
sedimentation  \cite{alda1}  and simulation of macro-scale  phenomena of
intrinsically  discontinuous nature like fracture \cite{mora} as well as
others like convection, explosions, formation of cracks, deformation and
penetration    in   materials    of    non-liquid    properties    (e.g.
\cite{alda2,pc96})  showing  requirements  for large  number  of  ``{\sl
molecules}''   involved  in  realistic   simulations.  More  recently  a
scalable  multicell  MD code  has  been  implemented  on the  CM5 and it
demonstrated that MD simulations with $10^8+$ molecules are now feasible
\cite{beazley,lom1}.


In the paper we present our experiments  with parallel MD algorithms and
programs  for  simulation  of  system  dynamics  with  large  number  of
particles and for feature  extraction  for general  pattern  recognition
method.


%%%%%%%%%%%%%%%%%%%%% Characteristics of parallel algorithms %%%%%%%%%%%

\section{Characteristics of parallel algorithms}


\subsection{Pipe Method}

The  computational  box applied in the study is a 3-D long cylinder.  It
is useful for  studies of fluids and  mixtures  in  microcapillaries  or
growth of optical  fibers.  The algorithm was  originally  developed for
vector computers \cite{pip1}.  Short-range  interactions defined by 12/6
Lennard-Jones  pair  potential  are used with  leap-frog  algorithm  for
solving  Newtonian  equations of motion.  Simulation  of  microcanonical
ensemble is performed.

Periodic boundary conditions are introduced along the cylinder axis only
(in {\sl z}-direction).  Since the number of neighbors  interacting with
a given molecule is nearly  constant, the integer cutoff number $n_C$ is
introduced,   where  $n_C$  is  a  number  of   neighbours   interacting
potentially with a given particle.  Particles in the cylinder are sorted
due to their  {\sl z}  coordinates  and the  index  vector  is set up on
returning to original  particles  indices.  For forces  calculation  the
computational  cylinder  is  stepwise  shifted in respect to its copy to
subsequent neighbouring particles from 1 to $n_C$.



\subsection{2-D MD program}

Calculation  of forces is always the most time  consuming part of any MD
program.  In the 2-D MD program we have applied the Verlet neighbor-list
\cite{gupta}  to  speed  up  the  evaluation  of  interactions   between
molecules  as well as Hockney  link-list  and sorted  lists to build the
neighbor  list.  The  neighbor  list  is  reconstructed   automatically.
Equations of motion are solved using leap-frog algorithm.  Tuning of the
sequential  program  consisted  in  referencing  large global  arrays by
locally declared pointers,  assigning local variables to frequently used
global ones and removing the effect of the cache on the  performance  by
avoiding  allocations of arrays of a size close to multiplicity of cache
line   length  and   sorting  of   particle   data   during   simulation
.....


\subsection{Feature extraction by molecular dynamics}

Feature  extraction  (mapping) is such a transformation of N-dimensional
space to  low-dimensional  $E^3$ space which  preserves  in the best way
mutual  distances  between  cluster  members.  This may be  achieved  by
finding  the  global   minimum  of  the  Sammon's   criterion   function.
The quality of the  transformation  is a distance  between
the global  minimum to that found by  mapping.  Particles
interact via $V_{ij}$ particle-particle pair potential

\begin{equation}
      V_{ij} = k(r_{ij}^2-a_{ij}^2)^2, 
\end{equation}
 
where: 

$k$ - the stiffness factor, $k > 0$,
 
$r_{ij}$ - the distance between particles $i$ and $j$ in ${E^3}$,

$a_{ij}$ - the distance between particles $i$ and $j$ in ${R^N}$.  
 
The sum of potentials for each pair of $(i,j)$ gives the total potential
energy of particles  system $E_p$.  Minimization of $E_p$ represents the
Sammon's  criterion.  The  system  evolves  according  to the  Newtonian
equations of motion.  The transient total force acting on a
single  particle  $i$ is equal to the gradient of potential  energy.  To
reduce the  numerical  oscillation  and assure  energy  dissipation  the
friction force is introduced.  This is the principal  condition to reach
$E_p$ minimum ....

%%%%%%%%%%%%%%%%%%%%%%%%% Computers and environments  %%%%%%%%%%%%%%%%%%

\section{Computers and environments}

\begin{table}[htb]
\centering
\begin{tabular}{lllll}
\hline
\hline
             &        &                & System:              &         \\
System       &  Nodes & Inter-         & version and          & Options \\
name         &        & connection     & environment          &         \\
\hline
HP/Convex    &  32    & two-layer --   & SPP-UX:              &         \\
SPP1200/XA   &        & crossbar/      & 3.2.129, C (6.4),    & -O2     \\
SPP1600/XA   &        & 1D torus       &  ConvexPVM (3.3.10)  &         \\
\hline
Intel        & 98     & 2D mesh        & OSF/1:               &         \\
Paragon XP/S &        &                & 1.0.4, C, NX-2 (R4.5)& -O,     \\
             &        &                &                      & -Knoieee\\
             &        &                & 1.3.3, Fortran77     & -O4     \\
             &        &                &                      & -Knoieee\\
\hline
TMC          & 896    & data/control   & C Most               &         \\
CM5          &        & network        & CM Fortran 2.1/2.2   & -O      \\
             &        & fat/binary tree&                      &         \\
\hline
CRAY         & 512    & 3-D torus      & UNICOS 8.0.3         &         \\
T3D          &        & latency hiding & C 4.0.5,             &         \\
             &        &                & PVM Cray v. 3.0.0.2  &         \\             
\hline
IBM RS6K/320 &   2    & Ethernet       & AIX 3.2.4            &         \\
SUN          &        &                &                      &         \\
SparcStation2& 2      & network        & SunOS 4.1            &  -O     \\
             &        &                & C, PVM (3.3.8)       &         \\
\hline
\hline
\end{tabular}
\caption{Comparison of parallel computer systems and environments.}
\label{tab1}
\end{table}

Main feature of three NUMA multiprocessor computers applied in the study
are summarized in Tab. \ref{tab1}.

%%%%%%%%%%%%%%%%%%%%%% Program performance %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section{Program performance}

\subsection{The PIPE method}

In Fig.  \ref{WCtime} execution wall-clock time, $\tau$, per MD timestep
and  particle is  presented  for the Pipe  Method for the both models of
programming:  domain  decomposition  and  data-parallel.  For the domain
decomposition  paradigm, for which ...

\subsection{2-D MD parallel program}

Wall-clock $\tau$ of the parallel program on homogeneous clusters of the
SUN SPARCstation  IPX, IBM RS/6000-320 and CONVEX Exemplar SPP 1200, SPP
1600  and  Cray  T3D  is  presented  in  Fig.  \ref{PARTIM}.


\begin{figure}[htb]
\centering
\begin{minipage}{5.8cm}
\epsfxsize=5.8cm
\epsfbox{fig1.eps}
 \caption{Execution time for different number of computing nodes, $K$,
          for  PIPE  algorithm on network of workstations 
         (NW), CM5 (CM), Paragon (PN) and SPP1200 (CX).}
\label{WCtime}
\end{minipage}
\hspace{0.2cm}
\begin{minipage}{5.8cm}
\vspace*{0.48cm}
\epsfxsize=5.8cm
\epsfysize=7.3cm
\epsfbox{fig2.eps}
%\vspace*{0.4cm}
\caption{Execution time for the parallel 2-D MD program
          on virtual network computer and on Exemplar \mbox{SPP 1200},
          \mbox{SPP 1600} and Cray T3D; 
          constant size problem -- $131072$ particles.}
\label{PARTIM}
\end{minipage}
\end{figure}


Timings  were done for the constant  size  problem,  i.e.  the number of
particles  indicated in Fig.  \ref{PARTIM} is the total
number of particles in the computational box.
Calculations on SPP 1200 and SPP 1600 were done on dedicated  subcomplex
with 24 and 16 processors, respectively. 


%%%%%%%%%%%%%%%%%%%%%%%% Sample results %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section{Sample results} 


MD  application  is  especially  valuable  when  the  fluctuations  play
predominant  role  in the  system  evolution,  e.g.  the  hydrodynamical
phenomena  driven  by  Rayleigh  -  Taylor  type of  instabilities.  The
process of mixing of two  fluids:  heavier one placed  above and lighter
on the bottom is a good example.  

The system  consists of two kinds of  particles  which form  lighter and
heavier  fluids.  The  particles  are placed in a 2-D  rectangular  box.
Periodic  boundary  conditions  are  applied to  vertical  edges,  while
horizontal  edges  represent  reflecting  walls.  Initially, the heavier
fluid occupies the upper half of the box.  Due to external gravity field
acting downwards on every particle  the motion of both layers starts.

\begin{figure}[htb]
\centering
\begin{minipage}{\textwidth}
\centering
\fbox{
\epsfxsize=9.5cm
\epsfbox{fig3.eps}
}
 \caption{Simulation of mixing of two fluids.}
\label{BANKI}
\end{minipage} 
\end{figure}

The  MD  simulations (see  Fig.\ref{BANKI})   have  been  performed  for
$2.8\times  10^6$  particles on 128 processors of Cray T3D (at Minnesota
Supercomputer Center).  In Fig.\ref{BANKI} one may observe the situation
after $10^5$ timesteps.

%%%%%%%%%%%%%%%%%%%%%%%%%%% Concluding remarks %%%%%%%%%%%%%%%%%%%%%%%%%

\section{Concluding remarks}

We have  elaborated two parallel  2-D/3-D  short-range MD programs.  The
pipe method is used for studies of fluid properties in  microcapillaries
and test purposes  while the second is highly  optimized for  production
runs.  Both   algorithms   turn  out  to  be   efficient   in   parallel
implementation for moderate number of particles and computing nodes.

The 2-D MD code is based on PVM and thus  portable  over a wide range of
computer architectures like networked  workstations and multiprocessors.
When it was  being  developed  the  main  objective  was to  make  large
simulations  feasible by reducing the execution  time and providing more
memory for  computer  experiments.  It is  suitable  for  simulation  of
Lennard-Jones  systems with large number of particles  ($10^6+$) what is
required for fluid flow and macroscopic phenomena investigation with MD.
Important  advantage  of the  parallel  program  is the  load  balancing
procedure  tuning  dynamically  distribution  of  the  program  data  on
processors  to their  current  load  which  results  in  improvement  of
performance.

MD  approach   can  also  be  used  as  "natural   solver"  and  applied
straightforwardly  to global minimum  search.  In this case we were able
to use parallel systems  efficiently by just adapting  existing parallel
MD algorithms and software.


%%%%%%%%%%%%%%%%%%%% acknowledgement %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{acknowledgement}
We  are  very   grateful   to  Prof.  David  A.  Yuen   from   Minnesota
Supercomputer  Institute  and to Dr.  Jeremy  Cook  from  University  of
Bergen  for  supporting  us  with  computer  time.  The  support  of ACC
CYFRONET  staff, in  particular  Mrs Z.  Mosurska, Mr M.  Pilipczuk  and
mbox{Dr.  P.  Wyrostek},  was very helpful.  This research was partially
supported by the II Joint American-Polish \mbox{M.  Sk\l{}odowska-Curie}
Fund (MEN/NSF--94--193).
\end{acknowledgement}


%%%%%%%%%%%%%%%%%%%%%%%% bibliography %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{thebibliography}{99}

\bibitem{rap1}
      Rapaport, D.:
      Microscale hydrodynamics: Discrete particle simulation of evolving
      flow patterns,
      {\sl Phys.Rev.}, {\bf A36}, 7 (1987) 3288.

\bibitem{kop}
      Koplik, J., Banavar, J.R.,  Willemsen, J.F.: 
      Molecular dynamics of Poiseuille flow and moving contact lines,
      {\sl Phys. Rev.  Lett.}, {\bf 60} (1988) 1282.

\bibitem{cui}
      Cui, S.T., Evans, D.J.: 
      Molecular dynamics simulation of two dimensional flow past a plate, 
      {\sl Mol. Simul.}, {\bf 9} (1992) 179.

\bibitem{rap3}
      Rapaport, D.: 
      Unpredictable convection in a small box: molecular dynamics
      experiments flow patterns, 
      {\sl Phys.Rev.}, {\bf A46}, 4 (1992) 1971.

\bibitem{alda1}
      Alda, W., Dzwinel, W., Kitowski, J., Mo\'sci\'nski, J., and Yuen, D:
      Convection driven by sedimentation using molecular dynamics approach,
      {\sl Hertzberger, B., Serazzi, G., (eds.), Proc. Int. Conf. High
      Performance Computing and Networking, Milan, Italy, May 1995\/},
      LNCS {\bf 796\/},
      678-683, Springer-Verlag, 1995.  
       
\bibitem{mora}
      Mora, P., Place, D.: 
      A lattice solid model for the nonlinear dynamics of earthquakes,
      International Journal of Modern Physics C 
      {\bf 4} (1993) 1059-1074.
      
\bibitem{alda2}   
      Alda, W., Bubak, M., Dzwinel, W., Kitowski, J., Mo\'sci\'nski, J.: 
      Computational molecular dynamics at the Institute of Computer
      Science, AGH - Cracow, Poland, 
      Supercomputer Institute Research Report UMSI 94/224 (1994).
     
\bibitem{pc96}
      Alda, W., Bubak, M., Dzwinel, W.,  Kitowski, J., Mo\'sci\'nski, J.,
      Pogoda, M., and Yuen D.,      
      Fluid dynamics simulation with particles,
      in: {\sl Borcherds, P., Bubak, M., and  Maksymowicz, A., (Eds.),
      Proc. of the 8 Joint EPS-APS Int. Conf. on Physics Computing}, 
      Krakow, September 17-21, 1996, Krak\'ow, Poland,
      pp. 281-284, ACC CYFRONET-KRAK\'OW, Poland, 1996. 

\bibitem{beazley}
      Beazley, D.M., Lomdahl, P.S.:
      Message-passing  multi-cell  molecular dynamics on the Connection
      Machine 5, Parallel Computing {\bf 20} (1994) 173-195.

\bibitem{lom1}
      Beazley,  D.M.,  Lomdahl, P.S.,  Gronbech-Jensen,  N., Giles, R.,
      Tamayo,  P.:  Parallel   algorithms  for  short-range   molecular
      dynamics, {\sl World Scientific's Annual Reviews in Computational
      Physics}, {\bf 3} (1995).

\bibitem{pip1}
      Mo\'sci\'nski, J.,  Kitowski, J.,  Rycerz, Z.A.,
      Jacobs, P.W.M.: A vectorized algorithm on the ETA 10-P for molecular
      dynamics
      simulation of large number of particles confined in a long cylinder,
      {\sl Comput. Phys. Commun.\/}, {\bf 54\/} (1989) 47.
      
\bibitem{gupta}
      Gupta, S.:
      Computing aspects of molecular dynamics simulation,
      Computer Physics Communication {\bf 70} (1992) 243-270.

\bibitem{lb95} 
      Bubak, M., Mo\'sci\'nski, J., Pogoda, M., S\l{}ota, R.:
      Load balancing for lattice gas and molecular dynamics simulations
      on networked workstations, 
      {\sl Hertzberger, B., Serazzi, G., (eds.), Proc. Int. Conf. High
      Performance Computing and Networking, Milan, Italy, May 1995\/},
      LNCS {\bf 796\/},
      329-334, Springer-Verlag, 1995.

\end{thebibliography}

\end{document}