# Do-it-yourself computational astronomy. Hardwares, algorithms, softwares, and sciences

Abstract

We overview our GRAPE (GRAvity PipE) and GRAPE-DR project to develop dedicated computers for astrophysical N-body simulations. The basic idea of GRAPE is to attach a custom-build computer dedicated to the calculation of gravitational interaction between particles to a general-purpose programmable computer. By this hybrid architecture, we can achieve both a wide range of applications and very high peak performance. GRAPE-6, completed in 2002, achieved the peak speed of 64 Tflops. The next machine, GRAPE-DR, will have the peak speed of 2 Pflops and will be completed in 2008.
We discuss the physics of stellar systems, evolution of general-purpose high-performance computers, our GRAPE and GRAPE-DR projects and issues of numerical algorithms.

This research hasn't been cited in any other publications.

- Article
- Oct 1971
- ASTROPHYS SPACE SCI

The dynamical evolution of spherical star clusters under the effect of internal encounters is followed numerically using a Monte Carlo procedure. Successive states of the system are computed, separated by a time step which is a fraction of the relaxation time. In any given state, each star is characterized by its total energy and its angular momentum with respect to the centre. Changes in these two quantities from one state to the next are computed by randomly selecting the position of the star on its orbit, randomly choosing a field star, letting the two stars interact, and multiplying the effect by an appropriate factor. This procedure can be shown to reproduce correctly the behaviour of the system as given by the Fokker-Planck equation. The computation is much faster than the exactN-body integration. Multiple-encounter effects are neglected, but this is probably not of serious consequence whenN is large. Some provisional results are presented. Once more it is found thatN-body systems develop a very high central density peak. The velocity distribution becomes isotropic in the central parts, radially elongated in the halo. Models started with widely different initial conditions tend to become similar after a few relaxation times. The presence of a tidal field, or a distribution of masses, accelerate the evolution of the system. A companion paper gives a detailed technical description of the method. - Article
- Dec 1986
- NATURE

A novel method is described of directly calculating the force on N in the gravitational N-body problem that grows only as N log N. The technique uses a tree-structured hierarchical subdivision of space into cubic cells, each of which is recursively divided into eight subcells whenever more than one particle is found to occupy the same cell. This tree is constructed anew at every time step, avoiding ambiguity and tangling. Advantages over potential-solving codes include accurate local interactions, freedom from geometrical assumptions and restrictions, and applicability to a wide class of systems, including planetary, stellar, galactic, and cosmological ones. Advantages over previous hierarchical tree-codes include simplicity and the possibility of rigorous analysis of error. - Article
- Aug 2000
- PUBL ASTRON SOC JPN

We have developed a special-purpose computer for gravitational many-body simulations, GRAPE-5. GRAPE-5 accelerates the force calculation which dominates the calculation cost of the simulation. All other calculations, such as the time integration of orbits, are performed on a general-purpose computer (host computer) connected to GRAPE-5. A GRAPE-5 board consists of eight custom pipeline chips (G5 chip) and its peak performance is 38.4 Gflops. GRAPE-5 is the successor of GRAPE-3. The differences between GRAPE-5 and GRAPE-3 are: (1) The newly developed G5 chip contains two pipelines operating at 80 MHz, while the GRAPE chip, which was used for GRAPE-3, had one at 20 MHz. The calculation speed of GRAPE-5 is 8-times faster than that of GRAPE-3. (2) The GRAPE-5 board adopted a PCI bus as the interface to the host computer instead of VME of GRAPE-3, resulting in a communication speed one order of magnitude faster. (3) In addition to the pure 1/r potential, the G5 chip can calculate forces with arbitrary cutoff functions, so that it can be applied to the Ewald or P3M methods. (4) The pairwise force calculated on GRAPE-5 is about 10-times more accurate than that on GRAPE-3. On one GRAPE-5 board, one timestep with a direct summation algorithm takes 14 (N/128 k)2 seconds. With the Barnes-Hut tree algorithm (θ = 0.75), one timestep can be done in 15 (N/106) seconds. - Article
- Jul 1978
- PROG THEOR PHYS

Thermodynamics of self-gravitating gas system, which is enclosed by an adiabatic spherical wall, is discussed. When the temperature distribution is isothermal, the system is in thermodynamic equilibrium in the sense that the first order variation of the total entropy of the system vanishes. However, the second order variation of the total entropy may be positive, when the effect of gravity exceeds a certain limit. Then, the system may evolve to make its entropy increase. This is the gravothermal catastrophe, which was pointed out first by Antonov in 1962, but for which some questions were raised concerning its reality. In the present paper, this catastrophe is analysed by extending functional space of variation to include non-isothermal perturbations. It results in two merits: It is most convenient to make a close relation with usual concepts in the thermodynamics of irreversible process, and the present formulation does not contain any singular quantities which brought a confusion in the interpretation of the real physical processes. - Article
- Aug 1978
- PROG THEOR PHYS

Development of the gravothermal catastrophe is followed numerically for self-gravitating gas system enclosed by an adiabatic wall, which is isothermal in the initial state. It is found that the final fate of the catastrophe is in two ways depending on the initial perturbations. When the initial perturbation produces a temperature distribution decreasing outward, the contraction proceeds in the central region and the central density increases unlimitedly, as the heat flows outward. When the initial temperature distribution is increasing outward, on the other hand, the central region expands as the heat flows into the central region. Then the density contrast is reduced and finally the system reaches another isothermal configuration with the same energy but with a lower density contrast and a higher entropy. This final configuration is gravothermally stable and may be called a thermal system. In the former case of the unlimited contraction, the final density profile is determined essentially by the density and temperature dependence of the heat conductivity. In the case of a system under the force of the inverse square law, the final density distribution is well approximated by a power law so that the mass contained in the condensed core is relatively small. A possibility of formation of a black hole in stellar systems is also discussed. - Article
- Sep 1990
- COMPUT PHYS COMMUN

We have designed and built GRAPE-1 (GRAvity PipE 1), a special-purpose computer for astrophysical N-body calculations. It is designed as a back-end processor that calculates the gravitational interaction between particles. All other calculations are performed on a host computer connected to GRAPE-1. For large-N calculations (N>~104), GRAPE-1 achieves about 100 Mflops-equivalent in one board of size about 40 by 30 cm at the power of 2.5 watt. The pipelined architecture of the GRAPE-1 which is specialized and optimized for the N-body calculation is the key to the high performance. The design and construction of the GRAPE-1 system are discussed. - Article
- Apr 1980
- MON NOT R ASTRON SOC

The mechanisms driving the evolution of globular clusters are considered, and it is found that not only the evolution of the core, but also the surroundings, are self-similar and leave behind a power law debris of material that was once in the core. The exponent in this power law is proven to be in the range of -2 to -2.5. Both the magnitude of the core energy and the core mass are found to tend to zero during evolution towards core collapse, and the central density always becomes infinite in finite time. For a model with energy transport by stellar encounters approximated as heat conduction in a gas, a numerical solution yields an eigenvalue of 2.21. - Article
- Mar 1992
- PUBL ASTRON SOC JPN

HACS, a fourth-order Hermite integrator with the Ahmad-Cohen scheme is implemented. HACS is self-starting, so its implementation is considerably simpler than the original ACS. Compared to ACS, HACS allows time steps twice as long for the same accuracy and the increase in calculation cost per timestep is not very large. The actual gain in speed depends on the hardware and ranges between a factor of one and two. The gain using ACS would be significantly smaller on vector or parallel machines. - Attention is given to three examples representative of solar system dynamics in which the Liapunov time, TL, and the time for an orbit to make a sudden transition, TC, are found to be strongly correlated. The relation between the two times is TC varies as TL super b, with b approximately equal to 1.8. About 150 orbits were numerically integrated in the first and second examples, and about 1000 orbits in the third. In the first two examples, all three bodies were coplanar. In the third, the initial inclination of the test particle was varied from 0 to 60 deg. There was, at most, a weak dependence on inclination. The maximum departure of b from 1.8 was 14 percent, and the average departure was less than 7 percent. The correlation between TC and TL holds over at least six orders of magnitude in TC. The Liapunov time typically reaches its asymptotic value in a few thousand orbits, and then it can be used to predict sudden events that occur at much later times, e.g., the lifetime of the solar system.
- Article
- Feb 1968
- MON NOT R ASTRON SOC

Self-gravitating systems have negative specific heats, thus if heat is allowed to flow between two of them, the hotter one loses heat and gets yet hotter while the colder gains heat and gets yet colder. Evolution is thus away from equilibrium. When a single isothermal sphere within a non-conducting box is sufficiently centrally condensed a similar instability arises between the central parts and the outer parts; as a result no equilibrium states exist for an isothermal sphere of energy E(<0) and mass M within a spherical box of radius greater than 0.335 GM2/(−E). This is Antonov's discovery that no state of locally maximal entropy exists for stellar systems of given energy and mass contained within a rigid sphere of radius larger than this. The instability is distinct from that found by Ebert which is similar to the Schönberg–Chandrasekhar limit in stars and relates to isothermal spheres at fixed temperature. In fact there are four distinct critical points for instability of isothermal spheres which are related to the turning points of the four total thermodynamic free energies by Poincar^'s theory of linear series of equilibria. This study of the thermodynamics of self-gravitating spheres gives insight on the evolution and the final fate of stellar systems. It also helps in the understanding of some well known phenomena in stellar evolution. It is emphasized that these results prove that the escape of stars from a cluster is not necessary for its evolution but rather that extended systems naturally grow a core–halo structure reminiscent of the internal constitution of a red-giant star. - Article
- Jan 1985
- Proc Int Astron Union

This paper is an English translation of that seminal paper by Antonov which led eventually to our present understanding of the gravothermal instability. [Orig.: Vest. Leningr. Univ., Tom 7, p. 135 (1962)]. - Article
- Nov 1980
- ASTROPHYS J

The numerical Fokker-Planck determinations of core collapse in a one-component star cluster shows that a nonisothermal self-similar structure develops in the region between the shrinking isothermal core and the halo during the late stages of the core collapse. The region is characterized by the radial profiles of the stellar density, the gravitational potential, and velocity dispersion following the power laws; the central velocity dispersion increases with the central density. The data provide new evidence for the identification of the late stage of core collapse with the gravothermal instability of Lynden-Bell and Wood (1968). - Article
- Jan 1985

Contents: I. Introduction. II. Basic formulation: Difference scheme. Individual time-step algorithm. III. Ahmad-Cohen scheme: Principles and procedures. Standard algorithm. Parameters and timing tests. IV. Comoving coordinates: Basic principles. Implementation of the AC method. Mergers and inelastic effects. V. Planetary perturbations and collisions: Grid method formulation. Collision algorithm and model parameters. VI. Two-body regularization: KS transformations and equations of motion. Standard N-body treatment. Algorithm and parameters. Regularized AC procedures. VII. Three-body regularization: Isolated triple systems. General N-body algorithm. VIII. Star-cluster simulations: Models of open clusters. Postcollapse evolution. - Article
- Nov 1988
- ASTROPHYS J SUPPL S

A theoretical framework for analyzing the computational cost of gravitational N-body codes is introduced and applied to three different types of direct-summation codes, including the type of Aarseth code which has found most general use. The method of analysis, based on the probability distribution of nearest-neighbor distances, is described. The number of time steps required for a variety of different versions of the Aarseth scheme and a variety of physical models of spherical star clusters is estimated in order to measure the effects of different degrees of central concentration. Analytical estimates of computer time required are compared with actual measurements, and the validity of the scaling outside the range actually tested is discussed. A practical result for planning star cluster simulations on the next generation of supercomputers is derived. It is found that the consumption of computer time can be very centrally concentrated. - Article
- Dec 2008
- ASTROPHYS J SUPPL S

In this paper, we describe the performance and accuracy of the P2M2 tree code. The P2M2 tree code is a high-accuracy tree code based on the pseudoparticle multipole method (P2M2). P2M2 is a method to express multipole expansion using a small number of pseudoparticles. The potential field of physical particles is approximated by the field generated by the pseudoparticles. The primary advantage of the P2M2 tree code is that it can use Gravity Pipe (GRAPE) special-purpose computers efficiently for high-accuracy calculations. Although the tree code has been implemented on GRAPE, it could not handle terms of the multipole expansion higher than dipole, since GRAPE can calculate forces from point mass particles only. Thus, the calculation cost grows quickly when high accuracy is required. In the P2M2 tree code, the multipole expansion is expressed by particles, and thus we can evaluate high-order terms on GRAPE. We implemented the P2M2 tree code on both MDGRAPE-2 and a conventional workstation and measured the performance. The results show that MDGRAPE-2 accelerates the calculation by a factor between 20 (for low accuracy) and 200 (for high accuracy). Even on general-purpose programmable computers, the P2M2 tree code offers the advantage that the mathematical formulae, and therefore the actual program, are much simpler than that of the direct implementation of multipole expansion, although the calculation cost becomes somewhat higher. - In this paper, we describe the architecture and performance of the GRAPE-4 system, a massively parallel special-purpose computer for N-body simulation of gravitational collisional systems. The calculation cost of N-body simulation of collisional self-gravitating system is O(N3). Thus, even with present-day supercomputers, the number of particles one can handle is still around 10,000. In N-body simulations, almost all computing time is spent calculating the force between particles, since the number of interactions is proportional to the square of the number of particles. Computational cost of the rest of the simulation, such as the time integration and the reduction of the result, is generally proportional to the number of particles. The calculation of the force between particles can be greatly accelerated by means of a dedicated special-purpose hardware. We have developed a series of hardware systems, the GRAPE (GRAvity PipE) systems, which perform the force calculation. They are used with a general-purpose host computer which performs the rest of the calculation. The GRAPE-4 system is our newest hardware, completed in 1995 summer. Its peak speed is 1.08 TFLOPS. This speed is achieved by running 1692 pipeline large-scale integrated circuits (LSIs), each providing 640 MFLOPS, in parallel.
- Article
- Oct 2002

We present the results of very long-term numerical integrations of planetary orbital motions over 109 -yr time-spans including all nine planets. A quick inspection of our numerical data shows that the planetary motion, at least in our simple dynamical model, seems to be quite stable even over this very long time-span. A closer look at the lowest-frequency oscillations using a low-pass filter shows us the potentially diffusive character of terrestrial planetary motion, especially that of Mercury. The behaviour of the eccentricity of Mercury in our integrations is qualitatively similar to the results from Jacques Laskar's secular perturbation theory (e.g. emax∼ 0.35 over ∼± 4 Gyr). However, there are no apparent secular increases of eccentricity or inclination in any orbital elements of the planets, which may be revealed by still longer-term numerical integrations. We have also performed a couple of trial integrations including motions of the outer five planets over the duration of ± 5 × 1010 yr. The result indicates that the three major resonances in the Neptune–Pluto system have been maintained over the 1011-yr time-span. - Article
- Oct 1996
- Optic Photon News

Studying the dynamics of a large number of particles interacting through long-range forces, commonly referred to as the "N-body problem", is a central aspect of many different branches of physics. In recent years, physicists have made significant advances in the development of fast N-body algorithms to deal efficiently with such complex problems. This book gives a thorough introduction to these so-called "tree methods", setting out the basic principles and giving many practical examples of their use. The authors assume no prior specialist knowledge, and they illustrate the techniques throughout with reference to a broad range of applications. The book will be of great interest to graduate students and researchers working on the modeling of systems in astrophysics, plasma physics, nuclear and particle physics, condensed matter physics and materials science. - Article
- Feb 1996
- EARTH MOON PLANETS

The motion of Pluto is said to be chaotic in the sense that the maximum Lyapunov exponent is positive: the Lyapunov time (the inverse of the Lyapunov exponent) is about 20 million years. So far the longest integration up to now, over 845 million years (42 Lyapunov times), does not show any indication of a gross instability in the motion of Pluto. We carried out the numerical integration of Pluto over the age of the solar system (5.5 billion years 280 Lyapunov times). This integration also did not give any indication of chaotic evolution of Pluto. The divergences of Keplerian elements of a nearby trajectory at first grow linearly with the time and then start to increase exponentially. The exponential divergences stop at about 420 million years. The divergences in the semi-major axis and the mean anomaly ( equivalently the longitude and the distance) saturate. The divergences of the other four elements, the eccentricity, the inclination, the argument of perihelion, and the longitude of node still grow slowly after the stop of the exponential increase and finally saturate. - Article
- Mar 1990
- J COMPUT PHYS

A simple method for vectorizing tree searches, which operates by processing all relevant nodes at the same depth in the tree simultaneously, is described. This procedure appears to be general, assuming that gather-scatter operations are vectorizable, but is most efficient if the traversals proceed monotonically from the root to the leaves, or vice versa. Particular application is made to the hierarchical tree approach for computing the self-consistent interaction of N bodies. It is demonstrated that full vectorization of the requisite tree searches is feasible, resulting in a factor ∼4-5 improvement in cpu efficiency in the traversals on a CRAY X-MP. The overall gain in the case of the Barnes-Hut tree code algorithm is a factor ∼2-3, implying a net speed-up of ≈400-500 on a CRAY X-MP over a VAX 11/780 or SUN 3/50. - Article
- Nov 1992
- PHYSICA D

There exist methods that preserve the symplectic invariants of Hamiltonian systems when fixed step size is used. It is proved that the most simple-minded version of this property is lost if the step size is varied. - Article
- Dec 2006
- NEW ASTRON

The main performance bottleneck of gravitational N-body codes is the force calculation between two particles. We have succeeded in speeding up this pair-wise force calculation by factors between 2 and 10, depending on the code and the processor on which the code is run. These speed-ups were obtained by writing highly fine-tuned code for x86_64 microprocessors. Any existing N-body code, running on these chips, can easily incorporate our assembly code programs.In the current paper, we present an outline of our overall approach, which we illustrate with one specific example: the use of a Hermite scheme for a direct N2 type integration on a single 2.0 GHz Athlon 64 processor, for which we obtain an effective performance of 4.05 Gflops, for double-precision accuracy. In subsequent papers, we will discuss other variations, including the combinations of N log N codes, single-precision implementations, and performance on other microprocessors. - Article
- May 1999
- J COMPUT PHYS

In this paper we describe a new approach to implement theO(N) fast multipole method andO(NlogN) tree method, which uses pseudoparticles to express the potential field. The new method is similar to Anderson's method, which uses the values of potential at discrete points to represent the potential field. However, for the same expansion order the new method is more accurate. - Article
- Mar 1990
- J COMPUT PHYS

Vectorized algorithms for the force calculation and tree construction in the Barnes-Hut tree algorithm are described. The basic idea for the vectorization of the force calculation is to vectorize the tree traversal across particles, so that all particles in the system traverse the tree simultaneously. The tree construction algorithm also makes use of the fact that particles can be treated in parallel. Thus these algorithms take advantage of the internal parallelism in the N-body system and the tree algorithm most effectively. As a natural result, these algorithms can be used on a wide range of vector/parallel architectures, including current supercomputers and highly parallel architectures such as the Connection Machine. The vectorized code runs about five times faster than the non-vector code on a Cyber 205 for an N-body system with N = 8192. - Article
- Apr 2001
- J COMPUT PHYS

An algorithm is presented for the rapid evaluation of the potential and force fields in systems involving large numbers of particles whose interactions are Coulombic or gravitational in nature. For a system of N particles, an amount of work of the order O(N2) has traditionally been required to evaluate all pairwise interactions, unless some approximation or truncation method is used. The algorithm of the present paper requires an amount of work proportional to N to evaluate all interactions to within roundoff error, making it considerably more practical for large-scale problems encountered in plasma physics, fluid dynamics, molecular dynamics, and celestial mechanics. - Book
- Jan 1998

Scientific Simulations with Special-Purpose Computers. The GRAPE Systems J. Makino University of Tokyo, Japan and M. Taiji Institute for Statistical Mathematics, Tokyo, Japan Physics is full of complex many-body problems, i.e. problems where there are a large number of bodies interacting. This is particularly true in astrophysics, where stars or galaxies can be thought of as individual particles, but also in plasma physics, hydrodynamics and molecular dynamics. Special purpose computers have been developed to handle these highly complex problems. Scientific Simulations with Special-Purpose Computers gives an overview of these systems, and then focuses on an extremely high profile and successful project-the GRAPE computer at the University of Tokyo-and discusses its development, performance and applications across a range of problems. The future development and applications of special purpose computers are also discussed. Written by two of the leading developers of the GRAPE system, this unique volume will be of great interest to readers across a wide range of fields, including, astrophysicists, astronomers, plasma physicists, researchers in molecular dynamics and computer scientists. - Article
- Nov 1995
- NATURE

The presence of a Jupiter-mass companion to the star 51 Pegasi is inferred from observations of periodic variations in the star's radial velocity. The companion lies only about eight million kilometres from the star, which would be well inside the orbit of Mercury in our Solar System. This object might be a gas-giant planet that has migrated to this location through orbital evolution, or from the radiative stripping of a brown dwarf. - Article
- May 1993
- NATURE

THE apparent emptiness of the outer Solar System has been a long-standing puzzle for astronomers, as it contrasts markedly with the abundance of asteroids and short-period comets found closer to the Sun. One explanation for this might be that the orbits of distant objects are intrinsically short-lived, perhaps owing to the gravitational influence of the giant planets. Another possibility is that such objects are very faint, and thus they might easily go undetected. An early survey1 designed to detect distant objects culminated with the discovery of Pluto. More recently, similar surveys yielded the comet-like objects 2060 Chiron2 and 5145 Pholus3 beyond the orbit of Saturn. Here we report the discovery of a new object, 1992 QB1, moving beyond the orbit of Neptune. We suggest that this may represent the first detection of a member of the Kuiper belt4,5, the hypothesized population of objects beyond Neptune and a possible source of the short-period comets6-8. - Article
- May 1995
- ASTROPHYS J

In stellar dynamical computer simulations, as well as other types of simulations using particles, time step size is often held constant in order to guarantee a high degree of energy conservation. In many applications, allowing the time step size to change in time can offer a great saving in computational cost, but variable-size time steps usually imply a substantial degradation in energy conservation. We present a meta-algorithm' for choosing time steps in such a way as to guarantee time symmetry in any integration scheme, thus allowing vastly improved energy conservation for orbital calculations with variable time steps. We apply the algorithm to the familiar leapfrog scheme, and generalize to higher order integration schemes, showing how the stability properties of the fixed-step leapfrog scheme can be extended to higher order, variable-step integrators such as the Hermite method. We illustrate the remarkable properties of these time-symmetric integrators for the case of a highly eccentric elliptical Kepler orbit and discuss applications to more complex problems. - The Digital Orrery has been used to perform an integration of the motion of the outer planets for 845 million years. This integration indicates that the long-term motion of the planet Pluto is chaotic. Nearby trajectories diverge exponentially with an e-folding time of only about 20 million years.
- We have designed and built the Orrery, a special computer for high-speed high-precision orbital mechanics computations. On the problems the Orrery was designed to solve, it achieves approximately 10 Mflops in about 1 ft3of space while consuming 150 W of power. The specialized parallelarchitecture of the Orrery, which is well matched to orbital mechanics problems, is the key to obtaining such high performance. In this paper we discuss the design, construction, and programming of the Orrery. Copyright © 1985 by The Institute of Electrical and Electronics Engineers, Inc.
- Article
- Apr 1993
- J COMPUT PHYS

We consider treecodes (N-body programs which use a tree data structure) from the standpoint of their worst-case behavior. That is, we derive upper bounds on the largest possible errors that are introduced into a calculation by use of various multipole acceptability criteria (MAC). We find that the conventional Barnes-Hut MAC can introduce potentially unbounded errors unless ` ! 1= p 3, and that this behavior while rare, is demonstrable in astrophysically reasonable examples. We consider two other MACs closely related to the BH MAC. While they don't admit the same unbounded errors, they nevertheless require extraordinary amounts of CPU time to guarantee modest levels of accuracy. We derive new error bounds based on some additional, easily computed moments of the mass distribution. These error bounds form the basis for four new MACs which can be used to limit the absolute or relative error introduced by each multipole evaluation, or, with the introduction of some additional data struc... - Article
- Feb 1970

Network-of-Workstations technology is applied to the challenge of implementing very high performance workstations for Earth and space science applications. The Beowulf parallel workstation employs 16 PCbased processing modules integrated with multiple Ethernet networks. Large disk capacity and high disk to memory bandwidth is achieved through the use of a hard disk and controller for each processing module supporting up to 16 way concurrent accesses. The paper presents results from a series of experiments that measure the scaling characteristics of Beowulf in terms of communication bandwidth, file transfer rates, and processing performance. The evaluation includes a computational fluid dynamics code and an N-body gravitational simulation program. It is shown that the Beowulf architecture provides a new operating point in performance to cost for high performance workstations, especially for file transfers under favorable conditions. 1 INTRODUCTION Networks Of Workstations, or NOW [?]... - Article
- Apr 2007
- PUBL ASTRON SOC JPN

We have developed PGPG (Pipeline Generator for Programmable GRAPE), a software which generates the low-level design of the pipeline processor and communication software for FPGA-based computing engines (FBCEs). An FBCE typically consists of one or multiple FPGA (Field-Programmable Gate Array) chips and local memory. Here, the term "Field-Programmable" means that one can rewrite the logic implemented to the chip after the hardware is completed, and therefore a single FBCE can be used for calculation of various functions, for example pipeline processors for gravity, SPH interaction, or image processing. The main problem with FBCEs is that the user need to develop the detailed hardware design for the processor to be implemented to FPGA chips. In addition, she or he has to write the control logic for the processor, communication and data conversion library on the host processor, and application program which uses the developed processor. These require detailed knowledge of hardware design, a hardware description language such as VHDL, the operating system and the application, and amount of human work is huge. A relatively simple design would require 1 person-year or more. The PGPG software generates all necessary design descriptions, except for the application software itself, from a high-level design description of the pipeline processor in the PGPG language. The PGPG language is a simple language, specialized to the description of pipeline processors. Thus, the design of pipeline processor in PGPG language is much easier than the traditional design. For real applications such as the pipeline for gravitational interaction, the pipeline processor generated by PGPG achieved the performance similar to that of hand-written code. In this paper we present a detailed description of PGPG version 1.0. - The method of choice for integrating the equations of motion of the general N-body problem has been to use an individual time step scheme. For the sake of efficiency, block time steps have been the most popular, where all time step sizes are smaller than a maximum time step size by an integer power of two. We present the first successful attempt to construct a time-symmetric integration scheme, based on block time steps. We demonstrate how our scheme shows a vastly better long-time behavior of energy errors, in the form of a random walk rather than a linear drift. Increasing the number of particles makes the improvement even more pronounced.
- Article
- May 2005
- PUBL ASTRON SOC JPN

In this paper, we describe the design and performance of GRAPE-6A, a special-purpose computer for gravitational many-body simulations. It was designed to be used with a PC cluster, in which each node has one GRAPE-6A. Such a configuration is particularly cost-effective in running parallel tree algorithms. Though the use of parallel tree algorithms was possible with the original GRAPE-6 hardware, it was not very cost-effective since a single GRAPE-6 board was still too fast and too expensive. Therefore, we designed GRAPE-6A as a single PCI card to minimize the reproduction cost and to optimize the computing speed. The peak performance is 130 Gflops for one GRAPE-6A board and 3.1 Tflops for our 24 node cluster. We describe the implementation of the tree, TreePM and individual timestep algorithms on both a single GRAPE-6A system and GRAPE-6A cluster. Using the tree algorithm on our 16-node GRAPE-6A system, we can complete a collisionless simulation with 100 million particles (8000 steps) within 10 days. - Article
- May 2005
- PUBL ASTRON SOC JPN

We present Particle–Particle–Particle–Mesh (PPPM) and Tree Particle–Mesh (TreePM) implementations on GRAPE-5 and GRAPE-6A systems, special-purpose hardware accelerators for gravitational many-body simulations. In our PPPM and TreePM implementations on GRAPE, the computational time is significantly reduced compared with the conventional implementations without GRAPE, especially under strong particle clustering, and is almost constant irrespective of the degree of particle clustering. We describe our survey of two simulation parameters, the PM grid spacing and the opening parameter for the most optimal combination of force accuracy and computational speed. We also describe the parallelization of these implementations on a PC–GRAPE cluster, in which each node has one GRAPE board, and present the optimal configuration of simulation parameters for good parallel scalability.