[bib1] [ABJ82] and Design and implementation of a parallel tree search algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence, :192–203, 1982.
[bib2] [ACM91] Resources in Parallel and Concurrent Systems. ACM Press, New York, NY, <year>1991</year>.
[bib3] [ACS89a] and A model for hierarchical memory. Technical Report RC 15118 (No. 67337), IBM T. J. Watson Research Center, Yorktown Heights, NY, 1989.
[bib4] [ACS89b] and On communication latency in PRAM computations. Technical Report RC 14973 (No. 66882), IBM T. J. Watson Research Center, Yorktown Heights, NY, 1989.
[bib5] [ACS89c] and Communication complexity of PRAMs. Technical Report RC 14998 (64644), IBM T. J. Watson Research Center, Yorktown Heights, NY, 1989.
[bib6] [ADJ+91] and The MIT Alewife machine: A large-scale distributed-memory multiprocessor. In Proceedings of Workshop on Scalable Shared Memory Multiprocessors. Kluwer Academic, 1991.
[bib7] [AFKW90] and Solving Problems on Concurrent Processors: Software for Concurrent Processors: Volume II. Prentice-Hall, Englewood Cliffs, NJ, <year>1990</year>.
[bib8] [AG94] and Highly Parallel Computing. Benjamin/Cummings, Redwood City, CA, <year>1994</year>. (Second Edition).
[bib9] [Aga89] Performance tradeoffs in multithreaded processors. Technical Report 89-566, Massachusetts Institute of Technology, Microsystems Program Office, Cambridge, MA, 1989.
[bib10] [Aga91] Performance tradeoffs in multithreaded processors. Technical report MIT/LCS/TR 501; VLSI memo no. 89-566, Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, MA, 1991.
[bib12] [Agh86] Actors: A Model of Concurrent Computation in Distributed Systems. MIT Press, Cambridge, MA, <year>1986</year>.
[bib13] [AHMP87] and Deterministic simulation of idealized parallel computers on more realistic ones. SIAM Journal on Computing, :808–835, October 1987.
[bib14] [AHU74] and The Design and Analysis of Computer Algorithms. Addison-Wesley, Reading, MA, <year>1974</year>.
[bib15] [AJM88] and A randomized parallel branch-and-bound algorithm. In Proceedings of the 1988 International Conference on Parallel Processing, 1988.
[bib16] [AK84] and Graph problems on a mesh-connected processor array. Journal of the ACM, :649–667, July 1984.
[bib18] [Akl89] The Design and Analysis of Parallel Algorithms. Prentice-Hall, Englewood Cliffs, NJ, <year>1989</year>.
[bib19] [Akl97] Parallel Computation Models and Methods. Prentice-Hall, Englewood Cliffs, NJ, <year>1997</year>.
[bib20] [AKR89] and Floorplan optimization on multiprocessors. In Proceedings of the 1989 International Conference on Computer Design, 1989. Also published as Technical Report ACT-OODS-241-89, Microelectronics and Computer Corporation, Austin, TX.
[bib21] [AKR90] and Efficient parallel algorithms for search problems: Applications in VLSI CAD. In Proceedings of the Third Symposium on the Frontiers of Massively Parallel Computation, 1990.
[bib22] [AKRS91] and Automatic test pattern generation on multiprocessors. Parallel Computing, number 12:1323–1342, December 1991.
[bib23] [AKS83] and An O(n log n) sorting network. In Proceedings of the 15th Annual ACM Symposium on Theory of Computing, 1–9, 1983.
[bib24] [AL93] and Parallel Computational Geometry. Prentice-Hall, Englewood Cliffs, NJ, <year>1993</year>.
[bib25] [AM88] and Parallel branch-and-bound algorithms on hypercube multiprocessors. In Proceedings of the Third Conference on Hypercubes, Concurrent Computers, and Applications, 1492–1499, New York, NY, 1988. ACM Press.
[bib26] [Amd67] Validity of the single processor approach to achieving large scale computing capabilities. In AFIPS Conference Proceedings, 483–485, 1967.
[bib27] [And91] Concurrent Programming: Principles and Practice. Benjamin/Cummings, Redwood City, CA, <year>1991</year>.
[bib28] [AOB93] and Balanced parallel sort on hypercube multiprocessors. IEEE Transactions on Parallel and Distributed Systems, :572–581, May 1993.
[bib29] [AS87] and New connectivity and MSF algorithms for shuffle-exchange network and PRAM. IEEE Transactions on Computers, :1258–1263, October 1987.
[bib30] [AU72] and The Theory of Parsing, Translation and Compiling: Volume 1, Parsing. Prentice-Hall, Englewood Cliffs, NJ, <year>1972</year>.
[bib32] [BA82] Principles of Concurrent Programming. Prentice-Hall, Englewood Cliffs, NJ, <year>1982</year>.
[bib36] [Bat68] Sorting networks and their applications. In Proceedings of the 1968 Spring Joint Computer Conference, 307–314, 1968.
[bib37] [Bat76] The Flip network in STARAN. In Proceedings of International Conference on Parallel Processing, 65–71, 1976.
[bib38] [Bat80] Design of a massively parallel processor. IEEE Transactions on Computers, 836–840, September 1980.
[bib39] [Bau78] The Design and Analysis of Algorithms for Asynchronous Multiprocessors. Ph.D. Thesis, Carnegie-Mellon University, Pittsburgh, PA, <year>1978</year>.
[bib40] [BB90] and Approximate algorithms for the partitionable independent task scheduling problem. In Proceedings of the 1990 International Conference on Parallel Processing, I72–I75, 1990.
[bib42] [BCCL95] and Parallel mixed integer programming. Technical Report CRPC TR 95554, Center for Research on Parallel Computation, Research Monograph, 1995.
[bib43] [BCJ90] and Experimental application-driven architecture analysis of an SIMD/MIMD parallel processing system. IEEE Transactions on Parallel and Distributed Systems, :195–205, 1990.
[bib47] [Ben80] A parallel algorithm for constructing minimum spanning trees. Journal of the ACM, :51–59, March 1980.
[bib48] [Ber84] On computing the determinant in small parallel time using a small number of processors. Information Processing Letters, :147–150, March 1984.
[bib49] [Ber89] Communication efficient matrix multiplication on hypercubes. Parallel Computing, :335–342, 1989.
[bib50] [BH82] and Routing merging and sorting on parallel models of computation. In Proceedings of the 14th Annual ACM Symposium on Theory of Computing, 338–344, May 1982.
[bib51] [Bix91] Two applications of linear programming. In Proceedings of the Workshop on Parallel Computing of Discrete Optimization Problems, 1991.
[bib52] [BJK+95] and Cilk: An efficient multithreaded runtime system. In Proceedings of the 5th Symposium on Principles and Practice of Parallel Programming, 1995.
[bib53] [BKH89] and The giant-Fourier-transform. In Proceedings of the Fourth Conference on Hypercubes, Concurrent Computers, and Applications: Volume I, 387–389, 1989.
[bib54] [Ble90] Vector Models for Data-Parallel Computing. MIT Press, Cambridge, MA, <year>1990</year>.
[bib55] [BMCP98] and Solving large-scale QAP problems in parallel with the search library ZRAM. Journal of Parallel and Distributed Computing, :157–169, 1998.
[bib56] [BNK92] and Designing broadcasting algorithms in the postal model for message-passing systems. In Proceedings of 4th ACM Symposium on Parallel Algorithms and Architectures, 13–22, 1992.
[bib57] [BOS+91] and Optimal communication algorithms for hypercubes. Journal of Parallel and Distributed Computing, :263–275, 1991.
[bib58] [BR90] and On optimal and practical routing methods for a massive data movement operation on hypercubes. Technical report, University of Southern California, Los Angeles, CA, 1990.
[bib59] [Bra97] Technology news & reviews: Chemkin software; OpenMP Fortran Standard; ODE toolbox for Matlab; Java products; Scientific WorkPlace 3.0. IEEE Computational Science and Engineering, :75–78, October/December 1997.
[bib60] [Bro79] Dynamic programming in computer science. Technical Report CMU-CS-79-106, Carnegie Mellon University, Pittsburgh, PA, 1979.
[bib61] [BS78] and Optimal sorting algorithms for parallel computers. IEEE Transactions on Computers, :84–87, January 1978.
[bib62] [BT89] and Parallel and Distributed Computation: Numerical Methods. Prentice-Hall, NJ, <year>1989</year>.
[bib63] [BT97] and Parallel and Distributed Computation: Numerical Methods. Athena Scientific, <year>1997</year>.
[bib65] [Buy99] R. Buyya, editor. High Performance Cluster Computing: Architectures and Systems. Prentice Hall, <year>1999</year>.
[bib66] [BW89] and Computing performance as a function of the speed, quantity, and the cost of processors. In Supercomputing ’89 Proceedings, 759–764, 1989.
[bib67] [BW97] and Multithreading Applications in Win32: the Complete Guide to Threads. Addison-Wesley Developers Press, Reading, MA, <year>1997</year>.
[bib68] [C+95] et al. A proposal for a set of Parallel Basic Linear Algebra Subprograms. Technical Report CS-95-292, Computer Science Department, University of Tennessee, 1995.
[bib69] [CAHH91] and The parallelization of some level 2 and 3 BLAS operations on distributed-memory machines. In Proceedings of the First International Conference of the Austrian Center of Parallel Computation. Springer-Verlag Series Lecture Notes in Computer Science, 1991.
[bib70] [Can69] A cellular computer to implement the Kalman Filter Algorithm. Ph.D. Thesis, Montana State University, Bozeman, MT, <year>1969</year>.
[bib71] [Car89] G. F. Carey, editor. Parallel Supercomputing: Methods, Algorithms and Applications. Wiley, New York, NY, <year>1989</year>.
[bib72] [CD87] and An approach to parallel vision algorithms. In R. Porth, editor, Parallel Processing. SIAM, Philadelphia, PA, <year>1987</year>.
[bib73] [CDK+00] R.Chandra, L.Dagum, D.Kohr, D.Maydan, J.McDonald, and R.M. (editors). Parallel Programming in OpenMP. Morgan Kaufmann Publishers, <year>2000</year>.
[bib74] [CG87] and Gaussian elimination with partial pivoting and load balancing on a multiprocessor. Parallel Computing, :65–74, 1987.
[bib75] [CGK93] and Parallel search algorithms for robot motion planning. In Proceedings of the IEEE Conference on Robotics and Automation, 46–51, 1993.
[bib76] [CGL92] and Analysis of multithreaded microprocessors under multiprogramming. Report UCB/CSD 92/687, University of California, Berkeley, Computer Science Division, Berkeley, CA, May 1992.
[bib77] [Cha79] Maximal parallelism in matrix multiplication. Technical Report RC-6193, IBM T. J. Watson Research Center, Yorktown Heights, NY, 1979.
[bib78] [Cha87] An alternate view of LU factorization on a hypercube multiprocessor. In M. T. Heath, editor, Hypercube Multiprocessors 1987, 569–575. SIAM, Philadelphia, PA, <year>1987</year>.
[bib79] [CJP83] and Solving large-scale zero-one linear programming problem. Operations Research, :803–834, 1983.
[bib80] [CKP+93a] and LogP: Towards a realistic model of parallel computation. In Proceedings of the Fourth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 1–12, 1993.
[bib81] [CKP+93b] et al. LogP: Towards a realistic model of parallel computation. In Principles and Practice of Parallel Programming, <year>May 1993</year>.
[bib83] [CLC81] and Optimal parallel algorithms for the connected component problem. In Proceedings of the 1981 International Conference on Parallel Processing, 170–175, 1981.
[bib84] [CLC82] and Efficient parallel algorithms for some graph problems. Communications of the ACM, :659–665, September 1982.
[bib85] [CLR90] and Introduction to Algorithms. MIT Press, McGraw-Hill, New York, NY, <year>1990</year>.
[bib86] [CM82] and Distributed computation on graphs: Shortest path algorithms. Communications of the ACM, :833–837, November 1982.
[bib87] [CM98] and OpenMP and HPF: Integrating two paradigms. Lecture Notes in Computer Science, 1470, <year>1998</year>.
[bib89] [Col89] Algorithmic Skeletons: Structured Management of Parallel Computation. MIT Press, Cambridge, MA, <year>1989</year>.
[bib91] [CR89] and A model of parallel performance. Technical Report AFWL-TR-89-01, Air Force Weapons Laboratory, 1989.
[bib92] [CR91] and Modeling the serial and parallel fractions of a parallel algorithm. Journal of Parallel and Distributed Computing, 1991.
[bib93] [CS88] and Efficient mapping and implementations of matrix algorithms on a hypercube. Journal of Supercomputing, :7–27, 1988.
[bib94] [CSG98] and Parallel Computer Architecture: A Hardware/Software Approach. Morgan Kaufmann, <year>1998</year>.
[bib95] [CT92] and An Introduction to Parallel Programming. Jones and Bartlett, Austin, TX, <year>1992</year>.
[bib97] [Cve87] Performance analysis of the FFT algorithm on a shared-memory parallel architecture. IBM Journal of Research and Development, :435–451, 1987.
[bib100] [Dal87] A VLSI Architecture for Concurrent Data Structures. Kluwer Academic Publishers, Boston, MA, <year>1987</year>.
[bib101] [Dal90a] Analysis of k-ary n-cube interconnection networks. IEEE Transactions on Computers, June 1990.
[bib102] [Dal90b] Network and processor architecture for message-driven computers. In R. Suaya and G. Birtwistle, editors, VLSI and Parallel Computation. Morgan Kaufmann, San Mateo, CA, <year>1990</year>.
[bib103] [Dav86] Column LU factorization with pivoting on a hypercube multiprocessor. SIAM Journal on Algebraic and Discrete Methods, :538–550, 1986. Also available as Technical Report ORNL-6219, Oak Ridge National Laboratory, Oak Ridge, TN, 1985.
[bib104] [DCG90] and Vectorization and multitasking of dynamic programming in control: experiments on a CRAY-2. Parallel Computing, :261–269, 1990.
[bib105] [DDSV99] and Numerical Linear Algebra for High Performance Computers (Software, Environments, Tools). SIAM, <year>1999</year>.
[bib106] [DeC89] The Technology of Parallel Processing: Parallel Processing Architectures and VLSI Hardware: Volume 1. Prentice-Hall, Englewood Cliffs, NJ, <year>1989</year>.
[bib107] [DEH89] and Parallel Processing for Computer Vision and Display. Addison-Wesley, Reading, MA, <year>1989</year>.
[bib108] [Dem82] Experiences with multiprocessor algorithms. IEEE Transactions on Computers, :278–288, 1982.
[bib109] [DFHM82] and A taxonomy of parallel sorting algorithms. Technical Report TR-482, Computer Sciences Department, University of Wisconsin, Madison, WI, 1982.
[bib110] [DFRC96] and Scalable parallel computational geometry for coarse grained multicomputers. International Journal on Computational Geometry, :379–400, 1996.
[bib112] [Dij59] A note on two problems in connection with graphs. Numerische Mathematik, :269–271, 1959.
[bib113] [DM93] and Parallel A* algorithms and their performance on hypercube multiprocessors. In Proceedings of the Seventh International Parallel Processing Symposium, 797–803, 1993.
[bib114] [DM98] and OpenMP: An industry-standard API for shared-memory programming. IEEE Computational Science and Engineering, :46–55, January/March 1998.
[bib115] [DNS81] and Parallel matrix and graph algorithms. SIAM Journal on Computing, :657–673, 1981.
[bib116] [Dra96] Introduction to Java threads. JavaWorld: IDG’s magazine for the Java community, April 1996.
[bib117] [DRGNP] and VM parallel environment. In Proceedings of the IBM Kingston Parallel Processing Symposium.
[bib119] [DS87] and Deadlock-free message routing in multiprocessor interconnection networks. IEEE Transactions on Computers, :547–553, 1987.
[bib120] [DSG83] and Derivation of a termination detection algorithm for a distributed computation. Information Processing Letters, :217–219, 1983.
[bib121] [DT89] and Implementing the discrete Fourier transform on a hypercube vector-parallel computer. In Proceedings of the Fourth Conference on Hypercubes, Concurrent Computers, and Applications: Volume I, 407–410, 1989.
[bib122] [DV87] and Optimal graph algorithms on a fixed-size linear array. IEEE Transactions on Computers, :460–470, April 1987.
[bib123] [dV89] Multicomputer matrix computations: Theory and practice. In Proceedings of the Fourth Conference on Hypercubes, Concurrent Computers, and Applications, 1303–1308, 1989.
[bib124] [DY81] and Parallel algorithms for the minimum spanning tree problem. In Proceedings of the 1981 International Conference on Parallel Processing, 188–189, 1981.
[bib125] [Eck94] Parallel branch-and-bound methods for mixed-integer programming on the CM-5. SIAM Journal on Optimization, :794–814, 1994.
[bib126] [Eck97] Distributed versus centralized storage and control for parallel branch and bound: Mixed integer programming on the CM-5. Computational Optimization and Applications, :199–220, 1997.
[bib127] [Ede89] Optimal matrix transposition and bit-reversal on hypercubes: Node address–memory address exchanges. Technical report, Thinking Machines Corporation, Cambridge, MA, 1989.
[bib128] [EDH80] and Distributed enumeration on network computers. IEEE Transactions on Computers, :818–825, September 1980.
[bib129] [EHHR88] and Modified cyclic algorithms for solving triangular systems on distributed-memory multiprocessors. SIAM Journal on Scientific and Statistical Computing, :589–600, 1988.
[bib130] [EHMN90] and PRA*: A memory-limited heuristic search procedure for the connection machine. In Proceedings of the Third Symposium on the Frontiers of Massively Parallel Computation, 145–149, 1990.
[bib131] [Ekl72] A fast computer method for matrix transposing. IEEE Transactions on Computers, :801–803, 1972.
[bib132] [Ert92] OR-parallel theorem proving with random competition. In A. Voronkov, editor, LPAR ’92: Logic Programming and Automated Reasoning, 226–237. Springer-Verlag, New York, NY, <year>1992</year>.
[bib133] [EZL89] and Speedup versus efficiency in parallel systems. IEEE Transactions on Computers, :408–423, 1989.
[bib136] [FF86] and Optimal communication algorithms on hypercube. Technical Report CCCP-314, California Institute of Technology, Pasadena, CA, 1986.
[bib137] [FJDS96] and Introduction to High-Performance Scientific Computing. MIT Press, <year>1996</year>.
[bib138] [FJL+88] and Solving Problems on Concurrent Processors: Volume 1. Prentice-Hall, Englewood Cliffs, NJ, <year>1988</year>.
[bib139] [FK88] and Distributed tree search and its application to alpha-beta pruning. In Proceedings of the 1988 National Conference on Artificial Intelligence, 1988.
[bib142] [Fla90] Further applications of the overhead model for parallel systems. Technical Report G320-3540, IBM Corporation, Palo Alto Scientific Center, Palo Alto, CA, 1990.
[bib144] [Fly72] Some computer organizations and their effectiveness. IEEE Transactions on Computers, :948–960, 1972.
[bib145] [Fly95] Computer Architecture: Pipelined and Parallel Processor Design. Jones and Bartlett, <year>1995</year>.
[bib146] [FM70] and Samplesort: A sampling approach to minimal storage tree sorting. Journal of the ACM, :496–507, July 1970.
[bib147] [FM87] and DIB—a distributed implementation of backtracking. ACM Transactions on Programming Languages and Systems, :235–256, April 1987.
[bib148] [FM92] and Load balancing algorithms on the connection machine and their use in Monte-Carlo methods. In Proceedings of the Unstructured Scientific Computation on Multiprocessors Conference, 1992.
[bib149] [FMM94] and Studying overheads in massively parallel min/max-tree evaluation. In Proc. of the 6th ACM Symposium on Parallel Algorithms and Architectures, 94–103, 1994.
[bib150] [FOH87] and Matrix algorithms on a hypercube I: Matrix multiplication. Parallel Computing, :17–31, 1987.
[bib151] [Fos95] Designing and Building Parallel Programs: Concepts and Tools for Parallel Software Engineering. Addison-Wesley, <year>1995</year>.
[bib152] [Fou94] Parallel Computing: Principles and Practice. Cambridge University Press, <year>1994</year>.
[bib153] [FR62] and Flows in Networks. Princeton University Press, Princeton, NJ, <year>1962</year>.
[bib154] [Fra93] The multiscalar architecture. Technical Report CS-TR-1993-1196, University of Wisconsin, 1993.
[bib155] [FTI90] and A multi-level load balancing scheme for OR-parallel exhaustive search programs on the Multi-PSI. In Proceedings of the Second ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 50–59, 1990.
[bib156] [FW78] and Parallelism in random access machines. In Proceedings of ACM Symposium on Theory of Computing, 114–118, 1978.
[bib157] [Gal95] POSIX.4: Programming for the Real World. O’Reilly & Associates, <year>1995</year>.
[bib159] [Gei85] Efficient parallel LU factorization with pivoting on a hypercube multiprocessor. Technical Report ORNL-6211, Oak Ridge National Laboratory, Oak Ridge, TN, 1985.
[bib160] [GGK+83] and The NYU Ultracomputer—designing a MIMD, shared memory parallel computer. IEEE Transactions on Computers, :175–189, February 1983.
[bib161] [GGK93] and Isoefficiency: Measuring the scalability of parallel algorithms and architectures. IEEE Parallel and Distributed Technology, :12–21, August 1993.
[bib162] [GH85] and Parallel Cholesky factorization on a hypercube multiprocessor. Technical Report ORNL-6190, Oak Ridge National Laboratory, Oak Ridge, TN, 1985.
[bib163] [GH86] and Matrix factorization on a hypercube multiprocessor. In M. T. Heath, editor, Hypercube Multiprocessors 1986, 161–180. SIAM, Philadelphia, PA, <year>1986</year>.
[bib164] [GH01] and Performance Optimization of Numerically Intensive Codes. SIAM, <year>2001</year>.
[bib165] [Gib85] Algorithmic Graph Theory. Cambridge University Press, Cambridge, <year>1985</year>.
[bib166] [Gib89] A more practical PRAM model. In Proceedings of the 1989 ACM Symposium on Parallel Algorithms and Architectures, 158–168, 1989.
[bib167] [GK91] and The scalability of matrix multiplication algorithms on parallel computers. Technical Report TR 91-54, Department of Computer Science, University of Minnesota, Minneapolis, MN, 1991. A short version appears in Proceedings of 1993 International Conference on Parallel Processing, pages III-115–III-119, 1993.
[bib168] [GK93a] and Performance properties of large scale parallel systems. Journal of Parallel and Distributed Computing, :234–244, 1993. Also available as Technical Report TR 92-32, Department of Computer Science, University of Minnesota, Minneapolis, MN.
[bib169] [GK93b] and The scalability of FFT on parallel computers. IEEE Transactions on Parallel and Distributed Systems, :922–932, August 1993. A detailed version is available as Technical Report TR 90-53, Department of Computer Science, University of Minnesota, Minneapolis, MN.
[bib170] [GKP92] and Parallel processing of discrete optimization problems. In Encyclopaedia of Microcomputers. Marcel Dekker Inc., New York, <year>1992</year>.
[bib171] [GKR91] and Experimental evaluation of load balancing techniques for the hypercube. In Proceedings of the Parallel Computing ’91 Conference, 497–514, 1991.
[bib172] [GKRS96] and A3: A simple and asymptotically accurate model for parallel computation. In Proceedings of the Sixth Symposium on Frontiers of Massively Parallel Computing, Annapolis, MD, 1996.
[bib173] [GKS92] and Performance and scalability of preconditioned conjugate gradient methods on parallel computers. Technical Report TR 92-64, Department of Computer Science, University of Minnesota, Minneapolis, MN, 1992. A short version appears in Proceedings of the Sixth SIAM Conference on Parallel Processing for Scientific Computing, pages 664–674, 1993.
[bib174] [GKT79] and Direct VLSI Implementation of Combinatorial Algorithms. In Proceedings of Conference on Very Large Scale Integration, California Institute of Technology, 509–525, 1979.
[bib175] [GL96a] and Matrix Computations. The Johns Hopkins University Press, Baltimore, MD, <year>1996</year>.
[bib176] [GL96b] and User’s Guide for mpich, a Portable Implementation of MPI. Mathematics and Computer Science Division, Argonne National Laboratory. ANL-96/6. <year>1996</year>.
[bib177] [GLDS96] and A high-performance, portable implementation of the MPI message passing interface standard. Parallel Computing, :789–828, September 1996.
[bib179] [GMB88] and Development of parallel methods for a 1024-processor hypercube. SIAM Journal on Scientific and Statistical Computing, :609–638, 1988.
[bib180] [GO93] and Scientific Computing: An Introduction with Parallel Computing. Academic Press, <year>1993</year>.
[bib181] [GPS90] and Parallel algorithms for dense linear algebra computations. SIAM Review, :54–135, March 1990. Also appears in et al. Parallel Algorithms for Matrix Computations. SIAM, Philadelphia, PA, <year>1990</year>.
[bib182] [GR88] and LU factorization algorithms on distributed-memory multiprocessor architectures. SIAM Journal on Scientific and Statistical Computing, :639–649, 1988. Also available as Technical Report ORNL/TM-10383, Oak Ridge National Laboratory, Oak Ridge, TN, 1987.
[bib183] [GR90] and Efficient Parallel Algorithms. Cambridge University Press, Cambridge, UK, <year>1990</year>.
[bib184] [Gre91] Parallel Processing for Computer Graphics. MIT Press, Cambridge, MA, <year>1991</year>.
[bib186] [GT88] and A new approach to the maximum-flow problem. Journal of the ACM, :921–940, October 1988.
[bib187] [Gup87] Parallelism in Production Systems. Morgan Kaufmann, Los Altos, CA, <year>1987</year>.
[bib189] [Gus92] The consequences of fixed time performance measurement. In Proceedings of the 25th Hawaii International Conference on System Sciences: Volume III, 113–124, 1992.
[bib190] [HB84] and Computer Architecture and Parallel Processing. McGraw-Hill, New York, NY, <year>1984</year>.
[bib192] [HCH95] and Deep Blue system overview. In Proceedings of the 1995 International Conference on Supercomputing, Barcelona, Spain, 240–244, 1995.
[bib193] [HCS79] and Computing connected components on parallel computers. Communications of the ACM, :461–464, August 1979.
[bib194] [HD87] and A tight upper bound for the speedup of parallel best-first branch-and-bound algorithms. Technical report, Center for Automation Research, University of Maryland, College Park, MD, 1987.
[bib195] [HD89a] and Parallel iterative A* search: An admissible distributed heuristic search algorithm. In Proceedings of the Eleventh International Joint Conference on Artificial Intelligence, 23–29, 1989.
[bib196] [HD89b] and Parallel Processing for Supercomputers and Artificial Intelligence. McGraw-Hill, New York, NY, <year>1989</year>.
[bib197] [HDM97] and Installation and user guide for the Oxford BSP toolset: User guide for the Oxford BSP toolset (v1.3) implementation of BSPlib. Technical report, Oxford University Computing Laboratory, 1997.
[bib198] [Hea85] Parallel Cholesky factorization in message-passing multiprocessor environments. Technical Report ORNL-6150, Oak Ridge National Laboratory, Oak Ridge, TN, 1985.
[bib202] [Hip89] Matrix multiplication on the JPL/Caltech Mark IIIfp hypercube. Technical Report C3P 746, Concurrent Computation Program, California Institute of Technology, Pasadena, CA, 1989.
[bib203] [Hir76] Parallel algorithms for the transitive closure and connected component problem. In Proceedings of the 8th Annual ACM Symposium on the Theory of Computing, 55–57, 1976.
[bib204] [Hir78] Fast parallel sorting algorithms. Communications of the ACM, :657–666, August 1978.
[bib205] [HJ87] and Spanning balanced trees in Boolean cubes. Technical Report YALEU/DCS/RR-508, Department of Computer Science, Yale University, New Haven, CT, 1987.
[bib206] [HJE91] and Matrix multiplication on hypercubes using full bandwidth and constant storage. In Proceedings of the 1991 International Conference on Parallel Processing, 447–451, 1991.
[bib207] [HK96] and C3: A parallel model for coarse-grained machines. Journal of Parallel and Distributed Computing, :139–154, February 1996.
[bib209] [HLV90] and A sub-linear parallel algorithm for some dynamic programming problems. In Proceedings of the 1990 International Conference on Parallel Processing, III–261–III–264, 1990.
[bib210] [HM80] and Parallel record-sorting methods for hardware realization. OSU-CISRC-TR-80-7, Computer Science Information Department, Ohio State University, Columbus, OH, 1980.
[bib211] [HMT+96] and A study of the EARTH-MANNA multithreaded system. International Journal of Parallel Programming, :319–347, 1996.
[bib212] [HNR90] and Parallel quicksort using fetch-and-add. IEEE Transactions on Computers, :133–138, January 1990.
[bib214] [HP89] and Deterministic PRAM simulation with constant redundancy. In Proceedings of the 1989 ACM Symposium on Parallel Algorithms and Architectures, 103–109, 1989.
[bib216] [HR88] and Parallel solution of triangular systems on distributed-memory multiprocessors. SIAM Journal on Scientific and Statistical Computing, :558–588, 1988.
[bib217] [HR91] and Efficient communication primitives on circuit-switched hypercubes. In Sixth Distributed Memory Computing Conference Proceedings, 390–397, 1991.
[bib218] [HS78] and Fundamentals of Computer Algorithms. Computer Science Press, Rockville, MD, <year>1978</year>.
[bib220] [Hsu90] Large scale parallelization of alpha-beta search: An algorithmic and architectural study with computer chess. Technical report, Carnegie Mellon University, Pittsburgh, PA, 1990. Ph.D. Thesis.
[bib221] [Hua85] Solving some graph problems with optimal or near-optimal speedup on mesh-of-trees networks. In Proceedings of the 26th Annual IEEE Symposium on Foundations of Computer Science, 232–340, 1985.
[bib224] [IPS91] and Parallel recognition and parsing on the hypercube. IEEE Transactions on Computers, :764–770, June 1991.
[bib225] [IYF79] and A parallel searching scheme for multiprocessor systems and its application to combinatorial problems. In Proceedings of the International Joint Conference on Artificial Intelligence, 416–418, 1979.
[bib226] [Jaj92] An Introduction to Parallel Algorithms. Addison-Wesley, Reading, MA, <year>1992</year>.
[bib227] [JAM87] and Randomized parallel algorithms for Prolog programs and backtracking applications. In Proceedings of the 1987 International Conference on Parallel Processing, 278–281, 1987.
[bib228] [JAM88] and A randomized parallel backtracking algorithm. IEEE Transactions on Computers, 1988.
[bib229] [JGD87] L. H. Jamieson, D. B. Gannon, and R. J. Douglass, editors. The Characteristics of Parallel Algorithms. MIT Press, Cambridge, MA, <year>1987</year>.
[bib230] [JH88] and Matrix transposition on Boolean n-cube configured ensemble architectures. SIAM Journal on Matrix Analysis and Applications, :419–454, July 1988.
[bib231] [JH89] and Optimum broadcasting and personalized communication in hypercubes. IEEE Transactions on Computers, :1249–1268, September 1989.
[bib232] [JH91] and Optimal all-to-all personalized communication with minimum span on Boolean cubes. In Sixth Distributed Memory Computing Conference Proceedings, 299–304, 1991.
[bib233] [JKFM89] and A radix-2 FFT on the connection machine. Technical report, Thinking Machines Corporation, Cambridge, MA, 1989.
[bib234] [JNS97] and Progress in integer programming: An exposition. Technical report, School of Industrial and Systems Engineering, Georgia Institute of Technology, 1997. Available from http://akula.isye.gatech.edu/mwps/mwps.html.
[bib235] [Joh77] Efficient algorithms for shortest paths in sparse networks. Journal of the ACM, :1–13, March 1977.
[bib236] [Joh84] Combining parallel and sequential sorting on a boolean n-cube. In Proceedings of International Conference on Parallel Processing, 1984.
[bib237] [Joh87] Communication efficient basic linear algebra computations on hypercube architectures. Journal of Parallel and Distributed Computing, :133–172, April 1987.
[bib238] [Joh90] Communication in network architectures. In R. Suaya and G. Birtwistle, editors, VLSI and Parallel Computation, 223–389. Morgan Kaufmann, San Mateo, CA, <year>1990</year>.
[bib239] [JP93] and A parallel graph coloring heuristic. SIAM Journal on Scientific Computing, :654–669, 1993.
[bib240] [JS87] and All pairs shortest paths on a hypercube multiprocessor. In Proceedings of the 1987 International Conference on Parallel Processing, 713–716, 1987.
[bib241] [KA88] and Fast Fourier transform algorithm design and tradeoffs. Technical Report RIACS TR 88.18, NASA Ames Research Center, Moffett Field, CA, 1988.
[bib242] [KA99a] and Benchmarking and comparison of the task graph scheduling algorithms. Journal of Parallel and Distributed Computing, :381–422, 1999.
[bib243] [KA99b] and Static scheduling algorithms for allocating directed task graphs to multiprocessors. ACM Computing Surveys, :406–471, 1999.
[bib244] [KB57] and Assignment problems and the location of economic activities. Econometrica, :53–76, 1957.
[bib246] [KF90] and Measuring parallel processor performance. Communications of the ACM, :539–543, 1990.
[bib247] [KG94] and Analyzing scalability of parallel algorithms and architectures. Journal of Parallel and Distributed Computing, :379–391, 1994. Also available as Technical Report TR 91-18, Department of Computer Science, University of Minnesota, Minneapolis, MN.
[bib248] [KGK90] V.Kumar, P.S.Gopalakrishnan, and L.N.Kanal, editors. Parallel Algorithms for Machine Intelligence and Vision. Springer-Verlag, New York, NY, <year>1990</year>.
[bib249] [KGR94] and Scalable load balancing techniques for parallel computers. Journal of Parallel and Distributed Computing, :60–79, July 1994.
[bib250] [KH67] and Finite state processes and dynamic programming. SIAM Journal of Applied Math, :693–718, 1967.
[bib251] [KH83] and An efficient implementation of Batcher’s odd-even merge algorithm and its application in parallel sorting schemes. IEEE Transactions on Computers, March 1983.
[bib252] [KK79] and Virtual cut-through: A new communication switching technique. Computer Networks, :267–286, 1979.
[bib253] [KK83] and A general branch-and-bound formulation for understanding and synthesizing and/or tree search procedures. Artificial Intelligence, :179–198, 1983.
[bib254] [KK84] and Parallel branch-and-bound formulations for and/or tree search. IEEE Transactions on Pattern Analysis and Machine Intelligence, :768–778, 1984.
[bib255] [KK88a] and Search in Artificial Intelligence. Springer-Verlag, New York, NY, <year>1988</year>.
[bib256] [KK88b] and The CDP: A unifying formulation for heuristic search, dynamic programming, and branch-and-bound. In L.N.Kanal and V.Kumar, editors, Search in Artificial Intelligence, 1–27. Springer-Verlag, New York, NY, <year>1988</year>.
[bib257] [KK93] and Efficient Parallel Mappings of a Dynamic Programming Algorithm. In Proceedings of 7th International Parallel Processing Symposium, 563–568, 1993.
[bib258] [KK94] and Unstructured tree search on SIMD parallel computers. Journal of Parallel and Distributed Computing, :379–391, September 1994.
[bib259] [KK99] and Parallel multilevel k-way partitioning for irregular graphs. SIAM Review, :278–300, 1999.
[bib260] [KKKS94] L.N.Kanal, V.Kumar, H.Kitano, and C.Suttner, editors. Parallel Processing for Artificial Intelligence. North-Holland, Amsterdam, The Netherlands, <year>1994</year>.
[bib261] [KN91] and Probabilistic analysis of the efficiency of the dynamic load distribution. In Sixth Distributed Memory Computing Conference Proceedings, 1991.
[bib262] [Knu73] The Art of Computer Programming: Sorting and Searching. Addison-Wesley, Reading, MA, <year>1973</year>.
[bib263] [Kor81] The use of parallelism to implement a heuristic search. In Proceedings of the International Joint Conference on Artificial Intelligence, 575–580, 1981.
[bib264] [Kow88] Parallel Computation and Computers for Artificial Intelligence. Kluwer Academic Publishers, Boston, MA, <year>1988</year>.
[bib265] [KP92] and Branch-and-bound and backtrack search on mesh-connected arrays of processors. In Proceedings of Fourth Annual Symposium on Parallel Algorithms and Architectures, 118–126, 1992.
[bib266] [KR87a] and Array processor with multiple broadcasting. Journal of Parallel and Distributed Computing, 173–190, 1987.
[bib267] [KR87b] and Parallel depth-first search, part II: Analysis. International Journal of Parallel Programming, :501–519, 1987.
[bib268] [KR88] and A survey of complexity of algorithms for shared-memory machines. Technical Report 408, University of California, Berkeley, 1988.
[bib269] [KR89] and Load balancing on the hypercube architecture. In Proceedings of the Fourth Conference on Hypercubes, Concurrent Computers, and Applications, 603–608, 1989.
[bib270] [KRR88] and Parallel best-first search of state-space graphs: A summary of results. In Proceedings of the 1988 National Conference on Artificial Intelligence, 122–126, 1988.
[bib271] [KRS88] and A complexity theory of efficient parallel algorithms. Technical Report RC13572, IBM T. J. Watson Research Center, Yorktown Heights, NY, 1988.
[bib272] [Kru56] On the shortest spanning subtree of a graph and the traveling salesman problem. In Proceedings of the AMS, 48–50, 1956.
[bib273] [KS88] and The horizon supercomputing system: architecture and software. In Proceedings of Supercomputing Conference, 28–34, 1988.
[bib274] [KS91a] and Efficient parallel execution of IDA* on shared and distributed-memory multiprocessors. In Sixth Distributed Memory Computing Conference Proceedings, 1991.
[bib275] [KS91b] and Scalability of parallel algorithms for the all-pairs shortest path problem. Journal of Parallel and Distributed Computing, :124–138, October 1991. A short version appears in the Proceedings of the International Conference on Parallel Processing, 1990.
[bib277] [KU86] and Parallel hashing – an efficient implementation of shared memory. In Proceedings of 18th ACM Conference on Theory of Computing, 160–168, 1986.
[bib278] [Kun80] The structure of parallel algorithms. In M.Yovits, editor, Advances in Computing, 73–74. Academic Press, San Diego, CA, <year>1980</year>.
[bib279] [Kun86] Memory requirements for balanced computer architectures. In Proceedings of the 1986 IEEE Symposium on Computer Architecture, 49–54, 1986.
[bib280] [KV92] and Comparison of meshes vs. hypercubes for data rearrangement. Technical Report UCF-CS-92-28, Department of Computer Science, University of Central Florida, Orlando, FL, 1992.
[bib281] [KZ88] and A randomized parallel branch-and-bound procedure. In Proceedings of the ACM Annual Symposium on Theory of Computing, 290–300, 1988.
[bib282] [Law75] Access and alignment of data in an array processor. IEEE Transactions on Computers, :1145–1155, 1975.
[bib283] [LB95a] and Threads Primer: A Guide to Multithreaded Programming. Prentice Hall PTR/Sun Microsystems Press, <year>1995</year>.
[bib284] [LB95b] and Limits on the performance benefits of multithreading and prefetching. Research report RC 20238 (89547), IBM T. J. Watson Research Center, Yorktown Heights, NY, October 1995.
[bib285] [LB97] and Multithreaded Programming with Pthreads. Prentice Hall PTR/Sun Microsystems Press, <year>1997</year>.
[bib286] [LB98] and Multithreaded Programming with PThreads. Sun Microsystems Press / Prentice Hall, <year>1998</year>.
[bib287] [LC88] and A parallel triangular solver for a hypercube multiprocessor. SIAM Journal on Scientific and Statistical Computing, :485–502, 1988.
[bib288] [LC89] and A new method for solving triangular systems on distributed memory message passing multiprocessors. SIAM Journal on Scientific and Statistical Computing, :382–396, 1989.
[bib289] [LD90] and Analysis and Design of Parallel Algorithms: Arithmetic and Matrix Problems. McGraw-Hill, New York, NY, <year>1990</year>.
[bib290] [LDP89] and Multiprogramming a distributed-memory multiprocessor. Concurrency: Practice and Experience, :19–33, September 1989.
[bib291] [Lea99] Concurrent Programming in Java, Second Edition: Design Principles and Patterns. Addison-Wesley, <year>1999</year>.
[bib292] [Lei83] Parallel computation using mesh of trees. In Proceedings of International Workshop on Graph-Theoretic Concepts in Computer Science, 1983.
[bib293] [Lei85a] Tight bounds on the complexity of parallel sorting. IEEE Transactions on Computers, :344–354, April 1985.
[bib294] [Lei85b] Fat-trees: Universal networks for hardware efficient supercomputing. In Proceedings of the 1985 International Conference on Parallel Processing, 393–402, 1985.
[bib295] [Lei92] Introduction to Parallel Algorithms and Architectures. Morgan Kaufmann, San Mateo, CA, <year>1992</year>.
[bib296] [LER92] and Introduction to Parallel Computing. Prentice-Hall, Englewood Cliffs, NJ, <year>1992</year>.
[bib297] [Les93] The Art of Parallel Programming. Prentice-Hall, Englewood Cliffs, NJ, <year>1993</year>.
[bib298] [Lev87] Measuring communications structures in parallel architectures and algorithms. In L.H.Jamieson, D.B.Gannon, and R.J.Douglass, editors, The Characteristics of Parallel Algorithms. MIT Press, Cambridge, MA, <year>1987</year>.
[bib299] [Lew91] POSIX Programmer’s Guide: Writing Portable UNIX Programs with the POSIX.1 Standard. O’Reilly & Associates, <year>1991</year>.
[bib300] [LHZ98] and OpenMP on networks of workstations. In SC ’98, High Performance Networking and Computing Conference, Orlando, Florida, 1998.
[bib301] [Lil92] Architectural Alternatives for Exploiting Parallelism. IEEE Computer Society Press, Los Alamitos, CA, <year>1992</year>.
[bib302] [Lin83] The key node method: A highly parallel alpha-beta algorithm. Technical Report 83-101, Computer Science Department, University of Utah, Salt Lake City, UT, 1983.
[bib303] [Lin92] A distributed fair polling scheme applied to or-parallel logic programming. International Journal of Parallel Programming, August 1992.
[bib304] [LK72] and Cellular arrays for the solution of graph problems. Communications of the ACM, :789–801, September 1972.
[bib305] [LK85] and A hybrid SSS*/alpha-beta algorithm for parallel search of game trees. In Proceedings of the International Joint Conference on Artificial Intelligence, 1044–1046, 1985.
[bib307] [LM97] and Computational experience of an interior-point algorithm in a parallel branch-and-cut framework. In Proceedings for SIAM Conference on Parallel Processing for Scientific Computing, 1997.
[bib308] [LMR88] and Universal packet routing algorithms. In 29th Annual Symposium on Foundations of Computer Science, 256–271, 1988.
[bib309] [Loa92] Computational Frameworks for the Fast Fourier Transform. SIAM, Philadelphia, PA, <year>1992</year>.
[bib310] [LP92] and Parallel algorithms for the quadratic assignment problem. In P.M.Pardalos, editor, Advances in Optimization and Parallel Computing, 177–189. North-Holland, Amsterdam, The Netherlands, <year>1992</year>.
[bib311] [LPP88] and A probabilistic simulation of PRAMs on a bounded degree network. Information Processing Letters, :141–147, July 1988.
[bib312] [LPP89] and A new scheme for deterministic simulation of PRAMs in VLSI. SIAM Journal of Computing, 1989.
[bib313] [LRZ95] and Cilk: An efficient multithreaded runtime system. In Proceedings of the Fifth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), Santa Barbara, CA, 1995.
[bib314] [LS84] and Anomalies in parallel branch and bound algorithms. Communications of the ACM, 594–602, 1984.
[bib315] [LS85] and Performance of parallel branch-and-bound algorithms. IEEE Transactions on Computers, October 1985.
[bib316] [LS86] and A note on anomalies in parallel branch-and-bound algorithms with one-to-one bounding functions. Information Processing Letters, :119–122, October 1986.
[bib317] [LSS88] and A hypercube algorithm for the 0/1 knapsack problem. Journal of Parallel and Distributed Computing, :438–456, 1988.
[bib318] [Lub86] A simple parallel algorithm for the maximal independent set problem. SIAM Journal on Computing, :1036–1053, 1986.
[bib320] [LW84] and Computational efficiency of parallel approximate branch-and-bound algorithms. In Proceedings of the 1984 International Conference on Parallel Processing, 473–480, 1984.
[bib321] [LW85] and Parallel processing of serial dynamic programming problems. In Proceedings of COMPSAC 85, 81–89, 1985.
[bib322] [LW86] and Coping with anomalies in parallel branch-and-bound algorithms. IEEE Transactions on Computers, June 1986.
[bib323] [LW95] and Scalable Shared-Memory Multiprocessing. Morgan Kaufmann, San Mateo, CA, <year>1995</year>.
[bib324] [LY86] and New lower bounds for parallel computations. In Proceedings of 18th ACM Conference on Theory of Computing, 177–187, 1986.
[bib325] [MC82] and Parallel search of strongly ordered game trees. Computing Surveys, :533–551, 1982.
[bib327] [MD93] and Scalable duplicate pruning strategies for parallel A* graph search. In Proceedings of the Fifth IEEE Symposium on Parallel and Distributed Processing, 1993.
[bib328] [MdV87] and Hypercube algorithms and implementations. SIAM Journal on Scientific and Statistical Computing, :s227–s287, March 1987.
[bib329] [Mes94] MPI: A Message-Passing Interface Standard. Available at http://www.mpi-forum.org. May 1994.
[bib330] [Mes97] MPI-2: Extensions to the Message-Passing Interface. Available at http://www.mpi-forum.org. July 1997.
[bib331] [MFMV90] and Parallel game tree search by dynamic tree decomposition. In V.Kumar, P.S.Gopalakrishnan, and L.N.Kanal, editors, Parallel Algorithms for Machine Intelligence and Vision. Springer-Verlag, New York, NY, <year>1990</year>.
[bib332] [MG89] and A fast parallel quicksort algorithm. Information Processing Letters, :97–102, 1989.
[bib333] [Mil91] Exact distributed algorithms for travelling salesman problem. In Proceedings of the Workshop on Parallel Computing of Discrete Optimization Problems, 1991.
[bib334] [MK99] and Concurrency: State Models and Java Programs. John Wiley & Sons, <year>1999</year>.
[bib335] [MKRS88] and Meshes with reconfigurable buses. In Proceedings of MIT Conference on Advanced Research in VLSI, 163–178, 1988.
[bib336] [MM73] and From dynamic programming to search algorithms with functional costs. In Proceedings of the International Joint Conference on Artificial Intelligence, 345–349, 1973.
[bib337] [MM91] P.Messina and A.Murli, editors. Practical Parallel Computing: Status and Prospects. Wiley, Chichester, UK, <year>1991</year>.
[bib338] [MMR95] and A parallel depth first search branch and bound for the quadratic assignment problem. European Journal of Operational Research, :617–628, 1995.
[bib339] [Mod88] Parallel Algorithms and Matrix Computation. Oxford University Press, Oxford, UK, <year>1988</year>.
[bib340] [Moh83] Experience with two parallel programs solving the traveling salesman problem. In Proceedings of the 1983 International Conference on Parallel Processing, 191–193, 1983.
[bib341] [Mol86] Matrix computation on distributed-memory multiprocessors. In M.T.Heath, editor, Hypercube Multiprocessors 1986, 181–195. SIAM, Philadelphia, PA, <year>1986</year>.
[bib342] [Mol87] Another look at Amdahl’s law. Technical Report TN-02-0587-0288, Intel Scientific Computers, 1987.
[bib343] [Mol93] Parallel Processing: From Applications to Systems. Morgan Kaufmann, San Mateo, CA, <year>1993</year>.
[bib344] [MP85] and Parallel game tree search. IEEE Transactions on Pattern Analysis and Machine Intelligence, :442–452, July 1985.
[bib345] [MP93] and The role of performance metrics for parallel mathematical programming algorithms. ORSA Journal on Computing, 1993.
[bib346] [MR] and On high level characterization of parallelism. Technical Report CSD-TR-1011, CAPO Report CER-90-32, Computer Science Department, Purdue University, West Lafayette, IN. Also published in Journal of Parallel and Distributed Computing, 1993.
[bib347] [MRSR92] and Parallel Branch-and-Bound, 111–150. Advanced Topics in Computer Science. Blackwell Scientific Publications, Oxford, UK, <year>1992</year>.
[bib348] [MS88] and Downward scalability of parallel architectures. In Proceedings of the 1988 International Conference on Supercomputing, 109–120, 1988.
[bib349] [MS90] and Probabilistic performance analysis of heuristic search using parallel hash tables. In Proceedings of the International Symposium on Artificial Intelligence and Mathematics, 1990.
[bib350] [MS96] R. Miller and Parallel Algorithms for Regular Architectures. MIT Press, Cambridge, MA, <year>1996</year>.
[bib351] [MV84] and Randomized and deterministic simulations of PRAMs by parallel machines with restricted granularity of parallel memories. Acta Informatica, :339–374, November 1984.
[bib352] [MV85] and The ring machine. Technical report, University of Paderborn, Germany, 1985. Also in Computers and Artificial Intelligence, 1987.
[bib353] [MV87] and Parallel processing of combinatorial search trees. In Proceedings of International Workshop on Parallel Algorithms and Architectures, 1987.
[bib354] [MVS86] and Superlinear speedup for parallel backtracking. Technical Report 30, University of Paderborn, Germany, 1986.
[bib356] [Nat90] Investigating the practical value of Cole’s O(log n) time CREW PRAM merge sort algorithm. In 5th International Symposium on Computing and Information Sciences, October 1990.
[bib357] [NBF96] and Pthreads Programming. O’Reilly & Associates, Newton, MA 02164, <year>1996</year>.
[bib359] [ND96] and Thread time: the multithreaded programming guide. Hewlett-Packard professional books. Prentice-Hall, Englewood Cliffs, NJ 07632, <year>1996</year>.
[bib360] [Ni91] A layered classification of parallel computers. In Proceedings of 1991 International Conference for Young Computer Scientists, 28–33, 1991.
[bib361] [Nic90] The design of the MasPar MP-1: A cost-effective massively parallel computer. In IEEE Digest of Papers—Comcom, 25–28. IEEE Computer Society Press, Los Alamitos, CA, 1990.
[bib362] [NM93] and A survey of wormhole routing techniques in direct connect networks. IEEE Computer, February 1993.
[bib363] [NMB83] and Efficient VLSI networks for parallel processing based on orthogonal trees. IEEE Transactions on Computers, :21–23, June 1983.
[bib364] [NS79] and Bitonic sort on a mesh connected parallel computer. IEEE Transactions on Computers, January 1979.
[bib365] [NS80] and Finding connected components and connected ones on a mesh-connected computer. SIAM Journal of Computing, :744–757, November 1980.
[bib366] [NS87] and Parallelization and performance analysis of the Cooley-Tukey FFT algorithm for shared memory architectures. IEEE Transactions on Computers, :581–591, 1987.
[bib367] [NS93] and Sorting n numbers on n × n reconfigurable meshes with buses. In 7th International Parallel Processing Symposium, 174–181, 1993.
[bib368] [NSF91] Grand Challenges: High Performance Computing and Communications. A Report by the Committee on Physical, Mathematical and Engineering Sciences, NSF/CISE, 1800 G Street NW, Washington, DC, 20550, <year>1991</year>.
[bib369] [Nug88] The iPSC/2 direct-connect communications technology. In Proceedings of the Third Conference on Hypercubes, Concurrent Computers, and Applications, 51–60, 1988.
[bib370] [Nus82] Fast Fourier Transform and Convolution Algorithms. Springer-Verlag, New York, NY, <year>1982</year>.
[bib371] [NW88] and Problem size, parallel architecture, and optimal speedup. Journal of Parallel and Distributed Computing, :404–420, 1988.
[bib372] [GOV99] Funding a Revolution: Government Support for Computing Research. National Academy Press, <year>1999</year>.
[bib373] [OR88] and The ijk forms of factorization methods II: Parallel systems. Parallel Computing, :149–162, 1988.
[bib374] [Ort88] Introduction to Parallel and Vector Solution of Linear Systems. Plenum Press, New York, NY, <year>1988</year>.
[bib375] [OS85] and Data-flow algorithms for parallel matrix computations. Communications of the ACM, :840–853, 1985.
[bib376] [OS86] and Assignment and scheduling in parallel matrix factorization. Linear Algebra and its Applications, :275–299, 1986.
[bib378] [PBG+85] and The IBM research parallel processor prototype (RP3): Introduction and architecture. In Proceedings of 1985 International Conference on Parallel Processing, 764–771, 1985.
[bib379] [PC89] and A parallel algorithm for the quadratic assignment problem. In Supercomputing ’89 Proceedings, 351–360. ACM Press, New York, NY, 1989.
[bib380] [PD89] and Dynamic partitioning of multiprocessor systems. International Journal of Parallel Processing, :91–120, 1989.
[bib381] [Pea84] Heuristics—Intelligent Search Strategies for Computer Problem Solving. Addison-Wesley, Reading, MA, <year>1984</year>.
[bib383] [Pfi98] In Search of Clusters. Prentice Hall, Englewood Cliffs, NJ, <year>1998</year>. 2nd Edition.
[bib384] [PFK90] and Parallel heuristic search: Two approaches. In V.Kumar, P.S.Gopalakrishnan, and L.N.Kanal, editors, Parallel Algorithms for Machine Intelligence and Vision. Springer-Verlag, New York, NY, <year>1990</year>.
[bib386] [PH90] and Computer Architecture: A Quantitative Approach. Morgan Kaufmann, San Mateo, CA, <year>1990</year>.
[bib387] [PH96] and Computer Architecture: A Quantitative Approach, 2nd edition. Morgan Kaufmann, San Mateo, CA, <year>1996</year>.
[bib388] [PK89] and Parallel algorithms for shortest path problems. In Proceedings of 1989 International Conference on Parallel Processing, 14–19, 1989.
[bib389] [PK95] and An inherently parallel method for heuristic problem-solving: Part I – general framework. IEEE Transactions on Parallel and Distributed Systems, October 1995.
[bib391] [Pla89] Load balancing, selection and sorting on the hypercube. In Proceedings of the 1989 ACM Symposium on Parallel Algorithms and Architectures, 64–73, 1989.
[bib392] [PLRR94] and Lower bounds for the quadratic assignment problem. Annals of Operations Research, :387–411, <year>1994</year>. Special Volume on Applications of Combinatorial Optimization.
[bib393] [PR85] and Efficient parallel solution of linear systems. In 17th Annual ACM Symposium on Theory of Computing, 143–152, 1985.
[bib394] [PR89] and Parallel branch-and-bound algorithms for unconstrained quadratic zero-one programming. In R.Sharda et al., editors, Impacts of Recent Computer Advances on Operations Research, 131–143. North-Holland, Amsterdam, The Netherlands, <year>1989</year>.
[bib395] [PR90] and Parallel branch-and-bound algorithms for quadratic zero-one programming on a hypercube architecture. Annals of Operations Research, :271–292, <year>1990</year>.
[bib396] [PR91] and A branch-and-cut algorithm for the resolution of large-scale symmetric traveling salesman problems. SIAM Review, :60–100, 1991.
[bib397] [Pri57] Shortest connection network and some generalizations. Bell Systems Technical Journal, :1389–1401, 1957.
[bib398] [PRV88] and Algorithm PR2 for the parallel size reduction of the 0/1 multiknapsack problem. In INRIA Rapports de Recherche, number 811, 1988.
[bib399] [PS82] and Combinatorial Optimization: Algorithms and Complexity. Prentice-Hall, Englewood Cliffs, NJ, <year>1982</year>.
[bib400] [PV80] and Area-time optimal VLSI networks for matrix multiplication. In Proceedings of the 14th Princeton Conference on Information Science and Systems, 300–309, 1980.
[bib401] [PY88] and Towards an architecture independent analysis of parallel algorithms. In Proceedings of 20th ACM Symposium on Theory of Computing, 510–513, 1988.
[bib402] [QD86] and An upper bound for the speedup of parallel branch-and-bound algorithms. BIT, March 1986.
[bib403] [Qui88] Parallel sorting algorithms for tightly coupled multiprocessors. Parallel Computing, :349–357, 1988.
[bib404] [Qui89] Analysis and implementation of branch-and-bound algorithms on a hypercube multicomputer. IEEE Transactions on Computers, 1989.
[bib405] [Qui94] Parallel Computing: Theory and Practice. McGraw-Hill, New York, NY, <year>1994</year>.
[bib406] [Ram97] QSM: A general purpose shared-memory model for parallel computation. In Foundations of Software Technology and Theoretical Computer Science, 1–5, <year>1997</year>.
[bib407] [Ran89] Fluent Parallel Computation. Ph.D. Thesis, Department of Computer Science, Yale University, New Haven, CT, <year>1989</year>.
[bib408] [Ran91] Optimal speedup for backtrack search on a butterfly network. In Proceedings of the Third ACM Symposium on Parallel Algorithms and Architectures, 1991.
[bib409] [Rao90] Parallel Processing of Heuristic Search. Ph.D. Thesis, University of Texas, Austin, TX, <year>1990</year>.
[bib410] [Ras78] Performance Evaluation of Multiple Processor Systems. Ph.D. Thesis, Carnegie-Mellon University, Pittsburgh, PA, <year>1978</year>.
[bib411] [RB76] and A new principle for fast Fourier transform. IEEE Transactions on Acoustics, Speech and Signal Processing, :264–265, 1976.
[bib412] [RDK89] and Analytical and heuristic modeling of distributed algorithms. Technical Report E3646, FMC Corporation, Advanced Systems Center, Minneapolis, MN, 1989.
[bib413] [Rei81] Probabilistic algorithms for sorting and selection. SIAM Journal of Computing, 396–409, 1981.
[bib414] [RF89] and Multicomputer Networks: Message-Based Parallel Processing. MIT Press, Cambridge, MA, <year>1989</year>.
[bib415] [RICN88] and An efficient termination detection and abortion algorithm for distributed processing systems. In Proceedings of 1988 International Conference on Parallel Processing: Vol. I, 18–22, 1988.
[bib416] [RK87] and Parallel depth-first search, part I: Implementation. International Journal of Parallel Programming, :479–499, 1987.
[bib418] [RK88b] and Superlinear speedup in state-space search. In Proceedings of the 1988 Foundation of Software Technology and Theoretical Computer Science, number 338, 161–174. Springer-Verlag Series Lecture Notes in Computer Science, 1988.
[bib419] [RK93] and On the efficiency of parallel backtracking. IEEE Transactions on Parallel and Distributed Systems, :427–437, April 1993. Also available as Technical Report TR 90-55, Computer Science Department, University of Minnesota.
[bib420] [RKR87] and A parallel implementation of iterative-deepening-A*. In Proceedings of the National Conference on Artificial Intelligence (AAAI-87), 878–882, 1987.
[bib421] [RND77] and Combinatorial Algorithms: Theory and Practice. Prentice-Hall, Englewood Cliffs, NJ, <year>1977</year>.
[bib422] [RO88] and Parallel solution of triangular systems of equations. Parallel Computing, :109–114, 1988.
[bib423] [Rob75] Probabilistic algorithms. In J.Traub, editor, Algorithms and Complexity: New Directions and Recent Results, 21–39. Academic Press, San Diego, CA, <year>1975</year>.
[bib424] [Rob90] The Impact of Vector and Parallel Architectures on Gaussian Elimination. John Wiley and Sons, New York, NY, <year>1990</year>.
[bib425] [Rom87] The parallel solution of triangular systems on a hypercube. In M.T.Heath, editor, Hypercube Multiprocessors 1987, 552–559. SIAM, Philadelphia, PA, <year>1987</year>.
[bib426] [Rou87] A parallel branch-and-bound algorithm for the quadratic assignment problem. Discrete Applied Mathematics, :211–225, 1987.
[bib427] [Rou91] Parallel branch-and-bound on shared-memory multiprocessors. In Proceedings of the Workshop On Parallel Computing of Discrete Optimization Problems, 1991.
[bib428] [RRRR96] and Practical UNIX Programming: A Guide to Concurrency, Communication, and Multithreading. Prentice Hall, <year>1996</year>.
[bib429] [RS90a] and The tera computer system. In International Conference on Supercomputing, 1–6, 1990.
[bib430] [RS90b] and Hypercube Algorithms for Image Processing and Pattern Recognition. Springer-Verlag, New York, NY, <year>1990</year>.
[bib431] [RV87] and A logarithmic time sort for linear size networks. Journal of the ACM, :60–76, January 1987.
[bib432] [Ryt88] Efficient parallel computations for dynamic programming. Theoretical Computer Science, :297–307, 1988.
[bib433] [RZ89] M.Reeve and S.E.Zenith, editors. Parallel Processing and Artificial Intelligence. Wiley, Chichester, UK, <year>1989</year>.
[bib434] [Saa86] Communication complexity of the Gaussian elimination algorithm on multiprocessors. Linear Algebra and its Applications, :315–340, 1986.
[bib435] [SB77] and A large scale, homogeneous, fully distributed parallel machine. In Proceedings of Fourth Symposium on Computer Architecture, 105–124, March 1977.
[bib436] [SBCV90] and Analysis of multithreaded architectures for parallel computing. Report UCB/CSD 90/569, University of California, Berkeley, Computer Science Division, Berkeley, CA, April 1990.
[bib437] [Sch80] Ultracomputers. ACM Transactions on Programming Languages and Systems, :484–521, October 1980.
[bib440] [Sei89] Circuit-switched vs. store-and-forward solutions to symmetric communication problems. In Proceedings of the Fourth Conference on Hypercubes, Concurrent Computers, and Applications, 253–255, 1989.
[bib441] [Sei92] Mosaic C: An experimental fine-grain multicomputer. Technical report, California Institute of Technology, Pasadena, CA, 1992.
[bib442] [SG88] and Binsorting on hypercube with d-port communication. In Proceedings of the Third Conference on Hypercube Concurrent Computers, 1455–1461, January 1988.
[bib443] [SG91] and Toward a better parallel performance metric. Parallel Computing, :1093–1109, <year>December 1991</year>. Also available as Technical Report IS-5053, UC-32, Ames Laboratory, Iowa State University, Ames, IA.
[bib446] [SHG93] and Scaling parallel programs for multiprocessors: Methodology and examples. IEEE Computer, :42–50, 1993.
[bib447] [Sie77] The universality of various types of SIMD machine interconnection networks. In Proceedings of the 4th Annual Symposium on Computer Architecture, 23–25, 1977.
[bib448] [Sie85] Interconnection Networks for Large-Scale Parallel Processing. D. C. Heath, Lexington, MA, <year>1985</year>.
[bib449] [SJ81] and Fast, efficient parallel algorithms for some graph problems. SIAM Journal of Computing, :682–690, November 1981.
[bib450] [SK89] and A dynamic scheduling strategy for the chare-kernel system. In Proceedings of Supercomputing Conference, 389–398, 1989.
[bib451] [SK90] and Consistent linear speedup to a first solution in parallel state-space search. In Proceedings of the 1990 National Conference on Artificial Intelligence, 227–233, August 1990.
[bib452] [SKAT91a] and Efficient algorithms for parallel sorting on mesh multicomputers. International Journal of Parallel Programming, :95–131, 1991.
[bib453] [SKAT91b] and Scalability of parallel sorting on mesh multicomputers. International Journal of Parallel Programming, 1991.
[bib454] [SM86] and The DADO production system machine. Journal of Parallel and Distributed Computing, :269–296, June 1986.
[bib455] [Smi84] Random trees and the analysis of branch and bound procedures. Journal of the ACM, 1984.
[bib456] [SN90] and Another view of parallel speedup. In Supercomputing ’90 Proceedings, 324–333, 1990.
[bib457] [SN93] and Scalable problems and memory-bounded speedup. Journal of Parallel and Distributed Computing, :27–37, September 1993.
[bib458] [Sni82] On parallel search. In Proceedings of Principles of Distributed Computing, 242–253, 1982.
[bib460] [Sny86] Type architectures, shared-memory and the corollary of modest potential. Annual Review of Computer Science, :289–317, <year>1986</year>.
[bib462] [Sol77] An algorithm attributed to Sollin. In S.Goodman and S.Hedetniemi, editors, Introduction to The Design and Analysis of Algorithms. McGraw-Hill, Cambridge, MA, <year>1977</year>.
[bib463] [SR91] and Scalability of parallel algorithm-machine combinations. Technical Report IS-5057, Ames Laboratory, Iowa State University, Ames, IA, 1991. Also published in IEEE Transactions on Parallel and Distributed Systems.
[bib464] [SS88] and Topological properties of hypercubes. IEEE Transactions on Computers, :867–872, 1988.
[bib465] [SS89a] and Data communication in hypercubes. Journal of Parallel and Distributed Computing, :115–135, 1989. Also available as Technical Report YALEU/DCS/RR-428 from the Department of Computer Science, Yale University, New Haven, CT.
[bib466] [SS89b] and Data communication in parallel architectures. Parallel Computing, :131–150, 1989.
[bib467] [SS90] and Parallel sorting by regular sampling. Journal of Parallel and Distributed Computing, :361–372, 1990.
[bib468] [ST95] Solving the traveling salesman problem with a distributed branch-and-bound algorithm on a 1024 processor network. In Proceedings of the 9th International Parallel Processing Symposium, 182–189, Santa Barbara, CA, April 1995.
[bib469] [Sto71] Parallel processing with the perfect shuffle. IEEE Transactions on Computers, :153–161, 1971.
[bib470] [Sto93] High-Performance Computer Architecture: Third Edition. Addison-Wesley, Reading, MA, <year>1993</year>.
[bib471] [Sun95] Solaris multithreaded programming guide. SunSoft Press, Mountain View, CA, <year>1995</year>.
[bib473] [SV81] and Finding the maximum, merging and sorting in a parallel computation model. Journal of Algorithms, 88–102, 1981.
[bib474] [SV82] and An O(n² log n) parallel max-flow algorithm. Journal of Algorithms, :128–146, 1982.
[bib475] [SW87] and Passing messages in link-bound hypercubes. In M. T. Heath, editor, Hypercube Multiprocessors 1987, 251–257. SIAM, Philadelphia, PA, <year>1987</year>.
[bib477] [SZ96] and Performance considerations of shared virtual memory machines. IEEE Transactions on Parallel and Distributed Systems, :1185–1194, 1996.
[bib480] [Tak87] An optimal routing method of all-to-all communication on hypercube networks. In 35th Information Processing Society of Japan, <year>1987</year>.
[bib481] [TEL95] and Simultaneous multithreading: Maximizing on-chip parallelism. In Proceedings of the 22nd Annual International Symposium on Computer Architecture, 1995.
[bib482] [Ten90] Adaptive parallel algorithms for integral knapsack problems. Journal of Parallel and Distributed Computing, :400–406, 1990.
[bib485] [Tho83] Fourier transforms in VLSI. IBM Journal of Research and Development, :1047–1057, 1983.
[bib486] [Thr99] Standards: OpenMP: Shared-memory parallelism from the ashes. Computer, :108–109, May 1999.
[bib487] [Tic88] Parallel matrix multiplication on the connection machine. Technical Report RIACS TR 88.41, Research Institute for Advanced Computer Science, NASA Ames Research Center, Moffett Field, CA, 1988.
[bib488] [TK77] and Sorting on a mesh-connected parallel computer. Communications of the ACM, :263–271, 1977.
[bib489] [TL90] and Optimal granularity of grid iteration problems. In Proceedings of the 1990 International Conference on Parallel Processing, I111–I118, 1990.
[bib490] [Tul96] Simultaneous multithreading. Ph.D. Thesis, University of Washington, Seattle, WA, <year>1996</year>.
[bib491] [TV85] and An efficient parallel biconnectivity algorithm. SIAM Journal on Computing, :862–874, November 1985.
[bib492] [TY96] and The superthreaded architecture: Thread pipelining with run-time data dependence checking and control speculation. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 35–46, 1996.
[bib493] [Upf84] A probabilistic relation between desirable and feasible models of parallel computation. In Proceedings of the 16th ACM Symposium on Theory of Computing, 258–265, <year>1984</year>.
[bib494] [UW84] and How to share memory in a distributed system. In Proceedings of the 25th Annual Symposium on Foundations of Computer Science, 171–180, 1984.
[bib495] [Val75] Parallelism in comparison problems. SIAM Journal on Computing, :348–355, September 1975.
[bib496] [Val82] A scheme for fast parallel communication. SIAM Journal on Computing, :350–361, 1982.
[bib498] [Val90b] General purpose parallel architectures. Handbook of Theoretical Computer Science, <year>1990</year>.
[bib499] [Vav89] Gaussian elimination with pivoting is P-complete. SIAM Journal on Discrete Mathematics, :413–423, 1989.
[bib500] [VB81] and Universal schemes for parallel communication. In Proceedings of the 13th ACM Symposium on Theory of Computing, 263–277, 1981.
[bib501] [VC89] Towards a general model for evaluating the relative performance of computer systems. International Journal of Supercomputer Applications, :100–108, 1989.
[bib502] [Vor86] Implementing branch-and-bound in a ring of processors. Technical Report 29, University of Paderborn, Germany, 1986.
[bib503] [Vor87a] The personal supercomputer: A network of transputers. In Proceedings of the 1987 International Conference on Supercomputing, 1987.
[bib504] [Vor87b] Load balancing in a network of transputers. In Proceedings of the Second International Workshop on Distributed Parallel Algorithms, 1987.
[bib505] [VS86] and New classes for parallel complexity: A study of unification and other complete problems for P. IEEE Transactions on Computers, May 1986.
[bib506] [VSBR83] and Fast parallel computation of polynomials using few processors. SIAM Journal on Computing, :641–644, 1983.
[bib507] [WA98] and Parallel Programming: Techniques and Applications Using Networked Workstations and Parallel Computers. Prentice Hall, <year>1998</year>.
[bib509] [Wag87] Hyperquicksort: A fast sorting algorithm for hypercubes. In Proceedings of the Second Conference on Hypercube Multiprocessors, 292–299, 1987.
[bib510] [Wal91] Parallel Processing and Ada. Prentice-Hall, Englewood Cliffs, NJ, <year>1991</year>.
[bib512] [Wei97] Active threads manual. Technical Report TR-97-037, International Computer Science Institute, Berkeley, CA 94704, <year>1997</year>.
[bib513] [WF80] and On a class of multistage interconnection networks. IEEE Transactions on Computers, 669–702, August 1980.
[bib514] [WF84] and Interconnection Networks for Parallel and Distributed Processing. IEEE Computer Society Press, Washington, DC, <year>1984</year>.
[bib515] [WI89] and A distributed shortest path algorithm and its mapping on the Multi-PSI. In Proceedings of the International Conference on Parallel Processing, 1989.
[bib518] [Wil00] Windows 2000 Systems Programming Black Book. The Coriolis Group, <year>2000</year>.
[bib519] [Win77] A new method for computing DFT. In IEEE International Conference on Acoustics, Speech and Signal Processing, 366–368, 1977.
[bib520] [WL88] and Systolic processing for dynamic programming problems. Circuits, Systems, and Signal Processing, :119–149, 1988.
[bib521] [WLY84] and The status of MANIP—a multicomputer architecture for solving combinatorial extremum-search problems. In Proceedings of 11th Annual International Symposium on Computer Architecture, 56–63, 1984.
[bib522] [WM84] and MANIP—a multicomputer architecture for solving combinatorial extremum-search problems. IEEE Transactions on Computers, May 1984.
[bib523] [Woo86] J. V. Woods, editor. Fifth Generation Computer Architectures. North-Holland, Amsterdam, The Netherlands, <year>1986</year>.
[bib524] [Wor88] Information Requirements and the Implications for Parallel Computation. Ph.D. Thesis, Stanford University, Department of Computer Science, Palo Alto, CA, <year>1988</year>.
[bib525] [Wor90] The effect of time constraints on scaled speedup. SIAM Journal on Scientific and Statistical Computing, :838–858, 1990.
[bib526] [Wor91] Limits on parallelism in the numerical solution of linear PDEs. SIAM Journal on Scientific and Statistical Computing, :1–35, January 1991.
[bib527] [WS88] and A balanced bin sort for hypercube multiprocessors. Journal of Supercomputing, :435–448, 1988.
[bib528] [WS89] and Hypercube computing: Connected components. Journal of Supercomputing, :209–234, 1989.
[bib529] [WS91] and Computing biconnected components on a hypercube. Journal of Supercomputing, June 1991. Also available as Technical Report TR 89-7 from the Department of Computer Science, University of Minnesota, Minneapolis, MN.
[bib530] [WY85] and Stochastic modeling of branch-and-bound algorithms with best-first search. IEEE Transactions on Software Engineering, September 1985.
[bib531] [Zho89] Bridging the gap between Amdahl’s law and Sandia laboratory’s result. Communications of the ACM, :1014–1015, 1989.