Bibliography

[bib1] [ABJ82] S. G. Akl, D. T. Bernard, and R. J. Jordan. Design and implementation of a parallel tree search algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-4:192–203, 1982.

[bib2] [ACM91] ACM. Resources in Parallel and Concurrent Systems. ACM Press, New York, NY, <year>1991</year>.

[bib3] [ACS89a] A. Aggarwal, A. K. Chandra, and M. Snir. A model for hierarchical memory. Technical Report RC 15118 (No. 67337), IBM T. J. Watson Research Center, Yorktown Heights, NY, 1989.

[bib4] [ACS89b] A. Aggarwal, A. K. Chandra, and M. Snir. On communication latency in PRAM computations. Technical Report RC 14973 (No. 66882), IBM T. J. Watson Research Center, Yorktown Heights, NY, 1989.

[bib5] [ACS89c] A. Aggarwal, A. K. Chandra, and M. Snir. Communication complexity of PRAMs. Technical Report RC 14998 (64644), IBM T. J. Watson Research Center, Yorktown Heights, NY, 1989.

[bib6] [ADJ+91] A. Agarwal, G. D’Souza, K. Johnson, D. Kranz, J. Kubiatowicz, K. Kurihara, B. -H. Lim, G. Maa, D. Nussbaum, M. Parkin, and D. Yeung. The MIT alewife machine : A large-scale distributed-memory multiprocessor. In Proceedings of Workshop on Scalable Shared Memory Multiprocessors. Kluwer Academic, 1991.

[bib7] [AFKW90] I. Angus, G. C. Fox, J. Kim, and D. W. Walker. Solving Problems on Concurrent Processors: Software for Concurrent Processors: Volume II. Prentice-Hall, Englewood Cliffs, NJ, <year>1990</year>.

[bib8] [AG94] G. S. Almasi and A. Gottlieb. Highly Parallel Computing. Benjamin/Cummings, Redwood City, CA, <year>1994</year>. (Second Edition).

[bib9] [Aga89] A. Agarwal. Performance tradeoffs in multithreaded processors. Technical Report 89-566, Massachusetts Institute of Technology, Microsystems Program Office, Cambridge, MA, 1989.

[bib10] [Aga91] A. Agarwal. Performance tradeoffs in multithreaded processors. Technical report MIT/LCS/TR 501; VLSI memo no. 89-566, Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, MA, 1991.

[bib11] [AGGM90] A. Averbuch, E. Gabber, B. Gordissky, and Y. Medan. A parallel FFT on an MIMD machine. Parallel Computing, 15:6174, 1990.

[bib12] [Agh86] G. Agha. Actors: A Model of Concurrent Computation in Distributed Systems. MIT Press, Cambridge, MA, <year>1986</year>.

[bib13] [AHMP87] H. Alt, T. Hagerup, K. Mehlhorn, and F. P. Preparata. Deterministic simulation of idealized parallel computers on more realistic ones. SIAM Journal of Computing, 16(5):808–835, October 1987.

[bib14] [AHU74] A. V. Aho, J. E. Hopcroft, and J. D. Ullman. The Design and Analysis of Computer Algorithms. Addison-Wesley, Reading, MA, <year>1974</year>.

[bib15] [AJM88] D. P. Agrawal, V. K. Janakiram, and R. Mehrotra. A randomized parallel branch-and-bound algorithm. In Proceedings of the 1988 International Conference on Parallel Processing, 1988.

[bib16] [AK84] M. J. Atallah and S. R. Kosaraju. Graph problems on a mesh-connected processor array. Journal of ACM, 31(3):649–667, July 1984.

[bib17] [Akl85] S. G. Akl. Parallel Sorting Algorithms. Academic Press, San Diego, CA, <year>1985</year>.

[bib18] [Akl89] S. G. Akl. The Design and Analysis of Parallel Algorithms. Prentice-Hall, Englewood Cliffs, NJ, <year>1989</year>.

[bib19] [Akl97] S. G. Akl. Parallel Computation Models and Methods. Prentice-Hall, Englewood Cliffs, NJ, <year>1997</year>.

[bib20] [AKR89] S. Arvindam, V. Kumar, and V. N. Rao. Floorplan optimization on multiprocessors. In Proceedings of the 1989 International Conference on Computer Design, 1989. Also published as Technical Report ACT-OODS-241-89, Microelectronics and Computer Corporation, Austin, TX.

[bib21] [AKR90] S. Arvindam, V. Kumar, and V. N. Rao. Efficient parallel algorithms for search problems: Applications in VLSI CAD. In Proceedings of the Third Symposium on the Frontiers of Massively Parallel Computation, 1990.

[bib22] [AKRS91] S. Arvindam, V. Kumar, V. N. Rao, and V. Singh. Automatic test pattern generation on multiprocessors. Parallel Computing, 17, number 12:1323–1342, December 1991.

[bib23] [AKS83] M. Ajtai, J. Komlos, and E. Szemeredi. An O (n log n) sorting network. In Proceedings of the 15th Annual ACM Symposium on Theory of Computing, 1–9, 1983.

[bib24] [AL93] S. G. Akl and K. A. Lyons. Parallel Computational Geometry. Prentice-Hall, Englewood Cliffs, NJ, <year>1993</year>.

[bib25] [AM88] T. S. Abdelrahman and T. N. Mudge. Parallel branch-and-bound algorithms on hypercube multiprocessors. In Proceedings of the Third Conference on Hypercubes, Concurrent Computers, and Applications, 1492–1499, New York, NY, 1988. ACM Press.

[bib26] [Amd67] G. M. Amdahl. Validity of the single processor approach to achieving large scale computing capabilities. In AFIPS Conference Proceedings, 483–485, 1967.

[bib27] [And91] G. R. Andrews. Concurrent Programming: Principles and Practice. Benjamin/Cummings, Redwood City, CA, <year>1991</year>.

[bib28] [AOB93] B. Abali, F. Ozguner, and A. Bataineh. Balanced parallel sort on hypercube multiprocessors. IEEE Transactions on Parallel and Distributed Systems, 4(5):572–581, May 1993.

[bib29] [AS87] B. Awerbuch and Y. Shiloach. New connectivity and MSF algorithms for shuffle-exchange network and PRAM. IEEE Transactions on Computers, C– 36(10):1258–1263, October 1987.

[bib30] [AU72] A. V. Aho and J. D. Ullman. The Theory of Parsing, Translation and Compiling: Volume 1, Parsing. Prentice-Hall, Englewood Cliffs, NJ, <year>1972</year>.

[bib31] [B+97] L. S. Blackford et al. ScaLAPACK Users’ Guide. SIAM, <year>1997</year>.

[bib32] [BA82] M. Ben-Ari. Principles of Concurrent Programming. Prentice-Hall, Englewood Cliffs, NJ, <year>1982</year>.

[bib33] [Bab88] R. G. Babb. Programming Parallel Processors. Addison-Wesley, Reading, MA, <year>1988</year>.

[bib34] [Bai90] D. H. Bailey. FFTs in external or hierarchical memory. Journal of Supercomputing, 4:23–35, 1990.

[bib35] [Bar68] G. H. Barnes. The ILLIAC IV computer. IEEE Transactions on Computers, C-17(8):746–757, 1968.

[bib36] [Bat68] K. E. Batcher. Sorting networks and their applications. In Proceedings of the 1968 Spring Joint Computer Conference , 307–314, 1968.

[bib37] [Bat76] K. E. Batcher. The Flip network in STARAN. In Proceedings of International Conference on Parallel Processing, 65–71, 1976.

[bib38] [Bat80] K. E. Batcher. Design of a massively parallel processor. IEEE Transactions on Computers, 836–840, September 1980.

[bib39] [Bau78] G. M. Baudet. The Design and Analysis of Algorithms for Asynchronous Multiprocessors. Ph.D. Thesis, Carnegie-Mellon University, Pittsburgh, PA, <year>1978</year>.

[bib40] [BB90] K. P. Belkhale and P. Banerjee. Approximate algorithms for the partitionable independent task scheduling problem. In Proceedings of the 1990 International Conference on Parallel Processing, I72–I75, 1990.

[bib41] [BBN89] BBN Advanced Computers Inc. TC-2000 Technical Product Summary. Cambridge, MA. <year>1989</year>.

[bib42] [BCCL95] R. E. Bixby, W. Cook, A. Cox, and E. K. Lee. Parallel mixed integer programming. Technical Report CRPC TR 95554, Center for Research on Parallel Computation, Research Monograph, 1995.

[bib43] [BCJ90] E. C. Bronson, T. L. Casavant, and L. H. Jamieson. Experimental application-driven architecture analysis of an SIMD/MIMD parallel processing system. IEEE Transactions on Parallel and Distributed Systems, 1(2):195–205, 1990.

[bib44] [BDHM84] D. Bitton, D. J. DeWitt, D. K. Hsiao, and M. J. Menon. A taxonomy of parallel sorting. Computing Surveys, 16(3):287–318, September 1984.

[bib45] [Bel57] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, NJ, <year>1957</year>.

[bib46] [Bel58] R. Bellman. On a routing problem. Quarterly of Applied Mathematics, 16(1):87–90, 1958.

[bib47] [Ben80] J. L. Bentley. A parallel algorithm for constructing minimum spanning trees. Journal of the ACM, 27(1):51–59, March 1980.

[bib48] [Ber84] S. Berkowitz. On computing the determinant in small parallel time using a small number of processors. Information Processing Letters, 18(3):147–150, March 1984.

[bib49] [Ber89] J. Berntsen. Communication efficient matrix multiplication on hypercubes. Parallel Computing, 12:335–342, 1989.

[bib50] [BH82] A. Borodin and J. E. Hopcroft. Routing merging and sorting on parallel models of computation. In Proceedings of the 14th Annual ACM Symposium on Theory of Computing, 338–344, May 1982.

[bib51] [Bix91] R. E. Bixby. Two applications of linear programming. In Proceedings of the Workshop on Parallel Computing of Discrete Optimization Problems, 1991.

[bib52] [BJK+95] R. Blumofe, C. Joerg, B. Kuszmaul, C. Leiserson, K. Randall, and Y. Zhou. Cilk: An efficient multithreaded runtime system. In Proceedings of the 5th Symposium on Principles and Practice of Parallel Programming, 1995.

[bib53] [BKH89] S. Bershader, T. Kraay, and J. Holland. The giant-Fourier-transform. In Proceedings of the Fourth Conference on Hypercubes, Concurrent Computers, and Applications: Volume I, 387–389, 1989.

[bib54] [Ble90] G. E. Blelloch. Vector Models for Data-Parallel Computing. MIT Press, Cambridge, MA, <year>1990</year>.

[bib55] [BMCP98] A. Brungger, A. Marzetta, J. Clausen, and M. Perregaard. Solving large-scale qap problems in parallel with the search library zram. Journal of Parallel and Distributed Computing, 50:157–169, 1998.

[bib56] [BNK92] A. Bar-Noy and S. Kipnis. Designing broadcasting algorithms in the postal model for message-passing systems. In Proceedings of 4th ACM Symposium on Parallel Algorithms and Architectures, 13–22, 1992.

[bib57] [BOS+91] D. P. Bertsekas, C. Ozveren, G. D. Stamoulis, P. Tseng, and J. N. Tsitsiklis. Optimal communication algorithms for hypercubes. Journal of Parallel and Distributed Computing, 11:263–275, 1991.

[bib58] [BR90] R. Boppana and C. S. Raghavendra. On optimal and practical routing methods for a massive data movement operation on hypercubes. Technical report, University of Southern California, Los Angeles, CA, 1990.

[bib59] [Bra97] R. Bramley. Technology news & reviews: Chemkin software; OpenMP Fortran Standard; ODE toolbox for Matlab; Java products; Scientific WorkPlace 3.0. IEEE Computational Science and Engineering, 4(4):75–78, October/December 1997.

[bib60] [Bro79] K. Brown. Dynamic programming in computer science. Technical Report CMU-CS-79-106, Carnegie Mellon University, Pittsburgh, PA, 1979.

[bib61] [BS78] G. M. Baudet and D. Stevenson. Optimal sorting algorithms for parallel computers. IEEE Transactions on Computers, C–27(1):84–87, January 1978.

[bib62] [BT89] D. P. Bertsekas and J. N. Tsitsiklis. Parallel and Distributed Computation: Numerical Methods. Prentice-Hall, NJ, <year>1989</year>.

[bib63] [BT97] D. P. Bertsekas and J. N. Tsitsiklis. Parallel and Distributed Computation: Numerical Methods. Athena Scientific, <year>1997</year>.

[bib64] [But97] D. R. Butenhof. Programming with POSIX Threads. Addison-Wesley, Reading, MA, <year>1997</year>.

[bib65] [Buy99] R.Buyya, editor. High Performance Cluster Computing: Architectures and Systems. Prentice Hall, <year>1999</year>.

[bib66] [BW89] M. L. Barton and G. R. Withers. Computing performance as a function of the speed, quantity, and the cost of processors. In Supercomputing ’89 Proceedings, 759–764, 1989.

[bib67] [BW97] J. Beveridge and R. Wiener. Multithreading Applications in Win32: the Complete Guide to Threads. Addison-Wesley Developers Press, Reading, MA, <year>1997</year>.

[bib68] [C+95] J. Choi et al. A proposal for a set of Parallel Basic Linear Algebra Subprograms. Technical Report CS-95-292, Computer Science Department, University of Tennessee, 1995.

[bib69] [CAHH91] N. P. Chrisopchoides, M. Aboelaze, E. N. Houstis, and C. E. Houstis. The parallelization of some level 2 and 3 BLAS operations on distributed-memory machines. In Proceedings of the First International Conference of the Austrian Center of Parallel Computation. Springer-Verlag Series Lecture Notes in Computer Science, 1991.

[bib70] [Can69] L. E. Cannon. A cellular computer to implement the Kalman Filter Algorithm. Ph.D. Thesis, Montana State University, Bozman, MT, <year>1969</year>.

[bib71] [Car89] G.F.Carey, editor. Parallel Supercomputing: Methods, Algorithms and Applications. Wiley, New York, NY, <year>1989</year>.

[bib72] [CD87] S. Chandran and L. S. Davis. An approach to parallel vision algorithms. In R.Porth, editor, Parallel Processing. SIAM, Philadelphia, PA, <year>1987</year>.

[bib73] [CDK+00] R.Chandra, L.Dagum, D.Kohr, D.Maydan, J.McDonald, and R.M. (editors). Parallel Programming in OpenMP. Morgan Kaufmann Publishers, <year>2000</year>.

[bib74] [CG87] E. Chu and A. George. Gaussian elimination with partial pivoting and load balancing on a multiprocessor. Parallel Computing, 5:65–74, 1987.

[bib75] [CGK93] D. Challou, M. Gini, and V. Kumar. Parallel search algorithms for robot motion planning. In Proceedings of the IEEE Conference on Robotics and Automation, 46–51, 1993.

[bib76] [CGL92] D. E. Culler, M. Gunter, and J. C. Lee. Analysis of multithreaded microprocessors under multiprogramming. Report UCB/CSD 92/687, University of California, Berkeley, Computer Science Division, Berkeley, CA, May 1992.

[bib77] [Cha79] A. K. Chandra. Maximal parallelism in matrix multiplication. Technical Report RC-6193, IBM T. J. Watson Research Center, Yorktown Heights, NY, 1979.

[bib78] [Cha87] R. Chamberlain. An alternate view of LU factorization on a hypercube multiprocessor. In M.T.Heath, editor, Hypercube Multiprocessors 1987, 569–575. SIAM, Philadelphia, PA, <year>1987</year>.

[bib79] [CJP83] H. Crowder, E. L. Johnson, and M. Padberg. Solving large-scale zero-one linear programming problem. Operations Research, 2:803–834, 1983.

[bib80] [CKP+93a] D. Culler, R. Karp, D. Patterson, A. Sahay, K. Schauser, E. Santos, R. Subramonian, and T. von Eicken. LogP: Towards a realistic model of parallel computation. In Proceedings of the Fourth ACM SIGPLAN Symposium on Principles and Practices of Parallel Programming, 1–12, 1993.

[bib81] [CKP+93b] D. E. Culler, R. Karp, D. A. Patterson, et al. Logp: Towards a realistic model of parallel computation. In Principles and Practices of Parallel Programming, <year>May 1993</year>.

[bib82] [CL93] B. Codenotti and M. Leoncini. Introduction to Parallel Processing. Addison-Wesley, <year>1993</year>.

[bib83] [CLC81] F. Y. Chin, J. Lam, and I. Chen. Optimal parallel algorithms for the connected component problem. In Proceedings of the 1981 International Conference on Parallel Processing, 170–175, 1981.

[bib84] [CLC82] F. Y. Chin, J. Lam, and I. Chen. Efficient parallel algorithms for some graph problems. Communications of the ACM, 25(9):659–665, September 1982.

[bib85] [CLR90] T. H. Cormen, C. E. Leiserson, and R. L. Rivest. Introduction to Algorithms. MIT Press, McGraw-Hill, New York, NY, <year>1990</year>.

[bib86] [CM82] K. M. Chandy and J. Misra. Distributed computation on graphs: Shortest path algorithms. Communications of the ACM, 25(11):833–837, November 1982.

[bib87] [CM98] B. Chapman and P. Mehrotra. OpenMP and HPF: Integrating two paradigms. Lecture Notes in Computer Science, 1470, <year>1998</year>.

[bib88] [Col88] R. Cole. Parallel merge sort. SIAM Journal on Computing, 17(4):770–785, August 1988.

[bib89] [Col89] M. Cole. Algorithmic Skeletons: Structured Management of Parallel Computation. MIT Press, Cambridge, MA, <year>1989</year>.

[bib90] [Con89] T. Conlon. Programming in PARLOG. Addison-Wesley, Reading, MA, <year>1989</year>.

[bib91] [CR89] E. A. Carmona and M. D. Rice. A model of parallel performance. Technical Report AFWL-TR-89-01, Air Force Weapons Laboratory, 1989.

[bib92] [CR91] E. A. Carmona and M. D. Rice. Modeling the serial and parallel fractions of a parallel algorithm. Journal of Parallel and Distributed Computing, 1991.

[bib93] [CS88] B. V. Cherkassky and R. Smith. Efficient mapping and implementations of matrix algorithms on a hypercube. Journal of Supercomputing, 2:7–27, 1988.

[bib94] [CSG98] D. E. Culler, J. P. Singh, and A. Gupta. Parallel Computer Architecture: A Hardware/Software Approach. Morgan Kaufmann, <year>1998</year>.

[bib95] [CT92] K. M. Chandy and S. Taylor. An Introduction to Parallel Programming. Jones and Bartlett, Austin, TX, <year>1992</year>.

[bib96] [CV91] B. S. Chlebus and I. Vrto. Parallel quicksort. Journal of Parallel and Distributed Processing, 1991.

[bib97] [Cve87] Z. Cvetanovic. Performance analysis of the FFT algorithm on a shared-memory parallel architecture. IBM Journal of Research and Development, 31(4):435–451, 1987.

[bib98] [CWP98] A. Cohen, M. Woodring, and R. Petrusha. Win32 Multithreaded Programming. O’Reilly & Associates, <year>1998</year>.

[bib99] [D+92] W. J. Dally et al. The message-driven processor. IEEE Micro, 12(2):23–39, 1992.

[bib100] [Dal87] W. J. Dally. A VLSI Architecture for Concurrent Data Structures. Kluwer Academic Publishers, Boston, MA, <year>1987</year>.

[bib101] [Dal90a] W. J. Dally. Analysis of k-ary n-cube interconnection networks. IEEE Transactions on Computers, 39(6), June 1990.

[bib102] [Dal90b] W. J. Dally. Network and processor architecture for message-driven computers. In R.Sauya and G.Birtwistle, editors, VLSI and Parallel Computation. Morgan Kaufmann, San Mateo, CA, <year>1990</year>.

[bib103] [Dav86] G. J. Davis. Column LU factorization with pivoting on a hypercube multiprocessor. SIAM Journal on Algebraic and Discrete Methods, 7:538–550, 1986. Also available as Technical Report ORNL-6219, Oak Ridge National Laboratory, Oak Ridge, TN, 1985.

[bib104] [DCG90] J. D. DeMello, J. L. Calvet, and J. M. Garcia. Vectorization and multitasking of dynamic programming in control: experiments on a CRAY-2. Parallel Computing, 13:261–269, 1990.

[bib105] [DDSV99] J. Dongarra, I. S. Duff, D. Sorensen, and H. V. Vorst. Numerical Linear Algebra for High Performance Computers (Software, Environments, Tools). SIAM, <year>1999</year>.

[bib106] [DeC89] A. L. DeCegama. The Technology of Parallel Processing: Parallel Processing Architectures and VLSI Hardware: Volume 1. Prentice-Hall, Englewood Cliffs, NJ, <year>1989</year>.

[bib107] [DEH89] P. M. Dew, R. A. Earnshaw, and T. R. Heywood. Parallel Processing for Computer Vision and Display. Addison-Wesley, Reading, MA, <year>1989</year>.

[bib108] [Dem82] J. Deminet. Experiences with multiprocessor algorithms. IEEE Transactions on Computers, C-31(4):278–288, 1982.

[bib109] [DFHM82] D. J. DeWitt, D. B. Friedland, D. K. Hsiao, and M. J. Menon. A taxonomy of parallel sorting algorithms. Technical Report TR-482, Computer Sciences Department, University of Wisconsin, Madison, WI, 1982.

[bib110] [DFRC96] F. Dehne, A. Fabri, and A. Rau-Chaplin. Scalable parallel computational geometry for coarse grained multicomputers. International Journal on Computational Geometry, 6(3):379–400, 1996.

[bib111] [DHvdV93] J. W. Demmel, M. T. Heath, and H. A. van der Vorst. Parallel numerical linear algebra. Acta Numerica, 111–197, 1993.

[bib112] [Dij59] E. W. Dijkstra. A note on two problems in connection with graphs. Numerische Mathematik, 1:269–271, 1959.

[bib113] [DM93] S. Dutt and N. R. Mahapatra. Parallel A* algorithms and their performance on hypercube multiprocessors. In Proceedings of the Seventh International Parallel Processing Symposium, 797–803, 1993.

[bib114] [DM98] L. Dagum and R. Menon. OpenMP: An industry-standard API for shared-memory programming. IEEE Computational Science and Engineering, 5(1):46–55, January/March 1998.

[bib115] [DNS81] E. Dekel, D. Nassimi, and S. Sahni. Parallel matrix and graph algorithms. SIAM Journal on Computing, 10:657–673, 1981.

[bib116] [Dra96] D. G. Drake. Introduction to Java threads. JavaWorld: IDG’s magazine for the Java community, 1(2), April 1996.

[bib117] [DRGNP] F. Darema-Rogers, D. George, V. Norton, and G. Pfister. VM parallel environment. In Proceedings of the IBM Kingston Parallel Processing Symposium.

[bib118] [DS86] W. J. Dally and C. L. Seitz. The torus routing chip. Journal of Distributed Computing, 1(3):187–196, 1986.

[bib119] [DS87] W. J. Dally and C. L. Seitz. Deadlock-free message routing in multiprocessor interconnection networks. IEEE Transactions on Computers, C-36(5):547– 553, 1987.

[bib120] [DSG83] E. W. Dijkstra, W. H. Seijen, and A. J. M. V. Gasteren. Derivation of a termination detection algorithm for a distributed computation. Information Processing Letters, 16(5):217–219, 1983.

[bib121] [DT89] L. Desbat and D. Trystram. Implementing the discrete Fourier transform on a hypercube vector-parallel computer. In Proceedings of the Fourth Conference on Hypercubes, Concurrent Computers, and Applications: Volume I, 407– 410, 1989.

[bib122] [DV87] K. A. Doshi and P. J. Varman. Optimal graph algorithms on a fixed-size linear array. IEEE Transactions on Computers, C–36(4):460–470, April 1987.

[bib123] [dV89] E. F. V. de Velde. Multicomputer matrix computations: Theory and practice. In Proceedings of the Fourth Conference on Hypercubes, Concurrent Computers, and Applications, 1303–1308, 1989.

[bib124] [DY81] N. Deo and Y. B. Yoo. Parallel algorithms for the minimum spanning tree problem. In Proceedings of the 1981 International Conference on Parallel Processing, 188–189, 1981.

[bib125] [Eck94] J. Eckstein. Parallel branch-and-bound methods for mixed-integer programming on the cm-5. SIAM Journal on Optimization, 4(4):794–814, 1994.

[bib126] [Eck97] J. Eckstein. Distributed versus centralized storage and control for parallel branch and bound: Mixed integer programming on the cm-5. Computational Optimization and Applications, 7(2):199–220, 1997.

[bib127] [Ede89] A. Edelman. Optimal matrix transposition and bit-reversal on hypercubes: Node address–memory address exchanges. Technical report, Thinking Machines Corporation, Cambridge, MA, 1989.

[bib128] [EDH80] O. I. El-Dessouki and W. H. Huen. Distributed enumeration on network computers. IEEE Transactions on Computers, C-29:818–825, September 1980.

[bib129] [EHHR88] S. C. Eisenstat, M. T. Heath, C. S. Henkel, and C. H. Romine. Modified cyclic algorithms for solving triangular systems on distributed-memory multiprocessors. SIAM Journal on Scientific and Statistical Computing, 9(3):589–600, 1988.

[bib130] [EHMN90] M. Evett, J. Hendler, A. Mahanti, and D. Nau. PRA*: A memory-limited heuristic search procedure for the connection machine. In Proceedings of the Third Symposium on the Frontiers of Massively Parallel Computation, 145– 149, 1990.

[bib131] [Ekl72] J. O. Eklundh. A fast computer method for matrix transposing. IEEE Transactions on Computers, 21(7):801–803, 1972.

[bib132] [Ert92] W. Ertel. OR—parallel theorem proving with random competition. In A.Voronokov, editor, LPAR ’92: Logic Programming and Automated Reasoning, 226–237. Springer-Verlag, New York, NY, <year>1992</year>.

[bib133] [EZL89] D. L. Eager, J. Zahorjan, and E. D. Lazowska. Speedup versus efficiency in parallel systems. IEEE Transactions on Computers, 38(3):408–423, 1989.

[bib134] [Fen81] T. Y. Feng. A survey of interconnection networks. IEEE Computer, 12–27, December 1981.

[bib135] [FF82] R. A. Finkel and J. P. Fishburn. Parallelism in alpha-beta search. Artificial Intelligence, 19:89–106, 1982.

[bib136] [FF86] G. C. Fox and W. Furmanski. Optimal communication algorithms on hypercube. Technical Report CCCP-314, California Institute of Technology, Pasadena, CA, 1986.

[bib137] [FJDS96] L. Fosdick, E. Jessup, G. Domik, and C. Schauble. Introduction to High-Performance Scientific Computing. MIT Press, <year>1996</year>.

[bib138] [FJL+88] G. C. Fox, M. Johnson, G. Lyzenga, S. W. Otto, J. Salmon, and D. Walker.Solving Problems on Concurrent Processors: Volume 1. Prentice-Hall, Englewood Cliffs, NJ, <year>1988</year>.

[bib139] [FK88] C. Ferguson and R. Korf. Distributed tree search and its application to alpha-beta pruning. In Proceedings of the 1988 National Conference on Artificial Intelligence, 1988.

[bib140] [FK89] H. P. Flatt and K. Kennedy. Performance of parallel processors. Parallel Computing, 12:1–20, 1989.

[bib141] [FKO86] E. Felten, S. Karlin, and S. W. Otto. Sorting on a hypercube. Caltech/JPL, 1986. Hm 244.

[bib142] [Fla90] H. P. Flatt. Further applications of the overhead model for parallel systems. Technical Report G320-3540, IBM Corporation, Palo Alto Scientific Center, Palo Alto, CA, 1990.

[bib143] [Flo62] R. W. Floyd. Algorithm 97: Shortest path. Communications of the ACM, 5(6):345, June 1962.

[bib144] [Fly72] M. J. Flynn. Some computer organizations and their effectiveness. IEEE Transactions on Computers, C-21(9):948–960, 1972.

[bib145] [Fly95] M. J. Flynn. Computer Architecture: Pipelined and Parallel Processor Design. Jones and Bartlett, <year>1995</year>.

[bib146] [FM70] W. D. Frazer and A. C. McKellar. Samplesort: A sampling approach to minimal storage tree sorting. Journal of the ACM, 17(3):496–507, July 1970.

[bib147] [FM87] R. A. Finkel and U. Manber. DIB—a distributed implementation of backtracking. ACM Transactions on Programming Languages and Systems, 9(2):235–256, April 1987.

[bib148] [FM92] R. Frye and J. Myczkowski. Load balancing algorithms on the connection machine and their use in Monte-Carlo methods. In Proceedings of the Unstructured Scientific Computation on Multiprocessors Conference, 1992.

[bib149] [FMM94] R. Feldmann, P. Mysliwietz, and B. Monien. Studying overheads in massively parallel min/max-tree evaluation. In Proc. of the 6th ACM Symposium on Parallel Algorithms and Architectures, 94–103, 1994.

[bib150] [FOH87] G. C. Fox, S. W. Otto, and A. J. G. Hey. Matrix algorithms on a hypercube I: Matrix multiplication. Parallel Computing, 4:17–31, 1987.

[bib151] [Fos95] I. Foster. Designing and Building Parallel Programs: Concepts and Tools for Parallel Software Engineering. Addison-Wesley, <year>1995</year>.

[bib152] [Fou94] T. J. Fountain. Parallel Computing: Principles and Practice. Cambridge University Press, <year>1994</year>.

[bib153] [FR62] L. R. Ford and R. L. Rivest. Flows in Networks. Princeton University Press, Princeton, NJ, <year>1962</year>.

[bib154] [Fra93] M. Franklin. The multiscalar architecture. Technical Report CS-TR-1993-1196, University of Wisconsin, 1993.

[bib155] [FTI90] M. Furuichi, K. Taki, and N. Ichiyoshi. A multi-level load balancing scheme for OR-parallel exhaustive search programs on the Multi-PSI. In Proceedings of the Second ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 50–59, 1990.

[bib156] [FW78] S. Fortune and J. Wyllie. Parallelism in random access machines. In Proceedings of ACM Symposium on Theory of Computing, 114–118, 1978.

[bib157] [Gal95] B. O. Gallmeister. Posix. 4 : Programming for the Real World. O’Reilly & Associates, <year>1995</year>.

[bib158] [GBD+94] G. A. Geist, A. Beguelin, J. Dongarra, W. Jiang, R. Manchek, and V. Sunderam. PVM: Parallel Virtual Machine. MIT Press, Cambridge, MA, <year>1994</year>.

[bib159] [Gei85] G. A. Geist. Efficient parallel LU factorization with pivoting on a hypercube multiprocessor. Technical Report ORNL-6211, Oak Ridge National Laboratory, Oak Ridge, TN, 1985.

[bib160] [GGK+83] A. Gottlieb, R. Grishman, C. P. Kruskal, K. P. McAuliffe, L. Rudolph, and M. Snir. The NYU Ultracomputer—designing a MIMD, shared memory parallel computer. IEEE Transactions on Computers, C–32(2):175–189, February 1983.

[bib161] [GGK93] A. Y. Grama, A. Gupta, and V. Kumar. Isoefficiency: Measuring the scalability of parallel algorithms and architectures. IEEE Parallel and Distributed Technology, 1(3):12–21, August 1993.

[bib162] [GH85] G. A. Geist and M. T. Heath. Parallel Cholesky factorization on a hypercube multiprocessor. Technical Report ORNL-6190, Oak Ridge National Laboratory, Oak Ridge, TN, 1985.

[bib163] [GH86] G. A. Geist and M. T. Heath. Matrix factorization on a hypercube multiprocessor. In M.T.Heath, editor, Hypercube Multiprocessors 1986, 161–180. SIAM, Philadelphia, PA, <year>1986</year>.

[bib164] [GH01] S. Goedecker and A. Hoisie. Performance Optimization of Numerically Intensive Codes. SIAM, <year>2001</year>.

[bib165] [Gib85] A. Gibbons. Algorithmic Graph Theory. Cambridge University Press, Cambridge, <year>1985</year>.

[bib166] [Gib89] P. B. Gibbons. A more practical PRAM model. In Proceedings of the 1989 ACM Symposium on Parallel Algorithms and Architectures, 158–168, 1989.

[bib167] [GK91] A. Gupta and V. Kumar. The scalability of matrix multiplication algorithms on parallel computers. Technical Report TR 91-54, Department of Computer Science, University of Minnesota, Minneapolis, MN, 1991. A short version appears in Proceedings of 1993 International Conference on Parallel Processing, pages III-115–III-119, 1993.

[bib168] [GK93a] A. Gupta and V. Kumar. Performance properties of large scale parallel systems. Journal of Parallel and Distributed Computing, 19:234–244, 1993. Also available as Technical Report TR 92-32, Department of Computer Science, University of Minnesota, Minneapolis, MN.

[bib169] [GK93b] A. Gupta and V. Kumar. The scalability of FFT on parallel computers. IEEE Transactions on Parallel and Distributed Systems, 4(8):922–932, August 1993. A detailed version is available as Technical Report TR 90-53, Department of Computer Science, University of Minnesota, Minneapolis, MN.

[bib170] [GKP92] A. Grama, V. Kumar, and P. M. Pardalos. Parallel processing of discrete optimization problems. In Encyclopaedia of Microcomputers. Marcel Dekker Inc., New York, <year>1992</year>.

[bib171] [GKR91] A. Y. Grama, V. Kumar, and V. N. Rao. Experimental evaluation of load balancing techniques for the hypercube. In Proceedings of the Parallel Computing ’91 Conference, 497–514, 1991.

[bib172] [GKRS96] A. Grama, V. Kumar, S. Ranka, and V. Singh. A3: A simple and asymptotically accurate model for parallel computation. In Proceedings of the Sixth Symposium on Frontiers of Massively Parallel Computing, Annapolis, MD, 1996.

[bib173] [GKS92] A. Gupta, V. Kumar, and A. H. Sameh. Performance and scalability of preconditioned conjugate gradient methods on parallel computers. Technical Report TR 92-64, Department of Computer Science, University of Minnesota, Minneapolis, MN, 1992. A short version appears in Proceedings of the Sixth SIAM Conference on Parallel Processing for Scientific Computing, pages 664–674, 1993.

[bib174] [GKT79] L. J. Guibas, H. T. Kung, and C. D. Thompson. Direct VLSI Implementation of Combinatorial Algorithms. In Proceedings of Conference on Very Large Scale Integration, California Institute of Technology, 509–525, 1979.

[bib175] [GL96a] G. H. Golub and C. V. Loan. Matrix Computations. The Johns Hopkins University Press, Baltimore, MD, <year>1996</year>.

[bib176] [GL96b] W. D. Gropp and E. Lusk. User’s Guide for mpich, a Portable Implementation of MPI. Mathematics and Computer Science Division, Argonne National Laboratory. ANL-96/6. <year>1996</year>.

[bib177] [GLDS96] W. Gropp, E. Lusk, N. Doss, and A. Skjellum. A high-performance, portable implementation of the MPI message passing interface standard. Parallel Computing, 22(6):789–828, September 1996.

[bib178] [GLS99] W. Gropp, E. Lusk, and A. Skjellum. Using MPI. MIT Press, <year>1999</year>. 2nd Edition.

[bib179] [GMB88] J. L. Gustafson, G. R. Montry, and R. E. Benner. Development of parallel methods for a 1024-processor hypercube. SIAM Journal on Scientific and Statistical Computing, 9(4):609–638, 1988.

[bib180] [GO93] G. H. Golub and J. M. Ortega. Scientific Computing: An Introduction with Parallel Computing. Academic Press, <year>1993</year>.

[bib181] [GPS90] K. A. Gallivan, R. J. Plemmons, and A. H. Sameh. Parallel algorithms for dense linear algebra computations . SIAM Review, 32(1):54–135, March 1990. Also appears in K. A. Gallivan et al. Parallel Algorithms for Matrix Computations. SIAM, Philadelphia, PA, <year>1990</year>.

[bib182] [GR88] G. A. Geist and C. H. Romine. LU factorization algorithms on distributed-memory multiprocessor architectures. SIAM Journal on Scientific and Statistical Computing, 9(4):639–649, 1988. Also available as Technical Report ORNL/TM-10383, Oak Ridge National Laboratory, Oak Ridge, TN, 1987.

[bib183] [GR90] A. Gibbons and W. Rytter. Efficient Parallel Algorithms. Cambridge University Press, Cambridge, UK, <year>1990</year>.

[bib184] [Gre91] S. Green. Parallel Processing for Computer Graphics. MIT Press, Cambridge, MA, <year>1991</year>.

[bib185] [GSNL98] W. Gropp, M. Snir, W. Nitzberg, and E. Lusk. MPI: The Complete Reference. MIT Press, <year>1998</year>.

[bib186] [GT88] A. V. Goldberg and R. E. Tarjan. A new approach to the maximum-flow problem. Journal of the ACM, 35(4):921–940, October 1988.

[bib187] [Gup87] A. Gupta. Parallelism in Production Systems. Morgan Kaufmann, Los Altos, CA, <year>1987</year>.

[bib188] [Gus88] J. L. Gustafson. Reevaluating Amdahl’s law. Communications of the ACM, 31(5):532–533, 1988.

[bib189] [Gus92] J. L. Gustafson. The consequences of fixed time performance measurement. In Proceedings of the 25th Hawaii International Conference on System Sciences: Volume III, 113–124, 1992.

[bib190] [HB84] K. Hwang and F. A. Briggs. Computer Architecture and Parallel Processing. McGraw-Hill, New York, NY, <year>1984</year>.

[bib191] [HB88] M. M. Huntbach and F. W. Burton. Alpha-beta search on virtual tree machines. Information Science, 44:3–17, 1988.

[bib192] [HCH95] F. -H. Hsu, M. S. Campbell, and A. J. Hoane. Deep Blue system overview. In Proceedings of the 1995 International Conference on Supercomputing, Barcelona, Spain, 240–244, 1995.

[bib193] [HCS79] D. S. Hirschberg, A. K. Chandra, and D. V. Sarwate. Computing connected components on parallel computers. Communications of the ACM, 22(8):461– 464, August 1979.

[bib194] [HD87] S. -R. Huang and L. S. Davis. A tight upper bound for the speedup of parallel best-first branch-and-bound algorithms. Technical report, Center for Automation Research, University of Maryland, College Park, MD, 1987.

[bib195] [HD89a] S. R. Huang and L. S. Davis. Parallel iterative a* search: An admissible distributed heuristic search algorithm. In Proceedings of the Eleventh International Joint Conference on Artificial Intelligence, 23–29, 1989.

[bib196] [HD89b] K. Hwang and D. DeGroot. Parallel Processing for Supercomputers and Artificial Intelligence. McGraw-Hill, New York, NY, <year>1989</year>.

[bib197] [HDM97] J. Hill, S. Donaldson, and A. McEwan. Installation and user guide for the oxford bsp toolset: User guide for the oxford bsp toolset (v1.3) implementation of bsplib. Technical report, Oxford University Computing Laboratory, 1997.

[bib198] [Hea85] M. T. Heath. Parallel Cholesky factorization in message-passing multiprocessor environments. Technical Report ORNL-6150, Oak Ridge National Laboratory, Oak Ridge, TN, 1985.

[bib199] [HG97] S. Hamilton and L. Garber. Deep Blue’s hardware-software synergy. IEEE Computer, 30(10):29–35, October 1997.

[bib200] [Hil85] W. D. Hillis. The Connection Machine. MIT Press, Cambridge, MA, <year>1985</year>.

[bib201] [Hil90] M. D. Hill. What is scalability? Computer Architecture News, 18(4), 1990.

[bib202] [Hip89] P. G. Hipes. Matrix multiplication on the JPL/Caltech Mark IIIfp hypercube. Technical Report C3P 746, Concurrent Computation Program, California Institute of Technology, Pasadena, CA, 1989.

[bib203] [Hir76] D. S. Hirschberg. Parallel algorithms for the transitive closure and connected component problem. In Proceedings of the 8th Annual ACM Symposium on the Theory of Computing, 55–57, 1976.

[bib204] [Hir78] D. S. Hirschberg. Fast parallel sorting algorithms. Communications of the ACM, 21(8):657–666, August 1978.

[bib205] [HJ87] C. -T. Ho and S. L. Johnsson. Spanning balanced trees in Boolean cubes. Technical Report YALEU/DCS/RR-508, Department of Computer Science, Yale University, New Haven, CT, 1987.

[bib206] [HJE91] C. -T. Ho, S. L. Johnsson, and A. Edelman. Matrix multiplication on hypercubes using full bandwidth and constant storage. In Proceedings of the 1991 International Conference on Parallel Processing, 447–451, 1991.

[bib207] [HK96] S. Hambrusch and A. Khokhar. C3: A parallel model for coarse-grained machines. Journal of Parallel and Distributed Computing, 32(2):139–154, February 1996.

[bib208] [HLM84] R. E. Hiromoto, O. M. Lubeck, and J. Moore. Experiences with the Denelcor HEP. Parallel Computing, 1(3–4):197–206, 1984.

[bib209] [HLV90] S. H. S. Huang, H. Liu, and V. Vishwanathan. A sub-linear parallel algorithm for some dynamic programming problems. In Proceedings of the 1990 International Conference on Parallel Processing, III–261–III–264, 1990.

[bib210] [HM80] D. K. Hsiao and M. J. Menon. Parallel record-sorting methods for hardware realization. Osu-cisrc-tr-80-7, Computer Science Information Department, Ohio State University, Columbus, OH, 1980.

[bib211] [HMT+96] H. Hum, O. Maquelin, K. Theobald, X. Tian, and G. Gao. A study of the earth-manna multithreaded system. Intl. J. of Par. Prog., 24:319–347, 1996.

[bib212] [HNR90] P. Heidelberger, A. Norton, and J. T. Robinson. Parallel quicksort using fetch-and-add. IEEE Transactions on Computers, C-39(1):133–138, January 1990.

[bib213] [Hoa62] C. A. R. Hoare. Quicksort. Computer Journal, 5:10–15, 1962.

[bib214] [HP89] S. W. Hornick and F. P. Preparata. Deterministic PRAM simulation with constant redundancy. In Proceedings of the 1989 ACM Symposium on Parallel Algorithms and Architectures, 103–109, 1989.

[bib215] [HQ91] P. J. Hatcher and M. J. Quinn. Data Parallel Programming. MIT Press, Cambridge, MA, <year>1991</year>.

[bib216] [HR88] M. T. Heath and C. H. Romine. Parallel solution of triangular systems on distributed-memory multiprocessors. SIAM Journal on Scientific and Statistical Computing, 9(3):558–588, 1988.

[bib217] [HR91] C. -T. Ho and M. T. Raghunath. Efficient communication primitives on circuit-switched hypercubes. In Sixth Distributed Memory Computing Conference Proceedings, 390–397, 1991.

[bib218] [HS78] E. Horowitz and S. Sahni. Fundamentals of Computer Algorithms. Computer Science Press, Rockville, MD, <year>1978</year>.

[bib219] [HS86] W. D. Hillis and G. L. Steele. Data parallel algorithms. Communications of the ACM, 29(12):1170–1183, 1986.

[bib220] [Hsu90] F. -H. Hsu. Large scale parallelization of alpha-beta search: An algorithmic and architectural study with computer chess. Technical report, Carnegie Mellon University, Pittsburgh, PA, 1990. Ph.D. Thesis.

[bib221] [Hua85] M. A. Huang. Solving some graph problems with optimal or near-optimal speedup on mesh-of-trees networks. In Proceedings of the 26th Annual IEEE Symposium on Foundations of Computer Science, 232–340, 1985.

[bib222] [HX98] K. Hwang and Z. Xu. Scalable Parallel Computing. McGraw-Hill, New York, NY, <year>1998</year>.

[bib223] [Hyd99] P. Hyde. Java Thread Programming. Sams, <year>1999</year>.

[bib224] [IPS91] O. H. Ibarra, T. C. Pong, and S. M. Sohn. Parallel recognition and parsing on the hypercube. IEEE Transactions on Computers, 40(6):764–770, June 1991.

[bib225] [IYF79] M. Imai, Y. Yoshida, and T. Fukumura. A parallel searching scheme for multiprocessor systems and its application to combinatorial problems. In Proceedings of the International Joint Conference on Artificial Intelligence, 416–418, 1979.

[bib226] [Jaj92] J. Jaja. An Introduction to Parallel Algorithms. Addison-Wesley, Reading, MA, <year>1992</year>.

[bib227] [JAM87] V. K. Janakiram, D. P. Agrawal, and R. Mehrotra. Randomized parallel algorithms for Prolog programs and backtracking applications. In Proceedings of the 1987 International Conference on Parallel Processing, 278–281, 1987.

[bib228] [JAM88] V. K. Janakiram, D. P. Agrawal, and R. Mehrotra. A randomized parallel backtracking algorithm. IEEE Transactions on Computers, C-37(12), 1988.

[bib229] [JGD87] L.H.Jamieson, D.B.Gannon, and R.J.Douglass, editors. The Characteristics of Parallel Algorithms. MIT Press, Cambridge, MA, <year>1987</year>.

[bib230] [JH88] S. L. Johnsson and C. -T. Ho. Matrix transposition on Boolean n-cube configured ensemble architectures. SIAM Journal on Matrix Analysis and Applications, 9(3):419–454, July 1988.

[bib231] [JH89] S. L. Johnsson and C. -T. Ho. Optimum broadcasting and personalized communication in hypercubes. IEEE Transactions on Computers, 38(9):1249– 1268, September 1989.

[bib232] [JH91] S. L. Johnsson and C. -T. Ho. Optimal all-to-all personalized communication with minimum span on Boolean cubes. In Sixth Distributed Memory Computing Conference Proceedings, 299–304, 1991.

[bib233] [JKFM89] S. L. Johnsson, R. Krawitz, R. Frye, and D. McDonald. A radix-2 FFT on the connection machine. Technical report, Thinking Machines Corporation, Cambridge, MA, 1989.

[bib234] [JNS97] L. Johnson, G. Nemhauser, and M. Savelsbergh. Progress in integer programming: An exposition. Technical report, School of Industrial and Systems Engineering, Georgia Institute of Technology, 1997. Available from http://akula.isye.gatech.edu/mwps/mwps.html.

[bib235] [Joh77] D. B. Johnson. Efficient algorithms for shortest paths in sparse networks. Journal of the ACM, 24(1):1–13, March 1977.

[bib236] [Joh84] S. L. Johnsson. Combining parallel and sequential sorting on a boolean n-cube. In Proceedings of International Conference on Parallel Processing, 1984.

[bib237] [Joh87] S. L. Johnsson. Communication efficient basic linear algebra computations on hypercube architectures. Journal of Parallel and Distributed Computing, 4(2):133–172, April 1987.

[bib238] [Joh90] S. L. Johnsson. Communication in network architectures. In R.Suaya and G.Birtwistle, editors, VLSI and Parallel Computation, 223–389. Morgan Kaufmann, San Mateo, CA, <year>1990</year>.

[bib239] [JP93] M. T. Jones and P. E. Plassmann. A parallel graph coloring heuristic. SIAM Journal on Scientific Computing, 14:654–669, 1993.

[bib240] [JS87] J. -F. Jenq and S. Sahni. All pairs shortest paths on a hypercube multiprocessor. In Proceedings of the 1987 International Conference on Parallel Processing, 713–716, 1987.

[bib241] [KA88] R. A. Kamin and G. B. Adams. Fast Fourier transform algorithm design and tradeoffs. Technical Report RIACS TR 88.18, NASA Ames Research Center, Moffet Field, CA, 1988.

[bib242] [KA99a] Y. -K. Kwok and I. Ahmad. Benchmarking and comparison of the task graph scheduling algorithms. Journal of Parallel and Distributed Computing, 59:381–422, 1999.

[bib243] [KA99b] Y. -K. Kwok and I. Ahmad. Static scheduling algorithms for allocating directed task graphs to multiprocessors. ACM Computing Surveys, 31(4):406– 471, 1999.

[bib244] [KB57] T. C. Koopmans and M. J. Beckmann. Assignment problems and the location of economic activities. Econometrica, 25:53–76, 1957.

[bib245] [Ken90] Kendall Square Research Corporation. KSR-1 Overview. Waltham, MA. <year>1990</year>.

[bib246] [KF90] A. H. Karp and H. P. Flatt. Measuring parallel processor performance. Communications of the ACM, 33(5):539–543, 1990.

[bib247] [KG94] V. Kumar and A. Gupta. Analyzing scalability of parallel algorithms and architectures. Journal of Parallel and Distributed Computing, 22(3):379–391, 1994. Also available as Technical Report TR 91-18, Department of Computer Science Department, University of Minnesota, Minneapolis, MN.

[bib248] [KGK90] V.Kumar, P.S.Gopalakrishnan, and L.N.Kanal, editors. Parallel Algorithms for Machine Intelligence and Vision. Springer-Verlag, New York, NY, <year>1990</year>.

[bib249] [KGR94] V. Kumar, A. Grama, and V. N. Rao. Scalable load balancing techniques for parallel computers. Journal of Parallel and Distributed Computing, 22(1):60– 79, July 1994.

[bib250] [KH67] R. M. Karp and M. H. Held. Finite state processes and dynamic programming. SIAM Journal of Applied Math, 15:693–718, 1967.

[bib251] [KH83] M. Kumar and D. S. Hirschberg. An efficient implementation of Batcher’s odd-even merge algorithm and its application in parallel sorting schemes. IEEE Transactions on Computers, C–32, March 1983.

[bib252] [KK79] P. Kermani and L. Kleinrock. Virtual cut-through: A new communication switching technique. Computer Networks, 3(4):267–286, 1979.

[bib253] [KK83] V. Kumar and L. N. Kanal. A general branch-and-bound formulation for understanding and synthesizing and/or tree search procedures. Artificial Intelligence, 21:179–198, 1983.

[bib254] [KK84] V. Kumar and L. N. Kanal. Parallel branch-and-bound formulations for and/or tree search. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI–6:768–778, 1984.

[bib255] [KK88a] L. N. Kanal and V. Kumar. Search in Artificial Intelligence. Springer-Verlag, New York, NY, <year>1988</year>.

[bib256] [KK88b] V. Kumar and L. N. Kanal. The CDP: A unifying formulation for heuristic search, dynamic programming, and branch-and-bound. In L.N.Kanal and V.Kumar, editors, Search in Artificial Intelligence, 1–27. Springer-Verlag, New York, NY, <year>1988</year>.

[bib257] [KK93] G. Karypis and V. Kumar. Efficient Parallel Mappings of a Dynamic Programming Algorithm. In Proceedings of 7th International Parallel Processing Symposium, number 563–568, 1993.

[bib258] [KK94] G. Karypis and V. Kumar. Unstructured tree search on simd parallel computers. Journal of Parallel and Distributed Computing, 22(3):379–391, September 1994.

[bib259] [KK99] G. Karypis and V. Kumar. Parallel multilevel k-way partitioning for irregular graphs. SIAM Review, 41(2):278–300, 1999.

[bib260] [KKKS94] L.N.Kanal, V.Kumar, H.Kitano, and C.Suttner, editors. Parallel Processing for Artificial Intelligence. North-Holland, Amsterdam, The Netherlands, <year>1994</year>.

[bib261] [KN91] K. Kimura and I. Nobuyuki. Probabilistic analysis of the efficiency of the dynamic load distribution. In Sixth Distributed Memory Computing Conference Proceedings, 1991.

[bib262] [Knu73] D. E. Knuth. The Art of Computer Programming: Sorting and Searching. Addison-Wesley, Reading, MA, <year>1973</year>.

[bib263] [Kor81] W. Kornfeld. The use of parallelism to implement a heuristic search. In Proceedings of the International Joint Conference on Artificial Intelligence, 575–580, 1981.

[bib264] [Kow88] J. S. Kowalik. Parallel Computation and Computers for Artificial Intelligence. Kluwer Academic Publishers, Boston, MA, <year>1988</year>.

[bib265] [KP92] C. Kaklamanis and G. Persiano. Branch-and-bound and backtrack search on mesh-connected arrays of processors. In Proceedings of Fourth Annual Symposium on Parallel Algorithms and Architectures, 118–126, 1992.

[bib266] [KR87a] V. K. P. Kumar and C. S. Raghavendra. Array processor with multiple broadcasting. Journal of Parallel and Distributed Computing, 173–190, 1987.

[bib267] [KR87b] V. Kumar and V. N. Rao. Parallel depth-first search, part II: Analysis. International Journal of Parallel Programming, 16(6):501–519, 1987.

[bib268] [KR88] R. M. Karp and V. Ramachandran. A survey of complexity of algorithms for shared-memory machines. Technical Report 408, University of California, Berkeley, 1988.

[bib269] [KR89] V. Kumar and V. N. Rao. Load balancing on the hypercube architecture. In Proceedings of the Fourth Conference on Hypercubes, Concurrent Computers, and Applications, 603–608, 1989.

[bib270] [KRR88] V. Kumar, K. Ramesh, and V. N. Rao. Parallel best-first search of state-space graphs: A summary of results. In Proceedings of the 1988 National Conference on Artificial Intelligence, 122–126, 1988.

[bib271] [KRS88] C. P. Kruskal, L. Rudolph, and M. Snir. A complexity theory of efficient parallel algorithms. Technical Report RC13572, IBM T. J. Watson Research Center, Yorktown Heights, NY, 1988.

[bib272] [Kru56] J. B. Kruskal. On the shortest spanning subtree of a graph and the traveling salesman problem. In Proceedings of the AMS, volume 7, 48–50, 1956.

[bib273] [KS88] J. Kuehn and B. Smith. The horizon supercomputing system: architecture and software. In Proceedings of Supercomputing Conference, 28–34, 1988.

[bib274] [KS91a] L. V. Kale and V. Saletore. Efficient parallel execution of IDA* on shared and distributed-memory multiprocessors. In Sixth Distributed Memory Computing Conference Proceedings, 1991.

[bib275] [KS91b] V. Kumar and V. Singh. Scalability of parallel algorithms for the all-pairs shortest path problem. Journal of Parallel and Distributed Computing, 13(2):124–138, October 1991. A short version appears in the Proceedings of the International Conference on Parallel Processing, 1990.

[bib276] [KSS95] S. Kleiman, D. Shah, and B. Smaalders. Programming with Threads. SunSoft Press, Mountainview, CA, <year>1995</year>.

[bib277] [KU86] A. R. Karlin and E. Upfal. Parallel hashing – an efficient implementation of shared memory. In Proceedings of 18th ACM Conference on Theory of Computing, 160–168, 1986.

[bib278] [Kun80] J. T. Kung. The structure of parallel algorithms. In M.Yovits, editor, Advances in Computing, 73–74. Academic Press, San Diego, CA, <year>1980</year>.

[bib279] [Kun86] H. T. Kung. Memory requirements for balanced computer architectures. In Proceedings of the 1986 IEEE Symposium on Computer Architecture, 49–54, 1986.

[bib280] [KV92] K. Kreeger and N. R. Vempaty. Comparison of meshes vs. hypercubes for data rearrangement. Technical Report UCF-CS-92-28, Department of Computer Science, University of Central Florida, Orlando, FL, 1992.

[bib281] [KZ88] R. M. Karp and Y. Zhang. A randomized parallel branch-and-bound procedure. In Proceedings of the ACM Annual Symposium on Theory of Computing, 290–300, 1988.

[bib282] [Law75] D. H. Lawrie. Access and alignment of data in an array processor. IEEE Transactions on Computers, C-24(1):1145–1155, 1975.

[bib283] [LB95a] B. Lewis and D. J. Berg. Threads Primer: A Guide to Multithreaded Programming. Prentice Hall PTR/Sun Microsystems Press, <year>1995</year>.

[bib284] [LB95b] B. -H. Lim and R. Bianchini. Limits on the performance benefits of multithreading and prefetching. Research report RC 20238 (89547), IBM T. J. Watson Research Center, Yorktown Heights, NY, October 1995.

[bib285] [LB97] B. Lewis and D. J. Berg. Multithreaded Programming with Pthreads. Prentice Hall PTR/Sun Microsystems Press, <year>1997</year>.

[bib286] [LB98] T. G. Lewis and D. Berg. Multithreaded Programming with PThreads. Sun Microsystems Press / Prentice Hall, <year>1998</year>.

[bib287] [LC88] G. -J. Li and T. Coleman. A parallel triangular solver for a hypercube multiprocessor. SIAM Journal on Scientific and Statistical Computing, 9:485–502, 1988.

[bib288] [LC89] G. -J. Li and T. Coleman. A new method for solving triangular systems on distributed memory message passing multiprocessors. SIAM Journal on Scientific and Statistical Computing, 10:382–396, 1989.

[bib289] [LD90] S. Lakshmivarahan and S. K. Dhall. Analysis and Design of Parallel Algorithms: Arithmetic and Matrix Problems. McGraw-Hill, New York, NY, <year>1990</year>.

[bib290] [LDP89] M. R. Leuze, L. W. Dowdy, and K. H. Park. Multiprogramming a distributed-memory multiprocessor. Concurrency: Practice and Experience, 1(1):19–33, September 1989.

[bib291] [Lea99] D. Lea. Concurrent Programming in Java, Second Edition: Design Principles and Patterns. Addison-Wesley, <year>1999</year>.

[bib292] [Lei83] F. T. Leighton. Parallel computation using mesh of trees. In Proceedings of International Workshop on Graph-Theoretic Concepts in Computer Science, 1983.

[bib293] [Lei85a] F. T. Leighton. Tight bounds on the complexity of parallel sorting. IEEE Transactions on Computers, C–34(4):344–354, April 1985.

[bib294] [Lei85b] C. E. Leiserson. Fat-trees: Universal networks for hardware efficient supercomputing. In Proceedings of the 1985 International Conference on Parallel Processing, 393–402, 1985.

[bib295] [Lei92] F. T. Leighton. Introduction to Parallel Algorithms and Architectures. Morgan Kaufmann, San Mateo, CA, <year>1992</year>.

[bib296] [LER92] T. G. Lewis and H. El-Rewini. Introduction to Parallel Computing. Prentice-Hall, Englewood Cliffs, NJ, <year>1992</year>.

[bib297] [Les93] B. P. Lester. The Art of Parallel Programming. Prentice-Hall, Englewood Cliffs, NJ, <year>1993</year>.

[bib298] [Lev87] S. P. Levitan. Measuring communications structures in parallel architectures and algorithms. In L.H.Jamieson, D.B.Gannon, and R.J.Douglass, editors, The Characteristics of Parallel Algorithms. MIT Press, Cambridge, MA, <year>1987</year>.

[bib299] [Lew91] D. A. Lewine. Posix Programmer’s Guide: Writing Portable Unix Programs with the Posix. 1 Standard. O’Reilly & Associates, <year>1991</year>.

[bib300] [LHZ98] H. Lu, C. Hu, and W. Zwaenepoel. OpenMP on networks of workstations. In SC ’98, High Performance Networking and Computing Conference, Orlando, Florida, 1998.

[bib301] [Lil92] D. J. Lilja. Architectural Alternatives for Exploiting Parallelism. IEEE Computer Society Press, Los Alamitos, CA, <year>1992</year>.

[bib302] [Lin83] G. Lindstrom. The key node method: A highly parallel alpha-beta algorithm. Technical Report 83-101, Computer Science Department, University of Utah, Salt Lake City, UT, 1983.

[bib303] [Lin92] Z. Lin. A distributed fair polling scheme applied to or-parallel logic programming. International Journal of Parallel Programming, 20(4), August 1992.

[bib304] [LK72] K. N. Levitt and W. T. Kautz. Cellular arrays for the solution of graph problems. Communications of the ACM, 15(9):789–801, September 1972.

[bib305] [LK85] D. B. Leifker and L. N. Kanal. A hybrid SSS*/alpha-beta algorithm for parallel search of game trees. In Proceedings of the International Joint Conference on Artificial Intelligence, 1044–1046, 1985.

[bib306] [LLG+92] D. Lenoski, J. Laudon, K. Gharachorloo, W. D. Weber, A. Gupta, J. L. Hennessy, M. Horowitz, and M. Lam. The Stanford dash multiprocessor. IEEE Computer, 63–79, March 1992.

[bib307] [LM97] E. K. Lee and J. E. Mitchell. Computational experience of an interior-point algorithm in a parallel branch-and-cut framework. In Proceedings for SIAM Conference on Parallel Processing for Scientific Computing, 1997.

[bib308] [LMR88] F. T. Leighton, B. Maggs, and S. K. Rao. Universal packet routing algorithms. In 29th Annual Symposium on Foundations of Computer Science, 256–271, 1988.

[bib309] [Loa92] C. V. Loan. Computational Frameworks for the Fast Fourier Transform. SIAM, Philadelphia, PA, <year>1992</year>.

[bib310] [LP92] Y. Li and P. M. Pardalos. Parallel algorithms for the quadratic assignment problem. In P.M.Pardalos, editor, Advances in Optimization and Parallel Computing, 177–189. North-Holland, Amsterdam, The Netherlands, <year>1992</year>.

[bib311] [LPP88] F. Luccio, A. Pietracaprina, and G. Pucci. A probabilistic simulation of PRAMs on a bounded degree network. Information Processing Letters, 28:141–147, July 1988.

[bib312] [LPP89] F. Luccio, A. Pietracaprina, and G. Pucci. A new scheme for deterministic simulation of PRAMs in VLSI. SIAM Journal of Computing, 1989.

[bib313] [LRZ95] C. Leiserson, K. Randall, and Y. Zhou. Cilk: An efficient multithreaded runtime system. In Proceedings of the Fifth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), Santa Barbara, CA, 1995.

[bib314] [LS84] T. H. Lai and S. Sahni. Anomalies in parallel branch and bound algorithms. Communications of the ACM, 594–602, 1984.

[bib315] [LS85] T. H. Lai and A. Sprague. Performance of parallel branch-and-bound algorithms. IEEE Transactions on Computers, C-34(10), October 1985.

[bib316] [LS86] T. H. Lai and A. Sprague. A note on anomalies in parallel branch-and-bound algorithms with one-to-one bounding functions. Information Processing Letters, 23:119–122, October 1986.

[bib317] [LSS88] J. Lee, E. Shragowitz, and S. Sahni. A hypercube algorithm for the 0/1 knapsack problem. Journal of Parallel and Distributed Computing, (5):438–456, 1988.

[bib318] [Lub86] M. Luby. A simple parallel algorithm for the maximal independent set problem. SIAM Journal on Computing, 15(4):1036–1053, 1986.

[bib319] [LW66] E. L. Lawler and D. Woods. Branch-and-bound methods: A survey. Operations Research, 14, 1966.

[bib320] [LW84] G. -J. Li and B. W. Wah. Computational efficiency of parallel approximate branch-and-bound algorithms. In Proceedings of the 1984 International Conference on Parallel Processing, 473–480, 1984.

[bib321] [LW85] G. -J. Li and B. W. Wah. Parallel processing of serial dynamic programming problems. In Proceedings of COMPSAC 85, 81–89, 1985.

[bib322] [LW86] G. -J. Li and B. W. Wah. Coping with anomalies in parallel branch-and-bound algorithms. IEEE Transactions on Computers, C-35, June 1986.

[bib323] [LW95] D. Lenoski and W. D. Weber. Scalable Shared-Memory Multiprocessing. Morgan Kaufmann, San Mateo, CA, <year>1995</year>.

[bib324] [LY86] M. Li and Y. Yesha. New lower bounds for parallel computations. In Proceedings of 18th ACM Conference on Theory of Computing, 177–187, 1986.

[bib325] [MC82] T. A. Marsland and M. S. Campbell. Parallel search of strongly ordered game trees. Computing Surveys, 14:533–551, 1982.

[bib326] [MD92] A. Mahanti and C. Daniels. SIMD parallel heuristic search. Artificial Intelligence, 1992.

[bib327] [MD93] N. R. Mahapatra and S. Dutt. Scalable duplicate pruning strategies for parallel A* graph search. In Proceedings of the Fifth IEEE Symposium on Parallel and Distributed Processing, 1993.

[bib328] [MdV87] O. A. McBryan and E. F. V. de Velde. Hypercube algorithms and implementations. SIAM Journal on Scientific and Statistical Computing, 8(2):s227–s287, March 1987.

[bib329] [Mes94] Message Passing Interface Forum. MPI: A Message-Passing Interface Standard. Available at http://www.mpi-forum.org. May 1994.

[bib330] [Mes97] Message Passing Interface Forum. MPI-2: Extensions to the Message-Passing Interface. Available at http://www.mpi-forum.org. July 1997.

[bib331] [MFMV90] B. Monien, R. Feldmann, P. Mysliwietz, and O. Vornberger. Parallel game tree search by dynamic tree decomposition. In V.Kumar, P.S.Gopalakrishnan, and L.N.Kanal, editors, Parallel Algorithms for Machine Intelligence and Vision. Springer-Verlag, New York, NY, <year>1990</year>.

[bib332] [MG89] C. U. Martel and D. Q. Gusfield. A fast parallel quicksort algorithm. Information Processing Letters, 30:97–102, 1989.

[bib333] [Mil91] D. Miller. Exact distributed algorithms for travelling salesman problem. In Proceedings of the Workshop on Parallel Computing of Discrete Optimization Problems, 1991.

[bib334] [MK99] J. Magee and J. Kramer. Concurrency: State Models and Java Programs. John Wiley & Sons, <year>1999</year>.

[bib335] [MKRS88] R. Miller, V. K. P. Kumar, D. I. Reisis, and Q. F. Stout. Meshes with reconfigurable buses. In Proceedings of MIT Conference on Advanced Research in VLSI, 163–178, 1988.

[bib336] [MM73] A. Martelli and U. Montanari. From dynamic programming to search algorithms with functional costs. In Proceedings of the International Joint Conference on Artifi cial Intelligence, 345–349, 1973.

[bib337] [MM91] P.Messina and A.Murli, editors. Practical Parallel Computing: Status and Prospects. Wiley, Chichester, UK, <year>1991</year>.

[bib338] [MMR95] B. Mans, T. Mautor, and C. Roucairol. A parallel depth first search branch and bound for the quadratic assignment problem. European Journal of Operational Research, 81(3):617–628, 1995.

[bib339] [Mod88] J. J. Modi. Parallel Algorithms and Matrix Computation. Oxford University Press, Oxford, UK, <year>1988</year>.

[bib340] [Moh83] J. Mohan. Experience with two parallel programs solving the traveling salesman problem. In Proceedings of the 1983 International Conference on Parallel Processing, 191–193, 1983.

[bib341] [Mol86] C. B. Moler. Matrix computation on distributed-memory multiprocessors. In M.T.Heath, editor, Hypercube Multiprocessors 1986, 181–195. SIAM, Philadelphia, PA, <year>1986</year>.

[bib342] [Mol87] C. B. Moler. Another look at Amdahl’s law. Technical Report TN-02-0587-0288, Intel Scientific Computers, 1987.

[bib343] [Mol93] D. I. Moldovan. Parallel Processing: From Applications to Systems. Morgan Kaufmann, San Mateo, CA, <year>1993</year>.

[bib344] [MP85] T. A. Marsland and F. Popowich. Parallel game tree search. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-7(4):442–452, July 1985.

[bib345] [MP93] D. L. Miller and J. F. Pekny. The role of performance metrics for parallel mathematical programming algorithms. ORSA Journal on Computing, 5(1), 1993.

[bib346] [MR] D. C. Marinescu and J. R. Rice. On high level characterization of parallelism. Technical Report CSD-TR-1011, CAPO Report CER-90-32, Computer Science Department, Purdue University, West Lafayette, IN. Also published in Journal of Parallel and Distributed Computing, 1993.

[bib347] [MRSR92] G. P. McKeown, V. J. Rayward-Smith, and S. A. Rush. Parallel Branch-and-Bound, 111–150. Advanced Topics in Computer Science. Blackwell Scientific Publications, Oxford, UK, <year>1992</year>.

[bib348] [MS88] Y. W. E. Ma and D. G. Shea. Downward scalability of parallel architectures. In Proceedings of the 1988 International Conference on Supercomputing, 109–120, 1988.

[bib349] [MS90] G. Manzini and M. Somalvico. Probabilistic performance analysis of heuristic search using parallel hash tables. In Proceedings of the International Symposium on Artificial Intelligence and Mathematics, 1990.

[bib350] [MS96] R. Miller and Q. F. Stout. Parallel Algorithms for Regular Architectures. MIT Press, Cambridge, MA, <year>1996</year>.

[bib351] [MV84] K. Mehlhorn and U. Vishkin. Randomized and deterministic simulations of PRAMs by parallel machines with restricted granularity of parallel memories. Acta Informatica, 21(4):339–374, November 1984.

[bib352] [MV85] B. Monien and O. Vornberger. The ring machine. Technical report, University of Paderborn, Germany, 1985. Also in Computers and Artificial Intelligence, 3(1987).

[bib353] [MV87] B. Monien and O. Vornberger. Parallel processing of combinatorial search trees. In Proceedings of International Workshop on Parallel Algorithms and Architectures, 1987.

[bib354] [MVS86] B. Monien, O. Vornberger, and E. Spekenmeyer. Superlinear speedup for parallel backtracking. Technical Report 30, University of Paderborn, Germany, 1986.

[bib355] [NA91] D. Nussbaum and A. Agarwal. Scalability of parallel machines. Communications of the ACM, 34(3):57–61, 1991.

[bib356] [Nat90] L. Natvig. Investigating the practical value of Cole’s O (log n) time crew pram merge sort algorithm. In 5th International Symposium on Computing and Information Sciences, October 1990.

[bib357] [NBF96] B. Nichols, B. Buttlar, and J. P. Farrell. Pthreads Programming. O’Reilly & Associates, Newton, MA 02164, <year>1996</year>.

[bib358] [nCU90] nCUBE Corporation. nCUBE 6400 Processor Manual. Beaverton, OR, <year>1990</year>.

[bib359] [ND96] S. J. Norton and M. D. DiPasquale. Thread time: the multithreaded programming guide. Hewlett-Packard professional books. Prentice-Hall, Englewood Cliffs, NJ 07632, <year>1996</year>.

[bib360] [Ni91] L. M. Ni. A layered classification of parallel computers. In Proceedings of 1991 International Conference for Young Computer Scientists, 28–33, 1991.

[bib361] [Nic90] J. R. Nickolls. The design of the MasPar MP-1: A cost-effective massively parallel computer. In IEEE Digest of Papers—Comcom, 25–28. IEEE Computer Society Press, Los Alamitos, CA, 1990.

[bib362] [NM93] L. M. Ni and McKinley. A survey of wormhole routing techniques in direct connect networks. IEEE Computer, 26(2), February 1993.

[bib363] [NMB83] D. Nath, S. N. Maheshwari, and P. C. P. Bhatt. Efficient VLSI networks for parallel processing based on orthogonal trees . IEEE Transactions on Computers, C–32:21–23, June 1983.

[bib364] [NS79] D. Nassimi and S. Sahni. Bitonic sort on a mesh connected parallel computer. IEEE Transactions on Computers, C–28(1), January 1979.

[bib365] [NS80] D. Nassimi and S. Sahni. Finding connected components and connected ones on a mesh-connected computer. SIAM Journal of Computing, 9(4):744–757, November 1980.

[bib366] [NS87] A. Norton and A. J. Silberger. Parallelization and performance analysis of the Cooley-Tukey FFT algorithm for shared memory architectures. IEEE Transactions on Computers, C-36(5):581–591, 1987.

[bib367] [NS93] M. Nigam and S. Sahni. Sorting n numbers on n × n reconfigurable meshes with buses. In 7th International Parallel Processing Symposium, 174–181, 1993.

[bib368] [NSF91] Grand Challenges: High Performance Computing and Communications. A Report by the Committee on Physical, Mathematical and Engineering Sciences, NSF/CISE, 1800 G Street NW, Washington, DC, 20550, <year>1991</year>.

[bib369] [Nug88] S. F. Nugent. The iPSC/2 direct-connect communications technology. In Proceedings of the Third Conference on Hypercubes, Concurrent Computers, and Applications, 51–60, 1988.

[bib370] [Nus82] H. J. Nussbaumer. Fast Fourier Transform and Convolution Algorithms. Springer-Verlag, New York, NY, <year>1982</year>.

[bib371] [NW88] D. M. Nicol and F. H. Willard. Problem size, parallel architecture, and optimal speedup. Journal of Parallel and Distributed Computing, 5:404–420, 1988.

[bib372] [GOV99] Funding a Revolution: Government Support for Computing Research. Committee on Innovations in Computing and Communications. National Academy Press, <year>1999</year>.

[bib373] [OR88] J. M. Ortega and C. H. Romine. The ijk forms of factorization methods II: Parallel systems. Parallel Computing, 7:149–162, 1988.

[bib374] [Ort88] J. M. Ortega. Introduction to Parallel and Vector Solution of Linear Systems. Plenum Press, New York, NY, <year>1988</year>.

[bib375] [OS85] D. P. O’Leary and G. W. Stewart. Data-flow algorithms for parallel matrix computations. Communications of the ACM, 28:840–853, 1985.

[bib376] [OS86] D. P. O’Leary and G. W. Stewart. Assignment and scheduling in parallel matrix factorization. Linear Algebra and its Applications, 77:275–299, 1986.

[bib377] [Pac98] P. Pacheco. Parallel Programming with MPI. Morgan Kaufmann, <year>1998</year>.

[bib378] [PBG+85] G. F. Pfister, W. C. Brantley, D. A. George, S. L. Harvey, W. J. Kleinfelder, K. P. McAuliffe, E. A. Melton, V. A. Norlton, and J. Weiss. The IBM research parallel processor prototype (RP3): Introduction and architecture. In Proceedings of 1985 International Conference on Parallel Processing, 764– 771, 1985.

[bib379] [PC89] P. M. Pardalos and J. Crouse. A parallel algorithm for the quadratic assignment problem. In Supercomputing ‘89 Proceedings, 351–360. ACM Press, New York, NY, 1989.

[bib380] [PD89] H. K. and L. W. Dowdy. Dynamic partitioning of multiprocessor systems. International Journal of Parallel Processing, 18(2):91–120, 1989.

[bib381] [Pea84] J. Pearl. Heuristics—Intelligent Search Strategies for Computer Problem Solving. Addison-Wesley, Reading, MA, <year>1984</year>.

[bib382] [Per87] R. Perrott. Parallel Programming. Addison-Wesley, Reading, MA, <year>1987</year>.

[bib383] [Pfi98] G. F. Pfister. In Search of Clusters. Prentice Hall, Englewood Cliffs, NJ, <year>1998</year>. 2nd Edition.

[bib384] [PFK90] C. Powley, C. Ferguson, and R. Korf. Parallel heuristic search: Two approaches. In V.Kumar, P. S. Gopalakrishnan, and L. N. Kanal, editors, Parallel Algorithms for Machine Intelligence and Vision. Springer-Verlag, New York, NY, <year>1990</year>.

[bib385] [PG98] T. Q. Pham and P. K. Garg. Multithreaded Programming with Win32. Prentice Hall, <year>1998</year>.

[bib386] [PH90] D. A. Patterson and J. L. Hennessy. Computer Architecture: A Quantitative Approach. Morgan Kaufmann, San Mateo, CA, <year>1990</year>.

[bib387] [PH96] D. A. Patterson and J. L. Hennessy. Computer Architecture: A Quantitative Approach, 2nd edition. Morgan Kaufmann, San Mateo, CA, <year>1996</year>.

[bib388] [PK89] R. C. Paige and C. P. Kruskal. Parallel algorithms for shortest path problems. In Proceedings of 1989 International Conference on Parallel Processing, 14– 19, 1989.

[bib389] [PK95] I. Pramanick and J. G. Kuhl. An inherently parallel method for heuristic problem-solving: Part I – general framework. IEEE Transactions on Parallel and Distributed Systems, 6(10), October 1995.

[bib390] [PKF92] C. Powley, R. Korf, and C. Ferguson. IDA* on the connection machine. Artificial Intelligence, 1992.

[bib391] [Pla89] C. C. Plaxton. Load balancing, selection and sorting on the hypercube. In Proceedings of the 1989 ACM Symposium on Parallel Algorithms and Architectures, 64–73, 1989.

[bib392] [PLRR94] P. M. Pardalos, Y. Li, K. Ramakrishna, and M. Resende. Lower bounds for the quadratic assignment problem. Annals of Operations Research, 50:387–411, <year>1994</year>. Special Volume on Applications of Combinatorial Optimization.

[bib393] [PR85] V. Pan and J. H. Reif. Efficient parallel solution of linear systems. In 17th Annual ACM Symposium on Theory of Computing, 143–152, 1985.

[bib394] [PR89] P. M. Pardalos and G. P. Rodgers. Parallel branch-and-bound algorithms for unconstrainted quadratic zero-one programming. In R.Sharda et al., editors, Impacts of Recent Computer Advances on Operations Research, 131–143. North-Holland, Amsterdam, The Netherlands, <year>1989</year>.

[bib395] [PR90] P. M. Pardalos and G. P. Rodgers. Parallel branch-and-bound algorithms for quadratic zero-one programming on a hypercube architecture. Annals of Operations Research, 22:271–292, <year>1990</year>.

[bib396] [PR91] M. Padberg and G. Rinaldi. A branch-and-cut algorithm for the resolution of large-scale symmetric traveling salesman problems. SIAM Review, 33:60– 100, 1991.

[bib397] [Pri57] R. C. Prim. Shortest connection network and some generalizations. Bell Systems Technical Journal, 36:1389–1401, 1957.

[bib398] [PRV88] G. Plateau, C. Roucairol, and I. Valabregue. Algorithm PR2 for the parallel size reduction of the 0/1 multiknapsack problem. In INRIA Rapports de Recherche, number 811, 1988.

[bib399] [PS82] C. H. Papadimitriou and K. Steiglitz. Combinatorial Optimization: Algorithms and Complexity. Prentice-Hall, Englewood Cliffs, NJ, <year>1982</year>.

[bib400] [PV80] F. P. Preparata and J. Vuillemin. Area-time optimal VLSI networks for matrix multiplication. In Proceedings of the 14th Princeton Conference on Information Science and Systems, 300–309, 1980.

[bib401] [PY88] C. H. Papadimitriou and M. Yannakakis. Towards an architecture independent analysis of parallel algorithms. In Proceedings of 20th ACM Symposium on Theory of Computing, 510–513, 1988.

[bib402] [QD86] M. J. Quinn and N. Deo. An upper bound for the speedup of parallel branch-and-bound algorithms. BIT, 26(1), March 1986.

[bib403] [Qui88] M. J. Quinn. Parallel sorting algorithms for tightly coupled multiprocessors. Parallel Computing, 6:349–357, 1988.

[bib404] [Qui89] M. J. Quinn. Analysis and implementation of branch-and-bound algorithms on a hypercube multicomputer. IEEE Transactions on Computers, 1989.

[bib405] [Qui94] M. J. Quinn. Parallel Computing: Theory and Practice. McGraw-Hill, New York, NY, <year>1994</year>.

[bib406] [Ram97] V. Ramachandran. Qsm: A general purpose shared-memory model for parallel computation. In Foundations of Software Technology and Theoretical Computer Science, 1–5, <year>1997</year>.

[bib407] [Ran89] A. G. Ranade. Fluent Parallel Computation. Ph.D. Thesis, Department of Computer Science, Yale University, New Haven, CT, <year>1989</year>.

[bib408] [Ran91] A. G. Ranade. Optimal speedup for backtrack search on a butterfly network. In Proceedings of the Third ACM Symposium on Parallel Algorithms and Architectures, 1991.

[bib409] [Rao90] V. N. Rao. Parallel Processing of Heuristic Search. Ph.D. Thesis, University of Texas, Austin, TX, <year>1990</year>.

[bib410] [Ras78] L. Raskin. Performance Evaluation of Multiple Processor Systems. Ph.D. Thesis, Carnegie-Mellon University, Pittsburgh, PA, <year>1978</year>.

[bib411] [RB76] C. M. Rader and N. M. Brenner. A new principle for Fast fourier transform. IEEE Transactions on Acoustics, Speech and Signal Processing, 24:264–265, 1976.

[bib412] [RDK89] C. Renolet, M. Diamond, and J. Kimbel. Analytical and heuristic modeling of distributed algorithms. Technical Report E3646, FMC Corporation, Advanced Systems Center, Minneapolis, MN, 1989.

[bib413] [Rei81] R. Reischuk. Probabilistic algorithms for sorting and selection. SIAM Journal of Computing, 396–409, 1981.

[bib414] [RF89] D. A. Reed and R. M. Fujimoto. Multicomputer Networks: Message-Based Parallel Processing. MIT Press, Cambridge, MA, <year>1989</year>.

[bib415] [RICN88] K. Rokusawa, N. Ichiyoshi, T. Chikayama, and H. Nakashima. An efficient termination detection and abortion algorithm for distributed processing systems. In Proceedings of 1988 International Conference on Parallel Processing: Vol. I, 18–22, 1988.

[bib416] [RK87] V. N. Rao and V. Kumar. Parallel depth-first search, part I: Implementation. International Journal of Parallel Programming, 16(6):479–499, 1987.

[bib417] [RK88a] V. N. Rao and V. Kumar. Concurrent access of priority queues. IEEE Transactions on Computers, C–37 (12), 1988.

[bib418] [RK88b] V. N. Rao and V. Kumar. Superlinear speedup in state-space search. In Proceedings of the 1988 Foundation of Software Technology and Theoretical Computer Science, number 338, 161–174. Springer-Verlag Series Lecture Notes in Computer Science, 1988.

[bib419] [RK93] V. N. Rao and V. Kumar. On the efficicency of parallel backtracking. IEEE Transactions on Parallel and Distributed Systems, 4(4):427–437, April 1993. available as a technical report TR 90-55, Computer Science Department, University of Minnesota.

[bib420] [RKR87] V. N. Rao, V. Kumar, and K. Ramesh. A parallel implementation of iterative-deepening-A*. In Proceedings of the National Conference on Artificial Intelligence (AAAI-87), 878–882, 1987.

[bib421] [RND77] E. M. Reingold, J. Nievergelt, and N. Deo. Combinatorial Algorithms: Theory and Practice. Prentice-Hall, Englewood Cliffs, NJ, <year>1977</year>.

[bib422] [RO88] C. H. Romine and J. M. Ortega. Parallel solution of triangular systems of equations. Parallel Computing, 6:109–114, 1988.

[bib423] [Rob75] M. O. Robin. Probabilistic algorithms. In J.Traub, editor, Algorithms and Complexity: New Directions and Recent Results, 21–39. Academic Press, San Diego, CA, <year>1975</year>.

[bib424] [Rob90] Y. Robert. The Impact of Vector and Parallel Architectures on Gaussian Elimination. John Wiley and Sons, New York, NY, <year>1990</year>.

[bib425] [Rom87] C. H. Romine. The parallel solution of triangular systems on a hypercube. In M. T. Heath, editor, Hypercube Multiprocessors 1987, 552–559. SIAM, Philadelphia, PA, <year>1987</year>.

[bib426] [Rou87] C. Roucairol. A parallel branch-and-bound algorithm for the quadratic assignment problem. Discrete Applied Mathematics, 18:211–225, 1987.

[bib427] [Rou91] C. Roucairol. Parallel branch-and-bound on shared-memory multiprocessors. In Proceedings of the Workshop On Parallel Computing of Discrete Optimization Problems, 1991.

[bib428] [RRRR96] K. A. Robbins, S. Robbins, K. R. Robbins, and S. Robbins. Practical UNIX Programming: A Guide to Concurrency, Communication, and Multithreading. Prentice Hall, <year>1996</year>.

[bib429] [RS90a] A. P. R. Alverson, D. Callahan, D. Cummings, B. Koblenz and B. Smith. The tera computer system. In International Conference on Supercomputing, 1–6, 1990.

[bib430] [RS90b] S. Ranka and S. Sahni. Hypercube Algorithms for Image Processing and Pattern Recognition. Springer-Verlag, New York, NY, <year>1990</year>.

[bib431] [RV87] J. H. Reif and L. G. Valiant. A logarithmic time sort for linear size networks. Journal of the ACM, 34(1):60–76, January 1987.

[bib432] [Ryt88] W. Rytter. Efficient parallel computations for dynamic programming. Theoretical Computer Science, 59:297–307, 1988.

[bib433] [RZ89] M.Reeve and S.E.Zenith, editors. Parallel Processing and Artificial Intelligence. Wiley, Chichester, UK, <year>1989</year>.

[bib434] [Saa86] Y. Saad. Communication complexity of the Gaussian elimination algorithm on multiprocessors. Linear Algebra and its Applications, 77:315–340, 1986.

[bib435] [SB77] H. Sullivan and T. R. Bashkow. A large scale, homogeneous, fully distributed parallel machine. In Proceedings of Fourth Symposium on Computer Architecture, 105–124, March 1977.

[bib436] [SBCV90] R. H. Saavedra-Barrera, D. E. Culler, and T. Von Eiken. Analysis of multithreaded architectures for parallel computing. Report UCB/CSD 90/569, University of California, Berkeley, Computer Science Division, Berkeley, CA, April 1990.

[bib437] [Sch80] J. T. Schwartz. Ultracomputers. ACM Transactions on Programming Languages and Systems, 2:484–521, October 1980.

[bib438] [Sed78] R. Sedgewick. Implementing quicksort programs. Communications of the ACM, 21(10):847–857, 1978.

[bib439] [Sei85] C. L. Seitz. The cosmic cube. Communications of the ACM, 28(1):22–33, 1985.

[bib440] [Sei89] S. R. Seidel. Circuit-switched vs. store-and-forward solutions to symmetric communication problems. In Proceedings of the Fourth Conference on Hypercubes, Concurrent Computers, and Applications, 253–255, 1989.

[bib441] [Sei92] C. L. Seitz. Mosaic C: An experimental fine-grain multicomputer. Technical report, California Institute of Technology, Pasadena, CA, 1992.

[bib442] [SG88] S. R. Seidel and W. L. George. Binsorting on hypercube with d-port communication. In Proceedings of the Third Conference on Hypercube Concurrent Computers, 1455–1461, January 1988.

[bib443] [SG91] X. -H. Sun and J. L. Gustafson. Toward a better parallel performance metric. Parallel Computing, 17:1093–1109, <year>December 1991</year>. Also available as Technical Report IS-5053, UC-32, Ames Laboratory, Iowa State University, Ames, IA.

[bib444] [Sha85] J. A. Sharp. Data-Flow Computing. Ellis Horwood, Chichester, UK, <year>1985</year>.

[bib445] [She59] D. L. Shell. A high-speed sorting procedure. Communications of the ACM, 2(7):30–32, July 1959.

[bib446] [SHG93] J. P. Singh, J. L. Hennessy, and A. Gupta. Scaling parallel programs for multiprocessors: Methodology and examples. IEEE Computer, 26(7):42–50, 1993.

[bib447] [Sie77] H. J. Siegel. The universality of various types of SIMD machine interconnection networks. In Proceedings of the 4th Annual Symposium on Computer Architecture , 23–25, 1977.

[bib448] [Sie85] H. J. Siegel. Interconnection Networks for Large-Scale Parallel Processing. D. C. Heath, Lexington, MA, <year>1985</year>.

[bib449] [SJ81] C. Savage and J. Jaja. Fast, efficient parallel algorithms for some graph problems. SIAM Journal of Computing, 10(4):682–690, November 1981.

[bib450] [SK89] W. Shu and L. V. Kale. A dynamic scheduling strategy for the chare-kernel system. In Proceedings of Supercomputing Conference, 389–398, 1989.

[bib451] [SK90] V. Saletore and L. V. Kale. Consistent linear speedup to a first solution in parallel state-space search. In Proceedings of the 1990 National Conference on Artificial Intelligence, 227–233, August 1990.

[bib452] [SKAT91a] V. Singh, V. Kumar, G. Agha, and C. Tomlinson. Efficient algorithms for parallel sorting on mesh multicomputers. International Journal of Parallel Programming, 20(2):95–131, 1991.

[bib453] [SKAT91b] V. Singh, V. Kumar, G. Agha, and C. Tomlinson. Scalability of parallel sorting on mesh multicomputers. International Journal of Parallel Programming, 20(2), 1991.

[bib454] [SM86] S. J. Stolfo and D. P. Miranker. The DADO production system machine. Journal of Parallel and Distributed Computing, 3:269–296, June 1986.

[bib455] [Smi84] D. R. Smith. Random trees and the analysis of branch and bound procedures. Journal of the ACM, 31(1), 1984.

[bib456] [SN90] X. -H. Sun and L. M. Ni. Another view of parallel speedup. In Supercomputing ’90 Proceedings, 324–333, 1990.

[bib457] [SN93] X. -H. Sun and L. M. Ni. Scalable problems and memory-bounded speedup. Journal of Parallel and Distributed Computing, 19:27–37, September 1993.

[bib458] [Sni82] M. Snir. On parallel search. In Proceedings of Principles of Distributed Computing, 242–253, 1982.

[bib459] [Sni85] M. Snir. On parallel searching. SIAM Journal of Computing, 14(3):688–708, August 1985.

[bib460] [Sny86] L. Snyder. Type architectures, shared-memory and the corollary of modest potential. Annual Review of Computer Science, 1:289–317, <year>1986</year>.

[bib461] [SOHL+96] M. Snir, S. W. Otto, S. Huss-Lederman, D. W. Walker, and J. Dongarra. MPI: The Complete Reference. MIT Press, Cambridge, MA, <year>1996</year>.

[bib462] [Sol77] M. Sollin. An algorithm attributed to Sollin. In S.Goodman and S.Hedetniemi, editors, Introduction to The Design and Analysis of Algorithms. McGraw-Hill, Cambridge, MA, <year>1977</year>.

[bib463] [SR91] X. -H. Sun and D. T. Rover. Scalability of parallel algorithm-machine combinations. Technical Report IS-5057, Ames Laboratory, Iowa State University, Ames, IA, 1991. Also published in IEEE Transactions on Parallel and Distributed Systems.

[bib464] [SS88] Y. Saad and M. H. Schultz. Topological properties of hypercubes. IEEE Transactions on Computers, 37:867–872, 1988.

[bib465] [SS89a] Y. Saad and M. H. Schultz. Data communication in hypercubes. Journal of Parallel and Distributed Computing, 6:115–135, 1989. Also available as Technical Report YALEU/DCS/RR-428 from the Department of Computer Science, Yale University, New Haven, CT.

[bib466] [SS89b] Y. Saad and M. H. Schultz. Data communication in parallel architectures. Parallel Computing, 11:131–150, 1989.

[bib467] [SS90] H. Shi and J. Schaeffer. Parallel sorting by regular sampling. Journal of Parallel and Distributed Computing, 14:361–372, 1990.

[bib468] [ST95] B. M. S. Tschvke, R. Lling. Solving the traveling salesman problem with a distributed branch-and-bound algorithm on a 1024 processor network. In Proceedings of the 9th International Parallel Processing Symposium, 182– 189, Santa Barbara, CA, April 1995.

[bib469] [Sto71] H. S. Stone. Parallel processing with the perfect shuffle. IEEE Transactions on Computers, C-20(2):153–161, 1971.

[bib470] [Sto93] H. S. Stone. High-Performance Computer Architecture: Third Edition. Addison-Wesley, Reading, MA, <year>1993</year>.

[bib471] [Sun95] SunSoft. Solaris multithreaded programming guide. SunSoft Press, Mountainview, CA, <year>1995</year>.

[bib472] [Sup91] Supercomputer Systems Division, Intel Corporation. Paragon XP/S Product Overview. Beaverton, OR, <year>1991</year>.

[bib473] [SV81] Y. Shiloach and U. Vishkin. Finding the maximum, merging and sorting in a parallel computation model. Journal of Algorithms, 88–102, 1981.

[bib474] [SV82] Y. Shiloach and U. Vishkin. An O (n2 log n) parallel max-flow algorithm. Journal of Algorithms, 3:128–146, 1982.

[bib475] [SW87] Q. F. Stout and B. A. Wagar. Passing messages in link-bound hypercubes. In M.T.Heath, editor, Hypercube Multiprocessors 1987, 251–257. SIAM, Philadelphia, PA, <year>1987</year>.

[bib476] [Swa87] P. N. Swarztrauber. Multiprocessor FFTs. Parallel Computing, 5:197–210, 1987.

[bib477] [SZ96] X. -H. Sun and J. P. Zhu. Performance considerations of shared virtual memory machines. IEEE Trans. on Parallel and Distributed Systems, 6(11):1185– 1194, 1996.

[bib478] [Tab90] D. Tabak. Multiprocessors. Prentice-Hall, Englewood Cliffs, NJ, <year>1990</year>.

[bib479] [Tab91] D. Tabak. Advanced Multiprocessors. McGraw-Hill, New York, NY, <year>1991</year>.

[bib480] [Tak87] R. Take. An optimal routing method of all-to-all communication on hypercube networks. In 35th Information Processing Society of Japan, <year>1987</year>.

[bib481] [TEL95] D. M. Tullsen, S. Eggers, and H. M. Levy. Simultaneous multithreading: Maximizing on-chip parallelism. In Proceedings of the 22nd Annual International Symposium on Computer Architecture, 1995.

[bib482] [Ten90] S. Teng. Adaptive parallel algorithms for integral knapsack problems. Journal of Parallel and Distributed Computing, 8:400–406, 1990.

[bib483] [Thi90] Thinking Machines Corporation. The CM-2 Technical Summary. Cambridge, MA. <year>1990</year>.

[bib484] [Thi91] Thinking Machines Corporation. The CM-5 Technical Summary. Cambridge, MA. <year>1991</year>.

[bib485] [Tho83] C. D. Thompson. Fourier transforms in VLSI. IBM Journal of Research and Development, C-32(11):1047–1057, 1983.

[bib486] [Thr99] J. Throop. Standards: OpenMP: Shared-memory parallelism from the ashes. Computer, 32(5):108–109, May 1999.

[bib487] [Tic88] W. F. Tichy. Parallel matrix multiplication on the connection machine. Technical Report RIACS TR 88.41, Research Institute for Advanced Computer Science, NASA Ames Research Center, Moffet Field, CA, 1988.

[bib488] [TK77] C. D. Thompson and H. T. Kung. Sorting on a mesh-connected parallel computer. Communications of the ACM, 21:263–271, 1977.

[bib489] [TL90] Z. Tang and G. -J. Li. Optimal granularity of grid iteration problems. In Proceedings of the 1990 International Conference on Parallel Processing, I111–I118, 1990.

[bib490] [Tul96] D. M. Tullsen. Simultaneous multithreading. Ph.D. Thesis, University of Washington, Seattle, WA, <year>1996</year>.

[bib491] [TV85] R. E. Tarjan and U. Vishkin. An efficient parallel biconnectivity algorithm. SIAM Journal on Computing, 14(4):862–874, November 1985.

[bib492] [TY96] J. -Y. Tsai and P. -C. Yew. The superthreaded architecture: Thread pipelining with run-time data dependence checking and control speculation. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 35–46, 1996.

[bib493] [Upf84] E. Upfal. A probabilistic relation between desirable and feasible models of parallel computation. In Proceedings of the 16th ACM Conference on Theory of Computing, 258–265, <year>1984</year>.

[bib494] [UW84] E. Upfal and A. Widgerson. How to share memory in a distributed system. In Proceedings of the 25th Annual Symposium on the Foundation of Computer Science, 171–180, 1984.

[bib495] [Val75] L. G. Valiant. Parallelism in comparison problems. SIAM Journal of Computing, 4(3):348–355, September 1975.

[bib496] [Val82] L. G. Valiant. A scheme for fast parallel communication. SIAM Journal on Computing, 11:350–361, 1982.

[bib497] [Val90a] L. G. Valiant. A bridging model for parallel computation. Communications of the ACM, 33(8), 1990.

[bib498] [Val90b] L. G. Valiant. General purpose parallel architectures. Handbook of Theoretical Computer Science, <year>1990</year>.

[bib499] [Vav89] S. A. Vavasis. Gaussian elimination with pivoting is P-complete. SIAM Journal on Discrete Mathematics, 2:413–423, 1989.

[bib500] [VB81] L. G. Valiant and G. J. Brebner. Universal schemes for parallel communication. In Proceedings of the 13th ACM Symposium on Theory of Computation, 263–277, 1981.

[bib501] [VC89] F. A. Van-Catledge. Towards a general model for evaluating the relative performance of computer systems. International Journal of Supercomputer Applications, 3(2):100–108, 1989.

[bib502] [Vor86] O. Vornberger. Implementing branch-and-bound in a ring of processors. Technical Report 29, University of Paderborn, Germany, 1986.

[bib503] [Vor87a] O. Vornberger. The personal supercomputer: A network of transputers. In Proceedings of the 1987 International Conference on Supercomputing, 1987.

[bib504] [Vor87b] O. Vornberger. Load balancing in a network of transputers. In Proceedings of the Second International Workshop on Distributed Parallel Algorithms, 1987.

[bib505] [VS86] J. S. Vitter and R. A. Simmons. New classes for parallel complexity: A study of unification and other complete problems for P. IEEE Transactions on Computers, May 1986.

[bib506] [VSBR83] L. G. Valiant, S. Skyum, S. Berkowitz, and C. Rackoff. Fast parallel computation of polynomials using few processors. SIAM Journal of Computing, 12(4):641–644, 1983.

[bib507] [WA98] B. Wilkinson and C. M. Allen. Parallel Programming: Techniques and Applications Using Networked Workstations and Parallel Computers. Prentice Hall, <year>1998</year>.

[bib508] [WA99] B. Wilkinson and M. Allen. Parallel Programming. Prentice-Hall, NJ, <year>1999</year>.

[bib509] [Wag87] B. A. Wagar. Hyperquicksort: A fast sorting algorithm for hypercubes. In Proceedings of the Second Conference on Hypercube Multiprocessors, 292– 299, 1987.

[bib510] [Wal91] Y. Wallach. Parallel Processing and Ada. Prentice-Hall, Englewood Cliffs, NJ, <year>1991</year>.

[bib511] [WB72] W. A. Wulf and C. G. Bell. C.mmp—a multimicroprocessor. In Proceedings of AFIPS Conference, 765–777, 1972.

[bib512] [Wei97] B. Weissman. Active threads manual. Technical Report TR-97-037, International Computer Science Institute, Berkeley, CA 94704, <year>1997</year>.

[bib513] [WF80] C. L. Wu and T. Y. Feng. On a class of multistage interconnection networks. IEEE Transactions on Computers, 669–702, August 1980.

[bib514] [WF84] C. L. Wu and T. Y. Feng. Interconnection Networks for Parallel and Distributed Processing. IEEE Computer Society Press, Washington, DC, <year>1984</year>.

[bib515] [WI89] K. Wada and N. Ichiyoshi. A distributed shortest path algorithm and its mapping on the Multi-PSI. In Proceedings of International Conference of Parallel Processing, 1989.

[bib516] [Wil86] H. S. Wilf. Algorithms and Complexity. Prentice-Hall, NJ, <year>1986</year>.

[bib517] [Wil95] G. V. Wilson. Practical Parallel Programming. MIT Press, Cambridge, MA, <year>1995</year>.

[bib518] [Wil00] A. Williams. Windows 2000 Systems Programming Black Book. The Coriolis Group, <year>2000</year>.

[bib519] [Win77] S. Winograd. A new method for computing DFT. In IEEE International Conference on Acoustics, Speech and Signal Processing, 366–368, 1977.

[bib520] [WL88] B. W. Wah and G. -J. Li. Systolic processing for dynamic programming problems. Circuits, Systems, and Signal Processing, 7(2):119–149, 1988.

[bib521] [WLY84] B. W. Wah, G. -J. Li, and C. F. Yu. The status of MANIP—a multicomputer architecture for solving combinatorial extremum-search problems. In Proceedings of 11th Annual International Symposium on Computer Architecture, 56–63, 1984.

[bib522] [WM84] B. W. Wah and Y. W. E. Ma. MANIP—a multicomputer architecture for solving combinatorial extremum-search problems. IEEE Transactions on Computers, C–33, May 1984.

[bib523] [Woo86] J.V.Woods, editor. Fifth Generation Computer Architectures. North-Holland, Amsterdam, The Netherlands, <year>1986</year>.

[bib524] [Wor88] P. H. Worley. Information Requirements and the Implications for Parallel Computation. Ph.D. Thesis, Stanford University, Department of Computer Science, Palo Alto, CA, <year>1988</year>.

[bib525] [Wor90] P. H. Worley. The effect of time constraints on scaled speedup. SIAM Journal on Scientific and Statistical Computing, 11(5):838–858, 1990.

[bib526] [Wor91] P. H. Worley. Limits on parallelism in the numerical solution of linear PDEs. SIAM Journal on Scientific and Statistical Computing, 12:1–35, January 1991.

[bib527] [WS88] Y. Won and S. Sahni. A balanced bin sort for hypercube multiprocessors. Journal of Supercomputing, 2:435–448, 1988.

[bib528] [WS89] J. Woo and S. Sahni. Hypercube computing: Connected components. Journal of Supercomputing, 3:209–234, 1989.

[bib529] [WS91] J. Woo and S. Sahni. Computing biconnected components on a hypercube. Journal of Supercomputing, June 1991. Also available as Technical Report TR 89-7 from the Department of Computer Science, University of Minnesota, Minneapolis, MN.

[bib530] [WY85] B. W. Wah and C. F. Yu. Stochastic modeling of branch-and-bound algorithms with best-first search. IEEE Transactions on Software Engineering, SE-11, September 1985.

[bib531] [Zho89] X. Zhou. Bridging the gap between Amdahl’s law and Sandia laboratory’s result. Communications of the ACM, 32(8):1014–5, 1989.

[bib532] [Zom96] A.Zomaya, editor. Parallel and Distributed Computing Handbook. McGraw-Hill, <year>1996</year>.

[bib533] [ZRV89] J. R. Zorbas, D. J. Reble, and R. E. VanKooten. Measuring the scalability of parallel computer systems. In Supercomputing ’89 Proceedings, 832–841, <year>1989</year>.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset