Development and Investigation of Algorithm of Sparse Matrices Multiplication Task for the Parallel Dataflow Computing System "Buran"

 
Zmejev D.N., Okunev A.S. (IPPM RAS)
 
Abstract - Operations with sparse matrices are relevant in solving scientific and engineering problems. One of these operations is the multiplication of sparse matrices. There are many algorithms for traditional computing systems that solve this task. A sequential algorithm, consisting of nested loops system, is inefficient and is only applicable for single-processor systems. Parallel algorithms are much more effi-cient and run on supercomputers and clusters. The creation of such algorithms requires certain knowledge of parallel programming and taking into account different levels of parallelism, and the creation of an effective program also requires complex system of data organization. Other aproach for solving the task of sparse matrices multiplication is considered in the article - the application of the parallel dataflow computing system (PDCS) "Buran" that imple-ments the dataflow computing model with a dynamicaly formed context. The process of developing an algorithm in dataflow programming paradigm, which is completely dif-ferent from the traditional (imperative) one, is shown. The dataflow algorithm consists of two parts - the description of program nodes and the formation of initial data. Coding of program nodes is carried out using the high-level program-ming language HPL (High-parallel language). The resulting program code, as well as the algorithm described by it, is simple and universal. The program allows the multiplication of two full/sparse matrices or the multiplication of the matrix by a vector or number. The choice of multiplication objects is determined at the stage of initial data formation. When cre-ating dataflow programs, hardware acceleration is possible, which allows reducing the code of the program or completely abandon the use of program nodes. This is also setted up during the stage of initial data formation. The use of this programming mechanic allowed the creation of three data-flow programs: fully programmed implementation, fully hardware implementation and an intermediate version. The experimental part of the work was divided into two compo-nents. In the first part, the dataflow program and the pro-gram created using traditional methods were compared. The comparison was performed on the "Lomonosov" cluster supercomputer and the results showed a high efficiency of the dataflow program when solving tasks with a sparse data structure and with equal programming complexity. In the second part of the experiments, three implementation vari-ants of dataflow algorithm were compared. The results of the comparison showed the high efficiency of applying special approaches to programming, especially when creating small (in terms of the number of computational cores) special computers. In general, the article demonstrated on a concrete example: the potential of using the PDCS "Buran" for solving tasks with a sparse data structure; the simplicity of programming; and the versatility and high efficiency of the created dataflow programs.

Keywords - sparse matrices multiplication, dataflow computing model, parallel dataflow computing system.

Разработка и исследование алгоритма задачи перемножения разреженных матриц для параллельной потоковой вычислительной системы «Буран»

 
Змеев Д.Н., Окунев А.С. (Институт проблем проектирования в микроэлектронике РАН, г. Москва)
 
Аннотация - Для того чтобы повысить эффективность выполнения задачи перемножения разреженных матриц на традиционном суперкомпьютере (кластере) необходимо учитывать при программировании разные уровни параллелизма. Используя потоковую модель вычислений с динамически формируемым контекстом и архитектуру параллельной потоковой вычислительной системы можно обойти возникающие в этой связи проблемы. В статье описывается параллельный алгоритм задачи перемножения разреженных матриц для параллельной потоковой вычислительной системы, а также варианты его реализации на потоковом языке высокого уровня, в котором воплощается потоковая модель вычислений. На примере полученного алгоритма показаны отличия в подходах к составлению алгоритмов для потоковых программ. Эксперименты, проведенные на программном инструментальном комплексе системы, показали высокую эффективность потоковой вычислительной системы при решении задач, использующих разреженную структуру данных.

Ключевые слова - перемножение разреженных матриц, потоковая модель вычислений, параллельная потоковая вычислительная система.