RuRot: Run-Time Rotatable-Expandable Partitions for Efficient Mapping in CGRAs

A4 Conference publications


Internal authors/editors


Publication authors: Jafri SMAH, Serrano G, Iqbal J, Daneshtalab M, Hemani A, Paul K, Plosila J, Tenhunen H
Editors: Veidenbaum AV
Publisher: IEEE
Place of publication: Agios Konstantinos
Publication year: 2014
Title of parent publication: International Conference on Embedded Computer Systems: Architectures, Modeling and Simulations (SAMOS)
First page of the article, page number: 233
Last page of the article, page number: 241
ISBN: 978-1-4799-3770-7


Abstract

Today, Coarse-Grained Reconfigurable Architectures (CGRAs) host multiple applications with arbitrary communication and computation patterns. Compile-time mapping decisions are neither optimal nor desirable for efficiently supporting such diverse and unpredictable application requirements. As a solution, recently proposed architectures offer run-time remapping. Run-time remappers displace or expand (parallelize/serialize) an application to optimize parameters such as platform utilization. However, existing remappers support application displacement or expansion only in the horizontal or vertical direction. Moreover, most existing work addresses dynamic remapping only in packet-switched networks and is therefore not applicable to CGRAs that exploit circuit switching for low power and high predictability. To enhance the optimality of run-time remappers, this paper presents a design framework called Run-time Rotatable-expandable Partitions (RuRot). RuRot provides architectural support to dynamically remap or expand (i.e., parallelize) the hosted applications in CGRAs with circuit-switched interconnects. Compared to the state of the art, the proposed design supports application rotation (clockwise and anticlockwise) and displacement (horizontal and vertical) at run-time. Simulation results for a few applications show that the additional flexibility significantly enhances device utilization (on average 50% for the tested applications). Synthesis results confirm that the proposed remapper has negligible silicon (0.2% of the platform) and timing (2 cycles per application) overheads.
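
To make the idea of rotation plus displacement concrete, the following is a minimal geometric sketch in Python. It is not the RuRot hardware mechanism described in the paper. It assumes, purely for illustration, that a partition is a set of (row, column) cells on a rectangular grid and that a placement is valid when every cell lands on a free grid cell. All names (`normalize`, `rotate_cw`, `rotate_ccw`, `displace`, `find_placement`) are hypothetical.

```python
from typing import FrozenSet, Iterable, Optional, Set, Tuple

Cell = Tuple[int, int]  # (row, col); rows grow downward, columns grow rightward


def normalize(cells: Iterable[Cell]) -> FrozenSet[Cell]:
    """Shift a shape so that its bounding box starts at (0, 0)."""
    cells = list(cells)
    r0 = min(r for r, _ in cells)
    c0 = min(c for _, c in cells)
    return frozenset((r - r0, c - c0) for r, c in cells)


def rotate_cw(cells: Iterable[Cell]) -> FrozenSet[Cell]:
    """Rotate a shape by 90 degrees clockwise: (r, c) -> (c, -r)."""
    return normalize((c, -r) for r, c in cells)


def rotate_ccw(cells: Iterable[Cell]) -> FrozenSet[Cell]:
    """Rotate a shape by 90 degrees anticlockwise: (r, c) -> (-c, r)."""
    return normalize((-c, r) for r, c in cells)


def displace(cells: Iterable[Cell], dr: int, dc: int) -> FrozenSet[Cell]:
    """Move a shape by dr rows (vertical) and dc columns (horizontal)."""
    return frozenset((r + dr, c + dc) for r, c in cells)


def find_placement(shape: Iterable[Cell], free: Set[Cell], rows: int, cols: int,
                   allow_rotation: bool = True) -> Optional[FrozenSet[Cell]]:
    """Return the first placement that fits entirely on free cells, else None.

    Exhaustive search over orientations and offsets; an assumption made for
    clarity, not a claim about how a hardware remapper searches.
    """
    base = normalize(shape)
    orientations = [base]
    if allow_rotation:
        orientations += [rotate_cw(base), rotate_ccw(base)]
    for oriented in orientations:
        for dr in range(rows):
            for dc in range(cols):
                placed = displace(oriented, dr, dc)
                if placed <= free:  # every cell of the shape is free
                    return placed
    return None


# A 1x3 horizontal partition on a 3x2 grid where only column 0 is free.
bar = {(0, 0), (0, 1), (0, 2)}
free_cells = {(0, 0), (1, 0), (2, 0)}
without_rotation = find_placement(bar, free_cells, 3, 2, allow_rotation=False)
with_rotation = find_placement(bar, free_cells, 3, 2)
```

In this toy case the partition cannot be placed by displacement alone, but it fits once rotation is allowed, which is the kind of extra flexibility the abstract credits for the improved utilization.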

Last updated 2020-04-04 at 04:59