Abstract
Most modern digital electronic devices and systems now rely on multicore processors to deliver a growing list of value-added features and services. Multicore architectures have made it possible to optimise the performance-per-watt ratio, i.e. application throughput relative to power consumption. However, this evolution in hardware architecture has created unprecedented challenges for software engineering. Traditional software models were not designed to make optimal use of the parallelism that applications potentially contain. New software methodologies have therefore become essential for increasing application performance by exploiting parallel execution in multicore environments. Ideally, application run-time speed-up should increase as processing resources (cores) are added. In this thesis we use the notion of scalability to describe how well an application can perform by exploiting a progressively greater number of cores.
Executing applications efficiently in a multicore environment requires addressing the challenges of multicore scheduling. To obtain schedules of acceptable quality, details of the hardware platform are typically used. There is thus a motivation to design applications that scale with minimal or no assumptions about the underlying platform. Upgrading the hardware of a device with more cores would then yield a proportional, ideally linear, improvement in application performance. It would also make it possible to move applications seamlessly to the Cloud, where processing resources, including the number of cores, can be allocated dynamically. The availability of scalable software methodologies therefore has a significant impact on the actual cost and quality of digital services.
The scope of this thesis is to contribute to the scalability of signal processing applications executed in multicore environments. The main goal is to provide methodologies and tools that leverage model-based design and, in particular, make efficient use of the dataflow model to produce scalable, high-performance application software. The concept of model-based design is to decouple the description of an algorithm from its actual implementation on a hardware platform. In the dataflow model, an algorithm is described by the flow of data through its operations, which are expressed as actors connected in a graph. Dataflow models readily expose the potential parallelism of an algorithm and are thus an efficient means of analysing signal processing algorithms. However, transforming dataflow models into executable code for multicore execution is challenging because of the semantic gap between dataflow and traditional imperative programming languages. This thesis explores and proposes methodologies for leveraging dataflow-based application design for scalable application software development. In this way, popular signal processing algorithms that are efficiently modelled as dataflow graphs can also be executed efficiently on multicore architectures.
To analyse how dataflow representations can be executed efficiently, this thesis examines how well the traditional thread programming model scales when supporting parallel execution of signal processing algorithms such as image and video compression. It then puts forward the hypothesis that task programming models scale better and overcome the scalability limitations of the thread model. This hypothesis is supported by experimental results and related analysis.
Another claim of this thesis is that the attributes of task programming models correspond more closely to the semantics of the dataflow model of computation. Therefore, a methodology is proposed that combines synchronous dataflow and task programming models to enable scalable application execution. A task-based code generator is developed and tested within a synchronous dataflow-based application development framework. Furthermore, a set of optimisation techniques is developed to optimise the transformation from dataflow to task-based code and to extend its applicability on multicore platforms. As a result, the code generator can be used to scale the performance of applications with fine-grained parallelism, independently of the number of cores. The generated code can also be used in both symmetric and asymmetric multicore environments. A set of experiments verifies the proof of concept and the efficiency of the code generation process. The speed-up and throughput of the task-based code are evaluated against those of the equivalent thread-based code for the same set of applications described as synchronous dataflow graphs.
Original language | English |
---|---|
Supervisors | |
Place of publication | Åbo |
Publisher | |
Print ISBNs | 978-952-12-4114-7 |
Electronic ISBNs | 978-952-12-4115-4 |
Status | Published - 2021 |
MoE publication type | G5 Doctoral dissertation (article) |