Understanding current challenges in multicore programming (part 1)

In the first part of a series of technical blog posts, we look at how communication contention can be addressed with SLX in multicore projects.
To read Part 2: Maximizing the latest automotive multicore platforms with SLX – click here
To read Part 3: Communication resources at the heart of multicore SoC – click here
To read Part 4: Communication contention management with SLX: a practical example – click here

A picture of Hong Kong's city lights at night.

Avoid the communication bottleneck and leverage the full potential of your latest multicore SoC


Software professionals face the tremendous challenge of using the vast amount of resources available in modern multicore SoCs, whether integrating the ECUs of a car, running the numerous tasks of an autonomous vehicle, designing the next 5G base station, or implementing an AI engine in a next-generation smartphone. The code is highly computation-intensive, and the processing is either control-intensive (as in base stations) or massively parallel (as in AI kernels). Many teams parallelize code by hand; others use SLX to parallelize code efficiently and automatically.

To deliver the most challenging projects of our time, you need to take full advantage of multicore SoCs.


Do you really know your multicore platform?


You have leading expertise in your application domain, but do you really know your multicore platform and how to exploit it?

With past applications on single or dual cores, it was all about MHz and memory size. Today's multicore SoCs encompass dozens of CPUs or more. They rely on communication buses, schedulers, and memory banks that can be scratchpads or even distributed caches. To enable low-power or safety applications, most of these platforms are also heterogeneous, with asymmetric cores or a mix of scratchpads and caches in the memory hierarchy.

By finding the right way to exploit these platforms, you can split applications into tasks spread over dozens of CPUs and unlock their full performance. But how can you understand the performance of these SoCs and, more practically, how do you program them?


It’s all about mapping and scheduling tasks, communications and buffers


In theory, multicore programming should be easy: it is about placing tasks on the available cores. For two tasks to communicate, one simply allocates a data buffer in shared memory; the tasks then use the available communication means to perform the actual data transfer.

This approach is scalable. If an application has a lot of parallelism, many tasks can be created to spread the computation over all the available cores. One can even execute several applications simultaneously on the same multicore SoC simply by mixing their tasks.

Therefore, tasks have to be allocated and scheduled on the available cores, and so do the data buffers used for task communication. These buffers are allocated in shared memory (cache or scratchpad) and scheduled over time because memory capacity is limited. Finally, the communication resources between cores have to be managed to avoid contention and to ensure that data can be accessed in time.


The key is to be able to manage communication contentions


In order to exploit the latest multicore SoCs, one has to allocate and schedule the vast number of tasks, communications, and memory buffers that applications require.
However, the communication resources in the latest multicore SoCs are limited. One cannot handle all the data communication the tasks require without being able to manage communication contention across the entire SoC.

This complexity grows with the number of tasks and, hence, with communication needs. The diversity of modern workloads, together with platform heterogeneity and complexity, makes performance prediction by rule of thumb almost impossible. The result is a lot of costly and risky trial-and-error work to guarantee that application requirements are met.

Nevertheless, this complexity can be managed with the right technology: one that exposes and handles communication contention as easily as task and memory allocation and scheduling.

These are exactly the features you get with SLX. It is a very fast and accurate way to assist you in porting your innovative application to your latest multicore SoC.

Indeed, SLX very accurately models the communication structures inside today's most advanced SoCs: shared buses and memories (caches or scratchpads), CPU clusters with shared caches, crossbars, and even the latest NoCs. It can predict performance, manage communication congestion, and even automatically map tasks to get the best out of your latest multicore platform.

Diagram showing a system containing different tasks targeting a multicore processor.

Stay tuned: in the coming weeks we will publish a series of articles showing the many features and capabilities of SLX when it comes to communication contention management on the latest multicore SoCs.

Get a full overview of the range of SLX programming tools here
