Asymmetric multiprocessing (AMP) powers your next SoC project

Author: Scott McNutt, DesignLinx Hardware Solutions Company


Embedded systems generally fall into two categories: those that require hard real-time performance and those that do not. In the past, we had to make a difficult choice between the performance of a real-time operating system (RTOS) and the rich features of our favorite Linux system, and then work hard to make up for the deficiencies of whichever we chose.


Today, embedded developers no longer have to choose between the two. Asymmetric multiprocessing (AMP) combines the advantages of both. Several new system-on-chip (SoC) products integrate multiple CPUs, a range of standard I/O peripherals, and programmable logic. For example, the Xilinx Zynq®-7000 All Programmable SoC family includes a dual-core ARM® Cortex™-A9, standard peripherals such as Gigabit Ethernet MAC, USB, DMA, SD/MMC, SPI, and CAN, and a large array of programmable logic. We can use these SoC products as the basis of a highly flexible Linux/RTOS AMP system.


A typical AMP configuration is similar in many ways to a PCI-based system: the Linux domain acts as the host, the RTOS domain acts as an adapter, and one or more shared memory regions carry communication between the two domains. Unlike PCI, however, an AMP configuration can easily and dynamically allocate resources (standard peripherals and custom logic) to one domain or the other. In addition, a Linux/RTOS AMP system can dynamically reconfigure the programmable logic based on runtime requirements, such as the presence or absence of various external devices.


This degree of flexibility often implies a corresponding level of complexity and difficulty in building an AMP system. Rest assured, however, that the Linux development community has introduced many features into the kernel that greatly simplify AMP configuration and usage.


Introduction to Linux multiprocessing

With respect to multiprocessing, the Linux kernel comes in two types: uniprocessor (UP) kernels and symmetric multiprocessor (SMP) kernels. A UP kernel runs on a single core only, regardless of how many cores are available. An AMP system can contain two or more instances of a uniprocessor kernel.


An SMP kernel can run on one core or on multiple cores at the same time (Figure 1). Optional kernel command-line parameters control the number of cores the SMP kernel uses after system initialization. While the kernel is running, various command-line utilities control the number of cores allocated to it. This ability to dynamically control the number of cores used by the kernel is the primary reason AMP developers favor SMP kernels over UP kernels.
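
As a rough illustration of run-time core control, the sketch below uses the kernel's CPU hotplug files under `/sys/devices/system/cpu`, where writing `0` or `1` to `cpuN/online` takes a core away from the SMP kernel or hands it back. The boot-time counterpart is a kernel command-line parameter such as `maxcpus=1`. The helper names and the `sysfs_root` parameter are illustrative and not part of any library; the writes need root privileges and a kernel built with CPU hotplug support.

```python
from pathlib import Path

CPU_SYSFS_ROOT = "/sys/devices/system/cpu"  # standard CPU hotplug location


def set_cpu_online(cpu: int, online: bool, sysfs_root: str = CPU_SYSFS_ROOT) -> None:
    """Ask the SMP kernel to start (True) or stop (False) using one core."""
    control = Path(sysfs_root) / f"cpu{cpu}" / "online"
    control.write_text("1" if online else "0")


def is_cpu_online(cpu: int, sysfs_root: str = CPU_SYSFS_ROOT) -> bool:
    """Report whether the kernel currently schedules work on this core."""
    control = Path(sysfs_root) / f"cpu{cpu}" / "online"
    return control.read_text().strip() == "1"
```

On a dual-core device, calling `set_cpu_online(1, False)` would release the second core from Linux so that it can later be handed to an RTOS.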



Figure 1 - Symmetric multiprocessing. An SMP kernel can run on multiple cores simultaneously.

Remote processor framework


The remote processor (remoteproc) framework is the Linux component responsible for starting and stopping individual cores (remote processors) in an AMP system, and for loading the software each core runs. For example, using remoteproc we can dynamically reconfigure the SMP system shown in Figure 1 into the AMP system shown in Figure 2, and then reconfigure it back to SMP.



Figure 2 - AMP with a Linux SMP kernel

We can fully control reconfiguration through user space applications or system initialization scripts. The reconfiguration control feature allows user applications to stop, reload, and run multiple RTOS applications based on the dynamic needs of the system.
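
A minimal sketch of such user-space control follows. It assumes a recent mainline kernel that exposes each remote processor under `/sys/class/remoteproc/remoteprocN` with `firmware` and `state` attributes; vendor kernels may differ. The helper names and the `root` parameter are illustrative, and the image name in the usage note is hypothetical.

```python
from pathlib import Path

REMOTEPROC_ROOT = "/sys/class/remoteproc"  # sysfs class on recent mainline kernels


def _write(root: str, index: int, attr: str, value: str) -> None:
    """Write one attribute of remoteproc instance `index`."""
    (Path(root) / f"remoteproc{index}" / attr).write_text(value)


def load_and_start(firmware: str, index: int = 0, root: str = REMOTEPROC_ROOT) -> None:
    """Select an ELF image (looked up in the firmware search path) and boot the core."""
    _write(root, index, "firmware", firmware)
    _write(root, index, "state", "start")


def stop(index: int = 0, root: str = REMOTEPROC_ROOT) -> None:
    """Halt the remote core so that another image can be loaded."""
    _write(root, index, "state", "stop")


def swap_application(firmware: str, index: int = 0, root: str = REMOTEPROC_ROOT) -> None:
    """Replace the running RTOS image; assumes the remote core is currently running."""
    stop(index, root)
    load_and_start(firmware, index, root)
```

A system initialization script could call `load_and_start("freertos_app.elf")` at boot and `swap_application(...)` later when requirements change. On a real system, reading `state` back returns values such as `running` or `offline` rather than the command that was written.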


The software for a core (the RTOS and its user applications in this case) is loaded from a standard Executable and Linkable Format (ELF) file that contains a special section holding a resource table. The resource table is similar to PCI configuration space and describes the resources the RTOS requires.


These resources include the memory required for RTOS code and data.
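
To make the resource table more concrete, the sketch below packs one in Python. The byte layout follows my reading of the `resource_table` and `fw_rsc_carveout` structures in the kernel's `include/linux/remoteproc.h` and should be checked against the kernel version in use. The function names, region names, and addresses are illustrative assumptions; in a real project the table is normally written in C and placed in the firmware's resource table section by the linker.

```python
import struct

RSC_CARVEOUT = 0              # resource type: memory for remote code and data
FW_RSC_ADDR_ANY = 0xFFFFFFFF  # "no fixed address requested" (-1 as a u32)


def pack_carveout(da: int, length: int, name: str,
                  pa: int = FW_RSC_ADDR_ANY, flags: int = 0) -> bytes:
    """One carveout entry: type, da, pa, len, flags, reserved, name[32]."""
    return struct.pack("<6I32s", RSC_CARVEOUT, da, pa, length, flags, 0,
                       name.encode("ascii"))


def pack_resource_table(entries: list) -> bytes:
    """Header (ver, num, reserved[2]), one offset per entry, then the entries."""
    num = len(entries)
    cursor = 16 + 4 * num  # first entry starts right after the offset array
    offsets = []
    for entry in entries:
        offsets.append(cursor)
        cursor += len(entry)
    header = struct.pack(f"<4I{num}I", 1, num, 0, 0, *offsets)
    return header + b"".join(entries)


def entry_types(table: bytes) -> list:
    """Walk the offset array and return the type field of every entry."""
    _ver, num = struct.unpack_from("<2I", table, 0)
    offsets = struct.unpack_from(f"<{num}I", table, 16)
    return [struct.unpack_from("<I", table, off)[0] for off in offsets]
```

For instance, `pack_resource_table([pack_carveout(0x0, 0x100000, "rtos_text")])` describes a single 1 MB region for the RTOS image.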


Trace buffer


The trace buffer is a memory area that automatically appears as a file in the Linux file system. As the name suggests, the trace buffer provides basic tracing capabilities to remote processors. The remote processor writes trace, debug, and status messages to the buffer for inspection via the Linux command line or a custom application.


To request one or more trace buffers, add an entry for each to the resource table. Although it normally contains plain text, a trace buffer can also hold binary data, such as application status information or alarm indications.
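
The following sketch shows both sides of a text trace buffer: the resource table entry that requests it and a reader on the Linux side. The entry layout follows my reading of `fw_rsc_trace` in `include/linux/remoteproc.h`. The default path assumes debugfs is mounted at `/sys/kernel/debug` and that the remote processor is instance 0. The function names are illustrative, and the reader assumes the remote side writes NUL-padded text.

```python
import struct
from pathlib import Path

RSC_TRACE = 2  # resource type: trace buffer
DEFAULT_TRACE = "/sys/kernel/debug/remoteproc/remoteproc0/trace0"


def pack_trace_entry(da: int, length: int, name: str) -> bytes:
    """One trace entry: type, da, len, reserved, name[32]."""
    return struct.pack("<4I32s", RSC_TRACE, da, length, 0, name.encode("ascii"))


def read_trace_lines(path: str = DEFAULT_TRACE) -> list:
    """Return the text lines written so far, ignoring the unused NUL-filled tail."""
    raw = Path(path).read_bytes()
    text = raw.split(b"\x00", 1)[0]
    return text.decode("utf-8", errors="replace").splitlines()
```

A buffer that carries binary status records instead of text would need its own parser in place of `read_trace_lines`.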


Virtual I/O device


We can also use resource tables to define virtual input/output devices (VDEVs), which are primarily pairs of shared memory queues that support messaging between the Linux kernel and remote processors. The VDEV definition includes the fields used to set the size of the queue and the interrupts used to signal between the processors.


The Linux kernel handles the initialization of the virtual I/O queues. The software running on the remote processor only needs to include a VDEV description in its resource table and then use the queues once it begins executing; the kernel handles the rest.
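
As a sketch of what a VDEV description contains, the code below packs one entry with a pair of queues (vrings). The layout follows my reading of `fw_rsc_vdev` and `fw_rsc_vdev_vring` in `include/linux/remoteproc.h`, and virtio device ID 7 is the one used for rpmsg. The function names, queue sizes, and notify IDs are illustrative assumptions to be checked against the target kernel and platform.

```python
import struct

RSC_VDEV = 3                  # resource type: virtio device
VIRTIO_ID_RPMSG = 7           # virtio device ID used by rpmsg
FW_RSC_ADDR_ANY = 0xFFFFFFFF  # let Linux choose where the ring lives


def pack_vring(num_buffers: int, notify_id: int, align: int = 4096,
               da: int = FW_RSC_ADDR_ANY) -> bytes:
    """One queue: da, align, num (buffer count), notifyid, reserved."""
    return struct.pack("<5I", da, align, num_buffers, notify_id, 0)


def pack_vdev(vrings: list, notify_id: int = 0, device_features: int = 0) -> bytes:
    """VDEV entry: type, id, notifyid, dfeatures, gfeatures, config_len,
    status, num_of_vrings, reserved[2], followed by the vring descriptors."""
    head = struct.pack("<6I2B2x", RSC_VDEV, VIRTIO_ID_RPMSG, notify_id,
                       device_features, 0, 0, 0, len(vrings))
    return head + b"".join(vrings)
```

A typical rpmsg device uses two queues, one per direction, for example `pack_vdev([pack_vring(256, 0), pack_vring(256, 1)])`. The notify IDs are what each side uses to signal the other that a queue has new work.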


Remote processor messaging framework


The remote processor messaging (rpmsg) framework is a software message bus built on the Linux kernel's virtual I/O system. The message bus is similar to a local-area network subnet: each processor can create addressable endpoints and exchange messages through shared memory.


The kernel's rpmsg framework acts as a switch, passing each message to the appropriate endpoint based on the destination address the message contains. Because the message header also contains the source address, dedicated connections can be established between processors.
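
The switching behavior can be modeled in a few lines. The 16-byte header below (source, destination, reserved, length, flags) follows my reading of `struct rpmsg_hdr` in the kernel sources. The `Switch` class is a toy model of the dispatch step only, not the kernel's implementation, and the addresses in the usage note are arbitrary.

```python
import struct

RPMSG_HDR = struct.Struct("<3I2H")  # src, dst, reserved, len, flags


def pack_message(src: int, dst: int, payload: bytes) -> bytes:
    """Prefix a payload with an rpmsg-style header."""
    return RPMSG_HDR.pack(src, dst, 0, len(payload), 0) + payload


class Switch:
    """Toy model of destination-based delivery to registered endpoints."""

    def __init__(self):
        self._endpoints = {}  # destination address -> callback(src, payload)

    def create_endpoint(self, address: int, callback) -> None:
        self._endpoints[address] = callback

    def deliver(self, frame: bytes) -> bool:
        """Route one frame; return False if no endpoint owns the address."""
        src, dst, _reserved, length, _flags = RPMSG_HDR.unpack_from(frame)
        payload = frame[RPMSG_HDR.size:RPMSG_HDR.size + length]
        callback = self._endpoints.get(dst)
        if callback is None:
            return False
        callback(src, payload)  # src lets the receiver reply to the sender
        return True
```

An endpoint registered at address 1024 would receive `pack_message(5, 1024, b"ping")` together with the sender's address 5, which is enough to send a reply and so maintain a dedicated channel.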


Naming service


A processor can dynamically announce a specific service by sending a message to the naming service of the rpmsg framework. By itself, the naming service is not very useful. However, the rpmsg framework allows service names to be associated with device drivers, which enables automatic loading and initialization of those drivers.


For example, if the remote processor announces the dlinx-h323-v1.0 service, the kernel can search for, load, and initialize the driver associated with that name. This greatly simplifies driver management when services are dynamically installed on remote processors.
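
The announcement and lookup can be sketched as follows. The message layout (a 32-byte name, an address, and a flags word) and the well-known name service address 53 follow my reading of the kernel's rpmsg name service definitions. The `drivers` dictionary is a stand-in for the kernel's driver matching, and the driver name `h323_driver` is hypothetical.

```python
import struct

RPMSG_NAME_SIZE = 32
RPMSG_NS_ADDR = 53  # well-known endpoint address of the name service
RPMSG_NS_CREATE, RPMSG_NS_DESTROY = 0, 1
NS_MSG = struct.Struct(f"<{RPMSG_NAME_SIZE}s2I")  # name, addr, flags


def pack_announcement(name: str, addr: int, flags: int = RPMSG_NS_CREATE) -> bytes:
    """Payload a remote processor would send to RPMSG_NS_ADDR."""
    return NS_MSG.pack(name.encode("ascii"), addr, flags)


def match_driver(announcement: bytes, drivers: dict):
    """Return (driver, remote address) for a newly announced service, else None."""
    raw_name, addr, flags = NS_MSG.unpack(announcement)
    name = raw_name.split(b"\x00", 1)[0].decode("ascii")
    if flags != RPMSG_NS_CREATE:
        return None
    driver = drivers.get(name)
    return (driver, addr) if driver is not None else None
```

With `drivers = {"dlinx-h323-v1.0": "h323_driver"}`, an announcement built by `pack_announcement("dlinx-h323-v1.0", 0x400)` resolves to that driver together with the remote endpoint address it should talk to.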


Managing interrupts


Interrupt management is tricky, especially when starting and stopping cores. The system needs to dynamically redirect specific interrupts to the remote processor domain when the remote processor starts, and then reclaim those interrupts when the remote processor stops. In addition, the system must protect interrupts from being misrouted by misconfigured drivers. In short, interrupts must be managed at the system level.


This is routine work for the Linux SMP kernel and is another reason why SMP kernels are favored in AMP configurations. The remote processor framework manages interrupts easily, with minimal support from device drivers.


Device drivers


Device driver development is a constant concern because the required skill set may not be readily available. Fortunately, the Linux kernel's remoteproc and rpmsg frameworks do most of the heavy lifting; a driver only needs to implement a few standard driver routines. A fully functional driver may require only a few hundred lines of code. The kernel source tree contains example drivers that embedded developers can adapt to their own requirements.


Vendors also provide generic open source device drivers. DesignLinx Hardware Solutions provides a generic rpmsg driver for Linux and FreeRTOS. Because the generic driver makes no assumptions about the format of the messages exchanged, embedded developers can use it for a variety of AMP applications without modification.


Moving inside the chip

The kernel's multiprocessing support is not limited to homogeneous multiprocessing systems (systems that use the same type of processor). All the features introduced above can also be used in heterogeneous systems (systems with different types of processors). These multiprocessing features are particularly useful when migrating an existing design onto the chip.


The new SoC products allow designers to easily migrate a variety of hardware designs from a printed circuit board (PCB) to a system-on-chip (Figure 3). What used to be separate processors and components on the PCB can be implemented entirely within the SoC.



Figure 3 - Moving discrete PCB components inside the SoC

For example, we can use a Xilinx Zynq-7000 series SoC to implement the original PCB hardware architecture in Figure 3, using one of the ARM processors as the control CPU and soft processors in the programmable logic (such as the Xilinx MicroBlaze™ processor) to replace the discrete processors. We can use the remaining ARM processor to run the Linux SMP kernel (Figure 4).



Figure 4 - Multiprocessing inside the chip

Adding Linux to the original design gives the ARM core and the soft processors all of the standard multiprocessing features described above (such as start, stop, reload, trace buffers, and remote messaging). It also brings a rich set of Linux features supporting multiple network interfaces (Ethernet, Wi-Fi, Bluetooth), network services (Web server, FTP, SSH, SNMP), file systems (DOS, NFS, cramfs, flash file systems), and other interfaces (PCIe, SPI, USB, MMC, video). This makes it easy to add new capabilities without major changes to an already tested architecture.


More cores keep arriving

Over the past few years, the number of multicore SoC products for the embedded market has grown, and many of them are well suited to AMP configurations.


For example, the Xilinx UltraScale+™ MPSoC architecture includes a 64-bit quad-core ARM Cortex-A53, a 32-bit dual-core ARM Cortex-R5, a graphics processing unit (GPU), and a variety of other peripherals, together with a generous amount of programmable logic. This provides fertile ground for designers who know how to harness both the performance of a real-time operating system and the rich feature set of the Linux kernel.


For more information on how to design a Linux/RTOS AMP system, contact DesignLinx Hardware Solutions. DesignLinx, a senior member of the Xilinx Alliance Program, specializes in FPGA design and support services, including system design, schematic capture, electronic packaging/mechanical engineering design, and signal integrity design.

