Interview: Markus Levy Highlights The EEMBC Floating Point Benchmark

Markus Levy, founder and president of EEMBC, addresses the new EEMBC floating point benchmark.

Aug. 21, 2013

Benchmarks have been around since before computers, and they remain useful tools. I had a hand in some of the popular PC benchmarks from PC Magazine when I was Lab Director of PC Labs. These days I am looking more at the embedded side of things. This is where the Embedded Microprocessor Benchmark Consortium (EEMBC) comes into play. It has a range of benchmarks targeting chip and embedded designers. Its CoreMark is very popular.

EEMBC's latest benchmark targets floating-point hardware, especially on SoCs. Markus Levy is founder and president of EEMBC. He is also president of the Multicore Association and chairman of the Multicore Developer's Conference. We recently talked about the new benchmark and its impact on embedded developers.

Wong: EEMBC has been around for more than 16 years. Why the recent need for floating-point benchmarks?

Levy: While floating-point units in processors have been in use for decades, floating point has recently become more mainstream, appearing in many embedded applications such as audio, graphics, automotive, and motor control. FP-enabled processors are better able to meet the need for increased precision in these applications. Floating-point representation makes numerical computation much easier. As a matter of fact, many algorithms are first coded in floating point before they are painstakingly converted to integer (so they can run on processors without hardware floating-point support). Furthermore, FP implementations of many algorithms take fewer cycles to execute than fixed-point code (assuming the fixed-point code offers similar precision). To better support FP, more related features are being added to processors. For example, the Cortex-A cores are including FPU back-ends with multi-issue, speculative, and out-of-order execution. There’s even FP capability in low-cost microcontrollers, and devices are being built with the ability to perform a single-cycle FP MAC with dual memory access.
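To give a feel for the float-to-integer conversion Levy mentions, here is a minimal sketch of one multiply done in floating point and again in Q15 fixed point. The Q15 format, the helper names (`to_q15`, `q15_mul`, `from_q15`), and the sample values are assumptions chosen for illustration; they are not taken from FPMark or from any EEMBC code.

```python
Q15_ONE = 1 << 15  # Q15 scale factor: real value = integer / 2**15


def to_q15(x: float) -> int:
    """Quantize a float in [-1, 1) to a saturated Q15 integer."""
    n = int(round(x * Q15_ONE))
    return max(-Q15_ONE, min(Q15_ONE - 1, n))


def q15_mul(a: int, b: int) -> int:
    """Multiply two Q15 values, round, and rescale the product back to Q15."""
    prod = a * b + (1 << 14)  # add half an LSB so the shift rounds to nearest
    return max(-Q15_ONE, min(Q15_ONE - 1, prod >> 15))


def from_q15(n: int) -> float:
    """Convert a Q15 integer back to a float."""
    return n / Q15_ONE


# Hypothetical operands, e.g. a filter coefficient and a sample.
a, b = 0.7071, -0.3333

float_result = a * b                                    # one line in floating point
fixed_result = from_q15(q15_mul(to_q15(a), to_q15(b)))  # scaling, rounding, saturation
error = abs(float_result - fixed_result)

print(f"float: {float_result:.8f}  q15: {fixed_result:.8f}  error: {error:.2e}")
```

Even for a single multiply, the fixed-point version has to manage scaling, rounding, and saturation by hand, which is the bookkeeping that hardware floating point removes.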

Wong: How does EEMBC’s FPMark compare to existing benchmarks?

Levy: In the same way that the popular EEMBC CoreMark was intended to be a “better Dhrystone”, FPMark provides something better than the “somewhat quirky” Whetstone and Linpack benchmarks. Other FP benchmarks are already in general use (e.g., Linpack, Nbench, Livermore loops), but each has multiple versions, so one never knows how to compare scores, and none has a standardized way of being run or of reporting results. FPMark is built on the same framework as the EEMBC MultiBench, so porting will be very familiar to those who have previously used MultiBench. Using this framework, a user of FPMark can simultaneously launch one or more contexts of a given workload and thereby study some of the general system-level effects on a multicore device. For example, although FPMark wasn’t intended as a multicore benchmark and is mostly for computationally intensive workloads, launching multiple contexts will increasingly stress memory bandwidth, latency, and scheduling support.

Wong: What are some of the unique features of FPMark?

Levy: For starters, the FPMark benchmark suite contains 10 different kernels (Fig. 1), including FFT, ray tracing, Fourier coefficients, a back-propagation neural net simulator, Black-Scholes, arc tangent, etc. This variety of kernels supports many application areas that use floating-point representation. But what really sets FPMark apart is that most of the kernels are implemented as both single-precision and double-precision workloads.

Figure 1. The FPMark benchmark suite contains 10 different kernels with different emphasis.

Ultimately, the application itself will determine whether single precision or double precision is needed; therefore, FPMark provides both to allow users to make the appropriate comparisons. In its methodology, FPMark specifies the required degree of accuracy for the result. When a compiler builds floating-point code, or when certain floating-point libraries are used, some inaccuracy is introduced depending on the optimizations. FPMark requires that the final result is accurate to 30 bits of mantissa (out of 52) and 14 bits of mantissa (out of 23), for double precision and single precision, respectively.
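One way to picture an "accurate to N bits of mantissa" requirement is to turn the relative error between a computed value and a reference value into a count of agreeing bits. The sketch below does that. It is only an assumption about how such a check could be expressed; the interview does not describe FPMark's actual verification code, and the function names and test values are hypothetical.

```python
import math

# Thresholds quoted in the interview.
REQUIRED_BITS_DP = 30  # double precision, out of 52 mantissa bits
REQUIRED_BITS_SP = 14  # single precision, out of 23 mantissa bits


def matching_mantissa_bits(result: float, reference: float) -> float:
    """Estimate how many significant bits of `result` agree with `reference`."""
    if result == reference:
        return math.inf          # exact match: every bit agrees
    if reference == 0.0:
        return 0.0               # relative error is undefined; report no agreement
    rel_err = abs(result - reference) / abs(reference)
    return max(0.0, -math.log2(rel_err))


def passes(result: float, reference: float, required_bits: int) -> bool:
    """True if the result agrees with the reference to at least `required_bits`."""
    return matching_mantissa_bits(result, reference) >= required_bits


# Hypothetical values: a reference and two results with known relative error.
reference = math.pi
close = reference * (1 + 2.0 ** -35)   # about 35 bits agree
loose = reference * (1 + 2.0 ** -20)   # about 20 bits agree

print(passes(close, reference, REQUIRED_BITS_DP))
print(passes(loose, reference, REQUIRED_BITS_DP))
print(passes(loose, reference, REQUIRED_BITS_SP))
```

Under this reading, a result that is good to roughly 20 bits would satisfy the single-precision threshold but not the double-precision one.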

To allow FPMark to be used with a wide range of devices (from low-end microcontrollers to high-end PC processors), there are three workloads for each benchmark (small, medium, and large). Specifically, the small data workloads are appropriate for microcontrollers. Medium data workloads are suitable for mobile CPUs and mid-range devices such as Cortex-A7 and Analog Devices SHARC processors. Large data workloads fit PC processors and even some high-end mobile devices. Using the linear algebra benchmark as an example, here’s a table with a simple memory comparison. The picture is not always this straightforward. For example, some of the benchmarks have smaller data sizes, but the algorithm itself uses the data repeatedly to derive a result, as would be the case for the neural net as it homes in on a node. However, the table gives an idea of the difference between small, medium, and large, as well as between single and double precision.

EEMBC Linear algebra Memory Requirements (kbytes)

                    Small    Medium    Large
Single precision    10.42    40.21     3904
Double precision    20.6     80.08     7810

Wong: With a total of 53 workloads in FPMark, how does one make a simple and quick but meaningful comparison?

Levy: While EEMBC realizes that the true value of this benchmark suite will be seen by closely examining all the detailed scores that are generated, the members also crafted two official FPMark scores for quick comparisons. One score is literally the FPMark, calculated by taking the geometric mean of all the individual scores and multiplying the result by 100 for scaling (our attempt at making sure that no processor has a score of less than one). We also created a MicroFPMark, targeted at microcontrollers that aren’t able to run the double-precision, large-data workloads. The MicroFPMark is a geometric mean of the single-precision, small-data workloads.
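The scoring arithmetic described here, a geometric mean of the individual scores multiplied by 100, can be sketched in a few lines. The workload scores below are made up for illustration, and the function names are hypothetical; this is not EEMBC's scoring code.

```python
import math


def geometric_mean(scores):
    """Geometric mean of positive scores, computed via logs to avoid overflow."""
    if not scores or any(s <= 0 for s in scores):
        raise ValueError("scores must be a non-empty sequence of positive numbers")
    return math.exp(sum(math.log(s) for s in scores) / len(scores))


def fpmark_style_score(workload_scores, scale=100.0):
    """Geometric mean of per-workload scores, multiplied by a scaling factor."""
    return scale * geometric_mean(workload_scores)


# Hypothetical per-workload scores (not real FPMark results).
scores = [0.8, 2.5, 12.0, 40.0]
overall = fpmark_style_score(scores)

# Doubling a single workload's score scales the overall score by 2**(1/n),
# no matter how large or small that workload's score was to begin with.
boosted = fpmark_style_score([0.8 * 2, 2.5, 12.0, 40.0])

print(f"overall: {overall:.1f}  after doubling one workload: {boosted:.1f}")
```

A geometric mean keeps any one workload with a very large raw score from dominating the result, which an arithmetic mean would allow.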

About the Author

Markus Levy

Director of Machine Learning Technologies, NXP Semiconductors

Markus Levy joined NXP in 2017 as the Director of AI and Machine Learning Technologies. In this position, he is focused primarily on the technical strategy, roadmap, and marketing of AI and machine learning capabilities for NXP's microcontroller and i.MX applications processor product lines. Previously, Markus was chairman of the board of EEMBC, which he founded and ran as president from April 1997. Mr. Levy was also president of the Multicore Association, which he co-founded in 2005.

Before that, he was senior analyst at Microprocessor Report and an editor at EDN magazine. Markus began his career at Intel Corp., as both a senior applications engineer and customer training specialist for Intel's microprocessor and flash-memory products. Markus volunteered for 13 years as a first responder—fighting fires and saving lives.

William G. Wong

Senior Content Director - Electronic Design and Microwaves & RF

I am Editor of Electronic Design focusing on embedded, software, and systems. As Senior Content Director, I also manage Microwaves & RF and I work with a great team of editors to provide engineers, programmers, developers and technical managers with interesting and useful articles and videos on a regular basis. Check out our free newsletters to see the latest content.

You can send press releases for new products for possible coverage on the website. I am also interested in receiving contributed articles for publishing on our website. Use our template and send to me along with a signed release form.

I earned a Bachelor of Electrical Engineering at the Georgia Institute of Technology and a Master's in Computer Science from Rutgers University. I still do a bit of programming using everything from C and C++ to Rust and Ada/SPARK. I do a bit of PHP programming for Drupal websites. I have posted a few Drupal modules.

I still get hands-on with software and electronic hardware. Some of this can be found in our Kit Close-Up video series. You can also see me in many of our TechXchange Talk videos. I am interested in a range of projects from robotics to artificial intelligence.