Intel’s Parallel Studio XE 2016 provides tools that offer hints on how to improve parallelism so that the compilers can better optimize the code. The compilers and libraries support the latest standards such as C++14, C11, Fortran 2008, and the draft version of Fortran 2015, along with OpenMP 4.0. The suite supports the latest operating systems, including Microsoft Windows 10 and Red Hat Enterprise Linux 7. It also supports Intel’s latest processor platform, Skylake, with AVX-512 as well as the Xeon Phi Knights Landing microarchitecture.
Parallel Studio XE 2016 includes updated versions of tools like Threading Building Blocks (TBB). TBB now supports task arenas. The scalable TBB memory allocator is now supported on OS X, and 64-bit Android support is now part of the Linux mix.
Two of the analysis tools are the Advisor XE Vectorization Optimization and Thread Prototyping tool and an updated version of the VTune Amplifier XE Performance Profiler. Intel’s approach is to give developers an indication of how efficient their code is from a parallel perspective and how to improve its performance, since many optimizations are beyond what can be included in the compilers.
The Advisor XE profiler targets parallel operations of individual tasks that might take advantage of features like AVX (Fig. 1). The profiler lets developers focus on frequently executed “hot loops” to see what vectorization issues might exist, including problems that may prevent vectorization altogether. The system also provides a gauge of how efficient an algorithm is.
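To make the idea of a vectorization issue concrete, here is a minimal sketch of two hot loops. The function names `saxpy` and `running_sum` are illustrative and not part of any Intel tool. The first loop has independent iterations, which compilers can typically map onto SIMD instructions such as AVX. The second has a loop-carried dependence, a typical obstacle to straightforward vectorization.

```cpp
#include <cstddef>
#include <vector>

// Independent iterations: y[i] depends only on x[i] and the old y[i],
// so several elements can be processed by one SIMD instruction.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    const std::size_t n = x.size() < y.size() ? x.size() : y.size();
    const float* xp = x.data();
    float* yp = y.data();
    for (std::size_t i = 0; i < n; ++i) {
        yp[i] = a * xp[i] + yp[i];
    }
}

// Loop-carried dependence: iteration i reads the value that
// iteration i - 1 just wrote, so the iterations cannot simply
// run side by side in SIMD lanes.
void running_sum(std::vector<float>& v) {
    for (std::size_t i = 1; i < v.size(); ++i) {
        v[i] += v[i - 1];
    }
}
```

Whether a particular compiler vectorizes the first loop depends on its flags and target; the sketch only shows the kind of difference a vectorization report is meant to surface.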
The VTune Amplifier XE Performance Profiler targets OpenMP clusters. It includes Tune MP* (Fig. 2), which highlights the efficiency of OpenMP* code. Like Advisor XE, it provides hints so that developers can decide what changes to make and whether those changes would be worth the effort. The profiler can compare results from different configurations to show where improvements are needed. The system also includes bandwidth analysis support for multi-core, non-uniform memory access (NUMA) environments, including multi-socket systems linked via Intel’s QuickPath Interconnect (QPI). This version has a lightweight trace tool that can handle 32,000 processor ranks. It has less overhead when handling these very large clusters than the more detailed version, which is still available.
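For readers unfamiliar with the kind of code such a profiler examines, the following is a minimal sketch of an OpenMP parallel loop with a reduction. The function name `sum_of_squares` is hypothetical. The `schedule(static)` clause is just one possible choice; the scheduling policy is the sort of setting a developer might revisit if a profile shows threads waiting on unevenly distributed work.

```cpp
#include <vector>

// Sums the squares of all elements. With OpenMP enabled, the iterations
// are divided among threads and each thread keeps a private partial sum
// that is combined at the end of the loop (the reduction clause).
// Without OpenMP, the pragma is ignored and the loop runs serially.
double sum_of_squares(const std::vector<double>& data) {
    double total = 0.0;
    const long n = static_cast<long>(data.size());
    #pragma omp parallel for reduction(+:total) schedule(static)
    for (long i = 0; i < n; ++i) {
        total += data[i] * data[i];
    }
    return total;
}
```

The pragma only takes effect when the compiler's OpenMP option is enabled (for example `-fopenmp` with GCC or Clang, or `-qopenmp` with the Intel compiler).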
Intel’s Integrated Performance Primitives 9.0 adds optimizations for Intel’s Quark and Atom processors as well as for AVX2 instructions. The new API supports external threading and memory allocations, plus there is an improved CPU dispatcher with auto-initialization support.
Also part of the puzzle is Intel’s Data Analytics Acceleration Library. This includes a range of algorithms, from classification and clustering to linear regression and correlation-distance matrix support.
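As an illustration of what a correlation-distance matrix contains, here is a plain C++ sketch that does not use the library's API. It assumes the common definition of correlation distance as one minus the Pearson correlation coefficient, and it assumes all rows have equal length and none is constant. The function names are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Correlation distance between two equal-length, non-constant vectors:
// 1 - r, where r is the Pearson correlation coefficient.
double correlation_distance(const std::vector<double>& a,
                            const std::vector<double>& b) {
    const std::size_t n = a.size();
    double mean_a = 0.0, mean_b = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        mean_a += a[i];
        mean_b += b[i];
    }
    mean_a /= static_cast<double>(n);
    mean_b /= static_cast<double>(n);

    double cov = 0.0, var_a = 0.0, var_b = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double da = a[i] - mean_a;
        const double db = b[i] - mean_b;
        cov += da * db;
        var_a += da * da;
        var_b += db * db;
    }
    return 1.0 - cov / std::sqrt(var_a * var_b);
}

// Symmetric matrix of pairwise correlation distances between rows.
std::vector<std::vector<double>> correlation_distance_matrix(
        const std::vector<std::vector<double>>& rows) {
    const std::size_t m = rows.size();
    std::vector<std::vector<double>> dist(m, std::vector<double>(m, 0.0));
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = i + 1; j < m; ++j) {
            dist[i][j] = dist[j][i] = correlation_distance(rows[i], rows[j]);
        }
    }
    return dist;
}
```

Perfectly correlated rows yield a distance near 0 and perfectly anti-correlated rows a distance near 2. An optimized library would compute the same quantities with vectorized and threaded kernels.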
Parallel Studio XE 2016 Cluster Edition ($2,949) includes all the tools and libraries. The Professional Edition ($1,699) forgoes the MPI Library and Trace Analyzer and Collector, but includes Advisor XE, Inspector XE, and VTune Amplifier XE. The Composer Edition ($699) includes only the C++ and Fortran compilers, Intel Data Analytics Acceleration Library, Threading Building Blocks, Integrated Performance Primitives, Math Kernel Library, Cilk Plus, and Intel OpenMP*. The Rogue Wave IMSL* library is optional for all three. Intel has also made the tools available for free to students and for non-commercial use. The Performance Libraries are available under community licensing that carries no royalties and no restrictions based on company or project size.