Software Week 2016: Intel

Date

September 12, 2016 (Monday) 10:00-15:00

Location

Lab3 LevelC C700

Description

Knights Landing: 2nd Generation Intel "Xeon Phi" Processor (1 hour + Q&A)


Date: September 12th, 2016 10:00-11:00
Room: Lab3 LevelC C700
Presenter: Horikoshi-san 


This session describes the architecture of Knights Landing, the second-generation Intel Xeon Phi product family, which targets high-performance computing and other highly parallel workloads. It provides a significant increase in scalar and vector performance and a big boost in memory bandwidth compared to the prior generation, called Knights Corner. Knights Landing is a self-booting, standard CPU that is completely binary compatible with prior Intel Xeon processors and is capable of running all legacy workloads unmodified. Its innovations include a core optimized for power efficiency, a 512-bit vector instruction set, a memory architecture comprising two types of memory for high bandwidth and large capacity, a high-bandwidth on-die interconnect, and an integrated on-package network fabric. These features enable the Knights Landing processor to provide significant performance improvement for computationally intensive and bandwidth-bound workloads while still providing good performance on unoptimized legacy workloads, without requiring any special way of programming other than the standard CPU programming model.

MCDRAM (High Bandwidth Memory) on Knights Landing - Analysis Methods/Tools (1 hour + Q&A)
Date: September 12th, 2016 11:00-12:00
Room: Lab3 LevelC C700
Presenter: Horikoshi-san 

Intel's next-generation Xeon Phi™ processor family, the x200 product line (Knights Landing), introduces a new memory technology: a high-bandwidth, on-package memory called Multi-Channel DRAM (MCDRAM), in addition to traditional DDR4. MCDRAM is a high-bandwidth (~4x that of DDR4), low-capacity (up to 16GB) memory packaged with the Knights Landing silicon. It can be configured as a third-level, memory-side cache, as a distinct NUMA node (allocatable memory), or somewhere in between. Because the system can be booted in different memory modes, it can be challenging from a software perspective to determine which mode best suits an application. At the same time, it is essential to use the available MCDRAM bandwidth efficiently, without leaving performance on the table. This tutorial covers methods and tools users can exploit to identify the suitable memory mode for an application. It also covers the use of the "memkind" library interface, a user-extensible heap manager built on top of jemalloc, which lets users direct their application's memory allocations to the high-bandwidth MCDRAM instead of standard DDR4.
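As a rough sketch of what "allocatable memory" looks like in practice: when Knights Landing is booted in flat mode, MCDRAM typically appears as a separate NUMA node, and standard NUMA tools can steer allocations to it without code changes. The node number below is an assumption (it varies by configuration), and `./app` stands in for any application binary.

```shell
# List NUMA nodes and their sizes; in flat mode the 16GB MCDRAM
# usually shows up as a CPU-less node (often node 1 on a
# single-socket system -- check your own topology).
numactl --hardware

# Run an application with all allocations bound to MCDRAM:
numactl --membind=1 ./app

# Prefer MCDRAM but fall back to DDR4 once the 16GB is exhausted:
numactl --preferred=1 ./app
```

For finer, per-allocation control from within the code, the memkind library's high-bandwidth interface (`hbw_malloc`/`hbw_free`) can place individual buffers in MCDRAM while the rest of the application continues to use DDR4.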

Expressing Multi-Level Parallelism (Thread, Vector and Heterogeneous) with OpenMP 4.x – Hands-on (1 hour + Q&A)
Date: September 12th, 2016 13:00-14:00
Room: Lab3 LevelC C700
Presenter: Sugawara-san  

This session introduces explicit vector parallelism as an alternative to automatic vectorization. Topics include the concept of SIMD operations and the use of OpenMP pragmas to vectorize loops with the Intel compilers. The session also covers array notations and SIMD-enabled functions.

Optimization of memory traffic – Hands-on (1 hour + Q&A)
Date: September 12th, 2016 14:00-15:00
Room: Lab3 LevelC C700
Presenter: Sugawara-san 

This session introduces the requirement of data-access locality in space and time and demonstrates techniques for achieving it: loop tiling, cache-oblivious recursion, loop fusion, and parallel first touch. The hands-on part applies the discussed methods to an example matrix-vector multiplication code, which is optimized step by step using these techniques.
