Electronicdesign 23800 Habana Goya Promo

Habana Enters Machine-Learning Derby with Goya Platform

Sept. 19, 2018
The Goya HL-1000 board is designed to deliver high performance and low latency for machine-learning applications.

Habana Labs has emerged onto the machine-learning (ML) stage with its Goya HL-1000 processor (Fig. 1). The x16 PCI Express Gen 4 board has a 200-W TDP and comes with 16 GB of DDR4 ECC DRAM. It’s aimed at ML inference, while a forthcoming Gaudi processor will target ML training. In the meantime, Goya can run inference on deep-neural-network (DNN) models that have already been trained.
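For a sense of scale, the ceiling of the x16 host link can be estimated from the PCIe 4.0 signaling rate. The sketch below assumes 16 GT/s per lane and 128b/130b line encoding. It ignores packet-level protocol overhead, so real transfer rates will be somewhat lower. The function names and the 224 x 224 x 3, 8-bit input format in the usage note are illustrative assumptions, not Habana specifications.

```python
# Back-of-the-envelope PCIe 4.0 bandwidth estimate (illustrative sketch, not a Habana figure).
GEN4_TRANSFERS_PER_S = 16e9       # PCIe 4.0 raw rate per lane: 16 GT/s
ENCODING_EFFICIENCY = 128 / 130   # 128b/130b line encoding used by PCIe 3.0 and 4.0


def pcie_gen4_bandwidth_bytes(lanes: int = 16) -> float:
    """One-direction bandwidth in bytes/s after line encoding, before packet overhead."""
    return GEN4_TRANSFERS_PER_S * lanes * ENCODING_EFFICIENCY / 8


def input_bandwidth_bytes(images_per_s: float, height: int = 224, width: int = 224,
                          channels: int = 3, bytes_per_value: int = 1) -> float:
    """Host-to-card bytes/s needed to stream raw images (hypothetical input format)."""
    return images_per_s * height * width * channels * bytes_per_value


if __name__ == "__main__":
    link = pcie_gen4_bandwidth_bytes(16)
    needed = input_bandwidth_bytes(15_000)
    print(f"x16 link: {link / 1e9:.1f} GB/s, ResNet-50 input: {needed / 1e9:.2f} GB/s "
          f"({100 * needed / link:.1f}% of the link)")
```

Habana quotes ResNet-50 throughput of over 15,000 images/s. At that rate, uncompressed 224 x 224 RGB input needs roughly 2.3 GB/s, about 7% of the link's approximately 31.5 GB/s. Under these assumptions, the host link is unlikely to be the bottleneck.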

1. Habana Labs’ Goya HL-1000 is based on a VLIW SIMD vector core with Tensor addressing support.

The HL-1000 is designed to handle high-throughput chores with low latency (Fig. 2). Its performance scales well, reaching thousands of images per second on standard ML benchmarks such as ResNet-50. It processes this model at over 15,000 images/s with 1.3-ms latency while dissipating only 100 W. By comparison, typical industry latency currently sits at around 7 ms. In an enterprise setting with many boards, keeping power requirements low is critical to minimizing overall system costs. Habana offers both passively and actively cooled models.
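The quoted ResNet-50 figures can be turned into a few derived efficiency numbers. This sketch is plain arithmetic on the published values. The "images in flight" figure applies Little's law (L = λW) to steady-state averages, which is an interpretive assumption rather than anything Habana states. The function name is hypothetical.

```python
# Derived efficiency figures from the quoted ResNet-50 numbers (illustrative arithmetic).
THROUGHPUT_IMG_PER_S = 15_000   # "over 15,000 images/s", treated as exact here
LATENCY_S = 1.3e-3              # 1.3-ms latency
POWER_W = 100.0                 # power dissipated while running the model
TYPICAL_LATENCY_S = 7e-3        # "around 7 ms" typical industry latency


def derived_metrics(throughput: float, latency_s: float, power_w: float,
                    baseline_latency_s: float) -> dict:
    return {
        # (images/s) / W = images per joule
        "images_per_joule": throughput / power_w,
        # W / (images/s) = J per image, scaled to millijoules
        "energy_per_image_mj": power_w / throughput * 1e3,
        # Little's law (L = lambda * W): average number of images in flight at steady state
        "images_in_flight": throughput * latency_s,
        # how many times lower the latency is than the quoted typical figure
        "latency_improvement": baseline_latency_s / latency_s,
    }


if __name__ == "__main__":
    for name, value in derived_metrics(THROUGHPUT_IMG_PER_S, LATENCY_S,
                                       POWER_W, TYPICAL_LATENCY_S).items():
        print(f"{name}: {value:.2f}")
```

With these inputs the sketch reports:

- 150 images per joule
- about 6.7 mJ per image
- roughly 20 images in flight on average
- latency about 5.4 times lower than the quoted 7-ms figure

The in-flight estimate is an average and does not reveal the batch size used for the measurement.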

2. The Goya’s performance and low latency are impressive.

The architecture is based on a Tensor Processing Core (TPC) that’s fully programmable in C and C++ using an LLVM-based compiler. The HL-1000 processor is built on a cluster of eight TPCs (Fig. 3). As with most ML systems, the processor includes hardware general-matrix-multiply (GEMM) acceleration. Dedicated hardware also implements special functions, and the design adds Tensor addressing and latency-hiding support.
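GEMM is the core operation such hardware accelerates: each output element is the dot product of a row of A and a column of B. Inference accelerators commonly feed GEMM units 8-bit integer operands and accumulate in a wider integer register so that partial sums do not overflow. The reference loop below illustrates that general scheme in plain Python. The 32-bit accumulator width and the function name are assumptions for illustration, not a description of the HL-1000's GEMM engine.

```python
from typing import List

Matrix = List[List[int]]

INT8_MIN, INT8_MAX = -128, 127
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def gemm_int8_acc32(a: Matrix, b: Matrix) -> Matrix:
    """Reference C = A x B with int8 operands and a 32-bit accumulator (hypothetical sketch)."""
    if not a or not b:
        raise ValueError("matrices must be non-empty")
    m, k = len(a), len(a[0])
    if len(b) != k:
        raise ValueError(f"inner dimensions differ: {k} vs {len(b)}")
    n = len(b[0])
    # Operands must fit the assumed 8-bit signed input format.
    for row in a + b:
        for v in row:
            if not INT8_MIN <= v <= INT8_MAX:
                raise ValueError(f"{v} does not fit in int8")

    c = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0  # models a 32-bit accumulator register
            for p in range(k):
                # each int8 x int8 product fits in 16 bits (|product| <= 2**14)
                acc += a[i][p] * b[p][j]
                if not INT32_MIN <= acc <= INT32_MAX:
                    raise OverflowError("int32 accumulator overflow")
            c[i][j] = acc
    return c
```

For example, `gemm_int8_acc32([[1, -2], [3, 4]], [[5, 6], [-7, 8]])` returns `[[19, -10], [-13, 50]]`. Because each product stays within 2**14 in magnitude, a 32-bit accumulator can absorb on the order of 131,000 worst-case products before it risks overflow.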

3. Multiple, fully programmable Habana Tensor Processing Cores populate the Goya HL-1000.

The system exploits software-managed on-die memory along with centralized, programmable DMAs to deliver predictable, low-latency operation. Although it targets TensorFlow applications, it works equally well with models from other frameworks. The TPC handles 8-, 16-, and 32-bit integers as well as 32-bit floating point.
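Software-managed on-die memory and programmable DMA make a classic latency-hiding pattern possible: double buffering. The DMA engine copies the next tile of data into one buffer while the core computes on the tile already in the other. The timing model below is a minimal sketch of that general technique. It assumes a single DMA engine, two equal buffers, and copies that can fully overlap with compute. The function names are hypothetical, and the model is not taken from Habana documentation. Because software fixes the schedule, the completion time can be computed ahead of time, which is one reason this style of design can be predictable.

```python
from typing import Sequence


def serial_time(dma: Sequence[float], compute: Sequence[float]) -> float:
    """One buffer: each tile is copied in, then processed, with no overlap."""
    if len(dma) != len(compute):
        raise ValueError("dma and compute must have the same length")
    return float(sum(d + c for d, c in zip(dma, compute)))


def double_buffered_time(dma: Sequence[float], compute: Sequence[float]) -> float:
    """Two on-die buffers: the DMA fills one while the core computes on the other."""
    n = len(dma)
    if n != len(compute):
        raise ValueError("dma and compute must have the same length")
    if n == 0:
        return 0.0
    comp_done = [0.0] * n  # finish time of compute for each tile
    dma_free = 0.0         # time the single DMA engine becomes idle
    for i in range(n):
        # Tile i reuses the buffer of tile i-2, so its copy waits for that compute to finish.
        buf_free = comp_done[i - 2] if i >= 2 else 0.0
        load_done = max(dma_free, buf_free) + dma[i]
        dma_free = load_done
        # Compute needs the tile resident in on-die memory and the core idle.
        core_free = comp_done[i - 1] if i >= 1 else 0.0
        comp_done[i] = max(load_done, core_free) + compute[i]
    return comp_done[-1]
```

Consider four tiles that each take 2 time units to copy and 3 to process. `serial_time` returns 20.0, while `double_buffered_time` returns 14.0. With uniform costs d (copy) and c (compute) over n tiles, the overlapped total reduces to d + (n - 1)·max(d, c) + c. When compute dominates, only the first copy remains exposed.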

Habana’s SynapseAI software transforms standard AI models into applications that run on the HL-1000 (Fig. 4). The tools can import models from MXNet, Caffe2, Microsoft Cognitive Toolkit, PyTorch, and the Open Neural Network Exchange (ONNX) format. SynapseAI can also use user-supplied libraries, and a Python-based front end helps automate system operation. The SynapseAI runtime manages resources on the processor. IDE support includes a debugger and a simulator, and the tools provide real-time tracing and performance analysis whose results can be presented graphically.

4. Habana’s SynapseAI software transforms standard AI models into applications that run on the HL-1000.

Habana will face plenty of competition in this space, so its high performance, low power, and low latency will be key factors in staying ahead of the pack.
