A buffet of presentations covering video processing, processors and parallel processing, memory technologies, communications, and novel silicon applications is sure to keep just about every attendee's plate full at this year's Hot Chips show. Overall, 27 presentations, two keynote addresses, two tutorials, and two panels are on tap (see the table) for Aug. 20-22 at Stanford University in Stanford, Calif.
Justin Rattner, chief technology officer and Senior Fellow of Intel, will deliver the first keynote, discussing "cool codes." Bernard Meyerson, Fellow, vice president of strategic alliances, and chief technologist for IBM's Systems & Technology Group, will deliver the second keynote on collaborative innovation. The two tutorials will cover multicore programming and wireless challenges and opportunities in the home.
But that's not all. Here's a look at six of the latest "hot chips" that give the show its name.
If you're looking for a chip that bundles service solutions such as IP, ATM, Ethernet, MPLS, and Frame Relay at GbE, OC-48c, and OC-192c speeds, consider the Agere APP300 programmable network processor (Fig. 1). It handles layer 2 through 7 protocol processing, buffer management, traffic shaping, data modification, and per-flow policing.
Also, the APP300 offers wired and wireless communications by working with physical interface devices and the backplane fabric. It targets various OEM systems, including routers, switches, DSLAMs, wireless BTS, RNC, and gateway systems.
"The niche applications for the Agere chip include wireline and wireless multiservice access equipment," says Charlie Hartley, an Agere spokesman. "These include digital subscriber-line access multiplexers and third-generation wireless basestations."
The APP300 handles data streams from industry-standard interfaces and then analyzes the data to classify the frames or cells before packet transformation and transmission. It also supports two datapath interfaces that may be configured into five different ports, with each port supporting multiple standards. On top of that, the APP300 provides:
- Support for 802.1d/s/w/Q/P/ad
- Intelligent broadcast/multicast filtering
- ATM switching
- VCI/VPI merge
- PPP over Ethernet/ATM (PPPoE/A)
- IGMP snooping
- DiffServ queuing using DSCP
- Network address translation (NAT)
- Congestion-management EPD and PPD
- OAM per the I.610 standard.
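As a rough illustration of the DiffServ item in the list above, the sketch below pulls the 6-bit DSCP value out of an IPv4 type-of-service byte and uses it to pick a queue. The queue names and the mapping table are assumptions made for illustration. They are not taken from Agere's documentation or the APP300 programming model.

```python
# Illustrative only: generic DSCP-based queue selection, not APP300 code.

# Hypothetical mapping from a few well-known DSCP code points to queue names.
QUEUE_BY_DSCP = {
    46: "expedited",   # EF (Expedited Forwarding)
    10: "assured",     # AF11
    0: "best_effort",  # default code point
}


def dscp_from_tos(tos_byte: int) -> int:
    """Return the 6-bit DSCP held in the upper bits of the IPv4 TOS/DS byte."""
    if not 0 <= tos_byte <= 0xFF:
        raise ValueError("TOS byte must fit in 8 bits")
    return tos_byte >> 2  # the low 2 bits carry ECN, not DSCP


def select_queue(tos_byte: int) -> str:
    """Pick a queue for a packet; unknown code points fall back to best effort."""
    return QUEUE_BY_DSCP.get(dscp_from_tos(tos_byte), "best_effort")
```

A packet whose TOS byte is 0xB8 carries DSCP 46, so `select_queue(0xB8)` would land it in the hypothetical "expedited" queue.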
The APP300 can be updated as new protocols and applications become available. To support application development, Agere provides a high-level programming language, an application-level programming interface (API), a reference code library, and a development suite. The suite includes a simulator, a traffic generator, and a debugger. Simultaneous hardware and software design is possible. The APP300 devices range in price from $35 to $200 in volume lots.
If you require a large number of high-speed Ethernet interfaces, there's the FocalPoint chip from Fulcrum Microsystems. The FM2224 comes standard with 24 XAUI (CX4) independent Ethernet interfaces (Fig. 2). Each interface supports the 10G, 2.5G, 1G, 100M, and 10M modes. Based on 130-nm process technology, the FocalPoint chips also offer:
- Low latency (200 ns ball to ball)
- Layer 2 switching capabilities
- Support for 802.1P/Q/D/X/s/w, 802.3ad/x standards
- Queue management
- Packet filtering and monitoring
- Several configuration modes
- Power consumption of less than 150 mW per Gbit/s per active interface.
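The power figure in the list above lends itself to a quick worked bound. The sketch below assumes all 24 interfaces are active in 10G mode and reads the 150-mW figure as a ceiling per Gbit/s on each active interface. The function name and the full-load assumption are illustrative, not Fulcrum's.

```python
def max_switch_power_watts(ports: int, gbits_per_port: float,
                           mw_per_gbit: float = 150.0) -> float:
    """Upper bound on power if every port is active at the given rate."""
    aggregate_gbits = ports * gbits_per_port
    return aggregate_gbits * mw_per_gbit / 1000.0  # convert mW to W


aggregate = 24 * 10                     # 240 Gbits/s across 24 10G interfaces
bound = max_switch_power_watts(24, 10)  # 240 * 150 mW = 36 W ceiling
```

Under these assumptions, a fully loaded FM2224 would stay below 36 W. Ports running in slower modes would lower the bound proportionally.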
The FM2224 can be used as the backplane fabric or system-level interconnect for a variety of platforms and applications, including:
- Blade computer backplane fabric
- Cluster interconnect
- NAS interconnect
- Workgroup and enterprise 1G Ethernet aggregator
- ATCA platform backplane fabric.
"We're seeing tremendous interest in our FocalPoint Ethernet switch chip family," says Mike Zeile, vice president of marketing for Fulcrum Microsystems. "FocalPoint's combination of low latency, high throughput, and high port density has really struck a chord with computing and storage equipment providers who now consider 10G Ethernet the interconnect technology of choice for their next-generation data center deployments."
Fulcrum offers the FM2224 as part of an integrated evaluation platform to help accelerate design-in of the device. The platform comes standard with 24 CX4 connectors in a 1U chassis. The list price for the FM2224 is $450 in quantities of 5000 units.
As designers take full advantage of complex HDTV signals, they need to deal with computationally challenging codecs and post-processing. Currently, parallel data processes dominate this computation. Couple that challenge with the ever-increasing costs associated with building ASICs based on leading-edge process technology, and you're looking for solutions such as Connex Technology's CA1024 (Fig. 3). This massively parallel system-on-a-chip (SoC) can process complex HDTV signals.
ASICs and ASSPs can quickly become outdated with the rise and fall of the latest codec and display processing algorithm. Not so with the CA1024. It can support any codec and post-processing algorithm without additional logic by using its massive array of simple processor cores optimized to exploit inherent digital-video data parallelism.
In addition, the CA1024 includes a memory structure with enough bandwidth to handle multiple high-definition video streams. And unlike reconfigurable gate-array alternatives, the device provides a compact, homogeneous vector processor.
"Optimized for high-definition (HD) digital video processing, Connex has developed a massively parallel microprocessor architecture that provides customers with the performance and cost of ASICs while providing a sequential programming model in high-level C," says Anand Sheel, vice president of product management at Connex Technology.
"Large-tier customers have shown a tremendous desire to be able to implement their proprietary IP. The CA1024 will allow Connex to be uniquely positioned to address this growing market demand in a highly competitive landscape at cost effective prices," Sheel adds.
The CA1024 supports dual MPEG-2 transport streams and simultaneous decoding of dual HD H.264, VC-1, and MPEG-2 video (Fig. 4). Yet it leaves enough processing bandwidth to support simultaneous advanced pre- and post-processing signal algorithms. For more process-hungry applications, it offers linear scalability by allowing multichip implementations using expansion ports.
Applications include the high-definition set-top box (STB), integrated digital TV (iDTV), digital media adapter, and multistream digital video recorder (DVR). Other features include:
- Two configurable A/V channels
- Support for any combination of BT.601/656, 8/16/24-bit RGB/YCrCb, and MPEG-2 TS
- 2x I2S audio input
- Dual 656/709 SD/HD digital video outputs
- 4x I2S or 1x S/PDIF digital audio output
- Dedicated MPEG-2 DVB/ATSC-compliant transport
- Integrated "glueless" DDR DRAM controller
- Dual-stream audio and HD video decode
- HD video encode/transcode
- Audio encode
- Glueless NOR flash
- On-chip 32-bit host CPU.
Connex Technology offers development kits with application-specific reference designs, board support packages, and relevant software development kits to get you up and running quickly. The CA1024 will cost $30 in quantities of 10,000 units. It's expected to sample later this year.
Meanwhile, Philips unveils a new generation of Nexperia hybrid television processors that support ATSC and NTSC broadcast standards and target the LCD TV market (Fig. 5). Built on 90-nm technology, they deliver features typically only associated with high-end televisions. The Nexperia PNX8535 includes:
- A silicon tuner front end with integrated VSB/QAM channel decoder
- An image co-processor that provides 16 picture enhancement features, such as edge-dependent de-interlacing, noise-reduction algorithms, MPEG artifact reduction, and a 3D comb with auto-adaptive 2D/3D switching
- Digital-sound-processing algorithms that support standards like Virtual Dolby and SRS TruSurround
- 32-bit RISC and very long instruction-word processors with instructions optimized for video processing
- An MPEG-2 decoder capable of decoding MP@ML and MP@HL streams
- A DVB/DES/MULTI2-compliant transport stream de-multiplexer/descrambler
- An analog audio decoder for demodulation of sound sources, such as FM A2, BTSC, EIAJ, and NICAM
- Digital-IF decoding for Low-IF and Direct-IF sources
- A 1f/2f analog video decoder that converts CVBS or Y/C signals into YUV format and supports PAL/NTSC and SECAM
- Support for the reception of all analog and digital U.S. public broadcast standards.
The PNX8535's external interfaces include a 90-Mpixel/s low-voltage differential signaling transmitter and HDMI/DVI receiver that supports HDCP. Also included are two video digital-to-analog converters, a digital 24-bit RGB output, and a high-speed DDR1/2 memory interface. And, its complete production-ready hybrid analog/digital television software stack includes a reference user interface and tools for customization.
Furthermore, a hardware and software reference kit (TV520/20) supplies the necessary components required to build an analog/digital television. The kit includes middleware and application software with a common GUI that allows for optimized picture settings for either analog or digital streams. With it, TV manufacturers can produce a bill of materials totaling less than $45 for the required analog and digital processing functionality.
"The Nexperia TV520 family is expressly designed to facilitate the transition to digital TV at very competitive price points," says Jos Klippert, marketing director, digital TV solutions, Philips Semiconductors.
"Today, people want to have digital reception functionalities on the TV and with this become truly connected consumers," he adds. "As high-definition TV content becomes increasingly available, they are seeking LCD TVs that offer the highest-quality viewing experiences in both analog and digital at an affordable price. LCD TVs built on the TV520 family are ideally positioned to fulfill both those needs and will greatly expand the market for new TV technology."
MULTICORE AND MULTITHREAD CHIP MULTIPROCESSORS
Sun Microsystems soon will offer Niagara2, its next-generation low-power multicore server CPU (Fig. 7). With eight cores running up to eight threads each, it can run up to 64 threads simultaneously. Each core has a dedicated floating-point unit, and the cores share the L2 cache.
Based on 65-nm process technology and an improved pipeline, it also will come standard with four integrated memory controllers featuring fully buffered dual-inline memory module (FBDIMM) memory support, two integrated 10/1-Gbit Ethernet ports, an x8 PCI Express port, and substantially improved single-thread performance. Also, Niagara2 will provide integrated cryptographic functions. Each core supports ciphers, hash functions, and public-key algorithms, including AES, RC4, SHA, MD5, RSA, and elliptic curve cryptography (ECC).
With Niagara2, there's no connection between cores. Also, the only resource competition will involve memory and L2 cache. Each thread is a lightweight process scheduled by the operating system with its own copy of registers. Threads compete for L1 cache, address translation buffers, and the ALU. Applications may be scaled linearly across threads.
While competing processors use an average of 150 W, Sun engineered the original Niagara to use just 70 W. With data-center power, cooling requirements, and space costs all on the rise, the power savings will be a welcome change.
"It's time the technology industry took a stand. Tripling your data-center performance shouldn't mean tripling your power bill and needing more coal-fired power plants. It's becoming more obvious by the day that extreme efficiency is good for the environment and good for business," says Jonathan Schwartz, president and chief operating officer of Sun Microsystems.
"There are proof points everywhere, from hybrid auto companies that can't keep up with demand to fuel-efficient aircraft dominating the marketplace," he continues. "Customers want this same eco-responsibility in their datacenters. Our UltraSPARC T1 systems deliver radical performance improvement without the sticker shock of energy costs associated with IBM's Power-based systems."
Designers can try out either of the Niagara (UltraSPARC T1) Sun Fire servers free for 60 days. Base-model pricing on the T1000 lists for $3495. The T2000 lists for $9045.
Finally, AMD's Opteron is the only X86 chip on the market offering a dual-core 64-bit X86 architecture and integrated Northbridge. The company is expanding this offering, and by next year, it will have an Opteron based on 65-nm process technology with:
- True quad-core die
- Four 16-bit or eight 8-bit HyperTransport links
- Enhanced branch prediction
- Out-of-order load execution
- Up to 4 double-precision (DP) FLOPS/cycle
- Dual 128-bit SSE data flow
- Dual 128-bit loads per cycle
- Bit-manipulation extensions (LZCNT/POPCNT)
- SSE extensions (EXTRQ/INSERTQ, MOVNTSD/MOVNTSS)
- Enhanced Direct Connect Architecture and Northbridge
- HT-3 links (up to 5.2 GT/s)
- Enhanced crossbar
- DDR2 with migration path to DDR3
- FBDIMM when appropriate
- Enhanced power management
- Enhanced RAS.
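For readers unfamiliar with the bit-manipulation extensions named in the list above, the sketch below models what LZCNT and POPCNT compute. It is a plain software model written for illustration, with the 64-bit default width as an assumption. It says nothing about how AMD implements the instructions in hardware.

```python
def popcnt(x: int, width: int = 64) -> int:
    """Count the bits set to 1 in the low `width` bits of x."""
    return bin(x & ((1 << width) - 1)).count("1")


def lzcnt(x: int, width: int = 64) -> int:
    """Count leading zero bits of x viewed as a `width`-bit unsigned value.

    Returns `width` when x is zero.
    """
    x &= (1 << width) - 1
    return width - x.bit_length()
```

For example, `popcnt(0b1011)` counts three set bits, and `lzcnt(1)` reports 63 leading zeros in a 64-bit operand.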
Each core also will integrate a dedicated L1 and L2 cache. All cores will share an L3 cache. The dedicated L1 cache will help keep critical data local, suppress latency, and provide a hit rate approaching 95%. The dedicated L2 cache will help eliminate conflicts common in shared caches. The L3 cache will provide optimized memory use for a multicore environment.
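To see why a high L1 hit rate matters, consider a back-of-the-envelope estimate of average memory access time across the hierarchy. Only the roughly 95% L1 hit rate comes from the description above. Every latency and the L2 and L3 hit rates in the usage note are hypothetical placeholders, not AMD figures.

```python
def average_access_cycles(levels, memory_cycles):
    """Average memory access time, in cycles, for a cache hierarchy.

    levels: list of (hit_cycles, hit_rate) pairs ordered from L1 outward,
    where hit_rate is the local hit rate of that level.
    memory_cycles: cost of going all the way to main memory.
    """
    total = 0.0
    reach = 1.0  # fraction of accesses that get as far as this level
    for hit_cycles, hit_rate in levels:
        total += reach * hit_cycles  # every access reaching a level pays its latency
        reach *= 1.0 - hit_rate      # only the misses continue outward
    return total + reach * memory_cycles
```

With a hypothetical 3-cycle L1 at a 95% hit rate and a 200-cycle trip to memory, `average_access_cycles([(3, 0.95)], 200)` comes to about 13 cycles. Adding assumed L2 and L3 levels, such as `[(3, 0.95), (12, 0.80), (40, 0.50)]`, shows how each extra level trims the share of accesses that pay the full memory penalty.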
The Opteron targets the small and medium-sized server market. It offers the greatest benefits in applications such as fast database transactions, support for several e-commerce users, graphics-intensive tools such as CAD and DCC, and processor-intensive financial and scientific tools. Pricing for the Opteron Dual-Core chips ranges from $316 for the Model 265 to $2149 for the Model 885 in quantities of 1000 units.
"The success of the 64-bit dual-core AMD Opteron processor in the server space is due in large measure to innovations in the Northbridge," says Pat Conway, principal member of AMD's technical staff.
"AMD's new processor interface balances system traffic across multiple high-bandwidth HyperTransport ports and the integrated memory controller reduces memory latency," Conway continues. "AMD's Direct Connect Architecture helps lower power and cost by completely eliminating the need for external glue chips like switches and memory controllers."