Groq, a startup developing artificial intelligence chips for data centers, autonomous cars, and other areas, announced the closing of a $300 million funding round, bringing its total haul to more than $365 million since it was founded in 2016.
The funds will boost the company's efforts to hire talent and speed up technology development. The firm plans to double its headcount to 250 by the end of the year, after doubling it to 122 people in 2020. It is also working on the second generation of its AI silicon for data centers, cars, and other markets, and plans to use the funding to move that chip into production.
The funding makes Groq one of the most deep-pocketed challengers to Nvidia, which has long dominated the AI market with its graphics processing units (GPUs) for data centers.
The startup was founded in 2016 by ex-Googlers. CEO Jonathan Ross previously co-founded the team that invented Google's tensor processing unit (TPU), a chip designed to run machine learning software in the company's giant data centers, which Google also rents out over the cloud. Co-founder Douglas Wightman also worked at Google's "X" research branch, but he no longer works at Groq.
“AI is limited by existing systems, many of which are being followed or incrementally improved upon by new entrants," Ross, who also worked at Google's R&D department, said in a statement. "No matter how much money you throw at the problem, legacy architectures like GPUs and CPUs struggle to keep up with growing demands in artificial intelligence and machine learning."
But instead of trying to out-engineer the TPU or its rivals' AI chips, Groq started with the software, then designed a simple programmable chip from a clean sheet for artificial intelligence. According to Ross, the startup spent half a year on the software stack and the compiler that schedules instructions on the chip before it started designing the silicon.
Instead of creating a small programmable core and replicating it hundreds or thousands of times, Groq's tensor streaming processor (TSP) houses a single large core that resembles a VLIW (very long instruction word) processor, with hundreds of functional units and huge amounts of on-chip memory to carry out AI chores. The startup says the chip can process 1,000 trillion operations per second, giving it more performance than Nvidia's chips at one-tenth the latency.
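The VLIW idea described above can be shown with a toy sketch. This is a generic illustration of wide-issue execution, not the TSP's actual instruction set; the slot names, operations, and state layout here are all hypothetical:

```python
# Toy sketch of a VLIW-style core (hypothetical slots and ops, not the
# TSP's real ISA): one wide instruction carries at most one operation per
# functional unit, so a single core issues several operations each cycle.

FUNCTIONAL_UNITS = ("vector_alu", "matrix_unit", "mem_read")

def issue(bundle, state):
    """Issue one wide instruction: every non-empty slot fires in the same cycle."""
    for unit in FUNCTIONAL_UNITS:
        op = bundle.get(unit)
        if op is not None:
            op(state)
    return state

state = {"a": 3, "b": 5, "mem": [7, 9], "out": {}}

# One wide instruction: the ALU and the load port both do work this cycle,
# while the matrix unit's slot is simply left empty.
wide_instruction = {
    "vector_alu": lambda s: s["out"].__setitem__("sum", s["a"] + s["b"]),
    "mem_read":   lambda s: s["out"].__setitem__("x", s["mem"][0]),
}
issue(wide_instruction, state)
print(state["out"])  # {'sum': 8, 'x': 7}
```

The point of the sketch is that parallelism comes from packing independent operations into one instruction word, rather than from replicating many small cores.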
One of the key improvements is that Groq pushed most of the chip's control and planning functions, such as managing memory and plotting data movement, into software. That saves valuable silicon real estate, packing more performance into a smaller area. The company's compiler delivers instructions to the right place at the right time on the chip, reducing latency.
This is part of what the startup calls its "software-first mindset." The compiler choreographs the movement of instructions through the chip, resulting in fast and predictable (also known as "deterministic") performance. The software knows exactly how the chip works and how long it will take to carry out each AI computation, which is valuable in areas where safety and accuracy are paramount.
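The deterministic, compiler-choreographed execution described above can be sketched in miniature. This is an illustration of static scheduling in general, not Groq's actual toolchain; the operation names and cycle counts are hypothetical:

```python
# Toy sketch of compile-time scheduling (hypothetical ops and costs, not
# Groq's real compiler): because each operation's cost is fixed and known,
# the compiler can assign every op a start cycle before the program runs,
# making total latency deterministic.

from dataclasses import dataclass

@dataclass
class Op:
    name: str
    cycles: int  # assumed fixed cost on its functional unit

def compile_schedule(ops):
    """Statically assign each op a start cycle; no runtime arbitration."""
    schedule, cycle = [], 0
    for op in ops:
        schedule.append((cycle, op))
        cycle += op.cycles
    return schedule, cycle  # total latency known at compile time

program = [Op("load_weights", 4), Op("matmul", 16),
           Op("activation", 2), Op("store", 4)]
schedule, total = compile_schedule(program)
for start, op in schedule:
    print(f"cycle {start:3d}: {op.name}")
print(f"total latency: {total} cycles")  # identical on every run
```

The contrast is with dynamically scheduled hardware, where caches and arbitration make latency vary from run to run; here the schedule, and therefore the latency, is fixed before execution.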
Groq said the new architecture is specifically designed to handle AI inference chores but it can also run the types of computations used to train AI on large amounts of data. It is working with several early customers in the financial industry and national labs that run real-time workloads that require the reduced latency and massive SRAM of the single-core TSP, the company said.
Groq has started shipping its first TSP in server accelerator cards and pre-integrated systems, and the startup plans to use part of the funding to scale up production for its early customers.
For the first couple of years, the company was largely funded by Chamath Palihapitiya's Social Capital. But as it looks to expand its customer base and put its second-generation product into mass production, the company is bulking up its cash reserves, in part to help it hire engineering talent. "We want to hire a lot more people but we're keeping the talent bar very high," Ross said.
For years, tech giants have siphoned top engineers from the semiconductor industry, but he said that many of the same engineers now want to work with hardware, too. "Talent has been flowing out of the semiconductor industry into software for some time. The best people are at the Facebooks, Amazons, Googles," Ross said. "These are the companies we're hiring from."
Other startups have also amassed huge amounts of funding in a bid to dethrone Nvidia's GPUs, the current gold standard in AI training, and Intel's CPUs, which are widely used in inferencing.
Graphcore, which has partnered with Dell and Microsoft, has rolled out chips with more than one thousand of its intelligence processing unit (IPU) cores that work together to carry out AI chores. The startup is valued at $2.5 billion. SambaNova Systems landed $676 million in funding this month led by SoftBank, bringing its total raised to more than $1 billion and valuing it at around $5 billion.
But in spite of the vast amounts of venture funding, most of these startups have struggled to stand out from the crowd. Nvidia, which has continuously upgraded its general-purpose chips with "tensor cores" suited to AI training and inferencing, and Intel, which has started selling server chips with integrated AI accelerators, have entrenched themselves deep in the data center market.
But the increased capital and new hires will give Groq more flexibility and expertise, which is valuable as it works to roll out its second-generation chip and swipe market share from Nvidia.
The funding round was co-led by investment giant Tiger Global Management and D1 Capital, and it was supplemented by The Spruce House Partnership and Addition, the venture capital firm founded by Lee Fixel. TDK Ventures and other existing investors also invested more money in Groq.