Moving Targets in AI

In the face of rapid changes in the field of AI, Programme Director Suraj Bramhavar reflects on evolving benchmarks within ARIA's Scaling Compute programme.

Feb 18, 2025

The thesis that started the Scaling Compute programme, and that developments in AI continue to affirm, is that there is a fundamental asymmetry between our demand for more compute and its supply. Technologies that increase compute supply will have major economic, geopolitical, and societal implications. We need research and inventions that help us rethink both our existing hardware infrastructure and our training paradigms.

This programme was set up to address these challenges, with the overarching goals of reducing the cost to train large-scale AI models by 1000x, diversifying the semiconductor supply chain, and providing entirely new vectors for continued computing progress.

Since we set the programme targets (almost a year ago!), a lot has changed in the landscape of AI. With work now underway in 12 teams spanning industry and academia, we recently convened these teams to share progress, but also to ask some fundamental questions: Given the commercial incentives in a fast-moving industry, how do we remain focused on ARIA’s goal to build technologies that are underserved relative to their impact AND that will be relevant on a decadal timescale? How do we know what to shoot for? We want to ensure that we maintain ambitious and relevant targets that can change the conversation about what is possible.

Two recent external events highlight this pace of change:

OpenAI released its ‘chain-of-thought’ models, introducing an alternative scaling pathway to improve the power of modern AI models via repetitive inference (thus decreasing the importance of scaling pre-training).
DeepSeek released a technical report demonstrating substantial algorithmic innovations for training models significantly larger than the ‘large’ GPT3 model in Figure 1, highlighting both the pace at which relevant workloads evolve AND the innovation speed of large AI labs.

Figure 1 (image description): Three graphs plotting total cost in pounds sterling (vertical axis, £10 to £1 billion) against time (horizontal axis, 0 to 60 minutes) for three AI models of differing sizes: GPT3 (Large), Stable Diffusion (Medium), and ResNet50 (Small). Each graph plots the MLPerf benchmark and the programme target. For GPT3, the MLPerf benchmark starts at £100 million and falls to just below £10 million across 60 minutes; the target starts at £1 million and falls to just below £10 thousand across 60 minutes. For Stable Diffusion, the MLPerf benchmark starts at £10 million and falls to £100 thousand across 50 minutes; the target starts at £100 thousand and falls to £100 across 50 minutes. For ResNet50, the MLPerf benchmark starts at £1 million and falls to just under £100 thousand across 30 minutes; the target starts at £10 thousand and falls to around £50 across 35 minutes.

Figure 1. The current cost of training large AI models and the programme target for training workloads from MLPerf Benchmarks.

We focused our original programme targets on the cost to pre-train large-scale AI models (Figure 1), reasoning that significant commercial attention was already being paid to inference, and that much of the societal value stems from our ability to teach computers new capabilities. What we failed to specify (or predict) was that AI model training would take three valuable forms: pre-training, fine-tuning, and, more recently, repetitive inference. Each form places different demands on the underlying hardware.

It is now clear that predicting which pathway will be most impactful in the years ahead is difficult. We therefore put the following questions to our Creators:

Should we replace our cost targets above with standard hardware-level targets as our north star (shown below)?
Which models/workloads should we focus on?
How can ARIA help?

Figure 2. Plot showing the performance of existing commercial AI accelerators. The ARIA performance target shown in the plot would result in a 1000x reduction in the cost of AI hardware.

Community Response

When asked the first question, our community overwhelmingly confirmed that the original focus on cost remains the most appropriate goal (as opposed to specific hardware targets such as OPS/W). The shared rationale was that hardware performance specs inevitably miss many aspects of technological capability, and that, while technology translation is governed by many factors, the single most significant metric governing translation is cost (so long as all costs are properly accounted for).
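
To make the cost-versus-spec argument concrete, here is a minimal sketch of one way to account for the cost of a training run. It is illustrative only: the `TrainingRun` fields, the two-term cost model (amortised hardware plus energy), and every number are assumptions made for this example, not ARIA's accounting method or measured data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingRun:
    """Hypothetical description of one training run (all fields illustrative)."""
    num_accelerators: int
    hours: float                      # wall-clock duration of the run
    accelerator_price_gbp: float      # purchase price per accelerator
    amortisation_hours: float         # service life over which the price is spread
    power_kw_per_accelerator: float   # average draw per accelerator
    energy_price_gbp_per_kwh: float


def hardware_cost(run: TrainingRun) -> float:
    """Share of the hardware purchase price consumed by this run."""
    device_hours = run.num_accelerators * run.hours
    return device_hours * run.accelerator_price_gbp / run.amortisation_hours


def energy_cost(run: TrainingRun) -> float:
    """Electricity cost of this run."""
    device_hours = run.num_accelerators * run.hours
    return device_hours * run.power_kw_per_accelerator * run.energy_price_gbp_per_kwh


def total_cost(run: TrainingRun) -> float:
    """Total cost under the assumed two-term model."""
    return hardware_cost(run) + energy_cost(run)


# Made-up baseline: 1,000 accelerators for 100 hours.
baseline = TrainingRun(1000, 100.0, 20_000.0, 20_000.0, 1.0, 0.25)

# Made-up candidate: same work in the same time at half the power
# (2x better OPS/W), but each accelerator costs twice as much.
candidate = TrainingRun(1000, 100.0, 40_000.0, 20_000.0, 0.5, 0.25)

print(total_cost(baseline))   # 125000.0
print(total_cost(candidate))  # 212500.0
```

In this made-up case the candidate doubles OPS/W yet costs more per run, because the higher purchase price outweighs the energy saving. A real accounting would add further terms (networking, cooling, facilities, utilisation, staff), which is the sense in which all costs need to be properly accounted for.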

The second question presents a more pressing challenge: how can the hardware research community keep pace with an incredibly nimble and extremely well-funded industry that ultimately serves as the ‘customer’ for the hardware our Creators seek to build? Striking the right balance between technologies that are too risky for industry efforts yet relevant enough to maintain engagement is a perennial challenge.

The resulting discussion surfaced a few critical items as valuable additions to the Programme:

Thorough (and continuous) baseline cost accounting of existing commercial hardware.
Rapid assessment of relevant workloads and algorithmic optimisations being applied to this hardware.

The DeepSeek News

Much of this discussion comes on the heels of widely distributed reports from DeepSeek demonstrating impressive performance optimisations for training leading-edge AI models. While many popular media reports mischaracterise the magnitude of the achieved cost gains, this should not diminish the significance of what was achieved, and more importantly, it highlights the impact that can be realised by concentrated groups of world-class talent.

For the ARIA community, the news event serves as a reminder that algorithms research moves at breakneck pace, and that the hardware-level technologies being developed under this Programme should strive to either: a) compound atop the advances of this community, or b) catalyse the emergence of entirely new branches of algorithms research.

A New Funding Call For AI Hardware Benchmarking

It became exceedingly clear that, with all the commercial hype surrounding the space, there is a yearning for rapidly updated sources of ground truth in a world where the ground is continuously shifting beneath our feet. To demonstrate technologies that blow past existing goalposts, we need to know where these goalposts actually are, and track them as they move.

A direct action taken from this discussion is that ARIA will onboard a team to provide this service. Existing benchmarking frameworks such as MLPerf will serve as a foundation to build on. The chosen team will:

Select a small set of AI models representing the current state-of-the-art, profile the operation of these models on the latest commercial hardware, and report on the cost and existing performance bottlenecks.
Update these reports continuously as the industry evolves and publish quarterly reports of their findings to the global R&D community, serving as an open, nimble, and scientifically grounded source of information.
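
As a rough illustration of why a continuously updated baseline matters, the sketch below keeps a small log of baseline measurements and scores a candidate technology against the most recent one. The `BenchmarkEntry` record, its fields, the quarter labels, and all figures are hypothetical and chosen for this example; they do not describe the format the benchmarking team will actually publish.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkEntry:
    """Hypothetical record from one quarterly report (fields are illustrative)."""
    quarter: str               # e.g. "2025-Q2"; this format sorts chronologically
    model: str                 # name of the profiled AI model
    baseline_cost_gbp: float   # cost to train on commercial hardware
    bottleneck: str            # dominant performance bottleneck observed


def latest_by_model(entries: list[BenchmarkEntry]) -> dict[str, BenchmarkEntry]:
    """Return the most recent entry for each model."""
    latest: dict[str, BenchmarkEntry] = {}
    for entry in sorted(entries, key=lambda e: e.quarter):
        latest[entry.model] = entry  # later quarters overwrite earlier ones
    return latest


def improvement_factor(entry: BenchmarkEntry, candidate_cost_gbp: float) -> float:
    """How many times cheaper a candidate is than the recorded baseline."""
    return entry.baseline_cost_gbp / candidate_cost_gbp


# Made-up log: the baseline for the same model falls between quarters,
# for example because of algorithmic optimisations on existing hardware.
entries = [
    BenchmarkEntry("2025-Q2", "model-a", 1_000_000.0, "memory bandwidth"),
    BenchmarkEntry("2025-Q3", "model-a", 400_000.0, "interconnect"),
]

current = latest_by_model(entries)["model-a"]
print(improvement_factor(entries[0], 2_000.0))  # 500.0 against the stale baseline
print(improvement_factor(current, 2_000.0))     # 200.0 against the current baseline
```

The same hypothetical candidate looks 500x cheaper against the older baseline but only 200x cheaper against the newer one, which is the kind of goalpost movement the quarterly reports are meant to expose.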

Ultimately, this work will help ensure that each of the ambitious technologies developed within the Programme is measured against the most up-to-date advances in the field.

The request for proposals is open until 10 March, with benchmarking activities running from May 2025 until the end of the programme in Autumn 2027.
