Forge CLI

Swarm agents optimize CUDA/Triton for any HF/PyTorch model

Featured: yesterday

Hardware, Developer Tools, Artificial Intelligence

What is Forge CLI?

Forge generates optimized GPU kernels from any PyTorch or HuggingFace model. It runs 32 parallel Coder+Judge agents that compete to find the fastest CUDA/Triton implementation. The listing claims speedups of up to 5× over torch.compile(mode='max-autotune') with 97.6% correctness. Enter a HuggingFace model ID and get optimized kernels for every layer. The agents are powered by an optimized NVIDIA Nemotron 3 Nano 30B model running at 250k tokens/sec. The maker's guarantee: "Full refund if we don't beat torch.compile."
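
The listing does not document how the Coder+Judge competition works internally, so the following is only a minimal sketch of the general idea: several candidate implementations are checked against a reference for correctness, and the fastest one that passes is selected. Every name here (`is_correct`, `best_time`, `judge`, and the toy candidates) is hypothetical and is not part of Forge's actual API; plain Python functions stand in for GPU kernels.

```python
import math
import time
from typing import Callable, Dict, List, Optional, Sequence

# A "kernel" here is just a Python callable; real candidates would be CUDA/Triton kernels.
Kernel = Callable[[Sequence[float]], List[float]]


def is_correct(candidate: Kernel, reference: Kernel, inputs: Sequence[float],
               rel_tol: float = 1e-6, abs_tol: float = 1e-9) -> bool:
    """Return True if the candidate matches the reference output within tolerance."""
    try:
        got = candidate(inputs)
    except Exception:
        return False  # a crashing candidate is treated as incorrect
    want = reference(inputs)
    return len(got) == len(want) and all(
        math.isclose(g, w, rel_tol=rel_tol, abs_tol=abs_tol)
        for g, w in zip(got, want)
    )


def best_time(fn: Kernel, inputs: Sequence[float], repeats: int = 5) -> float:
    """Return the best wall-clock time over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(inputs)
        best = min(best, time.perf_counter() - start)
    return best


def judge(candidates: Dict[str, Kernel], reference: Kernel,
          inputs: Sequence[float]) -> Optional[str]:
    """Pick the fastest candidate that passes the correctness check, or None."""
    timings = {
        name: best_time(fn, inputs)
        for name, fn in candidates.items()
        if is_correct(fn, reference, inputs)
    }
    return min(timings, key=timings.get) if timings else None


# Toy example: the reference squares each element.
def reference_square(xs: Sequence[float]) -> List[float]:
    return [x * x for x in xs]


def crashing(xs: Sequence[float]) -> List[float]:
    raise RuntimeError("simulated kernel failure")


candidates = {
    "listcomp": lambda xs: [x * x for x in xs],  # correct
    "wrong_fast": lambda xs: list(xs),           # fast but incorrect
    "crash": crashing,                           # fails at runtime
}
data = [float(i) for i in range(1, 1001)]
winner = judge(candidates, reference_square, data)
```

In this toy run only `listcomp` passes the correctness check, so it is selected regardless of timing. In a real setting, the reference would presumably be the original PyTorch layer and the baseline to beat would be `torch.compile(mode='max-autotune')`, with timing done on the GPU rather than with a wall-clock timer.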

Products similar to Forge CLI

NativeBridge

Automate mobile testing on real devices with AI

Giselle

Build and run AI workflows. Open source.

Instruct 2.5

The most capable way to automate your work

A2UI

A safe way for AI to build UIs your app can render

HMPL

Lightweight server-oriented template language for JavaScript

PostSyncer

AI Content Maker, for Social Media Publishing