Replicate
Snapshot from Replicate
Verified on 2026-03-09

GPU Hardware Tier Pricing

Pay for compute time by hardware.

Replicate offers multiple GPU tiers (T4, A100, H100) with per-second billing. You pay only for the time your model is actually running.

Aligns cost with actual compute resources used. Hardware tiers let customers optimize for speed vs cost. Per-second billing ensures fair pricing for short inference jobs.
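To make the per-second math concrete, here is a small illustrative sketch. The hourly rates mirror the configuration shown below on this page ($0.81/hr for T4, $5.04/hr for A100); the helper function itself is hypothetical, not part of any vendor API.

```typescript
// Hourly tier rates in cents, matching the plan configuration below.
const HOURLY_RATE_CENTS: Record<string, number> = {
  t4: 81,    // $0.81 per T4 GPU-hour
  a100: 504, // $5.04 per A100 GPU-hour
};

// Cost in cents for a job billed per second on a given tier.
function jobCostCents(tier: string, seconds: number): number {
  return (HOURLY_RATE_CENTS[tier] / 3600) * seconds;
}

// A 90-second inference costs ~2 cents on a T4 and ~12.6 cents on an
// A100 -- though the A100 typically finishes the same job much faster.
```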

Implementation

This snippet is the closest Owostack implementation of the live pricing shape above. It is not a literal copy of the vendor's internal billing system.

const t4Seconds = metered("t4-seconds", {
  name: "T4 Inference Seconds",
});

const a100Seconds = metered("a100-seconds", {
  name: "A100 Inference Seconds",
});

plan("inference", {
  name: "Inference",
  price: 0, // no base fee; all revenue is usage-based
  currency: "USD",
  interval: "monthly",
  features: [
    t4Seconds.config({
      usageModel: "usage_based",
      pricePerUnit: 81, // 81 cents per billing unit
      billingUnits: 3600, // billed per 3,600 seconds: $0.81 per T4 GPU-hour
      reset: "monthly",
    }),
    a100Seconds.config({
      usageModel: "usage_based",
      pricePerUnit: 504, // 504 cents per 3,600 seconds: $5.04 per A100 GPU-hour
      billingUnits: 3600,
      reset: "monthly",
    }),
  ],
});
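On the reporting side, each finished job needs to be routed to the meter for the tier it ran on. The sketch below is a hypothetical helper, not part of the configuration API above; the meter ids match the `metered(...)` declarations, and rounding partial seconds up is an assumption.

```typescript
// Route a finished inference job to the matching usage meter.
type Tier = "t4" | "a100";

// Meter ids from the metered(...) declarations above.
const METER_ID: Record<Tier, string> = {
  t4: "t4-seconds",
  a100: "a100-seconds",
};

// Build the usage event for a job. Rounding up to whole seconds is an
// assumption; adjust if your meter accepts fractional values.
function usageEvent(tier: Tier, seconds: number) {
  return { featureId: METER_ID[tier], value: Math.ceil(seconds) };
}

// usageEvent("a100", 12.3) → { featureId: "a100-seconds", value: 13 }
```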

Rules

Billing starts when inference begins and ends when it completes.

Different models can run on different GPU tiers.

Per-second granularity ensures fair pricing.
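The first rule means only the inference window is metered: time spent queued before the job starts is not billed. A minimal sketch of that boundary, with hypothetical timestamps:

```typescript
// Billable duration covers only the inference window, not queue time.
function billableSeconds(startMs: number, endMs: number): number {
  return (endMs - startMs) / 1000;
}

const queuedAt = 0;       // request enters the queue (not billed)
const startedAt = 4000;   // inference begins (billing starts)
const finishedAt = 16500; // inference completes (billing ends)

// 12.5 billable seconds, regardless of the 4 s spent queued.
const seconds = billableSeconds(startedAt, finishedAt);
```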

Ready to implement this pattern?

Join our Discord for implementation help, or book a call to discuss how Replicate's pricing model fits your product.