- The short — why allocation design matters
- Demand scenarios & slot models
- Fair-share scoring rubric (publishable)
- SLOs, caps & transparency
- Abuse controls & audit trails
- Regional hubs: infra & sustainability
- Playbooks (startups, academia, operators, policy)
- Application form blueprint & KPIs
- FAQs
The short
- Scarcity is policy: Allocation rules steer research & products for years.
- Hybrid wins: Scorecard + credits + random tie-breakers balance merit and access.
- Guardrails: Org caps, use-it-or-lose-it credits, and safety reviews curb hoarding & misuse.
- Trust: Weekly utilization dashboards & unlock calendars sustain legitimacy.
Demand scenarios & slot models
| Scenario | Profile | Best model | Risk | Mitigation |
|---|---|---|---|---|
| Exploration surge | Many small fine-tunes | Usage credits + caps | Credit stacking | Org KYC, rolling hourly caps |
| Pretraining wave | Few very large jobs | Peer review + scheduled blocks | Queue starvation | Reserve small-job lanes |
| Mixed demand | Research + productization | Hybrid (score + credits + lottery) | Score gaming | Random tie-breakers, audits |
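The "random tie-breakers" mitigation in the mixed-demand row can be illustrated with a short sketch. It assumes scores are rounded to two decimals before comparison and that the lottery seed is committed in advance; the function name `rank_with_lottery`, the `precision` parameter, and the seed handling are all hypothetical, not part of any described program.

```python
import random

def rank_with_lottery(scores, seed, precision=2):
    """Rank applicants by score; break ties with a seeded random draw.

    Assumptions (not from the source): scores are bucketed by rounding
    to `precision` decimals, and `seed` is fixed before scoring so the
    draw can be re-run by an auditor.
    """
    rng = random.Random(seed)
    keyed = []
    # Iterate in name order so the draw is reproducible for a given seed.
    for name, score in sorted(scores.items()):
        keyed.append((round(score, precision), rng.random(), name))
    # Higher score first; within a score bucket, the random draw decides.
    keyed.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, _, name in keyed]

order = rank_with_lottery({"a": 0.90, "b": 0.77, "c": 0.77}, seed=2024)
```

Because small score differences disappear inside a bucket, padding an application to gain a hundredth of a point buys nothing, which is the point of pairing the scorecard with a lottery.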
Fair-share scoring rubric
Score = 0.35·Impact + 0.25·Execution + 0.20·Public-good + 0.10·Inclusion + 0.10·Safety
Impact (35%)
- Clear user need; estimated users/beneficiaries
- Sector priority (health, education, MSME enablement)
Execution (25%)
- Data readiness; baselines; eval plan
- Team track record; milestones
Public-good (20%)
- Open code/data; benchmark submissions
- Reproducibility commitments
Inclusion (10%)
- Tier-2/3 or non-metro institutions
- First-time grantee bonus
Safety (10%)
- Eval suite & red-teaming
- Data consent & provenance
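As a worked illustration of the weighted formula above, here is a minimal sketch. It assumes each criterion is rated on a 0 to 1 scale by reviewers; that scale, the key names, and the function `fair_share_score` are assumptions made for the example, since the rubric itself only fixes the weights.

```python
# Weights taken from the rubric formula; key names are illustrative.
WEIGHTS = {
    "impact": 0.35,
    "execution": 0.25,
    "public_good": 0.20,
    "inclusion": 0.10,
    "safety": 0.10,
}

def fair_share_score(ratings):
    """Return the weighted sum of per-criterion ratings.

    Assumption: every rating lies in [0, 1], so the score does too.
    """
    missing = WEIGHTS.keys() - ratings.keys()
    if missing:
        raise ValueError(f"missing ratings: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0.0 <= ratings[name] <= 1.0:
            raise ValueError(f"rating for {name} must be in [0, 1]")
    return sum(WEIGHTS[name] * ratings[name] for name in WEIGHTS)

example = fair_share_score({
    "impact": 0.8,
    "execution": 0.6,
    "public_good": 1.0,
    "inclusion": 0.5,
    "safety": 0.9,
})
# 0.35*0.8 + 0.25*0.6 + 0.20*1.0 + 0.10*0.5 + 0.10*0.9 = 0.77
```

Publishing the weights alongside per-criterion ratings lets applicants reproduce their own score, which supports the "publishable" goal of the rubric.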
SLOs, caps & transparency
| Dimension | Target | Why |
|---|---|---|
| Queue time | P95 start ≤ 48h (small), ≤ 7d (large) | Predictability & fairness |
| Org cap | ≤ 5% of monthly GPU-hours per org | Anti-hoarding |
| Use-it-or-lose-it | Credits expire after 7 idle days | Recycles capacity to the waitlist |
| Transparency | Weekly utilization & anonymized project list | Trust, auditability |
| Safety | Restricted dual-use workloads; model cards required for large runs | Responsible access |
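The org cap and use-it-or-lose-it rows translate into two simple admission checks. The sketch below assumes the cap is measured against the total monthly GPU-hour pool and that "idle" means no credit spend since `last_used_at`; all function and constant names are hypothetical.

```python
from datetime import datetime, timedelta

ORG_CAP_FRACTION = 0.05              # from the table: at most 5% per org
CREDIT_IDLE_TTL = timedelta(days=7)  # from the table: 7 idle days

def remaining_org_hours(monthly_pool_hours, org_used_hours):
    """GPU-hours an org may still consume this month under the cap."""
    cap = ORG_CAP_FRACTION * monthly_pool_hours
    return max(0.0, cap - org_used_hours)

def can_admit(job_hours, monthly_pool_hours, org_used_hours):
    """True if the job fits inside the org's remaining monthly cap."""
    return job_hours <= remaining_org_hours(monthly_pool_hours, org_used_hours)

def credits_expired(last_used_at, now):
    """True if credits sat idle longer than the TTL (assumed definition)."""
    return now - last_used_at > CREDIT_IDLE_TTL

# Example: a 100,000 GPU-hour pool gives each org a 5,000-hour cap.
fits = can_admit(900, monthly_pool_hours=100_000, org_used_hours=4_000)
stale = credits_expired(datetime(2024, 1, 1), now=datetime(2024, 1, 9))
```

A real scheduler would also need to count hours reserved by queued jobs, which this sketch ignores.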
Abuse controls & audit trails
Threats
- Proxy training for large sponsors via shell orgs
- Multi-ID stacking to bypass caps
- Undeclared dual-use/bio risks
Controls
- Org-level KYC; funding disclosure
- Anomaly detection on job graphs & credit spend
- Pre-run safety attestation; post-run model cards
Audit trail
- Immutable logs of queue, allocations, artifacts
- Quarterly independent review & public summary
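One common way to approximate "immutable" logs is a hash chain, where each entry commits to the previous one so that later edits become detectable. The sketch below is a swapped-in illustration of that general technique, not a description of any specific system; the entry layout and function names are assumptions. Note that it makes tampering evident rather than impossible.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry

def _digest(prev_hash, event):
    # Canonical JSON (sorted keys) so the same event always hashes the same.
    payload = json.dumps({"prev": prev_hash, "event": event}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def append_entry(log, event):
    """Append an event, linking it to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS_HASH
    log.append({"prev": prev_hash, "event": event,
                "hash": _digest(prev_hash, event)})

def verify_chain(log):
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS_HASH
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True

log = []
append_entry(log, {"type": "allocation", "org": "org-1", "gpu_hours": 120})
append_entry(log, {"type": "queue", "job": "job-7", "state": "started"})
intact = verify_chain(log)
```

Publishing the latest hash with each quarterly summary would let outside reviewers confirm that earlier entries were not rewritten.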
Regional hubs: infra & sustainability
| Design lever | Choice | Reason |
|---|---|---|
| Power | Long-term PPAs; renewable mix | Cost & stability; ESG |
| Cooling | Liquid/immersion where feasible | Lower PUE; higher density |
| Network | Fiber adjacency; peering | Latency for interactive jobs |
| Placement | Multiple hubs | Resilience; inclusion via proximity |
Playbooks
Startups
- Prefer LoRA/QLoRA & quantization over full pretraining.
- Batch & checkpoint; align runs to SLO windows.
- Publish evals; claim public-good rubric points.
Academia
- Partner as co-PI with Tier-2/3 labs to earn inclusion credit.
- Open datasets & baselines; reproducibility kits.
- Coordinate around teaching calendars; use reservations.
Operators
- Expose telemetry APIs; auto-resume jobs.
- Credit wallets with org-level controls.
- Incident response SOPs & public postmortems.
Policy
- Carve-outs: 40% academia, 40% startups/SMEs, 20% public-interest.
- Rebalance carve-outs quarterly based on utilization data.
- Independent ethics & safety board with veto power.
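The carve-outs and quarterly rebalancing can be sketched as follows. The 40/40/20 split comes from the list above; the rebalancing rule is purely hypothetical (each share moves a quarter of the way toward its utilization-weighted share), since the text does not specify one, and all names are illustrative.

```python
CARVE_OUTS = {"academia": 0.40, "startups_smes": 0.40, "public_interest": 0.20}

def split_pool(total_gpu_hours, shares=CARVE_OUTS):
    """Divide the GPU-hour pool across cohorts by their share."""
    return {cohort: total_gpu_hours * share for cohort, share in shares.items()}

def rebalance(shares, utilization, blend=0.25):
    """Hypothetical rule: nudge shares toward cohorts that used their quota.

    `utilization` maps cohort -> fraction of its carve-out actually used.
    `blend` limits how far shares can move in one quarter (assumed value).
    """
    weighted = {c: shares[c] * utilization[c] for c in shares}
    total = sum(weighted.values())
    if total == 0:
        return dict(shares)  # nothing was used; keep shares unchanged
    target = {c: weighted[c] / total for c in shares}
    # Blending two distributions that each sum to 1 also sums to 1.
    return {c: (1 - blend) * shares[c] + blend * target[c] for c in shares}

pool = split_pool(1_000_000)
new_shares = rebalance(
    CARVE_OUTS,
    {"academia": 0.9, "startups_smes": 0.9, "public_interest": 0.45},
)
```

A damped rule like this avoids large swings from a single quiet quarter; a program might also want a floor per cohort so that no carve-out shrinks to zero.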
Application form blueprint & KPIs
Applicant fields
- Problem statement; expected beneficiaries/users
- Data provenance & consent
- Baseline metrics & eval plan
- Public-good commitments (open code/data)
- Org links & funding disclosures
Program KPIs
- Wait times (P50/P95) by job size
- Utilization by cohort (academia/startup/public-interest)
- Outputs: papers, open datasets, models shipped
- Inclusion: Tier-2/3 share; first-time grantees
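The wait-time KPI can be computed directly from queue records. This sketch uses the nearest-rank percentile definition (other definitions interpolate and give slightly different values) and checks P95 against the 48-hour and 7-day targets from the SLO table; the record shape `(size, wait_hours)` and all names are assumptions.

```python
import math
from collections import defaultdict

# P95 start-time targets from the SLO table, in hours.
P95_SLO_HOURS = {"small": 48, "large": 7 * 24}

def percentile(values, p):
    """Nearest-rank percentile for an integer p in (0, 100]."""
    if not values:
        raise ValueError("no values")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

def wait_time_kpis(jobs):
    """Summarize P50/P95 wait per job size from (size, wait_hours) pairs."""
    by_size = defaultdict(list)
    for size, wait_hours in jobs:
        by_size[size].append(wait_hours)
    report = {}
    for size, waits in by_size.items():
        p95 = percentile(waits, 95)
        report[size] = {
            "p50": percentile(waits, 50),
            "p95": p95,
            "meets_slo": p95 <= P95_SLO_HOURS[size],
        }
    return report

jobs = [("small", h) for h in range(1, 21)] + [("large", 200.0)]
report = wait_time_kpis(jobs)
```

In this example the small-job lane meets its target (P95 of 19 hours) while the single large job at 200 hours misses the 168-hour target.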
FAQs
- Will auctions shut out academia? Not under a hybrid model: grant credits, org caps, and peer-review lanes keep capacity open to academic users.
- How do you stop proxy training? Combine org KYC, funding disclosure, anomaly detection, and audits.
- What about safety? Restrict dual-use workloads and require evals and model cards for large runs.