Tightwad
Infrastructure
A mixed-vendor GPU inference engine that pools CUDA, ROCm, and Metal GPUs into one endpoint. Speculative decoding over the pooled GPUs delivers a measured 1.86× speedup on Llama 3.3 70B across 4 consumer GPUs over WiFi. Run models that fit on no single machine, and make them usable.
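
For readers unfamiliar with speculative decoding, the idea can be sketched in a few lines. This is a generic, greedy-only illustration and not Tightwad's actual code: the `draft_model` and `target_model` functions are toy stand-ins, and every name here (`speculative_step`, `generate`, `k`) is hypothetical. A small draft model proposes `k` tokens cheaply, the large target model verifies them, and every accepted token saves one slow target step.

```python
from typing import Callable, List, Tuple

Token = int
# Hypothetical interface: a model is a function returning its greedy next token.
NextToken = Callable[[List[Token]], Token]


def speculative_step(draft: NextToken, target: NextToken,
                     context: List[Token], k: int) -> List[Token]:
    """One round of greedy speculative decoding; returns the tokens accepted."""
    # 1. The cheap draft model proposes k tokens autoregressively.
    proposed: List[Token] = []
    for _ in range(k):
        proposed.append(draft(context + proposed))

    # 2. The target model checks each proposed position. A real engine would
    #    score all positions in one batched forward pass; a loop is used here
    #    only to keep the sketch readable.
    accepted: List[Token] = []
    for tok in proposed:
        expected = target(context + accepted)
        if tok != expected:
            accepted.append(expected)  # keep the target's token, drop the rest
            return accepted
        accepted.append(tok)

    # 3. All k drafts matched, so the target's own next token is a bonus.
    accepted.append(target(context + accepted))
    return accepted


def generate(draft: NextToken, target: NextToken, prompt: List[Token],
             max_new: int, k: int = 4) -> Tuple[List[Token], int]:
    """Generate max_new tokens; also return how many verify rounds were used."""
    out = list(prompt)
    rounds = 0
    while len(out) - len(prompt) < max_new:
        out.extend(speculative_step(draft, target, out, k))
        rounds += 1
    return out[:len(prompt) + max_new], rounds


# Toy stand-ins: the target counts modulo 10; the draft agrees except after a 2.
def target_model(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10


def draft_model(ctx: List[Token]) -> Token:
    return 0 if ctx[-1] == 2 else (ctx[-1] + 1) % 10


if __name__ == "__main__":
    tokens, rounds = generate(draft_model, target_model, [0], max_new=12, k=4)
    print(tokens, "verify rounds:", rounds)
```

In this greedy form the output is identical to what the target model would produce on its own; only the number of target rounds shrinks. That property is what makes the technique attractive when each target step has to cross a slow link between pooled machines.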