When Do Tools and Planning Help Large Language Models Think? A Cost- and Latency-Aware Benchmark
Benchmarks one-shot prompting against tool-augmented plan-execute-replan on Event-QA and CMV, quantifying the accuracy/latency trade-off and showing when tool orchestration helps or hurts.