270,000-sample dataset covering spatial, physical, and embodied action reasoning reduces error rates by 66.6% on 20 capability probes; 100K open subset and fine-tuned ...