To confirm this is an architecture problem rather than a model quality problem, Databricks reran published STaRK baselines ...
Abstract: The application of Large Language Models to complex reasoning tasks over large-scale, structured data, such as enterprise tables, is becoming increasingly prevalent. However, this ...