Can Vision Language Models Judge Action Quality? An Empirical Evaluation - Databubble