Revisiting Compositionality in Dual-Encoder Vision-Language Models: The Role of Inference - Databubble