FLiP: Towards understanding and interpreting multimodal multilingual sentence embeddings - Databubble