Select for better learning: identifying high-quality training data for a multimodal cyclic transformer