Multimodal Learning in Real-world Application: Enhancing Feature Representation and Training Strategies