Deep Weakly Supervised Learning for Whole Slide Image Representation: A Multimodal Approach