Spatio-Temporal Learning For Video Segmentation And Tracking