On Training, Inference, and Sample Efficiencies of Language Models