Analysis Of An Alternate Policy Gradient Estimator For Softmax Policies