In recent years, computer-aided penalty prediction has been promoted as a way to strengthen public trust in judicial systems, especially in developing Chinese regions. In this paper, we propose machine-learning-based models to predict the legal penalty in criminal cases. In particular, we focus on drunk driving cases because they are frequent and the applicable regulations are clear. Unlike Western text, in which words are separated by spaces, Chinese text is written as a continuous string of characters with no word delimiters. In our proposed method, we first apply a word segmentation method to split the Chinese text into words and then use a pre-trained model to convert the words into vectors. In this vector space, words with similar meanings lie close to each other. Because the number of cases varies greatly from one penalty to another, the data are imbalanced; we therefore adapt the Synthetic Minority Oversampling Technique (SMOTE) algorithm to address this problem. Finally, we apply deep-learning-based models, including Bi-GRU and TextCNN, to perform penalty prediction and compare their advantages and disadvantages. In the experimental results for drunk driving penalty prediction, our proposed SMOTE + TextCNN solution reaches an accuracy of 73.96%. If a prediction is allowed to fall within plus or minus one month of the actual penalty, the accuracy is 95.60%. In terms of computation time, our proposed method can predict the penalties of 1,524 drunk driving cases per second.
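Two steps named above, oversampling of rare penalty classes and accuracy within a tolerance of one month, can be illustrated with a minimal Python sketch. It rests on assumptions not stated in the abstract: each case is already a fixed-length feature vector (a list of floats), penalties are encoded as integer months, and the function names `smote_oversample` and `accuracy_within` are hypothetical. The oversampler is the basic SMOTE interpolation step, not the adapted variant used in this work.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Basic SMOTE step: create n_new synthetic minority-class vectors.

    Each synthetic vector lies on the line segment between a randomly
    chosen minority sample and one of its k nearest minority neighbors.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbors of the base sample (excluding itself).
        others = [p for j, p in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


def accuracy_within(y_true, y_pred, tolerance=0):
    """Fraction of predictions within `tolerance` months of the true penalty.

    tolerance=0 gives exact-match accuracy; tolerance=1 accepts a
    prediction that is off by at most one month in either direction.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on empty input")
    hits = sum(abs(t - p) <= tolerance for t, p in zip(y_true, y_pred))
    return hits / len(y_true)
```

As a usage illustration with made-up numbers (not data from the paper), true penalties `[2, 3, 3, 4, 6]` and predictions `[2, 4, 3, 6, 5]` give `accuracy_within(...)` of 0.4 for exact matches and 0.8 with `tolerance=1`, which shows why the tolerant figure is always at least as high as the exact one.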