Linear classification is a useful tool in machine learning and data mining. For some data in a high-dimensional feature space, the performance (i.e., testing accuracy) of linear classifiers has been shown to be close to that of nonlinear classifiers such as kernel methods, while training and testing are much faster. In this talk, we discuss various types of optimization methods for training large-scale linear classifiers. They range from second-order methods (e.g., Newton-CG) to first-order methods (e.g., coordinate descent or stochastic gradient descent). Although these methods are standard optimization techniques, some adjustments or enhancements are very useful when they are applied to machine learning. We investigate how machine learning properties are incorporated into their design. We also examine how to choose a suitable optimization method for given machine learning data. Finally, we discuss some future challenges in big-data machine learning.