For the advancement of time series classification, we can summarize that most existing methods adopt a common learning-to-classify paradigm - a classifier model tries to learn the relation between sequence inputs and target label encoded by one-hot distribution. Although effective, this paradigm conceals two inherent limitations: (1) one-hot distribution fails to reflect the comparability and similarity between labels, and (2) it is difficult to learn transferable representation across domains. In this work, we propose InstructTime, a novel attempt to reshape time series classification as a learning-to-generate paradigm. Relying on the generative capacity of the pre-trained language model, the core idea is to formulate the classification of time series as a multimodal understanding task. Specifically, firstly, a time series discretization module is designed to convert continuous inputs into a sequence of discrete tokens to solve the inconsistency issue across modality data. Secondly, we introduce an alignment projected layer before feeding the transformed token of time series into language models. Thirdly, prior to fine-tuning the language model for the target domain, it is essential to emphasize the necessity of auto-regressive pre-training across various modality inputs. Finally, extensive experimentation are conducted on several prevalent public benchmark datasets, indicating the superior performance of the InstructTime. Our code is at https://github.com/Mingyue-Cheng/InstructTime.