Abstract: Under the Hadoop distributed computing and storage framework, massive hourly single-station files of automatic weather station data are cleaned according to customized ETL rules, merged into large files by year and station number, and transferred to the distributed storage HDFS. The Spark SQL parallel computing framework is then used to compute daily statistics of common meteorological elements, which greatly improves data processing and retrieval efficiency compared with a relational database. Experimental results show that, with the Spark SQL parallel computing framework, processing and querying of multi-element, multi-station, long-time-series data achieves second-level response, and the advantage becomes more pronounced as the number of stations and the time span increase. This approach can support such meteorological data services more efficiently and offers a new path for migrating large-scale meteorological data processing from relational databases to big-data distributed frameworks.
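The daily-statistics step described in the abstract is essentially a GROUP BY over station number and date. The following is a minimal sketch of that aggregation in plain Python, using hypothetical field names (the authors' actual Spark SQL code is not given); the equivalent Spark SQL query is shown as a comment:

```python
from collections import defaultdict
from statistics import mean

# Hourly observations: (station_id, "YYYY-MM-DD HH", temperature).
# Field names and values are illustrative, not from the paper.
hourly = [
    ("54511", "2020-07-01 00", 24.1),
    ("54511", "2020-07-01 01", 23.8),
    ("54511", "2020-07-02 00", 25.0),
    ("58362", "2020-07-01 00", 27.3),
]

# Roughly equivalent Spark SQL (illustrative):
#   SELECT station_id, to_date(obs_time) AS day,
#          AVG(temp), MAX(temp), MIN(temp)
#   FROM hourly_obs
#   GROUP BY station_id, to_date(obs_time)

# Group hourly records by (station, day).
groups = defaultdict(list)
for station, ts, temp in hourly:
    day = ts.split(" ")[0]
    groups[(station, day)].append(temp)

# Daily statistics per station: (mean, max, min).
daily = {k: (round(mean(v), 2), max(v), min(v)) for k, v in groups.items()}
```

In the paper's setting the same query runs in parallel across HDFS blocks of the merged large files, which is what yields the second-level response times reported.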