Abstract: Under the Hadoop distributed computing and storage framework, massive hourly single-station files of automatic weather station data are cleaned according to customized ETL rules, merged into large files by year and station number, and transferred to the distributed storage HDFS. The Spark SQL parallel computing framework is then used to compute daily statistics of common meteorological elements, which greatly improves data processing and retrieval efficiency compared with a relational database. Experimental results show that, with the Spark SQL parallel computing framework, processing and querying of multi-element, multi-station, long-time-series data achieves second-level response, and the advantage becomes more pronounced as the number of stations and the time span increase. This approach can support such meteorological data services more efficiently and offers a new path for migrating large-scale meteorological data processing from relational databases to big-data distributed frameworks.
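The daily-statistics step described in the abstract is essentially a GROUP BY over station number and date. The following is a minimal sketch of that aggregation in plain Python, using hypothetical field names (the authors' actual Spark SQL code is not given); the equivalent Spark SQL query is shown as a comment:

```python
from collections import defaultdict
from statistics import mean

# Hourly observations: (station_id, "YYYY-MM-DD HH", temperature).
# Field names and values are illustrative, not from the paper.
hourly = [
    ("54511", "2020-07-01 00", 24.1),
    ("54511", "2020-07-01 01", 23.8),
    ("54511", "2020-07-02 00", 25.0),
    ("58362", "2020-07-01 00", 27.3),
]

# Roughly equivalent Spark SQL (illustrative):
#   SELECT station_id, to_date(obs_time) AS day,
#          AVG(temp), MAX(temp), MIN(temp)
#   FROM hourly_obs
#   GROUP BY station_id, to_date(obs_time)

# Group hourly records by (station, day).
groups = defaultdict(list)
for station, ts, temp in hourly:
    day = ts.split(" ")[0]
    groups[(station, day)].append(temp)

# Daily statistics per station: (mean, max, min).
daily = {k: (round(mean(v), 2), max(v), min(v)) for k, v in groups.items()}
```

In the paper's setting the same query runs in parallel across HDFS blocks of the merged large files, which is what yields the second-level response times reported.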