Research and Application of Spark Performance Optimization Analysis in Big Data Environment

Fund Project: Supported by a 2021 directive project of the Guangxi Meteorological Scientific Research Program (桂氣科2021ZL02)

    Abstract:

    The existing CIMISS (China Integrated Meteorological Information Sharing System) cannot adequately support large-volume queries that span long time series, multiple stations, and multiple meteorological elements. Using the monthly reports of historical surface meteorological records from Guangxi meteorological stations, covering each station's record since its establishment, together with the physical resources of an existing Hadoop cluster, this study redesigns the data ETL process, builds Parquet-format datasets, and converts them for storage on HDFS. Spark Broadcast variables are embedded and the execution parameters of the Spark cluster are tuned, which improves the cluster's processing parallelism and the efficiency of Spark SQL join queries. The results show that the maximum compression ratio of the Parquet-format datasets exceeds 95%, and that the efficiency of one-time large-volume queries is 1 to 5 times higher than before, with support for highly concurrent access. This provides effective technical support for various related forecasting and prediction services.
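As a rough illustration of why a broadcast variable can speed up a join query, the sketch below simulates the idea in plain Python rather than in Spark: a small lookup table is copied once to every partition of the large dataset, so each partition can be joined locally without regrouping its rows by key. The station and observation records, the field names, and the `broadcast_join` helper are all hypothetical; the abstract does not show the paper's actual tables or Spark code.

```python
def broadcast_join(partitions, small_table, key):
    """Join each partition of a large dataset against a small lookup table.

    The lookup dict plays the role of the broadcast variable: every
    partition reads the same in-memory copy, so rows of the large
    dataset never need to be redistributed by key.
    """
    # Built once and shared read-only by all partitions.
    lookup = {row[key]: row for row in small_table}
    joined = []
    for partition in partitions:  # each partition is processed independently
        part_out = []
        for record in partition:
            match = lookup.get(record[key])
            if match is not None:  # inner-join semantics: drop unmatched rows
                part_out.append({**match, **record})
        joined.append(part_out)
    return joined


# Hypothetical sample data: station metadata (small) and observations (large).
stations = [
    {"station_id": "S001", "name": "Station A"},
    {"station_id": "S002", "name": "Station B"},
]
observations = [
    [
        {"station_id": "S001", "month": "2020-01", "temp_c": 12.5},
        {"station_id": "S003", "month": "2020-01", "temp_c": 9.0},
    ],
    [
        {"station_id": "S002", "month": "2020-01", "temp_c": 14.1},
    ],
]

result = broadcast_join(observations, stations, "station_id")
```

In this sketch the observation for `S003` is dropped because no station row matches it, and the output keeps the same partition layout as the input. In Spark itself the small table would be shipped to the executors by the framework; the simulation only shows the local lookup that replaces the shuffle.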

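Cite this article

黃志, 蘇傳程, 蘇曉紅. Research and Application of Spark Performance Optimization Analysis in Big Data Environment [J]. Meteorological Science and Technology (氣象科技), 2022, 50(1): 51-58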

History
  • Received: 2021-04-24
  • Final revision: 2021-09-06
  • Published online: 2022-02-28
  • Publication date: 2022-02-28