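Analysis of High-Performance Computing Characteristics of GRAPES_MESO and WRF Models on Kunpeng Platform

Fund Project:

Jointly funded by the National Natural Science Foundation of China (42475038, 42030610), the Zhejiang Provincial Science and Technology Program "Jianbing Lingyan + X" (Pioneer and Leading Goose + X) R&D Program (2024C03256), the Zhejiang Provincial Natural Science Foundation (LY21D050001, LGF21D010001), and the Key Project of the Zhejiang Meteorological Science and Technology Program (2022ZD14).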
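    Summary:

    This paper analyses the computational characteristics of the GRAPES_MESO (Global/Regional Assimilation PrEdiction System-Mesoscale version) model and the WRF (Weather Research and Forecasting Model) model on the domestic KUNPENG platform and compares them with the Intel (X86) platform, in order to identify room for improvement in resource usage, computational bottlenecks, and hotspot functions on the KUNPENG platform. The results show that, after adaptation, both models produce results on the KUNPENG platform that are consistent with those on the Intel X86 platform and exhibit good parallel scalability. Both models have high CPU utilisation, their computational bottlenecks are concentrated mainly in the CPU back end, and overall node memory usage is moderate, so subsequent optimisation should focus on code efficiency, algorithms, and memory access. On the KUNPENG platform, the MPI communication efficiency of the GRAPES_MESO model could be improved by optimising the Collective Sync, Allreduce, and Wait algorithms of collective communication. The GRAPES_MESO model could also be optimised by targeting its hotspot functions: the GCR algorithm, the collective-communication hotspots represented by uct and ucg, mathematical functions such as expf and powf, and malloc memory operations.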

    Abstract:

    The GRAPES_MESO and WRF models are used to analyse the computational characteristics of numerical models on the KUNPENG platform, and the results are compared with the Intel (X86) platform to identify room for improvement in resource utilisation, computational bottlenecks, and hotspot functions on the KUNPENG platform. The results indicate the following. (1) After adaptation, both models obtain results on the domestic KUNPENG platform that are consistent with those on the X86 platform. (2) Both models exhibit good parallel scalability on both the X86 and KUNPENG platforms. With the same number of processes, the computing efficiency of the KUNPENG platform is 65% to 90% of that of the X86 platform. With the same number of nodes, however, the computing efficiency of the KUNPENG platform exceeds that of the X86 platform by 22% to 45%. (3) In terms of hardware resource utilisation, both models spend the most time on computation, followed by communication, and the least on IO. CPU usage is high and node memory usage is moderate, so subsequent optimisation should focus mainly on code efficiency, algorithms, and memory access. (4) In terms of MPI communication, the MPI communication efficiency of the GRAPES model on the KUNPENG platform could be improved by optimising the Collective Sync, Allreduce, and Wait algorithms of collective communication. (5) Top-down analysis shows that the computing bottlenecks of the two models on both platforms are concentrated mainly in the back-end CPU and the back-end memory subsystem. Thanks to the multiple memory channels and the optimisations of the Bisheng compiler, the memory access efficiency, branch prediction rate, and cache hit rate of the GRAPES model are higher on the KUNPENG platform than on the X86 platform. In addition, the memory subsystem bottleneck information shows that TLB Miss and L1/L2 Miss rates are generally low and memory access efficiency is high, so the room for memory access optimisation is limited. The instruction distribution shows a relatively high proportion of memory read and integer instructions, together with a certain share of floating-point instructions, which reflects the high memory bandwidth advantage of the KUNPENG architecture. The proportion of vectorised instructions is low, so vectorisation optimisation should be considered. (6) Hotspot analysis shows that, on the KUNPENG platform, the GRAPES model could be optimised by targeting the GCR algorithm, the collective-communication hotspots represented by uct and ucg, mathematical functions such as expf and powf, and hot functions such as malloc memory operations.
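The two efficiency figures in result (2) can both hold when a node on one platform carries more cores than a node on the other. The following is a minimal sketch of that arithmetic, not the authors' method. The function names, the wall-clock times, and the core counts (96 and 64) are hypothetical values chosen for illustration, and the per-node estimate assumes throughput scales linearly with the number of cores in a node.

```python
def relative_efficiency(t_reference: float, t_target: float) -> float:
    """Throughput of the target platform relative to the reference.

    Both arguments are wall-clock times for the same workload;
    a result of 1.0 means the two platforms are equally fast.
    """
    if t_reference <= 0 or t_target <= 0:
        raise ValueError("wall-clock times must be positive")
    return t_reference / t_target


def per_node_ratio(per_process_ratio: float,
                   cores_target: int,
                   cores_reference: int) -> float:
    """Idealised per-node ratio derived from a per-process ratio.

    Assumes throughput scales linearly with the cores in a node,
    which real models only approximate.
    """
    if cores_target <= 0 or cores_reference <= 0:
        raise ValueError("core counts must be positive")
    return per_process_ratio * cores_target / cores_reference


if __name__ == "__main__":
    # Hypothetical timings (seconds) with the same number of processes.
    t_x86, t_kunpeng = 80.0, 100.0
    same_processes = relative_efficiency(t_x86, t_kunpeng)

    # Hypothetical core counts per node; not taken from the paper.
    same_nodes = per_node_ratio(same_processes,
                                cores_target=96, cores_reference=64)

    print(f"same process count: {same_processes:.2f}x")
    print(f"same node count:    {same_nodes:.2f}x")
```

With these illustrative inputs the target platform is slower per process but faster per node, which is the same qualitative pattern the abstract reports.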

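Cite this article

Chen Feng, He Mingyang, Chen Yefeng, Wu Bingcheng, Xu Cheng. Analysis of High-Performance Computing Characteristics of GRAPES_MESO and WRF Models on Kunpeng Platform[J]. Meteorological Science and Technology, 2025, 53(3): 347-361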

History
  • Received: 2024-04-11
  • Final revision: 2025-01-07
  • Published online: 2025-06-27