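Li Wenwu a,b, Zhang Xueying a,b, Daniel Eliote Mbanze a,b, Wu 〓 Wei a,b. Research on Long-Term Stochastic Optimal Operation of Reservoirs Based on the SARSA Algorithm [J]. Water Resources and Power (水电能源科学), 2018, 36(9): 72-75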
Research on Long-Term Stochastic Optimal Operation of Reservoirs Based on the SARSA Algorithm
Keywords: reservoir operation; stochastic dynamic programming (SDP); reinforcement learning; value iteration; SARSA
Funding: Hubei Province Technology Innovation Special Project (Key Project) (2017AAA132)
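Authors and affiliations:
Li Wenwu a,b, Zhang Xueying a,b, Daniel Eliote Mbanze a,b, Wu 〓 Wei a,b. China Three Gorges University: a. Hubei Provincial Key Laboratory for Operation and Control of Cascaded Hydropower Station; b. College of Electrical Engineering and New Energy; Yichang, Hubei 443002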
Abstract:
      To address the curse of dimensionality in long-term stochastic reservoir operation, a long-term stochastic optimal operation model based on reinforcement learning theory is proposed, built on a description of the stochastic inflow process. The model applies a model-based SARSA algorithm from machine learning and treats reservoir inflow as a random variable with the Markov property, so the problem is posed as a simple Markov decision process (MDP). Through greedy decision making and approximate value iteration, with the learning parameters tuned, the algorithm solves for a near-optimal decision sequence. A case study shows that, compared with stochastic dynamic programming (SDP), the SARSA algorithm obtains high-quality solutions while reducing computation time by approximately 41%. Its efficient solving ability and short computation time offer a new approach to long-term stochastic reservoir operation.
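The SARSA update with greedy decisions described in the abstract can be illustrated with a minimal sketch. This is a generic tabular SARSA with epsilon-greedy exploration on a toy reservoir, not the authors' implementation: the storage levels, inflow classes, transition matrix `P`, reward function, and learning parameters are all hypothetical values chosen for illustration, and the function names (`feasible_releases`, `reward`, `choose`, `train`, `greedy_release`) are assumptions.

```python
import random

# All numbers below are hypothetical toy values, not data from the paper.
N_PERIODS = 12                 # operating periods in one year
STORAGE = [0, 1, 2, 3, 4]      # discretized storage levels (index equals level)
INFLOW = [1, 2]                # inflow classes (arbitrary units)
# P[i][j]: probability that inflow class i is followed by class j (Markov inflow)
P = [[0.7, 0.3],
     [0.4, 0.6]]
ALPHA, GAMMA, EPSILON = 0.1, 1.0, 0.1   # learning rate, discount, exploration


def feasible_releases(s, q):
    """Releases that keep end-of-period storage within [0, max storage]."""
    water = STORAGE[s] + INFLOW[q]
    return [r for r in range(water + 1) if water - r <= STORAGE[-1]]


def reward(s, r):
    """Toy benefit: turbine release (capacity 3) times a storage-based head factor."""
    return min(r, 3) * (1.0 + 0.2 * STORAGE[s])


def choose(Q, state, rng):
    """Epsilon-greedy choice among the feasible releases of a state."""
    _, s, q = state
    actions = feasible_releases(s, q)
    if rng.random() < EPSILON:
        return rng.choice(actions)
    return max(actions, key=lambda r: Q.get((state, r), 0.0))


def train(episodes=5000, seed=0):
    """Tabular SARSA over states (period, storage index, inflow class)."""
    rng = random.Random(seed)
    Q = {}
    for _ in range(episodes):
        state = (0, rng.randrange(len(STORAGE)), rng.randrange(len(INFLOW)))
        action = choose(Q, state, rng)
        while True:
            t, s, q = state
            r = reward(s, action)
            last = (t + 1 == N_PERIODS)
            if last:
                target = r  # final period: no future value
            else:
                s_next = STORAGE[s] + INFLOW[q] - action  # water balance
                q_next = rng.choices(range(len(INFLOW)), weights=P[q])[0]
                next_state = (t + 1, s_next, q_next)
                next_action = choose(Q, next_state, rng)
                # On-policy target: uses the action actually chosen next
                target = r + GAMMA * Q.get((next_state, next_action), 0.0)
            old = Q.get((state, action), 0.0)
            Q[(state, action)] = old + ALPHA * (target - old)
            if last:
                break
            state, action = next_state, next_action
    return Q


def greedy_release(Q, state):
    """Greedy decision for a state under the learned action values."""
    _, s, q = state
    return max(feasible_releases(s, q), key=lambda r: Q.get((state, r), 0.0))


if __name__ == "__main__":
    Q = train()
    print("release in period 0, storage 2, high inflow:", greedy_release(Q, (0, 2, 1)))
```

Unlike SDP, which sweeps every state and inflow transition in each stage, this sketch updates only the state-action pairs visited along sampled trajectories, which is one plausible reading of where the reported time saving comes from.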