    達飝: AI/ANN - Reinforcement Learning (Artificial Intelligence / Artificial Neural Networks: Reinforcement Learning Methods Explained [taught in English])
    2018-11-13 3239
    Audience
    European and American foreign-invested enterprises
    Objectives
    See below
    Content


    Artificial Intelligence / Artificial Neural Networks: Reinforcement Learning Methods Explained [Taught in English]


    AI/ANN - Reinforcement Learning




    【Background & Goals】

    Reinforcement learning (RL) is an area of machine learning concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward. Because the problem is so general, it is studied in many disciplines, such as game theory, control theory, operations research, information theory, simulation-based optimization, multi-agent systems, swarm intelligence, statistics and genetic algorithms. In this course, reinforcement learning is treated as approximate dynamic programming. The same problem is studied in the theory of optimal control, although most of that work is concerned with the existence and characterization of optimal solutions rather than with learning or approximation. In machine learning, the environment is typically formulated as a Markov decision process (MDP), because many reinforcement learning algorithms for this setting use dynamic programming techniques.

    Reinforcement learning differs from standard supervised learning in that correct input/output pairs are never presented, nor are sub-optimal actions explicitly corrected. Instead, the focus is on performance, which involves finding a balance between exploration (of uncharted territory) and exploitation (of current knowledge). The exploration vs. exploitation trade-off has been most thoroughly studied through the multi-armed bandit problem and in finite MDPs.
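The exploration vs. exploitation trade-off can be illustrated with a minimal sketch of an epsilon-greedy agent on a multi-armed bandit. This is one common strategy, not necessarily the one used in the course. The function name `run_bandit`, the Gaussian reward model, and all parameter values are assumptions made for illustration.

```python
import random

def run_bandit(true_means, epsilon=0.1, steps=5000, seed=0):
    """Epsilon-greedy on a bandit with Gaussian rewards (illustrative assumption).

    Returns the estimated mean reward and the pull count for each arm.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [0.0] * n_arms  # running average of observed rewards per arm
    counts = [0] * n_arms       # number of times each arm was pulled
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the best current estimate.
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        # Incremental update of the sample mean.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts

# Hypothetical three-armed bandit; the agent never sees these true means.
estimates, counts = run_bandit([0.2, 0.5, 0.9])
print(estimates, counts)
```

With `epsilon=0` the agent can lock onto whichever arm looked best early on; a small positive `epsilon` keeps every arm sampled so the estimates can keep improving.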

    The main difference between classical dynamic programming techniques and reinforcement learning algorithms is that the latter do not require an exact model of the MDP, and they can target large MDPs where exact methods become infeasible.


    【Trainees】

    Programmers and managers working on AI/ANN reinforcement learning applications, and managers of the related business functions.

    Trainees need a solid foundation in advanced (university-level) mathematics.

    【Timing】 6 class hours (one day, at 6 class hours per day)


    【General Content】

    PART 1  Necessary & Essential AI Knowledge

    PART 2  Example: A Smart Robot in a Room

    PART 3  Defining a Markov Decision Process

    PART 4  Monte Carlo methods

    PART 5  Consolidating and Strengthening RL: Q-learning


    【Detailed Content】


    PART 1  Necessary & Essential AI Knowledge

    1.1 Supervised learning

         classification, regression

    1.2 Unsupervised learning

         clustering, dimensionality reduction

    1.3 Reinforcement learning

         generalization of supervised learning

         learn from interaction with the environment to achieve a goal


    PART 2  Example: A Smart Robot in a Room

    What’s the strategy to achieve max reward?

    What if the actions were deterministic?

    No teacher who would say “good” or “bad”

    Explore the environment and learn from the experience


    PART 3  Defining a Markov Decision Process

    3.1 Solving an MDP using dynamic programming

    states, actions and rewards

    solution and policy

    Markov Decision Process (MDP)

    maximize cumulative reward in the long run

    Computing return from rewards

    3.2 Value functions

    Optimal value functions

    Policy evaluation/improvement

    Policy/Value iteration
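As a small illustration of the topics in this part, the following is a minimal sketch of value iteration on a tiny, made-up MDP. The return being maximized is the discounted sum of rewards, r1 + gamma*r2 + gamma^2*r3 + ... The three-state corridor, the action names, the discount factor, and the helper names `q_value` and `value_iteration` are all assumptions for illustration, not material from the course.

```python
def q_value(outcomes, V, gamma):
    """Expected one-step return for a list of (prob, next_state, reward) outcomes."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)

def value_iteration(P, gamma=0.9, tol=1e-8):
    """P[s][a] is a list of (prob, next_state, reward); terminal states have no actions."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s, actions in P.items():
            if not actions:
                continue  # terminal state keeps value 0
            # Bellman optimality backup: best expected return over actions.
            best = max(q_value(outcomes, V, gamma) for outcomes in actions.values())
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Greedy policy with respect to the converged value function.
    policy = {
        s: max(actions, key=lambda a: q_value(actions[a], V, gamma))
        for s, actions in P.items() if actions
    }
    return V, policy

# Hypothetical corridor: states 0 and 1, goal state 2 (terminal).
# Entering the goal gives reward 1; every other move gives reward 0.
P = {
    0: {"left": [(1.0, 0, 0.0)], "right": [(1.0, 1, 0.0)]},
    1: {"left": [(1.0, 0, 0.0)], "right": [(1.0, 2, 1.0)]},
    2: {},
}
V, policy = value_iteration(P)
print(V, policy)
```

Note that this requires the full transition model `P`, which is exactly the knowledge that the Monte Carlo and Q-learning methods in the later parts do without.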


    PART 4  Monte Carlo methods

    4.1 Monte Carlo methods

    don’t need full knowledge of environment

    averaging sample returns

    4.2 Monte Carlo policy evaluation

    want to estimate V^π(s), the value of state s under policy π

    first-visit MC

    4.3 Monte Carlo control

    4.4 Maintaining exploration

    4.5 Simulated experience

    4.6 Summary of Monte Carlo
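First-visit Monte Carlo policy evaluation can be sketched as follows: for each episode, the return following the first visit to each state is recorded, and the value estimate is the average of those returns. The episode format (a list of `(state, reward)` pairs, with the reward received on leaving the state), the function name `first_visit_mc`, and the hand-written episodes are assumptions for illustration.

```python
from collections import defaultdict

def first_visit_mc(episodes, gamma=1.0):
    """Estimate state values by averaging first-visit returns over episodes."""
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    for episode in episodes:
        # Time step of the first visit to each state in this episode.
        first_visit = {}
        for t, (s, _) in enumerate(episode):
            first_visit.setdefault(s, t)
        # Walk backwards so the return G can be accumulated incrementally.
        G = 0.0
        for t in range(len(episode) - 1, -1, -1):
            s, r = episode[t]
            G = r + gamma * G
            if first_visit[s] == t:
                returns_sum[s] += G
                returns_count[s] += 1
    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}

# Two hypothetical episodes generated by some fixed policy.
episodes = [
    [("A", 0.0), ("B", 1.0)],
    [("A", 1.0), ("A", 0.0), ("B", 3.0)],
]
values = first_visit_mc(episodes)
print(values)
```

No transition model is used here: the estimate relies only on sampled episodes, which may come from real or simulated experience.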


    PART 5  Consolidating and Strengthening RL: Q-learning

    5.1 Off-policy learning

    5.2 State representation

    5.3 Function approximation

    5.4 Features

    5.5 Splitting and aggregation

    5.6 Designing rewards

    5.7 Case study: Backgammon
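A minimal sketch of tabular Q-learning is given here to make the off-policy idea concrete: the agent behaves epsilon-greedily, but each update bootstraps from the best next action regardless of what it actually does next. The one-dimensional corridor environment, the reward of 1 at the goal, and all hyperparameters are assumptions for illustration. The sketch uses a plain lookup table and does not cover function approximation or feature design.

```python
import random

def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning on a hypothetical corridor with the goal at the right end."""
    rng = random.Random(seed)
    actions = (-1, +1)  # step left or step right
    goal = n_states - 1
    Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
    for _ in range(episodes):
        s = 0
        while s != goal:
            if rng.random() < epsilon:
                a = rng.choice(actions)  # explore
            else:
                # Exploit, breaking ties at random.
                best = max(Q[(s, b)] for b in actions)
                a = rng.choice([b for b in actions if Q[(s, b)] == best])
            s2 = min(max(s + a, 0), goal)  # walls clamp the position
            r = 1.0 if s2 == goal else 0.0
            # Off-policy target: uses the greedy value of the next state.
            if s2 == goal:
                target = r
            else:
                target = r + gamma * max(Q[(s2, b)] for b in actions)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
    return Q

Q = q_learning()
greedy = {s: max((-1, +1), key=lambda a: Q[(s, a)]) for s in range(4)}
print(greedy)
```

In this small example the learned greedy policy moves right in every non-goal state. For large state spaces the table `Q` would be replaced by a function approximator over state features.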
