    達(dá)飝: AI/ANN - Reinforcement Learning (Artificial Intelligence / Artificial Neural Networks: An Analysis of Reinforcement Learning Methods [taught in English])
    2018-11-13
    Audience
    European and American foreign-invested enterprises
    Objectives
    See below
    Content


    Artificial Intelligence / Artificial Neural Networks: An Analysis of Reinforcement Learning Methods [taught in English]


    AI/ANN - Reinforcement Learning




    【Background & Goals】

    Reinforcement learning (RL) is an area of machine learning concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward. Because of its generality, the problem is studied in many disciplines, such as game theory, control theory, operations research, information theory, simulation-based optimization, multi-agent systems, swarm intelligence, statistics and genetic algorithms. In this course of lectures, reinforcement learning is viewed as approximate dynamic programming. The approach is also studied in the theory of optimal control, though most studies there are concerned with the existence of optimal solutions and their characterization, not with learning or approximation. In machine learning, the environment is typically formulated as a Markov decision process (MDP), as many reinforcement learning algorithms for this context utilize dynamic programming techniques.
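
    For reference, the standard formulation behind this paragraph (stated here for completeness; it is not spelled out in the course text): an MDP is a tuple (S, A, P, R, γ), and dynamic programming rests on the Bellman optimality equation

    V^{*}(s) = \max_{a \in A} \sum_{s'} P(s' \mid s, a) \left[ R(s, a, s') + \gamma V^{*}(s') \right]

    where the discount factor γ ∈ [0, 1) weights future rewards.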

    Reinforcement learning differs from standard supervised learning in that correct input/output pairs are never presented, nor are sub-optimal actions explicitly corrected. Instead the focus is on performance, which involves finding a balance between exploration (of uncharted territory) and exploitation (of current knowledge). The exploration vs. exploitation trade-off has been most thoroughly studied through the multi-armed bandit problem and in finite MDPs.
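
    To make the trade-off concrete, here is a minimal ε-greedy sketch for a Bernoulli multi-armed bandit. The arm probabilities, ε value, and step count are illustrative assumptions chosen for the example, not part of the course text.

    import random

    def epsilon_greedy_bandit(probs, epsilon=0.1, steps=1000):
        """Play a Bernoulli bandit with epsilon-greedy action selection.

        probs: hypothetical hidden success probability of each arm."""
        n = len(probs)
        counts = [0] * n        # pulls per arm
        values = [0.0] * n      # running mean reward per arm
        total = 0.0
        for _ in range(steps):
            if random.random() < epsilon:                      # explore: random arm
                a = random.randrange(n)
            else:                                              # exploit: best arm so far
                a = max(range(n), key=lambda i: values[i])
            r = 1.0 if random.random() < probs[a] else 0.0
            counts[a] += 1
            values[a] += (r - values[a]) / counts[a]           # incremental mean update
            total += r
        return values, total

    # Usage: three arms; the agent should converge on the 0.8 arm.
    estimates, cumulative_reward = epsilon_greedy_bandit([0.2, 0.5, 0.8])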

    The main difference between the classical techniques and reinforcement learning algorithms is that the latter do not need an explicit model of the MDP (Markov decision process), and they can target large MDPs where exact methods become infeasible.


    【Trainees】

    Programmers engaged in AI/ANN reinforcement-learning applications, together with the managers of the relevant business functions.

    Trainees must have a solid foundation in modern higher mathematics.

    【Timing】 6 class hours (6 class hrs/day)


    【General Content】

    PART 1  Necessary & Essential AI Knowledge

    PART 2  A Smart Robot in a Room: An Example

    PART 3  Defining a Markov Decision Process

    PART 4  Monte Carlo methods

    PART 5  Implementing & Strengthening RL: Q-learning


    【Detailed Content】


    PART 1  Necessary & Essential AI Knowledge

    1.1 Supervised learning

         classification, regression

    1.2 Unsupervised learning

         clustering, dimensionality reduction

    1.3 Reinforcement learning

         generalization of supervised learning

         learn from interaction w/ environment to achieve a goal


    PART 2  A Smart Robot in a Room: An Example

    What’s the strategy to achieve max reward?

    What if the actions were deterministic?

    No teacher tells the agent "good" or "bad"

    Explore the environment and learn from the experience (see the sketch below)
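
    A minimal grid-world sketch of this "robot in a room" setting. The room layout, reward values, and action noise below are illustrative assumptions chosen for the example, not specified in the course text.

    import random

    # 3x4 room: 'G' = goal (+1), 'P' = pit (-1), '#' = wall; each step costs -0.04
    GRID = ["...G",
            ".#.P",
            "...."]
    ACTIONS = {'U': (-1, 0), 'D': (1, 0), 'L': (0, -1), 'R': (0, 1)}

    def step(state, action, noise=0.2):
        """Move the robot; with probability `noise` a random action happens instead,
        so the effect of an action is stochastic rather than deterministic."""
        if random.random() < noise:
            action = random.choice(list(ACTIONS))
        r, c = state
        dr, dc = ACTIONS[action]
        nr, nc = r + dr, c + dc
        # bumping into a wall or the room boundary leaves the robot in place
        if not (0 <= nr < len(GRID) and 0 <= nc < len(GRID[0])) or GRID[nr][nc] == '#':
            nr, nc = r, c
        cell = GRID[nr][nc]
        reward = {'G': 1.0, 'P': -1.0}.get(cell, -0.04)
        done = cell in 'GP'
        return (nr, nc), reward, done

    No teacher labels any move: the agent only observes these rewards and must discover a strategy from them.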


    PART 3  Defining a Markov Decision Process

    3.1 Solving an MDP using dynamic programming

    states, actions and rewards

    solution and policy

    Markov Decision Process (MDP)

    maximize cumulative reward in the long run

    Computing return from rewards (see the formula below)
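
    The return referred to above is the standard discounted sum of future rewards (a textbook definition, stated here for completeness):

    G_t = r_{t+1} + \gamma r_{t+2} + \gamma^2 r_{t+3} + \cdots = \sum_{k=0}^{\infty} \gamma^k r_{t+k+1}, \qquad 0 \le \gamma < 1

    The discount factor γ keeps the cumulative reward finite and weights near-term rewards more heavily.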

    3.2 Value functions

    Optimal value functions

    Policy evaluation/improvement

    Policy/Value iteration (a code sketch follows this list)
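
    A minimal value-iteration sketch for a generic finite MDP. The transition-table format (a dict mapping (state, action) to a list of (probability, next_state, reward) triples) and the tolerance are illustrative assumptions, not an interface defined in the course text.

    def value_iteration(states, actions, P, gamma=0.9, tol=1e-6):
        """Compute V* and a greedy policy; P[(s, a)] -> [(prob, next_state, reward), ...]."""
        V = {s: 0.0 for s in states}
        while True:
            delta = 0.0
            for s in states:
                # Bellman optimality backup over all actions
                best = max(sum(p * (r + gamma * V[s2]) for p, s2, r in P[(s, a)])
                           for a in actions)
                delta = max(delta, abs(best - V[s]))
                V[s] = best
            if delta < tol:
                break
        # extract the greedy policy from the converged value function
        pi = {s: max(actions,
                     key=lambda a: sum(p * (r + gamma * V[s2])
                                       for p, s2, r in P[(s, a)]))
              for s in states}
        return V, pi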


    PART 4  Monte Carlo methods

    4.1 Monte Carlo methods

    don’t need full knowledge of environment

    averaging sample returns

    4.2 Monte Carlo policy evaluation

    want to estimate V^π(s), the value of state s under the policy π being followed

    first-visit MC (see the sketch after this list)

    4.3 Monte Carlo control

    4.4 Maintaining exploration

    4.5 Simulated experience

    4.6 Summary of Monte Carlo
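
    A first-visit Monte Carlo policy-evaluation sketch matching 4.2 above. The episode format (a list of (state, reward) pairs produced by following the policy π being evaluated) is an illustrative assumption.

    from collections import defaultdict

    def first_visit_mc(episodes, gamma=0.9):
        """Estimate V^pi(s) by averaging returns from each state's first visit."""
        returns = defaultdict(list)
        for episode in episodes:
            G = 0.0
            first_return = {}
            # walk the episode backwards, accumulating the discounted return;
            # the final overwrite for a state corresponds to its earliest (first) visit
            for t in reversed(range(len(episode))):
                s, r = episode[t]
                G = r + gamma * G
                first_return[s] = G
            for s, g in first_return.items():
                returns[s].append(g)
        return {s: sum(g) / len(g) for s, g in returns.items()}

    No model of the environment is needed: the estimate comes purely from averaging sample returns, which is exactly the point of 4.1.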


    PART 5  Implementing & Strengthening RL: Q-learning

    5.1 Off-policy learning

    5.2 State representation

    5.3 Function approximation

    5.4 Features

    5.5 Splitting and aggregation

    5.6 Designing rewards

    5.7 Case study: Backgammon
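
    To make PART 5 concrete, a tabular Q-learning sketch. The environment interface (env.reset() -> state, env.step(a) -> (next_state, reward, done)), the learning rate α, and ε are illustrative assumptions, not an interface defined in the course text.

    import random
    from collections import defaultdict

    def q_learning(env, actions, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1):
        """Tabular Q-learning (off-policy TD control)."""
        Q = defaultdict(float)                       # Q[(state, action)]
        for _ in range(episodes):
            s = env.reset()
            done = False
            while not done:
                if random.random() < epsilon:        # epsilon-greedy behaviour policy
                    a = random.choice(actions)
                else:
                    a = max(actions, key=lambda x: Q[(s, x)])
                s2, r, done = env.step(a)
                # off-policy target: the greedy value of the next state,
                # regardless of which action the behaviour policy takes next
                target = r + gamma * max(Q[(s2, x)] for x in actions) * (not done)
                Q[(s, a)] += alpha * (target - Q[(s, a)])
                s = s2
        return Q

    The update Q(s,a) ← Q(s,a) + α[r + γ max_a' Q(s',a') − Q(s,a)] is what makes the method off-policy (5.1): the learned target follows the greedy policy even while the agent explores.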
