Reinforcement Learning - The Introduction !!

Introduction !

Let's look at a simple example. You want a PlayStation and you ask your parents to buy it for you. In reply, they ask you to score good marks in your upcoming tests, and then they will buy it for you. So you work hard to score good marks and get the PlayStation. Getting a reward for a task improves the student's performance through motivation alone. This is the simple concept behind reinforcement learning: when learning is reinforced, the outcome is more desirable. We use this concept in machine learning to train programs that improve their behavior by chasing rewards, rather than by being shown the correct answers as in supervised learning.

Thanks to reinforcement learning and related machine learning techniques, we can see self-driving cars on the road (Tesla), programs beating champions at their own game (IBM's Watson beating Jeopardy! champions) and real robots working in factories (Baxter).


UNDERSTANDING – Reinforcement Learning !!

In layman's terms, reinforcement learning can be described as learning which steps to take by mapping different situations to actions, all in an attempt to receive a reward.

Perhaps a simpler example is that of a dog. Imagine that you have a dog and you have to potty train it. When the dog poops in your house you reduce its doggy treats (punishment), and whenever it poops in the dog house you increase its doggy treats (reward). Eventually, the dog will learn that making a mess in the right place earns it tasty snacks and making a mess in the wrong place costs it snacks, so it will start choosing the actions that maximize its reward.


In the example above there are three important pieces in terms of reinforcement learning :-

  • The Agent (Dog) :- The agent is the one who observes and then takes actions accordingly to maximize its rewards.
  • The Environment (House) :- The environment is the whole setting that the agent observes and acts in.
  • The Reward (Doggy treat) :- The reward is what motivates the agent to learn the best set of actions.
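The three pieces above can be sketched in a few lines of Python. This is only a toy illustration of the dog example, not a standard algorithm from any library: the class names `House` and `Dog`, the action names and the treat values of +1 and -1 are all made up for this post.

```python
import random

class House:
    """Environment: decides how each action of the dog is rewarded."""
    def step(self, action):
        # Assumed rewards: +1 treat for using the dog house, -1 for the living room.
        return 1 if action == "dog_house" else -1

class Dog:
    """Agent: keeps a running treat total per action and prefers the best one."""
    def __init__(self, actions, explore=0.1, seed=0):
        self.totals = {a: 0.0 for a in actions}
        self.explore = explore              # chance of trying a random action
        self.rng = random.Random(seed)

    def act(self):
        if self.rng.random() < self.explore:
            return self.rng.choice(list(self.totals))
        return max(self.totals, key=self.totals.get)

    def learn(self, action, reward):
        self.totals[action] += reward

def train(episodes=200):
    house = House()
    dog = Dog(["living_room", "dog_house"])
    for _ in range(episodes):
        action = dog.act()              # the agent acts
        reward = house.step(action)     # the environment responds with a reward
        dog.learn(action, reward)       # the agent adjusts its behavior
    return dog

dog = train()
print(dog.totals)
```

After training, the total for "dog_house" should be positive and the total for "living_room" negative, so the dog mostly picks the dog house.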

The Elements of Reinforcement learning !!

Beyond the agent and the environment, there are four main elements of a reinforcement learning system: a policy, a reward signal, a value function and, optionally, a model of the environment.

  • A Policy :- The policy represents the way the agent behaves at a given time. To be more precise, a policy is a mapping from the states of the environment to the actions to be taken when the agent is in those states. The policy is the backbone of a reinforcement learning agent, as it alone is enough to determine the agent's behavior. Policies are often stochastic and can be as simple as a function or represent a complex process.
  • A Reward Signal :- This represents the goal of the problem. With each step, the agent is given a reward by the environment. The main objective of the agent is to maximize the total reward it receives in the long run. The reward signal is what defines good and bad events. To better understand this, think of the reward system as pleasure and disappointment. When we are facing a problem, the outcome of the solution is either pleasant or disappointing.
  • A Value Function :- If the reward signal determines what is good at the moment, the value function indicates what is good in the long run. To be more precise, the value of a state is the total reward that the agent can expect to accumulate in the future, starting from that state. Values, however, come secondary, as there cannot be any values where there are no rewards. That does not mean that a state which always gives a low reward must have a low value. The state can be followed by other states that give high rewards, which then yields a high value as well. Obviously, this can also go the other way around.
  • A Model of the Environment :- This is what mimics the environment's behavior. Models are used for planning, since they can predict the next state and reward for a given state and action. They help us decide on a course of action by taking into consideration the possible outcomes before we actually experience them. Methods that use a model for this kind of high-level, deliberative planning are called model-based, while methods that learn purely by low-level trial and error, without a model, are called model-free.
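To make the difference between reward and value concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the four-state world, the state and action names, the reward numbers and the discount factor `gamma` (a common way of weighting future rewards less than immediate ones, which the text above does not cover).

```python
# Hypothetical model of the environment: (state, action) -> (next_state, reward).
MODEL = {
    ("start", "left"): ("dead_end", 1.0),     # small reward now, nothing afterwards
    ("start", "right"): ("hallway", 0.0),     # no reward now ...
    ("hallway", "forward"): ("goal", 10.0),   # ... but a large reward one step later
}
TERMINAL = {"dead_end", "goal"}

# A deterministic policy: a mapping from states to actions.
POLICY = {"start": "right", "hallway": "forward"}

def state_value(state, policy, gamma=0.9):
    """Discounted sum of rewards collected by following `policy` from `state`."""
    total, discount = 0.0, 1.0
    while state not in TERMINAL:
        state, reward = MODEL[(state, policy[state])]
        total += discount * reward
        discount *= gamma
    return total

print(state_value("start", {"start": "left"}))  # value of chasing the immediate reward
print(state_value("start", POLICY))             # value of the policy that waits
```

Going right gives a reward of 0 on the first step, yet the value of "start" under that policy is higher than under the policy that grabs the immediate reward of 1. This is the "low reward but high value" case from the value function bullet.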

Reinforcement learning vs supervised and unsupervised learning !!

In both supervised learning and reinforcement learning, the objective of the machine is to map inputs to the most appropriate outputs. In supervised learning, however, training data is available in the form of already known pairs of inputs and outputs. The machine is given a teacher (the training data) from which to learn the mapping between inputs and outputs.

In reinforcement learning, the machine learns to map inputs to outputs without any teacher (labeled training data). Instead, it learns which steps maximize its rewards by trial and error (“hit and trial”).


Simply put, in supervised learning the machine learns from a teacher, while in reinforcement learning the machine learns from experience.

On the other hand, both unsupervised learning and reinforcement learning learn without a supervisor or a teacher. The difference is that unsupervised learning strives to find the structure hidden within the data, whereas reinforcement learning is all about maximizing rewards.
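The contrast between learning from a teacher and learning from experience can be sketched on one tiny task. This is a toy under stated assumptions: the traffic-light situations, the rewards of +1 and -1 and the function names are invented here, and the reinforcement learner uses a simple running-average rule with occasional random exploration.

```python
import random

SITUATIONS = ["red_light", "green_light"]
ACTIONS = ["stop", "go"]
# The right answers. The supervised learner is shown them directly;
# the reinforcement learner only feels them through the reward.
CORRECT = {"red_light": "stop", "green_light": "go"}

def learn_supervised(labeled_pairs):
    """Supervised: the teacher hands over the correct output for each input."""
    return dict(labeled_pairs)

def learn_by_reinforcement(trials=500, explore=0.2, seed=0):
    """Reinforcement: the learner tries actions and only sees the reward."""
    rng = random.Random(seed)
    value = {(s, a): 0.0 for s in SITUATIONS for a in ACTIONS}
    count = {key: 0 for key in value}
    for _ in range(trials):
        s = rng.choice(SITUATIONS)
        if rng.random() < explore:
            a = rng.choice(ACTIONS)                            # try something at random
        else:
            a = max(ACTIONS, key=lambda act: value[(s, act)])  # use what was learned
        reward = 1.0 if CORRECT[s] == a else -1.0
        count[(s, a)] += 1
        value[(s, a)] += (reward - value[(s, a)]) / count[(s, a)]  # running mean
    return {s: max(ACTIONS, key=lambda act: value[(s, act)]) for s in SITUATIONS}

print(learn_supervised(CORRECT.items()))
print(learn_by_reinforcement())
```

Both learners end up with the same mapping, but the second one never sees a labeled pair. It has to discover the mapping from rewards alone.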

Applications of Reinforcement learning !!

Some of the practical applications of reinforcement learning :-

1. Manufacturing

At Fanuc, a robot uses deep reinforcement learning to pick a device from one box and put it in a container. Whether it succeeds or fails, it memorizes the object, gains knowledge and trains itself to do the job with great speed and precision.

Many warehousing facilities used by eCommerce sites and supermarkets use such intelligent robots to sort their millions of products every day and help deliver the right products to the right people. Tesla's factory has more than 160 robots that do a major part of the work on its cars to reduce the risk of defects.


2. Inventory Management

A major issue in supply chain inventory management is the coordination of the inventory policies adopted by different supply chain actors, such as suppliers, manufacturers and distributors, so as to smooth material flow and minimize costs while responsively meeting customer demand.

Reinforcement learning algorithms can be built to reduce the transit time for stocking and retrieving products in the warehouse, which optimizes space utilization and warehouse operations.


3. Delivery Management

Reinforcement learning is used to solve the Split Delivery Vehicle Routing problem. Q-learning is used to serve the appropriate customers with just one vehicle.
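To give a feel for how Q-learning works, here is a minimal tabular sketch. It does not solve the Split Delivery Vehicle Routing problem. It uses a made-up toy instead: one vehicle on a line of five stops that must reach a customer at the last stop, with an assumed cost of 1 per move and an assumed reward of 10 on arrival. The learning rate `alpha`, discount `gamma` and exploration rate `epsilon` are illustrative values.

```python
import random

N_STOPS = 5          # stops 0..4 on a line; the customer is at the last stop
ACTIONS = (-1, +1)   # drive one stop back or one stop forward

def step(state, action):
    """Hypothetical environment: move the vehicle and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STOPS - 1)
    done = next_state == N_STOPS - 1
    reward = 10.0 if done else -1.0   # every extra move costs fuel
    return next_state, reward, done

def q_learning(episodes=300, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STOPS) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)                       # explore
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)]) # exploit
            next_state, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            # Q-learning update: move the estimate toward reward + discounted best next value.
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q

q = q_learning()
route = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STOPS - 1)]
print(route)
```

The printed route is the preferred action at each stop. In this toy it should settle on driving forward at every stop, since that is the shortest way to the customer.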


4. Power Systems

Reinforcement Learning and optimization techniques are used to assess the security of electric power systems and to enhance Microgrid performance. Adaptive learning methods are employed to develop control and protection schemes. Transmission technologies with High-Voltage Direct Current (HVDC) and Flexible Alternating Current Transmission System (FACTS) devices based on adaptive learning techniques can effectively help to reduce transmission losses and CO2 emissions.

Applications of Reinforcement Learning are highlighted for three research problems in power systems.

First, Reinforcement Learning is used to develop a distributed control structure for a set of distributed generation sources. The exchange of information between these sources is governed by a communication graph topology.

Second, an online adaptive learning technique is used to control the voltage level of an autonomous Microgrid. The control strategy is robust against any disturbances in the states and load. Only partial knowledge about the Microgrid’s dynamics is required.

Finally, Q-Learning with eligibility traces is adopted to solve the non-convex Economic Dispatch problem of power systems with valve point loading effects, multiple fuel options, and power transmission losses. The eligibility traces are used to speed up the Q-Learning process.
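Why do eligibility traces speed up Q-Learning? The sketch below tries to show it on a toy problem, not on Economic Dispatch. It follows the general shape of Watkins's Q(λ) with replacing traces, on an assumed six-state chain where the only reward is at the far end. The parameter `lam` is the trace decay λ, and all names and numbers are chosen for illustration.

```python
import random

N = 6                 # states 0..5 on a chain; state 5 is terminal
ACTIONS = (-1, +1)

def step(s, a):
    """Hypothetical environment: the only reward (1.0) is given on reaching the last state."""
    s2 = min(max(s + a, 0), N - 1)
    done = s2 == N - 1
    return s2, (1.0 if done else 0.0), done

def greedy(q, s, rng):
    best = max(q[(s, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if q[(s, a)] == best])  # break ties at random

def q_lambda(episodes, lam, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}
    for _ in range(episodes):
        trace = dict.fromkeys(q, 0.0)
        s, done = 0, False
        while not done:
            a = rng.choice(ACTIONS) if rng.random() < epsilon else greedy(q, s, rng)
            if q[(s, a)] < max(q[(s, b)] for b in ACTIONS):
                trace = dict.fromkeys(q, 0.0)   # non-greedy action: cut all traces
            s2, r, done = step(s, a)
            best_next = 0.0 if done else max(q[(s2, b)] for b in ACTIONS)
            delta = r + gamma * best_next - q[(s, a)]
            trace[(s, a)] = 1.0                 # replacing trace for the pair just visited
            for key in q:
                q[key] += alpha * delta * trace[key]   # credit every recently visited pair
                trace[key] *= gamma * lam              # older visits get less credit
            s = s2
    return q

for lam in (0.0, 0.9):
    learned = sum(1 for v in q_lambda(episodes=1, lam=lam).values() if v != 0.0)
    print(f"lambda={lam}: {learned} state-action values updated after one episode")
```

With `lam=0.0` this reduces to plain one-step Q-Learning, so after the first episode only the last step before the goal has learned anything. With `lam=0.9` the same single reward is passed back along the whole path taken, which is where the speed-up comes from.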


5. Finance Sector

Pit.AI is at the forefront of leveraging reinforcement learning for evaluating trading strategies. Reinforcement learning is turning out to be a robust tool for training systems to optimize financial objectives. It has many applications in stock market trading, where a Q-Learning algorithm can learn a trading strategy from one simple instruction: maximize the value of our portfolio.

This way anyone who is able to get their hands on a Q-Learning algorithm could potentially earn income without worrying about the market price or the risks involved, since the algorithm is meant to take all of these into consideration while making a trade.


6. DeepMind

DeepMind is Google's artificial intelligence company. It used deep reinforcement learning to make a machine learn to play many Atari games: the frames of the game screen are what the agent observes from the environment, the keys to press are the actions, and the game score is the reward.


7. Self driving cars

Self-driving cars, such as Tesla's, can use reinforcement learning to maximize safety and minimize risk. Collisions reduce the reward and safe driving increases it.
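A reward like this could be written down as a small function. The sketch below is purely hypothetical: the function name, its inputs and every number in it are assumptions made for this post, not anything used by a real self-driving system.

```python
def driving_reward(collided, distance_m, speed_over_limit_kmh=0.0):
    """Hypothetical per-step reward for a simulated car."""
    if collided:
        return -100.0                                   # a collision outweighs everything else
    progress = 0.1 * distance_m                         # safe progress is rewarded
    speeding = 0.5 * max(speed_over_limit_kmh, 0.0)     # risky driving is penalized
    return progress - speeding

# Total reward of a short made-up trip: two safe steps, then a crash.
trip = [driving_reward(False, 10.0), driving_reward(False, 10.0, 4.0), driving_reward(True, 0.0)]
print(sum(trip))
```

An agent trained to maximize the sum of such rewards is pushed toward trips that make progress without collisions or speeding.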



NEXT –>  A simulated environment and its implementation in Python code.



