
Building a simple chatbot

Chatbot Statistics

As per https://startupbonsai.com/chatbot-statistics/, Gartner predicted that by 2022, 70% of white-collar workers will interact with some form of chatbot.

Image taken from https://startupbonsai.com/wp-content/uploads/2021/04/2-Chatbot-Statistic-02.png

Customers like chatbots because they give quick responses. Personally, though, I just like the cool technology.

Image taken from https://startupbonsai.com/wp-content/uploads/2021/04/9-Chatbot-Statistic-13.png

Chatbot uses in Customer Support

Chatbots are used in customer support for the following reasons:

  • First-line customer support: deciding which representative a customer call should be directed to.
  • Collecting customer feedback.
  • Helping with order confirmation and shipment tracking.

What is RASA NLU?

Rasa NLU (Natural Language Understanding) is an open source natural language processing tool to convert messages from users into intents and entities that chatbots understand.

What is an Intent?

Rasa uses the concept of intents to describe how user messages should be categorised. Rasa NLU will classify a user message into one or more intents.

What is an Entity?

Entities are structured pieces of information inside a user message. When deciding which entities you need to extract, think about what information your assistant needs for its user goals.

Intent Entity example

Let’s look at an example of “I want to buy air tickets from Bangalore to Delhi”

We can understand a few basic things:

  • The person is interested in buying air tickets. (intent)
  • Source city is Bangalore. (entity)
  • Destination city is Delhi. (entity)

As you can see, buying air tickets is the intent, and the source and destination cities are the entities. They are the basic building blocks for most queries.

Some Theory

I have collected a few RASA NLU design pictures to explain how it works; as they say, “A picture is worth a thousand words”. The images given below are from https://rasa.com/blog/intents-entities-understanding-the-rasa-nlu-pipeline/ and https://rasa.com/blog/bending-the-ml-pipeline-in-rasa-3-0/

RASA Pipeline for 2.x versions

RASA NLU Core Pipeline is as shown below

Image is taken from https://rasa.com/blog/intents-entities-understanding-the-rasa-nlu-pipeline/

RASA 3.0 — NLU and Core Pipelines

The design has changed in 3.0; this is the RASA 3.0 pipeline with the DIET classifier.

Image taken from https://rasa.com/blog/bending-the-ml-pipeline-in-rasa-3-0/

Tokenization and Lemmatization

Image is from https://rasa.com/blog/intents-entities-understanding-the-rasa-nlu-pipeline/

Features

Image is from https://rasa.com/blog/intents-entities-understanding-the-rasa-nlu-pipeline/

How a word is converted into features

Intent Classifiers

Image from https://rasa.com/blog/intents-entities-understanding-the-rasa-nlu-pipeline/

How DIET classifier extracts intents and entities

What is a Story?

Stories are used to teach Rasa real conversation designs to learn from, providing the basis for scalable machine-learning dialogue management.

Creating A Simple Chatbot

Let us create a bot. For simplicity we will just ask a question and let the bot reply “Found intent <intent name>”.

Training questions from the Quora Kaggle dataset

Let’s create a bot that replies to the following three questions

  • What are the differences between clients and servers?
  • What is the difference between a server and a database?
  • How can I become a data scientist?

I have taken these questions from https://www.kaggle.com/c/quora-question-pairs

Initial Steps

Initialize chatbot as shown below in a fresh directory

$ pip3 install rasa

$ rasa init -v --init-dir /tmp/mybot

Manually modify domain.yml, data/stories.yml, data/rules.yml and data/nlu.yml

In file data/nlu.yml : Add Intents

Add each question under one intent. We can also add multiple questions to one intent if they have the same meaning but are structured in different ways.

Add 3 intents with questions
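
A minimal sketch of what data/nlu.yml could look like for these three questions (intent names are illustrative; adjust the version field to your Rasa release):

# data/nlu.yml -- minimal sketch
version: "3.1"
nlu:
- intent: ask_client_server_difference
  examples: |
    - What are the differences between clients and servers?
- intent: ask_server_database_difference
  examples: |
    - What is the difference between a server and a database?
- intent: ask_become_data_scientist
  examples: |
    - How can I become a data scientist?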

In file data/stories.yml add stories

In this example we use single question-and-answer stories. We can also add multiple sequential questions and answers in one story. Add intents and actions in each story as shown below.

Add intents and their actions in each story in stories.yml
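
A minimal sketch of one such story in data/stories.yml (story, intent and action names are illustrative and must match the intents and responses used elsewhere):

# data/stories.yml -- minimal sketch
version: "3.1"
stories:
- story: client server question path
  steps:
  - intent: ask_client_server_difference
  - action: utter_ask_client_server_difference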

In file domain.yml : add intents and responses

Add intents and responses in domain.yml as shown below

Add intents and responses
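
A minimal sketch of domain.yml with the intents and their “Found intent <intent name>” responses (names are illustrative):

# domain.yml -- minimal sketch
version: "3.1"
intents:
  - ask_client_server_difference
  - ask_server_database_difference
  - ask_become_data_scientist
responses:
  utter_ask_client_server_difference:
  - text: "Found intent ask_client_server_difference"
  utter_ask_server_database_difference:
  - text: "Found intent ask_server_database_difference"
  utter_ask_become_data_scientist:
  - text: "Found intent ask_become_data_scientist"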

Final Steps

Validate the files we have modified above. Train a model and start interacting with the rasa shell.

Validate yaml files and train the model and start rasa shell
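
These steps correspond to the following commands:

$ rasa data validate
$ rasa train
$ rasa shell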

Test Using Full Sentences

When I type the sentences, it is able to identify the intent as expected.

The pink lines are the words I typed and the purple lines are the bot’s replies.

When we give full sentences, it is able to understand the intents.

Test using just words

When I give individual words instead of full sentences, it is still able to identify the intent.

When we just give words also it is able to understand the intents

All the files which I used in the demo are available in my github repository here.

References


Solving Multi Knapsack problem using Linear Programming

Problem Statement

This is a multi-knapsack problem. We have to execute a number of tasks on a fixed number of hosts. The duration of each task is fixed, and so is the number of hosts on which we want to distribute the tasks (let’s say 3 in this case). Individual tasks cannot be broken into smaller pieces. We need to distribute the tasks across these 3 hosts so that the total time to complete all tasks is minimum.

Tasks and duration it takes to finish

One possible solution is shown below.

Expected output. Total duration 9

Hence total time taken is 9.

Solution

This problem can be solved by Dynamic Programming but let us try solving it using linear programming.

Linear Programming Solution

I have used Pulp https://coin-or.github.io/pulp/ to solve the problem using python. Here is the link to my full code.

First come up with Linear Programming Equations as shown below. 

Ideal expected time for all tasks to be executed on each host is (total_time_to_execute_tasks_sequential/number_of_hosts). The number_of_hosts is 3 in our case.

Objective

The aim is to minimize the difference between the time the tasks take to run on each host and the ideal time, i.e. to keep the running time on each host close to the ideal average time.

Objective

Here are the additional constraints:

(Variables starting with if_ are booleans and can take the value 0 or 1.)

Constraints: no host should be idle, and each task can run on at most one host.
Time taken to run tasks on each host minus ideal time
Calculate absolute difference of deviation from ideal time. Should be minimum as per the objective.
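
A minimal PuLP sketch of this formulation is given below. The task durations are illustrative placeholders (the real data is in the linked notebook); the boolean if_ variables, the “each task on exactly one host” and “no idle host” constraints, and the absolute deviation from the ideal time follow the equations described above.

from pulp import LpProblem, LpMinimize, LpVariable, lpSum, LpBinary, PULP_CBC_CMD

# Illustrative task durations; the actual values are in the linked notebook
durations = {"t1": 4, "t2": 3, "t3": 2, "t4": 5, "t5": 6, "t6": 7}
hosts = ["h1", "h2", "h3"]
ideal = sum(durations.values()) / len(hosts)  # total_time_to_execute_tasks_sequential / number_of_hosts

prob = LpProblem("multi_knapsack", LpMinimize)

# if_<task>_on_<host> is 1 when the task is assigned to that host
assign = {(t, h): LpVariable(f"if_{t}_on_{h}", cat=LpBinary)
          for t in durations for h in hosts}
# absolute deviation of each host's total running time from the ideal time
dev = {h: LpVariable(f"dev_{h}", lowBound=0) for h in hosts}

# Objective: keep each host's running time as close to the ideal time as possible
prob += lpSum(dev[h] for h in hosts)

for t in durations:
    # each task must run on exactly one host
    prob += lpSum(assign[(t, h)] for h in hosts) == 1

for h in hosts:
    load = lpSum(durations[t] * assign[(t, h)] for t in durations)
    # a host should not be left idle
    prob += load >= 1
    # linearise the absolute value |load - ideal|
    prob += dev[h] >= load - ideal
    prob += dev[h] >= ideal - load

prob.solve(PULP_CBC_CMD(msg=False))
for (t, h), var in assign.items():
    if var.value() == 1:
        print(t, "->", h)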

When I run my full code (linked above), I get this output:

The output shows which task should be run on which host. It matches our expected outcome of total duration 9.

Conclusion

This problem can also be solved with dynamic programming, but with linear programming we can handle cases where one or more constraints cannot be satisfied: the solver will still try to satisfy the remaining constraints.

(This post is also available in medium @ https://meenavyas.medium.com/solving-multi-knapsack-problem-using-linear-programming-83b0ac4d53da)


Semantic Web and its role in Data Science

Semantic Web technology has a lot of potential to make web data quickly understandable by machines. With original data and a semantic triplet database, one can convert data into useful knowledge. NLP and Semantic Web technologies, if combined, can provide the capability to process a mix of structured data and large quantities of unstructured data.

What is Semantic Web?

The Semantic Web is an extension of the World Wide Web through standards set by the World Wide Web Consortium (W3C). Its goal is to make internet data machine-readable. Metadata added to Web pages can make the existing World Wide Web machine readable.

Schema.org is a collaborative, community activity with a mission to create, maintain, and promote schemas for structured data on the Internet, on web pages, in email messages, and beyond. Over 10 million sites use Schema.org to mark up their web pages and email messages.

OWL: The W3C Web Ontology Language (OWL) is a Semantic Web language designed to represent rich and complex knowledge about things, groups of things, and relations between things.

Ontology: Ontology encompasses a representation, formal naming and definition of the categories, properties and relations between the concepts, data and entities that substantiate one, many or all domains of discourse. Ontology is a way of showing the properties of a subject area and how they are related, by defining a set of concepts and categories that represent the subject. For example, ontology can describe concepts, relationships between entities, and categories of things.

Triple: A triple is a subject, predicate and object (SPO). For example, in Bob plays Guitar, Subject is Bob, Predicate is plays and Object is Guitar. This is the same as an edge in a graph.
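
As a small illustration, here is the “Bob plays Guitar” triple expressed with the rdflib Python library (rdflib and the example namespace are my choices here, not something the post prescribes):

from rdflib import Graph, Namespace

EX = Namespace("http://example.org/")  # illustrative namespace
g = Graph()
g.add((EX.Bob, EX.plays, EX.Guitar))   # (subject, predicate, object)

for s, p, o in g:
    print(s, p, o)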


RDF: The Resource Description Framework (RDF) is a standard model for data interchange and expressing graph data for the World Wide Web. RDF extends the linking structure of the Web to use URIs to name the relationship between things as well as the two ends of the link (“triple”). Using this simple model, it allows structured and semi-structured data to be mixed, exposed, and shared across different applications.

Graph resulting from the RDFa is shown below

Image is from https://en.wikipedia.org/wiki/Semantic_Web#/media/File:RDF_example.svg

RDF Schema (RDFS): RDF Schema provides a data-modelling vocabulary for RDF data. RDF Schema is an extension of the basic RDF vocabulary.

RDF Triple store: The RDF triple store is a type of graph database that stores semantic facts.

Semantic Reasoner: A semantic reasoner, reasoning engine, rules engine, or simply a reasoner, is a piece of software able to infer logical consequences from a set of asserted facts or axioms. The notion of a semantic reasoner generalizes that of an inference engine, by providing a richer set of mechanisms to work with. The inference rules are commonly specified by means of an ontology language, and often a description logic language.

How is an RDF Triple Store Database more suited than a relational database for storing Semantic Web Data?

In RDF triple store, all entities will be designed as a triple of Subject Property Value or Subject Predicate Object (SPO)

RDF triple stores focus solely on storing rows of RDF triples.


If we model the same subject-property data in a relational database, we will have a lot of tables with just two columns each, or alternatively one table with 150 super-sparse columns and only a few entries filled in per row. When we ingest from multiple sources with many changing entity types, in SQL we end up with either one very wide, very sparsely populated table or too many tables, and with dynamic DDL/schemas. That is why triple stores are better suited for storing and querying this kind of data. Apache Rya is a cloud-based RDF triple store that supports SPARQL queries.

How is a triple store different from a Graph Database?

Graph traversal seems easy but supernodes kill everything.

A “supernode” is a vertex with a disproportionately high number of incident edges. While supernodes are rare in natural graphs, they show up frequently during graph analysis. The reason is that supernodes are connected to so many other vertices that they exist on numerous paths in the graph, so an arbitrary traversal is likely to touch a supernode. In graph computing, supernodes can lead to system performance problems.

Property Graphs have not solved this problem.

Triple stores and W3C standard OWL are designed to be ‘internet scale’.

You can just have another relation in RDF. What you can’t do is things like increment a property – you can only set them in RDF. This makes some things difficult, but is why it scales better. In a property database, edges and nodes can have properties built in. In RDF you can just say Person has Age as another relation. The built-in properties and edges are the pain point for performance. If you want to update an edge in a graph, you have to find it, by id normally, then update it. If you want to update an edge in an RDF graph, you just throw a new one in there. Same for insertion, you have to have a way of assigning new ids, which works on one machine but not so well when you go distributed.

Linked Open Data : Linked Open Data defines a vision of globally accessible and linked data on the internet based on the RDF standards of the semantic web. RDF triple store databases are successfully used for managing Linked Open Data datasets, such as DBPedia and GeoNames, which are published as RDFs and are interconnected. Linked Open Data allows for querying and answering queries much faster and for obtaining highly relevant search results.

Thanks to Nigel Brown for insights.

Some Good Tutorials on Semantic Web

References


Differential Privacy in Machine Learning Algorithms

I was checking out Machine Learning with differential privacy in Tensor Flow at http://www.cleverhans.io/privacy/2019/03/26/machine-learning-with-differential-privacy-in-tensorflow.html

Differential Privacy is a framework for measuring the privacy guarantees provided by an algorithm. Through the lens of differential privacy, we can design machine learning algorithms that responsibly train models on private data. Learning with differential privacy provides provable guarantees of privacy, mitigating the risk of exposing sensitive training data in machine learning. 

(Image taken from tensor flow blog https://blog.tensorflow.org/2019/03/introducing-tensorflow-privacy-learning.html)

A model trained with differential privacy should not be affected by any single training example, or small set of training examples, in its data set. If a single training point does not affect the outcome of learning, the information contained in that training point cannot be memorized and the privacy of the individual who contributed this data point to our dataset is respected. 

I took simple CNN code to classify the MNIST database of handwritten digits, which uses GradientDescentOptimizer. I also used DPGradientDescentGaussianOptimizer with random noise, a slightly modified version of the code given in the TensorFlow examples, and compared the two.

Accuracy and Loss:

After 240 epochs:

  • Without Differential Privacy: training loss 2.3619, training accuracy 0.0993, validation loss 2.3579, validation accuracy 0.1032 (SimpleMnistWithoutDifferentialPrivacy.ipynb)
  • With Differential Privacy: training loss 2.3638, training accuracy 0.0974, validation loss 2.3629, validation accuracy 0.0982 (Classification_Privacy.ipynb)

Time:

There are 60,000 training images, each of size 28x28x1. Sample images are given below.

MNIST-DataSet-Sample

Time taken to train normal 240 epochs : 98 seconds

Time taken to train with Differential Policy 240 epochs : 106 seconds

Around 8% more time is taken to train the images with Differential Privacy.

Basic Idea

The basic idea of differentially private stochastic gradient descent (DP-SGD) is to modify the gradients used in stochastic gradient descent (SGD). Models trained with DP-SGD provide provable differential privacy guarantees for their input data. There are two modifications made to the vanilla SGD algorithm:

• First, the sensitivity of each gradient needs to be bounded. In other words, we need to limit how much each individual training point sampled in a mini-batch can influence gradient computations and the resulting updates applied to model parameters. This can be done by clipping each gradient computed on each training point.

• Second, random noise is sampled and added to the clipped gradients, making it statistically impossible to know whether or not a particular data point was included in the training dataset by comparing the updates SGD applies with or without that data point.
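
Below is a rough sketch of how the DP optimizer replaces the vanilla one, based on the TensorFlow Privacy tutorial mentioned above; the hyper-parameter values are illustrative and the exact import path may differ between tensorflow_privacy versions.

import tensorflow as tf
from tensorflow_privacy.privacy.optimizers.dp_optimizer import DPGradientDescentGaussianOptimizer

# Hyper-parameter values below are illustrative, not the ones used in the runs above
optimizer = DPGradientDescentGaussianOptimizer(
    l2_norm_clip=1.0,       # bounds the sensitivity of each per-example gradient
    noise_multiplier=1.1,   # scale of the Gaussian noise added to the clipped gradients
    num_microbatches=250,   # gradients are clipped and averaged per microbatch
    learning_rate=0.25)

# The loss must be computed per example (no reduction) so gradients can be clipped individually
loss = tf.keras.losses.CategoricalCrossentropy(
    from_logits=True, reduction=tf.losses.Reduction.NONE)

# model.compile(optimizer=optimizer, loss=loss, metrics=['accuracy'])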

 

BoltOn Privacy adds randomness to weights. Refer https://github.com/tensorflow/privacy/blob/master/tutorials/bolton_tutorial.py

There is a nice video about that in https://towardsdatascience.com/building-differentially-private-machine-learning-models-using-tensorflow-privacy-52068ff6a88e

References

Image Segmentation

 

In computer vision, image segmentation is the process of partitioning an image into multiple segments and associating every pixel in an input image with a class label.

Semantic segmentation algorithms are used in self-driving cars.

I got intrigued by this post by Lex Fridman on driving scene segmentation. I wanted to see if it works on difficult and different Indian terrain.

So I have created a short video of Tawang, in Arunachal Pradesh, India. The video is 16 seconds long and contains around 325 image frames.

Refer to the ipython notebook I have used. It is based on tutorial_driving_scene_segmentation.ipynb and downloads the ‘mobilenetv2_coco_cityscapes_trainfine’ model from TensorFlow.

Different models in Tensor Flow deeplab are given in the link https://github.com/tensorflow/models/blob/master/research/deeplab/g3doc/model_zoo.md

The MobileNet-v2 based model has been pre-trained on the MS-COCO dataset and does not employ ASPP and decoder modules, for fast computation. You can choose any pre-trained TensorFlow model that suits your need.

This model classifies pixels into the following classes

'road', 'sidewalk', 'building', 'wall', 'fence', 
'pole', 'traffic light', 'traffic sign', 'vegetation', 'terrain',
'sky', 'person', 'rider', 'car', 'truck',
'bus', 'train', 'motorcycle', 'bicycle'

Color code of the classes is shown below

For all the pictures, the same color coding is used for the same classes. I ran the model on two images first; here is the output.

Then I ran the model on the whole video. Here is the link to the output video. A rough sketch of the frame-by-frame loop is given below.
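
This sketch assumes a DeepLabModel wrapper (MODEL) like the one built in the referenced tutorial notebook; the video file name is illustrative.

import cv2
from PIL import Image

cap = cv2.VideoCapture("tawang.mp4")   # illustrative file name
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # OpenCV gives BGR frames; the model expects an RGB PIL image
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    # MODEL is the pretrained DeepLabModel instance built in the notebook (assumed here)
    resized_im, seg_map = MODEL.run(Image.fromarray(rgb))
    # ... colour seg_map with the class colour map and write it to the output video
cap.release()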

Observations on some of the images and their segmented outputs are given below.

  • It has detected the jeep properly (in deep blue) as a vehicle. It has detected the road as sidewalk (in light pink).
  • It has detected the person (in red) correctly. It was able to detect the road as road/sidewalk (in purple) correctly. However, it wrongly classified the edges of the image (in red) as person.
  • It has detected the person (in red) correctly. It was able to detect the road as road/sidewalk (in purple) correctly.
  • A lot of this image is detected as terrain (in light green)!
  • Some part of the mountain is detected incorrectly as sidewalk (deep purple).
  • It has detected the person (in red) correctly. It was able to detect the road as road/sidewalk (in purple) correctly. It is not able to classify the lake or the mountain.
  • It has detected the persons correctly (in red). The road is detected as road/sidewalk correctly (in pink/purple). It has misclassified the colorful flags on the sides of the image as person.

Thoughts

  • The model is trained on city images but the input I gave was of a mountainous region. We may get better results with city videos.
  • We are using a 2D segmentation model. A 3D segmentation model, which can also estimate depth, would give better results.
  • We run the model on each frame independently, without taking advantage of the correlation between consecutive frames.
  • The model wasn’t trained on mountains or lakes, as they are not among the training set’s output classes.
  • The input video was not taken from a car dashboard as in the training set. We may get better results if we do so.
  • If objects are far away, prediction accuracy may suffer.
  • Some roads are mud roads rather than cemented or tarred ones, and in these places there is no road/pavement distinction. The model was probably trained on proper roads, so we would likely get better results on such roads.

References

Anomaly Detection

What is Anomaly Detection

In data science, anomaly detection is the identification of rare items, events or observations which raise suspicions by differing significantly from the majority of the data.

For example, an isolated spike in a metric (shown in red in my plots) is an anomaly, but if the same spike occurs at frequent, regular intervals it is not an anomaly.


There are 3 types of Machine Learning Techniques

  • Supervised Machine learning
  • Unsupervised Machine Learning
  • Semi- supervised Machine learning

Refer https://machinelearningmastery.com/supervised-and-unsupervised-machine-learning-algorithms/ for more details.

Unsupervised anomaly detection techniques detect anomalies in an unlabeled test data set under the assumption that the majority of the instances in the data set are normal by looking for instances that seem to fit least to the remainder of the data set.

We need unsupervised anomaly detection when we don’t have labelled data, i.e. data labelled with when an anomaly occurred.

Different types of Anomaly detection techniques are described below.

A safe bet is to use wisdom of the crowds by using multiple ensemble methods. We can then choose to combine them through majority vote, or union or intersection of the individual algorithms’ verdicts.

Isolation Forest and LoF

  • Isolation Forest is a tree-based method, while LoF is a nearest-neighbour based method (a minimal scikit-learn sketch is given after this list).
  • sklearn has IsolationForest and LocalOutlierFactor (LoF).
  • If the data is too big, there is an implementation of LoF for Spark.
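
Here is a minimal scikit-learn sketch of both detectors on illustrative toy data (the data and thresholds are placeholders, not the ones from my notebook):

import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.RandomState(42)
X = np.vstack([rng.normal(0, 1, size=(200, 2)),    # normal points
               rng.uniform(-6, 6, size=(10, 2))])  # a few outliers

iso = IsolationForest(contamination=0.05, random_state=42).fit(X)
iso_labels = iso.predict(X)          # -1 = anomaly, 1 = normal

lof = LocalOutlierFactor(n_neighbors=20, contamination=0.05)
lof_labels = lof.fit_predict(X)      # -1 = anomaly, 1 = normal

print("IsolationForest anomalies:", np.sum(iso_labels == -1))
print("LOF anomalies:", np.sum(lof_labels == -1))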

‘K’ Nearest Neighbour

  • This is a Nearest Neighbour based approach
  • Simply computing z-scores of the distances to the ‘k’ nearest neighbours and using a cutoff of 3 works surprisingly well in practice (though it is limited to global anomalies only and can’t find local outliers).

One class SVM

  • Classification based approach
  • One-class Support Vector Machine (OCSVM), can be used as an unsupervised anomaly detection method.
  • However, to work well, the percentage of anomalies in the dataset needs to be low.

CBOF (Cohesiveness Based Outlier Factor)

It is a clustering based Anomaly detection.

Deep Learning LSTM/Auto encoders

  • A neural-network approach: RNNs, LSTMs (long short-term memory) and autoencoders
  • Available in Keras/Tensorflow and other libraries
  • Typically neural networks need a lot of data

There are more methods, such as probability-based multivariate Gaussian distribution, PCA and t-SNE.

Feel free to walk through my ipython notebook https://github.com/meenavyas/Misc/blob/master/AnomalyDetection.ipynb

In this notebook, I have tried IsolationForest and LoF. As you can see in the plots in the notebook, the points which got a high score from these algorithms are anomalies.

To run anomaly detection automatically on streaming data, we may need infrastructure like Apache Spark.

References

Face recognition – can we identify “Boy” from “Alien”?

The question is can we identify “Boy” from “Alien”?

Face recognition addresses the “who is this person?” question. This is a 1:K matching problem: we have a database of K faces, and we have to identify whose face is in the given input image.

FaceNet is a TensorFlow implementation of the face recognizer described in the paper “FaceNet: A Unified Embedding for Face Recognition and Clustering”.

FaceNet learns a neural network that encodes a face image into a vector of 128 numbers.  By comparing two such vectors, we can then determine if two pictures are of the same identity. FaceNet is trained by minimizing the triplet loss. For more information on triplet loss refer https://machinelearning.wtf/terms/triplet-loss/

Since training requires a lot of data and a lot of computation, I haven’t trained it from scratch here.

I have used a previously trained model. I have taken the Inception network model implementation and weights from the 4th deeplearning.ai course, “Convolutional Neural Networks”, on Coursera.

The network architecture follows the Inception model from Szegedy et al. (https://arxiv.org/abs/1409.4842).

More details about inception v1 is in this blog https://www.analyticsvidhya.com/blog/2018/10/understanding-inception-network-from-scratch/

This network uses 96×96 dimensional RGB images as its input. It encodes each input face image into a 128-dimensional vector.

First, for each image of “Alien” and “Boy” (I have taken 52 images of each), I converted them into encoding and stored into a database.

Here is the code that does that

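
A rough sketch of this step is given below; img_to_encoding and the pretrained model come from the course code mentioned above (the helper names are assumed, not verified here), and the folder layout is illustrative.

import os

database = {}
for folder in ["Alien", "Boy"]:                  # illustrative folder layout, 52 images each
    for fname in os.listdir(folder):
        path = os.path.join(folder, fname)
        # encode each 96x96 RGB face image into a 128-dimensional vector
        database[path] = img_to_encoding(path, model)   # helpers from the course code (assumed)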

What happens when Alien and Boy pass through our image recognition system?

For each of the images of “Alien” and “Boy”, first compute the target encoding of the image from its path. Then find the encoding in the database that has the smallest distance to the target encoding.

If minimum distance (L2 distance between the target “encoding” and the current “encoding” from the database) is greater than 0.7 we assume the face is not in the database.

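
A rough sketch of this lookup, with the same assumed helpers as above:

import numpy as np

def who_is_it(image_path, database, model, threshold=0.7):
    encoding = img_to_encoding(image_path, model)
    min_dist, identity = float("inf"), None
    for name, db_enc in database.items():
        dist = np.linalg.norm(encoding - db_enc)   # L2 distance between encodings
        if dist < min_dist:
            min_dist, identity = dist, name
    if min_dist > threshold:
        return None, min_dist                      # not in the database
    return identity, min_dist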

 

When Alien tries to pass through our face recognition system

 

Input test image of Alien, the result, and the closest image in the database:

  • Alien1a: recognised as Alien (closest image: Alien1b)
  • Alien-2a: recognised as Alien (closest image: Alien-2b)
  • Alien-3a: recognised as Boy (wrong; closest image: Alien-3b)
  • Alien-4a: recognised as Alien (closest image: Alien-4b)
  • Alien-5a: recognised as Alien (closest image: Alien-5b)
  • Alien-6a: recognised as Alien (closest image: Alien-6b)

Note that there is no image in the database like the green-eyed image; its distance is 0.5105655. So maybe we can keep a cut-off at 0.5 instead of 0.7.

 

When Boy tries to pass through our face recognition system

 

Input test image of Boy, the result, and the closest image in the database:

  • Boy1a: recognised as Alien (wrong; closest image: Boy1b)
  • Boy2a: recognised as Boy (closest image: Boy2b)
  • Boy3a: recognised as Boy (closest image: Boy3b)
  • Boy4a: recognised as Alien (wrong; closest image: Boy4b)
  • Boy5a: recognised as Alien (wrong; closest image: Boy5b)
  • Boy6a: recognised as Boy (closest image: Boy6b)

Results look pretty good.

Summary

  • We should re-train facenet with Alien and Boy pictures to get better results.
  • Image dimensions were only 96×96 so that could have thrown a lot of information away
  • The model was trained on human faces, which have different embeddings than cat faces.
  • I split the database images and test images based on the dates on which the pictures were taken, assuming pictures from the same dates would be similar. On inspection, I found that when the test images are very different from the images added to the database (that is, they were never seen before), the results are incorrect. This can be fixed by adding more varied images to the database.

 

Object Detection Using OpenCV YOLO


You only look once (YOLO) is a state-of-the-art, real-time object detection system.

It applies a single neural network to the full image. This network divides the image into regions and predicts bounding boxes and probabilities for each region. These bounding boxes are weighted by the predicted probabilities. It looks at the whole image at test time so its predictions are informed by global context in the image. It is extremely fast, more than 1000x faster than R-CNN and 100x faster than Fast R-CNN. 

Non-Maxima Suppression: at prediction time you may have lots of box predictions around a single object; the non-maxima suppression algorithm filters out boxes that overlap each other beyond some threshold.

I tried object detection on this video.


Download the following files:

  • yolov3.cfg
  • yolov3.weights, which contains pre-trained weights; download it using the wget command as shown below
    wget https://pjreddie.com/media/files/yolov3.weights
  • yolov3.txt, which contains all the class names this library can detect.

Import relevant packages. Add random color to each class which will be used to draw rectangles.


For each image, call processImage function which does the following

  • Takes an image frame as input
  • Reads the pre-trained model
  • Gathers predictions
  • Ignores detections with confidence less than 0.5
  • Applies non-max suppression
  • Draws bounding boxes
  • Saves the output image with bounding boxes

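
A condensed sketch of such a processImage function, using OpenCV’s dnn module, is given below; the blob size and thresholds are illustrative, and the file names follow the downloads listed above.

import cv2
import numpy as np

net = cv2.dnn.readNet("yolov3.weights", "yolov3.cfg")
classes = open("yolov3.txt").read().strip().split("\n")
colors = np.random.uniform(0, 255, size=(len(classes), 3))

def process_image(image):
    h, w = image.shape[:2]
    blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)
    layer_names = net.getLayerNames()
    out_layers = [layer_names[i - 1] for i in np.array(net.getUnconnectedOutLayers()).flatten()]
    boxes, confidences, class_ids = [], [], []
    for output in net.forward(out_layers):
        for detection in output:
            scores = detection[5:]
            class_id = int(np.argmax(scores))
            confidence = float(scores[class_id])
            if confidence < 0.5:              # ignore weak detections
                continue
            cx, cy, bw, bh = detection[0:4] * np.array([w, h, w, h])
            boxes.append([int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)])
            confidences.append(confidence)
            class_ids.append(class_id)
    # non-max suppression removes overlapping boxes
    keep = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)
    for i in np.array(keep).flatten():
        x, y, bw, bh = boxes[i]
        color = [int(c) for c in colors[class_ids[i]]]
        cv2.rectangle(image, (x, y), (x + bw, y + bh), color, 2)
        cv2.putText(image, classes[class_ids[i]], (x, y - 5),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)
    return image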

Now let us look at a few of these output images.


Due to non-maxima suppression, when two cars are in the same area, one of them sometimes goes undetected.

Refer my ipython notebook https://github.com/meenavyas/Misc/blob/master/ObjectDetectionUsingYolo/ObjectDetectionUsingYolo.ipynb for full source code.

References and thanks to

Cat face detection using OpenCV

In this blog I am going to explain object detection using OpenCV library.

OpenCV (Open Source Computer Vision Library: http://opencv.org) is an open-source BSD-licensed library that includes several hundreds of computer vision algorithms. It has modules like Image Processing, Video Analysis, Object Detection. OpenCV was designed for computational efficiency and with a strong focus on real-time applications.

A Haar Cascade is a classifier which is used to detect the object for which it has been trained, from the source. The Haar Cascade is trained by superimposing the positive image over a set of negative images. The training is generally done on a server and in various stages.

Download haar-cascade xml files from the link here. Read the license terms before downloading, copying, installing or using. You can create your own haar cascade files by looking at the videos here.

 I have downloaded two files ‘haarcascade_frontalcatface.xml’ and ‘haarcascade_frontalcatface_extended.xml’

Set tunable parameters like scale factor and minimum neighbors.


I am reading 6 images of cats and 1 image of a dog. First convert the image to gray scale. Use the above two haar cascades to get the coordinates of rectangles where a cat’s front face is located (if any). Plot the rectangles.

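
A minimal sketch of this detection step is given below; the image file name and the parameter values are illustrative, and the cascade files are the two downloads mentioned earlier.

import cv2

cat_cascade = cv2.CascadeClassifier("haarcascade_frontalcatface.xml")
cat_cascade_ext = cv2.CascadeClassifier("haarcascade_frontalcatface_extended.xml")

image = cv2.imread("cat.jpg")                  # illustrative file name
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# tunable parameters: scaleFactor and minNeighbors
faces = cat_cascade.detectMultiScale(gray, scaleFactor=1.05, minNeighbors=3)
if len(faces) == 0:
    faces = cat_cascade_ext.detectMultiScale(gray, scaleFactor=1.05, minNeighbors=3)

for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("cat_detected.jpg", image)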

When there are errors, retry with different scale factor and minimum neighbors parameters.


As you can see it plots the rectangles around 6 images of cats properly and doesn’t plot anything around the dog face as expected.

Code is in my github repository.

You can also look at yolo object detection blog https://www.pyimagesearch.com/2018/11/12/yolo-object-detection-with-opencv/

Thanks to my models: Alien, Princess, Boy and Lucky.

Reference

  • https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_objdetect/py_face_detection/py_face_detection.html#face-detection
  • https://opencv.org/
  • https://www.quora.com/What-is-haar-cascade
  • https://docs.opencv.org/trunk/d7/d8b/tutorial_py_face_detection.html
  • https://pythonprogramming.net/haar-cascade-object-detection-python-opencv-tutorial/
  • https://github.com/opencv/opencv/tree/master/data/haarcascades
  • https://www.pyimagesearch.com/2018/11/12/yolo-object-detection-with-opencv/

Code Generation using LSTM (Long Short-term memory) RNN network

A recurrent neural network (RNN) is a class of neural network that performs well when the input/output is a sequence. RNNs can use their internal state/memory to process sequences of inputs.

Neural Network models are of various kinds

  • One to one: Image classification, where we give an input image and it returns the class to which the image belongs.
  • One to Many: Image captioning, where the input is a picture and the output is a sentence describing the picture.
  • Many to One: Sentiment analysis, where the input is a tweet and the output is a class like positive or negative.
  • Many to Many: Sequence-to-sequence model with an encoder-decoder architecture: a language translation model where the input is a sentence (let’s say in English) and the output is a sentence in another language (let’s say French).

There are two popular variants of RNNs: LSTM (Long Short-Term Memory) and GRU (Gated Recurrent Unit).

We should try both to see which one is performing better for the problem we are trying to solve.

In this blog I have tried to generate new source code using LSTM. Here are the steps

Import required packages


Then set EPOCH and Batch size. These should be tuned properly.


In the preprocessing stage, I downloaded the OpenSSL source code from GitHub and concatenated all .c files into a file called “train.txt”. I was running out of memory, so I took only a third of the OpenSSL files. We can improve this code to load the source code in batches. We also have to create a vocabulary list in the preprocessing stage, save it into a file, and read that file back.


I have used a character-based model. We can make a word-based model as well, and we can also use a word embedding layer, which will be needed when we have more difficult problem sets.

I have used 2 LSTM layers with a Dropout of 0.2 each and a Dense layer at the end with softmax. We can try different models and compare.

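
A sketch of such a model is given below; the layer sizes, sequence length and vocabulary size are illustrative placeholders, not the exact values from my notebook.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense

SEQ_LEN = 100      # characters of context fed to the network (illustrative)
VOCAB_SIZE = 96    # number of distinct characters in train.txt (illustrative)

model = Sequential([
    LSTM(256, input_shape=(SEQ_LEN, VOCAB_SIZE), return_sequences=True),
    Dropout(0.2),
    LSTM(256),
    Dropout(0.2),
    Dense(VOCAB_SIZE, activation="softmax"),   # predicts the next character
])
model.compile(loss="categorical_crossentropy", optimizer="adam")
model.summary()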

We can visualize the model, for example with model.summary() as in the sketch above.


I trained for 10 epochs. As you can see, the loss comes down gradually in every epoch, from 2.97 to 1.55.

We gave it a random starting point and let it generate new code.


As you can see, it has done a very good job: it returns values from a function based on an if condition and starts another function.

Here is the code in github. Please try it out and see.

References