It was December 2018, and I was on my flight to Singapore.
We covered ANN for the first eight days, followed by Big Data in the remaining days.
Firstly, we had Dr. Lek Hsiang Hui, who gave us an introduction to Data Analytics. His session was genuinely insightful and deepened my understanding of the field.
- He explained the different types of Decision Models, with a few examples of how data flows between different states, is modified, and how a decision is then taken based on the output.
- Data Mining was also covered, with a great flow diagram describing each of the stages involved: Data Extraction, Data Cleaning, Data Aggregation, Data Representation and Data Interpretation.
- The basics of R programming and the implementation of various mathematical formulae were also taught, including Measures of Location, Measures of Shape, Measures of Dispersion and Measures of Association.
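The lab work for these measures was in R, but the same ideas are easy to sketch in plain Python. Here's my own toy illustration (not the course code), covering one measure from each family:

```python
import statistics

data = [4.0, 8.0, 6.0, 5.0, 3.0, 7.0, 9.0, 5.0]

# Measures of location
mean = statistics.mean(data)
median = statistics.median(data)

# Measures of dispersion
variance = statistics.variance(data)   # sample variance
stdev = statistics.stdev(data)
data_range = max(data) - min(data)

# Measure of shape: sample skewness (adjusted Fisher-Pearson)
n = len(data)
skew = (n / ((n - 1) * (n - 2))) * sum(((x - mean) / stdev) ** 3 for x in data)

# Measure of association: Pearson correlation between two series
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
mx, my = statistics.mean(xs), statistics.mean(ys)
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
corr = cov / (sum((x - mx) ** 2 for x in xs) ** 0.5 *
              sum((y - my) ** 2 for y in ys) ** 0.5)

print(mean, median, variance, round(corr, 3))
```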
Secondly, we had Dr. Tan Wee Kek, who presented the concepts of Machine Learning in a very intuitive manner, and I was able to grasp most of them with ease. Things I understood and implemented:
- Simple and Multiple Linear Regression and their problems
- Python Data Science libraries like NumPy, SciPy, Matplotlib and scikit-learn
- Classification and its types: Decision Trees, Bayesian Classifier, Logistic Regression, SVM (Support Vector Machines)
- Clustering: K-Means, K-Medoids, Hierarchical Methods
- Text Mining and KDD using Classification, Association and Clustering
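To make the clustering part of the list concrete, here's a minimal pure-Python sketch of the K-Means loop (my own illustration, not our lab code; in practice we'd reach for scikit-learn's `KMeans`):

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: assign each point to its nearest centroid,
    then recompute each centroid as the mean of its cluster."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(p, centroids[c])))
            clusters[i].append(p)
        # Update step: centroid = mean of its cluster
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                new_centroids.append(tuple(sum(dim) / len(cluster)
                                           for dim in zip(*cluster)))
            else:
                new_centroids.append(centroids[i])  # keep empty cluster's centroid
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return centroids, clusters

# Two obvious blobs around (0, 0) and (10, 10)
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(pts, k=2)
```

With well-separated blobs like these, the loop converges to the two blob means regardless of which points are sampled as initial centroids.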
An artificial neural network is designed to function like the neurons in the brain.
Lastly, Dr. Wang Wei introduced us to topics revolving around Artificial Neural Networks. These topics were harder to understand given the level of Calculus involved, but Dr. Wang did a great job of teaching us the basics. Here's everything that I learned and implemented:
- Why ANN: Problems with Logistic Regression
- Back Propagation and Gradient Descent (GD) Algorithm
- Some advanced GD Algorithms like Stochastic GD, Minibatch GD, RMSProp and Adam
- Training Techniques: Random Initialisation, ReLU, Dropout
- Convolutional Neural Networks: Pooling, Padding, Strides and some common CNN Architectures
- Recurrent Neural Networks: Vanilla RNN and LSTM
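Gradient descent is easier to appreciate in miniature. Here's my own toy sketch (not course material): a single sigmoid neuron, i.e. logistic regression, trained with batch GD on cross-entropy loss to learn the OR function:

```python
import math

# Toy data: the OR truth table
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 1, 1, 1]

w1, w2, b = 0.0, 0.0, 0.0
lr = 0.5

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

for epoch in range(2000):
    g1 = g2 = gb = 0.0
    for (x1, x2), t in zip(X, y):
        p = sigmoid(w1 * x1 + w2 * x2 + b)
        err = p - t          # dLoss/dz for sigmoid + cross-entropy
        g1 += err * x1
        g2 += err * x2
        gb += err
    # Gradient descent step (average gradient over the batch)
    n = len(X)
    w1 -= lr * g1 / n
    w2 -= lr * g2 / n
    b -= lr * gb / n

preds = [round(sigmoid(w1 * x1 + w2 * x2 + b)) for x1, x2 in X]
```

Swapping the full-batch loop for one example at a time turns this into Stochastic GD; updating on small random subsets gives Mini-batch GD.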
We had to present our Artificial Neural Networks course project two days after our final lecture. Those days went by really quickly, with little to no sleep, as my group mates and I hustled to finish our project, Quick Draw.
We wanted to build something with great scope for real-world use, something that helps society. Our program tracks the user's strokes and, in real time, predicts what the user is trying to draw.
Quick Draw uses Google's existing dataset of labelled hand-drawn images in NumPy array format. We downloaded the data for 4 classes (20,000 images each) and started training different algorithms on it. We started off with SVM, then K-Means Clustering, then a Feed-Forward Neural Network, then a Convolutional Neural Network and finally Long Short-Term Memory (LSTM). CNN and LSTM gave us the highest accuracies, so we decided to use those two models, and we built a front end for our project using the OpenCV library, which captures our strokes as input.
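The convolution and pooling operations at the heart of the CNN can be shown in a few lines of plain Python. This is just a toy sketch of mine on a 4x4 "drawing" (the real model worked on the dataset's 28x28 bitmaps): a vertical-edge kernel lights up wherever the stroke is.

```python
def conv2d(img, kernel, stride=1, padding=0):
    """Valid 2-D convolution (cross-correlation, as in most DL libraries)."""
    if padding:
        w = len(img[0])
        img = ([[0] * (w + 2 * padding)] * padding
               + [[0] * padding + row + [0] * padding for row in img]
               + [[0] * (w + 2 * padding)] * padding)
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(0, len(img) - kh + 1, stride):
        row = []
        for j in range(0, len(img[0]) - kw + 1, stride):
            row.append(sum(img[i + a][j + b] * kernel[a][b]
                           for a in range(kh) for b in range(kw)))
        out.append(row)
    return out

def max_pool(img, size=2):
    """Non-overlapping max pooling: keep the strongest response per window."""
    return [[max(img[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, len(img[0]) - size + 1, size)]
            for i in range(0, len(img) - size + 1, size)]

# A 4x4 "drawing" with a vertical stroke
img = [[0, 1, 0, 0],
       [0, 1, 0, 0],
       [0, 1, 0, 0],
       [0, 1, 0, 0]]
edge = [[-1, 1],
        [-1, 1]]
fmap = conv2d(img, edge)         # 3x3 feature map, strong response on the stroke
pooled = max_pool(fmap, size=2)  # downsampled summary of the feature map
```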
In the second half of the course, we moved on to Big Data. Here's what was covered:
- What Big Data is and how it is changing the world
- Problems in Big Data
- Hadoop and its features, Hadoop vs RDBMS
- The Hadoop Ecosystem: HDFS, YARN, MapReduce, HBase, ZooKeeper, Pig and Hive
- Setting up a node cluster using Ambari: NameNode, DataNode, SSH-ing into these nodes without a password
- Setting up HDFS using shell commands, commissioning and de-commissioning, Resource Manager, Scheduler
- Hive basic queries and data ingestion mechanisms using Sqoop
- MapReduce Programming
After the lectures, we had to work on a project in which we set up a three-node cluster, deployed and managed using Ambari. We had to do everything from scratch: set up password-less SSH between the three nodes, configure the Java, JDBC and Hadoop environments in the .bashrc file, create a local repository, transfer data from the local repository into HBase, and then perform operations on that data. We chose the book called ‘Sherlock’ from the Gutenberg Library and ran Word-Count MapReduce programs on it. Almost every step of the project was done through shell commands, which was really cool and helped us understand how Linux commands work and when and where to use them.
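The Word-Count job itself is easiest to see in miniature. Here's the map / shuffle / reduce pipeline sketched in plain Python (the actual project ran this as a Hadoop MapReduce job over the whole book; the sample lines below are just stand-ins for its text):

```python
from itertools import groupby

def mapper(line):
    # Emit (word, 1) for every word, like the classic Hadoop WordCount mapper
    for word in line.lower().split():
        yield word.strip(".,!?\"'"), 1

def reducer(word, counts):
    # Sum all the 1s emitted for this word
    return word, sum(counts)

lines = [
    "To Sherlock Holmes she is always the woman",
    "the woman of the story",
]

# Map phase
pairs = [kv for line in lines for kv in mapper(line)]
# Shuffle phase: sort by key so equal words become adjacent
pairs.sort(key=lambda kv: kv[0])
# Reduce phase: one reducer call per distinct word
counts = dict(reducer(word, (c for _, c in group))
              for word, group in groupby(pairs, key=lambda kv: kv[0]))
```

In Hadoop, the framework handles the shuffle across nodes; here the sort plus `groupby` plays that role on a single machine.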
Thank you for reading!