Machine Learning, Reinforcement Learning & Statistics Projects
This page describes a collection of machine learning, reinforcement learning, and statistics projects I worked on during my graduate studies. It is a work in progress and will be updated in the near future with each project's code on GitHub, as well as links to my R packages for tensor data analysis.
Improving Online Advertisement Sparse Tensor Completion
June 2020 -- December 2020 Python | Matlab
- Implemented the COSTCO algorithm on advertisement click-through rate (CTR) tensor data collected over a two-month period from a leading internet company.
- Conducted data preprocessing to yield a 1000 x 140 x 3 (User x Ad x Device) sparse CTR tensor dataset with 98% missing entries and a 40% sparsity level.
- Compared COSTCO's recovery of CTR tensor entries on test data with that of standard completion algorithms; COSTCO yielded a 23% improvement in recovery accuracy.
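The evaluation described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the function names, the toy 2 x 2 x 2 tensor, the global-mean baseline, and the use of RMSE as the accuracy measure are all assumptions made for the example.

```python
import math

def missing_fraction(observed, shape):
    """Fraction of tensor cells that have no observed value."""
    total_cells = math.prod(shape)
    return 1.0 - len(observed) / total_cells

def rmse(predict, held_out):
    """Root mean squared error of `predict` over held-out {index: value} entries."""
    squared = [(predict(idx) - value) ** 2 for idx, value in held_out.items()]
    return math.sqrt(sum(squared) / len(squared))

def relative_improvement(baseline_error, model_error):
    """Share of the baseline error removed by the model (0.23 means 23%)."""
    return (baseline_error - model_error) / baseline_error

# Toy sparse tensor stored as {(user, ad, device): ctr}; values are made up.
observed = {(0, 0, 0): 0.10, (1, 1, 0): 0.30}
held_out = {(0, 1, 1): 0.20, (1, 0, 1): 0.40}

# Baseline (assumption): predict the mean of the observed entries everywhere.
global_mean = sum(observed.values()) / len(observed)
baseline_rmse = rmse(lambda idx: global_mean, held_out)

# Hypothetical predictions from a completion model on the held-out cells.
model_predictions = {(0, 1, 1): 0.25, (1, 0, 1): 0.35}
model_rmse = rmse(model_predictions.get, held_out)

improvement = relative_improvement(baseline_rmse, model_rmse)
print(missing_fraction(observed, (2, 2, 2)), baseline_rmse, model_rmse, improvement)
```

Storing only the observed cells in a dictionary keeps memory proportional to the number of observations, which matters when most of the tensor is missing.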
Simulations for Assessing Global Convergence of Policy Gradient Methods in Reinforcement Learning
October 2019 Python | Tensorflow
- Evaluated the convergence performance of two policy gradient methods (model-free and model-based) introduced in Fazel et al. (2018) on the cartpole problem.
- Used the cartpole state {position, velocity, angle, and rotation} as input in the model-free case, and additionally the action taken at time t in the model-based case.
- Modeled the output in both cases as the probability of the pole moving left or right.
- Trained a neural network with 32 nodes, the ReLU activation function, and an adaptive learning rate to represent the policy gradient model for the model-free method.
- Used the cross-entropy loss of the discounted reward as the loss function and chose a reward discount rate that causes future rewards to be highly valued.
- Details and simulation code for this project are available on my GitHub page here.
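As a rough illustration of the loss described above, the sketch below computes discounted returns and a return-weighted cross-entropy. It is one common reading of "cross-entropy loss of the discounted reward", not necessarily the exact formulation used in the project; the function names and the default discount rate of 0.99 are assumptions.

```python
import math

def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ... for every t."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each return reuses the one after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def weighted_cross_entropy(p_right, actions, weights):
    """Mean of -log pi(a_t) * weight_t, with actions coded 1 = right, 0 = left."""
    total = 0.0
    for p, action, weight in zip(p_right, actions, weights):
        p_action = p if action == 1 else 1.0 - p
        total += -math.log(p_action) * weight
    return total / len(actions)

# Toy episode: reward of 1 per step, as in the usual cartpole setup.
rewards = [1.0, 1.0, 1.0]
returns = discounted_returns(rewards, gamma=0.5)

# Hypothetical policy outputs (probability of moving right) and actions taken.
loss = weighted_cross_entropy([0.6, 0.4, 0.7], [1, 0, 1], returns)
print(returns, loss)
```

A discount rate close to 1 makes rewards far in the future count almost as much as immediate ones, which is the "highly valued" behavior mentioned in the list.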
Predicting Users' Music Sequences using Word2Vec Skip-gram and LSTM Models
December 2018 Python | Tensorflow
- Used a repertoire of 3888 unique artists and 972 users to predict each user's playlist sequence after a sequence of 29 songs.
- Trained a word2vec skip-gram model and an LSTM model to generate sequential predictions.
- Project won second prize in the in-class (CS 573) Kaggle competition among 20+ competing groups.
- Details and code for this project can be found on my GitHub page here.
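The data preparation behind such a pipeline can be sketched as below. This is an assumed setup rather than the project's code: `skipgram_pairs` builds the (center, context) pairs a skip-gram model trains on, and `make_windows` builds fixed-length input sequences with the next item as the target. The window sizes and the toy playlist are made up.

```python
def skipgram_pairs(sequence, window=2):
    """Return (center, context) pairs for every item and its neighbors within `window`."""
    pairs = []
    for i, center in enumerate(sequence):
        lo = max(0, i - window)
        hi = min(len(sequence), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, sequence[j]))
    return pairs

def make_windows(sequence, length=29):
    """Return (input_window, next_item) examples for a sequence model such as an LSTM."""
    examples = []
    for start in range(len(sequence) - length):
        examples.append((sequence[start:start + length], sequence[start + length]))
    return examples

# Toy playlist of artist identifiers (hypothetical).
playlist = ["artist_a", "artist_b", "artist_c", "artist_d"]
print(skipgram_pairs(playlist, window=1))
print(make_windows(playlist, length=3))
```

Treating each playlist as a sentence and each artist as a word lets the skip-gram embeddings place artists that are played together close to each other.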
Sentiment Analysis of Amazon, IMDb and Yelp Data
October 2018 Python
- Performed sentiment analysis on customer and user review data from Amazon, IMDb, and Yelp.
- Trained a Multinomial Naive Bayes classifier to distinguish between positive and negative customer reviews.
- Wrote a Python program that reads in reviews, performs data cleaning and feature extraction, and decides for each review whether it holds a positive or a negative sentiment.
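A compact version of this kind of classifier might look like the following. It is a sketch under assumptions, not the original program: the tokenizer, the Laplace smoothing constant `alpha`, and the four toy reviews are invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())

class MultinomialNB:
    """Multinomial Naive Bayes over word counts with additive (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_label in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n_label / n_docs)  # log prior
            for word in tokenize(text):
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Toy training reviews (made up).
texts = ["great product works well", "loved this movie",
         "terrible service and bad food", "bad movie waste of time"]
labels = ["positive", "positive", "negative", "negative"]
model = MultinomialNB().fit(texts, labels)
print(model.predict("great movie"), model.predict("bad food"))
```

Working in log space avoids numerical underflow when a review contains many words.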
Detecting Fraudulent Credit Card Transactions using Weighted Logistic Regression
May 2019 Python
- Used credit card transaction data from a major bank to train a logistic regression model in Python for detecting fraudulent transactions.
- Performed data cleaning and feature selection.
- Applied L2 regularization to the logistic regression and used weights to adjust for the imbalance between fraudulent and non-fraudulent cases in the training data, improving classifier performance and reducing bias.
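The weighting idea can be illustrated with a small gradient-descent sketch. The weighting scheme here (each class weighted by n / (2 * class count), so both classes contribute equally) is an assumption, as are the function names, hyperparameters, and the toy one-feature data; the project may have used different weights or a library implementation.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def balanced_weights(y):
    """Weight each class by n / (2 * class count) so rare classes count more."""
    n, n_pos = len(y), sum(y)
    return {1: n / (2.0 * n_pos), 0: n / (2.0 * (n - n_pos))}

def fit_weighted_logreg(X, y, l2=0.01, lr=0.1, epochs=2000):
    """Minimize class-weighted log loss plus (l2 / 2) * ||w||^2 by gradient descent."""
    cw = balanced_weights(y)
    total_weight = sum(cw[yi] for yi in y)
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = cw[yi] * (p - yi)  # weighted residual
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * (gj / total_weight + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / total_weight  # intercept is not regularized
    return w, b

def predict_proba(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Toy data: one scaled feature, 6 legitimate (0) and 2 fraudulent (1) cases.
X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [2.5], [3.0]]
y = [0, 0, 0, 0, 0, 0, 1, 1]
w, b = fit_weighted_logreg(X, y)
print(predict_proba(w, b, [3.0]), predict_proba(w, b, [0.1]))
```

Without the class weights, a model trained on heavily imbalanced data can reach high accuracy by predicting "not fraud" almost everywhere, which is the bias the weights are meant to reduce.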
Detecting Quantitative Trait Loci using Bayesian Lasso Hierarchical Model
December 2018 R
- Implemented an EM algorithm in R which sequentially locates and estimates the magnitude of the effects of 176 markers on blood pressure in mice.
- Performed a permutation test to compute the critical value for the test statistics.
- Used a hierarchical model approach with a non-informative prior on the tuning parameter to implement the Bayesian Lasso in R, which allowed the location and effect of all markers to be tested simultaneously.
- Used a Gibbs sampler to sample from the full conditional posteriors of 341 parameters and hyper-parameters.
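The permutation test mentioned above can be sketched as follows. The project itself was written in R; this Python version is only an illustration, and the test statistic (absolute marker-phenotype correlation, maximized over markers), the function names, and the simulated data are all assumptions.

```python
import random

def abs_correlation(x, y):
    """Absolute Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return abs(sxy) / (sxx * syy) ** 0.5

def permutation_critical_value(markers, phenotype, n_perm=1000, alpha=0.05, seed=0):
    """Approximate (1 - alpha) quantile of the maximum statistic under permutation.

    `markers` is a list of genotype columns, one list per marker.
    Shuffling the phenotype breaks any marker-trait association while
    keeping the correlation structure among the markers intact.
    """
    rng = random.Random(seed)
    shuffled = list(phenotype)
    max_stats = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        max_stats.append(max(abs_correlation(m, shuffled) for m in markers))
    max_stats.sort()
    index = min(n_perm - 1, round((1 - alpha) * n_perm))
    return max_stats[index]

# Simulated data: 40 individuals, 5 markers coded 0/1, no true effect.
data_rng = random.Random(1)
markers = [[data_rng.randint(0, 1) for _ in range(40)] for _ in range(5)]
phenotype = [data_rng.gauss(0.0, 1.0) for _ in range(40)]

threshold = permutation_critical_value(markers, phenotype, n_perm=200)
print(threshold)
```

Taking the maximum over all markers in each permutation gives a single genome-wide threshold, so an observed statistic above it is significant after accounting for the number of markers tested.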
QTL Mapping of Lipid Profiles in Mouse
May 2018 R | QTL Cartographer
- Performed QTL mapping to identify quantitative trait loci associated with plasma triglyceride and HDL concentrations exhibiting a mixture of normal distributions.
- Estimated genetic map in R using the two-point algorithm, Rapid Chain Delineation (RCD).
- Conducted a permutation test to determine the significance threshold for interval and composite interval mapping.
- A full report for this project can be found here.