Eugene Shevchuk MIT Projects

If you're not afraid of your goals, maybe they are not good enough.

F15

Touching
Data Science

Currently, I am working on an AI-powered personal assistant

Below are some things I've already done at MIT

Selected projects

MIT projects for now. Here are the most interesting ones:

Digit recognition

MNIST data set. Single and double overlapping digits

Movie score prediction

Netflix data set. 1-5 ✩ score per user prediction

AI playing computer game

Model learns a text game's rules & strategy by playing it

I am experienced with

Principal component analysis
Convolutional neural networks
Gaussian Mixture model (EM kernelization)
Collaborative filtering
Reinforcement learning (MDP + NN)

Projects 1-2

Digit recognition

Goal: Recognize handwritten digits

Result: minimum error rate 0.002 for recognizing two overlapping digits

Input data:
MNIST Dataset - handwritten single & double overlapping digits
- 60K - training, 10K - testing

→ View the code

Single digit recognition
(manually coded)

Linear regression
(closed form solution)

Test error: 0.7702
Yep. It gets better.

View the code →
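A minimal sketch of closed-form linear regression for digit labels, assuming a ridge (L2) penalty; `lambda_factor`, `closed_form`, and `predict_digits` are illustrative names, not taken from the linked code. The model regresses the label as a real number and rounds it, so it treats digits as ordered quantities, which is one plausible reason this baseline performs poorly.

```python
import numpy as np

def closed_form(X, Y, lambda_factor):
    """Solve theta = (X^T X + lambda * I)^(-1) X^T Y without forming the inverse."""
    d = X.shape[1]
    A = X.T @ X + lambda_factor * np.eye(d)
    return np.linalg.solve(A, X.T @ Y)

def predict_digits(X, theta):
    """Round the real-valued output and clip it into the label range 0..9."""
    return np.clip(np.round(X @ theta), 0, 9).astype(int)

# Toy usage (assumed data, not MNIST): two features, two samples.
X_toy = np.array([[1.0, 0.0], [0.0, 1.0]])
Y_toy = np.array([2.0, 3.0])
theta_toy = closed_form(X_toy, Y_toy, lambda_factor=0.0)
```

With `lambda_factor=0` and an invertible `X^T X` this reduces to ordinary least squares; a positive value keeps the system solvable when pixel columns are constant.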

Multi class SVM

Test error: 0.0819

View the code →

Multinomial (softmax) regression
with Gradient Descent

Test error: 0.1005

View the code →
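A minimal sketch of one gradient descent step for softmax regression, assuming a temperature parameter `temp`, an L2 penalty `lambda_factor`, and a learning rate `alpha`; all three, along with the function names, are illustrative assumptions rather than the settings used in the linked code.

```python
import numpy as np

def softmax_probs(X, theta, temp=1.0):
    """Return a (k, n) matrix of class probabilities for n samples."""
    scores = theta @ X.T / temp            # (k, n)
    scores -= scores.max(axis=0)           # subtract column max for stability
    exp = np.exp(scores)
    return exp / exp.sum(axis=0)

def gradient_step(X, y, theta, alpha=0.3, lambda_factor=1e-4, temp=1.0):
    """One step on the regularised negative log-likelihood."""
    n, k = X.shape[0], theta.shape[0]
    onehot = np.zeros((k, n))
    onehot[y, np.arange(n)] = 1.0          # onehot[j, i] = 1 if y[i] == j
    probs = softmax_probs(X, theta, temp)
    grad = -(onehot - probs) @ X / (temp * n) + lambda_factor * theta
    return theta - alpha * grad

# Toy usage (assumed data): two linearly separable classes.
X_toy = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.1], [0.1, 1.0]])
y_toy = np.array([0, 1, 0, 1])
theta_toy = np.zeros((2, 2))
for _ in range(200):
    theta_toy = gradient_step(X_toy, y_toy, theta_toy)
pred_toy = softmax_probs(X_toy, theta_toy).argmax(axis=0)
```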

Multinomial (softmax) regression with PCA & cubic features

PCA dimensionality reduction 784 ⤑ 18
Kernel: $\phi (x)^ T \phi (x') = (x^ T x' + 1)^3$

Test error: 0.08520

View the code →
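A minimal sketch of the two ingredients above, assuming PCA is computed by SVD of the centered training data; `fit_pca`, `project`, and `cubic_kernel` are illustrative names. Reducing to 18 dimensions first matters because the explicit feature map behind the cubic kernel contains every monomial of degree at most 3, which is C(21, 3) = 1330 features for 18 inputs but far more for 784.

```python
import numpy as np

def fit_pca(X, n_components):
    """Return the feature mean and the top principal directions of X."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions, sorted by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components].T       # shapes (d,), (d, n_components)

def project(X, mean, components):
    """Center with the training mean, then project onto the components."""
    return (X - mean) @ components

def cubic_kernel(X, Z):
    """Kernel matrix with entries (x^T z + 1)^3."""
    return (X @ Z.T + 1.0) ** 3

# Toy usage (random data standing in for 784-pixel images).
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(30, 40))
mean_toy, comps_toy = fit_pca(X_toy, 18)
Z_toy = project(X_toy, mean_toy, comps_toy)
```

Test images should be projected with the mean and components fitted on the training set, which is why fitting and projecting are kept as separate functions here.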

Convolutional Neural Network

Optimized over:
- baseline (no modifications)
- batch size
- learning rate
- momentum
- activation (ReLU / LeakyReLU)

Test accuracy: 0.9902

Double digit recognition
(PyTorch)

Convolutional Neural Network
(PyTorch)

Test error: < 0.0200

→ View the code

Project 3

Netflix ratings prediction

Goal: Predict unrated movies

Result: error rate 0.17, counting predictions off by $\Delta \geqslant 2$

Input data:

1200 users for 1200 movies
Rating values $\in \{1, \ldots, 5\}$
Value $= 0$ for unrated ones

Description:

I was given a data matrix of movie ratings made by users, extracted from the Netflix database. Each user has rated only a small fraction of the movies, so the data matrix is only partially filled. The goal was to predict all the remaining entries of the matrix. I approached it by building a Gaussian Mixture Model (GMM) for collaborative filtering and training it with the Expectation-Maximization (EM) algorithm.
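A minimal sketch of the E-step for such a mixture when ratings are missing, assuming spherical Gaussians (one variance per component) and 0 as the marker for unrated entries; the function names and the toy numbers are illustrative, and the M-step is omitted. Each user's likelihood is evaluated only on the movies that user has rated, and the posterior-weighted component means then fill in the unrated entries.

```python
import numpy as np

def e_step(X, mu, var, pi):
    """X: (n, d) ratings with 0 = unrated; mu: (k, d); var: (k,); pi: (k,).
    Returns posteriors of shape (n, k) and the total log-likelihood."""
    mask = X != 0                                        # observed entries
    n_obs = mask.sum(axis=1)                             # ratings per user
    # Squared distance to each mean, restricted to observed coordinates.
    diff = (X[:, None, :] - mu[None, :, :]) * mask[:, None, :]
    sq = (diff ** 2).sum(axis=2)                         # (n, k)
    log_norm = -0.5 * n_obs[:, None] * np.log(2 * np.pi * var[None, :])
    log_joint = np.log(pi)[None, :] + log_norm - sq / (2 * var[None, :])
    # Log-sum-exp over components, done in the log domain for stability.
    m = log_joint.max(axis=1, keepdims=True)
    log_evidence = m + np.log(np.exp(log_joint - m).sum(axis=1, keepdims=True))
    return np.exp(log_joint - log_evidence), float(log_evidence.sum())

def fill_matrix(X, mu, post):
    """Keep observed ratings; replace unrated entries with the expected mean."""
    return np.where(X != 0, X, post @ mu)

# Toy usage (assumed values): 2 users, 2 movies, 2 components.
X_toy = np.array([[5.0, 0.0], [1.0, 1.0]])
mu_toy = np.array([[5.0, 4.0], [1.0, 1.0]])
post_toy, ll_toy = e_step(X_toy, mu_toy, np.array([1.0, 1.0]), np.array([0.5, 0.5]))
filled_toy = fill_matrix(X_toy, mu_toy, post_toy)
```

The intermediate `diff` array has shape (n, k, d), which is fine for a sketch at 1200 × 1200 with a handful of components but would need restructuring for larger data.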

→ View the code

Project 4

AI playing Text Quest

Goal: Develop an algorithm that learns how to play a text quest and completes its quests as efficiently as possible.

Result: Efficiency $0.99$ of theoretical maximum

Methods:

Markov Decision Processes
Feed forward neural network

Description

The game is a text quest. Its space consists of locations. Each location contains a set of interactable objects. The agent can interact with objects in its current location or move to other locations. The agent and the engine communicate with each other via text.

Each turn:

the engine provides a text description of the current location and the active quest
the agent submits an action in "action" "object" format, for instance, "eat apple"

The agent's goal: move to a specific location and perform a specific action there to complete the quest as fast as possible.

MDP & linear model Q-value iteration

Efficiency: 0.69

→ View the code
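A minimal sketch of Q-learning with a linear model, as one way the linear variant above could be set up. It assumes the text description has already been turned into a feature vector (for example a bag-of-words) and that each "action object" pair is mapped to a single integer index; `gamma`, `alpha`, `epsilon`, and the function names are illustrative assumptions, not values from the linked code.

```python
import numpy as np

def q_values(theta, state_vec):
    """Q(s, a) for every action a; theta has one weight row per action."""
    return theta @ state_vec

def linear_q_update(theta, state_vec, action, reward, next_state_vec,
                    terminal, gamma=0.5, alpha=0.1):
    """One temporal-difference update of the weights for the taken action."""
    if terminal:
        target = reward
    else:
        target = reward + gamma * q_values(theta, next_state_vec).max()
    td_error = target - q_values(theta, state_vec)[action]
    theta = theta.copy()                   # leave the caller's array untouched
    theta[action] += alpha * td_error * state_vec
    return theta

def epsilon_greedy(theta, state_vec, epsilon, rng):
    """Explore with probability epsilon, otherwise take the best-known action."""
    if rng.random() < epsilon:
        return int(rng.integers(theta.shape[0]))
    return int(np.argmax(q_values(theta, state_vec)))

# Toy usage (assumed sizes): 2 actions, 3 state features.
theta_toy = np.zeros((2, 3))
s0, s1 = np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])
theta_1 = linear_q_update(theta_toy, s0, 1, 1.0, s0, terminal=True)
theta_2 = linear_q_update(theta_1, s1, 0, 0.0, s0, terminal=False)
```

A linear model can only represent Q-values that are linear in the state features, which is a common explanation for why replacing it with a neural network helps on tasks like this.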

MDP & Neural Network Q-value iteration
(PyTorch)

Efficiency: 0.99

→ View the code