Touching
Data Science
Currently, I am working on an AI-powered personal assistant

Below are some things I've already done at MIT
Selected projects
MIT projects for now. Here are the most interesting ones:
Digits recognition
MNIST data set. Single and double overlapping digits
Movie score prediction
Netflix data set. Per-user prediction of 1-5 ✩ scores
AI playing a computer game
The model learns a text game's rules & strategy by playing it
I am experienced with
  • Principal component analysis
  • Convolutional neural networks
  • Gaussian mixture model (EM, kernelization)
  • Collaborative filtering
  • Reinforcement learning (MDP + NN)
Projects 1-2
Digits recognition
Goal: Recognize handwritten digits

Result:
Minimum error rate of 0.002 for recognizing two overlapping digits

Input data:
MNIST Dataset - handwritten single & double overlapping digits
- 60K training examples, 10K test examples

→ View the code

Code access

Due to MIT policies, I cannot grant public access to my project code, but I can show it to potential employers on request.

To request an access password for the code below, please use your business email:
Single digit recognition
(manually coded)
Linear regression
(closed form solution)
Test error: 0.7702
Yep. It gets better.
View the code →
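For illustration only (the project code is not public), a closed-form least-squares fit can be sketched in NumPy as below. The function name `closed_form`, the regularization parameter `lam`, and the toy data are assumptions, not taken from the project. Treating class labels as real-valued regression targets is generally a poor fit for classification, which is consistent with the high test error reported above.

```python
import numpy as np

def closed_form(X, y, lam=0.0):
    """Solve (X^T X + lam * I) theta = X^T y for theta (ridge / least squares)."""
    d = X.shape[1]
    A = X.T @ X + lam * np.eye(d)
    # Solving the linear system is more stable than forming an explicit inverse.
    return np.linalg.solve(A, X.T @ y)

# Toy, noiseless data (hypothetical): the fit should recover true_theta.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_theta = np.array([1.5, -2.0, 0.5])
y = X @ true_theta
theta = closed_form(X, y)
```

With `lam > 0` the same function gives the ridge-regularized solution, which keeps the system well conditioned when features are strongly correlated.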
Multi class SVM
Test error: 0.0819
View the code →
Multinomial (softmax) regression
with gradient descent
Test error: 0.1005
View the code →
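A minimal sketch of multinomial (softmax) regression trained with batch gradient descent, assuming plain NumPy and a small synthetic data set instead of MNIST. The names `fit_softmax`, `lr`, and `n_iters`, as well as the hyperparameter values, are illustrative assumptions rather than the settings used in the project.

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax with the usual max-subtraction for numerical stability."""
    scores = scores - scores.max(axis=1, keepdims=True)
    e = np.exp(scores)
    return e / e.sum(axis=1, keepdims=True)

def fit_softmax(X, y, n_classes, lr=0.1, n_iters=500):
    """Minimize the average cross-entropy loss with batch gradient descent."""
    n, d = X.shape
    W = np.zeros((d, n_classes))
    Y = np.eye(n_classes)[y]              # one-hot labels, shape (n, n_classes)
    for _ in range(n_iters):
        P = softmax(X @ W)                # predicted class probabilities
        grad = X.T @ (P - Y) / n          # gradient of the cross-entropy loss
        W -= lr * grad
    return W

# Synthetic three-class problem (hypothetical stand-in for the digit images).
rng = np.random.default_rng(0)
centers = np.array([[0, 4], [4, 0], [-4, -4]])
y = rng.integers(0, 3, size=300)
X = centers[y] + rng.normal(size=(300, 2))
X = np.hstack([X, np.ones((300, 1))])     # append a constant bias feature
W = fit_softmax(X, y, n_classes=3)
train_error = np.mean(np.argmax(X @ W, axis=1) != y)
```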
Multinomial (softmax) regression with PCA & cubic features
PCA dimensionality reduction: 784 ⤑ 18
Kernel: $\phi (x)^ T \phi (x') = (x^ T x' + 1)^3$

Test error:
0.08520
View the code →
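The two ingredients named above can be illustrated with a short NumPy sketch: a PCA projection from 784 to 18 dimensions, followed by an explicit cubic feature map whose inner product reproduces the kernel $(x^T x' + 1)^3$. The helper names `pca_project` and `cubic_features` are hypothetical, random data stands in for the images, and the full tensor-product map is only one possible choice (a compact map over distinct monomials is smaller but needs multinomial coefficients).

```python
import numpy as np

def pca_project(X, n_components):
    """Center the data and project it onto the top principal directions."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def cubic_features(x):
    """Feature map phi with phi(x) . phi(x') == (x . x' + 1) ** 3."""
    z = np.append(x, 1.0)                           # z . z' = x . x' + 1
    return np.einsum("i,j,k->ijk", z, z, z).ravel() # all third-order products

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 784))     # random stand-in for flattened 28x28 images
Z = pca_project(X, 18)             # shape (50, 18)

a, b = Z[0], Z[1]
lhs = cubic_features(a) @ cubic_features(b)   # inner product in feature space
rhs = (a @ b + 1.0) ** 3                      # kernel evaluated directly
```

For 18 input dimensions this map has 19³ = 6859 entries, many of them repeated monomials, which is why evaluating the kernel directly is usually cheaper than building the features.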
Convolutional Neural Network
Optimized over:
- baseline (no modifications)
- batch size
- learning rate
- momentum
- activation (ReLU / LeakyReLU)

Test accuracy:
0.9902
Double digit recognition
(PyTorch)
Convolutional Neural Network
(PyTorch)

Test error:
< 0.0200
→ View the code
Project 3
Netflix ratings prediction
Goal: Predict ratings for unrated movies

Result: $\Delta \geqslant 2$ error rate 0.17

Input data:

1200 users × 1200 movies
Rating values $\in \{1, \dots, 5\}$
Value $= 0$ for unrated movies



Description:


I was given a data matrix of movie ratings made by users, extracted from the Netflix database. Any particular user has rated only a small fraction of the movies, so the data matrix was only partially filled. The goal was to predict all the remaining entries of the matrix. I approached it by building a Gaussian Mixture Model (GMM) for collaborative filtering and training it with the Expectation-Maximization (EM) algorithm.
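The approach described above can be sketched roughly as follows. This is an illustrative NumPy version, not the project code: it assumes spherical Gaussian components, treats zeros as missing entries, and uses a hypothetical variance floor of 0.25 and a tiny synthetic rating matrix with two user types. Each user's likelihood is computed on observed ratings only, and missing ratings are filled with the posterior-weighted component means.

```python
import numpy as np

def em_step(X, mu, var, pi):
    """One EM iteration for a spherical GMM; zeros in X mean 'not rated'."""
    mask = (X != 0).astype(float)          # 1 where a rating is observed
    n_obs = mask.sum(axis=1)               # number of observed ratings per user
    # E-step: log of pi_k * N(observed part of x | mu_k, var_k * I)
    sq = (((X[:, None, :] - mu[None, :, :]) ** 2) * mask[:, None, :]).sum(axis=2)
    log_p = (np.log(pi)[None, :]
             - 0.5 * n_obs[:, None] * np.log(2 * np.pi * var)[None, :]
             - 0.5 * sq / var[None, :])
    log_norm = np.logaddexp.reduce(log_p, axis=1, keepdims=True)
    post = np.exp(log_p - log_norm)        # responsibilities, shape (n, K)
    # M-step: update each parameter from observed entries only
    pi_new = post.mean(axis=0)
    weight = np.maximum(post.T @ mask, 1e-12)
    mu_new = (post.T @ X) / weight         # X is zero where unobserved
    sq_new = (((X[:, None, :] - mu_new[None, :, :]) ** 2) * mask[:, None, :]).sum(axis=2)
    denom = np.maximum((post * n_obs[:, None]).sum(axis=0), 1e-12)
    var_new = np.maximum((post * sq_new).sum(axis=0) / denom, 0.25)  # assumed floor
    return mu_new, var_new, pi_new, post

# Synthetic ratings (hypothetical): two user types with opposite tastes.
rng = np.random.default_rng(0)
patterns = np.array([[5, 4, 5, 4, 1, 2, 1, 2],
                     [1, 2, 1, 2, 5, 4, 5, 4]], dtype=float)
types = np.arange(60) % 2
X_full = np.clip(patterns[types] + rng.integers(-1, 2, size=(60, 8)), 1, 5)
hidden = rng.random(X_full.shape) < 0.3    # hide roughly 30% of the ratings
X = np.where(hidden, 0.0, X_full)

# Initialize the means from two users, filling their missing entries with 3.
mu = np.where(X[[0, -1]] != 0, X[[0, -1]], 3.0)
var, pi = np.ones(2), np.full(2, 0.5)
for _ in range(20):
    mu, var, pi, post = em_step(X, mu, var, pi)

pred = np.where(hidden, post @ mu, X)      # fill only the missing entries
mae = np.abs(pred - X_full)[hidden].mean() # error on the hidden ratings
```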
    Project 4
    AI playing Text Quest
    Goal: Develop an algorithm that learns to play Text Quest and completes its quests in the most efficient way.

    Result: Efficiency $0.99$ of theoretical maximum

    Methods:

    • Markov Decision Processes
    • Feed forward neural network

    Description

    Text Quest is a text-based game whose space consists of locations. Each location contains a set of interactable objects. The agent can interact with objects in its current location or move to other locations. The agent and the engine interact with each other via text.

    Each turn:

    • the engine provides a text description of the current location and the active quest
    • the agent submits an action in "action" "object" format, for instance, "eat apple"

    The agent's goal: move to a specific location and perform a specific action there to complete the quest as fast as possible.
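To make the learning loop concrete, here is a minimal tabular Q-learning sketch on a hypothetical toy environment: a five-room corridor with the quest goal in the last room. It is not the project code. The project used a linear model and a neural network to approximate Q-values from text, whereas this sketch uses a plain lookup table, and the reward values, `alpha`, `gamma`, and `epsilon` are assumptions.

```python
import numpy as np

N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)                      # index 0: move left, index 1: move right

def step(state, action_idx):
    """Toy environment: small step penalty, reward 1.0 for reaching the goal."""
    nxt = min(max(state + ACTIONS[action_idx], 0), N_STATES - 1)
    done = nxt == GOAL
    reward = 1.0 if done else -0.01
    return nxt, reward, done

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((N_STATES, len(ACTIONS)))
    for _ in range(episodes):
        s = 0
        for _ in range(100):            # cap on episode length
            # Epsilon-greedy action selection
            if rng.random() < epsilon:
                a = int(rng.integers(len(ACTIONS)))
            else:
                a = int(np.argmax(Q[s]))
            s2, r, done = step(s, a)
            # Bootstrapped target: no future value once the quest is complete
            target = r if done else r + gamma * Q[s2].max()
            Q[s, a] += alpha * (target - Q[s, a])
            s = s2
            if done:
                break
    return Q

Q = q_learning()
policy = Q[:GOAL].argmax(axis=1)        # greedy action in each non-goal room
```

The per-step penalty is what pushes the learned policy toward the shortest path, which mirrors the "as fast as possible" objective of the quest.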
      MDP & linear model Q-value iteration

      Efficiency: 0.69
      → View the code
      MDP & Neural Network Q-value iteration
      (PyTorch)
      Efficiency: 0.99
      → View the code
      (c) 2020 BKMK F15