Category Archives: Machine Learning

Model calibration plot

I finally read Nate Silver’s The Signal and the Noise. At the time of its release in 2012, it was a rather unique book. It discussed statistical modeling, Bayes’ theorem, and the art and science of prediction in a way that the general public could follow and understand. A book ahead of its time, and it has held up nicely now that I have read it in 2019.

One of the things the author talks about in the book is weather prediction, and in that chapter he has a short mention of model calibration plots. “Calibration plots? That looks useful!” I thought, and my interest was piqued enough to try it on some of my own models.

When evaluating models we run into mentions of accuracy, F1 score, or confusion matrices. Calibration is not something I see too often, and it turns out it’s a pretty good view into how your model is performing.

In general terms, calibration is a comparison of the confidence of your model with the actual results. If a model is 90% confident in a prediction, what percentage of the time is it actually correct? Does it have “blind spots” where the model is consistently overconfident? A calibration plot can help you spot such trends.

The method to calculate it is pretty straightforward. Here is a snippet of code that illustrates the approach:

import csv
from typing import Dict, Tuple

# model, X, and y are assumed to be the trained classifier, feature matrix,
# and true labels from earlier in the workflow
predictions = model.predict(X)
probabilities = model.predict_proba(X)

# maps a confidence bucket (e.g. 75) to a (correct, incorrect) tally
calibration_map: Dict[int, Tuple[int, int]] = {}

for idx, predicted_outcome in enumerate(predictions):
    true_outcome = y[idx]
    confidence = float(max(probabilities[idx]))

    # use 5% increments for calibration buckets (50, 55, 60, etc)
    calibration_key = int(confidence * 100)
    calibration_key = calibration_key - (calibration_key % 5)

    if calibration_key not in calibration_map:
        calibration_map[calibration_key] = (0, 0)

    wins_losses = calibration_map[calibration_key]
    if predicted_outcome == true_outcome:
        wins_losses = (wins_losses[0] + 1, wins_losses[1])
    else:
        wins_losses = (wins_losses[0], wins_losses[1] + 1)
    calibration_map[calibration_key] = wins_losses

with open("calibration.csv", "w", newline='') as o:
    writer = csv.writer(o)
    writer.writerow(["index", "predicted", "actual", "number_of_games"])
    for pct in calibration_map:
        wins_losses = calibration_map[pct]
        number_of_games = wins_losses[0] + wins_losses[1]
        true_pct = wins_losses[0] / number_of_games
        true_pct = int(true_pct * 100)

        # don't bother with small sample sizes
        if number_of_games > 20:
            writer.writerow([pct, pct, true_pct, number_of_games])

What we are doing above is running through the model’s predictions. For each prediction, round its confidence down to the nearest 5% interval and note whether the prediction was correct. Tally the number of correct vs incorrect predictions and you have the accuracy for each interval. I output this into a CSV to later render with pandas:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

sns.set()

df = pd.read_csv("calibration.csv", index_col="index")
df.sort_index(inplace=True)
df.loc[:,["predicted","actual"]].plot(figsize=(15,10))

Once you run this, you should see something like this:

This is what I see when I do a calibration plot for my NBA model for 2018-2019 games. You can see how the actual values are pretty close to what the model thought they should be, with the biggest drift around the 50% and 85% prediction intervals. Pretty cool!
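As a side note: if your model is a binary classifier, scikit-learn can do the bucketing for you. A minimal sketch, reusing the y and probabilities variables from the snippet above and assuming the positive class is in column 1 of predict_proba’s output (note that calibration_curve expects the positive-class probability rather than the max over classes):

from sklearn.calibration import calibration_curve

# fraction of actual positives vs. mean predicted probability per bucket
# (n_bins=20 gives the same 5%-wide buckets as the manual approach)
actual, predicted = calibration_curve(y, probabilities[:, 1], n_bins=20)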

What are the odds of …. ?

My mind is kind of blown right now. Need to put this into words and write it down. Perhaps others have some input to validate or contradict what I am saying.

I was studying probabilities and going through the topic of binomial distributions. I finished the lectures and went online to see how this information could be useful in the real world. Textbook examples are all about coin flips, and I had a hard time relating the information to real-world scenarios. How often do you need to know the probability of tails coming up twice in three flips? Why does this binomial distribution topic matter?

A cursory search brought more coin examples. The real stuff was starting to show through here and there, starting with this article. Yet nothing that I could relate to or apply directly. Until it hit me: binomial distribution could help explain some of the NBA betting behavior I have been seeing! Let me backtrack to how I got there.

Background

I’ve been running an experiment to see if it would be possible to come out cash positive when betting on NBA games using wagering lines provided online. The short answer is “no”: gambling institutions are sophisticated money-making operations that have no interest in losing money to the common man. Yet the temptation to find an edge is strong; combine that with machine learning and stats analysis, and one can’t help but try.

For three months now I have been simulating NBA bets against the real lines and recording what happens. The first two months resulted in a negative balance; the last month or so has been positive. My initial, money-losing approach relied strictly on a machine learning model I built, and I made bets using moneyline odds. It was a good experience in that I learned that favorites in the NBA lose much more often than a casual fan realizes. I also learned a ton about how simple ML models won’t cut it against the skewed odds that oddsmakers set up against you; a simple stab won’t work.

Bad results did not discourage me from experimenting, and I decided to see what would happen if I used the model only as one of the signals and then relied on my own intuition, since I watch the games and keep up with the teams. I no longer did “moneylines” and instead picked “the spread” bets that pay off anywhere from -115 to -105. I also decided to bet at most three games per day regardless of the number of matchups that day. The three-game choice was arbitrary, but I figured it was a number that forced me to think about the choices harder and pick the games I had the most confidence in. With spread odds, picking two out of three correctly guarantees a positive night if you bet equal amounts on each game. A 66% target rate did not sound too intimidating.

The new strategy has been in place for most of December and all of January. During this time I am winning ~62% of bets and slowly making up for the losses encountered during the first two months. 62% is close to the 66% I was hoping to achieve. I track each bet, and the average odds come out to about -114. Basically, for a $10 bet you win $8.77 ($10 * 100 / 114). The expected value with parameters like this is positive: 0.62 * 8.77 – 0.38 * 10 = 5.44 – 3.80 = +1.64 per $10 bet. All of this is reassuring. I still can’t explain the winning rate with a formula, since intuition is a big part of it. This is concerning because I could just be hitting a lucky streak, but for now I am continuing with the approach.

During all the time the strategy has been running, I’ve had two questions in the back of my mind. My most common outcome when betting three games has been two wins and one loss. That kind of makes sense, since in aggregate I am hitting about a 62% win rate. But I wondered: what are the odds of hitting all three correctly? What about two, one, or zero? I also wondered whether I should keep making three bets a day or increase the number. I did not spend too much time trying to answer those questions and instead focused on getting better data.

Aha! moment

With all the background info out of the way, here is the big realization of the day: betting on games is exactly where the binomial distribution can be applied. The outcome is binary: you either win or lose the bet (I consider a push a win). A historical probability of winning is known from your past results, 0.62 in my case. If you expect to maintain this rate, you can calculate the probability of hitting all three, two, one, or zero bets when you make three picks using the binomial distribution formula! Like it or not, you are rolling a weighted die three times!

Here is the formula for calculating the probability of each specific outcome. Let n be the number of bets, k the number of wins, and Ps the probability of success; then the probability of k wins out of n bets is:

P(k) = n! / (k! (n-k)!) * Ps^k * (1-Ps)^(n-k)

For my example of 3 picks and a 0.62 probability of success:

P(k) = 3! / (k! (3-k)!) * (0.62)^k * (0.38)^(3-k)
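Rather than plugging numbers in by hand, a few lines of Python can compute the whole distribution. A minimal sketch using math.comb from the standard library (rerun it with win_distribution(4, 0.62) for the four-pick case discussed later):

from math import comb

def win_distribution(n_bets, p_win):
    # P(k wins out of n_bets) for every k, via the binomial formula above
    return {k: comb(n_bets, k) * p_win ** k * (1 - p_win) ** (n_bets - k)
            for k in range(n_bets, -1, -1)}

for wins, prob in win_distribution(3, 0.62).items():
    print(f"{wins} win(s): {prob:.1%}")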

And here are the results:

# of wins    probability
3 wins       23.8%
2 wins       43.8%
1 win        26.8%
0 wins       5.5%

The two-win outcome is the most probable; no wonder I am seeing it the most! I have hit zero-win days, but those are rare, which fits with the ~5% probability above. I should be seeing three-win days more frequently than I am, but I am extremely excited at the moment to know what to expect.

I was happily going through the binomial distribution topic for two days, searching for real-world examples, and only after a repeated push to find one did I realize that I have been living this binomial distribution outcome almost every day for the last three months!

I can now also answer the question of whether I should stick to three-game bets or increase the number further. Let’s set n=4; what do we get? Here is the new table:

# of wins    probability
4 wins       14.7%
3 wins       36.2%
2 wins       33.3%
1 win        13.6%
0 wins       2%

In aggregate, I will end up positive for the day only if I win three or four times out of the four picks. The combined probability of that is 50.9%. Compare that with the three-bet approach, where the likelihood of coming out positive for the day is the combined probability of hitting two or three right, which comes out to 67.6%! Should I increase the number of games I bet? The math says no.
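Using the win_distribution sketch from earlier, that comparison is just a couple of lines (assuming equal bets at roughly even spread odds, so a day is profitable only when you win more than half of your picks):

dist3 = win_distribution(3, 0.62)
dist4 = win_distribution(4, 0.62)
print(f"positive day with 3 picks: {dist3[2] + dist3[3]:.1%}")  # ~67.6%
print(f"positive day with 4 picks: {dist4[3] + dist4[4]:.1%}")  # ~51%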

This realization was hugely helpful in explaining the results I was seeing. It also convinces me to stick with the current approach of three bets. I am still not clear if my 62% hit rate is blind luck or if it’s something sustainable, so I will continue to probe further as I learn and run this experiment.

Predicting sports game outcomes

NBA model performance for the last month

If you are a sports fan and someone who uses machine learning, it is just a matter of time before you attempt to use those skills to predict the outcomes of sports games. The temptation is too great: the allure of making money on the side while bettering your skills, combined with working on something interesting outside of writing software, is a recipe for action. The number of articles like this is huge. My goal is to share the lessons I learned and offer tips for others who might want to do a similar project in the future.

It was the fall of 2018 and I was a complete newbie when it came to sports wagering. I knew nothing about sports betting, lines, spreads, etc. It was gibberish I heard on sports podcasts and found amusing, but never really dug into or cared too much about.

The question of “could I build a model that gives accurate predictions?” kept coming back ever since I picked up machine learning two years ago. I’ve built numerous models for work and for play since, and I tossed the idea around but never executed on it. It kept nagging at me until one weekend in the fall of 2018 I decided to jump in.

It took me about two weeks to gather the basic stats, build a training database, and start building models. I then spent the next two weeks simulating bets against real lines posted online, looking for winning strategies. Basically, a month (I worked on this mostly on weekends) before I settled on a model I liked and evaluated it against the real wagering numbers week to week.

Outcome

Let me summarize what was accomplished before I dive into the details of how things were done and what was learned:

  • I built an NFL prediction model that predicted 68% of the games correctly for the 2017 season, and 65% for the 2018 season. Not that impressive; some of the models out there that I had seen were achieving a 70-73% rate, but it was good enough to make money if you used it to wager on a certain class of games.
  • I learned a ton about financial modeling and dissected two financial modeling books, treating NFL games like futures one could purchase and sell. I ended up with a system in place that allowed me to experiment with strategies, evaluate the results, graph the outcomes and see what works and does not work.
  • I learned a ton about gambling and how the system is basically very skewed against the gambler, and how gamblers don’t care and go for it anyway. Hey, sometimes you just want to have fun.
  • The final setup that was built was robust enough to be reused in predicting NBA games with similar overall accuracy but not as useful as the NFL model if one tried to use it for wagering.

Again, 65% might not sound that impressive, but the devil is in the details. I had some models that reached 69-70% accuracy but were not that useful from a money-making perspective. That’s the beauty of a real-world exercise vs a classroom problem. If you are working on a hypothetical model that will never be used in the real world, you will do all kinds of crazy optimizations to achieve the best overall accuracy. But when you go out and try to use it against the real-world data available to you, the best-accuracy model might not be the one that makes you the most money. More on that later.

Key Lessons learned

  1. As is usually the case, the machine learning part was a small part of the project. The amount of time spent on model type selection, feature selection, and training is small compared to the time you spend gathering the data, setting up systems to fetch results automatically, pre-processing it to fit your model’s needs, and searching for winning strategies.
  2. You can’t predict randomness, and searching for perfect accuracy is a futile attempt that probably ruins many who attempt this exercise. I could have spent time trying to optimize the model and gather more data to perhaps bring the accuracy up to the 70%+ range, but I settled on something that was good enough to give me a 28% profit if the money had been spent on real wagering lines.
  3. Evaluate, evaluate, and evaluate your strategies. After using the model for the 2018 season and looking back at the results, I have a feeling I got lucky and should have spent more time on evaluation and on plotting the model’s past results to build confidence in its performance. I went with a bit of blind faith and a gut feeling that the model was good enough, and only retroactively was I able to prove that it indeed was.
  4. Reading financial machine learning books gave me good ideas for the system setup and evaluation. “Building Winning Algorithmic Trading Systems” was fantastic in that regard, and the same can be said about “Advances in Financial Machine Learning”.

Future Posts

I want to dive deeper into how things were done and some key ideas used in building a system to predict the games. In the coming weeks, I will publish more details about the process used to build the model. Then I will dive into model evaluation (the most critical step, if you ask me) and share more lessons and tidbits of knowledge gathered. Depending on how much data I have gathered by then, I will also share the results of the system and how it is doing on NBA game predictions.

Artificial Intelligence Engineer Udacity Degree – Finished

Happy to say that earlier this month I graduated from Udacity’s AI Nanodegree! It was the hardest course I have taken since college, which was some 15 years ago. At the same time, it was a fun challenge that I am happy to see reach a successful end.

I want to quickly summarize the parts of the course I enjoyed the most. Two subjects were awesome: game-playing AI and deep learning.

We learned game-playing agent topics such as minimax, alpha-beta pruning, and various other tree/graph traversal techniques. It was awesome and enjoyable; I just immersed myself during that part of the course. Search algorithms kept me busy and tinkering for weeks: A*, DFS/BFS, the basics that I had to re-learn and am happy that I did. The game-playing agent project was cool to go through, and actually seeing your agent in action was rewarding.

The second term introduced neural networks. I could not recommend the neural network (NN) lectures enough. I went from zero knowledge of NNs to being able to grasp the basics in a very short time, and was able to build several models that returned decent predictions in their domains.

Would I recommend this course overall? It depends on your learning goals. If you want an introduction to the wide variety of topics that AI encompasses, this course is great. The field itself is HUGE. I took it to find out what the field is all about and, equally important, to challenge myself to see if I could finish an intense course like this. I will say I am happy that I did.

But if you want to learn about ML/NNs, this course contains too many topics outside of the ML field. Planning, HMMs, and a few other topics were quite outdated and more important from a historical perspective. They took up a lot of time that I wish I had spent doing ML projects instead.

What’s next? I have enrolled in Udacity’s Machine Learning Nanodegree! It will allow me to close some of the knowledge gaps I have around machine learning and give me an opportunity to continue practicing on more projects that come with the degree. In addition to that, I am looking for ways to apply ML at work and have a few areas where I am trying it out already. Just keep on doing the work and gathering practical experience.

AI Nano Degree update – Term 2, almost done

I am loving Term 2. It starts with the basics of neural networks and an introduction to Multilayer Perceptron networks (MLPs), where you build a basic neural network that you can train and use to predict either categorical (classification) or numerical (regression) values. Its power and limitations are presented through very fun exercises involving IMDB and image data.

MLPs are followed by an introduction to Convolutional Neural Networks (CNNs), by far the most fun I’ve had in this course. CNNs are mostly used for image recognition, and you get to experience building image recognition networks while working on a project to classify dog breeds from a set of provided images. A really cool and fun project. It barely scratches the surface of what’s possible, but you get a very nice intro and then you can branch out on your own.

The CNN intro was followed by Recurrent Neural Networks (RNNs). The application here focuses on language and character prediction tasks, with the project asking you to implement a model capable of generating English words. It was really cool how effective the model was and how it was able to generate English words after just a few hours of toying around with training and parameters. This article http://karpathy.github.io/2015/05/21/rnn-effectiveness/ contains really cool info on RNNs.

After these topics, you reach a point in the degree where you have to select a specialization. I went with Computer Vision, as that is the area I find really fascinating and exciting. The other choices are Natural Language Processing (NLP) and Voice User Interfaces. NLP is probably the most useful, but I am staying away from it since I find everything that has to do with “language” very boring. It’s something I will get over, but right now I want to have a lot of fun while learning, and I know Computer Vision will provide me with both.

The plan right now is to finish up the lectures, get the project done (I hear it’s short, a weekend type of endeavor) and I am done! Can’t wait to graduate!

AI Nano degree update – Finished Term 1

With much relief, I am happy to say that Term 1 (out of 2) is now complete. Since my last update, I went through two more projects. How did it go? Overall I am satisfied with how much I learned, but the ending was rather boring and I couldn’t wait to complete the assignments and move on to the next term.

Project 3, “Implement a planning search,” sounded very exciting. Planning is a big part of AI systems, but the way the topic was presented in the class was just absolutely terrible. It was presented in a strictly academic fashion, and I walked away from the project feeling very dissatisfied. I spent most of my time not on learning and experimenting with the concepts but on deciphering the rather confusing code setup and data structures from the AIMA code. The actual solution was a very straightforward copy/paste exercise that taught you close to nothing.

After thinking about this more, I don’t blame the materials or Udacity per se. The issue is that the subject is so broad that doing it as a 3-week exercise will most likely leave you dissatisfied no matter how you present it. Doing a full project from the ground up, coming up with a planning language, possible states, etc., is an involved exercise. Udacity would have been better off leaving this as an optional part of the course and spending more time on search techniques, or tackling more advanced adversarial search techniques like Monte Carlo Tree Search.

Project 4 was not much different in how much it disappointed. The topic itself was once again sophisticated and exciting: HMMs (Hidden Markov Models). I am not sure how much I learned from the lectures, and I instead relied on other resources to start to understand how one would go about using HMMs. After doing the project, I realized that the lectures were incredibly disconnected from it. One could do the project without looking at the lectures at all. The videos for part 4 should just be scrapped altogether; they add no value.

 

I don’t want to end on a negative note, since overall I enjoyed Term 1. The first two months were incredible and exciting, and I learned a lot. The second project especially, search techniques for solving problems, was by far my favorite and one of the main reasons I took the course. Udacity’s system of Slack rooms and office hours, combined with the projects, was really useful for learning, and I am glad to be able to continue on.

For now, I am taking a couple of weeks off until Term 2 begins. Term 2 focuses on more practical uses of AI and deep learning, culminating in students selecting a single specialization out of three areas: voice user interfaces, natural language processing, and computer vision. I am leaning towards computer vision, as I have always wanted to learn more about how such systems operate, although I do wonder if voice user interfaces would be more useful from a career perspective. Perhaps I will do image processing and then do the voice user interfaces specialization as an add-on. Let’s hope Term 2 starts with a bang like Term 1 did and stays that way!

AI Nano degree update – Project Two

This is the second post in the Udacity’s AI Nano degree series. Taking the course is my attempt at learning and re-familiarizing with AI / CS concepts.

Project #2 tasked the students with implementing an AI agent capable of playing the game of isolation. The only deviation from the usual isolation game was that moves had to follow a chess knight’s pattern. The course provided the board implementation, and we had to write an AI agent capable of searching the board for optimal moves with the goal of defeating an opponent.

Sample isolation board. Visited squares are in the grey/black shade.

Initially, the game itself was not something to get too excited about. But when the implementation of the algorithms started, that’s when the fun picked up! Let’s start with the high-level definition of the problem. Given a board, it is not obvious what the next move should be in order to guarantee a win. One could run simulations to try to find it, but the search space can be too large even for small boards (7×7), making it impossible to solve by brute force alone. Instead, the AI agent should focus on optimizing how it traverses the game tree by doing two things:

  • Come up with a way to evaluate the score for the board positions. A winning board has a score of infinity, a losing board has a score of negative infinity, anything in between should have a score that correctly determines how favorable the board is for winning or losing.
  • Traverse the possible solution tree quickly. You want to throw out boards that are unfavorable to you and prioritize ones known to benefit you.

The first problem, coming up with a way to value a board, means defining a heuristic function for the board position. The strength of the heuristic function is the difference-maker between AI agents. It requires a balance between being accurate and being fast to compute, so that your algorithm can evaluate many boards during a single turn. If you make a heuristic function that is too complicated, the AI agent will be slow and, under rules where time limits are enforced, will lose.

For the second problem, traversing the possible solution tree, there are approaches such as minimax and alpha-beta pruning to optimize your move selection. Mix in iterative deepening search and you have an effective agent.

The heuristic functions I tried out gave a score to a player’s position indicating how many open moves the player had versus the opponent (the more moves over the opponent, the better), combined with how close the player was to the open positions. The idea is not to get trapped. Another variation I tried was staying as far away from the walls as possible. That turned out to be a good heuristic, but not as effective as staying close to the open fields. And lastly I tried a combination of staying away from the walls and staying close to the open fields, and the results were still worse than just staying close to the open fields.
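To make the pieces concrete, here is a minimal sketch of depth-limited alpha-beta search driven by the simple “open moves difference” heuristic. The tiny Board class is a simplified stand-in I wrote for illustration, not the course’s actual knight-isolation implementation; iterative deepening would simply call best_move with increasing depths until the turn’s time limit runs out.

from dataclasses import dataclass, field

KNIGHT_JUMPS = [(-2, -1), (-2, 1), (-1, -2), (-1, 2),
                (1, -2), (1, 2), (2, -1), (2, 1)]

@dataclass
class Board:
    size: int = 7
    players: tuple = ((0, 0), (6, 6))            # positions of player 0 and player 1
    visited: frozenset = field(default_factory=frozenset)
    turn: int = 0                                 # whose move it is

    def legal_moves(self, player):
        # knight jumps that stay on the board and land on an unused square
        r, c = self.players[player]
        return [(r + dr, c + dc) for dr, dc in KNIGHT_JUMPS
                if 0 <= r + dr < self.size and 0 <= c + dc < self.size
                and (r + dr, c + dc) not in self.visited
                and (r + dr, c + dc) not in self.players]

    def apply(self, move):
        # the square being left becomes blocked; the turn passes to the other player
        new_players = list(self.players)
        new_players[self.turn] = move
        return Board(self.size, tuple(new_players),
                     self.visited | {self.players[self.turn]}, 1 - self.turn)

def score(board, player):
    # heuristic: my open moves minus the opponent's open moves
    return len(board.legal_moves(player)) - len(board.legal_moves(1 - player))

def alphabeta(board, depth, alpha, beta, player):
    moves = board.legal_moves(board.turn)
    if not moves:                                 # the side to move is trapped
        return float("inf") if board.turn != player else float("-inf")
    if depth == 0:
        return score(board, player)
    if board.turn == player:                      # maximizing node
        best = float("-inf")
        for m in moves:
            best = max(best, alphabeta(board.apply(m), depth - 1, alpha, beta, player))
            alpha = max(alpha, best)
            if beta <= alpha:                     # prune the remaining siblings
                break
        return best
    best = float("inf")                           # minimizing node
    for m in moves:
        best = min(best, alphabeta(board.apply(m), depth - 1, alpha, beta, player))
        beta = min(beta, best)
        if beta <= alpha:
            break
    return best

def best_move(board, depth=3):
    moves = board.legal_moves(board.turn)
    return max(moves, key=lambda m: alphabeta(board.apply(m), depth - 1,
                                              float("-inf"), float("inf"), board.turn))

print(best_move(Board()))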

All in all, it was a successful implementation that beat the baseline score set out by the project’s creators. Now I am trying to decide what to do with the agent and see what can be added to it so that it can participate in the competition against the agents of other students.

Some observations from going through the exercise:

  • Visualization is king. Visualizing the board positions and move calculations really helped me discover bugs in my implementation. I should always start with visualization when working on problems and go from there.
  • Iterative deepening was somewhat unintuitive at the start. It is amazing how much it helps to find a solution faster without going too deep into only certain parts of the tree.
  • Alpha-beta pruning can be a bit confusing at the start, and you definitely need a few manual, on-paper walkthroughs to see why it is effective.

Some things that I did not implement as part of this exercise that still need to explore:

  • Quiescence search. This one is a mystery and something that I will bring up on the course forums. I have read book materials and other online resources on it, but something tells me it will not fully make sense until I try to implement it.
  • Monte Carlo tree search rollouts. I really want to implement this one and see how it would make the AI agent better. It seems to be a big part of making game-playing AIs effective.

This was fun. On to the advanced search concepts and pacman lab!

AI Nano degree update: Project One

The first project required writing an AI agent capable of solving a Sudoku puzzle. The key goals of the exercise were to familiarize students with the concepts of constraint propagation and search for solving problems.

I would summarize constraint propagation as follows. In the abstract, when you have a function that needs to pick a solution from multiple choices, it can narrow down the answers by applying strategies that eliminate subsets of candidates until a single solution remains, or at least until the number of potential solutions is much smaller than the original solution space.

In the Sudoku project, the three candidate-elimination strategies employed were as follows:

  • Straight-up elimination – find boxes that are solved, and then remove that box’s value from its peers (rows, columns, squares)

  • Only-choice – given a box with multiple possible digits, check each digit to see whether it appears in any other box of one of that box’s units; if not, that digit is the only viable choice for the box

  • Naked twins – sometimes a unit has two boxes with the same pair of candidate digits, which means those two boxes will each end up with one digit from that pair. Those digits can therefore be eliminated from the other boxes of the unit. There are variations of this called naked triplets, and so on. Twins seemed to be the most effective.

Now you can imagine that one can run through the solution elimination sequence in a loop, with each pass applying all the elimination techniques. You stop looping if none of the techniques are reducing the solution space further. You are stalled.

What now? Well, brute-force search. Pick an unsolved box and iterate through its candidate digits, on each pass applying the constraint-elimination sequence until you either solve the puzzle or stall. If you stall, the value you picked was not a good choice; move on to the next one until a solution is found.
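Here is a minimal sketch of that propagate-then-search loop. It is my own simplified reconstruction, not the project’s starter code: boxes are named “A1” through “I9”, each mapped to a string of remaining candidate digits, and the naked-twins strategy is omitted for brevity.

rows, cols = "ABCDEFGHI", "123456789"
boxes = [r + c for r in rows for c in cols]
row_units = [[r + c for c in cols] for r in rows]
col_units = [[r + c for r in rows] for c in cols]
square_units = [[r + c for r in rs for c in cs]
                for rs in ("ABC", "DEF", "GHI") for cs in ("123", "456", "789")]
unitlist = row_units + col_units + square_units
units = {b: [u for u in unitlist if b in u] for b in boxes}
peers = {b: set(sum(units[b], [])) - {b} for b in boxes}

def eliminate(values):
    # remove each solved box's digit from all of its peers
    for box, val in values.items():
        if len(val) == 1:
            for peer in peers[box]:
                values[peer] = values[peer].replace(val, "")
    return values

def only_choice(values):
    # if a digit fits in only one box of a unit, assign it there
    for unit in unitlist:
        for digit in cols:
            places = [b for b in unit if digit in values[b]]
            if len(places) == 1:
                values[places[0]] = digit
    return values

def reduce_puzzle(values):
    # apply the elimination strategies in a loop until progress stalls
    stalled = False
    while not stalled:
        solved_before = sum(len(v) == 1 for v in values.values())
        values = eliminate(values)
        values = only_choice(values)
        stalled = solved_before == sum(len(v) == 1 for v in values.values())
        if any(len(v) == 0 for v in values.values()):
            return None  # contradiction: some box has no candidates left
    return values

def search(values):
    # constraint propagation first, then brute-force search on a stall
    values = reduce_puzzle(values)
    if values is None:
        return None
    if all(len(v) == 1 for v in values.values()):
        return values
    # pick the unsolved box with the fewest candidates and try each digit
    box = min((b for b in values if len(values[b]) > 1), key=lambda b: len(values[b]))
    for digit in values[box]:
        attempt = dict(values)
        attempt[box] = digit
        solved = search(attempt)
        if solved:
            return solved
    return None

# example run: '.' marks an empty cell in the 81-character puzzle string
puzzle = ("..3.2.6..9..3.5..1..18.64....81.29..7.......8"
          "..67.82....26.95..8..2.3..9..5.1.3..")
solution = search({b: (c if c != "." else cols) for b, c in zip(boxes, puzzle)})
print(solution["A1"], solution["A2"], solution["A3"])  # first three solved boxes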

The project was a very fun exercise. I never was much into Sudoku before so at the very least it gave me an excuse to try the puzzles out. It quickly became a fun exercise of finding patterns. And once you mix in writing Python code to solve the problem automatically, it was nothing but a delight.

Here is a screenshot of AI agent in action:

AI agent in action solving a Sudoku puzzle

To generalize the idea: when you have a function with a set of possible solutions to choose from, think through how you could constrain the possible solutions to reduce the search space. Then brute-force search through what remains until the answer is discovered.

Besides being a great warm-up for search, the first project also gave me a great intro to Anaconda, the “Leading Open Data Science Platform Powered by Python.” Think of it as a Python environment loaded with data science libraries and tools. If that is not enough, it can “containerize” your Python environments so they are entirely isolated and portable across machines/platforms. You can set up a Python 2 environment and a Python 3 environment on the same machine, and neither will impact the other. And it comes pre-installed with a variety of data/ML related packages.

On to the second project, which goes much deeper into general AI ideas around search and advanced game-playing techniques. I am done with project two and should have a write-up for it shortly.

Machine Learning update – Feb 2017

In the last update on my machine learning journey, I had just finished the Udacity’s intro and started with the Coursera / Stanford Intro to Machine Learning. I am happy to say that this course is now complete as well!

It feels slightly surreal to reach this point. When I first set up my plan for ML, Coursera’s course was something I had marked as challenging and a “maybe, if time allows.” The reviews and the feedback mentioned how great the course was, but also that many people seemed to drop off at the neural network chapters. Essentially, I had my doubts about being able to finish it on time while doing it part time. There is no time limit on the course, and you can transfer to the next cohort, but I wanted to make sure I finished in the same session I started. Once you start delaying an online course, there is a chance you will delay it indefinitely.

In retrospect, the course was indeed challenging but not as bad as I expected. The hardest part was getting comfortable with the Octave environment and translating lecture notes and formulas into their matrix equivalents. I am quite happy that I stuck with it to the end, and with a 100% grade to boot.

If I had to compare the intro courses from Udacity and Coursera, I would still recommend Udacity to start and then use Coursera to augment and deepen the understanding of the basics. I had quite a few “aha!” moments when taking Coursera’s course, but Udacity makes ML more practical and attainable. I thought it demystified the Machine Learning field. After taking the course, you see the application opportunities and the landscape which you should further study. Perhaps best is to combine the two classes – they are drastically different – and learn and compare the concepts between the two.

What’s next? On Feb 16th I am starting the AI Engineer Nanodegree on Udacity. The same feeling again: a bit daunting and challenging. Hopefully, I will hang in there and power through it. I will be sure to post updates as I go.

Before the course starts, I am taking a quick detour back to stats and statistical analysis, to make sure I grasp the basic concepts of analyzing data. I am trying to go deeper into kernels and data sets on kaggle.com, familiarizing myself with the pandas framework, etc. Basically having fun before the AI degree ruins it all.


A start of machine learning journey

Early in September, I started taking a course on learning. Essentially it is a course on how to be a better learner. Learning about learning might sound silly, but it was a great course with many great strategies to employ when trying to master new material or acquire new skills.

As part of the course, we had to pick a project that we will use to apply the techniques we were learning. The concept made a lot of sense. In my experience, the best way to learn a practical skill is to combine the theoretical knowledge with practical work, so the project seemed very appropriate.

My project was to take and finish a course on Machine Learning. I knew close to nothing in this area, and it is a field that is hot in software engineering. ML being new to me, it gave me a chance to program using techniques that were completely unknown to me, which makes things a bit frustrating but also a lot of fun.

The course I went with was Udacity’s Intro to Machine Learning. So far the intuition to pick that course is proving correct. I finished it ahead of my planned schedule: it took me just a tad over two months, while mostly studying on weekends and the occasional early morning.

The most fun part was applying the skills learned from the course at my current job. We do some video processing tasks, and it has been notoriously tricky to estimate how long they will take to complete. With ML, and more specifically regression analysis, it was a breeze to build a model that gave excellent predictions of processing durations. Some of the predicted times were within seconds of the actual times, most within minutes, which was more than sufficient when you consider the processing could last anywhere from 20 to 40 minutes (with some outliers, of course, shorter or much longer).

My goal was to apply the techniques in some capacity by February 2017, and being able to do that so much earlier was a big boost and motivator to keep going strong. Actually, one of the learning course’s main preaching points was to use what you are learning right away, even if you don’t feel like you know what you are doing. It strengthens the knowledge and immediately deepens your grasp of the concepts you are learning.

I highly recommend Udacity’s course for others who might like to start an ML journey themselves. It is not very heavy on theory, although one should use the topics presented to dive into more theory online. The examples and mini projects are really great, interesting, and informative. If you know a bit of Python, you are pretty much ready to go. Knowing some of the more advanced math concepts helps you understand the course better, but it is not necessary in order to use the techniques.

What’s next? I am happy to share that I got accepted into the AI Nanodegree. It does not start until February 2017, so in the meantime I am taking another ML intro course, this time Stanford’s Machine Learning course, which I had debated taking before picking Udacity’s option. The Stanford course is great, but presents much more theory and is a bit drier and more pedantic. I am on week 3 of the course now, getting as far ahead as I can before the Neural Network weeks. That area might prove to be very complicated, so it will be good to have as much cushion as possible for the quizzes and learning.

 

That’s it for now. I can’t wait to see where this ML journey will bring me. Hopefully, I will continue to deepen and strengthen my practical skills and start applying them in everyday life with regularity.