Deep Learning Nanodegree Review
recently i completed a nanodegree from udacity that teaches deep learning techniques for various problems (from sentiment analysis, to image classification, to generative models). i wanted to do a quick overview of the course, share my thoughts at each stage, and finish with my overall recommendation on whether or not it was worth it.
overall i thought udacity gave a good, gentle introduction to the course that anyone with basic programming experience could pick up. there was a refresher on anaconda, jupyter notebooks, and pandas, all powerful and commonly used data science tools. there was no project for this section, but there was an exercise on linear regression, which is in many ways the simplest form of machine learning model: it finds linear relationships between data points and outcomes. there were a lot of graphs to help you visualize how the model fits simple linear data, with failed regressions on nonlinear data for contrast. they also had a cool exercise where you could download some code and apply style transfer to images, which was a fun way to get students stoked about keeping their pace up.
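to make the linear regression idea concrete, here is a minimal sketch in plain python. this is not the course's exercise code: the `fit_line` helper and the toy data are made up for illustration. it fits a line to perfectly linear data, then to curved data, where a straight line can't follow the shape.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# toy linear data (hypothetical): y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
linear_ys = [2.0 * x + 1.0 for x in xs]
slope, intercept = fit_line(xs, linear_ys)

# toy nonlinear data: y = x^2, which no single straight line can match
curved_ys = [x ** 2 for x in xs]
c_slope, c_intercept = fit_line(xs, curved_ys)
residuals = [y - (c_slope * x + c_intercept) for x, y in zip(xs, curved_ys)]
```

on the linear data the fit recovers the slope and intercept, while on the curved data the residuals stay large no matter which line you pick.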
neural nets -
this section was a great review of how to construct a neural net. they had refreshers on matrix math (which are always welcome) and other useful discrete/logic math. the exercises in this section were really good at building a solid foundation for understanding neural nets. they make students implement gradient descent 3 times, which is about how many times you need for it to sink in. udacity introduces students to model evaluation (choosing between two or more models) and even had us implement a simple neural net to do sentiment analysis (with andrew trask, who is a fantastic teacher). making students build their nets in numpy is a good way to make sure they understand the underlying mechanics and matrix math that happens from layer to layer in a neural net. the intro to tensorflow was by far the best introduction to tensorflow i’ve seen or done yet. they make you build a simplified version of the tensorflow library (miniflow). the mindset behind graph computation is hard to understand sometimes. implementing forward and backward propagation through the miniflow computation graph made tensorflow seem really approachable, and it’s not something i’ve seen anywhere else.
the project here was to implement most of the parts of a neural network in numpy to predict bike rides. they also had us implement the backpropagation algorithm again. interestingly enough, i solidified my understanding of how backprop works by doing this project. a lot of people, even ml engineers, use backprop without understanding very well what it does, and this project cures that disease. i actually messed up my neural net by putting an activation on the last layer, which was wrong because this was a regression problem where you want the raw output of your net (without the non-linearity). the lessons didn’t show examples of regression nets (or at least didn’t talk much about the difference); all the nets were classifiers. the fact that students had to make this connection themselves was extremely valuable for driving home what the net is: a model whose outputs you control. even though i’ve used nets before, i now always double check my activations when i’m making my own nets, to make sure the transformations they produce are what i want.
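here is a rough sketch of the graph computation idea: each node knows how to compute its value (forward) and how to pass gradients back to its inputs (backward). this is my own toy version, not miniflow's actual code. the class names `Input`, `Mul`, and `SquaredError` and all the numbers are assumptions for illustration. it also runs plain gradient descent on a single weight.

```python
class Input:
    """Leaf node holding a value; the backward pass fills in its gradient."""
    def __init__(self, value):
        self.value = value
        self.grad = 0.0

    def forward(self):
        pass  # nothing to compute for a leaf

    def backward(self):
        pass  # nothing upstream to pass gradients to


class Mul:
    """Computes a * b and routes gradients back to both inputs."""
    def __init__(self, a, b):
        self.a, self.b = a, b
        self.value = None
        self.grad = 0.0

    def forward(self):
        self.value = self.a.value * self.b.value

    def backward(self):
        self.a.grad += self.grad * self.b.value
        self.b.grad += self.grad * self.a.value


class SquaredError:
    """Loss node: (pred - target)^2."""
    def __init__(self, pred, target):
        self.pred, self.target = pred, target
        self.value = None
        self.grad = 1.0  # d(loss)/d(loss)

    def forward(self):
        self.value = (self.pred.value - self.target) ** 2

    def backward(self):
        self.pred.grad += self.grad * 2.0 * (self.pred.value - self.target)


x = Input(3.0)
w = Input(0.0)  # the weight we want to learn
pred = Mul(x, w)
loss = SquaredError(pred, target=6.0)
graph = [x, w, pred, loss]  # already in topological order

learning_rate = 0.05
for _ in range(50):
    for node in graph:
        node.grad = 0.0
    loss.grad = 1.0
    for node in graph:            # forward pass
        node.forward()
    for node in reversed(graph):  # backward pass
        node.backward()
    w.value -= learning_rate * w.grad  # gradient descent step
```

with these numbers the weight settles at 2, since 3 * 2 = 6 matches the target.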
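a small sketch of the output activation mistake, in plain python instead of numpy. the weights, inputs, and the `forward` helper are made up for illustration and are not the project's code. the only difference between the two calls is whether the last layer's raw value gets squashed by a sigmoid.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, w_out, output_activation=None):
    """One hidden sigmoid layer, then a single output unit."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    raw = sum(w * h for w, h in zip(w_out, hidden))
    # regression: return the raw value; otherwise apply the given activation
    return raw if output_activation is None else output_activation(raw)


# hypothetical weights and input features
x = [0.5, -1.0]
w_hidden = [[1.0, 2.0], [-1.5, 0.5]]
w_out = [120.0, 300.0]

regression_output = forward(x, w_hidden, w_out)         # free to take any value
squashed_output = forward(x, w_hidden, w_out, sigmoid)  # can never exceed 1
```

a target like a ride count can be far above 1, so the squashed version can never reach it no matter how the weights are trained.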
convolutional nets -
the section on convolutional nets was really good. convolutional nets are basically normal neural nets that map a feature from somewhere in the image to a neuron in a feature map, where the weights between the pixels and the neuron are the values in the filter. it’s like passing a filter over multiple parts of an image, looking for certain pixel patterns. i thought i had a good grasp before (and i was using conv nets) and realized there was a lot i didn’t understand as clearly as i thought i did. students will obtain a solid understanding of how a conv net learns visual features, how a net learns more complex features from previous feature maps, and how it takes convolutional features and then learns the final classes. the whole section is taught in tensorflow, which is really great. by the end of this section i understood how tensorflow works and how to use it to build nets.
the project here was to make a convolutional net to classify images from the cifar-10 dataset. it wasn’t super interesting, but it was good practice in tensorflow and you had to do a little bit of hyperparameter tweaking to get past the grader’s threshold of 50% accuracy (which is much better than random, since with 10 classes random guessing gets about 10%). my neural net made it into the 50s, but my reviewer actually told me to rework my conv net because he thought i could get better performance. i thought this was cool because it showed us that even when we thought our model was good enough, there were other ways to get higher performance. i ended up with an accuracy of 65%, which was a significant improvement on my original submission. overall i was very satisfied with the section and thought it was a good overview of convolutional nets.
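the "filter passed over the image" picture can be sketched in a few lines of plain python. this is an illustration only: the `convolve2d` helper, the tiny image, and the hand-picked filter are assumptions, and a real conv net learns its filter values instead of having them written by hand. like most deep learning libraries, it slides the filter without flipping it.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and
    return the resulting feature map as a list of rows."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0.0
            for di in range(k_h):
                for dj in range(k_w):
                    # the filter values act as the weights between pixels and neuron
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        feature_map.append(row)
    return feature_map


# toy 4x4 image: dark on the left, bright on the right
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# hand-written filter that responds to dark-to-bright vertical edges
vertical_edge = [
    [-1, 1],
    [-1, 1],
]
feature_map = convolve2d(image, vertical_edge)
```

the feature map only lights up in the middle column, which is exactly where the edge sits in the image.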
recurrent neural nets -
everything up to this section was technically review for me (even though i solidified my understanding of many methods/tools/details). i was least satisfied with this section on recurrent neural nets (rnns). i found the approach very high level. i would have liked less content but more in-depth explorations of that content (like exploring the stored state values at each step of an rnn). students learn about rnns, lstms (long short term memory) and seq2seq (sequence to sequence) models and implement a few of these models in the exercises at a very high level. students will understand more about rnns but may still have a weak understanding of many details. they covered a lot of important topics, especially related to natural language processing (nlp). if you are interested in nlp this is probably going to be the most exciting section. this section also covered word embeddings (which are just learned, low-dimensional vectors of weights for each word) as well as how to choose and search for hyperparameters. these were probably the most applicable topics from this part of the course.
this section had two projects, and both were generative models (models that make a thing instead of predicting something). the first one was to use tv scripts from the simpsons to generate novel conversations with an lstm network. it was a decent overview of processing raw text data into embeddings and then training a model on these embeddings to find the relationships between them. i very much enjoyed watching my little algorithm generate nonsense conversations between lisa, moe, and homer. again the project was all in tensorflow, so it was a good overview of some high level tensorflow rnn tools. here is an example script my lstm generated; i’ll let you decide if it’s funny:
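this is the kind of state exploration i was hoping for, sketched in plain python. it is a vanilla tanh rnn cell, not an lstm and not course code. the `rnn_step` helper and all the weights are made up for illustration. the input is nonzero only at the first step, so anything left in the state afterwards is memory.

```python
import math


def rnn_step(x_t, h_prev, w_xh, w_hh, b_h):
    """One vanilla rnn step: h_t = tanh(W_xh x_t + W_hh h_prev + b_h)."""
    h_t = []
    for i in range(len(h_prev)):
        z = b_h[i]
        z += sum(w * x for w, x in zip(w_xh[i], x_t))
        z += sum(w * h for w, h in zip(w_hh[i], h_prev))
        h_t.append(math.tanh(z))
    return h_t


# hypothetical weights: 1 input feature, 2 hidden units
w_xh = [[0.5], [-0.3]]
w_hh = [[0.1, 0.4], [-0.2, 0.3]]
b_h = [0.0, 0.0]

sequence = [[1.0], [0.0], [0.0]]  # signal only at the first timestep
h = [0.0, 0.0]
states = []
for x_t in sequence:
    h = rnn_step(x_t, h, w_xh, w_hh, b_h)
    states.append(h)

for t, state in enumerate(states):
    print(t, state)
```

printing `states` shows the hidden values staying nonzero at steps 1 and 2 even though the inputs there are zero, which is the state carrying information forward.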
moe_szyslak:(snaps fingers, inspired) hey, how about uncle moe's family feedbag?
homer_simpson: i want you to meet the springfield.
homer_simpson: hey, i created this.
barney_gumble: drinks all around!
homer_simpson: what's with the crazy getup!
moe_szyslak: wait a minute...(to moe) pardon me? i'm all alone!
lenny_leonard: it's too late to turn back, moe. we've exchanged for the first time with the world of my life. bart's, why don't you slap him some payback?
homer_simpson: revenge? on mr. x were here.
moe_szyslak: here you go, homer. a hundred bucks says".
moe_szyslak:(furious) you callin' my one of my?
barney_gumble: you know, i heard of a new reality show where they...(sobs)
lisa_simpson:(bursts in) moe, my family's gone, my dog hates me, and i can't stand do better.
moe_szyslak:(annoyed) hey, come on, there's your picture on the front of my youth
the second project was to create a seq2seq model for translating certain types of phrases from english to french. again i felt like this was very high level and i would have really liked more depth. here is an example of a translated sentence:
Input Word Ids: [52, 190, 203, 223, 13, 162, 179, 197, 187, 190, 210, 135, 82, 174]
English Words: ['france', 'is', 'never', 'cold', 'during', 'september', ',', 'and', 'it', 'is', 'snowy', 'in', 'october', '.']
Prediction Word Ids: [291, 254, 288, 94, 89, 221, 163, 158, 216, 205, 69, 355, 221, 64, 155, 1]
French Words: ['france', 'ne', 'fait', 'jamais', 'froid', 'en', 'septembre', ',', 'et', 'il', 'est', 'neigeux', 'en', 'octobre', '.', '<EOS>']
in summary, this section on rnns has good content and touched on pretty much all the core concepts of rnns and nlp, but i wish we had gone into more depth about what happens to the ‘state’ inside an rnn at each timestep. nlp is interesting but it’s never captivated me in the same way other topics do. that’s probably one of the reasons i felt a bit ambivalent towards this section.
generative adversarial networks -
this section was pretty fun. we went over the basic concept of generative adversarial networks (gans), a type of network with two neural nets that compete against each other to get better. we also covered deep convolutional gans (dcgans) and using gans for semi-supervised learning. this section touches the bleeding edge of machine learning, and that was really neat to see in what is essentially an intro course. the lesson i really appreciated was the one on batch normalization (batchnorm). they showed that batchnorm is effective, with graphs comparing model training speed and quality with and without it. they explained how batchnorm works at a high level, as well as in depth (batch and population means, variances, gamma, bias, etc). it was really cool to understand this tool.
the project for this section was to make a gan that would generate realistic human faces. it is very difficult to optimize and tune hyperparameters on gans, and i spent a lot of time reading papers and forums to try to figure out the best kernel size to use, as well as how deep to make each network, which one to make deeper, etc. one thing i found very disorienting is that the loss function loses its meaning in a lot of ways when training a gan. in gan training, if your loss goes down it could be good, or it could mean very little in this training round. the generator loss falls and rises as your generator and discriminator compete against each other, one trying to generate a fake image and the other trying to determine if an image is fake. over time the generator should gain an advantage and start to produce faces your discriminator cannot tell apart from a real human face. overall the face generation gan was an excellent and challenging final project, a fitting end to a good course.
there was an extra component to the course that covered some topics like tensorboard and reinforcement learning (rl). tensorboard is neat and it was nice to have some videos about it. it’s a visualization tool for seeing what your tensorflow graph computation actually looks like and where data goes at each step. it’s useful for debugging and i’m glad it was offered as extra course material. the other cool thing in this section was a quick intro to rl with the cartpole game from openai. udacity implemented a q-learning network to play cartpole, and students can toy with different hyperparameters and network architectures and explore several possible solutions to the game.
the course is worth it if you are a beginner. if you are at an intermediate level and want to review or get more details about deep learning methods, the course is also helpful. my recommendation: yes, do the course, it’s worth the money ($300-600 depending on promotions, etc), especially if you are a beginner and you want to start in deep learning. overall i’d rate the course a solid 9/10. this is a very good course and introduction to deep learning.
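for reference, the batchnorm pieces mentioned above (batch vs population statistics, gamma, bias) fit in a short sketch. this is plain python for a single feature, not the course's tensorflow code. the function names, the `momentum` and `eps` defaults, and the use of `beta` for the bias are my assumptions.

```python
import math


def batchnorm_train(batch, gamma, beta, running_mean, running_var,
                    momentum=0.9, eps=1e-5):
    """Training mode: normalize with the batch's own mean and variance,
    and update the running (population) estimates used at inference."""
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n
    normalized = [(x - mean) / math.sqrt(var + eps) for x in batch]
    out = [gamma * x_hat + beta for x_hat in normalized]  # learned scale and shift
    new_running_mean = momentum * running_mean + (1 - momentum) * mean
    new_running_var = momentum * running_var + (1 - momentum) * var
    return out, new_running_mean, new_running_var


def batchnorm_infer(x, gamma, beta, running_mean, running_var, eps=1e-5):
    """Inference mode: use the population estimates, not batch statistics."""
    return gamma * (x - running_mean) / math.sqrt(running_var + eps) + beta


batch = [1.0, 2.0, 3.0, 4.0]  # toy activations for one feature
out, running_mean, running_var = batchnorm_train(
    batch, gamma=1.0, beta=0.0, running_mean=0.0, running_var=1.0)
single = batchnorm_infer(2.5, 1.0, 0.0, running_mean, running_var)
```

with gamma at 1 and beta at 0 the training output has roughly zero mean and unit variance, and the running estimates move a little toward the batch statistics on each call.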
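a sketch of why the loss numbers are hard to read. it assumes the common setup where both nets are scored with binary cross-entropy on the discriminator's output probabilities, and the generator uses the "fool the discriminator" (non-saturating) form of the loss. the project code may differ, and the helper names and probabilities below are invented.

```python
import math


def bce(prob, label):
    """Binary cross-entropy for one predicted probability."""
    eps = 1e-12
    prob = min(max(prob, eps), 1 - eps)  # avoid log(0)
    return -(label * math.log(prob) + (1 - label) * math.log(1 - prob))


def gan_losses(d_real_probs, d_fake_probs):
    """d_*_probs: discriminator's probability of 'real' for real / generated images."""
    d_loss = (sum(bce(p, 1.0) for p in d_real_probs) / len(d_real_probs)
              + sum(bce(p, 0.0) for p in d_fake_probs) / len(d_fake_probs))
    # generator wants its fakes to be labeled as real
    g_loss = sum(bce(p, 1.0) for p in d_fake_probs) / len(d_fake_probs)
    return d_loss, g_loss


# early training (hypothetical): discriminator easily spots the fakes
early_d, early_g = gan_losses([0.9, 0.8], [0.1, 0.2])
# later (hypothetical): discriminator is reduced to guessing
later_d, later_g = gan_losses([0.5, 0.5], [0.5, 0.5])
```

in this toy case the generator loss falls while the discriminator loss rises, so one number going down just means the balance shifted. it doesn't tell you on its own that the images got better.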
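the core of q-learning is a single update rule. here is a tabular sketch of it in plain python, which is a simplification: the course used a neural network to approximate the q-values instead of a table. the `q_update` helper, the states, and the numbers are made up for illustration.

```python
def q_update(q_table, state, action, reward, next_state, done,
             alpha=0.1, gamma=0.99):
    """Move Q(state, action) toward reward + gamma * max_a Q(next_state, a)."""
    best_next = 0.0 if done else max(q_table[next_state])
    target = reward + gamma * best_next
    q_table[state][action] += alpha * (target - q_table[state][action])
    return q_table[state][action]


# hypothetical table: 2 states, 2 actions (think push left / push right)
q_table = {0: [0.0, 0.0], 1: [1.0, 2.0]}
new_value = q_update(q_table, state=0, action=1, reward=1.0,
                     next_state=1, done=False)
```

`alpha` (learning rate) and `gamma` (discount factor) are the kind of hyperparameters you get to play with in the exercise.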
if you can’t afford it there are free/cheaper options to consider:
fast ai mooc
andrew ng coursera
another udacity ml
another udacity nanodegree ($200)
choosing one of these options may be a good idea, but your progress may be slower and you may learn less deep learning than if you did udacity’s deep learning nanodegree.
there are videos throughout the course by siraj raval, an educator and youtuber who makes really great videos about machine learning. i wanted to focus on udacity’s content in this review, but i highly recommend you go check out siraj’s video series as well. all the videos are available for free on youtube and he will teach you how to practically use many machine learning tools.
the course forums were really useful and a very cool way to interact with students/teachers. i wanted to keep this writeup about the actual course content though.
thanks for reading, if you enjoyed check out my website: