Survey of Machine Learning In Breadth and Depth: Part 1
This TL;DR article is my personal summary of machine learning methodologies, targeted at engineers who want to understand them and put them to practical use.
The Hype and Messes
Machine learning is not something magical; like anything else, it is rooted in continuous practice, evolution and common sense. In general, it is a way to summarize observations (from data, by algorithms, bottom-up or top-down) into predictions. Learning by machine is much like how humans summarize knowledge (by remembering, guessing, inferring or deducing) from reading, watching or experience.
But different researchers look at things from different angles. They come from different backgrounds (mathematicians vs. computer scientists vs. EE people) and like to brand their discoveries differently, or simply do not know about similar work discovered independently (e.g. logistic regression and maximum entropy are essentially the same mathematically, and both are commonly used as the activation function for the final prediction in Deep Learning). The result is a big mess of naming that makes learning "machine learning" hard, because of all those terminology and methodology islands.
The Master Algorithm gives a good, though still oversimplified, summary of those lands from their different historical angles, depicted in the following graph. You don't need to remember all the terminology; just get an overview of the different ways of achieving similar or identical goals. The graph splits the circle into three inner layers: Representation, Evaluation and Optimization, which are three important parts of machine learning; each slice (one of the four) is a different methodology, named in the outermost layer.
This summary gives a good historical view of the machine learning world.
I'll use the least familiar tribe, the Symbolists, as an example here. It may be the oldest approach: use a formal representation (like a special math language) to describe things as logic, then use that logic to do logical computation. This looks a lot like a programming language (and we do have one of its kind: Prolog), but unlike a generic programming language that describes procedures or functions, such an AI language describes things/objects and the relationships among them using symbols with mathematically defined meanings and interactions. For example, you could define transitive relationships between humans (a parent's parent is an ancestor of the child), or state that animals can eat and drink and humans are animals, so humans can eat and drink as well. This looks like inheritance in a programming language, but these first-order logics are more expressive than inheritance: they are designed to describe the world with simple symbols and operations among them, just like math, so the result is a Model that can be used for prediction rather than a code module used for execution. A model comes with a solver (optimizer) to find solutions from the model and an evaluator to measure how well it performs against the truth (a golden set). This approach is not totally outdated: NLP (Natural Language Processing) still uses it a lot, and the latest research around Ontology or Knowledge Graphs can still be categorized as a Symbolist approach, just with restricted logics or hybrid ways of resolving.
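To make this concrete, here is a minimal sketch in plain Python (not an actual logic language like Prolog) of how a transitive ancestor relationship can be derived from parent facts; the names and facts are made up purely for illustration.

```python
# A tiny Symbolist-flavoured example: facts plus one rule, queried by simple recursion.
# A real logic language (e.g. Prolog) would express the rule declaratively instead.

# Facts: person -> list of that person's parents
PARENT = {
    "alice": ["bob"],      # bob is a parent of alice
    "bob": ["carol"],      # carol is a parent of bob
}

def ancestors(person):
    """Rule: an ancestor is a parent, or an ancestor of a parent (transitive closure)."""
    found = set()
    for p in PARENT.get(person, []):
        found.add(p)
        found |= ancestors(p)
    return found

print(ancestors("alice"))  # {'bob', 'carol'} -- carol is derived via the transitive rule
```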
I hope I didn't go too far here. Let's get back to basics first.
Data Preparation
Machine learning needs data. No matter which approach you use, getting, understanding and preparing your data is always the first thing to do.
Different algorithms demand data in different formats or forms. The major component of machine learning data is a series of records, with optional labels on each (or some) of the records. The label is the prediction target; again, it is optional, not only for training (we'll talk more about unsupervised learning), but also for prediction (we'll talk more about regression).
A record can be structured, like a list of fields with values (nested structures usually need to be flattened first), or unstructured, like a blob of text (e.g. a sentence or document) or binary data (e.g. audio, an image). These records are usually supposed to be in the same form; if not, you may need to translate them into the same form first. In the machine learning world we normally call a record an Instance: each record is an instance of something we want to learn from or need a prediction for.
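As a small illustration (the field names are hypothetical), a nested record can be flattened into a flat instance, with the label kept separately and possibly absent:

```python
# One structured record flattened into a flat instance; the label is optional.
raw_record = {"age": 34, "country": "DE", "visits": {"last_week": 3, "last_month": 11}}

instance = {
    "age": raw_record["age"],
    "country": raw_record["country"],
    "visits_last_week": raw_record["visits"]["last_week"],
    "visits_last_month": raw_record["visits"]["last_month"],
}
label = 1  # e.g. "converted"; simply absent for unlabelled data

print(instance, label)
```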
To prepare your data, you need to split it for the different stages; each split is called a set, and the set used in a given stage is called that stage's set. e.g. the Training Set is the split of your data used for the Training stage.
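For example, here is a minimal split sketch, assuming scikit-learn is available and using an arbitrary 80/10/10 ratio:

```python
from sklearn.model_selection import train_test_split

# Toy records and labels, just to show the mechanics of the split.
X = [[i] for i in range(100)]
y = [i % 2 for i in range(100)]

# Carve out the training set first, then split the remainder into validation and test.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.2, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # 80 10 10
```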
OK, we'll pause here on data preparation; before you understand more, we can't dive deeper into it. This is just to give you context about data and its splits (set separation).
Basic Steps of Machine learning
Learning always starts with Training (training could mean automatically summarizing from labelled data, or being guided by human-coded rules or heuristics, as in the Symbolist approach). After training, you want to know how well it has learned, so you do some Test. But test itself is a confusing term: we use evaluation for tests aimed at model tuning, and test for prediction on unseen data. Thus Test itself splits into two steps: Evaluation and Test. Isn't it confusing?
A practical full cycle of good machine learning would be an iterative approach:
Training -> Validation -> Adjust -> Training -> Validation -> Adjust -> Training ... -> Validation -> Test
This is a typical cycle for building a good enough model through multiple rounds of (Training, Validation, Adjust), followed by a final Test on the unseen test set. Then you come back to this cycle some time later to renew the model.
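In code, one pass through that cycle might look like the following sketch, assuming scikit-learn and using the regularization strength of a logistic regression as the thing being "adjusted"; the data and candidate settings are toy placeholders, not a recipe.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Toy data, split into train / validation / test as in the earlier sketch.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

best_model, best_score = None, -1.0
for c in [0.01, 0.1, 1.0, 10.0]:                                           # Adjust: new setting each round
    model = LogisticRegression(C=c, max_iter=1000).fit(X_train, y_train)   # Training
    score = model.score(X_val, y_val)                                      # Validation
    if score > best_score:
        best_model, best_score = model, score

print("test accuracy:", best_model.score(X_test, y_test))  # Test: unseen data, touched once
```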
Although many people skip steps by using cross-validation on a single dataset, in real-world applications the cycle above is still oversimplified if you want competitive performance. The data may follow different distributions across datasets (with different characteristics, e.g. general audio clips vs. audio clips recorded inside a car), and the splits need to account for those distributions. The following is a typical layout described by Andrew Ng that you can use as a cheat sheet (here, the dataset used for validation purposes is called the Dev set; people also call it the validation set):
A detailed description is here (the link points to the relevant position for this context, but the full video is still recommended):
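As a hedged sketch of the different-distribution idea (the clip names and counts are made up): the dev and test sets are drawn only from the target distribution, e.g. in-car clips, while the much larger pool of general data mostly feeds training.

```python
import random
random.seed(0)

# Hypothetical clip ids: lots of general audio, far fewer in-car clips (the target distribution).
general_clips = ["general_%d" % i for i in range(100000)]
in_car_clips = ["in_car_%d" % i for i in range(5000)]
random.shuffle(in_car_clips)

# Dev and test come purely from the target distribution...
dev_set = in_car_clips[:2000]
test_set = in_car_clips[2000:4000]
# ...while training combines all the general data with the remaining in-car clips.
train_set = general_clips + in_car_clips[4000:]

print(len(train_set), len(dev_set), len(test_set))  # 101000 2000 2000
```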
Classification of Machine learning
The intuitive classification most people immediately come up with is Supervised vs. Unsupervised learning, where supervised means the prediction target is labelled for learning purposes. This is a good dimension to start with, but it is not the only dimension along which to classify. And as always, even within this dimension there are cross-boundary cases: the so-called Semi-Supervised learning. The "semi" can mean many things as well. It could be:
- Use labelled data as a seed for supervised training, then iterate in an unsupervised way to process data at a much larger scale (see the sketch after this list)
- Or, use an unsupervised approach to prepare labels at scale (producing rough predictions that may not be as good), then do supervised training to get better predictions.
- Or, take a model built from another dataset or for another purpose, apply it to the new dataset and keep training (also called fine-tuning, or sometimes transfer learning).
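The first bullet is roughly what is often called self-training; a minimal sketch, assuming scikit-learn and an arbitrary confidence threshold, could look like this:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Tiny synthetic setup: a small labelled seed set and a larger unlabelled pool.
rng = np.random.default_rng(0)
X_seed = rng.normal(size=(20, 5))
y_seed = (X_seed[:, 0] > 0).astype(int)
X_pool = rng.normal(size=(500, 5))            # unlabelled records

X_train, y_train = X_seed, y_seed
for _ in range(3):                            # a few self-training rounds
    model = LogisticRegression().fit(X_train, y_train)
    proba = model.predict_proba(X_pool)
    confident = proba.max(axis=1) > 0.9       # keep only confident pseudo-labels
    X_train = np.vstack([X_seed, X_pool[confident]])
    y_train = np.concatenate([y_seed, proba[confident].argmax(axis=1)])

print("pseudo-labelled examples added:", int(confident.sum()))
```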
Of course, the complexity of such combinations can introduce structure into the learning steps; the previous section still oversimplifies approaches in which the Training step itself is a multi-step, structured learning process. Deep Learning is such a thing.
Beyond whether it is supervised or not, there are other dimensions along which machine learning can be classified.
By Prediction Type
- Classification: assign a label to each record. It could be binary (one of two) or multi-class (one of many). (See the sketch after this list.)
- Clustering: there are no predefined labels to predict; records are grouped rather than assigned labels.
- Regression: the prediction target is not a discrete label but a numeric value. e.g. you can use regression to forecast a stock price or your future bill.
- Recommendation: a specialized prediction task that predicts an ordered list of items for a given record.
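Here is a compact, hedged illustration of the first three prediction types on toy scikit-learn data (recommendation is left out since it usually needs its own data setup):

```python
from sklearn.datasets import make_blobs, make_classification, make_regression
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression, LogisticRegression

# Classification: predict one label out of a fixed set.
Xc, yc = make_classification(n_samples=100, random_state=0)
print(LogisticRegression(max_iter=1000).fit(Xc, yc).predict(Xc[:3]))

# Regression: predict a numeric value (think of it as a toy price forecast).
Xr, yr = make_regression(n_samples=100, random_state=0)
print(LinearRegression().fit(Xr, yr).predict(Xr[:3]))

# Clustering: group records without using any labels at all.
Xb, _ = make_blobs(n_samples=100, centers=3, random_state=0)
print(KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(Xb)[:10])
```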
By Ways of Learning
This list can go on and on since it’s evolving a lot.
- Deep learning: learning with deep structures, usually neural networks.
- Reinforcement learning: learning from experience and rewards
- Online learning: learning in a streaming way; the learner never sees all the data at once (see the sketch after this list)
- Active learning: an iterative way of learning that adds only the training data that is needed, based on the previous evaluation
- Transfer learning: learning that starts from a model trained on another domain or dataset.
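As one concrete example from this list, online learning is often done with incremental model updates; here is a minimal sketch using scikit-learn's partial_fit on a simulated stream (the model choice and data are arbitrary):

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
model = SGDClassifier()   # a linear model that supports incremental updates

# Simulate a stream: the model updates batch by batch and never holds all the data at once.
for _ in range(100):
    X_batch = rng.normal(size=(32, 5))
    y_batch = (X_batch[:, 0] + 0.1 * rng.normal(size=32) > 0).astype(int)
    model.partial_fit(X_batch, y_batch, classes=[0, 1])

X_test = rng.normal(size=(200, 5))
y_test = (X_test[:, 0] > 0).astype(int)
print("held-out accuracy:", model.score(X_test, y_test))
```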
By Domain Applied
This list can go on as we start working in new domains.
- Natural Language Processing
- Image Processing, Object detection etc.
- Speech recognition
- Automated Driving
- Robotics
- Information Retrieval
- Spam Filtering/Fraud Detection/Anomaly Detection
- Recommendation. Yes, this is both an application domain and a prediction type
By Algorithm Used
This is a less common way to classify machine learning; we could treat it as a classification of algorithms instead. It is listed here because people may refer to these directly as categories. It can be about the realm or methodology, much like the categories introduced in "The Master Algorithm".
- Bayesian learning: e.g. Naive Bayes, Bayes Network
- Genetic Algorithm based learning
- Neural network based learning
- Support vector machines: e.g. SVM
- Tree based learning: e.g. random forest, decision tree
- Meta learning: boosting or bagging; use weaker learners to train a better model
- Instance-based learning: e.g. k-nearest neighbours, clustering
- Linear or non-linear learners: e.g. linear regression, logistic regression
- Discriminative vs Generative learning
Anatomy of Algorithms
To be continued.