Machine Learning

Machine Learning is commonly defined as the ‘Science of getting computers to learn, without being explicitly programmed’. Machine learning remains a statistical-based method. As such, its goal is still to find data structure. Nevertheless, there are still many differences between traditional Statistics and Machine Learning. While the former entails expectations about the data relations (the statistician performs a theoretical test in order to accept or reject a null hypothesis), Machine learning is more focused on the patterns between the data and does not require assumptions (although finance and healthcare applications of Machine Learning still largely imply expectations about the data structures). The test for a machine learning model is a validation error on new data. Passes are run through the data until a robust pattern is found.

The ‘Learning’ can be Supervised, Unsupervised, or Semi-supervised.

In Supervised Learning, the machines are trained with labeled data, which means they are provided with both features, or inputs (X) and labels, or outputs (Y). From these, it learns all the relations between X and Y. Next, the machine will only be provided features and will be able to predict the labels. Classical examples of Supervised learning are regression and classification analysis. In the former, the model will produce continuous data. In the latter, the output will be discrete.

In Unsupervised learning, we only have input data (the ‘features’) and from these, we look for their underlying structure or distribution. There is no right answer and no training is needed. Unsupervised learning is pretty common in transactional data and is also widespread in marketing research. Clustering analysis and Association Rules are typical examples of Unsupervised Learning.

In Semi-Supervised Learning, the model is trained on a smaller amount of labeled and a larger amount of unlabeled data. In fact, Iabeling all the data is expensive and time-consuming. In Reinforcement Learning the agent is expected to act in the environment so as to maximize its rewards over a given amount of time. Since the agent will reach the goal much faster by following a good policy, the final goal in reinforcement learning is to learn the best policy. This branch of machine learning is mostly used for robotics, gaming and navigation.