MLE and MAP Learning
In many statistical learning tasks, our goal is to find the optimal parameter set $\theta$ according to a maximization criterion. The most common approach is based on the likelihood $L$ and is called maximum likelihood estimation (MLE).
In fact, given a statistical model $M$ parametrized with the vector $\theta$, the likelihood can be interpreted as the probability of such a model generating the training data. Therefore, given a suitable structure for $M$, MLE provides a simple but very effective tool to define a generative model that is not influenced by prior beliefs. For our purposes, let's suppose we have a data-generating process $p_{data}$, used to draw a dataset $X$:

$$X = \{x_1, x_2, \ldots, x_N\} \quad \text{where } x_i \sim p_{data}$$

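In this case, the optimal set $\theta_{opt}$ that maximizes the likelihood of a generic statistical model $M$ parametrized with $\theta$ is found as follows:

$$\theta_{opt} = \underset{\theta}{\arg\max} \; L(\theta \mid X) = \underset{\theta}{\arg\max} \; p(X \mid \theta)$$
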
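To make the maximization concrete, here is a minimal sketch of MLE for a univariate Gaussian model. Everything specific in it is an assumption made for illustration and does not come from the text: the choice of a Gaussian for the model, the hypothetical data-generating process N(2.0, 1.5^2), the sample size, and the helper names `gaussian_log_likelihood` and `gaussian_mle`. It works with the log-likelihood, which has the same maximizer as the likelihood, and compares the closed-form estimate with a coarse grid search over the parameters.

```python
import numpy as np


def gaussian_log_likelihood(theta, X):
    """Log-likelihood log L(theta | X) of i.i.d. samples under N(mu, sigma^2)."""
    mu, sigma = theta
    n = X.shape[0]
    return (-0.5 * n * np.log(2.0 * np.pi * sigma ** 2)
            - np.sum((X - mu) ** 2) / (2.0 * sigma ** 2))


def gaussian_mle(X):
    """Closed-form maximizer of the Gaussian log-likelihood."""
    mu_hat = np.mean(X)
    # The MLE of the standard deviation divides by N, not by N - 1.
    sigma_hat = np.sqrt(np.mean((X - mu_hat) ** 2))
    return mu_hat, sigma_hat


# Hypothetical data-generating process p_data (an assumption): N(2.0, 1.5^2).
rng = np.random.default_rng(1000)
X = rng.normal(loc=2.0, scale=1.5, size=1000)

# Closed-form estimate of theta_opt = (mu, sigma).
theta_opt = gaussian_mle(X)

# Sanity check: evaluate the log-likelihood on a coarse (mu, sigma) grid
# and pick the grid point with the highest value.
mus = np.linspace(0.0, 4.0, 81)
sigmas = np.linspace(0.5, 3.0, 51)
grid = np.array([[gaussian_log_likelihood((m, s), X) for s in sigmas]
                 for m in mus])
i, j = np.unravel_index(np.argmax(grid), grid.shape)
theta_grid = (mus[i], sigmas[j])

print("closed-form MLE:", theta_opt)
print("grid-search MLE:", theta_grid)
```

The two estimates should agree up to the grid resolution. Note that the estimate is computed from `X` alone: no prior over the parameters enters the calculation.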
This approach has the advantage of not being influenced by incorrect prior assumptions, because the optimal value $\theta_{opt}$ depends exclusively on the observed data. However, at the same time, this approach...