Here we discuss the general setup of a statistical inference problem. We start from the observed data and use it to estimate a quantity of interest; there may also be other unknown quantities that we would like to estimate. The quantity could be a response (predicted) variable, a class, a label, or simply a number. If you are familiar with the frequentist approach, you will know that it treats the unknown quantity, say θ, as a fixed (nonrandom) quantity that is to be estimated from the observed data.
In the Bayesian framework, by contrast, the unknown quantity θ is treated as a random variable. More specifically, we assume an initial guess about the distribution of θ, which is commonly referred to as the prior distribution. After we observe some data, the distribution of θ is updated; the updated distribution is called the posterior distribution. This update is usually performed using Bayes' rule (for more details, refer to the next section), which is why the approach is called the Bayesian approach. In short, we start from a prior distribution, update it with the observed data, and use the resulting posterior distribution to compute predictive distributions for future observations.
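To make the contrast concrete, here is a minimal sketch in Python, assuming a coin-flip experiment in which θ is the probability of heads. The data (7 heads, 3 tails), the uniform Beta(1, 1) prior, and the helper names `update_beta` and `beta_mean` are all hypothetical choices made for illustration; they are not taken from the text. The Beta prior is used only because it makes the Bayes' rule update reducible to simple counting.

```python
from fractions import Fraction


def update_beta(alpha, beta, successes, failures):
    """Conjugate Bayes update: a Beta(alpha, beta) prior combined with
    Bernoulli observations gives a Beta(alpha + successes, beta + failures)
    posterior."""
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return Fraction(alpha, alpha + beta)


# Hypothetical observed data: 7 heads and 3 tails.
heads, tails = 7, 3

# Frequentist view: theta is fixed, estimated by the sample proportion.
mle = Fraction(heads, heads + tails)

# Bayesian view: theta is random, with an assumed uniform Beta(1, 1) prior.
prior = (1, 1)
posterior = update_beta(*prior, successes=heads, failures=tails)

# For this model, the posterior mean is also the predictive probability
# that the next flip comes up heads.
posterior_mean = beta_mean(*posterior)

print("Frequentist estimate:", mle)
print("Posterior parameters:", posterior)
print("Posterior mean / predictive P(heads):", posterior_mean)
```

Under these assumptions the frequentist estimate is 7/10, while the Bayesian posterior is Beta(8, 4) with mean 2/3: the prior pulls the estimate slightly toward 1/2, and its influence shrinks as more data is observed.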
This simple process can be justified as the appropriate methodology for inference under uncertainty by numerous arguments, each of which is consistent with clear principles of rationality. In spite of this strong mathematical support, many machine learning practitioners are uncomfortable with, and somewhat reluctant to use, the Bayesian approach. The reason is that they often view the selection of a prior as arbitrary and subjective; in reality, the choice is subjective but not arbitrary.