The inputs x1 and x2 are multiplied by the weights w1..w4 at each node, and the respective bias is added. This is the linear transformation:
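As a minimal sketch of this step, assume a hidden layer with two nodes, where w1 and w2 feed the first node and w3 and w4 feed the second. That weight-to-node layout, the bias names b1 and b2, and all numeric values are assumptions made for illustration, not taken from the original network.

```python
# Hypothetical inputs, weights, and biases (illustrative values only).
x1, x2 = 0.5, -1.0
w1, w2, w3, w4 = 0.1, 0.2, 0.3, 0.4
b1, b2 = 0.01, 0.02  # assumed names for the per-node biases

# Each hidden node computes a weighted sum of both inputs plus its own bias.
# Assumed layout: (w1, w2) belong to node 1, (w3, w4) belong to node 2.
z1 = w1 * x1 + w2 * x2 + b1
z2 = w3 * x1 + w4 * x2 + b2

print(z1, z2)
```

The values z1 and z2 are often called pre-activations, because no non-linearity has been applied to them yet.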
An activation function is then applied to the result to make the transformation non-linear. A sigmoid transformation looks like this:
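The sigmoid maps any real number z to a value between 0 and 1 via 1 / (1 + e^(-z)). Below is a small sketch using only the standard library. Branching on the sign of z is a common trick to avoid overflow for very negative inputs, and is an implementation choice rather than part of the definition.

```python
import math

def sigmoid(z):
    """Return 1 / (1 + e^(-z)), computed in a numerically stable way."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the equivalent form e^z / (1 + e^z)
    # so that math.exp never receives a large positive argument.
    e = math.exp(z)
    return e / (1.0 + e)

# The sigmoid is 0.5 at zero and saturates toward 0 or 1 at the extremes.
print(sigmoid(0.0), sigmoid(5.0), sigmoid(-5.0))
```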
Now these activations are multiplied by the weights v1..v4, and the respective bias is added:
Once again, the activation function is applied to the result:
Finally, a softmax function is applied to get the prediction output.
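Putting the steps together, the whole forward pass can be sketched as follows. This assumes two nodes per layer, with w1..w4 arranged as the rows of a 2x2 matrix W and v1..v4 as the rows of V. The function names, the bias vectors b and c, and all numbers are hypothetical and chosen only for illustration.

```python
import math

def sigmoid(z):
    """Numerically stable sigmoid, 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def softmax(values):
    """Turn a list of scores into probabilities that sum to 1."""
    m = max(values)  # subtracting the max avoids overflow in exp
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def layer(inputs, weights, biases):
    """Linear transformation followed by a sigmoid, one output per node."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + bias)
        for row, bias in zip(weights, biases)
    ]

def forward(x, W, b, V, c):
    """Input -> hidden layer (W, b) -> second layer (V, c) -> softmax."""
    h = layer(x, W, b)   # first linear transformation + sigmoid
    o = layer(h, V, c)   # second linear transformation + sigmoid
    return softmax(o)    # final prediction as probabilities

# Hypothetical parameters: each row holds the weights of one node.
x = [0.5, -1.0]
W = [[0.1, 0.2], [0.3, 0.4]]    # assumed layout of w1..w4
b = [0.01, 0.02]
V = [[0.5, -0.6], [0.7, 0.8]]   # assumed layout of v1..v4
c = [0.03, 0.04]

probs = forward(x, W, b, V, c)
print(probs)
```

The softmax output always sums to 1, so each entry can be read as the predicted probability of one class.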