The model components

The Skip-Gram model contains a 200-dimensional embedding vector for each vocabulary item, resulting in 31,300 × 200 = 6,260,000 trainable parameters, plus two for the sigmoid output.
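The parameter count can be checked with a few lines of arithmetic. The sketch below assumes that the two extra parameters are a single weight and a single bias on the sigmoid output unit; the text gives only their number, so that breakdown and all variable names are illustrative.

```python
VOCAB_SIZE = 31_300   # vocabulary items, as stated in the text
EMBEDDING_DIM = 200   # embedding dimensions, as stated in the text

# One embedding vector per vocabulary item.
embedding_params = VOCAB_SIZE * EMBEDDING_DIM

# Assumption: the sigmoid output is one unit with one weight and one bias.
output_params = 1 + 1

total_params = embedding_params + output_params
print(f"embedding: {embedding_params:,}  output: {output_params}  total: {total_params:,}")
```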

In each iteration, the model computes the dot product of the context- and target-embedding vectors, passes the result through the sigmoid to produce a probability, and adjusts the embeddings based on the gradient of the loss.
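A single iteration of this kind might look like the following NumPy sketch. It is not the model's actual implementation: it assumes a binary cross-entropy loss, plain gradient descent, one shared embedding matrix, and a scalar weight `w` and bias `b` in front of the sigmoid. The function `train_step`, the learning rate, and the toy vocabulary of 10 items are hypothetical and chosen only to keep the example small.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_step(embeddings, target_id, context_id, label, w, b, lr=0.1):
    """One gradient step on a (target, context, label) pair.

    Updates `embeddings` in place and returns (loss, new_w, new_b).
    Assumes binary cross-entropy loss and plain gradient descent.
    """
    v_t = embeddings[target_id].copy()
    v_c = embeddings[context_id].copy()

    score = v_t @ v_c                    # dot product of the two embeddings
    p = sigmoid(w * score + b)           # predicted probability of a true pair
    loss = -(label * np.log(p) + (1 - label) * np.log(1 - p))

    dz = p - label                       # d(loss)/d(z), with z = w * score + b
    dscore = dz * w                      # d(loss)/d(score)

    embeddings[target_id] -= lr * dscore * v_c
    embeddings[context_id] -= lr * dscore * v_t
    return loss, w - lr * dz * score, b - lr * dz

# Toy usage: repeatedly train on one positive pair (label = 1).
rng = np.random.default_rng(0)
emb = rng.normal(scale=0.1, size=(10, 4))   # 10 items, 4 dimensions
w, b = 1.0, 0.0
losses = []
for _ in range(100):
    loss, w, b = train_step(emb, target_id=2, context_id=5, label=1.0, w=w, b=b)
    losses.append(loss)
```

Only the two rows of the embedding matrix that belong to the current pair receive a gradient, which is why each step is cheap even for a large vocabulary.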

