where $\mathbf{z}^x_{ik}$ denotes the hidden representation, and $\mathbf{W}^x_k$ and $\mathbf{b}^x_k$, $k = 1, \ldots, K$, are weight matrices and biases, respectively. The superscripts $t$ and $b$ refer to top and bottom. $s: \mathbb{R} \mapsto \mathbb{R}$ is a nonlinear function applied element-wise.¹
We treat the output of the $K$-th layer as the latent representations for tops and bottoms, i.e., $\tilde{\mathbf{z}}^x_i = \mathbf{z}^x_{iK} \in \mathbb{R}^{D_l}$, $x \in \{t, b\}$, where $D_l$ denotes the dimensionality of the latent compatibility space. Accordingly, we can measure the compatibility between top $t_i$ and bottom $b_j$ as follows:

$$m_{ij} = \left(\tilde{\mathbf{z}}^t_i\right)^{T} \tilde{\mathbf{z}}^b_j. \tag{4.2}$$
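
The following is a minimal sketch of how the latent representations and the compatibility score of Eq. (4.2) could be computed. It assumes each layer has the usual feed-forward form $s(\mathbf{W}\mathbf{z} + \mathbf{b})$ with the sigmoid as $s$; the function names (`encode`, `compatibility`) and the toy weights are hypothetical and chosen only for illustration.

```python
import math

def sigmoid(x):
    """Element-wise nonlinearity s(x) = 1 / (1 + e^{-x})."""
    return 1.0 / (1.0 + math.exp(-x))

def encode(layers, features):
    """Pass a feature vector through K layers, each assumed to be s(W z + b).

    layers: list of (W, b) pairs, W as a list of rows, b as a list of floats.
    Returns the output of the K-th layer, used as the latent representation.
    """
    z = list(features)
    for W, b in layers:
        z = [sigmoid(sum(w * v for w, v in zip(row, z)) + bias)
             for row, bias in zip(W, b)]
    return z

def compatibility(z_top, z_bottom):
    """Inner product of the two latent representations, as in Eq. (4.2)."""
    return sum(a * c for a, c in zip(z_top, z_bottom))

# Toy, made-up parameters: separate one-layer networks for tops and bottoms.
top_layers = [([[0.2, -0.1], [0.4, 0.3]], [0.0, 0.1])]
bottom_layers = [([[0.5, 0.5], [-0.3, 0.2]], [0.1, 0.0])]
m_ij = compatibility(encode(top_layers, [1.0, 2.0]),
                     encode(bottom_layers, [0.5, -1.0]))
```

Tops and bottoms use separate parameter sets here, mirroring the superscripts $t$ and $b$; both networks must end in a layer of the same width $D_l$ so that the inner product is defined.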
In this chapter, we also adopt the BPR framework, which has proven to be powerful in implicit preference modeling [6, 36]. In particular, we assume that bottoms from the positive set $\mathcal{B}_i^{+}$ are more compatible with top $t_i$ than the non-composed, neutral bottoms. Accordingly, we build the following training set:

$$\mathcal{D}_S := \left\{ (i, j, k) \mid t_i \in \mathcal{T},\ b_j \in \mathcal{B}_i^{+} \wedge b_k \in \mathcal{B} \setminus \mathcal{B}_i^{+} \right\}, \tag{4.3}$$
where the triplet $(i, j, k)$ indicates that bottom $b_j$ is more compatible with top $t_i$ than bottom $b_k$ is.
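
As an illustration of the training set in Eq. (4.3), the sketch below enumerates every triplet for a toy catalogue. The data layout (a dict from top index to its set of positive bottom indices) and the name `build_triplets` are assumptions made for this example. Exhaustive enumeration is only practical for small data; sampling negative bottoms is a common alternative.

```python
def build_triplets(positive_bottoms, all_bottoms):
    """Enumerate (i, j, k) with b_j in B_i^+ and b_k in B \\ B_i^+.

    positive_bottoms: dict mapping top index i to the set of bottom
                      indices that have been composed with top i.
    all_bottoms:      iterable of all bottom indices (the set B).
    """
    all_bottoms = list(all_bottoms)
    triplets = []
    for i, positives in positive_bottoms.items():
        # Neutral bottoms: everything not observed together with top i.
        negatives = [k for k in all_bottoms if k not in positives]
        for j in sorted(positives):
            for k in negatives:
                triplets.append((i, j, k))
    return triplets

# Toy data: top 0 was paired with bottom 1; top 1 with bottoms 0 and 2.
toy_triplets = build_triplets({0: {1}, 1: {0, 2}}, [0, 1, 2])
```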
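Then, according to [100], we have the objective function,

$$\mathcal{L}_{bpr} = \sum_{(i,j,k) \in \mathcal{D}_S} \mathcal{L}_{bpr}(m_{ij}, m_{ik}) = \sum_{(i,j,k) \in \mathcal{D}_S} -\ln\big(\sigma(m_{ij} - m_{ik})\big) + \frac{\lambda}{2} \|\Theta\|_F^2, \tag{4.4}$$

where $\sigma$ denotes the sigmoid function, $\lambda$ is the non-negative hyperparameter, the last term is designed to avoid overfitting, and $\Theta$ refers to the set of parameters (i.e., $\mathbf{W}^x_k$ and $\mathbf{b}^x_k$) of the neural networks.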
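
A minimal sketch of this objective follows, assuming the standard BPR form: the negative log-sigmoid of the score difference summed over triplets, plus an L2 penalty on the parameters. The names `bpr_loss`, `score`, `params`, and `lam` are hypothetical, and the parameters are passed as a flat list of floats purely for simplicity.

```python
import math

def softplus(x):
    """Numerically stable ln(1 + e^x)."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def bpr_loss(triplets, score, params, lam):
    """BPR objective with L2 regularization.

    triplets: iterable of (i, j, k) index triplets.
    score:    callable (i, j) -> compatibility m_ij (assumed interface).
    params:   flat list of all network parameters.
    lam:      non-negative regularization weight.
    """
    data_term = 0.0
    for i, j, k in triplets:
        diff = score(i, j) - score(i, k)
        # -ln(sigmoid(diff)) equals ln(1 + e^{-diff}) = softplus(-diff).
        data_term += softplus(-diff)
    reg_term = 0.5 * lam * sum(p * p for p in params)
    return data_term + reg_term
```

The loss shrinks as the positive bottom outscores the neutral one by a wider margin, and a triplet with equal scores contributes $\ln 2$.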
4.3.3 ATTENTIVE KNOWLEDGE DISTILLATION
As an important aspect of people's daily life, clothing matching has gradually accumulated much
valuable human knowledge. For example, it is generally accepted that a coat goes better with a dress than
with short pants, while a silk top can hardly go with a knit bottom. To fully leverage
this valuable domain knowledge, we utilize the knowledge distillation technique to guide the
neural networks and allow the model to learn from general rules [45]. In particular, we adopt
the teacher-student scheme, whose underlying intuition is analogous to human education,
where the teacher is aware of several professional rules and can thus instruct students
with his/her solutions to particular questions. In this work, considering the flexibility of logic
rules [23] as a declarative language, we use logic rules to represent the fashion domain knowledge.
We encode these rules via regularization terms into a teacher network $q$, which can be
¹ In this work, we use the sigmoid function $s(x) = 1/(1 + e^{-x})$.