Teacher Network Construction


4.3. METHODOLOGY 31

where $\mathbf{z}_{ik}^x$ denotes the hidden representation, $\mathbf{W}_k^x$ and $\mathbf{b}_k^x$, $k = 1, \ldots, K$, are weight matrices and biases, respectively. The superscripts $t$ and $b$ refer to top and bottom. $s: \mathbb{R} \mapsto \mathbb{R}$ is a nonlinear function applied element-wise.

We treat the output of the $K$-th layer as the latent representations for tops and bottoms, i.e., $\tilde{\mathbf{z}}_i^x = \mathbf{z}_{iK}^x \in \mathbb{R}^{D}$, $x \in \{t, b\}$, where $D$ denotes the dimensionality of the latent compatibility space. Accordingly, we can measure the compatibility between top $t_i$ and bottom $b_j$ as follows:

$$m_{ij} = (\tilde{\mathbf{z}}_i^t)^{\top} \tilde{\mathbf{z}}_j^b. \tag{4.2}$$
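
The encoding and scoring steps above can be illustrated with a minimal sketch in plain Python. It assumes the compatibility score is the inner product of the two latent vectors and uses the sigmoid as the nonlinearity $s$. The helper `random_layers`, the layer sizes, and the input feature values are hypothetical and chosen only for illustration; they are not taken from the text.

```python
import math
import random


def sigmoid(x):
    """Element-wise nonlinearity s(x) = 1 / (1 + e^{-x})."""
    return 1.0 / (1.0 + math.exp(-x))


def dense(W, b, v):
    """One layer: s(W v + b), with s applied to each output unit."""
    return [sigmoid(sum(w * x for w, x in zip(row, v)) + bias)
            for row, bias in zip(W, b)]


def encode(layers, features):
    """Pass the input through all K layers; the K-th output is the latent vector."""
    z = features
    for W, b in layers:
        z = dense(W, b, z)
    return z


def compatibility(z_top, z_bottom):
    """Assumed score: inner product of the two latent vectors."""
    return sum(a * b for a, b in zip(z_top, z_bottom))


def random_layers(sizes, rng):
    """Hypothetical helper: small random weights and zero biases per layer."""
    return [([[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
              for _ in range(n_out)],
             [0.0] * n_out)
            for n_in, n_out in zip(sizes, sizes[1:])]


rng = random.Random(0)
top_layers = random_layers([4, 3, 2], rng)     # K = 2 layers, D = 2
bottom_layers = random_layers([4, 3, 2], rng)  # separate network for bottoms

top_features = [0.2, -0.1, 0.4, 0.0]     # made-up input features
bottom_features = [0.1, 0.3, -0.2, 0.5]  # made-up input features

z_top = encode(top_layers, top_features)
z_bottom = encode(bottom_layers, bottom_features)
m_ij = compatibility(z_top, z_bottom)
print(m_ij)
```

Tops and bottoms get separate parameter sets here because the text indexes the weights and biases by the superscripts $t$ and $b$; both networks map into the same $D$-dimensional space so that the two latent vectors can be compared directly.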

In this chapter, we also adopt the BPR framework, which has proven to be powerful in implicit preference modeling [6, 36]. In particular, we assume that bottoms from the positive set $\mathcal{B}_i^+$ are more compatible with top $t_i$ than the non-composed neutral bottoms. Accordingly, we build the following training set:

$$\mathcal{D} := \{(i, j, k) \mid t_i \in \mathcal{T},\; b_j \in \mathcal{B}_i^+ \wedge b_k \in \mathcal{B} \setminus \mathcal{B}_i^+\}, \tag{4.3}$$

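where the triplet $(i, j, k)$ indicates that bottom $b_j$ is more compatible with top $t_i$ than bottom $b_k$. Then, according to [100], we have the objective function,

$$\mathcal{L}_{bpr} = \sum_{(i,j,k) \in \mathcal{D}} \ell_{bpr}(m_{ij}, m_{ik}) + \frac{\lambda}{2} \|\Theta\|_F^2 = \sum_{(i,j,k) \in \mathcal{D}} -\ln\big(\sigma(m_{ij} - m_{ik})\big) + \frac{\lambda}{2} \|\Theta\|_F^2, \tag{4.4}$$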

where  is the non-negative hyperparameter, the last term is designed to avoid overﬁtting and

‚ refers to the set of parameters (i.e., W

and b

) of neural networks.
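
As a rough illustration of the training set and objective described above, the following plain-Python sketch enumerates triplets $(i, j, k)$ and evaluates a BPR-style loss over them. It makes several assumptions: the pairwise term is the negative log-sigmoid of the score difference, the regularizer is $\lambda/2$ times the sum of squared parameters, and negatives are enumerated exhaustively rather than sampled. The function names, the dictionary layout, and the toy scores are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def build_triplets(positive_bottoms, all_bottoms):
    """Enumerate (i, j, k): b_j is a positive bottom of top t_i, b_k is not.

    positive_bottoms maps a top index i to the set of its positive bottom
    indices; all_bottoms lists every bottom index.
    """
    triplets = []
    for i, positives in sorted(positive_bottoms.items()):
        negatives = sorted(set(all_bottoms) - positives)
        for j in sorted(positives):
            for k in negatives:
                triplets.append((i, j, k))
    return triplets


def bpr_loss(triplets, m, params, lam):
    """Pairwise ranking term plus an assumed (lam / 2) * sum-of-squares penalty.

    m maps (top index, bottom index) to a compatibility score;
    params is a flat list of all network parameters.
    """
    data_term = sum(-math.log(sigmoid(m[(i, j)] - m[(i, k)]))
                    for i, j, k in triplets)
    reg_term = 0.5 * lam * sum(p * p for p in params)
    return data_term + reg_term


# Toy data: top 0 was composed with bottoms 0 and 1, top 1 with bottom 2.
positive_bottoms = {0: {0, 1}, 1: {2}}
all_bottoms = [0, 1, 2, 3]
triplets = build_triplets(positive_bottoms, all_bottoms)

# Made-up scores: positive pairs score 1.0, all other pairs score 0.0.
scores = {(i, b): (1.0 if b in positive_bottoms[i] else 0.0)
          for i in positive_bottoms for b in all_bottoms}
loss = bpr_loss(triplets, scores, params=[], lam=0.0)
print(len(triplets), loss)
```

The loss shrinks as each positive bottom's score rises above the scores of the neutral bottoms for the same top, which is the ranking behavior the triplet construction is meant to encourage. With a large bottom set, sampling a few negatives per positive pair is a common alternative to full enumeration.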

4.3.3 ATTENTIVE KNOWLEDGE DISTILLATION

As an important aspect of people's daily life, clothing matching has gradually accumulated much valuable human knowledge. For example, it is commonly accepted that a coat goes better with a dress than with short pants, while a silk top can hardly go with a knit bottom. To fully leverage this domain knowledge, we utilize the knowledge distillation technique to guide the neural networks and allow the model to learn from general rules [45]. In particular, we adopt the teacher-student scheme, whose underlying intuition is analogous to human education: the teacher is aware of several professional rules and can thus instruct students with his/her solutions to particular questions. In this work, considering the flexibility of logic rules [23] as a declarative language, we use logic rules to represent the fashion domain knowledge. We encode these rules via regularization terms into a teacher network $q$, which can be

In this work, we use the sigmoid function $s(x) = 1/(1 + e^{-x})$.
