BPR-DAE-Nomod: To check the modality regularizer, which controls the consistency between the latent representations of different modalities, we removed L_mod by setting its trade-off parameter to 0.
BPR-DAE-No: We removed both the reconstruction and modality regularizers by setting both of their trade-off parameters to 0 (a sketch of the corresponding objective follows).
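To make these ablation settings concrete, the following is a minimal sketch of how such a joint objective could be wired up. The trade-off parameter names beta and gamma, the helper argument names, and the exact forms of the regularizers are our illustrative assumptions rather than the authors' exact formulation.

import torch

def joint_loss(pos_score, neg_score, rec_pairs, mod_pairs, beta=1.0, gamma=1.0):
    # Hypothetical joint objective: BPR ranking loss plus the reconstruction
    # and modality-consistency regularizers, weighted by beta and gamma.
    # beta=0 yields BPR-DAE-Norec, gamma=0 yields BPR-DAE-Nomod,
    # and beta=gamma=0 yields BPR-DAE-No.
    l_bpr = -torch.log(torch.sigmoid(pos_score - neg_score)).mean()
    # Reconstruction regularizer: each autoencoder should reproduce its input.
    l_rec = sum(((x_hat - x) ** 2).mean() for x_hat, x in rec_pairs)
    # Modality regularizer: the visual and contextual latent representations
    # of the same item should agree.
    l_mod = sum(((z_v - z_c) ** 2).mean() for z_v, z_c in mod_pairs)
    return l_bpr + beta * l_rec + gamma * l_mod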
Table 3.3 shows the performance of our model under different component configurations with respect to AUC; a sketch of the AUC computation follows the table. It can be seen that BPR-DAE outperforms all the other derivative models, which verifies the contribution of each component in our model. For example, we noticed that BPR-DAE shows superiority over BPR-DAE-Nomod, which implies that the visual and contextual information of the same fashion item do share a certain consistency in characterizing that item. Besides, the worse performance of BPR-DAE-Norec as compared to BPR-DAE suggests that encouraging the latent compatibility space to reconstruct the fashion items is helpful.
Table 3.3: Performance comparison of our model with different component configurations with respect to AUC.

Approach          AUC
BPR-DAE           0.7616
BPR-DAE-Norec     0.7533
BPR-DAE-Nomod     0.7539
BPR-DAE-No        0.7421
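As a reference for how these numbers are obtained, here is a minimal sketch of the AUC computation over testing triplets (t_i, b_j, b_k), where b_j is the ground-truth compatible bottom; the scoring function m(t, b), standing for the learned compatibility score m_ij, is an assumed interface rather than the authors' released code.

def auc(triplets, m):
    # AUC over testing triplets: the fraction of triplets ranked correctly,
    # i.e., with m(t_i, b_j) > m(t_i, b_k), counting ties as one half.
    hits = 0.0
    for t, b_pos, b_neg in triplets:
        s_pos, s_neg = m(t, b_pos), m(t, b_neg)
        hits += 1.0 if s_pos > s_neg else (0.5 if s_pos == s_neg else 0.0)
    return hits / len(triplets)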
3.4.5 ON MODALITY COMPARISON (RQ3)
To verify the effectiveness of the multi-modal integration, we also conducted experiments over different modality combinations. In particular, we adapted our model to BPR-DAE-V and BPR-DAE-C to cope with only the visual and only the contextual modality of fashion items, respectively, by removing the unnecessary autoencoder networks as well as the L_mod regularizer. Figure 3.6 shows the comparative performance of the different approaches with respect to AUC. We observed that BPR-DAE outperforms both BPR-DAE-V and BPR-DAE-C, which suggests that the visual and contextual information do complement each other and both contribute to the compatibility measurement between fashion items. It is surprising that BPR-DAE-C is more effective than BPR-DAE-V. One plausible explanation is that the contextual information presents the key features of fashion items more concisely.
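To clarify what removing a modality means at scoring time, here is a minimal sketch of the compatibility score under the full model versus the single-modality variants. The encoder names enc_v and enc_c, the item-field names, and the inner-product form of the score are illustrative assumptions, not the exact formulation of the model.

import torch

def compatibility(top, bottom, enc_v, enc_c, use_visual=True, use_context=True):
    # Hypothetical compatibility score m between a top and a bottom as the
    # inner product of their latent representations, summed over the enabled
    # modalities. BPR-DAE uses both branches; BPR-DAE-V keeps only the visual
    # one (use_context=False) and BPR-DAE-C only the contextual one
    # (use_visual=False).
    score = torch.tensor(0.0)
    if use_visual:
        score = score + enc_v(top["image"]) @ enc_v(bottom["image"])
    if use_context:
        score = score + enc_c(top["text"]) @ enc_c(bottom["text"])
    return score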
To intuitively illustrate the impact of contextual information, we present the comparison between BPR-DAE and BPR-DAE-V on testing triplets in Figure 3.7. As can be seen, the contextual metadata works better in cases where the two given bottom candidates b_j and b_k share similar visual signals, such as color or shape, so that the visual signals alone could be insufficient to distinguish their compatibility with the given top t_i.
[Bar chart: AUC (y-axis, 0.0-1.0) of BPR-DAE, BPR-DAE-V, and BPR-DAE-C over different top categories: All, Sweater, Blouse, Jacket, Sweatshirt, T-shirt, Cardigan, and Coat.]
Figure 3.6: Performance of the proposed models on tops of different categories. All refers to
the whole testing set.
[Figure panels: testing triplets (t_i, b_j, b_k) compared under BPR-DAE and BPR-DAE-V, with each item labeled by key metadata phrases such as Fur Coat, Mens Jackets, Biker Jeans, Skinny Jeans, Black Jeans, Denim Skirt, Embellished Dress, Cotton Trousers, Cotton Sweatshirt, H & M Sweaters, Tall Eastwood Jean, High-Waisted Cut Off Shorts, Striped Blouse, Pinstriped Culottes, Knee-Length Skirts, and Chunky Knit Jumper.]
Figure 3.7: Illustration of the comparison between BPR-DAE and BPR-DAE-V on testing triplets. All the triplets satisfy m_ij > m_ik. Due to the limited space, we only list the key phrases of the items' contextual metadata.
Nevertheless, such contextual information may also lead to certain failed triplets due to the category matching bias, especially when the visual signals of the bottom candidates differ significantly. For example, according to our dataset it is popular to match blouses with knee-length skirts, which may thus lead to the first failed testing triplet in the rightmost column.