San Jose State University

SJSU ScholarWorks
Master's Projects Master's Theses and Graduate Research

Winter 2018

Deep Learning based Recommendation Systems

Nishanth Reddy Pinnapareddy
San Jose State University


Part of the Computer Sciences Commons

Recommended Citation
Pinnapareddy, Nishanth Reddy, "Deep Learning based Recommendation Systems" (2018). Master's Projects. 644.

This Master's Project is brought to you for free and open access by the Master's Theses and Graduate Research at SJSU ScholarWorks. It has been accepted for inclusion in Master's Projects by an authorized administrator of SJSU ScholarWorks.
Deep Learning based Recommendation Systems

A Project

Presented to

The Faculty of the Department of Computer Science

San Jose State University

In Partial Fulfillment

of the Requirements for the Degree

Master of Science

Nishanth Reddy Pinnapareddy

May 2018
© 2018

Nishanth Reddy Pinnapareddy

ALL RIGHTS RESERVED

The Designated Project Committee Approves the Project Titled

Deep Learning based Recommendation Systems

Nishanth Reddy Pinnapareddy

APPROVED FOR THE DEPARTMENT OF COMPUTER SCIENCE

SAN JOSE STATE UNIVERSITY

May 2018

Katerina Potika Department of Computer Science

Sami Khuri Department of Computer Science

Abhinand Lingareddy VMware Inc.

ABSTRACT

Deep Learning based Recommendation Systems

by Nishanth Reddy Pinnapareddy

The usage of Internet applications, such as social networking and e-commerce, is increasing exponentially, which leads to an ever-growing amount of offered content. Recommender systems help users filter relevant content from a large pool of available content, and they play a vital role in today's Internet applications. Collaborative Filtering (CF) is one of the most popular techniques used to design recommendation systems. This technique recommends new content to users based on the preferences of the user and of similar users. However, current CF techniques have some shortcomings that negatively affect the performance of recommendation models. In recent years, deep learning has achieved great success in natural language processing, computer vision, and speech recognition, but its use in the recommendation domain is relatively new. In this work, we tackle the shortcomings of collaborative filtering by using deep neural network techniques.

Although some recent work has employed deep learning for recommendation, it has focused only on modeling content descriptions, such as textual information about items and acoustic features of audio. Moreover, when modeling the key factor in collaborative filtering, the user-item interaction function, these models still employ matrix factorization and apply an inner product to the latent features of users and items.

In this project, the inner product is replaced by a neural network architecture that learns a user-item interaction function from data. To capture non-linearities in the user-item interaction function, a multi-layer perceptron is used. Extensive experiments on two real-world datasets demonstrate the improvements made by our model compared to existing popular collaborative filtering techniques. The empirical evidence shows that deep learning based recommendation models achieve better performance.
ACKNOWLEDGMENTS

I would like to express my sincere gratitude to my advisor, Prof. Katerina Potika, who expertly guided me through my graduate education and my master's project. Her unwavering enthusiasm for the study of social networks kept me engaged with my research, and her constant mentorship, advice, and support helped me move in the right direction towards completing the project. I would like to thank her for her time, help, and effort on behalf of me and this project.

My deep gratitude also goes to Prof. Sami Khuri and to my co-worker Abhinand Lingareddy for serving on my defense committee. I would like to thank them for their time and effort. Lastly, I would like to thank my friends and family, who supported me through this stressful time and did not let me give up.

TABLE OF CONTENTS

CHAPTER

1 Introduction
2 Background
2.1 Recommender Systems
2.1.1 Content-based filter techniques
2.1.2 Collaborative filter techniques
2.1.3 Hybrid filtering techniques
2.2 Deep Learning and Artificial Neural Networks
2.2.1 Feedforward Neural Network
3 Problem Definition
3.1 Recommendations from Implicit Feedback
3.2 Matrix Factorization
4 Related Work
5 The Model
5.1 General Framework
5.1.1 Learning Model Parameters
5.2 Generalized Matrix Factorization (GMF)
5.3 Multi-Layer Perceptron (MLP)
5.4 Neural Matrix Factorization
6 Experimental Results
6.1 Datasets
6.1.1 MovieLens
6.1.2 Pinterest
6.2 Evaluation Metrics
6.3 Competing Methods
6.3.1 Most-Popular Item
6.3.2 User-KNN
6.3.3 Bayesian Personalized Ranking
6.4 System Configuration
6.5 Parameter Settings
6.6 Performance Comparisons
6.6.1 Experiments - Research Question 1
6.6.2 Experiments - Research Question 2
7 The Conclusion and Future Work
LIST OF REFERENCES

LIST OF TABLES

1 Characteristics of Datasets
2 System Configuration
3 Parameter variations
4 Neural MF performance with and without pre-training
5 Hit Ratio@10 of MLP with different hidden layer units
6 NDCG@10 of MLP with different hidden layer units

LIST OF FIGURES

1 Recommender System techniques
2 Representation of neuron
3 An example illustrating MF's limitation
4 Generalized Neural Network Framework
5 Neural Network Matrix Factorization
6 Performance of HitRatio@10 and NDCG@10 on MovieLens Dataset
7 Performance of HitRatio@10 and NDCG@10 on Pinterest Dataset
8 Top-K item recommendation on MovieLens Dataset
9 Top-K item recommendation on Pinterest Dataset
CHAPTER 1

Introduction

Recommender systems are intelligent systems that exploit users' past ratings on items to recommend similar items to other users. These systems play a crucial role in online businesses by proactively narrowing down the items a user has to navigate, based on the user's preferences. The main problem solved by recommender systems is information overload. For large companies, an efficient recommender system can have a positive impact on revenue, and the personalized content it provides improves the user experience and saves users a lot of time.

Collaborative filtering (CF) [1], content-based filtering (CB), and hybrid methods are the most popular techniques in the recommender system domain. These methods use different criteria to suggest items tailored to users. For example, CF-based methods use the history of user ratings on products, whereas hybrid methods combine collaborative filtering and content-based methods. Because of security concerns with CB methods, such as collecting user profile information, collaborative filtering based methods became popular for personalized recommendation. Matrix Factorization (MF) [2] is the most popular of all CF-based methods. It relies on user-item interactions, which are modeled as the inner product of user and item latent vectors.

After the Netflix Prize [3] competition, Matrix Factorization became the default method for modeling user-item interactions in collaborative filtering. Much research has been devoted to enhancing MF, for example by integrating it with neighbor-based models [4], combining it with topic models of item descriptions [5], and extending it to factorization machines [6] for generic modeling of features. Although MF is very effective for collaborative filtering, its performance can be hindered by the simple choice of the inner product as the interaction function (IF). For example, for rating prediction on explicit feedback (EF), it is well known that adding user and item bias terms to the interaction function improves the performance of the MF model. This shows that even a minor tweak to the inner product operator can model the latent feature interactions between users and items more effectively. The inner product, which simply combines the products of latent features linearly, may not be sufficient to capture the complex structure of user interaction data.

Whereas much previous work relies on handcrafted interaction functions, we learn the IF from data using deep neural networks [7]. Deep neural networks (DNNs) have already proven their effectiveness in prominent domains such as speech recognition, text processing, and computer vision [8], and they are capable of approximating any continuous function [9]. There is a large body of literature on MF models, but considerably less work has applied DNNs to recommendation. Some recent advances [10] have used DNNs for recommendation and reported promising results. However, DNNs have mostly been used to model auxiliary information, such as the visual content of images, textual information about items, and audio features of music. To model the collaborative filtering effect itself, these approaches still rely on MF, combining user and item latent features through an inner product.

In this project, we address the drawbacks of matrix factorization by replacing the inner product with a neural network architecture. Rather than explicit ratings, we focus on implicit feedback, which reflects user preference through behaviors such as clicking or buying items and watching videos, and which can be tracked automatically. This makes it easy for content providers to collect data. However, implicit feedback has a drawback: it does not allow us to reliably differentiate between positive and negative feedback. To reduce the effect of this ambiguity, we use deep neural networks to model the recommendation task. The main contribution of our work is extending the existing Generalized Matrix Factorization (GMF) and Multi-Layer Perceptron (MLP) models [11] by fusing them into a new model, Neural Matrix Factorization, and performing extensive experiments to evaluate the performance of this new model.

CHAPTER 2

Background

This chapter presents the essential background.

2.1 Recommender Systems

A recommender system or recommendation system (sometimes replacing "system" with a synonym such as platform or engine) is a subclass of information filtering systems that seeks to predict the "rating" or "preference" a user would give to an item.

Figure 1: Recommender System techniques

As shown in Figure 1, recommender system techniques are broadly classified into three categories:

∙ Collaborative filtering techniques rely on user activity, such as ratings on items or buying patterns.

∙ Content-based filtering techniques rely on attributes of user activity, such as keywords used during search, or on user profiles.

∙ Hybrid filtering techniques combine the two techniques above to overcome their limitations and improve performance.

In the following sections, we cover each of these techniques and their limitations.

2.1.1 Content-based filter techniques

These methods use both content descriptions (descriptive attributes of items) and user activity to make recommendations. Content-based methods work better for new items in the system, since they find items whose descriptive attributes are similar to those of items the active user has rated. However, these techniques restrict the model to predicting particular types of items, as they rely on keywords and item content. They also do not work well when making predictions for new users.
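To make the content-based idea concrete, the sketch below scores unrated items by the cosine similarity between their attribute vectors and those of items the active user has rated. The attribute vectors, item names, and the max-similarity scoring rule are hypothetical choices for illustration, not the method used in this project.

```python
import math

def cosine(a, b):
    """Cosine similarity of two attribute vectors (0.0 if either vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def content_based_scores(item_attrs, rated_items):
    """Score each unrated item by its best similarity to an item the active user rated."""
    if not rated_items:
        return {}  # new user: no rated items to compare against
    return {
        item: max(cosine(attrs, item_attrs[r]) for r in rated_items)
        for item, attrs in item_attrs.items()
        if item not in rated_items
    }

# Hypothetical items described by binary attributes [action, comedy, sci-fi].
item_attrs = {
    "A": [1, 0, 1],
    "B": [1, 0, 1],  # same attributes as A
    "C": [0, 1, 0],
    "D": [1, 0, 0],  # e.g. a brand-new item nobody has rated yet
}
scores = content_based_scores(item_attrs, rated_items={"A"})
print(sorted(scores, key=scores.get, reverse=True))  # ['B', 'D', 'C']
```

Item D needs only attributes, not ratings from anyone, to be scored, which is why content-based methods handle new items well; a user with no rated items receives no scores at all, mirroring the new-user limitation.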

2.1.2 Collaborative filter techniques

Models based on this technique rely on the collaborative power of the ratings provided by users. The key challenge in building these models is that rating matrices are sparse. Collaborative filtering methods predict unspecified ratings from the observed ratings, since ratings are often highly correlated across users and items. Memory-based and model-based methods are the two commonly used families of techniques.

Memory based methods

These methods are also referred to as neighborhood-based collaborative filtering algorithms, because their predictions are based on neighborhoods of similar users or items. These neighborhoods can be defined in one of the following two ways.

∙ User-based collaborative filtering: This technique makes recommendations based on the ratings provided by like-minded users. For example, to make suggestions to user A, we determine the set of users who are similar to A and predict A's missing ratings as weighted averages of the ratings given by that set of similar users.

∙ Item-based collaborative filtering: To predict user A's rating for a target item B, we first determine the set of items that are most similar to B. A's ratings on the items in this set help us determine whether A will like item B.
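The two neighborhood schemes above can be sketched with a single prediction routine: user-based filtering applies it to user rows, and item-based filtering applies it to the transposed (item-row) matrix. This is a minimal sketch under stated assumptions: cosine similarity on raw ratings over co-rated entries (many systems instead mean-center the ratings or use Pearson correlation), a fixed neighborhood size k, and made-up ratings.

```python
import math

def cosine_sim(r1, r2):
    """Cosine similarity computed over the keys both rating dicts share."""
    common = r1.keys() & r2.keys()
    if not common:
        return 0.0
    dot = sum(r1[k] * r2[k] for k in common)
    n1 = math.sqrt(sum(r1[k] ** 2 for k in common))
    n2 = math.sqrt(sum(r2[k] ** 2 for k in common))
    return dot / (n1 * n2)

def predict(ratings, target, item, k=2):
    """Similarity-weighted average of `item`'s ratings by the k rows most similar to `target`."""
    neighbors = sorted(
        ((cosine_sim(ratings[target], row), row[item])
         for other, row in ratings.items() if other != target and item in row),
        reverse=True,
    )[:k]
    total = sum(sim for sim, _ in neighbors)
    return sum(sim * r for sim, r in neighbors) / total if total else None

def transpose(ratings):
    """Turn {user: {item: rating}} into {item: {user: rating}} for item-based filtering."""
    out = {}
    for user, row in ratings.items():
        for item, r in row.items():
            out.setdefault(item, {})[user] = r
    return out

# Hypothetical ratings on a 1-5 scale; user "A" has not rated item "B".
ratings = {
    "A": {"i1": 5, "i2": 3},
    "U1": {"i1": 5, "i2": 3, "B": 4},  # rates like A
    "U2": {"i1": 1, "i2": 5, "B": 2},  # rates unlike A
}
print(predict(ratings, "A", "B"))             # user-based: pulled towards U1's rating of 4
print(predict(transpose(ratings), "B", "A"))  # item-based: A's ratings of items similar to B
```

Because U1 rates exactly like A, it receives the larger weight, so the user-based prediction lands closer to 4 than to U2's rating of 2.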

Memory-based models are simple to implement and easy to understand. However, they do not perform well with sparse rating matrices, that is, they lack full coverage of rating predictions. Nevertheless, this is not an issue when we only want to predict the top-k items.

Model based methods

These methods use predictive data mining and machine learning techniques to make recommendations. In the case of parameterized models, the parameters are learned using an optimization framework. Decision trees, rule-based models, Bayesian methods, and latent factor models are some examples. These models achieve a high level of prediction coverage even for sparse ratings. However, these methods tend to be heuristic and do not perform well in all settings.

2.1.3 Hybrid filtering techniques

Hybrid filtering is well suited to datasets with a diverse set of input categories. It combines aspects of the above-mentioned recommender systems, using several machine learning algorithms together to create a more robust model.

2.2 Deep Learning and Artificial Neural Networks

Deep learning is a subset of the machine learning family that learns data representations, as opposed to relying on task-specific algorithms. Deep learning models use a cascade of multiple layers of non-linear processing units, called neurons, which perform feature extraction and transformation automatically. A network of such neurons is called an artificial neural network.

An artificial neural network is a computational model inspired by the way biological neural networks in the human brain process information. The smallest unit of computation in a neural network is the neuron, often called a node or unit. It receives inputs from other neurons and computes an output. Each input to the node has an associated weight (w), which signifies its importance relative to the other inputs. The node applies a function f to the weighted sum of its inputs, as shown in Figure 2:

Figure 2: Representation of neuron [12]

The function f is non-linear and is often referred to as the activation function. Its non-linearity allows the network to learn complex patterns in data.
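A single neuron from Figure 2 reduces to a few lines. This sketch assumes the logistic sigmoid as the activation f and adds an optional bias term, which is common in practice but is not mentioned in the text above.

```python
import math

def sigmoid(z):
    """Logistic activation, one common choice for f (an assumption in this sketch)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias=0.0, f=sigmoid):
    """Apply the activation f to the weighted sum of the inputs (plus an optional bias)."""
    if len(inputs) != len(weights):
        raise ValueError("each input needs exactly one weight")
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return f(z)

# Weighted sum is 1.0 * 0.5 + 2.0 * (-0.25) = 0.0, and sigmoid(0.0) = 0.5.
print(neuron([1.0, 2.0], [0.5, -0.25]))  # 0.5
```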

2.2.1 Feedforward Neural Network

A feedforward neural network contains multiple neurons arranged in layers. Neurons in adjacent layers are connected, and each connection has an associated weight. Figure 3 shows an example of a feedforward neural network.

Figure 3: Feedforward neural network [12]

A feedforward neural network consists of three types of nodes:

∙ Input Nodes - These nodes pass information from external sources into the network; together they are referred to as the "Input Layer". They do not perform any computation; they simply pass the information on to the hidden nodes.

∙ Hidden Nodes - These nodes have no direct connection with the outside world. They perform computations and transfer information from the input nodes to the output nodes. A feedforward network has a single input layer and a single output layer, but it can have zero or more hidden layers.

∙ Output Nodes - These nodes perform computations and transfer information to the outside world. Collectively, the output nodes are referred to as the "Output Layer".

In feedforward networks, information moves in only one direction, forward: from the input nodes, through the hidden nodes, and finally to the output nodes. The network contains no cycles or loops. A Single Layer Perceptron is a feedforward neural network without any hidden layer, whereas a Multi Layer Perceptron has one or more hidden layers.
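The forward pass of a feedforward network is repeated application of such neurons, layer by layer, with no cycles. The sketch below assumes ReLU hidden units and a sigmoid output, and uses arbitrary hypothetical weights for a 2-3-1 network; removing the hidden layer from `layers` (and resizing the output weights to match the inputs) would give a single layer perceptron.

```python
import math

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(x, weights, biases, activation):
    """Fully connected layer: weights[j] holds the incoming weights of output neuron j."""
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]

def forward(x, layers):
    """Feed x through each (weights, biases, activation) layer in order; no cycles or loops."""
    for weights, biases, activation in layers:
        x = dense(x, weights, biases, activation)
    return x

# Hypothetical 2-3-1 network: one hidden ReLU layer and one sigmoid output node.
layers = [
    ([[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]], [0.0, 0.0, 0.0], relu),  # hidden layer
    ([[1.0, 1.0, 1.0]], [-1.5], sigmoid),                             # output layer
]
# Hidden activations are [1.0, 1.5, 0.0]; the output is sigmoid(1.0), about 0.731.
print(forward([2.0, 1.0], layers))
```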

CHAPTER 3

Problem Definition

In this chapter, we introduce the problem and then discuss existing collaborative filtering techniques based on implicit data. We then discuss the widely used Matrix Factorization method and its limitation due to the inner product of user and item latent vectors.

3.1 Recommendations from Implicit Feedback

Let M and N denote the number of users and items, respectively. We denote the user-item interaction matrix as $\mathbf{Y} \in \mathbb{R}^{M \times N}$, whose entries are defined as

$$
y_{ui} =
\begin{cases}
1, & \text{if there is an interaction between user } u \text{ and item } i, \\
0, & \text{otherwise.}
\end{cases}
\tag{1}
$$

A value of $y_{ui} = 1$ means that there is an interaction between user u and item i; otherwise, no interaction between them has been observed. Since these interactions do not specify whether the user actually likes the item, the signals may be noisy.
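Equation 1 can be built directly from an interaction log. The sketch assumes a hypothetical log of (user, item) index pairs and stores Y as a dense list of lists for readability; real datasets are usually large enough that a sparse representation would be used instead.

```python
def build_interaction_matrix(interactions, num_users, num_items):
    """Binary implicit-feedback matrix Y (Equation 1): y_ui = 1 iff (u, i) was observed."""
    y = [[0] * num_items for _ in range(num_users)]
    for u, i in interactions:
        y[u][i] = 1  # repeated events (e.g. two clicks) still map to 1
    return y

# Hypothetical click log of (user_id, item_id) pairs with 0-based ids.
log = [(0, 1), (0, 2), (1, 0), (2, 2), (0, 1)]
Y = build_interaction_matrix(log, num_users=3, num_items=3)
for row in Y:
    print(row)
# [0, 1, 1]
# [1, 0, 0]
# [0, 0, 1]
```

A zero entry here only means "not observed"; it does not say the user dislikes the item, which is exactly the ambiguity described above.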

We formulate recommendation from implicit data as the problem of predicting the scores of the unobserved entries in $\mathbf{Y}$. Model-based techniques abstract learning as $\hat{y}_{ui} = f(u, i \mid \Theta)$, where $\hat{y}_{ui}$ denotes the predicted score for $y_{ui}$, $\Theta$ denotes the model parameters, and f denotes the interaction function that maps the model parameters to the predicted score.

Existing techniques use machine learning algorithms that optimize an objective function to estimate the model parameters $\Theta$. These techniques commonly use one of two types of objective functions: pointwise loss [12] and pairwise loss. Most models use a pointwise loss and learn within a regression framework by minimizing the squared loss between the predicted score $\hat{y}_{ui}$ and the observed score $y_{ui}$. These models handle the absence of negative feedback either by sampling negative instances from the unobserved entries [13] or by treating all unobserved entries as negative feedback. Models based on a pairwise loss assume that observed entries should be ranked higher than unobserved entries, so they maximize the margin between the predicted score $\hat{y}_{ui}$ of an observed entry and the predicted score $\hat{y}_{uj}$ of an unobserved entry. Our proposed neural network model uses pointwise learning, but it can also be extended to pairwise learning.
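The difference between the two objectives can be seen on a toy example. In this sketch, the scores are hypothetical model outputs, negatives are sampled uniformly from the user's unobserved items, and the pairwise loss uses the BPR-style form $-\log \sigma(\hat{y}_{ui} - \hat{y}_{uj})$, which is one common choice rather than the only one.

```python
import math
import random

def sample_negatives(user, observed, num_items, k, rng):
    """Uniformly sample k items from the user's unobserved entries."""
    candidates = [j for j in range(num_items) if j not in observed[user]]
    return rng.sample(candidates, k)

def pointwise_loss(pairs, y, y_hat):
    """Pointwise (regression) objective: sum of (y_ui - y_hat_ui)^2 over (u, i) pairs."""
    return sum((y(u, i) - y_hat(u, i)) ** 2 for u, i in pairs)

def pairwise_loss(triples, y_hat):
    """BPR-style pairwise objective: sum of -log sigmoid(y_hat_ui - y_hat_uj).

    Each triple is (user, observed item i, unobserved item j); the loss shrinks as
    y_hat_ui exceeds y_hat_uj by a wider margin."""
    return sum(math.log1p(math.exp(-(y_hat(u, i) - y_hat(u, j)))) for u, i, j in triples)

# Hypothetical data: user 0 interacted with items 1 and 2 out of 4 items.
observed = {0: {1, 2}}
scores = {(0, 0): 0.2, (0, 1): 0.9, (0, 2): 0.6, (0, 3): 0.1}  # made-up model outputs

def y(u, i):
    return 1 if i in observed[u] else 0

def y_hat(u, i):
    return scores[(u, i)]

negatives = sample_negatives(0, observed, num_items=4, k=2, rng=random.Random(0))
pairs = [(0, i) for i in sorted(observed[0])] + [(0, j) for j in negatives]
triples = [(0, i, j) for i in sorted(observed[0]) for j in negatives]
print(pointwise_loss(pairs, y, y_hat))  # about 0.22
print(pairwise_loss(triples, y_hat))
```

The pointwise loss pushes each score towards its 0/1 label independently, while the pairwise loss only cares about the ordering of observed and unobserved items for the same user.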

3.2 Matrix Factorization

Matrix Factorization (MF) is one of the most popular collaborative filtering techniques used in industry for recommender systems. It associates each user and each item with a real-valued vector of latent features. Let $\mathbf{p}_u$ and $\mathbf{q}_i$ denote the latent vectors of user u and item i, respectively. MF estimates an interaction $y_{ui}$ as the inner product of $\mathbf{p}_u$ and $\mathbf{q}_i$:

$$
\hat{y}_{ui} = f(u, i \mid \mathbf{p}_u, \mathbf{q}_i) = \mathbf{p}_u^T \mathbf{q}_i = \sum_{k=1}^{K} p_{uk}\, q_{ik} \tag{2}
$$

where K denotes the dimension of the latent space. MF models the two-way interaction of user and item latent factors, assuming that the dimensions of the latent space are independent of each other and combining them linearly with the same weight. Therefore, MF can be considered a linear model of latent factors.
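Equation 2 amounts to a dot product. In the sketch below (hypothetical K = 3 vectors), every latent dimension contributes with the same unit weight, which is exactly the linearity described above.

```python
def mf_predict(p_u, q_i):
    """Equation 2: y_hat_ui = p_u^T q_i = sum over k of p_uk * q_ik."""
    if len(p_u) != len(q_i):
        raise ValueError("user and item vectors must share the latent dimension K")
    return sum(p_uk * q_ik for p_uk, q_ik in zip(p_u, q_i))

# Hypothetical K = 3 latent vectors: 0.5*2.0 + 1.0*0.5 + (-1.0)*0.5 = 1.0
p_u = [0.5, 1.0, -1.0]
q_i = [2.0, 0.5, 0.5]
print(mf_predict(p_u, q_i))  # 1.0
```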

To understand this limitation, two settings must be stated clearly beforehand. First, since MF maps users and items into the same latent space, the similarity between two users can also be measured by the inner product, or equivalently the cosine of the angle, between their latent vectors. Second, without loss of generality, we use the Jaccard coefficient [14] as the ground-truth similarity between two users that MF needs to recover.

Figure 3: An example illustrating MF's limitation [14]. From the user-item matrix (a), u4 is most similar to u1, followed by u3, and lastly u2. However, in the user latent space (b), placing p4 closest to p1 makes p4 closer to p2 than to p3, resulting in a greater ranking loss.

From the first three rows of the user-item matrix in Figure 3a, we can see that $s_{23}(0.66) > s_{12}(0.5) > s_{13}(0.4)$, where $s_{ab}$ denotes the Jaccard similarity between users $u_a$ and $u_b$. As such, the geometric relations of $\mathbf{p}_1$, $\mathbf{p}_2$, and $\mathbf{p}_3$ in the latent space can be plotted as in Figure 3b. Now, consider a new user $u_4$, whose input is represented by the dashed line in Figure 3a. We have $s_{41}(0.6) > s_{43}(0.4) > s_{42}(0.2)$, meaning that $u_4$ is most similar to $u_1$, followed by $u_3$, and lastly $u_2$. However, if the model places $\mathbf{p}_4$ closest to $\mathbf{p}_1$, then $\mathbf{p}_4$ ends up closer to $\mathbf{p}_2$ than to $\mathbf{p}_3$, which results in a greater ranking loss.
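The similarity values quoted above can be checked with a short computation. The matrix of Figure 3a is not reproduced in the text, so the rows below are an assumed reconstruction, chosen so that their Jaccard coefficients match the quoted values (2/3 is printed as 0.67, whereas the text truncates it to 0.66).

```python
def jaccard(row_a, row_b):
    """Jaccard coefficient: |items in both rows| / |items in either row|."""
    items_a = {j for j, v in enumerate(row_a) if v}
    items_b = {j for j, v in enumerate(row_b) if v}
    union = items_a | items_b
    return len(items_a & items_b) / len(union) if union else 0.0

# Assumed rows over items i1..i5, reconstructed to match the quoted similarities.
users = {
    1: [1, 1, 1, 0, 1],
    2: [0, 1, 1, 0, 0],
    3: [0, 1, 1, 1, 0],
    4: [1, 0, 1, 1, 1],  # the new user u4
}
for a, b in [(2, 3), (1, 2), (1, 3), (4, 1), (4, 3), (4, 2)]:
    print(f"s{a}{b} = {jaccard(users[a], users[b]):.2f}")
# s23 = 0.67, s12 = 0.50, s13 = 0.40, s41 = 0.60, s43 = 0.40, s42 = 0.20 (one per line)
```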

This illustration shows the negative impact that a simple, fixed inner product can have on model performance. Our models address this drawback by learning user-item interactions with deep neural networks, as described in Chapter 5.

CHAPTER 4

Related Work

Early work on recommendation relied mainly on explicit feedback [15], but attention is increasingly shifting towards implicit data. Collaborative filtering with implicit feedback is usually formulated as an item recommendation problem, whose aim is to recommend a short list of items to each user. Rating prediction has been widely addressed by work on explicit feedback (EF), whereas item recommendation is more practical but also more challenging. To tailor latent factor models to item recommendation with implicit feedback (IF), previous work applied a uniform weighting with one of two strategies: treating all missing data as negative instances, or sampling negative instances from the missing data. He et al. [2] and Liang et al. [16] proposed dedicated models to weight the missing data, and Rendle et al. [17] developed an implicit coordinate descent (iCD) solution for feature-based factorization models, which achieved state-of-the-art performance for item recommendation. The use of neural networks for recommendation is discussed in more depth below.

Salakhutdinov et al. [15] proposed two-layer Restricted Boltzmann Machines to model users' explicit ratings on items. This work was later extended to model the ordinal nature of ratings [ref]. More recently, autoencoders have become a popular choice for building recommendation systems. The idea of user-based AutoRec [18] is to learn hidden structures that can reconstruct a user's ratings given the user's historical ratings as input. In terms of user personalization, this approach shares a similar spirit with the item-item model [ref], which represents a user by the items the user has rated. To prevent autoencoders from learning an identity function and failing to generalize to unseen data, denoising autoencoders (DAEs) have been applied to learn from intentionally corrupted inputs [ref]. Zheng et al. [19] recently proposed a neural autoregressive method for collaborative filtering (CF). Although these efforts support the effectiveness of neural networks (NN) for collaborative filtering, most of them focused on explicit ratings and modeled only the observed data. As a result, they can fail to learn users' preferences from implicit data, which contains only positive feedback.

While some recent work [20] has explored deep learning models for recommendation based on implicit feedback (IF), it has primarily used deep neural networks (DNNs) to model auxiliary information, such as textual descriptions of items, acoustic features of music, cross-domain behaviors of users, and the rich information in knowledge bases. The features learned by the DNNs are then integrated with Matrix Factorization for collaborative filtering. The work most similar to ours is [21], which presents a collaborative denoising autoencoder (CDAE) for collaborative filtering with implicit feedback. In contrast to DAE-based collaborative filtering, CDAE additionally feeds a user node into the autoencoder's input to reconstruct the user's ratings. As its authors show, CDAE is equivalent to the SVD++ model [20] when the identity function is used to activate its hidden layers. This implies that, although CDAE is a neural modeling approach for collaborative filtering, it still models user-item interactions with an inner product. This may partially explain why adding deep layers to CDAE does not improve its performance (cf. Section 6 of [21]). Distinct from CDAE, neural collaborative filtering (NCF) adopts a two-pathway architecture that models user-item interactions with a multi-layer feedforward neural network. This allows NCF to learn an arbitrary interaction function from data, which is more expressive than the fixed inner product function.

Similarly, modeling the relationship between two entities has been studied extensively in the literature on knowledge graphs [22], and many relational machine learning methods have been devised [13]. The method most similar to our proposal is the Neural Tensor Network (NTN), which uses neural networks to learn the interaction between two entities and shows strong performance. However, NTN targets a different problem setting than collaborative filtering. While Neural MF, which combines Matrix Factorization with a Multi-Layer Perceptron, is partly inspired by NTN, it is more flexible and general than NTN because it allows the MF and MLP components to learn separate sets of embeddings.

Recently, Google published the deep neural network models it uses for product recommendations [23]. These models use a Multi-Layer Perceptron architecture, which showed promising results and made the model generic. Whereas these models address different aspects of user-item interactions, we focus on analyzing deep neural networks for purely CF-based recommender systems. In this project, we explore the use of deep neural networks to model complex user-item interactions.

CHAPTER 5

The Model

In this chapter, we first present a general framework for learning the user-item interaction function with neural networks, using a probabilistic model that emphasizes implicit feedback data. We then express matrix factorization (MF) [11] as a neural network model. To explore deep neural networks for collaborative filtering, a multi-layer perceptron (MLP) [11] model is used to learn the user-item interaction function. Finally, we present our neural network matrix factorization model, a fusion of the MF and MLP models, which combines the linearity of MF with the non-linearity of MLP to model user-item latent structures.

5.1 General Framework

To model the user-item interaction $y_{ui}$, we use a multi-layer representation, as shown in Figure 4, in which the output of one layer serves as the input to the next. The bottom input layer consists of two feature vectors, $\mathbf{v}_u^U$ and $\mathbf{v}_i^I$, which represent user u and item i. These are sparse binary vectors with one-hot encoding. Above the input layer is an embedding layer: a fully connected layer that projects each sparse representation to a dense vector. The resulting user (item) embedding can be viewed as the latent vector for the user (item) in the context of a latent factor model. The user and item embeddings are then fed into a multi-layer neural architecture that maps the latent vectors to prediction scores. Each hidden layer can be customized to discover new latent structures of user-item interactions. The final output layer produces the predicted score $\hat{y}_{ui}$, and the dimension of the last hidden layer determines the model's capability. Training is performed by minimizing the pointwise loss between $\hat{y}_{ui}$ and its target value $y_{ui}$.

Figure 4: Generalized Neural Network Framework [14]
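The one-hot input and embedding layer can be illustrated with a small sketch in plain Python (no deep-learning library assumed): multiplying the transposed latent factor matrix $\mathbf{P}^T$ by a one-hot vector $\mathbf{v}_u^U$ simply selects row $u$ of $\mathbf{P}$, which is why the embedding can be read as the user's latent vector. The matrix values and the helper names `one_hot`, `embed_by_matmul`, and `embed_by_lookup` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def one_hot(index: int, size: int) -> List[float]:
    """Sparse binary input vector (v_u^U or v_i^I) for a single user/item ID."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def embed_by_matmul(P: Matrix, v: List[float]) -> List[float]:
    """Compute P^T v explicitly, where P has shape (num_entities, K)."""
    K = len(P[0])
    return [sum(P[row][k] * v[row] for row in range(len(P))) for k in range(K)]


def embed_by_lookup(P: Matrix, index: int) -> List[float]:
    """Equivalent shortcut: return row `index` of P."""
    return list(P[index])


# Hypothetical latent factor matrix for M = 3 users and K = 2 factors.
P = [[0.1, 0.2],
     [0.3, -0.4],
     [0.5, 0.6]]

print(embed_by_matmul(P, one_hot(1, size=3)))  # [0.3, -0.4]
print(embed_by_lookup(P, 1))                   # [0.3, -0.4]
```

Deep-learning frameworks typically implement embedding layers as this kind of row lookup rather than as a dense multiplication over the sparse one-hot input, which is much cheaper when there are many users and items.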

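We now formulate the neural network predictive model as

$$\hat{y}_{ui} = f\left(\mathbf{P}^T \mathbf{v}_u^U, \mathbf{Q}^T \mathbf{v}_i^I \mid \mathbf{P}, \mathbf{Q}, \Theta_f\right) \qquad (3)$$

where $\mathbf{P} \in \mathbb{R}^{M \times K}$ and $\mathbf{Q} \in \mathbb{R}^{N \times K}$ denote the latent factor matrices for users and items, respectively, and $\Theta_f$ denotes the model parameters of the interaction function $f$. Since $f$ is defined as a multi-layer neural network, it can be formulated as

$$f\left(\mathbf{P}^T \mathbf{v}_u^U, \mathbf{Q}^T \mathbf{v}_i^I \mid \mathbf{P}, \mathbf{Q}, \Theta_f\right) = \phi_{out}\left(\phi_X\left(\ldots \phi_2\left(\phi_1\left(\mathbf{P}^T \mathbf{v}_u^U, \mathbf{Q}^T \mathbf{v}_i^I\right)\right)\ldots\right)\right) \qquad (4)$$

where $\phi_{out}$ and $\phi_x$ denote the mapping functions of the output layer and of the $x$-th neural collaborative filtering (CF) layer, respectively, with $X$ neural CF layers in total.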

5.1.1 Learning Model Parameters

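To learn the model parameters, existing pointwise methods generally perform a regression with the squared loss:

$$L_{sqr} = \sum_{(u,i) \in \mathcal{Y} \cup \mathcal{Y}^-} w_{ui} \left(y_{ui} - \hat{y}_{ui}\right)^2 \qquad (5)$$

where $\mathcal{Y}$ denotes the set of observed interactions in $\mathbf{Y}$, $\mathcal{Y}^-$ denotes the set of negative instances taken from the unobserved interactions, and $w_{ui}$ is a hyperparameter denoting the weight of training instance $(u, i)$. The squared loss can be justified by assuming that observations are drawn from a Gaussian distribution [24], but this assumption does not fit binary implicit data, where every target $y_{ui}$ is either 0 or 1. To learn the model parameters on binary data, we therefore use a probabilistic function (the logistic sigmoid) as the activation function of the output layer $\phi_{out}$, which restricts $\hat{y}_{ui}$ to the range [0, 1]. We define the likelihood function as

$$p\left(\mathcal{Y}, \mathcal{Y}^- \mid \mathbf{P}, \mathbf{Q}, \Theta_f\right) = \prod_{(u,i) \in \mathcal{Y}} \hat{y}_{ui} \prod_{(u,j) \in \mathcal{Y}^-} \left(1 - \hat{y}_{uj}\right) \qquad (6)$$

Taking the negative logarithm of the likelihood, we obtain

$$L = -\sum_{(u,i) \in \mathcal{Y} \cup \mathcal{Y}^-} \left[ y_{ui} \log \hat{y}_{ui} + \left(1 - y_{ui}\right) \log\left(1 - \hat{y}_{ui}\right) \right] \qquad (7)$$

Equation 7 is the binary cross-entropy loss, also known as log loss. We use it as our objective function and minimize it with stochastic gradient descent (SGD).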
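As a numerical check on Equations 6 and 7, the sketch below computes the loss for a few observed (label 1) and negative (label 0) instances in both forms and confirms that they agree. The predicted scores are made-up values standing in for the sigmoid outputs of a network, and `log_loss` and `negative_log_likelihood` are hypothetical helper names.

```python
import math
from typing import List, Tuple


def log_loss(instances: List[Tuple[int, float]]) -> float:
    """Equation 7: binary cross-entropy summed over (y_ui, y_hat_ui) pairs."""
    total = 0.0
    for y, y_hat in instances:
        total += y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat)
    return -total


def negative_log_likelihood(observed: List[float], unobserved: List[float]) -> float:
    """Negative logarithm of Equation 6, computed from the product form."""
    likelihood = 1.0
    for y_hat in observed:        # (u, i) in Y contributes y_hat_ui
        likelihood *= y_hat
    for y_hat in unobserved:      # (u, j) in Y^- contributes 1 - y_hat_uj
        likelihood *= 1 - y_hat
    return -math.log(likelihood)


# Hypothetical predictions: two observed interactions and two negatives.
observed_scores = [0.9, 0.7]
unobserved_scores = [0.2, 0.4]

instances = [(1, s) for s in observed_scores] + [(0, s) for s in unobserved_scores]
print(log_loss(instances))
print(negative_log_likelihood(observed_scores, unobserved_scores))  # same value
```

When $y_{ui} = 1$ only the first term inside the brackets of Equation 7 survives, and when $y_{ui} = 0$ only the second does, which mirrors how the two products in Equation 6 split the data.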

5.2 Generalized Matrix Factorization (GMF)

In this section, we show how MF can be interpreted as a special case of neural collaborative filtering (NCF). Expressing MF within the NCF framework allows the framework to cover a large family of factorization methods.

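The input to this model is the one-hot encoding of the user and item, and the embedding layer that follows yields the latent vectors of the user and item. Let $\mathbf{p}_u$ and $\mathbf{q}_i$ denote the user latent vector and the item latent vector, respectively. We define the mapping function of the first neural CF layer as

$$\phi_1\left(\mathbf{p}_u, \mathbf{q}_i\right) = \mathbf{p}_u \odot \mathbf{q}_i \qquad (8)$$

where $\odot$ denotes the element-wise product of vectors. We then project the resulting vector to the output layer:

$$\hat{y}_{ui} = a_{out}\left(\mathbf{h}^T \left(\mathbf{p}_u \odot \mathbf{q}_i\right)\right) \qquad (9)$$

where $a_{out}$ and $\mathbf{h}$ denote the activation function and the edge weights of the output layer, respectively. If $a_{out}$ is the identity function and $\mathbf{h}$ is a vector of ones, Equation 9 reduces exactly to the inner product $\mathbf{p}_u^T \mathbf{q}_i$ of standard MF, which is why MF is a special case of this model.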

We implemented a generalized version of matrix factorization (GMF) that uses the sigmoid function $\sigma(x) = 1/(1 + e^{-x})$ as $a_{out}$, learns $\mathbf{h}$ from data, and learns the model parameters with the log loss objective function (Equation 7).
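To make the "MF as a special case" argument concrete, the following sketch implements the GMF prediction of Equation 9 in plain Python and checks that setting $a_{out}$ to the identity and $\mathbf{h}$ to ones reproduces the ordinary MF inner product. The latent vectors and output weights are hypothetical; in the project they would be learned.

```python
import math
from typing import Callable, List

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gmf_predict(p_u: Vector, q_i: Vector, h: Vector,
                a_out: Callable[[float], float] = sigmoid) -> float:
    """Equation 9: y_hat = a_out(h^T (p_u ⊙ q_i))."""
    elementwise = [p * q for p, q in zip(p_u, q_i)]   # Equation 8 (phi_1)
    return a_out(sum(w * e for w, e in zip(h, elementwise)))


p_u = [0.5, -1.0, 2.0]   # hypothetical user latent vector
q_i = [1.0, 0.5, 0.25]   # hypothetical item latent vector

# Special case: identity activation and h = 1 give the plain MF inner product.
print(gmf_predict(p_u, q_i, h=[1.0, 1.0, 1.0], a_out=lambda x: x))  # 0.5

# GMF as described above: sigmoid output with (here made-up) learned weights h.
print(gmf_predict(p_u, q_i, h=[0.8, 1.2, 0.3]))
```

Letting $\mathbf{h}$ be learned allows GMF to weight latent dimensions differently, which plain MF cannot do.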

5.3 Multi-Layer Perceptron (MLP)

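As mentioned in Section 5.1, neural collaborative filtering adopts two pathways to model users and items. It is intuitive to combine the features of the two pathways by concatenating them [11]. However, simple vector concatenation does not account for any interactions between user and item latent features. To address this issue, we add hidden layers on top of the concatenated vector and use an MLP to learn the interaction between the user and item latent vectors. We formulate the model as:

$$\begin{aligned}
\mathbf{z}_1 &= \phi_1\left(\mathbf{p}_u, \mathbf{q}_i\right) = \begin{bmatrix} \mathbf{p}_u \\ \mathbf{q}_i \end{bmatrix}, \\
\phi_2\left(\mathbf{z}_1\right) &= a_2\left(\mathbf{W}_2^T \mathbf{z}_1 + \mathbf{b}_2\right), \\
&\;\;\vdots \\
\phi_L\left(\mathbf{z}_{L-1}\right) &= a_L\left(\mathbf{W}_L^T \mathbf{z}_{L-1} + \mathbf{b}_L\right), \\
\hat{y}_{ui} &= \sigma\left(\mathbf{h}^T \phi_L\left(\mathbf{z}_{L-1}\right)\right) \qquad (10)
\end{aligned}$$

where $\mathbf{W}_x$, $\mathbf{b}_x$, and $a_x$ denote the weight matrix, bias vector, and activation function of the $x$-th layer, respectively.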

We implemented this model with ReLU [25] as the activation function of the hidden layers. For the network architecture, we followed a tower pattern, in which the bottom layer is the widest and each successive layer has fewer neurons, as shown in Figure 3.
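A minimal forward pass for the MLP of Equation 10 is sketched below: concatenate the two embeddings, apply ReLU layers whose widths halve toward the top (the tower pattern), then apply a sigmoid output. The weights are drawn at random purely for illustration; the helper names, the layer widths, and the seeded `random.Random` initialization are assumptions, not the project's Keras code.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Layer = Tuple[List[Vector], Vector]  # (weight rows, bias): one row per output unit


def relu(x: float) -> float:
    return max(0.0, x)


def dense(layer: Layer, z: Vector) -> Vector:
    """a_x(W_x^T z + b_x) with ReLU as a_x."""
    weights, bias = layer
    return [relu(sum(w * v for w, v in zip(row, z)) + b) for row, b in zip(weights, bias)]


def init_tower(input_dim: int, widths: List[int], rng: random.Random) -> List[Layer]:
    """Gaussian-initialized weights for a stack of layers with the given widths."""
    layers: List[Layer] = []
    fan_in = input_dim
    for width in widths:
        weights = [[rng.gauss(0.0, 0.1) for _ in range(fan_in)] for _ in range(width)]
        layers.append((weights, [0.0] * width))
        fan_in = width
    return layers


def mlp_predict(p_u: Vector, q_i: Vector, tower: List[Layer], h: Vector) -> float:
    z = p_u + q_i                      # z_1 = [p_u; q_i] (list concatenation)
    for layer in tower:                # z_x = phi_x(z_{x-1})
        z = dense(layer, z)
    logit = sum(w * v for w, v in zip(h, z))
    return 1.0 / (1.0 + math.exp(-logit))   # y_hat = sigma(h^T phi_L(z_{L-1}))


rng = random.Random(0)
embedding_size = 8                     # hypothetical size of each embedding
p_u = [rng.gauss(0.0, 0.1) for _ in range(embedding_size)]
q_i = [rng.gauss(0.0, 0.1) for _ in range(embedding_size)]
tower = init_tower(2 * embedding_size, widths=[16, 8, 4], rng=rng)  # halving widths
h = [rng.gauss(0.0, 0.1) for _ in range(4)]

print(mlp_predict(p_u, q_i, tower, h))  # a score strictly between 0 and 1
```

Halving the width at each layer lets the higher layers work with fewer, more abstract features while keeping the parameter count modest.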

5.4 Neural Matrix Factorization

So far we have seen two neural network based models for learning the interaction function from data: GMF, which applies a linear kernel, and MLP, which uses a non-linear kernel. We now present a hybrid model that fuses GMF and MLP so that they can mutually reinforce each other and better learn complex user-item interactions.

Figure 5: Neural Network Matrix Factorization [14]

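An obvious way to fuse the two models is to let GMF and MLP share the same embedding layer and then combine the outputs of their interaction functions. However, sharing embeddings may limit the performance and flexibility of the fused model; for example, it forces GMF and MLP to use embeddings of the same size. We therefore let GMF and MLP learn separate embeddings and combine the two models by concatenating their last hidden layers, as shown in Figure 5. We formulate this model as:

$$\hat{y}_{ui} = \sigma\left(\mathbf{h}^T \begin{bmatrix} \phi^{GMF}_{out} \\ \phi^{MLP}_{out} \end{bmatrix}\right) \qquad (11)$$

where $\phi^{GMF}_{out}$ is the element-wise product of the GMF user and item embeddings (Equation 8) and $\phi^{MLP}_{out}$ is the output of the last hidden layer of the MLP.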

This model combines linearity from MF and non-linearity from neural networks for

modeling user-item latent structures.
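The fusion in Equation 11 can be sketched as follows: GMF and MLP each get their own embeddings (which may differ in size), their last-layer outputs are concatenated, and a single sigmoid unit scores the result. For brevity the MLP branch here has only one ReLU layer; all vectors, weights, and helper names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gmf_branch(p_g: Vector, q_g: Vector) -> Vector:
    """phi^GMF_out: element-wise product of the GMF embeddings."""
    return [p * q for p, q in zip(p_g, q_g)]


def mlp_branch(p_m: Vector, q_m: Vector, weights: List[Vector], bias: Vector) -> Vector:
    """phi^MLP_out: one ReLU layer over the concatenated MLP embeddings."""
    z = p_m + q_m
    return [max(0.0, sum(w * v for w, v in zip(row, z)) + b)
            for row, b in zip(weights, bias)]


def neumf_predict(gmf_out: Vector, mlp_out: Vector, h: Vector) -> float:
    """Equation 11: y_hat = sigma(h^T [phi^GMF_out ; phi^MLP_out])."""
    fused = gmf_out + mlp_out            # concatenation, not element-wise addition
    assert len(h) == len(fused)
    return sigmoid(sum(w * v for w, v in zip(h, fused)))


# Separate (hypothetical) embeddings for each branch, with different sizes.
p_g, q_g = [0.2, -0.1], [0.4, 0.3]                   # GMF embeddings, size 2
p_m, q_m = [0.1, 0.0, 0.5], [0.3, -0.2, 0.1]         # MLP embeddings, size 3
W = [[0.5, -0.2, 0.1, 0.3, 0.0, 0.2],
     [-0.1, 0.4, 0.2, 0.0, 0.3, -0.5]]
b = [0.0, 0.1]

fused_score = neumf_predict(gmf_branch(p_g, q_g), mlp_branch(p_m, q_m, W, b),
                            h=[1.0, 1.0, 0.5, 0.5])
print(fused_score)
```

Because only the final layer is shared, each branch keeps its own embedding size, which is the flexibility that separate embeddings are meant to provide.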

CHAPTER 6

Experimental Results

In this chapter, we describe experiments that aim to answer the following research questions.

Research Question 1: Do our proposed models outperform existing state-of-the-art collaborative filtering techniques for implicit feedback?

Research Question 2: Are deeper hidden layers in the neural network architecture beneficial for learning complex user-item interactions?

6.1 Datasets

We conducted experiments on two publicly available datasets: MovieLens and Pinterest. Table 1 summarizes their statistics.

Table 1: Characteristics of Datasets

| Dataset | Interactions | Items | Users | Sparsity |
|---|---|---|---|---|
| MovieLens | 1,000,208 | 3,705 | 6,041 | 95.52% |
| Pinterest | 1,500,808 | 9,915 | 55,186 | 99.74% |

6.1.1 MovieLens

MovieLens is one of the most widely used datasets for evaluating collaborative filtering algorithms. Several versions of it are available; we used the one containing about one million movie ratings, in which every user has at least 20 ratings. These ratings are explicit, and we chose this dataset deliberately to evaluate learning implicit feedback from explicit ratings. We converted it to implicit data by transforming each entry into 1 or 0, indicating whether or not the user has rated the movie.
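The conversion from explicit ratings to implicit feedback amounts to discarding the rating value and keeping only whether an interaction exists. A minimal sketch, assuming the ratings arrive as (user, item, rating) triples; the sample triples and function names are invented for illustration.

```python
from typing import Iterable, Set, Tuple

Rating = Tuple[int, int, float]


def to_implicit(ratings: Iterable[Rating]) -> Set[Tuple[int, int]]:
    """Every rated (user, item) pair becomes a positive instance (y_ui = 1)."""
    return {(user, item) for user, item, _rating in ratings}


def label(observed: Set[Tuple[int, int]], user: int, item: int) -> int:
    """y_ui = 1 if the user rated the item (any star value), else 0."""
    return 1 if (user, item) in observed else 0


ratings = [(1, 10, 5.0), (1, 11, 1.0), (2, 10, 3.0)]   # hypothetical triples
observed = to_implicit(ratings)

print(label(observed, 1, 11))  # 1: even a 1-star rating counts as an interaction
print(label(observed, 2, 11))  # 0: unobserved
```

One consequence of this conversion is that a low rating is treated the same as a high one: the model learns whether a user interacts with a movie, not whether the user liked it.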

6.1.2 Pinterest

This implicit feedback dataset [26] was originally constructed for evaluating content-based image recommendation. However, it is highly sparse: more than 25% of users have only a single pin, which makes it hard to evaluate collaborative filtering techniques. We therefore filtered the dataset in the same way as MovieLens, keeping only users with at least 20 pins (interactions). Each interaction indicates whether the user has pinned the image to his or her feed.
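The Pinterest preprocessing step, keeping only users with at least 20 interactions, can be sketched as below. The threshold of 20 comes from the text; the function name and the (user, item) pair format are assumptions.

```python
from collections import Counter
from typing import List, Tuple

Interaction = Tuple[int, int]  # (user, item)


def filter_active_users(interactions: List[Interaction],
                        min_interactions: int = 20) -> List[Interaction]:
    """Drop every interaction of users with fewer than `min_interactions` pins."""
    counts = Counter(user for user, _item in interactions)
    return [(u, i) for u, i in interactions if counts[u] >= min_interactions]


# Hypothetical data: user 1 has 20 pins, user 2 has a single pin.
data = [(1, item) for item in range(20)] + [(2, 99)]
kept = filter_active_users(data)
print(len(kept))                      # 20
print(sorted({u for u, _ in kept}))   # [1]
```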

6.2 Evaluation Metrics

We used the leave-one-out protocol to evaluate the performance of our models: for each user, the latest user-item interaction is held out for testing, and the remaining interactions are used for training. This protocol is widely used for evaluating implicit feedback recommendation models. Ranking all items for every user is very time consuming, so we followed a popular strategy [23] of randomly sampling items that the user has not interacted with and ranking the held-out test item among them. We measured performance with Hit Ratio (HR) and Normalized Discounted Cumulative Gain (NDCG) [25] on the top-K ranked list. HR measures whether the held-out item is present in the top-K list, while NDCG accounts for the position of the hit by assigning higher scores to hits at top ranks. We report results for top-10 ranked lists (K = 10).
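For a single held-out item, both metrics reduce to simple functions of where that item lands in the top-K list. The sketch below uses the common leave-one-out formulation, NDCG = log 2 / log(pos + 2) for a zero-based hit position; the item scores and sampled negatives in the example are hypothetical, and the helper names are not from the project code.

```python
import math
from typing import Callable, List, Tuple


def hit_ratio(ranklist: List[int], test_item: int) -> float:
    """HR@K for one user: 1 if the held-out item is in the top-K list, else 0."""
    return 1.0 if test_item in ranklist else 0.0


def ndcg(ranklist: List[int], test_item: int) -> float:
    """NDCG@K for a single relevant item: log(2) / log(pos + 2), pos zero-based."""
    if test_item not in ranklist:
        return 0.0
    pos = ranklist.index(test_item)
    return math.log(2) / math.log(pos + 2)


def evaluate_user(score: Callable[[int], float], test_item: int,
                  sampled_negatives: List[int], k: int = 10) -> Tuple[float, float]:
    """Rank the held-out item among sampled non-interacted items; keep the top K."""
    candidates = [test_item] + sampled_negatives
    ranklist = sorted(candidates, key=score, reverse=True)[:k]
    return hit_ratio(ranklist, test_item), ndcg(ranklist, test_item)


scores = {5: 0.9, 1: 0.8, 2: 0.3, 3: 0.1}   # hypothetical model scores per item
print(evaluate_user(lambda item: scores[item], test_item=1,
                    sampled_negatives=[5, 2, 3], k=2))
# HR = 1.0 and NDCG = log(2)/log(3), about 0.631, since item 1 is ranked second
```

The reported HR@10 and NDCG@10 would then typically be the averages of these per-user values over all test users.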

6.3 Competing Methods

We compared our neural network models with the following state-of-the-art collaborative filtering techniques. Since our models learn user-item interactions, we compared them against user-item collaborative filtering models rather than item-item models.

6.3.1 Most-Popular Item

Items are ranked by their popularity, measured as the number of user-item interactions they appear in. This is a non-personalized method, and we use it as a baseline to benchmark recommendation performance.
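A popularity ranker needs nothing more than interaction counts. In the sketch below, excluding items the user has already interacted with (`seen`) is an assumption for illustration; the data and function name are hypothetical.

```python
from collections import Counter
from typing import List, Set, Tuple


def most_popular(interactions: List[Tuple[int, int]], seen: Set[int], k: int) -> List[int]:
    """Rank items by interaction count, skipping items the user already has."""
    popularity = Counter(item for _user, item in interactions)
    ranked = [item for item, _count in popularity.most_common() if item not in seen]
    return ranked[:k]


interactions = [(1, 10), (2, 10), (3, 10), (1, 11), (2, 11), (3, 12)]  # hypothetical
print(most_popular(interactions, seen={10}, k=2))  # [11, 12]
```

Apart from filtering already-seen items, every user receives the same list, which is what makes this method non-personalized.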

6.3.2 User-KNN

This is a popular user-based neighborhood collaborative filtering technique [27]. We adapted it to learn from implicit user-item interaction data by following the strategy described in [12].
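A basic user-based neighborhood scorer on binary implicit data is sketched below: users are compared by the cosine similarity of their interaction sets, and an item's score for a target user is the summed similarity of the k most similar neighbors who interacted with it. This is a generic textbook variant, not necessarily the exact adaptation from [12] used in the experiments; the neighborhood size `k` plays the role of the parameter that was varied, and all names and data are hypothetical.

```python
import math
from typing import Dict, Set


def cosine(a: Set[int], b: Set[int]) -> float:
    """Cosine similarity between two binary interaction vectors given as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def user_knn_score(history: Dict[int, Set[int]], user: int, item: int, k: int) -> float:
    """Sum of similarities of the k nearest neighbors who interacted with `item`."""
    target = history[user]
    neighbors = sorted(
        ((cosine(target, items), other) for other, items in history.items() if other != user),
        reverse=True,
    )[:k]
    return sum((sim for sim, other in neighbors if item in history[other]), 0.0)


history = {            # hypothetical user -> set of interacted items
    1: {10, 11},
    2: {10, 11, 12},
    3: {13},
}
print(user_knn_score(history, user=1, item=12, k=1))  # about 0.816, via user 2
print(user_knn_score(history, user=1, item=13, k=1))  # 0.0, user 3 is not a neighbor
```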

6.3.3 Bayesian Personalized Ranking

This model [28] optimizes the matrix factorization model presented in Section 3.2 with a pairwise ranking loss, which is tailored to learning from implicit feedback. It is one of the strongest baselines for item recommendation. We varied its learning rate and report the best performance.
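The BPR criterion [28] maximizes $\ln \sigma(\hat{x}_{ui} - \hat{x}_{uj})$ for an observed item $i$ and an unobserved item $j$ of the same user, where $\hat{x}$ is the MF score. The sketch below computes the corresponding loss and takes one plain gradient step on it; the learning rate, the omission of regularization, and the vectors are hypothetical simplifications.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def bpr_loss(p_u: Vector, q_i: Vector, q_j: Vector) -> float:
    """-ln sigma(x_ui - x_uj), where x is the MF inner-product score."""
    x_uij = dot(p_u, q_i) - dot(p_u, q_j)
    return math.log(1.0 + math.exp(-x_uij))   # equals -ln(sigmoid(x_uij))


def bpr_sgd_step(p_u: Vector, q_i: Vector, q_j: Vector,
                 lr: float) -> Tuple[Vector, Vector, Vector]:
    """One gradient-descent step on the pairwise loss (no regularization)."""
    x_uij = dot(p_u, q_i) - dot(p_u, q_j)
    g = 1.0 / (1.0 + math.exp(x_uij))        # sigma(-x_uij) = -d loss / d x_uij
    new_p = [p + lr * g * (qi - qj) for p, qi, qj in zip(p_u, q_i, q_j)]
    new_qi = [qi + lr * g * p for qi, p in zip(q_i, p_u)]
    new_qj = [qj - lr * g * p for qj, p in zip(q_j, p_u)]
    return new_p, new_qi, new_qj


p_u, q_i, q_j = [0.1, 0.2], [0.3, 0.1], [0.2, 0.4]   # hypothetical factors
before = bpr_loss(p_u, q_i, q_j)
after = bpr_loss(*bpr_sgd_step(p_u, q_i, q_j, lr=0.1))
print(before, after)   # the loss decreases after the step
```

Unlike the pointwise log loss of Equation 7, this objective only cares about the score difference between an observed and an unobserved item, which is why BPR is called a pairwise method.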

6.4 System Configuration

All experiments were performed on a system with the configuration shown in Table 2.

Table 2: System Configuration

| Property | Value |
|---|---|
| Operating System | macOS High Sierra |
| Processor | 2.8 GHz Intel Core i7 |
| Memory | 16 GB 2133 MHz LPDDR3 |
| Disk | 512 GB SSD |
| Keras | 1.0.8 |
| Theano | 0.9.0 |

6.5 Parameter Settings

We implemented the neural network models in Keras, with Theano as the backend. To tune the hyperparameters, we randomly sampled one user-item interaction for each user and used these samples as validation data. All models are trained by optimizing the log loss objective function. Models trained from scratch were randomly initialized from a Gaussian distribution and then optimized with mini-batch Adam [29]. We ran experiments with the batch sizes and learning rates listed in Table 3.

Table 3: Parameter variations

| Property | Values |
|---|---|
| Batch sizes | 128, 256, 512, 1024 |
| Learning rates | 0.0001, 0.0005, 0.001, 0.005 |
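The data split used in these experiments holds out each user's latest interaction for testing (Section 6.2) and samples one of the remaining interactions at random for validation. A minimal sketch, assuming each interaction carries a timestamp and each user has at least two interactions; names and data are hypothetical.

```python
import random
from typing import Dict, List, Tuple

Event = Tuple[int, int]  # (item, timestamp)


def split_user(events: List[Event], rng: random.Random):
    """Return (train, validation, test) for one user's interactions."""
    ordered = sorted(events, key=lambda e: e[1])      # chronological order
    test = ordered[-1]                                 # leave-one-out: latest interaction
    rest = ordered[:-1]
    validation = rest.pop(rng.randrange(len(rest)))   # one random interaction
    return rest, validation, test


def split_all(history: Dict[int, List[Event]], seed: int = 42):
    rng = random.Random(seed)
    return {user: split_user(events, rng) for user, events in history.items()}


history = {1: [(10, 3), (11, 1), (12, 2), (13, 5)]}   # hypothetical (item, time) pairs
train, valid, test = split_all(history)[1]
print(test)                                  # (13, 5): the latest interaction
print(len(train) + 2 == len(history[1]))     # True: nothing lost or duplicated
```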

We refer to the size of the last hidden layer as the number of predictive factors, since it determines the model's capability, and evaluated 8, 16, 32, and 64 predictive factors. The MLP models use three hidden layers; for example, with 16 predictive factors, the neural collaborative filtering layers are 64->32->16 and the embedding size is 32. For the fused model (Neural-MF) with pre-training, GMF and MLP were first pre-trained and then combined; as in [11], $\alpha$ is the trade-off weight between the pre-trained GMF and MLP output weights, and we set $\alpha$ to 0.5 so that the two pre-trained models contribute equally.
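Two details of this configuration can be sketched: how the three tower widths and the embedding size follow from the number of predictive factors, and how $\alpha$ blends the pre-trained output weights. The rule $\mathbf{h} \leftarrow [\alpha \mathbf{h}^{GMF}; (1 - \alpha)\mathbf{h}^{MLP}]$ follows the pre-training scheme of [11]; that this project used exactly this rule is an assumption, and the helper names are hypothetical.

```python
from typing import List, Tuple


def tower_config(predictive_factors: int, num_layers: int = 3) -> Tuple[List[int], int]:
    """Layer widths halve toward the top; the last one equals the factor count."""
    widths = [predictive_factors * 2 ** (num_layers - 1 - x) for x in range(num_layers)]
    embedding_size = widths[0] // 2   # user and item embeddings are concatenated
    return widths, embedding_size


def init_neumf_output(h_gmf: List[float], h_mlp: List[float],
                      alpha: float = 0.5) -> List[float]:
    """Concatenate pre-trained output weights scaled by alpha and (1 - alpha)."""
    return [alpha * w for w in h_gmf] + [(1.0 - alpha) * w for w in h_mlp]


print(tower_config(16))                               # ([64, 32, 16], 32)
print(init_neumf_output([0.8, -0.4], [0.6, 0.2]))     # [0.4, -0.2, 0.3, 0.1]
```

With $\alpha = 0.5$ both pre-trained models start with equal influence on the fused prediction, and later training is free to shift that balance.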

6.6 Performance Comparisons

In this section, we show how our experiments answer the research questions posed above.

6.6.1 Experiments - Research Question 1

Figures 6 and 7 compare the performance of the models on the MovieLens and Pinterest datasets. The plots show Hit Ratio@10 and Normalized Discounted Cumulative Gain@10 on the y-axes against the number of predictive factors on the x-axes. For the MF-based Bayesian Personalized Ranking (BPR) model, the number of predictive factors equals the size of the user and item latent vectors. For UserKNN, we evaluated several neighborhood sizes and report the best-performing one. To highlight the performance of personalized recommendation models, we omit the Most-Popular Item model from these figures.

Figures 6 and 7 clearly show that Neural-MF achieves the best performance among all competing methods on both datasets, outperforming the state-of-the-art collaborative filtering models by a clear margin. On Pinterest, Neural-MF with as few as 8 or 16 predictive factors outperformed BPR with 64 predictive factors. This indicates the expressiveness gained by fusing the Generalized Matrix Factorization and Multi-Layer Perceptron models. The other neural models, Generalized MF and Multi-Layer Perceptron, also perform well, with Multi-Layer Perceptron performing slightly worse than Generalized MF. Generalized MF also shows clear improvements over BPR, which suggests that the classification-aware log loss is effective for recommendation problems.

Figure 6: Performance of HitRatio@10 and NDCG@10 on MovieLens Dataset

Figure 7: Performance of HitRatio@10 and NDCG@10 on Pinterest Dataset

Figures 8 and 9 show the performance of Top-K item recommendation, with the ranking position K ranging from 1 to 10. To highlight the contribution of neural networks, we compare Neural-MF only with the non-neural methods rather than with all of the neural network based models. Neural-MF improves over the other collaborative filtering methods across ranking positions. Among the baselines, UserKNN performed better than the model-based method. Finally, the Most-Popular Item model performed the worst, indicating the importance of personalized recommendation.

Figure 8: Top-K item recommendation on MovieLens Dataset

Figure 9: Top-K item recommendation on Pinterest Dataset

[Link] Use of Pre-training

To evaluate the effectiveness of pre-training for Neural-MF, we compared Neural-MF with pre-training against Neural-MF with random initialization. For the randomly initialized Neural-MF, we used Adam to learn the model parameters. Table 4 reports the results. Neural-MF with pre-training outperforms the randomly initialized version in every setting except MovieLens with 8 predictive factors, which supports the usefulness of pre-training for initializing Neural-MF.

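Table 4: Neural-MF performance with and without pre-training

| Dataset | Factors | HR@10 (with pre-training) | NDCG@10 (with pre-training) | HR@10 (without pre-training) | NDCG@10 (without pre-training) |
|---|---|---|---|---|---|
| MovieLens | 8 | 0.685 | 0.402 | 0.689 | 0.412 |
| MovieLens | 16 | 0.708 | 0.427 | 0.697 | 0.421 |
| MovieLens | 32 | 0.728 | 0.446 | 0.702 | 0.426 |
| MovieLens | 64 | 0.732 | 0.448 | 0.706 | 0.427 |
| Pinterest | 8 | 0.879 | 0.556 | 0.867 | 0.546 |
| Pinterest | 16 | 0.881 | 0.559 | 0.873 | 0.549 |
| Pinterest | 32 | 0.878 | 0.557 | 0.871 | 0.548 |
| Pinterest | 64 | 0.876 | 0.553 | 0.869 | 0.552 |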

6.6.2 Experiments - Research Question 2

Since there has been relatively little work on neural networks in the recommender system domain, it is important to know whether deep neural networks actually benefit recommendation problems. To investigate this, we experimented with the MLP model while varying the number of its hidden layers. The results are shown in Tables 5 and 6, where MLP@K denotes the MLP model with K hidden layers.

Tables 5 and 6 report HR@10 and NDCG@10 for top-10 item recommendation on the MovieLens and Pinterest datasets. We varied the number of hidden layers in the MLP model from 0 to 4 and the number of predictive factors over 8, 16, 32, and 64. Adding layers improves performance in almost every setting; the only exceptions are small dips from MLP@3 to MLP@4 with 32 factors (HR@10 on Pinterest and NDCG@10 on MovieLens). This shows the importance of deep layers in neural collaborative filtering models.

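Table 5: Hit Ratio@10 of MLP with different numbers of hidden layers

| Dataset | Factors | MLP@0 | MLP@1 | MLP@2 | MLP@3 | MLP@4 |
|---|---|---|---|---|---|---|
| MovieLens | 8 | 0.453 | 0.627 | 0.656 | 0.672 | 0.677 |
| MovieLens | 16 | 0.453 | 0.664 | 0.675 | 0.686 | 0.691 |
| MovieLens | 32 | 0.454 | 0.683 | 0.687 | 0.692 | 0.699 |
| MovieLens | 64 | 0.454 | 0.686 | 0.697 | 0.701 | 0.708 |
| Pinterest | 8 | 0.274 | 0.846 | 0.856 | 0.859 | 0.862 |
| Pinterest | 16 | 0.275 | 0.857 | 0.862 | 0.865 | 0.867 |
| Pinterest | 32 | 0.274 | 0.863 | 0.864 | 0.868 | 0.867 |
| Pinterest | 64 | 0.275 | 0.865 | 0.868 | 0.869 | 0.873 |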

Without hidden layers (MLP@0), the MLP model performs very poorly, even worse than the non-personalized Most-Popular Item model. This supports our argument that simply concatenating the user and item latent vectors is not enough to model the user-item interaction function, and that hidden layers are needed.
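One way to see why MLP@0 behaves like a non-personalized model is a little algebra. With no hidden layer, the output is $\sigma(\mathbf{h}_U^T \mathbf{p}_u + \mathbf{h}_I^T \mathbf{q}_i)$, where $\mathbf{h}_U$ and $\mathbf{h}_I$ are (our notation) the halves of $\mathbf{h}$ that multiply the user and item parts of the concatenation. The user term is the same for every candidate item and $\sigma$ is monotonic, so every user receives the same item ranking. The sketch below checks this numerically with hypothetical vectors; it is offered as an explanation of the model form, not as part of the reported experiments.

```python
import math
from typing import Dict, List

Vector = List[float]


def mlp0_score(p_u: Vector, q_i: Vector, h: Vector) -> float:
    """MLP with zero hidden layers: sigma(h^T [p_u; q_i])."""
    z = p_u + q_i
    return 1.0 / (1.0 + math.exp(-sum(w * v for w, v in zip(h, z))))


def ranking(p_u: Vector, items: Dict[str, Vector], h: Vector) -> List[str]:
    return sorted(items, key=lambda name: mlp0_score(p_u, items[name], h), reverse=True)


h = [0.7, -0.3, 0.5, 1.2]                 # first half: user part, second half: item part
items = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.5, 0.5]}
alice, bob = [2.0, -1.0], [-3.0, 0.5]     # two very different hypothetical users

print(ranking(alice, items, h))  # ['b', 'c', 'a']
print(ranking(bob, items, h))    # ['b', 'c', 'a'], identical: no personalization
```

Hidden layers with non-linear activations break this additive separability, which is what allows the deeper MLPs in Tables 5 and 6 to personalize.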

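Table 6: NDCG@10 of MLP with different numbers of hidden layers

| Dataset | Factors | MLP@0 | MLP@1 | MLP@2 | MLP@3 | MLP@4 |
|---|---|---|---|---|---|---|
| MovieLens | 8 | 0.254 | 0.358 | 0.383 | 0.399 | 0.406 |
| MovieLens | 16 | 0.253 | 0.390 | 0.402 | 0.410 | 0.415 |
| MovieLens | 32 | 0.251 | 0.407 | 0.410 | 0.425 | 0.423 |
| MovieLens | 64 | 0.252 | 0.408 | 0.417 | 0.426 | 0.432 |
| Pinterest | 8 | 0.142 | 0.526 | 0.534 | 0.536 | 0.539 |
| Pinterest | 16 | 0.142 | 0.532 | 0.536 | 0.538 | 0.544 |
| Pinterest | 32 | 0.143 | 0.537 | 0.538 | 0.542 | 0.546 |
| Pinterest | 64 | 0.144 | 0.538 | 0.542 | 0.545 | 0.550 |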

CHAPTER 7

The Conclusion and Future Work

In this project, we used different neural network architectures to overcome the limitations of matrix factorization based collaborative filtering models. We showed that these models perform better than existing state-of-the-art models on two real-world datasets. Our models are simple and generic, and can be applied or extended to different types of recommendation problems. This work complements the mainstream shallow models for collaborative filtering and opens up new research possibilities for recommendation based on deep learning.

As future work, we plan to train Neural Matrix Factorization models with pairwise learners and to extend them with auxiliary information, such as user reviews, knowledge bases, and temporal signals, as an integral part of the model. We also want to study personalization models that target groups of users rather than individuals, which would be helpful for social group recommendation [30]. Beyond these models, we want to develop neural recommender systems for multimedia products [31], which have received less attention in the recommendation domain; such products contain richer visual elements that capture users' interest. Finally, to add another dimension to deep neural network based models, we want to explore recurrent neural networks and hashing methods [32], which may further enhance the performance of recommender systems.

LIST OF REFERENCES

[1] J. Wei, “Collaborative filtering and deep learning based recommendation system
for cold start items,” Expert Systems with Applications, vol. 69, 2017.

[2] X. He, H. Zhang, M. Kan, and T. Chua, “Fast matrix factorization for online
recommendation with implicit feedback,” in SIGIR, 2016, pp. 549–558.

[3] Netflix Prize Competition, “Netflix Prize competition, Wikipedia, the free encyclopedia,” 2006. [Online]. Available: [Link]
Prize

[4] Y. Koren, “Factorization meets the neighborhood: A multifaceted collaborative filtering model.” in KDD, 2008, pp. 426–434.

[5] H. Wang, N. Wang, and D. Yeung, “Collaborative deep learning for recommender
systems.” in KDD, 2015, pp. 1235–1244.

[6] S. Rendle, “Factorization machines.” in ICDM, 2010, pp. 995–1000.

[7] L. Hu, “Your neighbors affect your ratings: On geographical neighborhood influence to rating prediction.”

[8] K. H. et al., “Multilayer feedforward networks are universal approximators.” Neural Networks, vol. 5, pp. 359–366, 1989.

[9] H. Z. et al., “Start from scratch: Towards automatically identifying, modeling, and naming visual attributes.” in MM, 2014, pp. 187–196.

[10] F. Z. et al., “Collaborative knowledge base embedding for recommender systems.” in KDD, 2016, pp. 353–362.

[11] L. He, L. Liao, H. Zhang, H. Nie, X. Hu, and T. Chua, “Neural collaborative
filtering.” in Proceedings of the 26th International Conference on World Wide
Web. International World Wide Web Conferences Steering Committee, 2017,
pp. 173–182.

[12] Y. Hu, Y. Koren, and C. Volinsky, “Collaborative filtering for implicit feedback
datasets.” in ICDM, 2008, pp. 263–272.

[13] R. Socher, D. Chen, C. Manning, and A. Ng, “Reasoning with neural tensor
networks for knowledge base completion.” in NIPS, 2013, pp. 926–934.

[14] L. He, L. Liao, H. Zhang, H. Nie, X. Hu, and T. Chua, “Discrete collaborative
filtering.” in SIGIR, 2016, pp. 325–334.
[15] R. Salakhutdinov, A. Mnih, and G. Hinton, “Restricted boltzmann machines for
collaborative filtering.” in ICDM, 2007, pp. 791–798.
[16] K. Hornik, M. Stinchcombe, and H. White, “Multilayer feedforward networks are
universal approximators,” Neural Networks, vol. 5, 1989.
[17] I. Bayer, X. He, B. Kanagal, and S. Rendle, “A generic coordinate descent frame-
work for learning from implicit feedback.” in WWW, 2017.
[18] S. Sedhain, A. Menon, S. Sanner, and L. Xie, “Autorec: Autoencoders meet
collaborative filtering.” in WWW, 2015, pp. 111–112.
[19] Y. Zheng, B. Tang, W. Ding, and H. Zhou, “A neural autoregressive approach
to collaborative filtering.” in ICML, 2016, pp. 764–773.
[20] A. Elkahky, Y. Song, and X. He, “A multi-view deep learning approach for cross
domain user modeling in recommendation systems.” in WWW, 2015, pp. 278–
288.
[21] F. Strub and J. Mary, “Collaborative filtering with stacked denoising autoen-
coders and sparse inputs.” in NIPS Workshop on Machine Learning for eCom-
merce, 2015.
[22] A. Bordes, N. Usunier, A. Garcia-Duran, J. Weston, and O. Yakhnenko, “Trans-
lating embeddings for modeling multi-relational data.” in NIPS, 2013, pp. 2787–
2795.
[23] T. Cheng, L. Koc, J. Harmsen, and T. Shaked, “Wide and deep learning for
recommender systems,” in WWW, 2016, pp. 2787–2795.
[24] R. Salakhutdinov and A. Mnih, “Probabilistic matrix factorization,” in NIPS,
2008, pp. 1–8.
[25] C. T. He, X, M. Kan, and X. Chen, “Trirank: Review-aware explainable recom-
mendation by modeling aspects,” in CIKM, 2001, pp. 285–295.
[26] X. Geng, H. Zhang, J. Bian, and T. Chua, “Learning image and user features for
recommendation in social networks,” in ICCV, 2015, pp. 4274–4282.
[27] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl, “Item-based collaborative fil-
tering recommendation algorithms,” in WWW, 2015, pp. 1661–1670.
[28] S. Rendle, C. Freudenthaler, Z. Gantner, and L. Schmidt-Thieme, “Item-based
collaborative filtering recommendation algorithms,” in WWW, 2015, pp. 1661–
1670.

[29] D. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in ICLR,
2014, pp. 1–15.

[30] X. Wang, L. Nie, X. Song, D. Zhang, and T. Chua, “Unifying virtual and physical
worlds: Learning towards local and global consistency,” ACM Transactions on
Information Systems, 2017.

[31] X. He, M. Kan, P. Xie, and X. Chen, “Comment-based multi-view clustering of web 2.0 items,” in WWW, 2014, pp. 771–781.

[32] I. Bayer, X. He, B. Kanagal, and S. Rendle, “A generic coordinate descent frame-
work for learning from implicit feedback.” in WWW, 2017.

No ratings yet
eNRBM for EMR Analysis and Risk Stratification
10 pages
Artificial Intelligence in The Agri-Food System: Rethinking Sustainable Business Models in The COVID-19 Scenario
No ratings yet
Artificial Intelligence in The Agri-Food System: Rethinking Sustainable Business Models in The COVID-19 Scenario
12 pages
Traditional vs Deep Learning Recommenders
No ratings yet
Traditional vs Deep Learning Recommenders
6 pages
Neural Networks in Information Retrieval
No ratings yet
Neural Networks in Information Retrieval
9 pages
Neural Networks in Information Retrieval
No ratings yet
Neural Networks in Information Retrieval
290 pages
09 Chapter 2hhj
No ratings yet
09 Chapter 2hhj
36 pages
Recommender Systems Architecture Guide
No ratings yet
Recommender Systems Architecture Guide
63 pages
Table of Bakery Industry Exhibits
No ratings yet
Table of Bakery Industry Exhibits
4 pages
Product Recommendation Beyond Collaborative Filtering - Welcome To The Twilight Zone!
No ratings yet
Product Recommendation Beyond Collaborative Filtering - Welcome To The Twilight Zone!
18 pages
09 Chapter 2hhj
No ratings yet
09 Chapter 2hhj
36 pages
Marketing Challenges for Modern Bakeries
No ratings yet
Marketing Challenges for Modern Bakeries
1 page
Prospectus: The World Class: Studied Anywhere, Valued Everywhere
No ratings yet
Prospectus: The World Class: Studied Anywhere, Valued Everywhere
64 pages
Marketing Challenges in Indian Bakeries
No ratings yet
Marketing Challenges in Indian Bakeries
1 page
History of String Theory Explained
No ratings yet
History of String Theory Explained
11 pages
Fractals and Chaos Theory Explained
No ratings yet
Fractals and Chaos Theory Explained
62 pages
Motivic Feynman Integrals Explained
No ratings yet
Motivic Feynman Integrals Explained
39 pages
Understanding Pictorial Drawings
No ratings yet
Understanding Pictorial Drawings
4 pages
Customer Segmentation through Clustering Analysis
100% (2)
Customer Segmentation through Clustering Analysis
20 pages
FE Math 2022
No ratings yet
FE Math 2022
11 pages
Classical Statistical Mechanics Overview
No ratings yet
Classical Statistical Mechanics Overview
23 pages
Transport of Intensity Equation A Tutorial
No ratings yet
Transport of Intensity Equation A Tutorial
98 pages
Fixed Point Iteration Examples
100% (1)
Fixed Point Iteration Examples
4 pages
Machine Learning for Oxygen in Silicon
No ratings yet
Machine Learning for Oxygen in Silicon
5 pages
ExploitingMultipathLowComplexityEqualis Kaya IEEE VTC
No ratings yet
ExploitingMultipathLowComplexityEqualis Kaya IEEE VTC
6 pages
Quantum Algorithms Lecture Notes
No ratings yet
Quantum Algorithms Lecture Notes
174 pages
Solid Waste Management in Robe, Ethiopia
No ratings yet
Solid Waste Management in Robe, Ethiopia
22 pages
Understanding P-Value in Statistics
No ratings yet
Understanding P-Value in Statistics
8 pages
Introduction to MATLAB Programming
No ratings yet
Introduction to MATLAB Programming
16 pages
Wave Optics: Principles and Phenomena
No ratings yet
Wave Optics: Principles and Phenomena
23 pages
Technology in 9th Grade Geometry
No ratings yet
Technology in 9th Grade Geometry
2 pages
Observation-Centric SORT: Rethinking SORT For Robust Multi-Object Tracking
No ratings yet
Observation-Centric SORT: Rethinking SORT For Robust Multi-Object Tracking
21 pages
Bridge Damage Detection via Vibration Analysis
No ratings yet
Bridge Damage Detection via Vibration Analysis
108 pages
Digital Image Processing Overview
No ratings yet
Digital Image Processing Overview
35 pages
HP Capital Structure Analysis Case Study
No ratings yet
HP Capital Structure Analysis Case Study
8 pages
Flexibility Matrix Method in Structural Analysis
No ratings yet
Flexibility Matrix Method in Structural Analysis
15 pages
Motivation's Impact on Employee Performance
No ratings yet
Motivation's Impact on Employee Performance
9 pages
MR Swing: Mean Reversion Trading System
100% (5)
MR Swing: Mean Reversion Trading System
30 pages
SS2 Computer Third Term Scheme
No ratings yet
SS2 Computer Third Term Scheme
10 pages
Understanding Physical Quantities and Units
No ratings yet
Understanding Physical Quantities and Units
3 pages
Regression Analysis in 3 Seconds
No ratings yet
Regression Analysis in 3 Seconds
21 pages
Ryan Schools XII Mathematics Exam 2024-25
No ratings yet
Ryan Schools XII Mathematics Exam 2024-25
10 pages
Numerical Analysis of Footing Pressure Distributions
No ratings yet
Numerical Analysis of Footing Pressure Distributions
1 page
Year 8 Financial Mathematics Worksheet
No ratings yet
Year 8 Financial Mathematics Worksheet
24 pages