Many people regard the recommendation system as a mysterious existence, and they feel that the recommendation system seems to know what we think. Netflix recommends movies to us, and Amazon recommends what products to buy. The recommendation system has been greatly improved and perfected since its early development to the present to continuously improve the user experience. Although many of the recommendation systems are very complex systems, the basic ideas behind them are still very simple.
What is the recommendation system?
The recommendation system is a subcategory of the information filtering system. It presents the user with items that he (or she) may be interested in based on the user's preferences and behaviors. The recommendation system will try to predict your preference for an item in order to recommend an item that you are likely to like.
How to build a recommendation system?
There are already many techniques to build a recommendation system, and I choose to introduce you to the simplest and most commonly used three. They are: first, collaborative filtering; second, content-based recommendation system; third, knowledge-based recommendation system. I will explain the weaknesses, potential pitfalls, and how to avoid them in each of the previous systems. Finally, I prepared a complete implementation of the recommendation system for you at the end of the article.
Collaborative filtering
Collaborative filtering is the first technology used in recommendation systems, and it is still the simplest and most effective. The process of collaborative filtering is divided into these three steps: at the beginning, collect user information, then use this to generate a matrix to calculate user associations, and finally make high-confidence recommendations. This technology is divided into two categories: one based on the user, and the other based on the items that make up the environment.
User-based collaborative filtering
User-based collaborative filtering is essentially looking for users with similar tastes to our target users. If Jean-Pierre and Jason have given similar ratings to several movies, then we think they are similar users, and then we can use Jean Pierre's ratings to predict Jason's unknown ratings. For example, if Jean-Pierre likes Star Wars 3: The Return of the Jedi and Star Wars V: The Empire Strikes Back, and Jason also likes the Jedi Return, then the Empire Strikes Back is a good recommendation for Jason. Generally speaking, you only need a small group of users similar to Jason to predict his evaluation.
In the following table, each row represents a user, and each column represents a movie. You can find similar users by simply looking for the similarity between rows in this matrix.
However, the implementation of user-based collaborative filtering has some problems as follows:
User preferences will change over time, and many recommendations generated by the recommendation system may become obsolete.
The greater the number of users, the longer it will take to generate recommendations.
Based on the user's sensitivity to trust attacks, this attack method refers to malicious persons bypassing the recommendation system to make specific items rank higher than other items. (Shilling Attack, which is a feature of collaborative filtering that generates recommendations based on neighbor preferences, maliciously injects a forged user model, pushes up or suppresses target rankings, and achieves an attack method that changes the results of the recommendation system)
Item-based collaborative filtering
The item-based collaborative filtering process is very simple. The similarity of two items is calculated based on the score given by the user. Let's go back to the example of Jean-Pierre and Jason. Both of them like "Return of the Jedi" and "The Empire Strikes Back". Therefore, we can infer that most users who liked the first movie might also like the second movie. Therefore, for the third person Larry who likes "Return of the Jedi", the recommendation of "The Empire Strikes Back" will be meaningful.
So, the similarity here is calculated based on the columns instead of the rows (different from what you see in the user-movie matrix above). Item-based collaborative filtering is often favored because it does not have any disadvantages of user-based collaborative filtering. First, the items in the system (the items are movies in this example) will not change over time, so recommendations will become more and more relevant. In addition, there are usually fewer items in the recommendation system than the user, which reduces the recommendation processing time. Finally, considering that no user can change the items in the system, this system is more difficult to be deceived or attacked.
Content-based recommendation system
In content-based recommendation systems, the descriptive attributes of elements are used to form recommendations. The term "Content" refers to these descriptions. For example, based on Sophie's music listening history, the recommendation system noticed that she seemed to like country music. Therefore, the system can recommend songs of the same or similar genres. A more sophisticated recommendation system can discover the relationship between multiple attributes, thereby generating higher-quality recommendations. For example, the Music Genome Project classifies each song in the database according to 450 different attributes. This project provides technical support for Pandor's song recommendation. (Pandor provides online music streaming services, similar to Spolify)
Knowledge-based recommendation system
The knowledge-based recommendation system is especially suitable when the purchase frequency of items is very low. For example, houses, cars, financial services and even expensive luxury goods. In this case, the recommendation process often lacks product evaluation. A knowledge-based recommendation system does not use evaluation to make recommendations. On the contrary, the recommendation process is based on the similarity between the customer's needs and the product description, or the use of constraints on the needs of specific users. This makes this type of system unique because it allows customers to specify exactly what they want. Regarding constraints, when applied, they are mostly implemented by experts in the field who know how to implement these constraints from the beginning. For example, when a user clearly points out to find a family house within a specific price range, the system must take into account the constraints specified by the user.
Cold start problem in recommended system
One of the main problems in recommender systems is that the number of reviews available initially is relatively small. What should we do when the new user has not rated the movie, or a new movie is added to the system? In this case, it will be more difficult to apply the traditional collaborative filtering model. Although content-based and knowledge-based recommendation algorithms are more robust than collaborative filtering when faced with cold-start problems, content-based and knowledge-based recommendation algorithms are not always available. Therefore, some new methods, such as hybrid systems, have been designed to solve this problem.
Hybrid recommendation system
The different types of recommendation systems introduced in the article so far have their own advantages and disadvantages, and they give recommendations based on different data. Some recommendation systems, such as knowledge-based recommendation systems, are most effective in cold-start environments where the amount of data is limited. Other systems, such as collaborative filtering, are more effective when large amounts of data are available. In most cases, the data is diversified, and we can flexibly adopt multiple methods for the same task. Therefore, we can combine recommendations from a variety of different technologies to improve the recommendation quality of the entire system. Many combinatorial technologies have been explored, including:
Weighting: each algorithm in the recommendation system is given a different weight, which makes the recommendation bias towards a certain algorithm
Crossover: display all the recommended results together, without emphasis
Enhancement: The recommendation of one system will be used as the input of the next system, looping until the last system
Switch: randomly choose a recommended method
One of the most famous examples of hybrid recommendation systems is the Netflix Price algorithm competition held from 2006 to 2009. The goal of this competition is to improve the algorithm accuracy of Netflix's movie recommendation system Cinematch by at least 10%. Bellkor's Pragmatix Chaos team used a solution that integrated 107 different algorithms to increase the recommendation accuracy of the Cinematch system by 10.06%, and finally received a $1 million bonus. You may be curious about the accuracy in this example, which is actually a measure of how close the predicted score of the movie is to the actual score.
Recommendation system and AI?
Recommendation systems are often used in the field of artificial intelligence. The ability of the recommendation system-insight, the ability to predict events and the ability to highlight relevance are often used in artificial intelligence. On the other hand, machine learning techniques are often used to implement recommendation systems. For example, at Arcbees, we used neural networks and data from IMdB to successfully build a movie rating prediction system. Neural networks can quickly perform complex tasks and easily process large amounts of data. By using the list of movies as the input of the neural network and comparing the output of the neural network with user ratings, the neural network can learn rules by itself to predict the future ratings of a particular user.
Expert advice
In the many materials I have read, I noticed that there are two very important suggestions that are often mentioned by experts in the field of recommender systems. First, make recommendations based on the items paid by the user. When a user has the intention to buy, you can conclude that his evaluation must be more relevant and accurate. Second, using multiple algorithms is always better than improving one algorithm. The Netflix Prize competition is a good example.
Implement an item-based recommendation system
The following code demonstrates how easy and fast it is to implement an item-based recommendation system. The language used is Python, and Pandas and Numpy, the two most popular libraries in the field of recommendation systems, are used. The data used is movie ratings, and the data set comes from MovieLens.
Step 1: Find similar movies
1. Read the data
import pandasaspd
import numpyasnp
ratings_cols=['user_id','movie_id','rating']
ratings=pd.read_csv('u.data',sep='t',names=ratings_cols,usecols=range(3))
movies_cols=['movie_id','title']
movies=pd.read_csv('u.item',sep='|',names=movies_cols,usecols=range(2))
ratings=pd.merge(ratings,movies)
2. Construct the user's movie matrix
movieRatings = ratings.pivot_table(index=['user_id'],columns=['title'],values='rating')
3. Select a movie and generate the similarity between this movie and all other movies
starWarsRatings=movieRatings['Star Wars (1977)']
similarMovies=movieRatings.corrwith(starWarsRatings)
similarMovies=similarMovies.dropna()
df=pd.DataFrame(similarMovies)
4. Remove unpopular movies to avoid inappropriate recommendations
ratingsCount=100
movieStats=ratings.groupby('title').agg({'rating':[np.size,np.mean]})
popularMovies=movieStats['rating']['size']>=ratingsCount
movieStats[popularMovies].sort_values([('rating','mean')],ascending=False)[:15]
5. Extract popular movies similar to the target movie
df=movieStats[popularMovies].join(pd.DataFrame(similarMovies,columns=['similarity']))
df.sort_values(['similarity'],ascending=False)[:15]
Step 2: Make recommendations based on all user ratings
1. Generate the similarity between every two movies, and only keep the similarity of popular movies
userRatings=ratings.pivot_table(index=['user_id'],columns=['title'],values='rating')
corrMatrix=userRatings.corr(method='pearson',min_periods=100)
2. For each movie that the user has watched and rated, generate recommendations (here we choose user 0)
myRatings=userRatings.loc[0].dropna()
simCandidates=pd.Series()
foriinrange(0,len(myRatings.index)):
#Retrieve movies similar to the rated movies
sims=corrMatrix[myRatings.index[i]].dropna()
#Measure the similarity of this movie by the user's rating
sims=sims.map(lambdax:x *myRatings[i])
#Put the results into the similarity candidate list
simCandidates=simCandidates.append(sims)
simCandidates.sort_values(inplace=True,ascending=False)
3. Sum the similarities of all the same movies
simCandidates=simCandidates.groupby(simCandidates.index).sum()
simCandidates.sort_values(inplace=True,ascending=False)
4. Only keep the movies that the user hasn't watched
filteredSims = simCandidates.drop(myRatings.index)
How to go further?
In the above example, Pandas and our CPU are sufficient to process the MovieLens data set. However, when the data set becomes larger, the processing time will also become longer. Therefore, you should switch to solutions with more powerful processing capabilities, such as Spark or MapReduce.
Power cord can be used in wide range of industries. Home appliances, charging equipment, lighting, Gym appliance, computer, tool, pump, compressor, medical equipment, and so on. All products which are driven by electricity need a power cord.DC (direct Current) power cord is used to the applicance with lower voltage mostly, so safety requirement is less stringent.
DC Power Cord, power cable, DC cable, power connector
ETOP WIREHARNESS LIMITED , https://www.wireharnessetop.com