I am trying to build a user-item recommendation engine based on favourite votes on Stack Overflow questions.
The objective: To build a webpage / IDE plugin where the user receives their top N recommended questions based on:
- his previous favourite votes on Stackoverflow
- the programming language he is currently using (this will be a filter on the question tag, e.g. only #java questions)
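A minimal sketch of the final "top N with a tag filter" step described above, assuming some model has already produced a score per question. The function name `top_n_for_tag` and the dictionaries `scores`, `question_tags` and `seen` are hypothetical names used only for illustration; they are not part of any library.

```python
import heapq

def top_n_for_tag(scores, question_tags, seen, tag, n):
    """Return the n best-scored questions that carry `tag` and are not already favourited."""
    candidates = (
        (score, question)
        for question, score in scores.items()
        if question not in seen and tag in question_tags.get(question, set())
    )
    # nlargest avoids sorting the full candidate list when n is small.
    return [question for score, question in heapq.nlargest(n, candidates)]

# Hypothetical toy data: model scores, tags per question, and the user's existing favourites.
scores = {100: 0.9, 101: 0.8, 102: 0.7, 103: 0.4}
question_tags = {100: {"java"}, 101: {"python"}, 102: {"java"}, 103: {"java"}}
seen = {100}

recommended = top_n_for_tag(scores, question_tags, seen, "java", 2)
print(recommended)  # [102, 103]
```

Filtering after scoring keeps the model independent of the language filter, so the same scores can serve any tag.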
The input data: I am using the Stack Exchange data dump, which can be found here: https://archive.org/download/stackexchange. From it I've extracted the data that I thought would be useful:
Votes table (each User - Question pair represents a favourite vote for the question from the user):
UserId - QuestionId
Tags table:
QuestionId - TagId
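As a rough sketch, the two tables above could be loaded into per-user and per-question sets like this. The CSV header names and the inline sample rows are assumptions made for illustration; the real dump files would be streamed from disk rather than held in a string.

```python
import csv
import io
from collections import defaultdict

# Tiny stand-ins for the two extracted tables (column names are assumed).
VOTES_CSV = """UserId,QuestionId
1,100
1,101
2,100
2,102
3,101
"""
TAGS_CSV = """QuestionId,TagId
100,java
101,python
102,java
"""

def load_pairs(text, key_column, value_column):
    """Group the values of `value_column` into a set per `key_column` value."""
    grouped = defaultdict(set)
    for row in csv.DictReader(io.StringIO(text)):
        grouped[row[key_column]].add(row[value_column])
    return dict(grouped)

# Each favourite vote is a binary signal: the pair is either present or absent.
user_favourites = load_pairs(VOTES_CSV, "UserId", "QuestionId")
question_tags = load_pairs(TAGS_CSV, "QuestionId", "TagId")

print(user_favourites["1"])   # the set of question ids favourited by user 1
print(question_tags["102"])   # the set of tags on question 102
```

Sets fit the 0/1 nature of the data: a missing pair means "no favourite vote observed", not a negative rating.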
I also have a lot of details about each user/question, which would make sense in a content-based approach. The only content I have used so far is the question tags.
Problems/Properties of the data:
- the data consists of implicit feedback -> a user either marked a question as favourite or he didn't (binary problem 0/1)
- the data set is quite large, so training and evaluating a model takes a lot of time (the votes CSV file is a few GB)
Progress so far: I've tried a few different approaches, most of them some form of collaborative filtering:
- the first thing I tried was using cosine similarity to get top N question-question recommendations, just to test whether the results are better than random
- then I tried Spark's Alternating Least Squares matrix factorisation model, but the results were also mediocre; I attribute this to using implicit feedback data, while, as far as I understand, the ALS technique is built for explicit data
- I've also tried another MF model with the Bayesian Personalised Ranking loss function, which is better suited to implicit data. The library I used here is LightFM and the evaluation metric is ROC AUC: https://www.kaggle.com/iancuv/lightfm-demo?scriptVersionId=3670161
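For reference, the first approach in the list (question-question cosine similarity on binary favourites) and a "better than random" check can be sketched in plain Python as below. This is an illustrative sketch under assumptions, not the code used in the linked notebook: all function names and the toy data are hypothetical, and the pairwise loop would need a sparse-matrix implementation to cope with a dump of several GB.

```python
import math
from collections import defaultdict

def item_cosine(user_favourites):
    """Cosine similarity between questions for 0/1 data:
    cos(a, b) = |users(a) & users(b)| / sqrt(|users(a)| * |users(b)|)."""
    users_per_item = defaultdict(set)
    co_counts = defaultdict(int)
    for user, questions in user_favourites.items():
        ordered = sorted(questions)
        for idx, a in enumerate(ordered):
            users_per_item[a].add(user)
            for b in ordered[idx + 1:]:
                co_counts[(a, b)] += 1  # one shared user for the pair (a, b)
    sims = defaultdict(dict)
    for (a, b), shared in co_counts.items():
        value = shared / math.sqrt(len(users_per_item[a]) * len(users_per_item[b]))
        sims[a][b] = value
        sims[b][a] = value
    return sims

def score_candidates(known_favourites, sims):
    """Score unseen questions by summing similarity to the user's known favourites."""
    scores = defaultdict(float)
    for question in known_favourites:
        for other, value in sims.get(question, {}).items():
            if other not in known_favourites:
                scores[other] += value
    return dict(scores)

def auc(scores, held_out, candidates):
    """Fraction of (held-out favourite, other question) pairs ranked correctly.
    Ties count as half, so random scoring gives about 0.5."""
    positives = [scores.get(q, 0.0) for q in candidates if q in held_out]
    negatives = [scores.get(q, 0.0) for q in candidates if q not in held_out]
    if not positives or not negatives:
        return None
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

# Hypothetical toy data: user -> set of favourited question ids.
train = {"u1": {1, 2}, "u2": {1, 2, 3}, "u3": {3, 4}}
sims = item_cosine(train)
u1_scores = score_candidates(train["u1"], sims)
# Pretend question 3 was a favourite of u1 held out from training.
u1_auc = auc(u1_scores, held_out={3}, candidates=[3, 4])
print(u1_scores, u1_auc)
```

An AUC near 0.5 averaged over users would mean the ranking is no better than random, which gives a baseline to compare the matrix factorisation models against.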
Open questions / suggestions:
Do you have any suggestions for other approaches I should try?
How would you approach this problem?
What preprocessing of the data makes sense to achieve better results?
Are any of the techniques mentioned above a good choice for this problem?
Would a purely content-based approach make sense?
If yes, how can I improve the results?
I should also mention (you probably figured it out) that I'm a CS student, new to the AI/machine learning field. The only applications I've built in the past involved simple regression or classification, nothing as complicated as implicit feedback recommendation systems. I know the problem and questions above are very specific, but any help is very much appreciated.
Useful links:
http://lyst.github.io/lightfm/docs/home.html (LightFM 1.14 documentation)
https://spark.apache.org/docs/latest/api/python/ (PySpark API documentation)
https://datasciencemadesimpler.wordpress.com/tag/alternating-least-squares/ (Alternating Least Squares, Data Science Made Simpler)
https://arxiv.org/pdf/1205.2618.pdf (Bayesian Personalised Ranking MF)
http://stanford.edu/~rezab/classes/cme323/S15/notes/lec14.pdf
Hi,
Please note that this forum is primarily intended to address problems and information related to Intel AI frameworks, tools and other offerings like Intel® AI DevCloud.
However, since you have reached out to us, we would recommend the following links for reference:
https://www.aaai.org/Papers/Workshops/2007/WS-07-08/WS07-08-002.pdf
http://ijcsit.com/docs/Volume%207/vol7issue4/ijcsit2016070424.pdf
https://getstream.io/blog/best-practices-feed-personalization/
https://blog.statsbot.co/recommendation-system-algorithms-ba67f39ac9a3
https://www.marutitech.com/recommendation-engine-benefits/
Thanks & Regards,
Sandhiya
We are closing this discussion since we do not handle these types of questions in our community.
If you have a question about Intel-specific AI frameworks/tools, we would be happy to address your queries.
Thanks & Regards,
Sandhiya