
Sampling-bias-corrected neural modeling for large corpus item recommendations

Published: 10 September 2019
  • Abstract

    Many recommendation systems retrieve and score items from a very large corpus. A common recipe for handling data sparsity and a power-law item distribution is to learn item representations from their content features. Apart from many content-aware systems based on matrix factorization, we consider a modeling framework using a two-tower neural network, with one of the towers (the item tower) encoding a wide variety of item content features. A common recipe for training such two-tower models is to optimize loss functions calculated from in-batch negatives, which are items sampled from a random mini-batch. However, in-batch loss is subject to sampling bias, which can hurt model performance, particularly when the item distribution is highly skewed. In this paper, we present a novel algorithm for estimating item frequency from streaming data. Through theoretical analysis and simulation, we show that the proposed algorithm works without requiring a fixed item vocabulary, produces unbiased estimates, and adapts to changes in the item distribution. We then apply the sampling-bias-corrected modeling approach to build a large-scale neural retrieval system for YouTube recommendations. The system is deployed to retrieve personalized suggestions from a corpus of tens of millions of videos. We demonstrate the effectiveness of sampling-bias correction through offline experiments on two real-world datasets. We also conduct live A/B tests to show that the neural retrieval system leads to improved recommendation quality for YouTube.
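The abstract names two ingredients without giving formulas: a streaming estimate of item frequency and a correction of the in-batch loss. The sketch below is one plausible reading, not the paper's exact method. It assumes that an item's sampling probability can be approximated by the inverse of the smoothed gap (in training steps) between its occurrences, and that the correction subtracts the log of that probability from each logit. The names `StreamingFrequencyEstimator` and `in_batch_softmax_loss`, the smoothing rate `alpha`, and the use of a plain dictionary keyed by item ID are all illustrative assumptions.

```python
import math


class StreamingFrequencyEstimator:
    """Hypothetical estimator: tracks a moving average of the number of
    steps between consecutive occurrences of each item. No fixed
    vocabulary is needed because unseen IDs are added on first sight."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha    # smoothing rate (assumed hyperparameter)
        self.last_seen = {}   # item_id -> step of the latest occurrence
        self.avg_gap = {}     # item_id -> smoothed gap between occurrences

    def update(self, item_id, step):
        if item_id in self.last_seen:
            gap = step - self.last_seen[item_id]
            prev = self.avg_gap.get(item_id, gap)
            self.avg_gap[item_id] = (1 - self.alpha) * prev + self.alpha * gap
        self.last_seen[item_id] = step

    def probability(self, item_id, default=1e-6):
        # An item seen on average every `gap` steps has probability ~ 1/gap.
        gap = self.avg_gap.get(item_id)
        return default if gap is None else 1.0 / gap


def in_batch_softmax_loss(user_vecs, item_vecs, item_probs=None):
    """Mean softmax loss over a batch where row i's positive is item i
    and the other items in the batch serve as negatives.

    If item_probs is given, log(item_probs[j]) is subtracted from every
    logit for item j (the assumed sampling-bias correction)."""
    n = len(user_vecs)
    total = 0.0
    for i in range(n):
        logits = []
        for j in range(n):
            score = sum(u * v for u, v in zip(user_vecs[i], item_vecs[j]))
            if item_probs is not None:
                score -= math.log(item_probs[j])
            logits.append(score)
        # Numerically stable log-sum-exp over the batch.
        top = max(logits)
        log_z = top + math.log(sum(math.exp(x - top) for x in logits))
        total += log_z - logits[i]
    return total / n


if __name__ == "__main__":
    estimator = StreamingFrequencyEstimator(alpha=0.1)
    stream = ["a", "b", "a", "c", "a", "b", "a", "c"]
    for step, item in enumerate(stream):
        estimator.update(item, step)

    users = [[1.0, 0.0], [0.0, 1.0]]
    items = [[1.0, 0.0], [0.0, 1.0]]
    probs = [estimator.probability("a"), estimator.probability("b")]
    print(in_batch_softmax_loss(users, items))
    print(in_batch_softmax_loss(users, items, probs))
```

Under these assumptions, a popular item appears more often as an in-batch negative, so its raw logit is penalized more than it would be under uniform sampling. Subtracting the log of its estimated probability offsets that effect. When all probabilities are equal, the correction shifts every logit by the same constant and leaves the loss unchanged.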

    Cited By

    • (2024) A comparison of embedding aggregation strategies in drug–target interaction prediction. BMC Bioinformatics 25:1. DOI: 10.1186/s12859-024-05684-y. Online publication date: 6-Feb-2024
    • (2024) Discovering Personalized Semantics for Soft Attributes in Recommender Systems Using Concept Activation Vectors. ACM Transactions on Recommender Systems 2:4 (1-37). DOI: 10.1145/3658675. Online publication date: 16-Apr-2024
    • (2024) CMCLRec: Cross-modal Contrastive Learning for User Cold-start Sequential Recommendation. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (1589-1598). DOI: 10.1145/3626772.3657839. Online publication date: 10-Jul-2024

      Published In

      RecSys '19: Proceedings of the 13th ACM Conference on Recommender Systems
      September 2019
      635 pages
      ISBN: 9781450362436
      DOI: 10.1145/3298689
      This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. information retrieval
      2. neural networks
      3. recommender systems

      Qualifiers

      • Research-article

      Conference

      RecSys '19: Thirteenth ACM Conference on Recommender Systems
      September 16 - 20, 2019
      Copenhagen, Denmark

      Acceptance Rates

      RecSys '19 Paper Acceptance Rate 36 of 189 submissions, 19%;
      Overall Acceptance Rate 254 of 1,295 submissions, 20%

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months): 580
      • Downloads (Last 6 weeks): 88
      Reflects downloads up to 28 Jul 2024

      Citations

      Cited By

      • (2024) Scaling Sequential Recommendation Models with Transformers. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (1567-1577). DOI: 10.1145/3626772.3657816. Online publication date: 10-Jul-2024
      • (2024) IISAN: Efficiently Adapting Multimodal Representation for Sequential Recommendation with Decoupled PEFT. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (687-697). DOI: 10.1145/3626772.3657725. Online publication date: 10-Jul-2024
      • (2024) Attribute Simulation for Item Embedding Enhancement in Multi-interest Recommendation. Proceedings of the 17th ACM International Conference on Web Search and Data Mining (482-491). DOI: 10.1145/3616855.3635841. Online publication date: 4-Mar-2024
      • (2024) Towards Graph Foundation Models for Personalization. Companion Proceedings of the ACM on Web Conference 2024 (1798-1802). DOI: 10.1145/3589335.3651980. Online publication date: 13-May-2024
      • (2024) Personalized Audiobook Recommendations at Spotify Through Graph Neural Networks. Companion Proceedings of the ACM on Web Conference 2024 (403-412). DOI: 10.1145/3589335.3648339. Online publication date: 13-May-2024
      • (2024) OmniSearchSage: Multi-Task Multi-Entity Embeddings for Pinterest Search. Companion Proceedings of the ACM on Web Conference 2024 (121-130). DOI: 10.1145/3589335.3648309. Online publication date: 13-May-2024
      • (2024) Does Negative Sampling Matter? A Review With Insights Into its Theory and Applications. IEEE Transactions on Pattern Analysis and Machine Intelligence 46:8 (5692-5711). DOI: 10.1109/TPAMI.2024.3371473. Online publication date: Aug-2024
