skip to main content
research-article

WTF: the who to follow service at Twitter

Published: 13 May 2013 Publication History
  • Get Citation Alerts
  • Abstract

    WTF ("Who to Follow") is Twitter's user recommendation service, which is responsible for creating millions of connections daily between users based on shared interests, common connections, and other related factors. This paper provides an architectural overview and shares lessons we learned in building and running the service over the past few years. Particularly noteworthy was our design decision to process the entire Twitter graph in memory on a single server, which significantly reduced architectural complexity and allowed us to develop and deploy the service in only a few months. At the core of our architecture is Cassovary, an open-source in-memory graph processing engine we built from scratch for WTF. Besides powering Twitter's user recommendations, Cassovary is also used for search, discovery, promoted products, and other services as well. We describe and evaluate a few graph recommendation algorithms implemented in Cassovary, including a novel approach based on a combination of random walks and SALSA. Looking into the future, we revisit the design of our architecture and comment on its limitations, which are presently being addressed in a second-generation system under development.

    References

    [1]
    L. Backstrom and J. Leskovec. Supervised random walks: Predicting and recommending links in social networks. WSDM, 2011.
    [2]
    B. Bahmani, A. Chowdhury, and A. Goel. Fast incremental and personalized PageRank. VLDB, 2010.
    [3]
    Y. Bu, B. Howe, M. Balazinska, and M. Ernst. HaLoop: Efficient iterative data processing on large clusters. VLDB, 2010.
    [4]
    R. Chen, M. Yang, X. Weng, B. Choi, B. He, and X. Li. Improving large graph processing on partitioned graphs in the cloud. SoCC, 2012.
    [5]
    F. Chierichetti, R. Kumar, S. Lattanzi, M. Mitzenmacher, A. Panconesi, and P. Raghavan. On compressing social networks. KDD, 2009.
    [6]
    J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. OSDI, 2004.
    [7]
    J. Ekanayake, H. Li, B. Zhang, T. Gunarathne, S.-H. Bae, J. Qiu, and G. Fox. Twister: A runtime for iterative MapReduce. MAPREDUCE Workshop, 2010.
    [8]
    D. Fogaras, B. Rácz, K. Csalogány, and T. Sarlós. Towards scaling fully personalized PageRank: Algorithms, lower bounds, and experiments. Internet Mathematics, 2(3):333--358, 2005.
    [9]
    A. Gates, O. Natkovich, S. Chopra, P. Kamath, S. Narayanamurthy, C. Olston, B. Reed, S. Srinivasan, and U. Srivastava. Building a high-level dataflow system on top of MapReduce: The Pig experience. VLDB, 2009.
    [10]
    M. Hasan and M. Zaki. A survey of link prediction in social networks. In C. Aggarwal, editor, Social Network Data Analytics. Springer, 2011.
    [11]
    J. Huang, D. Abadi, and K. Ren. Scalable SPARQL querying of large RDF graphs. VLDB, 2011.
    [12]
    G. Karypis and V. Kumar. Multilevel k-way partitioning scheme for irregular graphs. Journal of Parallel and Distributed Computing, 48:96--129, 1998.
    [13]
    J. Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5):604--632, 1999.
    [14]
    R. Kohavi, R. Henne, and D. Sommerfield. Practical guide to controlled experiments on the web: Listen to your customers not to the HiPPO. SIGKDD, 2007.
    [15]
    G. Lee, J. Lin, C. Liu, A. Lorek, and D. Ryaboy. The unified logging infrastructure for data analytics at Twitter. VLDB, 2012.
    [16]
    F. Leibert, J. Mannix, J. Lin, and B. Hamadani. Automatic management of partitioned, replicated search services. SoCC, 2011.
    [17]
    R. Lempel and S. Moran. SALSA: The Stochastic Approach for Link-Structure Analysis. ACM Transactions on Information Systems, 19(2):131--160, 2001.
    [18]
    J. Leskovec, K. Lang, A. Dasgupta, and M. Mahoney. Community structure in large networks: natural cluster sizes and the absence of large well-defined clusters. Internet Mathematics, 6(1):29--123, 2009.
    [19]
    J. Lin. MapReduce is good enough? If all you have is a hammer, throw away everything that's not a nail! arXiv:1209.2191, 2012.
    [20]
    J. Lin and C. Dyer. Data-Intensive Text Processing with MapReduce. Morgan & Claypool Publishers, 2010.
    [21]
    J. Lin and A. Kolcz. Large-scale machine learning at Twitter. SIGMOD, 2012.
    [22]
    J. Lin and M. Schatz. Design patterns for efficient graph algorithms in MapReduce. MLG Workshop, 2010.
    [23]
    Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. Hellerstein. GraphLab: A new parallel framework for machine learning. UAI, 2010.
    [24]
    Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. Hellerstein. Distributed GraphLab: A framework for machine learning and data mining in the cloud. VLDB, 2012.
    [25]
    G. Malewicz, M. Austern, A. Bik, J. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: A system for large-scale graph processing. SIGMOD, 2010.
    [26]
    J. Mondal and A. Deshpande. Managing large dynamic graphs efficiently. SIGMOD, 2012.
    [27]
    C. Olston, B. Reed, U. Srivastava, R. Kumar, and A. Tomkins. Pig Latin: A not-so-foreign language for data processing. SIGMOD, 2008.
    [28]
    L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank citation ranking: Bringing order to the Web. Technical report, Stanford University, 1999.
    [29]
    L. Valiant. A bridging model for parallel computation. Communications of the ACM, 33(8):103--111, 1990.
    [30]
    R. Zadeh and A. Goel. Dimension independent similarity computation. arXiv:1206.2082, 2012.
    [31]
    Y. Zhang, Q. Gao, L. Gao, and C. Wang. PrIter: A distributed framework for prioritized iterative computations. SoCC, 2011.

    Cited By

    View all
    • (2024)Personalized PageRanks over Dynamic Graphs - The Case for Optimizing Quality of Service2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00038(409-422)Online publication date: 13-May-2024
    • (2024)Modeling disinformation networks on Twitter: structure, behavior, and impactApplied Network Science10.1007/s41109-024-00610-w9:1Online publication date: 26-Jan-2024
    • (2024)How to create graphs in hardware-constrained environments? Choosing the best creation approach via machine learning-based predictive modelsInternational Journal of Data Science and Analytics10.1007/s41060-023-00495-5Online publication date: 2-Feb-2024
    • Show More Cited By

    Index Terms

    1. WTF: the who to follow service at Twitter

      Recommendations

      Comments

      Information & Contributors

      Information

      Published In

      cover image ACM Other conferences
      WWW '13: Proceedings of the 22nd international conference on World Wide Web
      May 2013
      1628 pages
      ISBN:9781450320351
      DOI:10.1145/2488388

      Sponsors

      • NICBR: Nucleo de Informatcao e Coordenacao do Ponto BR
      • CGIBR: Comite Gestor da Internet no Brazil

      In-Cooperation

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 13 May 2013

      Permissions

      Request permissions for this article.

      Check for updates

      Author Tags

      1. graph processing
      2. hadoop
      3. link prediction

      Qualifiers

      • Research-article

      Conference

      WWW '13
      Sponsor:
      • NICBR
      • CGIBR
      WWW '13: 22nd International World Wide Web Conference
      May 13 - 17, 2013
      Rio de Janeiro, Brazil

      Acceptance Rates

      WWW '13 Paper Acceptance Rate 125 of 831 submissions, 15%;
      Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

      Contributors

      Other Metrics

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months)126
      • Downloads (Last 6 weeks)10
      Reflects downloads up to 26 Jul 2024

      Other Metrics

      Citations

      Cited By

      View all
      • (2024)Personalized PageRanks over Dynamic Graphs - The Case for Optimizing Quality of Service2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00038(409-422)Online publication date: 13-May-2024
      • (2024)Modeling disinformation networks on Twitter: structure, behavior, and impactApplied Network Science10.1007/s41109-024-00610-w9:1Online publication date: 26-Jan-2024
      • (2024)How to create graphs in hardware-constrained environments? Choosing the best creation approach via machine learning-based predictive modelsInternational Journal of Data Science and Analytics10.1007/s41060-023-00495-5Online publication date: 2-Feb-2024
      • (2023)Evaluating graph neural networks for link predictionProceedings of the 37th International Conference on Neural Information Processing Systems10.5555/3666122.3666291(3853-3866)Online publication date: 10-Dec-2023
      • (2023)Cross-community shortcut detection based on network representation learning and structural featuresIntelligent Data Analysis10.3233/IDA-21651327:3(709-732)Online publication date: 18-May-2023
      • (2023)Efficient Dynamic Weighted Set Sampling and Its ExtensionProceedings of the VLDB Endowment10.14778/3617838.361784017:1(15-27)Online publication date: 4-Dec-2023
      • (2023)Estimating Single-Node PageRank in Õ (min{d, √m}) TimeProceedings of the VLDB Endowment10.14778/3611479.361150016:11(2949-2961)Online publication date: 24-Aug-2023
      • (2023)Distributed Graph Embedding with Information-Oriented Random WalksProceedings of the VLDB Endowment10.14778/3587136.358714016:7(1643-1656)Online publication date: 8-May-2023
      • (2023)FPLV: Enhancing recommender systems with fuzzy preference, vector similarity, and user community for rating predictionPLOS ONE10.1371/journal.pone.029062218:8(e0290622)Online publication date: 28-Aug-2023
      • (2023)RecQR: Using Recommendation Systems for Query Reformulation to correct unseen errors in spoken dialog systemsProceedings of the 17th ACM Conference on Recommender Systems10.1145/3604915.3610235(1019-1022)Online publication date: 14-Sep-2023
      • Show More Cited By

      View Options

      Get Access

      Login options

      View options

      PDF

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader

      Media

      Figures

      Other

      Tables

      Share

      Share

      Share this Publication link

      Share on social media