
A simple algorithm for finding frequent elements in streams and bags

Published: 01 March 2003

Abstract

    We present a simple, exact algorithm for identifying in a multiset the items with frequency more than a threshold θ. The algorithm requires two passes, linear time, and space 1/θ. The first pass is an on-line algorithm, generalizing a well-known algorithm for finding a majority element, for identifying a set of at most 1/θ items that includes, possibly among others, all items with frequency greater than θ.
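The two passes described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' pseudocode (which is not reproduced on this page): it assumes "frequency" means an item's share of the multiset, and it uses the standard counter-based generalization of the majority-vote algorithm. The names `candidate_items` and `frequent_items` are hypothetical.

```python
from collections import Counter


def candidate_items(items, theta):
    """First (on-line) pass: return at most 1/theta candidate items.

    Assumption: an item is "frequent" if it makes up more than a
    fraction theta of the input. Every such item is guaranteed to be
    in the returned set, possibly alongside infrequent ones.
    """
    if not 0 < theta <= 1:
        raise ValueError("theta must be in (0, 1]")
    k = int(1 / theta)  # maximum number of counters kept at once
    counters = {}
    for x in items:
        counters[x] = counters.get(x, 0) + 1
        if len(counters) > k:
            # Too many distinct items tracked: decrement every counter
            # and drop those that reach zero. Each such step discards
            # k + 1 occurrences, so it can happen at most n / (k + 1)
            # times, which is fewer than any frequent item's count.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return set(counters)


def frequent_items(items, theta):
    """Two passes: find candidates, then count them exactly."""
    items = list(items)  # materialized so the input can be read twice
    candidates = candidate_items(items, theta)
    exact = Counter(x for x in items if x in candidates)
    n = len(items)
    return {x: c for x, c in exact.items() if c > theta * n}
```

The first pass never stores more than `k + 1` counters at a time, which is where the space bound of roughly 1/θ comes from; the second pass is what makes the final answer exact, since the first pass alone may report items that are not actually frequent.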


      Published In

      ACM Transactions on Database Systems  Volume 28, Issue 1
      March 2003
      99 pages
      ISSN:0362-5915
      EISSN:1557-4644
      DOI:10.1145/762471

      Publisher

      Association for Computing Machinery

      New York, NY, United States



      Author Tags

      1. Data stream
      2. frequent elements
