## AI helps you reading Science

## AI Insight

AI extracts a summary of this paper

Weibo:

# Learning Based Distributed Tracking

KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining Virtual Event..., pp.2040-2050, (2020)

EI

Full Text

Weibo

Abstract

Inspired by the great success of machine learning in the past decade, people have been thinking about the possibility of improving the theoretical results by exploring data distribution. In this paper, we revisit a fundamental problem called Distributed Tracking (DT) under an assumption that the data follows a certain (known or unknown) d...More

Code:

Data:

Introduction

- The great success of machine learning in the past decade has proven the assumption that data in practice follows certain patterns (e.g., The DT problem setting.
- In the DT problem, there is a coordinator and k players (a.k.a. sites); between each player and the coordinator, there is a two-way communication channel.
- There is at most one player having its counter increased by one, conceptually representing an item arrives at the player.
- The job of the coordinator is to raise an alarm at the exact moment that the N -th item arrives, equivalently, the moment that the sum of the counters of all the players reaches.
- The efficiency of an algorithm for solving the DT problem is measured by the communication cost, i.e., the total number of messages that received and sent by the coordinator, where each message can only carry at most O(1) words

Highlights

- The great success of machine learning in the past decade has proven the assumption that data in practice follows certain patterns (e.g., The Distributed Tracking problem setting
- The experimental results show that the communication cost of our algorithms is as least as
- The efficiency of an algorithm for solving the Distributed Tracking problem is measured by the communication cost, i.e., the total number of messages that received and sent by the coordinator, where each message can only carry at most O(1) words
- When the datasets are generated from some distribution, our algorithms perform consistently better than the CMY algorithms, regardless of the distribution
- Our algorithms are equipped with backup mechanism that guarantees comparable performance as the data-independent CMY algorithm when the distribution fluctuates
- We experimentally evaluate our algorithms against the state-of-the-art competitors, using both real and synthetic datasets

Results

**Results on the Real Datasets**

Figure 2-4 illustrate the results of the algorithms on three real datasets.- All plots in the figures refer to the percentage of untracked items as a function of the number of used communications.
- StcSlk-KwnDst and DynSlk-KwnDst perform the best, with DynSlk-KwnDst slightly better in most cases
- This is as expected because they know the frequency information and have more knowledge than the other algorithms.
- Figure 5 shows the efficiency of the algorithms under various distributions
- It plots the percentage of untracked items as a function of the number of communications.
- When the distribution is stable, the static slacks capture more accurately the number of items a player will receive

Conclusion

- The paper exploits the counter increment distribution and presents four data-dependent algorithms that utilize knowledge on the data distribution.
- All the algorithms have communication cost O(k log log.
- +k log log k δ δ is a parameter controls the failure probability, improving the state-of-the-art O(k log.
- The authors experimentally evaluate the algorithms against the state-of-the-art competitors, using both real and synthetic datasets.
- The authors' experimental results show the efficiency and robustness of the algorithms

- Table1: Frequently used notations we mean here is the Multinomial Distribution of the counter increments, more specifically, the probability distribution of a counter increment happening in the players. For the example shown in
- Table2: Dataset Characteristics (M = 106 and K = 103)

Related work

- The distributed tracking (DT) problem has been well studied in terms of both upper bounds and lower bounds, since it was first proposed. Prior to the CMY algorithm, an uniform slack (called UniSlk) algorithm was proposed by Cormode et al [5]. The communication cost of

UniSlk is bounded by O (k 2 log The design

UniSlk is based on the observation that for N items arriving at k players, there must be at least one of player received N /k items. The

UniSlk algorithm runs in rounds. At the at the beginning of the first round, the coordinator broadcasts a slack N /k to each players.

The player notifies the coordinator when its counter exceeds N /k.

Funding

- Junhao Gan is supported by Australian Research Council (ARC) DECRA DE190101118

Reference

- Anders Aamand, Piotr Indyk, and Ali Vakilian. 2019. (Learned) Frequency Estimation Algorithms under Zipfian Distribution. CoRR abs/1908.05198 (2019). arXiv:1908.05198
- Jean-Yves Audibert, Rémi Munos, and Csaba Szepesvári. 2009. Explorationexploitation tradeoff using variance estimates in multi-armed bandits. Theor. Comput. Sci. 410, 19 (2009), 1876–1902.
- Irwan Bello, Hieu Pham, Quoc V. Le, Mohammad Norouzi, and Samy Bengio. 2017. Neural Combinatorial Optimization with Reinforcement Learning. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Workshop Track Proceedings.
- Fan R. K. Chung and Lincoln Lu. 2006. Survey: Concentration Inequalities and Martingale Inequalities: A Survey. Internet Mathematics 3, 1 (2006), 79–127.
- Graham Cormode. 2013. The continuous distributed monitoring model. SIGMOD Record 42, 1 (2013), 5–14.
- Graham Cormode, Minos N. Garofalakis, S. Muthukrishnan, and Rajeev Rastogi.
- Graham Cormode, S. Muthukrishnan, and Ke Yi. 2011. Algorithms for distributed functional monitoring. ACM Trans. Algorithms 7, 2 (2011), 21:1–21:20.
- Nikos Giatrakos, Antonios Deligiannakis, Minos N. Garofalakis, Izchak Sharfman, and Assaf Schuster. 2012. Prediction-based geometric monitoring over distributed data streams. In Proceedings of the International Conference on Management of Data, SIGMOD, Scottsdale, AZ, USA, May 20-24, 2012. 265–276.
- Chen-Yu Hsu, Piotr Indyk, Dina Katabi, and Ali Vakilian. 201Learning-Based Frequency Estimation Algorithms. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019.
- Zengfeng Huang, Ke Yi, and Qin Zhang. 2019. Randomized Algorithms for Tracking Distributed Count, Frequencies, and Ranks. Algorithmica 81, 6 (2019), 2222–2243.
- Ram Keralapura, Graham Cormode, and Jeyashankher Ramamirtham. 2006. Communication-efficient distributed monitoring of thresholded counts. In Proceedings of the ACM SIGMOD International Conference on Management of Data, Chicago, Illinois, USA, June 27-29, 2006. 289–300.
- Elias B. Khalil, Hanjun Dai, Yuyu Zhang, Bistra Dilkina, and Le Song. 2017. Learning Combinatorial Optimization Algorithms over Graphs. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA. 6348–6358.
- David Kotz, Tristan Henderson, Ilya Abyzov, and Jihwang Yeo. 2009. CRAWDAD
- Tim Kraska, Alex Beutel, Ed H. Chi, Jeffrey Dean, and Neoklis Polyzotis. 2018. The Case for Learned Index Structures. In Proceedings of the 2018 International Conference on Management of Data, SIGMOD 2018. 489–504.
- Michael Mitzenmacher. 2018. A Model for Learned Bloom Filters and Optimizing by Sandwiching. In Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, 3-8 December 2018, Montréal, Canada. 462–471.
- Miao Qiao, Junhao Gan, and Yufei Tao. 20Range Thresholding on Streams. In Proceedings of the 2016 International Conference on Management of Data, SIGMOD Conference 2016, San Francisco, CA, USA, June 26 - July 01, 2016. 571–582.
- Oriol Vinyals, Meire Fortunato, and Navdeep Jaitly. 2015. Pointer Networks. In Annual Conference on Neural Information Processing Systems (NeuIPS), December 7-12, 2015, Montreal, Quebec, Canada. 2692–2700.
- 1. Therefore, f (t) ≤ t +

Tags

Comments

数据免责声明

页面数据均来自互联网公开来源、合作出版商和通过AI技术自动分析结果，我们不对页面数据的有效性、准确性、正确性、可靠性、完整性和及时性做出任何承诺和保证。若有疑问，可以通过电子邮件方式联系我们：report@aminer.cn