Journal articles

A Network-aware and Partition-based Resource Management Scheme for Data Stream Processing

In this study, we propose a network-aware and partition-based resource management scheme to deal with the ever-changing network condition and data communication in stream processing.

Jun 8, 2019

With the increasing demand for data-driven decision making, there is an urgent need for processing geographically distributed data streams in real-time. The existing scheduling and resource management schemes efficiently optimize stream processing performance with the awareness of resource, quality-of-service, and network traffic. However, the correlation between network delay and inter-operator communication pattern is not well-understood. In this study, we propose a network-aware and partition-based resource management scheme to deal with the ever-changing network condition and data communication in stream processing. The proposed approach applies operator fusion by considering the computational demand of individual operators and the inter-operator communication patterns. It maps the fused operators to the clustered hosts with the weighted shortest processing time heuristic. Meanwhile, we established a 3-dimensional coordinate system for prompt reflection of the network condition, real-time traffic, and resource availability. We evaluated the proposed approach against two benchmarks, and the results demonstrate the efficiency in throughput and resource utilization. We also conducted a case study and implemented a prototype system supported by the proposed approach that aims to utilize the stream processing paradigm for pedestrian behavior analysis. The prototype application estimates walking time for a given path according to the real crowd traffic. The promising evaluation results of processing performance further illustrate the efficiency of the proposed approach.
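The abstract names the weighted shortest processing time (WSPT) heuristic but does not give details of how it is applied. As a rough illustration only, the sketch below shows the textbook form of WSPT: jobs are ordered by the ratio of weight to processing time, highest first, and each is placed on the currently least-loaded host. The `FusedOperator` type, its `weight` and `processing_time` fields, the host names, and the least-loaded placement rule are all assumptions made for this example, not the paper's actual model or implementation.

```python
import heapq
from typing import NamedTuple


class FusedOperator(NamedTuple):
    # Hypothetical representation of a fused operator (not from the paper).
    name: str
    weight: float           # assumed priority, e.g. communication importance
    processing_time: float  # assumed estimate of computational demand


def wspt_assign(operators, hosts):
    """Place operators on hosts using textbook WSPT ordering.

    Operators are sorted by weight / processing_time in descending order,
    then each one goes to the host with the smallest accumulated load.
    """
    ordered = sorted(
        operators,
        key=lambda op: op.weight / op.processing_time,
        reverse=True,
    )
    # Min-heap of (accumulated load, host name); ties break on host name.
    heap = [(0.0, host) for host in hosts]
    heapq.heapify(heap)
    placement = {host: [] for host in hosts}
    for op in ordered:
        load, host = heapq.heappop(heap)
        placement[host].append(op.name)
        heapq.heappush(heap, (load + op.processing_time, host))
    return placement


operators = [
    FusedOperator("A", weight=3.0, processing_time=1.0),
    FusedOperator("B", weight=2.0, processing_time=4.0),
    FusedOperator("C", weight=4.0, processing_time=2.0),
    FusedOperator("D", weight=1.0, processing_time=1.0),
]
placement = wspt_assign(operators, ["h1", "h2"])
print(placement)
```

In this toy input the WSPT order is A, C, D, B, so the heaviest-ratio operators are placed first. A network-aware scheme such as the one described would presumably also factor host coordinates and traffic into the placement step, which this sketch leaves out.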