A General Framework for Efficient Continuous Multidimensional Top-k Query Processing in Sensor Networks

ABSTRACT:

Top-k query has long been a crucial problem in multiple fields of computer science, such as data processing and information retrieval. In emerging cyber-physical systems, where there can be a large number of users searching information directly into the physical world, many new challenges arise for top-k query processing. From the client’s perspective, users may request different sets of information, with different priorities and at different times. Thus, top-k search should not only be multidimensional, but also be across time domain. From the system’s perspective, data collection is usually carried out by small sensing devices. Unlike the data centers used for searching in the cyber-space, these devices are often extremely resource constrained and system efficiency is of paramount importance. In this paper, we develop a framework that can effectively satisfy demands from the two aspects. The sensor network maintains an efficient dominant graph data structure for data readings. A simple top-k extraction algorithm is used for user query processing and two schemes are proposed to further reduce communication cost. Our methods can be used for top-k query with
any linear convex query function. The framework is adaptive enough to incorporate some advanced features; for example, we show how approximate queries and data aging can be applied. To the best of our knowledge, this is the first work for continuous multidimensional top-k query processing in sensor networks. Simulation results show that our schemes can reduce the total communication cost by up to 90 percent, compared with a centralized scheme or a straightforward extension from previous top-k algorithm on 1D sensor data.
EXISTING SYSTEM:
Many studies have explored various data aggregation techniques for query processing in sensor networks in order to reduce communication cost and energy consumption. Some studies strive to design energy-efficient processing methods for top-k queries. Silberstein et al.  Proposed to use the samples of past sensor readings for query optimization and developed a series of top-k query planning algorithms with linear programming. Zeinalipour et al.  Proposed to use nonuniform thresholds on the queried attribute in order to minimize the number of tuples transferred toward the querying node. Similarly, Wu et al. proposed to use a
filter at sensor nodes to suppress unnecessary sensor readings. Aiming at continuous top-k monitoring, ZeinalipourYazti et al. proposed MINT , a framework that maintains the complete results of a query and calculate the upper bounds, so as to reduce the cost of future queries by pruning phase. Instead of using a shortest-path tree for routing, XP  employs the cluster tree as the routing infrastructure, in order to aggregate many data points locally before further transmissions. Chen et al. strived to return k highest data points generated within a specified time interval.
PROPOSED SYSTEM:
We propose a novel framework that efficiently realizes DG in a distributed sensor network. The framework supports top-k queries for arbitrary user defined linear convex query functions over multidimensional sensor data. . For each sensor, where the query function is unknown, we develop a top-k extraction algorithm to efficiently retrieve necessary data points that will be sent to the sink. . We propose schemes working with the top-k extraction algorithm for reducing communication
traffic. Our key observation is that the continuously collected sensor data does not change abruptly. With previously generated global top-k results, we develop novel scheme for each node to update and maintain its local dominant graph for later data
suppression. In addition, we propose local filters to further reduce the communication traffic. . We develop a comprehensive set of modules and interactions. Our framework is general enough so that many modules can be added piece by piece. We show two examples on how to handle data aging and accommodate approximate queries so as to adapt to different user requirements and system performance. . We evaluate our framework on both real and synthetic data sets. Simulation results show that our schemes can reduce the total communication cost by up to 90 percent, compared with the centralized scheme or a straightforward extension from previous top-k algorithm designed for 1D sensor data.
ARCHITECTURE DIAGRAM
MODULES:
DATA AGING
Users may be more interested in recent data than in past ones and thus a data aging (called information aging in function can be applied to each data point. The usage of data aging represents that data vectors are contracted toward the origin (shortened) as they age. In particular, each data point can be multiplied by a factor fac , where cur is the current time and d:t is the generation time of the data point. fac represents the degree of the data aging. For example, if the data point is considered to account for half of original value from 6 am to 12 pm and the sensor reading is generated every 30 seconds like [1], the aging factor fac is around 0:5 1=720 _ 0:999
APPROXIMATE QUERIES
When approximate results are acceptable, a sampling-based approach can be used that reduces communication cost by avoiding data transmission from nodes whose data points are unlikely in the query results. To this end, we propose a sampling-based query planning using linear programming, which can be easily integrated to our basic or enhanced scheme. The goal is to minimize the communication cost, while satisfying the user predefined top-k accuracy. In short, our algorithms can be easily extended to allow a tradeoff between the accuracy of the results and the communication cost Deployment Issues An important issue related to the deployment is the flexibility for multiple users whose interests on k may vary, say from top-2 query to top-20 queries. According to our design, a large amount of various k values could affect the system performance greatly. Recall that each node will extract top-k results and send to its parent based on Algorithm 1. The size of diffusion messages from the sink, no matter for the basic scheme or for the enhanced scheme, are highly dependent on the value of k (the upper bound of the message complexity highly relies on RS sink or  Jrs j according to the proof of Theorem 4). One approach to handling this problem is that the sink estimates from the historical queries an appropriate k value. For example, if there are more that 90 percent users request only up to top-10 queries, the sink will use 10 as the maximum k value, denoted by K, and each node extracts top10 query results to send to its parent.
DEPLOYMENT ISSUES
An important issue related to the deployment is the flexibility for multiple users whose interests on k may vary, say from top-2 query to top-20 queries. According to our design, a large amount of various k values could affect the system performance greatly. Recall that each node will extract top-k results and send to its parent based on Algorithm 1. The size of diffusion messages from the sink, no matter for the basic scheme or for the enhanced scheme, are highly dependent on the value of k (the upper bound of the message complexity highly relies on RS sink
or Jrs j according to the proof of Theorem 4). One approach to handling this problem is that the sink estimates from the historical queries an appropriate k value. For example, if there are more that 90 percent users request only up to top-10 queries, the sink will use 10 as the maximum k value, denoted by K, and each node extracts top10 query results to send to its parent.

CONCLUSION:
This paper presents the first work on continuous multi- dimensional top-k query processing in wireless sensor networks. We have developed a framework which effectively monitors user queries and in-network process. More importantly, it incorporates a special data structure, the dominant graph, to maintain top-k query results. Thedominant information can facilitate identifying the potential top-k query results for any given preference function and filtering out nontop-k results. The framework is general enough such that optional modules such as approximate
query processing, data aging, etc., can be handled. We prove that our algorithms are scalable as the message complexity is proportional to the total number of sensor nodes. The simulation shows that our algorithms reduce the communication cost by up to 90 percent as compared to the centralized scheme and a straightforward extension from previous top-k algorithm on 1D sensor data.We are interested in several directions in future work. First, we will seek more efficient algorithms to reduce the traffic overhead of our proposed framework. Second, evaluation of our framework on larger scale networks will be carried out. Finally, besides simulations in this paper, a tested evaluation will be performed.

No comments:

Post a Comment