The Challenge

This hackathon follows the same format as the DEBS 2014 Grand Challenge, posing a real-world data set and a set of problems, which this time around will be solved on the WSO2 data analytics platform.

The data set to be processed originates from 2125 smart plugs deployed across 40 houses. Data is collected roughly once per second for each sensor in each smart plug. Note that the data set was collected in an uncontrolled, real-world environment, which implies the possibility of malformed data as well as missing measurements.

The data follows a hierarchical structure: a house (identified by a unique house_id) contains a few households (each identified by a unique household_id), and each household contains several smart plugs (each identified by a unique plug_id). Each smart plug contains two sensors (see the record sketch after the list):

1. A load sensor measuring the current load, in Watts (W)
2. A work sensor measuring the total accumulated work since start (or reset), in kilowatt-hours (kWh)
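
For concreteness, here is a minimal Python sketch of one reading in this hierarchy. The record layout and field names (timestamp, load_w, work_kwh) are assumptions for illustration; the actual wire format of the data set may differ.

from dataclasses import dataclass

@dataclass
class PlugReading:
    # One measurement from a single smart plug (hypothetical layout).
    timestamp: int      # Unix timestamp, in seconds
    house_id: int       # unique per house
    household_id: int   # unique within a house
    plug_id: int        # unique within a household
    load_w: float       # current load from the load sensor, in Watts
    work_kwh: float     # accumulated work since start (or reset), in kWh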

Data must be consumed in a streaming fashion, and no precomputation based on the whole data set may be performed. The solution needs to complete Query 01 and Query 02.

Query 01: Load Prediction

The goal of this query is to make load forecasts based on current load measurements and recorded historical data. Such forecasts can be used to proactively influence the load and adapt it to the supply situation, e.g., the current production of renewable energy sources.

You must use the following algorithm for the prediction and implement it using WSO2 CEP.

The query should forecast the load for each house. Starting from September 1, 2013, we divide each 24 hours into 12 × 24 = 288 five-minute slices. Formally, a time t belongs to the slice ceil(time_of_day(t) / 300), where ceil is the ceiling function and time_of_day(timestamp) returns the number of seconds elapsed in the given day for the given timestamp.
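
As a concrete illustration, the slice index can be computed as follows. This is a minimal Python sketch; both function names are ours rather than part of the platform, and it assumes Unix timestamps with days aligned to UTC midnight.

import math

def time_of_day(ts):
    # Seconds elapsed since midnight (UTC) of the day containing Unix timestamp ts.
    return ts % 86400

def slice_index(ts):
    # Index of the 5-minute slice that ts belongs to: ceil(time_of_day(ts) / 300).
    return math.ceil(time_of_day(ts) / 300)

For example, a timestamp at 00:07:30 gives time_of_day = 450 seconds and therefore slice ceil(450 / 300) = 2.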

We predict load as follows, where t is a multiple of 5 minutes:

average_correction(t+5m) = the average of all readings taken in the same slice as the slice of t+5m (i.e., the slice with the same index on previous days).

L(t+10m) = (average load over [t, t+5m]) / 2 + average_correction(t+5m) / 2
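
Put together, a minimal Python sketch of the prediction rule; the averaging inputs are assumed to be supplied by the caller (e.g., by the CEP windows described below).

def predict_load(avg_load_current, same_slice_history):
    # avg_load_current: average load over [t, t + 5m]
    # same_slice_history: all past readings from slices with the same
    # index as the slice of t + 5m (must be non-empty)
    average_correction = sum(same_slice_history) / len(same_slice_history)
    return avg_load_current / 2 + average_correction / 2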

The output stream for house-based prediction values should contain the following information:

ts – timestamp of the starting time of the slice that the prediction is made for
house_id – id of the house for which the prediction is made
predicted_load – the predicted load for the time slice starting at ts

The output streams for plug-based prediction values should contain the following information:

ts – timestamp of the starting time of the slice that the prediction is made for
house_id – id of the house where the plug is located
household_id – the id of the household where the plug is located
plug_id – the id of the plug for which the prediction is made
predicted_load – the predicted load for the time slice starting at ts

The output streams should be updated every 30 seconds, as measured by the input event timestamps. The purpose of the update is to reflect the latest value of the current slice's average load in the prediction.

Query 02: Outlier

The goal of this query is to find devices with outlying (very high) readings. The calculation is performed every 15 minutes; given a time t (a multiple of 15 minutes), an outlier is a device whose power consumption exceeds mean(D[t-15m, t]) + 2 * variance(D[t-15m, t]), where D[t-15m, t] is the data collected between time t-15m and t across all devices in the system.
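
As a minimal Python sketch of this criterion (using the population variance; the challenge text does not specify population vs. sample variance, so that choice is an assumption):

from statistics import mean, pvariance

def outlier_threshold(window_values):
    # window_values: all readings D[t-15m, t] across all devices.
    return mean(window_values) + 2 * pvariance(window_values)

def is_outlier(device_consumption, window_values):
    # A device is an outlier if its consumption exceeds the threshold.
    return device_consumption > outlier_threshold(window_values)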

You must use the WSO2 DAS SparkSQL support to implement this query. For every 15-minute interval, the output should be written to a database table in the following format:

timestamp – the time at which the output was generated
house_id – the id of the house where the device is located
plug_id – the id of the device (plug) with the outlying reading
value – the value of the reading
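
For orientation, a rough sketch of the per-interval computation as Spark SQL embedded in Python. The table and column names (readings, load) are assumptions, and the snippet uses stock PySpark rather than the DAS-specific Spark integration; adapt it to the actual schema, the DAS Spark version, and your JDBC settings.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("outliers").getOrCreate()

# 'readings' is a hypothetical table holding the load readings of one
# 15-minute window across all devices.
outliers = spark.sql("""
    SELECT current_timestamp() AS timestamp,
           r.house_id,
           r.plug_id,
           r.load AS value
    FROM readings r
    CROSS JOIN (SELECT avg(load) + 2 * var_pop(load) AS threshold
                FROM readings) s
    WHERE r.load > s.threshold
""")

# Append the result to the output table (connection details are placeholders).
outliers.write.mode("append").jdbc(
    "jdbc:mysql://localhost:3306/das", "outliers",
    properties={"user": "root", "password": "root"})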

Pre-configured cartridges provided in Apache Stratos:

WSO2 DAS Data Receivers
WSO2 CEP
WSO2 DAS (Spark)
Apache HBase
MySQL
Apache Storm
Zookeeper
Nimbus
HDFS