PySpark Questions
We have a single data source (a signal within a given range) that produces data every 1 minute as a Kafka stream. Each record carries three (or, in general, n) tags, each with its own data type.
# Note – each rule-break definition should use only one tag, e.g. "tag t1 > 4 occurs 4 consecutive times".
Example:
A sample of the Kafka streaming data is shown below (you can generate your own data for testing):
{ "timestamp": 1571053218000, "tags": { "t1": 55.23, "t2": 10, "t3": "ON"  } }
{ "timestamp": 1571053278000, "tags": { "t1": 63.23, "t2": 11, "t3": "OFF" } }
{ "timestamp": 1571053338000, "tags": { "t1": 73.23, "t2": 12, "t3": "ON"  } }
{ "timestamp": 1571053398000, "tags": { "t1": 83.23, "t2": 13, "t3": "ON"  } }
{ "timestamp": 1571053458000, "tags": { "t1": 20.23, "t2": 14, "t3": "ON"  } }
{ "timestamp": 1571053518000, "tags": { "t1": 30.23, "t2": 25, "t3": "OFF" } }
...and so on.
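Since the task suggests generating your own test data, here is a minimal sketch of a generator producing records in the same shape as the samples above. The key name "tags" for the inner object and the value ranges are assumptions; the original sample does not name the nested object.

```python
import json
import random


def generate_record(ts_ms):
    """Build one sample record shaped like the examples above.

    The "tags" key name and the value ranges are illustrative
    assumptions, not part of the original specification.
    """
    return {
        "timestamp": ts_ms,
        "tags": {
            "t1": round(random.uniform(10.0, 90.0), 2),  # float tag
            "t2": random.randint(10, 30),                # integer tag
            "t3": random.choice(["ON", "OFF"]),          # string tag
        },
    }


def generate_stream(start_ms, n):
    """Yield n records spaced one minute (60,000 ms) apart."""
    for i in range(n):
        yield generate_record(start_ms + i * 60_000)


if __name__ == "__main__":
    for rec in generate_stream(1571053218000, 5):
        print(json.dumps(rec))
```

In a real test you would publish these JSON strings to a Kafka topic (for example with `kafka-python`'s `KafkaProducer`) and read them back with Spark's Kafka source.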
Note:
We create an entry in the RULE_BREAK table only if the condition is satisfied for n consecutive streaming records.
If the condition is not satisfied by the current record, we reset the count and start applying the pattern/rule-break condition again from scratch.
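The counting-with-reset logic described in the note can be sketched in plain Python, independent of Spark. The function name, the predicate style, and the choice to reset the streak after recording a break are all illustrative assumptions; in a real streaming job this state would live in a stateful operator (e.g. `applyInPandasWithState` in recent PySpark versions) keyed by signal/tag.

```python
def detect_rule_breaks(values, predicate, n):
    """Return the indices at which a rule break should be recorded:
    positions where the predicate has held for n consecutive values.

    The streak counter resets whenever the predicate fails, as the
    note above requires. Whether the counter also resets after a
    break is recorded (as done here) is an assumption.
    """
    breaks = []
    streak = 0
    for i, value in enumerate(values):
        if predicate(value):
            streak += 1
            if streak == n:
                breaks.append(i)  # one RULE_BREAK entry per completed run
                streak = 0        # start counting a fresh run
        else:
            streak = 0            # condition failed: reset the count
    return breaks
```

For example, with the rule "t1 > 40 occurs 4 consecutive times" applied to the t1 values in the sample above (55.23, 63.23, 73.23, 83.23, 20.23, 30.23), a single break is detected at the fourth record, and the fifth record resets the count.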
▪ Briefly describe the conceptual approach you chose. What are the trade-offs?
▪ What's the runtime performance? What is the complexity? Where are the bottlenecks?
▪ If you had more time, what improvements would you make, and in what order of priority?