Pig Cassandra Interactions


Schema

UDM Envelope

{
    "sha256":"d89fd38a5a9fb706622322de165406dda042810b344d00759accc10b5a9de9fb",
    "envelope":"c2a0b690-a1e5-11e0-946d-002312554f17",
    "customer":4352123,
    "product":"PROD-1",
    "udms":[
        {
            "type":"test",
            "policy":"POLICY-71",
            "status":"SUCCESS"
        },
        {
            "type":"result",
            "subtype":"COMPLIANCE",
            "score":57
        },
        {
            "type":"test",
            "policy":"POLICY-29",
            "status":"SUCCESS"
        }
    ]
}
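
As a rough illustration of how an envelope like the one above might be consumed, the following Python sketch parses it and groups its UDMs by type. The function name `split_udms_by_type` and the `ENVELOPE_JSON` constant are assumptions made for this example and are not part of the original design.

```python
import json
from collections import defaultdict

# Sample envelope with the same fields as the example above.
ENVELOPE_JSON = """
{
  "sha256": "d89fd38a5a9fb706622322de165406dda042810b344d00759accc10b5a9de9fb",
  "envelope": "c2a0b690-a1e5-11e0-946d-002312554f17",
  "customer": 4352123,
  "product": "PROD-1",
  "udms": [
    {"type": "test", "policy": "POLICY-71", "status": "SUCCESS"},
    {"type": "result", "subtype": "COMPLIANCE", "score": 57},
    {"type": "test", "policy": "POLICY-29", "status": "SUCCESS"}
  ]
}
"""


def split_udms_by_type(envelope):
    """Group the UDMs of a parsed envelope by their "type" field."""
    by_type = defaultdict(list)
    for udm in envelope["udms"]:
        by_type[udm["type"]].append(udm)
    return dict(by_type)


envelope = json.loads(ENVELOPE_JSON)
udms_by_type = split_udms_by_type(envelope)
print(sorted(udms_by_type))       # ['result', 'test']
print(len(udms_by_type["test"]))  # 2
```

Only the "test" UDMs carry a policy and a status, so they are the ones that feed the counts described later.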
UDM Table

UDM: { // Column Family
    PROD-1: CUST-55: { // Row key [product: customer]
        UDMID: { // supercolumn
            "type":"test",
            "policy":"POLICY-71",
            "status":"SUCCESS"
        },
    }
}
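
The mapping from an envelope to this layout could be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names `row_key` and `to_udm_rows` are hypothetical, the ":" separator is a guess at the "[product: customer]" notation, and the UDM id scheme (envelope id plus position) is invented here because the source does not say how UDMIDs are generated.

```python
def row_key(product, customer):
    """Compose the row key from product and customer.

    ASSUMPTION: a bare ":" separator; the source only shows the key
    as "[product: customer]".
    """
    return f"{product}:{customer}"


def to_udm_rows(envelope):
    """Map one envelope to {row key: {UDM id: UDM columns}}.

    ASSUMPTION: the UDM id is the envelope id plus the UDM's position.
    """
    key = row_key(envelope["product"], envelope["customer"])
    supercolumns = {}
    for index, udm in enumerate(envelope["udms"]):
        udm_id = f'{envelope["envelope"]}-{index}'
        supercolumns[udm_id] = dict(udm)  # copy so the envelope stays untouched
    return {key: supercolumns}


# Hypothetical envelope whose customer matches the table's "CUST-55".
sample = {
    "envelope": "c2a0b690-a1e5-11e0-946d-002312554f17",
    "customer": "CUST-55",
    "product": "PROD-1",
    "udms": [
        {"type": "test", "policy": "POLICY-71", "status": "SUCCESS"},
        {"type": "result", "subtype": "COMPLIANCE", "score": 57},
    ],
}

rows = to_udm_rows(sample)
print(list(rows))  # ['PROD-1:CUST-55']
```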

UDMStats Table

UDMStats: { // Column Family
    PROD-1: CUST-55: { // Row key [product: customer]
        POLICY-71: SUCCESS: { // column name [count-type: count-subtype]
            2 // long value
        },
    }
}

Load

Extract test UDMs:

new_udms = LOAD 'cassandra://SPC/UDM' USING CassandraStorage()
    AS (key, subcolumns: bag {T: tuple(udmid, udm: map[])});

Process

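-- Flatten each row's bag so that every UDM becomes its own record.
-- $0 is the row key [product: customer].
udms = FOREACH new_udms GENERATE $0 AS rowkey,
    FLATTEN(subcolumns) AS (udmid, udm: map[]);

-- Keep only the UDMs whose status is SUCCESS.
success_udms = FILTER udms BY udm#'status' == 'SUCCESS';

-- Count the successful UDMs per row key (product and customer) and policy.
grouped = GROUP success_udms BY (rowkey, udm#'policy', udm#'status') PARALLEL 3;
policy_counts = FOREACH grouped GENERATE
    FLATTEN(group) AS (rowkey, policy, status),
    COUNT(success_udms) AS count;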
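
The filter and count steps above can be mirrored in plain Python to check the intended result against the UDMStats layout. This is a sketch under assumptions, not the pipeline itself: `UDM_ROWS` is a hypothetical in-memory stand-in for the UDM column family, the UDM ids and the "FAILURE" status value are invented for the example, and `count_successes` is an illustrative name.

```python
from collections import Counter

# Hypothetical in-memory stand-in for the UDM column family:
# {row key: {UDM id: UDM columns}}.
UDM_ROWS = {
    "PROD-1:CUST-55": {
        "udm-1": {"type": "test", "policy": "POLICY-71", "status": "SUCCESS"},
        "udm-2": {"type": "test", "policy": "POLICY-71", "status": "SUCCESS"},
        "udm-3": {"type": "test", "policy": "POLICY-29", "status": "FAILURE"},
        "udm-4": {"type": "result", "subtype": "COMPLIANCE", "score": 57},
    },
}


def count_successes(udm_rows):
    """Return {row key: {"policy:status": count}} for SUCCESS UDMs only."""
    counts = Counter()
    for key, supercolumns in udm_rows.items():
        for udm in supercolumns.values():       # flatten the row
            if udm.get("status") == "SUCCESS":  # filter on status
                # group by (row key, policy, status) and count
                counts[(key, udm["policy"], udm["status"])] += 1
    stats = {}
    for (key, policy, status), count in counts.items():
        stats.setdefault(key, {})[f"{policy}:{status}"] = count
    return stats


stats = count_successes(UDM_ROWS)
print(stats)  # {'PROD-1:CUST-55': {'POLICY-71:SUCCESS': 2}}
```

UDMs without a status, such as the "result" entry, drop out at the filter step, which matches the intent of counting only successful tests.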

Dump

-- STORE takes a relation alias, not a field list; the fields to write
-- are chosen when policy_counts is built.
STORE policy_counts INTO 'cassandra://SPC/UDMStats' USING CassandraStorage();
