
Tutorial Meeting #3

Data Science and Machine Learning

Frequent Itemsets
(Chapter 6 of MMDS book)

DAMA60 Group
School of Science and Technology
Hellenic Open University
Content

• Preliminaries
  • Challenges
  • Reminder: Apriori

• Methods and Algorithms
  • PCY: The algorithm of Park, Chen and Yu
  • PCY refinements
  • Random Sampling
  • SON: The Algorithm of Savasere, Omiecinski, and Navathe
Finding Frequent Itemsets
Itemsets: Computation Model
• Typically, data is kept in flat files rather than in a database system:
  • Stored on disk
  • Stored basket-by-basket
  • Baskets are small, but we have many baskets and many items
• Expand baskets into pairs, triples, etc. as you read baskets
• Use k nested loops to generate all sets of size k

[Figure: a flat file depicted as a column of items, one basket after another. Items are positive integers, and boundaries between baskets are –1.]

Note: We want to find frequent itemsets. To find them, we have to count them. To count them, we have to generate them.
Computation Model

• The true cost of mining disk-resident data is usually the number of disk I/Os

• In practice, association-rule mining algorithms read the data in passes – all baskets read in turn

• We measure the cost by the number of passes an algorithm makes over the data
Main-Memory Bottleneck

• For many frequent-itemset algorithms, main memory is the critical resource

• As we read baskets, we need to count something, e.g., occurrences of pairs of items

• The number of different things we can count is limited by main memory

• Moving counts between memory and disk is very inefficient
Finding Frequent Pairs
• The hardest problem often turns out to be finding the
frequent pairs of items {i1, i2}
• Why? Freq. pairs are common, freq. triples are rare
• Why? Probability of being frequent drops exponentially
with size; number of sets grows more slowly with size

• Let’s first concentrate on pairs, then extend to larger sets

• The approach:
  • We always need to generate all the itemsets
  • But we would like to count (keep track of) only those itemsets that in the end turn out to be frequent
Naïve Algorithm
• Read file once, counting in main memory
the occurrences of each pair:
• From each basket of n items, generate its n(n-1)/2 pairs by two nested
loops
• Fails if (#items)^2 exceeds main memory
• Remember: #items can be 100K (Wal-Mart) or 10B (Web pages)
• Suppose 10^5 items, and counts are 4-byte integers
• Number of pairs of items: 10^5 × (10^5 − 1) / 2 ≈ 5 × 10^9
• Therefore, 2 × 10^10 bytes (20 gigabytes) of memory are needed

• Suppose 5 items, {I1, I2, I3, I4, I5}
• Number of pairs of items: n(n−1)/2 = 5 × 4 / 2 = 10
• {I1, I2}, {I1, I3}, {I1, I4}, {I1, I5},
• {I2, I3}, {I2, I4}, {I2, I5},
• {I3, I4}, {I3, I5},
• {I4, I5}
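
As a concrete illustration, here is a minimal Python sketch of the naïve algorithm (the baskets are made up for the example): two nested loops generate each basket's n(n−1)/2 pairs, counted in a main-memory dictionary.

    from collections import defaultdict

    # Toy "file" of baskets; a real run would stream baskets from disk
    baskets = [["I1", "I2", "I3"], ["I1", "I2", "I4"], ["I2", "I3", "I5"]]

    pair_counts = defaultdict(int)
    for basket in baskets:
        items = sorted(basket)
        # Two nested loops generate the n(n-1)/2 pairs of an n-item basket
        for a in range(len(items)):
            for b in range(a + 1, len(items)):
                pair_counts[(items[a], items[b])] += 1

    print(pair_counts[("I1", "I2")])  # 2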

Counting Pairs in Memory

Two approaches:
• Approach 1: Count all pairs using an item-item frequency
matrix
• Approach 2: Keep a table of triples [i, j, c] = “the count of the
pair of items {i, j} is c.”
• If integers and item ids are 4 bytes, we need approximately 12 bytes
for pairs with count > 0
• Plus some additional overhead for the hash table

Note:
• Approach 1 requires only 4 bytes per pair
• Approach 2 uses 12 bytes per pair
(but only for pairs with count > 0)

Comparing the 2 Approaches

[Figure: Triangular Matrix (4 bytes per pair) vs. Triples Method (12 bytes per occurring pair)]
Comparing the two approaches
• Approach 1: Triangular Matrix
  • n = total number of items
  • Count pair of items {i, j} only if i < j
  • Keep pair counts in lexicographic order:
    • {1,2}, {1,3}, …, {1,n}, {2,3}, {2,4}, …, {2,n}, {3,4}, …
  • Pair {i, j} is at position (i − 1)(n − i/2) + j − i
  • Total number of pairs is n(n−1)/2; for large n, we can approximate this by n^2/2
  • The Triangular Matrix requires 4 bytes per pair, so the total number of bytes it consumes equals 2n^2

• Approach 2: Triples Method
  • It uses 12 bytes per occurring pair (but only for pairs with count > 0)
  • Beats Approach 1 if fewer than 1/3 of possible pairs actually occur
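
As a sanity check of the position formula, here is a small Python sketch (1-based positions, rewritten with integer arithmetic as (i − 1)(2n − i)/2 + j − i):

    def pair_position(i, j, n):
        """1-based position of pair {i, j}, i < j, in lexicographic order."""
        assert 1 <= i < j <= n
        return (i - 1) * (2 * n - i) // 2 + (j - i)

    n = 5
    counts = [0] * (n * (n - 1) // 2)        # one count slot per pair
    counts[pair_position(2, 3, n) - 1] += 1  # record one occurrence of {2, 3}
    print(pair_position(1, 2, n))            # 1  (the first pair)
    print(pair_position(4, 5, n))            # 10 (the last of the 10 pairs)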

Comparing the two approaches

The problem arises if we have too many items, so the pair counts do not fit into memory.
Can we do better?
Reminder: A-Priori Algorithm
A-Priori Algorithm – (1)

• A two-pass approach called A-Priori reduces the need for large main memory

• Key idea: monotonicity
  • If a set of items I appears at least s times, so does every subset J of I

• Contrapositive for pairs:
  If item i does not appear in s baskets, then no pair that includes i can appear in s baskets

• So, how does A-Priori find freq. pairs?
A-Priori Algorithm – (2)

• Pass 1: Read baskets and count in main memory the occurrences of each individual item
  • Requires only memory proportional to the number of items
  • Items that appear at least s times constitute the frequent items
  • Typical value for s is 1% of the baskets

• Pass 2: Read baskets again and count in main memory only those pairs where both elements are frequent (from Pass 1)
  • Requires memory proportional to the square of the number of frequent items only (for counts)
  • Plus a list of the frequent items, so you know what must be counted (see the sketch below)
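
A minimal sketch of the two passes in Python; the baskets and the threshold s are illustrative, not fixed by the slides:

    from collections import defaultdict
    from itertools import combinations

    baskets = [[1, 2, 3], [1, 2], [2, 3], [1, 2, 4]]
    s = 2

    # Pass 1: count individual items, keep the frequent ones
    item_counts = defaultdict(int)
    for basket in baskets:
        for item in basket:
            item_counts[item] += 1
    frequent = {i for i, c in item_counts.items() if c >= s}

    # Pass 2: count only pairs whose elements are both frequent
    pair_counts = defaultdict(int)
    for basket in baskets:
        for i, j in combinations(sorted(basket), 2):
            if i in frequent and j in frequent:
                pair_counts[(i, j)] += 1

    frequent_pairs = {p for p, c in pair_counts.items() if c >= s}
    print(frequent_pairs)  # {(1, 2), (2, 3)}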

Main-Memory: Picture of A-Priori

[Figure: main-memory layout. Pass 1: item counts. Pass 2: frequent items plus counts of pairs of frequent items (candidate pairs).]
Detail for A-Priori
• You can use the triangular matrix method with the number of frequent items (m < n)
  • May save space compared with storing triples
  • Trick: re-number frequent items 1, 2, …, m and keep a table relating new numbers to original item numbers
  • The new space required is 2m^2

[Figure: main-memory layout. Pass 1: item counts. Pass 2: table of frequent items with their old item numbers, plus counts of pairs of frequent items.]
A-Priori Algorithm
• Let k = 1    (Fk: frequent k-itemsets; Lk: candidate k-itemsets)

• Generate F1 = {frequent 1-itemsets}

• Repeat until Fk is empty (the loop is sketched in code below):
  • Candidate Generation: Generate Lk+1 from Fk
  • Candidate Pruning: Prune candidate itemsets in Lk+1 containing subsets of length k that are infrequent
  • Support Counting: Count the support of each candidate in Lk+1 by scanning the database
  • Candidate Elimination: Eliminate candidates in Lk+1 that are infrequent, leaving only those that are frequent ⇒ Fk+1
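
A compact Python sketch of this loop; candidate generation by self-join and the subset-pruning test are simplified for illustration:

    from itertools import combinations

    def apriori(baskets, s):
        item_counts = {}
        for b in baskets:
            for i in b:
                item_counts[i] = item_counts.get(i, 0) + 1
        Fk = {frozenset([i]) for i, c in item_counts.items() if c >= s}  # F1
        all_frequent, k = set(Fk), 1
        while Fk:
            # Candidate Generation: join Fk with itself, keep (k+1)-sets
            Lk1 = {a | b for a in Fk for b in Fk if len(a | b) == k + 1}
            # Candidate Pruning: every k-subset must itself be frequent
            Lk1 = {c for c in Lk1
                   if all(frozenset(sub) in Fk for sub in combinations(c, k))}
            # Support Counting: scan the database
            counts = {c: sum(1 for b in baskets if c <= set(b)) for c in Lk1}
            # Candidate Elimination
            Fk = {c for c, n in counts.items() if n >= s}
            all_frequent |= Fk
            k += 1
        return all_frequent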

Frequent Triples and more

• For each k, we construct two sets of k-itemsets (sets of size k):

  • Ck (candidate k-itemsets): those that might be frequent sets (support ≥ s) based on information from the pass for k–1

  • Lk = the set of truly frequent k-itemsets

[Figure: C1 (all items) → Filter (count the items) → L1 → Construct → C2 (all pairs of items from L1) → Filter (count the pairs) → L2 → Construct → C3 (to be explained) → …]
The A-Priori Algorithm — Example
Minimum Support = 2

PCY (Park-Chen-Yu) Algorithm
PCY (Park-Chen-Yu) Algorithm

• Observation: In pass 1 of A-Priori, most memory is not used
  • We store only individual item counts
  • Can we use the unused memory to reduce the memory required in pass 2?

• Pass 1: In addition to item counts, maintain a hash table with as many buckets as fit in memory
  • Keep a count for each bucket into which pairs of items are hashed
  • For each bucket just keep the count, not the actual pairs that hash to the bucket!
PCY Algorithm – First Pass
FOR (each basket):
    FOR (each item in the basket):
        add 1 to item's count;
    FOR (each pair of items):                // new in PCY
        hash the pair to a bucket;           // new in PCY
        add 1 to the count for that bucket;  // new in PCY

• A few things to note:
  • Pairs of items need to be generated from the input file; they are not present in the file (double loop)
  • We are not just interested in the presence of a pair; we need to see whether it is present at least s (support) times (frequent bucket)
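
A runnable Python version of this first pass, assuming a toy basket list and an 11-bucket hash table (the same hash function used in the worked example later in these slides):

    from collections import defaultdict
    from itertools import combinations

    NUM_BUCKETS = 11
    baskets = [[1, 2, 3], [1, 3, 5], [3, 5, 6], [2, 3, 4]]  # toy input

    item_counts = defaultdict(int)
    bucket_counts = [0] * NUM_BUCKETS

    for basket in baskets:
        for item in basket:
            item_counts[item] += 1
        for i, j in combinations(sorted(basket), 2):  # pairs are generated, not stored
            bucket = (i * j) % NUM_BUCKETS            # hash the pair to a bucket
            bucket_counts[bucket] += 1                # keep only the bucket's count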

Observations about Buckets
• If a bucket contains a frequent pair, then the bucket is
surely frequent
• However, even without any frequent pair, a bucket can still be frequent ☹
  • So, we cannot use the hash to eliminate any member (pair) of a “frequent” bucket
• But, for a bucket with total count less than s, none of its pairs can be frequent ☺
  • Pairs that hash to this bucket can be eliminated as candidates (even if the pair consists of 2 frequent items)

• Pass 2:
Only count pairs that hash to frequent buckets

PCY Algorithm – Between Passes
• Replace the buckets by a Bitmap:
• 1 means the bucket count exceeded the support s (call it a
frequent bucket);
• 0 means it did not

• 4-byte integer counts are replaced by bits, so the bit-vector requires 1/32 of the memory

• Also, decide which items are frequent and list them for the second pass
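
A sketch of this between-pass step, continuing from the first-pass sketch above (bucket_counts is assumed from there; s is an illustrative threshold):

    s = 4
    # One bit per bucket: 1 = frequent bucket (count >= s), 0 = not
    bitmap = [1 if c >= s else 0 for c in bucket_counts]

    # Packed into an int, each 4-byte count shrinks to a single bit
    packed = sum(bit << k for k, bit in enumerate(bitmap))

    def is_frequent_bucket(b):
        return (packed >> b) & 1 == 1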

PCY Algorithm – Pass 2
• Count all pairs {i, j} that meet the conditions for being a candidate pair:

  1. Both i and j are frequent items

  2. The pair {i, j} hashes to a bucket whose bit in the bit vector is 1 (i.e., a frequent bucket)

• Both conditions are necessary for the pair to have a chance of being frequent!
Main-Memory: Picture of PCY

[Figure: main-memory layout of PCY. Pass 1: item counts plus hash table for pairs. Pass 2: frequent items, bitmap, and counts of candidate pairs.]
Main-Memory Details
• Buckets require a few bytes each:
  • Note: we do not have to count past s
  • #buckets is O(main-memory size)

• On the second pass, a table of (item, item, count) triples is essential
  • Thus, the hash table must eliminate approx. 2/3 of the candidate pairs for PCY to beat A-Priori
Example of PCY
● Here is a collection of twelve baskets; each contains three of the six items 1 through 6:
  {1, 2, 3}, {1, 3, 5}, {3, 5, 6}, {2, 3, 4}, {2, 4, 6}, {1, 2, 4},
  {3, 4, 5}, {1, 3, 4}, {2, 3, 5}, {4, 5, 6}, {2, 4, 5}, {3, 4, 6}
● Suppose the support threshold is 4.
● On the first pass of the PCY Algorithm we use a hash table with 11 buckets.
  ○ The pair {i, j} is hashed to a bucket using the formula (i × j) mod 11.
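
The bucket counts of this example can be verified with a few lines of Python:

    from itertools import combinations

    baskets = [[1, 2, 3], [1, 3, 5], [3, 5, 6], [2, 3, 4], [2, 4, 6], [1, 2, 4],
               [3, 4, 5], [1, 3, 4], [2, 3, 5], [4, 5, 6], [2, 4, 5], [3, 4, 6]]

    bucket_counts = [0] * 11
    for basket in baskets:
        for i, j in combinations(basket, 2):
            bucket_counts[(i * j) % 11] += 1

    print(bucket_counts)  # [0, 5, 5, 3, 6, 1, 3, 2, 6, 3, 2]
    print([b for b, c in enumerate(bucket_counts) if c >= 4])  # frequent buckets: [1, 2, 4, 8]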


Support of items and pairs

Item      1   2   3   4   5   6
Support   4   6   8   8   6   4

Pair      {1,2}  {1,3}  {1,4}  {1,5}  {2,3}  {2,4}  {2,5}
Support     2      3      2      1      3      4      2

Pair      {2,6}  {3,4}  {3,5}  {3,6}  {4,5}  {4,6}  {5,6}
Support     1      4      4      2      3      3      2
Support of items and pairs
Item      1   2   3   4   5   6
Support   4   6   8   8   6   4

Support of pairs (i, j):

         j=2  j=3  j=4  j=5  j=6
  i=1     2    3    2    1    0
  i=2          3    4    2    1
  i=3               4    4    2
  i=4                    3    3
  i=5                         2
Pairs hashing to buckets

Pair     {1,2}  {1,3}  {1,4}  {1,5}  {1,6}  {2,3}  {2,4}  {2,5}
Bucket     2      3      4      5      6      6      8     10

Pair     {2,6}  {3,4}  {3,5}  {3,6}  {4,5}  {4,6}  {5,6}
Bucket     1      1      4      7      9      2      8

In matrix form, the bucket of pair (i, j); for example, (1 × 2) mod 11 = 2:

         j=2  j=3  j=4  j=5  j=6
  i=1     2    3    4    5    6
  i=2          6    8   10    1
  i=3               1    4    7
  i=4                    9    2
  i=5                         8
Frequent buckets

Summing the supports of the pairs that hash to each bucket gives the bucket counts:

Bucket   Pairs hashing to it   Count
  0      (none)                0
  1      {2,6}, {3,4}          5  (= 1 + 4)
  2      {1,2}, {4,6}          5
  3      {1,3}                 3
  4      {1,4}, {3,5}          6
  5      {1,5}                 1
  6      {1,6}, {2,3}          3
  7      {3,6}                 2
  8      {2,4}, {5,6}          6
  9      {4,5}                 3
 10      {2,5}                 2
Pairs counted on the second pass of PCY

Support threshold is 4

Buckets 1, 2, 4, and 8 have count ≥ 4, so they are the frequent buckets.
Only pairs that hash to a frequent bucket will be counted on the second pass of PCY:
{1, 2}, {1, 4}, {2, 4}, {2, 6}, {3, 4}, {3, 5}, {4, 6}, {5, 6}

Note: A pair is frequent if both its items are frequent!


Item      1   2   3   4   5   6
Support   4   6   8   8   6   4

All items are frequent (support ≥ 4), so all the selected pairs pass.

What will happen if the support threshold is set to 5?
Pairs counted on the second pass of PCY

Support threshold is 5

The frequent buckets are still 1, 2, 4, and 8 (counts 5, 5, 6, 6 ≥ 5), so the same pairs survive the bucket test:
{1, 2}, {1, 4}, {2, 4}, {2, 6}, {3, 4}, {3, 5}, {4, 6}, {5, 6}

Note: A pair is frequent if both its items are frequent!

Item      1   2   3   4   5   6
Support   4   6   8   8   6   4

Now items 1 and 6 are not frequent (support 4 < 5), so pairs containing them will not be counted on the second pass of PCY. The remaining candidate pairs are:
{2, 4}, {3, 4}, {3, 5}
Refinement: Multistage Algorithm
• Limit the number of candidates to be counted
  • Remember: memory is the bottleneck
  • We still need to generate all the itemsets, but we only want to count/keep track of the ones that are frequent
• Key idea: After Pass 1 of PCY, rehash only those pairs that qualify for Pass 2 of PCY:
  • i and j are frequent, and
  • {i, j} hashes to a frequent bucket from Pass 1
• On the middle pass, fewer pairs contribute to buckets, so there are fewer false positives
• Requires 3 passes over the data
Main-Memory: Multistage
[Figure: main-memory layout of Multistage. Pass 1: item counts plus first hash table (count items; hash pairs {i,j}). Pass 2: frequent items, Bitmap 1, and a second hash table; a pair {i,j} is hashed into the second table iff i and j are frequent and {i,j} hashes to a frequent bucket in Bitmap 1. Pass 3: frequent items, Bitmap 1, Bitmap 2, and counts of candidate pairs.]
Multistage – Pass 3
Count only those pairs {i, j} that satisfy these candidate pair conditions:

1. Both i and j are frequent items

2. Using the first hash function, the pair hashes to a bucket whose bit in the first Bitmap is 1

3. Using the second hash function, the pair hashes to a bucket whose bit in the second Bitmap is 1
Important Points
1. The two hash functions have to be independent

2. We need to check both hashes on the third pass

   • If not, we would end up counting pairs of frequent items that hashed first to an infrequent bucket (1st pass) but happened to hash second to a frequent bucket (2nd pass)
Refinement: Multihash Algorithm

• Key idea: Use several independent hash tables on the first pass
  • Risk: Halving the number of buckets doubles the average count
  • We have to be sure most buckets will still not reach count s

• If so, we can get a benefit like multistage, but in only 2 passes
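
A sketch of the multihash first pass with two hash functions; the functions and table sizes here are illustrative assumptions, the only real requirement being that the two hashes are independent:

    from itertools import combinations

    baskets = [[1, 2, 3], [1, 3, 5], [3, 5, 6], [2, 3, 4]]  # toy input
    HALF = 5                  # each table gets half the buckets of plain PCY

    h1 = lambda i, j: (i * j) % HALF
    h2 = lambda i, j: (i + 31 * j) % HALF   # illustrative second hash

    counts1 = [0] * HALF
    counts2 = [0] * HALF
    for basket in baskets:
        for i, j in combinations(sorted(basket), 2):
            counts1[h1(i, j)] += 1  # a candidate pair must later fall into a
            counts2[h2(i, j)] += 1  # frequent bucket under BOTH hash functions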

Main-Memory: Multihash

[Figure: main-memory layout of Multihash. Pass 1: item counts plus the first and second hash tables. Pass 2: frequent items, Bitmap 1, Bitmap 2, and counts of candidate pairs.]
PCY: Summary
• Either multistage or multihash can use more than two hash functions

• In multistage, there is a point of diminishing returns, since the bit-vectors eventually consume all of main memory

• For multihash, the bit-vectors occupy exactly what one PCY bitmap does, but too many hash functions make all counts > s
Limited Pass Algorithms

Frequent Itemsets
in < 2 Passes
Frequent Itemsets in < 2 Passes

• A-Priori, PCY, etc., take k passes to find frequent itemsets of size k

• Can we use fewer passes?

• Use 2 or fewer passes for all sizes, but we may miss some frequent itemsets
  • Simple Randomized Algorithm
  • SON (Savasere, Omiecinski, and Navathe)
Simple Randomized Algorithm (1)
• Take a random sample of the market baskets

• Run A-Priori or one of its improvements in main memory
  • So we don’t pay for disk I/O each time we increase the size of itemsets

• Reduce the support threshold proportionally to match the sample size

[Figure: main memory holding a copy of the sample baskets plus space for counts]
Simple Randomized Algorithm (2)

• Optionally, verify that the candidate itemsets are truly frequent in the entire data set by a second pass (avoid false positives)

• But you don’t catch sets that are frequent in the whole data but not in the sample
  • A smaller threshold, e.g., s/125 instead of s/100 for a 1% sample, helps catch more truly frequent itemsets
  • But it requires more space
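
A sketch under these assumptions: baskets and s as in the earlier sketches, the apriori function defined earlier, and an illustrative 1% sampling rate:

    import random

    p = 0.01                                  # sample roughly 1% of the baskets
    sample = [b for b in baskets if random.random() < p]
    sample_threshold = max(1, int(p * s))     # or a bit lower, at more space cost
    candidates = apriori(sample, sample_threshold)

    # Optional second full pass to eliminate false positives
    truly_frequent = {c for c in candidates
                      if sum(1 for b in baskets if c <= set(b)) >= s}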

SON: The Algorithm of
Savasere, Omiecinski, and Navathe (1)
• Repeatedly read small subsets of the baskets into main memory and run an in-memory algorithm to find all frequent itemsets
  • Note: we are not sampling, but processing the entire file in memory-sized chunks

• An itemset becomes a candidate if it is found to be frequent in any one or more subsets of the baskets

• On a second pass, count all the candidate itemsets and determine which are frequent in the entire set (see the sketch below)

• Key “monotonicity” idea: an itemset cannot be frequent in the entire set of baskets unless it is frequent in at least one subset
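
A compact sketch of SON's two passes, reusing the apriori sketch from earlier; the chunking is simplified for illustration:

    def son(baskets, s, num_chunks):
        chunk_size = len(baskets) // num_chunks
        candidates = set()
        # Pass 1: frequent in any chunk -> candidate (by monotonicity)
        for c in range(num_chunks):
            chunk = baskets[c * chunk_size:(c + 1) * chunk_size]
            candidates |= apriori(chunk, max(1, s // num_chunks))
        # Pass 2: count every candidate over the entire set of baskets
        return {x for x in candidates
                if sum(1 for b in baskets if x <= set(b)) >= s}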

SON Algorithm – Example
• Let the support threshold be s = 4 and the following set of baskets:

  Chunk #1 (threshold s = 2): {A, B, C}, {A, B}, {C, D}, {B, C}
  Chunk #2 (threshold s = 2): {A, B, D, E}, {A, B, E}, {A, C, E}

• Candidates from Chunk #1 (support ≥ 2): {A}=2, {B}=3, {C}=3, {A,B}=2, {B,C}=2
• Candidates from Chunk #2 (support ≥ 2): {A}=3, {B}=2, {E}=3, {A,B}=2, {A,E}=3, {B,E}=2

• Union of candidates: {A}, {B}, {C}, {E}, {A,B}, {A,E}, {B,C}, {B,E}

• SON’s 2nd pass counts these candidates over all seven baskets. For example, {C} is not frequent in Chunk #2, but it is frequent in Chunk #1 and frequent overall (support 4 ≥ s).
SON – Distributed Version

• SON lends itself to distributed data mining

• Baskets are distributed among many nodes
  • Compute frequent itemsets at each node
  • Distribute candidates to all nodes
  • Accumulate the counts of all candidates
SON: Map/Reduce
● Phase 1: Find candidate itemsets
○ Map:
  ■ Take the assigned subset of the baskets and find the itemsets frequent in the subset
  ■ Lower the support threshold from s to p × s if each Map task gets fraction p of the total input file
  ■ Apply the sampling algorithm
  ■ Output: set of key-value pairs (F, 1), F: frequent itemset from the sample
○ Reduce:
  ■ Each Reduce task is assigned a set of keys, which are itemsets
  ■ It produces those keys (itemsets) that appear one or more times (the candidate itemsets)
SON: Map/Reduce
Phase 2: Find true frequent itemsets
Map
● Input:
  ○ All the output from the first Reduce function (candidate itemsets)
  ○ A portion of the input data file
● Count the number of occurrences of each of the candidate itemsets among the baskets in the portion of the dataset it was assigned
● Output: set of key-value pairs (C, v)
  ○ C: one of the candidate sets
  ○ v: the support for that itemset among the baskets that were input to this Map task
SON: Map/Reduce
Phase 2: Find true frequent itemsets
Reduce:
● Input:
  ○ Itemsets as keys; the Reduce task sums the associated values (partial supports)
● Output:
  ○ The total support for each of the itemsets that the Reduce task was assigned to handle
● Itemsets whose sum of values is at least s are frequent in the whole dataset
  ○ The Reduce task outputs these itemsets with their counts
● Itemsets that do not have total support at least s are not transmitted to the output of the Reduce task
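
A condensed, framework-free sketch of both phases in plain Python (apriori is the earlier sketch; a real deployment would run these functions under a Map/Reduce framework):

    def phase1_map(chunk, s, p):
        # Candidate itemsets frequent in this chunk, at the scaled threshold
        return [(F, 1) for F in apriori(chunk, max(1, int(p * s)))]

    def phase1_reduce(key_value_pairs):
        # Keys that appear one or more times are the candidate itemsets
        return {F for F, _ in key_value_pairs}

    def phase2_map(candidates, chunk):
        # Support of each candidate among the baskets of this chunk
        return [(C, sum(1 for b in chunk if C <= set(b))) for C in candidates]

    def phase2_reduce(key_value_pairs, s):
        totals = {}
        for C, v in key_value_pairs:
            totals[C] = totals.get(C, 0) + v      # sum the partial supports
        return {C: n for C, n in totals.items() if n >= s}  # frequent overall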

