Presentation

· Contributors · Organizations · Search Program · Flagged · Happening Now · Maps

Workshop

: On a Parallel Spark Workflow for Frequent Itemset Mining Based on Array Prefix-Tree

Presenters

Event Type

Workshop

Registration Categories

Tags

TimeSunday, 17 November 20194:30pm - 5pm

Location708

DescriptionFrequent Itemset Mining (FIM) is a fundamental procedure in various data mining techniques such as association rule mining. Among many existing algorithms, FP-Growth is considered as a milestone achievement that discovers frequent itemsets without generating candidates. However, due to the high complexity of its mining process and the high cost of its memory usage, FP-Growth still suffers from a performance bottleneck when dealing with large datasets. In this paper, we design a new Array Prefix-Tree structure, and based on that, propose an Array Prefix-Tree Growth (APT-Growth) algorithm, which explicitly obviates the need of recursively constructing conditional FP-Tree as required by FP-Growth. To support big data analytics, we further design and implement a parallel version of APTGrowth, referred to as PAPT-Growth, as a Spark workflow. We conduct FIM workflow experiments on both real-life and synthetic datasets for performance evaluation, and extensive results show that PAPT-Growth outperforms other representative parallel FIM algorithms in terms of execution time, which sheds light on its potential applications to big data mining.

Presenters

Xinzheng Niu

University of Electronic Science and Technology of China

Mideng Qian

University of Electronic Science and Technology of China

Chase Wu

New Jersey Institute of Technology

Aiqin Hou

Northwest University, China