Welcome to the Journal of Dalian University of Technology

Article abstract

Huang Kun (黄坤), Wu Yujia (吴玉佳). An algorithm of mining high utility itemsets with vertical structures [J]., 2017, 57(5): 524-530

Original Chinese title: 一种垂直结构的高效用项集挖掘算法

An algorithm of mining high utility itemsets with vertical structures

DOI: 10.7511/dllgxb201705013

English keywords: data mining; association analysis; frequent itemsets; high utility itemsets

Funding: Supported by the National Natural Science Foundation of China (61303046).

Author	Affiliation
Huang Kun (黄坤), Wu Yujia (吴玉佳)


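Chinese abstract (translated):

Mining high utility itemsets has become one of the hot topics in association analysis. Most high utility itemset mining algorithms need to generate a large number of candidate itemsets, which hurts their performance. HUI-Miner is an algorithm that discovers all high utility itemsets in a transaction database without generating candidate itemsets. However, it has to generate a large number of utility lists, which not only consumes too much storage space but also degrades its running performance. To address this problem, a new data structure called the itemset list is proposed to store the utility information of transactions and items. Three pruning strategies are proposed to reduce the number of itemset lists, and all itemset lists are constructed with a single scan of the transaction database. The algorithm MHUI is proposed, which mines all high utility itemsets directly from the itemset lists without generating any candidate itemsets. Comparative experiments against the latest algorithms on three different sparse datasets show that MHUI is better than the other algorithms in running time and memory consumption.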

English abstract:

Mining high utility itemsets (HUIs) is a popular task in association analysis. Most HUI mining algorithms must generate a large number of candidate itemsets (CIs), which degrades their performance. HUI-Miner can mine all HUIs from a transaction database without generating CIs. However, it generates a large number of utility lists (ULs), which consume excessive storage space and slow the algorithm down. To solve this problem, a new data structure, the itemset list (IL), is proposed to store the utility information of transactions and items. Three pruning strategies are proposed to reduce the number of ILs, and all ILs are built in a single scan of the transaction database. A new algorithm, MHUI, is proposed that mines all HUIs directly from the ILs without generating any CIs. Experiments on three different sparse datasets show that MHUI outperforms state-of-the-art algorithms in runtime and memory consumption.
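
The abstract does not define the itemset list or the three pruning strategies, so the following is not MHUI. It is only a minimal Python sketch of the problem setting, assuming the usual definition in which an itemset's utility in a transaction is the sum of quantity times unit profit over its items. The toy database, the profit table, and all function names are hypothetical. The last function shows what a vertical layout looks like in its simplest form: one list of (transaction id, utility) pairs per item.

```python
from itertools import combinations

# Hypothetical toy database: each transaction maps item -> purchased quantity.
transactions = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 1},
    {"b": 1, "c": 3},
    {"a": 1, "b": 1, "c": 1},
]
# Hypothetical unit profit of each item.
unit_profit = {"a": 5, "b": 2, "c": 1}


def itemset_utility(itemset, db, profit):
    """Sum the utility of `itemset` over every transaction containing all its items."""
    total = 0
    for t in db:
        if all(i in t for i in itemset):
            total += sum(t[i] * profit[i] for i in itemset)
    return total


def brute_force_huis(db, profit, min_util):
    """Enumerate every itemset and keep those with utility >= min_util.

    Exponential in the number of items; for illustration only.
    """
    items = sorted({i for t in db for i in t})
    result = {}
    for k in range(1, len(items) + 1):
        for itemset in combinations(items, k):
            u = itemset_utility(itemset, db, profit)
            if u >= min_util:
                result[itemset] = u
    return result


def single_item_lists(db, profit):
    """Vertical layout: item -> list of (transaction id, utility in that transaction)."""
    lists = {}
    for tid, t in enumerate(db):
        for item, qty in t.items():
            lists.setdefault(item, []).append((tid, qty * profit[item]))
    return lists


print(brute_force_huis(transactions, unit_profit, min_util=16))
print(single_item_lists(transactions, unit_profit))
```

With a threshold of 16, the itemset {a, b} qualifies (utility 16) although {b} alone does not (utility 8). Utility is therefore not anti-monotone, which is why support-style pruning from frequent itemset mining cannot be applied directly and dedicated pruning strategies are needed.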
