Article Abstract
HUANG Kun, WU Yujia. An algorithm of mining high utility itemsets with vertical structures[J]., 2017, 57(5): 524-530
An algorithm of mining high utility itemsets with vertical structures
  
DOI:10.7511/dllgxb201705013
Keywords: data mining; association analysis; frequent itemsets; high utility itemsets
Funding: Supported by the National Natural Science Foundation of China (61303046).
Authors: HUANG Kun, WU Yujia
Abstract:
      Mining high utility itemsets (HUIs) is one of the popular tasks in the field of association analysis. Most HUI mining algorithms generate a large number of candidate itemsets (CIs), and this degrades their performance. HUI-Miner can mine all HUIs from a transaction database without generating CIs. However, it creates a large number of utility lists (ULs). These lists consume too much storage space and also slow down the algorithm. To solve this problem, a new data structure called the itemset list (IL) is proposed to store the utility information of transactions and items. Three pruning strategies are proposed to reduce the number of ILs, and all ILs are built with a single scan of the transaction database. A new algorithm, MHUI, is then proposed. It mines all HUIs directly from the ILs without generating any CIs. Experiments on three different sparse datasets compare MHUI with state-of-the-art algorithms and show that MHUI outperforms them in both runtime and memory consumption.
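The abstract does not describe the internal layout of MHUI's itemset lists or its three pruning strategies, so they are not reproduced here. The Python sketch below is a hedged illustration of the vertical, candidate-free style of mining that the abstract discusses. It follows the utility-list approach of the HUI-Miner baseline. Each promising item gets a list of (tid, iutil, rutil) entries. Lists for longer itemsets are built by joining shorter ones. A branch is pruned once iutil + rutil falls below the threshold `minutil`. The function names (`build_utility_lists`, `join`, `search`, `mine_huis`) and the toy database are hypothetical. Unlike MHUI's single scan, this sketch reads the database twice.

```python
# Sketch of HUI-Miner-style utility lists (assumption: NOT MHUI's itemset lists).
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

Transaction = Dict[str, int]   # item -> purchased quantity (internal utility)
Entry = Tuple[int, int, int]   # utility-list entry: (tid, iutil, rutil)


def build_utility_lists(db: List[Transaction], profit: Dict[str, int], minutil: int):
    """Two database scans: TWU of each item, then one utility list per item."""
    twu: Dict[str, int] = defaultdict(int)
    for t in db:
        tu = sum(q * profit[i] for i, q in t.items())  # transaction utility TU(T)
        for i in t:
            twu[i] += tu
    # Items whose TWU is below minutil cannot appear in any HUI, so drop them.
    order = sorted((i for i in twu if twu[i] >= minutil), key=lambda i: (twu[i], i))
    rank = {i: r for r, i in enumerate(order)}
    lists: Dict[str, List[Entry]] = {i: [] for i in order}
    for tid, t in enumerate(db):
        items = sorted((i for i in t if i in rank), key=rank.__getitem__)
        utils = [t[i] * profit[i] for i in items]
        remaining = sum(utils)
        for i, u in zip(items, utils):
            remaining -= u  # utility of the items ordered after i in this transaction
            lists[i].append((tid, u, remaining))
    return order, lists


def join(p: List[Entry], px: List[Entry], py: List[Entry]) -> List[Entry]:
    """Utility list of P+x+y from those of P, P+x and P+y (p == [] at the top level)."""
    p_iutil = {tid: iu for tid, iu, _ in p}
    py_by_tid = {tid: (iu, ru) for tid, iu, ru in py}
    joined = []
    for tid, iu_x, _ in px:
        if tid in py_by_tid:
            iu_y, ru_y = py_by_tid[tid]
            # u(P, T) is counted in both iu_x and iu_y, so subtract it once.
            joined.append((tid, iu_x + iu_y - p_iutil.get(tid, 0), ru_y))
    return joined


def search(prefix, p_list, exts, minutil, out):
    """Depth-first search over extensions of `prefix`, in ascending-TWU order."""
    for k, (x, ux) in enumerate(exts):
        iutil = sum(e[1] for e in ux)
        rutil = sum(e[2] for e in ux)
        itemset = prefix + [x]
        if iutil >= minutil:
            out[frozenset(itemset)] = iutil
        # Remaining-utility pruning: if iutil + rutil < minutil, no itemset
        # obtained by extending `itemset` can reach minutil.
        if iutil + rutil >= minutil:
            new_exts = []
            for y, uy in exts[k + 1:]:
                uxy = join(p_list, ux, uy)
                if uxy:
                    new_exts.append((y, uxy))
            search(itemset, ux, new_exts, minutil, out)


def mine_huis(db: List[Transaction], profit: Dict[str, int], minutil: int) -> Dict[FrozenSet[str], int]:
    order, lists = build_utility_lists(db, profit, minutil)
    out: Dict[FrozenSet[str], int] = {}
    search([], [], [(i, lists[i]) for i in order], minutil, out)
    return out


# Hypothetical toy database: unit profits and per-transaction quantities.
profit = {"a": 5, "b": 2, "c": 1, "d": 2, "e": 3}
db = [
    {"a": 1, "c": 1, "d": 1},
    {"a": 2, "c": 6, "e": 2},
    {"b": 2, "c": 2, "e": 1},
    {"a": 1, "b": 2, "c": 1, "d": 6, "e": 1},
    {"b": 4, "c": 3, "e": 1},
]
result = mine_huis(db, profit, minutil=30)
print(sorted((sorted(s), u) for s, u in result.items()))
# [(['a', 'c', 'e'], 31), (['b', 'c', 'e'], 31)]
```

Each joined list is derived from its prefix's list, so the search never materializes candidate itemsets. The abstract's criticism still applies, though: HUI-Miner keeps one such list for every itemset it explores, and this is the storage cost that MHUI's itemset lists aim to reduce.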