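Data mining, also known as Knowledge Discovery in Databases (KDD), is an advanced process of extracting credible, novel, valid and human-understandable patterns from large amounts of data. Classification is a very important data mining method. The idea of classification is to learn a classification function, or construct a classification model, from existing data. That function or model maps each record in the database to one of a set of given classes, so it can be applied to data prediction. Most data mining tools use rule discovery or decision tree classification techniques to discover data patterns and rules, and at their core is some kind of induction algorithm. Such tools typically mine the data in a database, generate rules and decision trees, and then analyze new data and make predictions. This paper focuses on the ID3 and C4.5 decision tree algorithms and studies their implementation and application.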
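The classification process described above (learn a model from existing records, then use it to map new records to a class) can be illustrated with a minimal ID3 sketch. It assumes categorical attributes stored in Python dicts and chooses each split by information gain; the function names, the tuple-based tree representation and the toy weather-style data are illustrative assumptions, not the implementation studied in this paper.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on the categorical attribute `attr`."""
    total = len(labels)
    remainder = 0.0
    for value in set(row[attr] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

def id3(rows, labels, attrs):
    """Return a leaf (class label) or a node tuple (attr, {value: subtree}, majority)."""
    majority = Counter(labels).most_common(1)[0][0]
    # Stop when the node is pure or no attributes remain to split on.
    if len(set(labels)) == 1 or not attrs:
        return majority
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    branches = {}
    for value in set(row[best] for row in rows):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        branches[value] = id3([rows[i] for i in idx], [labels[i] for i in idx],
                              [a for a in attrs if a != best])
    return (best, branches, majority)

def predict(tree, row):
    """Walk the tree; an unseen attribute value falls back to the node's majority class."""
    while isinstance(tree, tuple):
        attr, branches, default = tree
        if row.get(attr) not in branches:
            return default
        tree = branches[row[attr]]
    return tree

# Hypothetical toy training data: two categorical attributes, class "play?".
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rain", "windy": "no"},
    {"outlook": "rain", "windy": "yes"},
    {"outlook": "overcast", "windy": "no"},
]
labels = ["no", "no", "yes", "no", "yes"]
tree = id3(rows, labels, ["outlook", "windy"])
print(predict(tree, {"outlook": "rain", "windy": "no"}))
```

Here "outlook" has the larger information gain (about 0.571 versus 0.420 for "windy"), so it becomes the root, and the "rain" branch is split further on "windy".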
Keywords: classification, decision tree, ID3 algorithm, C4.5 algorithm
Abstract
Data mining, also known as KDD (Knowledge Discovery in Databases), is an advanced process of extracting trustworthy, novel, useful and understandable patterns from very large amounts of data. Classification is one of the most important branches of data mining research. Classification means learning a classification function or model from existing data; the model maps each record in a database to one of a set of predefined classes, so classification can be used for prediction. Most data mining toolkits use rule discovery or decision tree classification techniques to find data patterns and rules, and at their core is an induction algorithm. Such toolkits usually mine the stored data, produce rules and decision trees, and then analyze new data and make predictions. This paper studies the ID3 and C4.5 decision tree classification algorithms, their implementation and their application.
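One of the main differences between C4.5 and ID3 is the split criterion: C4.5 divides the information gain by the split information, which penalizes attributes with many distinct values. The sketch below, under stated assumptions (hypothetical function names, categorical attributes stored in dicts, made-up toy data), shows a record-ID attribute that information gain would favor but the gain ratio rejects. It deliberately omits C4.5's handling of continuous attributes, missing values and pruning.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def partition(rows, labels, attr):
    """Group the class labels by the value each row has for `attr`."""
    groups = {}
    for row, lab in zip(rows, labels):
        groups.setdefault(row[attr], []).append(lab)
    return list(groups.values())

def info_gain(rows, labels, attr):
    """ID3 criterion: parent entropy minus weighted child entropy."""
    total = len(labels)
    return entropy(labels) - sum(len(g) / total * entropy(g)
                                 for g in partition(rows, labels, attr))

def split_info(rows, labels, attr):
    """Entropy of the partition sizes themselves (large for many-valued attributes)."""
    total = len(labels)
    return -sum(len(g) / total * math.log2(len(g) / total)
                for g in partition(rows, labels, attr))

def gain_ratio(rows, labels, attr):
    """C4.5-style criterion; a single-valued attribute (split info 0) scores 0."""
    si = split_info(rows, labels, attr)
    return 0.0 if si == 0 else info_gain(rows, labels, attr) / si

# Hypothetical data: "id" is unique per record, "windy" is a genuine predictor.
rows = [{"id": i, "windy": w} for i, w in enumerate(["no", "no", "no", "yes", "yes", "yes"])]
labels = ["yes", "yes", "no", "no", "no", "no"]
attrs = ["id", "windy"]
print(max(attrs, key=lambda a: info_gain(rows, labels, a)))   # ID3's choice
print(max(attrs, key=lambda a: gain_ratio(rows, labels, a)))  # C4.5's choice
```

The "id" attribute splits the data into pure singletons, so its information gain (about 0.918) beats that of "windy" (about 0.459); dividing by split information (log2 6 versus 1) reverses the ranking and C4.5 chooses "windy".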