Rough Sets in KDD - Tutorial Notes

Andrzej Skowron & Ning Zhong

Presented during PAKDD 2000, Kyoto, Japan
Pacific-Asia Conference on Knowledge Discovery and Datamining
Copyright by N. Zhong and A. Skowron

Click to start HTML presentation

 

Slide index

Rough Sets in KDD Tutorial Notes

About the Speakers

About the Speakers (2)

Contents

Introduction

Introduction (2)

Introduction (3)

Basic Concepts of Rough Sets

Information Systems/Tables

Decision Systems/Tables

Issues in the Decision Table

Indiscernibility

Indiscernibility (2)

An Example of Indiscernibility

Observations

Set Approximation

Set Approximation (2)

An Example of Set Approximation

An Example of Set Approximation (2)

slide PPT

slide PPT

slide PPT

slide PPT

Properties of Approximations

Properties of Approximations (2)

Four Basic Classes of Rough Sets

Accuracy of Approximation

Issues in the Decision Table

Reducts

Dispensable & Indispensable Attributes

Independent

Reduct & Core

An Example of Reducts & Core

Discernibility Matrix (relative to positive region)

Discernibility Matrix (relative to positive region) (2)

Discernibility Function (relative to objects)

Examples of Discernibility Matrix

Examples of Discernibility Matrix (2)

Rough Membership

Rough Membership (2)

Dependency of Attributes

Dependency of Attributes (2)

Dependency of Attributes (3)

A Rough Set Based KDD Process

What Are Issues of Real World ?

slide PPT

slide PPT

slide PPT

slide PPT

slide PPT

A Rough Set Based KDD Process

Observations

Discretization based on RSBR

Discretization Based on RSBR (2)

A Geometrical Representation of Data

A Geometrical Representation of Data and Cuts

Discretization Based on RSBR (3)

Discretization Based on RSBR (4)

A Discretization Process

The Set of Cuts on Attribute a

A Discretization Process (2)

A Sample Defined in Step 2

The Discernibility Formula

The Discernibility Formulae for All Different Pairs

The Discernibility Formulae for All Different Pairs (2)

A Discretization Process (3)

The Discernibility Formula in CNF Form

The Discernibility Formula in DNF Form

The Minimal Set Cuts for the Sample DB

A Result

A Rough Set Based KDD Process

Observations

The Goal of Attribute Selection

Attribute Selection

The Filter Approach

The Wrapper Approach

Basic Ideas: Attribute Selection using RSH

Why Heuristics ?

The Rule Selection Criteria in GDT-RS

Attribute Evaluation Criteria

Main Features of RSH

An Example of Attribute Selection

slide PPT

slide PPT

slide PPT

slide PPT

Searching for CORE (5)

R={b}

Attribute Evaluation Criteria

Selecting Attribute from {a,c,d}

Selecting Attribute from {a,c,d} (2)

Selecting Attribute from {a,c,d} (3)

Selecting Attribute from {a,c,d} (4)

A Heuristic Algorithm for Attribute Selection

A Heuristic Algorithm for Attribute Selection (2)

A Heuristic Algorithm for Attribute Selection (3)

Experimental Results

A Rough Set Based KDD Process

Main Features of GDT-RS

A Sample DB

A Sample GDT

Explanation for GDT

Probabilistic Relationship Between PIs and PGs

Unseen Instances

Rule Representation

Rule Strength (1)

Rule Strength (2)

Rule Strength (3)

Rule Discovery by GDT-RS

Regarding the Instances (Noise Rate = 0)

Generating Discernibility Vector for u2

Obtaining Reducts for u2

Generating Rules from u2

Generating Rules from u2 (2)

Generating Discernibility Vector for u4

Obtaining Reducts for u4

Generating Rules from u4

Generating Rules from u4 (2)

Generating Rules from All Instances

The Rule Selection Criteria in GDT-RS

Generalization Belonging to Class y

Generalization Belonging to Class n

Results from the Sample DB (Noise Rate = 0)

Results from the Sample DB (2) (Noise Rate > 0)

Regarding Instances (Noise Rate > 0)

Rules Obtained from All Instacnes

Example of Using BK

Changing Strength of Generalization by BK

Algorithm 1 Optimal Set of Rules

Algorithm 1 Optimal Set of Rules (2)

Algorithm 1 Optimal Set of Rules (3)

The Issue of Algorithm 1

Algorithm 2 Sub-Optimal Solution

Algorithm 2 Sub-Optimal Solution (2)

Algorithm 2 Sub-Optimal Solution (3)

Algorithm 2 Sub-Optimal Solution (4)

Time Complexity of Alg.1&2

Experiments

Experiment 1 (meningitis data)

Experiment 1 (meningitis data) (2)

Experiment 1 (meningitis data) (3)

Using Background Knowledge (meningitis data)

Using Background Knowledge (meningitis data) (2)

Explanation of BK

Using Background Knowledge (meningitis data) (3)

Using Background Knowledge (meningitis data) (4)

Experiment 2 (bacterial examination data)

Attribute Selection (bacterial examination data)

Some Results (bacterial examination data)

Experiment 3 (gastric cancer data)

Result of Attribute Selection (gastric cancer data)

Result of Attribute Selection (2) (gastric cancer data)

Experiment 4 (slope-collapse data)

Result of Attribute Selection (slope-collapse data)

The Discovered Rules (slope-collapse data)

Other Methods for Attribute Selection (download from http://www.iscs/nus.edu.sg/liuh/)

slide PPT

slide PPT

Result of WSFG

Result of WSFG (2) (class: direct death)

Result of WSBG

Result of WSBG (2) (class: direct death)

Result of LVI (gastric cancer data)

Some Rules Related to Direct Death

GDT-RS vs. Discriminant Analysis

GDT-RS vs. ID3 (C4.5)

Rough Sets in ILP and GrC -- An Advanced Topic --

Advantages of ILP (Compared with Attribute-Value Learning)

Weak Points of ILP (Compared with Attribute-Value Learning)

Goal

Normal Problem Setting for ILP

Normal Problem Setting for ILP (2)

Normal Problem Setting for ILP (3)

Issues

Imperfect Data in ILP

Imperfect Data in ILP (2)

Imperfect Data in ILP (3)

Observations

Observations (2)

Solution

Why GrC? A Practical Point of View

Solution (2)

Rough Sets

Rough Sets (2)

Rough Sets (3)

An Illustrating Example

An Illustrating Example (2)

Rough Problem Setting for Insufficient BK

Rough Problem Setting for Insufficient BK (2)

Rough Problem Setting for Insufficient BK (3)

Rough Problem Setting for Insufficient BK (4)

Example Revisited

Example Revisited (2)

Example Revisited (3)

Rough Problem Setting for Indiscernible Examples

Rough Problem Setting for Indiscernible Examples (2)

Rough Sets (GrC) for Other Imperfect Data in ILP

Future Work on RS (GrC) in ILP

Summary

Advanced Topics (to deal with real world problems)

Advanced Topics (2) (to deal with real world problems)

References and Further Readings

References and Further Readings

References and Further Readings

References and Further Readings

References and Further Readings

References and Further Readings

References and Further Readings

Related Conference and Web Pages

slide PPT

Info:
The presentation is quite large. It contains 210 slides. The HTML version is optimised for 800x600 resolution, full color.

The PowerPoint version of slides can be downloaded by clicking on the link below.
The size of .ppt file is approximately 1,2 MB.

Download

Please take care of copyrights while passing this material to third-party.