Cluster Tab
You can use the Cluster tab to review and select the clusters in the entity resolution file. The tab contains the following areas:
Clusters - Displays the list of clusters contained in the entity resolution file. The list contains the following columns:
- Cluster
- Action
- Applied By
- Record Count
- Confidence
- Related Clusters
The toolbar at the top of the clusters list enables you to perform the following tasks:
- Edit a cluster
- Apply changes to a cluster
- Go to a specified cluster
- Filter the cluster list
- Show the Cluster Analysis pane
Cluster Analysis Pane - Displays graphic views of the clusters in the entity resolution file. The toolbar enables you to select either a bubble plot view or a bar chart view. You can hover your cursor over the data points in the bubble plot or the bars in the bar chart to see more information.
Details Pane - Displays detailed information about a selected cluster. The Details pane contains the following tabs:
- Records - Displays the rows contained in the selected cluster.
- Related Clusters - Displays related clusters and cluster records for the selected cluster. From this tab, you can also edit clusters and locate related cluster records.
- Notes - Enables you to review notes attached to the selected cluster. You can also add new notes to the cluster, edit a note, and delete a note.
You can click Show Details in the toolbar adjacent to the menu bar to display the Details pane.
Confidence Values
Confidence values are usually derived from the scores generated by the Match Code node in a data job. The node generates scores only if the underlying Match definition is configured to do so. The CI 2011A QKB is the first production QKB that supports Match definitions that can be configured to generate scores.
The Confidence value shown for a cluster in the Cluster tab is the lowest score among the records in that cluster (that is, among the records that share that match code). For example, assume that records 1 and 2 generate the following match codes and scores:
- Record 1: Match code XXX with score 90
- Record 1: Match code YYY with score 80
- Record 2: Match code XXX with score 70
- Record 2: Match code YYY with score 100
- Record 2: Match code ZZZ with score 60
The resulting clusters will be as follows:
- Cluster 0 (=Match code XXX): Record 1 with score 90, Record 2 with score 70 -> Summary confidence is 70
- Cluster 1 (=Match code YYY): Record 1 with score 80, Record 2 with score 100 -> Summary confidence is 80
- Cluster 2 (=Match code ZZZ): Record 2 with score 60 -> Summary confidence is 60
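The aggregation rule above can be illustrated with a short Python sketch. It is only a model of the rule as described (group the scores by match code, then take the minimum per group), not the product's implementation; the `scores` list and the `summary_confidence` function are hypothetical names introduced for this illustration, and the data is the example from this section.

```python
from collections import defaultdict

# Hypothetical input: (record, match_code, score) triples from the example above.
scores = [
    (1, "XXX", 90),
    (1, "YYY", 80),
    (2, "XXX", 70),
    (2, "YYY", 100),
    (2, "ZZZ", 60),
]


def summary_confidence(rows):
    """Return {match_code: lowest score among the records sharing that code}.

    Illustrative sketch only: each distinct match code is treated as one cluster,
    and the cluster's summary confidence is the minimum of its members' scores.
    """
    by_code = defaultdict(list)
    for _record, code, score in rows:
        by_code[code].append(score)
    return {code: min(values) for code, values in by_code.items()}


for code, confidence in summary_confidence(scores).items():
    print(code, confidence)
```

Run against the example data, this yields 70 for XXX, 80 for YYY, and 60 for ZZZ, matching the three clusters listed above. Taking the minimum means a cluster's confidence is only as strong as its weakest member record.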