
ISR News Story

D-Dupe: A Visual Interface for Relational DeDuplication (ISR IP)

ISR intellectual property available to license

Inventors:
Lise Getoor, Mustafa Bilgic, Louis Licamele, Ben Shneiderman

Description
The amount of data being collected today is growing rapidly. As a result, databases often contain multiple references to the same underlying entity. Finding and merging these references is known as deduplication, or entity resolution. Discovering, visualizing, analyzing, and resolving duplicate records within social networks are challenges faced by the database community.

Researchers at the University of Maryland, College Park have developed D-Dupe, an interactive tool that combines data mining algorithms for entity resolution with task-specific network visualization. The two novel features of D-Dupe are:

1. Stable Visual Layout Optimized for Entity Resolution: The stable and meaningful layout shows small sub-networks drawn from large databases in a simple, task-appropriate design that is surprisingly effective at presenting information about potential duplicates.

2. Fine-grained Control for Combining Entity Resolution Algorithms: D-Dupe gives users the flexibility to apply and interleave different entity resolution algorithms. Integrated with a visualization of the common social context, this feature proves extremely efficient in resolving duplicates. The flexible combination of similarity measures provides a potent environment for decision making and for recording user actions for later review.
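
The idea of combining similarity measures with social context can be illustrated with a minimal sketch. This is not D-Dupe's actual code or algorithm: the function names, the equal weighting, the threshold, and the sample names are all hypothetical, chosen only to show how an attribute measure (string similarity of names) and a relational measure (overlap of co-author sets) might be blended to rank candidate duplicates.

```python
from difflib import SequenceMatcher
from itertools import combinations


def name_similarity(a: str, b: str) -> float:
    """Attribute similarity: case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def context_similarity(neighbors_a: set, neighbors_b: set) -> float:
    """Relational similarity: Jaccard overlap of two neighbor sets."""
    union = neighbors_a | neighbors_b
    if not union:
        return 0.0
    return len(neighbors_a & neighbors_b) / len(union)


def combined_score(a: str, b: str, coauthors: dict, weight: float = 0.5) -> float:
    """Weighted blend of the two measures (the weight is an assumed parameter)."""
    attr = name_similarity(a, b)
    rel = context_similarity(coauthors[a], coauthors[b])
    return weight * attr + (1 - weight) * rel


def rank_candidate_pairs(coauthors: dict, weight: float = 0.5,
                         threshold: float = 0.5) -> list:
    """Return (score, name_a, name_b) tuples at or above threshold, best first."""
    pairs = []
    for a, b in combinations(sorted(coauthors), 2):
        score = combined_score(a, b, coauthors, weight)
        if score >= threshold:
            pairs.append((score, a, b))
    return sorted(pairs, reverse=True)


# Hypothetical author records mapped to their sets of co-authors.
example_coauthors = {
    "John Smith": {"A. Lee", "M. Garcia"},
    "J. Smith": {"A. Lee", "M. Garcia", "P. Kumar"},
    "Joan Smythe": {"R. Chen"},
}

for score, a, b in rank_candidate_pairs(example_coauthors):
    print(f"{a!r} and {b!r} may be duplicates (score {score:.2f})")
```

Changing `weight` shifts the ranking between name evidence and shared-context evidence, which loosely mirrors the kind of user-controlled combination described above. In an interactive tool, the final decision on each pair would remain with the user.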

The inventors have also described D-Dupe's performance on bibliographic datasets and walked through the procedure for removing duplicate entities. The tool addresses the challenge of data representation by combining the visual and analytic aspects of data cleaning in a single interactive interface. Powerful filtering and search techniques are also integrated into the tool to make it versatile.

The researchers have also investigated and demonstrated the application of D-Dupe to name resolution in email collections, place resolution in geospatial databases, and name resolution in academic genealogy datasets, and found the tool to be highly effective.

View a video illustrating the tool

For more information
If you would like to license this intellectual property, have questions, would like to contact the inventors, or need more information, contact ISR External Relations Director Jeff Coriale at coriale@umd.edu or 301.405.6604.

Find more ISR IP
You can go to our main IP search page to search by research category or faculty name. Or view the entire list of available IP on our complete IP listing page.


Related Articles:
HCE: Hierarchical Clustering Explorer (ISR IP)
Treemap 4.0 (ISR IP)
Treemap 3.0 (ISR IP)
Treemap 2000 (ISR IP)
Indexing RDF and Temporal RDF Databases (ISR IP)
The T-REX RDF Extraction System (ISR IP)
Optimal Data Diagnosis Algorithms (ISR IP)
GRIDL: Graphical Interface for Digital Libraries (ISR IP)
Excentric Labeling: Dynamic Neighborhood Labeling (ISR IP)
LifeLines for Visualizing History Records (ISR IP)

June 22, 2007

