CiviCRM

Deduping and Merging

Duplicate contacts can turn up in your data for many reasons. Users may create a contact for someone who is already in CiviCRM without realising it, duplicates may slip through the import process, and people filling in forms about themselves on your site (perhaps spelling their name differently or using a different email address) may not realise they're already in your list of contacts.

CiviCRM is equipped with duplicate matching rules that are applied automatically when new contacts are created, and can be run manually at any time to search for duplicates. You can configure these rules to suit your needs.

To view the dedupe rules, go to Contacts > Find and Merge Duplicate Contacts in the navigation menu. This displays the following screen:

[Screenshot: dedupe rules]

From this screen, you can dedupe all individuals in your data with a process like the following:

  1. Start by looking for dupes using a strict rule: click the Use Rule link for the third row (Contact Type: Individual, Level: Strict).
  2. Select All Contacts or a particular group.
  3. Click Continue.
  4. If duplicates are found, merge or delete the duplicate contacts.
  5. Now look for dupes that the stricter rule missed by using a fuzzy rule: click the Use Rule link for the fourth row (Contact Type: Individual, Level: Fuzzy).
  6. Select All Contacts or a particular group.
  7. Click Continue.
  8. If duplicates are found, merge or delete the duplicate contacts.

Different rules are configured for each contact type (individuals, organizations, and households). A default fuzzy rule and a default strict rule are set for each contact type. The default rules are used when CiviCRM checks for duplicates automatically, in ways we'll explain in detail shortly.

Strict and fuzzy rules

CiviCRM includes two categories of dedupe rules:

  • Strict: this type of rule places a priority on avoiding false matches, and therefore applies relatively rigid criteria. As a result, it can sometimes miss real duplicates.

    Strict rules are invoked during imports to scan for duplicates without user intervention. These rules are used here because it is easier to sort out duplicates later than to disentangle two incorrectly merged contacts.

    An example of a strict rule is one that matches individual contacts only if three criteria are met: identical email addresses, first names, and last names. This rule would allow both Mike Tael and Michael Tael into the database, because only two of the three criteria are met (last name and email, but not first name).

    Default strict rules are also automatically checked when new contacts are created through online registrations including events, membership, contributions, and profile pages, and when you create a contact through CiviCRM's programming API.

  • Fuzzy: this type of rule has a relatively loose definition of matches in the hope of catching as many duplicates as possible.

    Fuzzy rules are used in instances where human intelligence can be applied to decide whether a match is accurate. This means that a wider range of possible match results is both permissible and useful.

    Default fuzzy rules are automatically used to check for possible duplicates when contacts are added or edited via the CiviCRM user interface (the default strict rules are automatically used when contacts are added or edited via a Profile, the API, or on import). You'll probably also want to use a fuzzy rule when scanning your database for possible duplicates.
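
The difference between the two categories can be sketched in a few lines of Python. This is only an illustration, not CiviCRM's implementation: the Contact class, the function names, the example email address and the particular fuzzy rule (last name and email only) are all assumptions made for the sketch. The strict rule is the three-criteria example described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contact:
    first_name: str
    last_name: str
    email: str


def _same(a: str, b: str) -> bool:
    # Ignoring case and surrounding whitespace is an assumption of this sketch.
    return a.strip().lower() == b.strip().lower()


def strict_match(a: Contact, b: Contact) -> bool:
    """Strict example from the text: email, first name and last name must all match."""
    return (
        _same(a.email, b.email)
        and _same(a.first_name, b.first_name)
        and _same(a.last_name, b.last_name)
    )


def fuzzy_match(a: Contact, b: Contact) -> bool:
    """Hypothetical looser rule: a shared last name and email is enough."""
    return _same(a.email, b.email) and _same(a.last_name, b.last_name)


mike = Contact("Mike", "Tael", "mtael@example.org")
michael = Contact("Michael", "Tael", "mtael@example.org")

print(strict_match(mike, michael))  # False
print(fuzzy_match(mike, michael))   # True
```

The strict rule lets both records through, while the looser rule flags the pair so that a person can decide whether they are the same individual.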

Configuring rules

To determine whether two contacts are duplicates, CiviCRM checks up to five fields that you can specify. You can also set a length value which determines how many characters in the field should be compared. For example, if you set a length of 2 on the First name field, a first name of "Mike" would match "Michael" and they would be recognized as duplicates, because the first 2 characters are the same. However, if you set the length to 3 instead, "Mike" would no longer match "Michael" and they would be accepted as different contacts. If the length value is left blank, the comparison is done on the entire field value.
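
The effect of the length value can be illustrated with a small Python sketch. The function name field_matches and the case-insensitive comparison are assumptions made for illustration, not a description of how CiviCRM performs the comparison internally.

```python
from typing import Optional


def field_matches(a: str, b: str, length: Optional[int] = None) -> bool:
    """Compare two field values, optionally only on their first `length` characters."""
    if length is not None:
        a, b = a[:length], b[:length]
    # Case-insensitive comparison is an assumption of this sketch.
    return a.casefold() == b.casefold()


print(field_matches("Mike", "Michael", length=2))  # True
print(field_matches("Mike", "Michael", length=3))  # False
print(field_matches("Mike", "Michael"))            # False
```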

Each field is also configured with a numeric weight that determines the relative importance of a match on that field. When a match is discovered on a field, that field's weight is added to the total weight for the rule. After each field is checked, if the total weight is equal to or greater than the numerical threshold set for the rule, the contacts being compared are flagged as suspected duplicates.
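
The weight and threshold calculation described above can be sketched as follows. The fields, weights, thresholds and function names are hypothetical values chosen for illustration, and treating an empty value as a non-match is an assumption of the sketch.

```python
from typing import Dict, List, Tuple

# Each entry is (field name, weight). The fields and weights are hypothetical.
RULE: List[Tuple[str, int]] = [("email", 10), ("last_name", 5), ("first_name", 5)]


def total_weight(a: Dict[str, str], b: Dict[str, str],
                 rule: List[Tuple[str, int]] = RULE) -> int:
    """Add up the weights of every field on which the two contacts match."""
    total = 0
    for field, weight in rule:
        value_a, value_b = a.get(field), b.get(field)
        # Treating a missing or empty value as "no match" is an assumption.
        if value_a and value_a == value_b:
            total += weight
    return total


def is_suspected_duplicate(a: Dict[str, str], b: Dict[str, str], threshold: int,
                           rule: List[Tuple[str, int]] = RULE) -> bool:
    """Flag the pair when the total weight is equal to or greater than the threshold."""
    return total_weight(a, b, rule) >= threshold


mike = {"first_name": "Mike", "last_name": "Tael", "email": "mtael@example.org"}
michael = {"first_name": "Michael", "last_name": "Tael", "email": "mtael@example.org"}

print(total_weight(mike, michael))                # 15
print(is_suspected_duplicate(mike, michael, 20))  # False
print(is_suspected_duplicate(mike, michael, 15))  # True
```

With these example weights, a threshold of 20 requires all three fields to match, while a threshold of 15 flags contacts that share an email address and a last name.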

Using rules and merging duplicate contacts

  1. Go to Contacts > Find and Merge Duplicate Contacts.
  2. Click the Use Rule link to scan for duplicate contacts using the selected rule. 
  3. You can then choose whether to search all contacts for duplicates or only a particular group. Contacts of the type to which the rule is assigned will be scanned and compared. If the total weight of the match between two contacts is equal to or greater than the rule's threshold, the contacts will be displayed on the following screen of possible duplicates.
  4. Clicking Merge for any pair of contacts brings up a table showing details for each contact. CiviCRM designates one record as the duplicate record and displays its information in the left column. The record in the right column is considered the original record into which selected data from the duplicate record will be merged. 
  5. If you want to move the information in the opposite direction, you can swap the duplicate and original contacts by choosing Flip between original and duplicate contacts.
  6. For each field, choose whether to keep the original data shown on the right (leave the check-box in the middle column unchecked) or use the value from the duplicate contact instead (check the box). For email addresses and phone numbers, you can keep both the duplicate's and the original's values by checking both the check-box in the middle column and the "add new" option in the right column. Note that associated tags, groups and activity data (including event attendance, contributions, etc.) will be added to the data already recorded in the original record rather than replacing it. In general, it is safer to keep the tags, groups and activities of both contacts after the merge.
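
The outcome of the field choices in the last step can be pictured with a short Python sketch. It is a simplified model and not CiviCRM's merge code: the function names and sample data are hypothetical, and the "add new" option for email addresses and phone numbers is left out.

```python
from typing import Dict, Iterable, Set


def merge_fields(original: Dict[str, str], duplicate: Dict[str, str],
                 checked: Iterable[str]) -> Dict[str, str]:
    """Keep the original's values except for fields whose box was checked."""
    merged = dict(original)
    for field in checked:
        if field in duplicate:
            merged[field] = duplicate[field]
    return merged


def merge_tags(original: Set[str], duplicate: Set[str]) -> Set[str]:
    """Tags from the duplicate are added to the original's tags, not substituted."""
    return original | duplicate


original = {"first_name": "Michael", "phone": "555-0100"}
duplicate = {"first_name": "Mike", "phone": "555-0199"}

merged = merge_fields(original, duplicate, checked=["phone"])
print(merged)  # {'first_name': 'Michael', 'phone': '555-0199'}
print(sorted(merge_tags({"donor"}, {"donor", "volunteer"})))  # ['donor', 'volunteer']
```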

Merging contacts from search results

If you notice duplicate contacts within a set of search results, you can merge them directly from the search results instead of using the separate Find and Merge Duplicate Contacts process. This is a great way to clean up your database during your everyday workflow with minimal disruption.

  1. Select the duplicate contacts from your search results by clicking the check box at the left side of each record. 
  2. Select Merge Contacts from the - more actions - menu.
  3. Click Go.
  4. Follow the normal steps for merging duplicate contacts.