Removing duplicates from an EndNote library

blogs.lshtm.ac.uk/library/2018/12/07/removing-duplicates-from-an-endnote-library/

December 7, 2018

If you have done any systematic review searching, you will have spent time removing
duplicate references from your results. Faced with the prospect of deduplicating 26k
results, I put out a plea/rant on Twitter.

If anyone knows of a system which is good at removing duplicates from >20 databases, let me know. 4/

— Jane Falconer (@falkie71) November 17, 2018

As often happens, lovely library colleagues came to the rescue. Naila Dracup
(@nailadracup) sent me a link to a guide written by Judy Wright (@jmwleeds) and the
AUHE Information Specialists at the University of Leeds.

If you don’t already use this method: https://t.co/bzxQ9jT2yy

— Naila Dracup (@nailadracup) November 17, 2018

Wichor Bramer (@wichor) has also written a paper about how to do this, which he
pointed out on Twitter.

Mine works faster i think, and sensitivity is 99.5% and error margin 1 in 3000.
I’d try my method.
Page numbers should be adapted, as medline has abreviated page numbers. My
article describes how that can be done.
I import medline first then export with a special style & reimport

— Wichor Bramer, PhD (@wichor) November 17, 2018


You can find Wichor’s paper at Bramer WM, et al. De-duplication of database search
results for systematic reviews in EndNote. J Med Libr Assoc. 2016;104(3): 240-3.
doi:10.3163/1536-5050.104.3.014.
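
To picture the page-number issue Wichor mentions in his tweet, here is a minimal Python sketch that expands an abbreviated end page such as 240-3 into 240-243. This is only an illustration, not the method from his paper (which, per the tweet, works inside EndNote with an export style). The function name expand_pages is made up for this example.

```python
def expand_pages(pages):
    """Expand an abbreviated end page, e.g. '240-3' becomes '240-243'.

    Hypothetical helper for illustration only. Anything that is not a
    simple numeric 'start-end' range is returned unchanged.
    """
    start, sep, end = pages.partition("-")
    start, end = start.strip(), end.strip()
    if not sep or not start.isdigit() or not end.isdigit():
        return pages
    if len(end) < len(start):
        # Borrow the missing leading digits from the start page.
        end = start[: len(start) - len(end)] + end
    return f"{start}-{end}"


abbreviated = expand_pages("240-3")
already_full = expand_pages("240-243")
non_numeric = expand_pages("e12-e15")
```

With page ranges written the same way in every record, a duplicate check that compares the Pages field has a better chance of matching copies from different databases.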

Below I’ve re-written the instructions provided by Leeds University Library as I have
tested them myself. I’ve not had a chance to try Wichor’s technique. Let me know in the
comments if you have given it a try.

1. Importing your references into EndNote

1.1 Import your results in the correct order


Did you know that the order that you import your references can have an impact on the
quality of the information your EndNote library contains? This is because when
EndNote removes duplicates, it automatically leaves the first copy added to your library
and removes subsequent copies. So if you import your results from a database which
doesn’t have abstracts (for example), then import results from one which does, the copy
with the abstract will automatically be deleted.
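
The effect of import order can be sketched in a few lines of Python. This is a simplified model of the keep-the-first-copy behaviour described above, not EndNote's actual code. The record fields and the database names "Database A" and "Database B" are invented for the example.

```python
def keep_first(records):
    """Keep the first record seen for each title and drop later copies."""
    seen = set()
    kept = []
    for rec in records:
        key = rec["title"].lower()
        if key not in seen:
            seen.add(key)
            kept.append(rec)
    return kept


# Two copies of the same (made-up) reference from hypothetical databases.
no_abstract = {"title": "A trial of X", "source": "Database A", "abstract": ""}
with_abstract = {"title": "A trial of X", "source": "Database B",
                 "abstract": "Background: ..."}

poor_order = keep_first([no_abstract, with_abstract])  # abstract is lost
good_order = keep_first([with_abstract, no_abstract])  # abstract survives
```

In both cases one record remains, but only the second ordering keeps the copy that has an abstract.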

It is recommended you import your references in the following order:

1. Medline
2. Embase
3. Medline in process (if included)
4. Other databases from OvidSP (PsycInfo, EconLit etc)
5. PubMed
6. Cinahl Plus
7. Other databases from Ebsco
8. Web of Science databases
9. Scopus
10. ProQuest databases
11. Cochrane databases
12. CRD databases
13. Any other databases
14. Clinical Trials websites

If you haven’t searched one or more of these databases, that’s fine. Just go to the next
on the list. There are instructions on the LAS Databases page on how to import results
from most of these databases to EndNote.

Always import all results into EndNote.

1.2 Organizing your imported references

I also recommend you organize your results into groups and add keywords so that you
can keep track of where each reference has come from. I create a group for each
database and drag and drop my results into the group as I’m importing. I also add a
keyword to each reference which details the database the reference has been retrieved
from. ITS EndNote training can tell you more about creating groups and editing fields
in EndNote.

2. Set up your EndNote library for accurate duplicates removal


Once you have all of your references uploaded and organised in groups, display the
following fields in EndNote so that you can accurately spot duplicates.

Record number
Author
Year
Title
Journal/Secondary Title
Pages
Volume

Do this by going to Edit > Preferences then clicking the ‘Display Fields’ option.

3. Find duplicates
Finding duplicates is a multi-stage process. This is because each database formats the
information slightly differently, making accurate machine spotting of duplicates very
difficult.

3.1 Step 1
Set the ‘find duplicates’ preferences to Author, Year, Title, Journal. Make sure
‘Ignore spacing and punctuation’ is checked.

Sort all references by Journal and highlight those with a journal title in the journal field
(ignore those with a blank journal field). Run ‘Find Duplicates’ and click ‘Cancel’ in the
resulting dialog box. You will see a new group has appeared called ‘Duplicates’ with
duplicates highlighted. Click ‘Delete’ on the keyboard to move the highlighted items to
the trash. These do not need to be checked.
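
To show roughly what comparing on Author, Year, Title and Journal with ‘Ignore spacing and punctuation’ amounts to, here is a small Python sketch. It is an approximation under stated assumptions: EndNote's own matching rules may differ, lower-casing is my assumption, and the field names and sample records are invented.

```python
import string

# Characters ignored when comparing: ASCII punctuation and whitespace.
_DROP = set(string.punctuation + string.whitespace)


def normalise(value):
    """Lower-case a field and strip spacing and punctuation (an assumption)."""
    return "".join(ch for ch in value.lower() if ch not in _DROP)


def match_key(record, fields):
    """Build a comparison key from the chosen fields of one record."""
    return tuple(normalise(record.get(f, "")) for f in fields)


STEP_1_FIELDS = ("author", "year", "title", "journal")

# The same (made-up) reference as two databases might format it.
a = {"author": "Smith, J.", "year": "2016",
     "title": "De-duplication of results", "journal": "J Med Libr Assoc"}
b = {"author": "Smith J", "year": "2016",
     "title": "Deduplication of results.", "journal": "J. Med. Libr. Assoc."}

same = match_key(a, STEP_1_FIELDS) == match_key(b, STEP_1_FIELDS)
```

Here the two records produce the same key even though the punctuation differs, which is the kind of match this step relies on.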

3.2 Step 2
Set the ‘find duplicates’ preferences to Author, Year, Title, Pages. Make sure ‘Ignore
spacing and punctuation’ is checked.

Sort all references by Pages and highlight those with a page number in the pages field
(ignore those with a blank pages field). Run ‘Find Duplicates’ and click ‘Cancel’ in the
resulting dialog box. You will see the duplicates group has been updated with a new
group of duplicates. Click ‘Delete’ on the keyboard to move the highlighted items to the
trash. These do not need to be checked.

3.3 Step 3
Set the ‘find duplicates’ preferences to Title, Journal, Pages. Make sure ‘Ignore
spacing and punctuation’ is checked.

Sort all references by Pages. Run ‘Find Duplicates’ and click ‘Cancel’ in the resulting
dialog box. You will see the duplicates group has been updated with a new group of
duplicates. Manually check the references with no page numbers or page numbers
beginning with 1, and select/deselect duplicates by holding the Ctrl key while selecting
or deselecting. Click ‘Delete’ on the keyboard to move the highlighted items to the trash.
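
The manual-check rule in this step can be expressed as a simple test on the Pages field. The sketch below reads "beginning with 1" literally, as a pages value whose first character is 1. If the intent is only articles that start on page 1, the test would need tightening. The function name and the pages field name are invented for the example.

```python
def needs_manual_check(record):
    """Flag records whose pages field is blank or starts with '1'."""
    pages = record.get("pages", "").strip()
    return pages == "" or pages.startswith("1")


# Made-up candidate duplicates from one pass.
candidates = [
    {"number": 101, "pages": ""},
    {"number": 102, "pages": "1-10"},
    {"number": 103, "pages": "240-3"},
]
to_check = [rec["number"] for rec in candidates if needs_manual_check(rec)]
```

Records outside this flagged subset share a more distinctive page range, which is presumably why the instructions only ask for a manual look at the flagged ones.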

3.4 Step 4
Set the ‘find duplicates’ preferences to Year, Title, Pages. Make sure ‘Ignore spacing
and punctuation’ is checked.

Sort all references by Pages. Run ‘Find Duplicates’ and click ‘Cancel’ in the resulting
dialog box. You will see the duplicates group has been updated with a new group of
duplicates. Manually check the references with no page numbers or page numbers
beginning with 1, and select/deselect duplicates by holding the Ctrl key while selecting
or deselecting. Click ‘Delete’ on the keyboard to move the highlighted items to the trash.

3.5 Step 5
Set the ‘find duplicates’ preferences to Title, Pages. Make sure ‘Ignore spacing and
punctuation’ is checked.

Sort all references by Pages. Run ‘Find Duplicates’ and click ‘Cancel’ in the resulting
dialog box. You will see the duplicates group has been updated with a new group of
duplicates. Manually check the references with no page numbers or page numbers
beginning with 1, and select/deselect duplicates by holding the Ctrl key while selecting
or deselecting. Click ‘Delete’ on the keyboard to move the highlighted items to the trash.

3.6 Step 6
Set the ‘find duplicates’ preferences to Author, Year, Journal, Pages. Make sure
‘Ignore spacing and punctuation’ is checked.

Sort all references by Pages. Run ‘Find Duplicates’ and click ‘Cancel’ in the resulting
dialog box. You will see the duplicates group has been updated with a new group of
duplicates. Manually check the references with no page numbers or page numbers
beginning with 1, and select/deselect duplicates by holding the Ctrl key while selecting
or deselecting. Click ‘Delete’ on the keyboard to move the highlighted items to the trash.

3.7 Step 7
Set the ‘find duplicates’ preferences to Author, Year, Title. Make sure ‘Ignore spacing
and punctuation’ is checked.

Sort all references by Title. Run ‘Find Duplicates’ and click ‘Cancel’ in the resulting
dialog box. You will see the duplicates group has been updated with a new group of
duplicates. Manually check the references with no title, and select/deselect duplicates
by holding the Ctrl key while selecting or deselecting. Click ‘Delete’ on the keyboard to
move the highlighted items to the trash.

3.8 Step 8
Set the ‘find duplicates’ preferences to Author, Year, Journal. Make sure ‘Ignore
spacing and punctuation’ is checked.

Sort all references by Journal. Run ‘Find Duplicates’ and click ‘Cancel’ in the resulting
dialog box. You will see the duplicates group has been updated with a new group of
duplicates. Manually check all references by looking at the page numbers field, and
select/deselect duplicates by holding the Ctrl key while selecting or deselecting. Click
‘Delete’ on the keyboard to move the highlighted items to the trash.

3.9 Step 9
Set the ‘find duplicates’ preferences to Author, Year. Make sure ‘Ignore spacing and
punctuation’ is checked.

Sort all references by Journal. Run ‘Find Duplicates’ and click ‘Cancel’ in the resulting
dialog box. You will see the duplicates group has been updated with a new group of
duplicates. Manually check all references by looking at the page numbers field, and
select/deselect duplicates by holding the Ctrl key while selecting or deselecting. Click
‘Delete’ on the keyboard to move the highlighted items to the trash.

3.10 Step 10
Set the ‘find duplicates’ preferences to Year, Title. Make sure ‘Ignore spacing and
punctuation’ is checked.

Sort all references by Title. Run ‘Find Duplicates’ and click ‘Cancel’ in the resulting
dialog box. You will see the duplicates group has been updated with a new group of
duplicates. Manually check all references by looking at the page numbers field, and
select/deselect duplicates by holding the Ctrl key while selecting or deselecting. Click
‘Delete’ on the keyboard to move the highlighted items to the trash.

3.11 Step 11
Set the ‘find duplicates’ preferences to Title. Make sure ‘Ignore spacing and
punctuation’ is checked.

Sort all references by Title. Run ‘Find Duplicates’ and click ‘Cancel’ in the resulting
dialog box. You will see the duplicates group has been updated with a new group of
duplicates. Manually check all references by looking at the page numbers field, and
select/deselect duplicates by holding the Ctrl key while selecting or deselecting. Click
‘Delete’ on the keyboard to move the highlighted items to the trash.
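
Taken together, steps 1 to 11 form a cascade of passes that compare fewer and fewer fields. The Python sketch below models that cascade outside EndNote. It is an illustration under assumptions, not EndNote's algorithm: the record layout, the normalise helper and the idea of a review set are all invented here. The first two passes drop later copies automatically, and the remaining passes only queue later copies for a manual check.

```python
import string

_DROP = set(string.punctuation + string.whitespace)


def normalise(value):
    """Lower-case and strip spacing and punctuation (an assumption)."""
    return "".join(ch for ch in str(value).lower() if ch not in _DROP)


# (fields compared, field that must be non-blank or None, manual check?)
PASSES = [
    (("author", "year", "title", "journal"), "journal", False),  # step 1
    (("author", "year", "title", "pages"), "pages", False),      # step 2
    (("title", "journal", "pages"), None, True),                 # step 3
    (("year", "title", "pages"), None, True),                    # step 4
    (("title", "pages"), None, True),                            # step 5
    (("author", "year", "journal", "pages"), None, True),        # step 6
    (("author", "year", "title"), None, True),                   # step 7
    (("author", "year", "journal"), None, True),                 # step 8
    (("author", "year"), None, True),                            # step 9
    (("year", "title"), None, True),                             # step 10
    (("title",), None, True),                                    # step 11
]


def run_passes(records):
    """Return (kept records, record numbers queued for manual checking)."""
    kept = list(records)
    review = set()
    for fields, required, manual in PASSES:
        seen = set()
        survivors = []
        for rec in kept:
            if required and not rec.get(required, "").strip():
                survivors.append(rec)  # blank field: left out of this pass
                continue
            key = tuple(normalise(rec.get(f, "")) for f in fields)
            if key not in seen:
                seen.add(key)
                survivors.append(rec)  # first copy is always kept
            elif manual:
                review.add(rec["number"])  # a person decides later
                survivors.append(rec)
            # otherwise: automatic pass, the later copy is dropped
        kept = survivors
    return kept, review
```

The loosest passes come last because they match the most records, so they produce the most false positives for a person to weed out.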

3.12 Step 12
Now, you can catch the final few duplicates by manually picking them out. Sort your
entire EndNote library by title and make the title column very wide so that you can see
lots of the title words. Carefully look at your titles and remove the duplicate with the
highest reference number. Be aware that sometimes translated titles are displayed in
[brackets].

3.13 Step 13
Repeat step 12 but this time sort on page numbers and remove any duplicates.

Now you should have removed your duplicates. Numbers of references remaining in the
groups can be used to complete the PRISMA diagram. Your results can now be used in
the screening process.
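
If you have added a database keyword to every reference, the numbers for the PRISMA diagram can also be tallied outside EndNote. The sketch below assumes each exported record carries a database field. That field name, the function name and the sample records are all hypothetical.

```python
from collections import Counter


def prisma_counts(imported, remaining):
    """Summarise record counts before and after de-duplication."""
    before = Counter(rec["database"] for rec in imported)
    after = Counter(rec["database"] for rec in remaining)
    total_before = sum(before.values())
    total_after = sum(after.values())
    return {
        "identified": total_before,
        "duplicates_removed": total_before - total_after,
        "remaining": total_after,
        "identified_by_database": dict(before),
        "remaining_by_database": dict(after),
    }


# Made-up records: three imported, one removed as a duplicate.
imported = [
    {"number": 1, "database": "Medline"},
    {"number": 2, "database": "Embase"},
    {"number": 3, "database": "Embase"},
]
remaining = [imported[0], imported[1]]
counts = prisma_counts(imported, remaining)
```

The per-database totals before de-duplication give the "records identified" figures, and the difference between the two totals gives the number of duplicates removed.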
