
Mapping science

using Bibexcel and Pajek

By Olle Persson
Relations

• Units of analysis
- document level
- aggregated level: authors, universities, countries, journals …
• Citation-based relations
- direct citations
- shared references
- co-citations
• Co-occurrences
- co-authorships
- co-word
Citation-based relations between documents

[Figure: citation diagram linking documents A, B, C and D]

A cites C = direct citation
A and C both cite B = bibliographic coupling
A and C are both cited by D = co-citation
Similarity measures

• Frequencies (raw counts)


- n of direct citations
- n of co-occurrences
- n of shared references
• Normalized measures
- Salton’s index
- Jaccard’s index
- Pearson’s correlation
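As a small illustration of how raw counts become normalized measures, here is a sketch of Salton's and Jaccard's indices in Python. The function names and the example counts are made up for illustration; this is not how Bibexcel itself computes them.

```python
import math

def salton(c_ij, c_i, c_j):
    """Salton's index: co-occurrences divided by the geometric mean of the two totals."""
    return c_ij / math.sqrt(c_i * c_j)

def jaccard(c_ij, c_i, c_j):
    """Jaccard's index: co-occurrences divided by the number of docs citing either unit."""
    return c_ij / (c_i + c_j - c_ij)

# Hypothetical counts: unit i occurs 40 times, unit j 10 times, together 8 times.
print(salton(8, 40, 10))   # 0.4
print(jaccard(8, 40, 10))  # 8 / 42
```

Both indices run from 0 (never together) to 1 (always together), so pairs of frequently and rarely cited units become comparable.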
Mapping science

1. Preparing data
2. Calculating measures
3. Making maps

Some experience with Pajek is helpful.


You will learn the basics of Bibexcel in this tutorial!
You will need this material

1. A set of data
http://www8.umu.se/inforsk/esss/cocit569.tx2

2. Bibexcel software
http://www8.umu.se/inforsk/Bibexcel/bibexcel.exe
3. Pajek
http://vlado.fmf.uni-lj.si/pub/networks/pajek/
4. Reading material
1st chapter in:
http://www8.umu.se/inforsk/Bibexcel/ollepersson60.pdf
Preparing data
Topic=(co-citation* OR cocitation*)
Databases=SCI-EXPANDED, SSCI,
A&HCI Timespan=All Years.
Update 2011-03-04

1. Convert to Dialog format


1. We have already searched Web of Science and downloaded 569 records
on co-citation analysis.
2. We have already replaced line feeds with carriage returns in the
downloaded file using Bibexcel:
Edit doc-file/Replace line feed with carriage return
3. The file to be used is cocit569.tx2
4. Put Bibexcel.exe in c:\Bibexcel and cocit569.tx2 in c:\Bibexcel\Data
5. Start bibexcel.exe. Next we have to convert the file to the Dialog
format that Bibexcel is designed for.
You can open Bibexcel and follow all the steps in this presentation!

Select the cocit569.tx2 file and run Misc/Convert to Dialog format/Convert from Web of Science
Select cocit569.doc and press View file

- Two-letter field tag
- ; = separates units
- | = end of field
- || = end of record
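To make the delimiters concrete, here is a minimal Python sketch that reads records written with these conventions. The exact layout (one field per line, a hyphen after the tag, "||" alone on a line) is an assumption for illustration and the sample record is made up; the real doc-file may differ in details.

```python
def parse_dialog(text):
    """Split Dialog-style text into a list of {tag: [units]} records.

    Assumed layout (hypothetical): 'TG- unit; unit|' per line, '||' ends a record.
    """
    records, current = [], {}
    for line in text.splitlines():
        line = line.strip()
        if line == "||":                      # end of record
            if current:
                records.append(current)
            current = {}
        elif line:
            # drop the end-of-field bar, then split tag from content
            tag, _, value = line.rstrip("|").partition("-")
            current[tag.strip()] = [u.strip() for u in value.split(";") if u.strip()]
    if current:
        records.append(current)
    return records

sample = "AU- Author A; Author B|\nCD- Ref one, 1973; Ref two, 1963|\n||\n"
print(parse_dialog(sample)[0]["CD"])
```

Because the units inside a field are separated by semicolons, each cited document in the CD-field comes out as one list item.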
2. Extracting data from CD- field (cited documents)

Put the tag (CD) in the Old tag box. Units are separated by semicolons, so choose “Any ; separated field”, then press Prep to start!


cocit569.out has the cited documents

This is the reference list of doc nr 1


3. Refining the out-file
To improve data quality, the Edit out-files menu has several options. For example, you
may wish to reduce variation by keeping only the first initial in author names. Select
cocit569.out and run Edit out-files/Keep only author’s first initial
Look at cocit569.1st and you can see that EOM SB is changed to EOM S
Let’s improve it a little more: select cocit569.1st and run
Edit out-files/Convert Upper lower Case/Good for Cited reference strings
Look at cocit569.low. I think this looks much nicer than the out-file!
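The idea behind keeping only the first initial can be sketched in a few lines of Python. This is only an illustration of the principle, assuming the author comes first in the cited reference string and is followed by a comma; the function name and the sample string are hypothetical, and Bibexcel's own rule may handle more cases.

```python
def keep_first_initial(cited_ref):
    """Reduce 'EOM SB, ...' to 'EOM S, ...' so spelling variants merge."""
    author, sep, rest = cited_ref.partition(",")
    parts = author.split()
    if len(parts) >= 2:
        parts[-1] = parts[-1][0]      # keep only the first letter of the initials
        author = " ".join(parts)
    return author + sep + rest

# Hypothetical cited reference string
print(keep_first_initial("EOM SB, 1996, V16, P315"))  # EOM S, 1996, V16, P315
```

The price is that different authors sharing a surname and first initial are merged too, so check the top of the frequency list afterwards.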
Calculating data
1. Looking at frequencies
Select cocit569.low.
Tick Sort descending, choose Whole string and press Start!
Look at cocit569.cit which has the cited references in decreasing frequency!
For anyone familiar with co-citation research, the top 3 papers shouldn’t come as a
surprise.
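What the frequency step does can be sketched in Python as follows. The sketch assumes that each line of an out-file holds a document number, a tab and one unit; that layout and the sample lines are assumptions for illustration.

```python
from collections import Counter

def cited_frequencies(out_lines):
    """Count how often each whole string occurs, most frequent first."""
    counts = Counter()
    for line in out_lines:
        _docnr, _, unit = line.rstrip("\n").partition("\t")
        if unit:
            counts[unit] += 1
    return counts.most_common()

# Hypothetical out-file lines: docnr <TAB> cited reference
lines = ["1\tRef one, 1973", "1\tRef two, 1981", "2\tRef one, 1973"]
print(cited_frequencies(lines))
```

"Whole string" means the complete cited reference is the unit being counted, not a single word or field within it.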
2. Making co-citations
Select the cocit569.cit file and press View file. In The List, mark the cited references down to frequency=30,
then press Copy, then Clear and then Paste. These are the references for which you want co-citations.
Select the cocit569.low file and run Analyze/Co-occurrence/Make pairs via listbox. Answer No
to the next question and OK to the question after that!
The cocit569.coc file has the co-citation frequencies. We will use that file for mapping!
Select cocit569.coc and run Mapping/Create net-file for Pajek … be sure to answer No to the question about
directed arcs, since we do not have any directions here.
The cocit569.net file can be opened in Pajek, NetDraw, MapEquation etc. for drawing maps.
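For readers who want to see the logic behind these two steps, here is a rough Python sketch: count how often two selected references appear in the same reference list, then write the pairs as an undirected Pajek network. The function names and sample data are hypothetical, and Bibexcel's own files may be laid out differently.

```python
from collections import Counter, defaultdict
from itertools import combinations

def cocitation_counts(doc_ref_pairs, selected):
    """Count documents whose reference lists contain both members of a pair."""
    refs_by_doc = defaultdict(set)
    for docnr, ref in doc_ref_pairs:
        if ref in selected:                    # only the references put in The List
            refs_by_doc[docnr].add(ref)
    pairs = Counter()
    for refs in refs_by_doc.values():
        for a, b in combinations(sorted(refs), 2):
            pairs[(a, b)] += 1
    return pairs

def to_pajek_net(pairs):
    """Write pair counts as an undirected Pajek net-file (as text)."""
    labels = sorted({name for pair in pairs for name in pair})
    index = {name: i for i, name in enumerate(labels, start=1)}
    lines = [f"*Vertices {len(labels)}"]
    lines += [f'{i} "{name}"' for name, i in index.items()]
    lines.append("*Edges")                     # undirected lines, no arcs
    lines += [f"{index[a]} {index[b]} {n}" for (a, b), n in sorted(pairs.items())]
    return "\n".join(lines) + "\n"

data = [(1, "A"), (1, "B"), (1, "C"), (2, "A"), (2, "B"), (3, "B")]
print(to_pajek_net(cocitation_counts(data, {"A", "B", "C"})))
```

The `*Edges` section is what answering No to directed arcs corresponds to: a co-citation has no direction.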
Mapping with Pajek
Open the cocit569.net file in Pajek, and then run Draw/Draw
This is the first layout, with randomly ordered nodes.
In the upper left, choose
Layout/Energy/Kamada-Kawai/Separate components or just press Ctrl-K
The Kamada-Kawai layout is better, but there are perhaps still too many lines in the graph,
since almost every node is connected to all the others.
To reduce complexity, minimize the Draw window and then run
Net/Transform/Remove/Lines with Value/lower than, put 10 in the
box and answer Yes to Make new network.
After that run Draw/Draw again!
This map has more structure. We find older papers to the left and newer ones to the right.
You can press Ctrl-K several times to see what happens
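The effect of removing lines with values lower than 10 is easy to mimic outside Pajek. A minimal sketch, assuming the pair counts are held in a Python dictionary (the function name and the sample values are made up):

```python
def remove_weak_lines(pairs, min_value=10):
    """Keep only lines whose value is at least min_value."""
    return {pair: n for pair, n in pairs.items() if n >= min_value}

# Hypothetical co-citation counts
pairs = {("A", "B"): 25, ("A", "C"): 9, ("B", "C"): 10}
print(remove_weak_lines(pairs))   # {('A', 'B'): 25, ('B', 'C'): 10}
```

Only the lines are removed, so a node that loses all its lines stays in the network as an isolate.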
Making vectors
Making circles on nodes based on citation frequencies: go to Bibexcel, select cocit569.cit
and then run Mapping/Create vec-file. Below you can see that cocit569.vec is created
Go back to Pajek. Open the Vector file cocit569.vec
and then run Draw/Draw-Vector
Now you can see that circle sizes correspond to n of citations
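A Pajek vec-file is just one value per vertex, in the same order as the vertices of the net-file. A small Python sketch of writing one from citation frequencies; the function name and the data are hypothetical:

```python
def to_pajek_vec(labels, frequency):
    """One value per vertex, in the same order as the vertices of the net-file."""
    lines = [f"*Vertices {len(labels)}"]
    lines += [str(frequency.get(label, 0)) for label in labels]
    return "\n".join(lines) + "\n"

# Hypothetical vertex order and citation counts
labels = ["A", "B", "C"]
print(to_pajek_vec(labels, {"A": 40, "B": 35, "C": 30}))
```

The order matters: if the vector does not follow the vertex order of the network, the circle sizes end up on the wrong nodes.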
Making partitions
If you wish you can create a clu-file using Bibexcel that indicates the publication year, or decade
of the cited documents.
1. Select cocit569.cit and run Edit out-file/Extract publication year from references
2. and you will get a file named cocit569.dpy.
3. Select cocit569.dpy and run Mapping/Create clu-file
4. and you will get a file named cocit569.clu
5. Go to Pajek and open cocit569.clu as partition
6. Run Draw/Draw-Partition-Vector and then in the draw window Layers/In y direction
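The partition idea can also be sketched in Python: pull a four-digit year out of each cited reference and write one class number per vertex. The regular expression, the function names and the sample labels are assumptions for illustration, not Bibexcel's actual rule.

```python
import re

# A stand-alone four-digit year between 1500 and 2099 (an assumed pattern)
YEAR = re.compile(r"\b(1[5-9]\d{2}|20\d{2})\b")

def publication_year(ref):
    """Return the first year found in the reference string, or 0 if none."""
    match = YEAR.search(ref)
    return int(match.group(1)) if match else 0

def to_pajek_clu(labels):
    """One class number (here the year) per vertex, in net-file order."""
    lines = [f"*Vertices {len(labels)}"]
    lines += [str(publication_year(label)) for label in labels]
    return "\n".join(lines) + "\n"

# Hypothetical vertex labels
print(to_pajek_clu(["Author A, 1973, V24, P265", "Author B, 1981, V32, P163"]))
```

With years as class numbers, Layers/In y direction places each publication year on its own layer.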
Makes sense?
Using Options/Lines/Different Widths and GreyScale and Options/Size/Of lines = 0.25
This could be a chronological reading list for reviewers and students
Bibexcel makes so many files….
1. cocit569.tx2: text-file where LF was replaced by CR
2. cocit569.doc: converted to Dialog-format
3. cocit569.out : out-file based on CD-field
4. cocit569.1st : keep only author’s first initial
5. cocit569.low: convert to upper and lower case
6. cocit569.cit: frequencies
7. cocit569.coc: co-occurrences
8. cocit569.net: net-file to be opened in Pajek
9. cocit569.vec: vec-file to be opened as Vectors in Pajek
10. cocit569.clu: clu-file to be opened as Partitions in Pajek
11. cocit569.vel: vertices for net-file for use by Bibexcel

…. but better to have them than not!


All author co-citation analysis using Scopus records
“It’s always better not to limit to the 1st cited author as in WoS”

1. Get scopuscocit.ris from http://www8.umu.se/inforsk/esss/scopuscocit.ris


2. Select scopuscocit.ris and run Edit doc-file/Replace line feed with carriage return
3. Select scopuscocit.tx2 and run Misc/Convert to Dialog format/Convert from Scopus RIS
format
4. Select scopuscocit.doc, put CD in Old tag, choose “Any ; separated field” and press Prep
5. Select scopuscocit.out and run Edit out-file/Scopus tools/Extract all authors from Scopus
references
6. Select scopuscocit.sco and run Edit out-file/Decompress outfile
7. Select scopuscocit.nnu, choose Whole string, mark Remove duplicates and Make new
out-file, and then press Start
8. Select scopuscocit.oux, mark Sort descending and press Start
9. Select scopuscocit.cit and press View file and select units down to frequencies=30, and be
sure only these are in The List
10. Select scopuscocit.oux and run Analyze/Co-occurrences/Make pairs via list box
11. Select the scopuscocit.coc file and then run Mapping/Create net-file for Pajek…
12. Select scopuscocit.cit and run Mapping/Create vec-file
13. Go to Pajek and open scopuscocit.net as Network and scopuscocit.vec as Vectors
14. Run Draw/Draw-Vector…
Draw-vector
To reduce complexity, minimize the Draw window and
then run Net/Transform/Remove/Lines with
Value/lower than, put 10 in the box and answer
Yes to Make new network.
After that run Draw/Draw-Vector and then Ctrl-K

Griffith BC would probably not show up in a 1st author analysis

Webometrics

Go back and fix this variant!
For vector graphic quality: in the Draw window run
Export/2D/SVG/General and save as allauthormap.htm
Get Inkscape for free from http://inkscape.org/download/,
open allauthormap.htm, edit it and export to PNG format
Analyzing direct citations on Web of Science records
1. Select cocit569.low and run Analyze/Citations among docs/Make citation
links. This will make cocit569.lin that has citing docnr in first column and cited
docnr in second column.
2. Of course you need to label the doc numbers.
Select the cocit569.ddc and double click in the box at “Type new file name here”
and the path to cocit569.ddc should appear.
3. Select cocit569.lin and run Add data classify/Add labels to docnr-docnr
pairs. Answer No to questions about swapping, self-related pairs, overlapping
sets, and about writing doc numbers in addition to labels
4. Select cocit569.add and then run Mapping/Create net-file for Pajek and
answer Yes for directed graphs!
5. Open cocit569.net in Pajek and Draw/Draw
6. You will need to reduce complexity: Run
Net/Transform/Reduction/Degree/Input and set value=15. Then Draw!
7. If you would like to have different circle sizes: minimize the Draw window and then
run Net/Vector/Summing up values of lines/Input. A vector is created that has
the number of inlinks to each node. Then Draw/Draw-vector…
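The citing-cited pairs and the inlink vector can be pictured with a short Python sketch. It assumes the links are available as (citing docnr, cited docnr) pairs, as described for the lin-file; the function names and the sample links are made up.

```python
from collections import Counter

def inlink_counts(links):
    """Number of times each document is cited by other documents in the set."""
    return Counter(cited for _citing, cited in links)

def arcs_section(links):
    """Format the links as the directed part of a Pajek net-file."""
    return ["*Arcs"] + [f"{citing} {cited} 1" for citing, cited in links]

# Hypothetical links: (citing docnr, cited docnr)
links = [(3, 1), (2, 1), (3, 2)]
print(inlink_counts(links))
print("\n".join(arcs_section(links)))
```

Here the network is directed, so the lines go under `*Arcs` instead of `*Edges`, which is why the answer to the directed question is Yes this time.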
Analyzing using Weighted Direct Citations (WDC)
We can add the number of shared outlinks and inlinks to each direct citation, to
give each direct citation a different strength
1. Select cocit569.lin and run Analyze/Citations among docs/ Weighted Direct
Citations (WDC). The cocit569.wdc has the WDC values for each docnr-docnr
pair
2. Again you need to label the doc numbers.
Select the cocit569.ddc and double click in the box at “Type new file name here”
and the path to cocit569.ddc should appear.
3. Select cocit569.wdc and run Add data classify/Add labels to freq-docnr-
docnr/making freq-label-label. Answer No to questions about swapping, self-
related pairs, and overlapping sets.
4. Select the cocit569.cdd file and run Edit out-file/Sort numeric/Descending by
first column and you will see which are the strongest links by the WDC measure
5. Select cocit569.cdd and run Mapping/Create net-file for Pajek, and answer
Yes for directed arcs!
6. In Pajek use Net/Transform/Remove/Lines with Values/Lower than=10!
7. Then Draw/Draw and you will see one big network component, several
smaller ones and quite a few isolates. You can zoom in on the bigger one by
pressing the right mouse button and drawing a box around it.
8. If you go back to Pajek main window and run Net/Components/Weak and type
size=20 you will get 1 component and then with Operations/Extract from
network/Partition=1 you will get a new network with the big component. Then
Draw that network!
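One simple reading of the weighting idea can be sketched in Python: every direct citation starts at 1 and gains one point for each shared outlink (a document both cite) and each shared inlink (a document citing both). This is a simplified interpretation for illustration only; Bibexcel's actual WDC calculation may differ, and the function name and sample links are hypothetical.

```python
from collections import defaultdict

def weighted_direct_citations(links):
    """Assumed weighting: 1 + shared outlinks + shared inlinks per direct citation."""
    out_links = defaultdict(set)   # docnr -> documents it cites
    in_links = defaultdict(set)    # docnr -> documents citing it
    for citing, cited in links:
        out_links[citing].add(cited)
        in_links[cited].add(citing)
    wdc = {}
    for citing, cited in links:
        shared_out = len(out_links[citing] & out_links[cited])
        shared_in = len(in_links[citing] & in_links[cited])
        wdc[(citing, cited)] = 1 + shared_out + shared_in
    return wdc

# Hypothetical links: (citing, cited)
links = [("A", "B"), ("A", "C"), ("B", "C"), ("D", "A"), ("D", "B")]
print(weighted_direct_citations(links))
```

In this made-up example the link from A to B gets the value 3, because A and B both cite C and are both cited by D, while the link from A to C stays at 1.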
…further improvement by saving major component and
adding new partitions and vectors

1. Be sure to mark the main component (with 63 nodes)


2. Then File/Network/Save and then overwrite cocit569.net
3. In Bibexcel select the cocit569.net and run Mapping/Create vel-file
from net-file
4. Select the cocit569.ddc file and run Edit out-file/Extract
publication year from references
5. Select cocit569.dpy and run Mapping/Create clu-file
6. Open cocit569.clu as Partition in Pajek and then Draw/Draw-partition
and then Layers/In y direction
7. If you would like to have different circle sizes: minimize the Draw window
and then run Net/Vector/Summing up values of lines/Input. A vector
is created that has the sum of WDC values of inlinks to each node.
Then Draw/Draw-Partition-Vector…
…reduce direct citations by citation year lag
1. Select cocit569.cdd and run Analyze/Calculate year lags in pairs and answer Yes to add
year lag values, which will come in column 1. Column 2 has a normalization (col.3 divided by
col.3,) and col. 3 has the WDC value, col. 4 citing doc and col.5 cited doc.
2. Select cocit569.lag and to get year lags 0-2 years put 2 in Max number Box and then run Edit
out-files/Delete values high frequencies
3. Select cocit569.max, put 3/4/5 in The Box and run Edit out-file/Select columns
4. Now cocit569.col has WDC values only for links no older than 2 years!
5. Select cocit569.col and run Mapping/Create net-file for Pajek
6. Go to Pajek and open the net-file and the vec-file! Remove lines with values less than 5, then
Net/Components/Weak (min 20), then extract and save the major component to file
cocit569.net
7. In Bibexcel, select cocit569.cdd, put 1/3 in The Box and run Edit out-files/Select columns,
and then select cocit569.col and make frequencies with whole string, then cocit569.cit will
have the number of times a paper is cited.
8. In Bibexcel select cocit569.net and run Mapping/Create vel-file from net-file and then select
cocit569.cit and run Mapping/Create vec-file
9. Back to Pajek and open the vec-file, and then Draw/Draw-vector
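The year lag filter itself is simple and can be sketched in Python. The sketch assumes each link is a (value, citing label, cited label) triple and that the labels contain the publication year; the year pattern, the function names and the sample labels are all assumptions for illustration.

```python
import re

# A stand-alone four-digit year between 1500 and 2099 (an assumed pattern)
YEAR = re.compile(r"\b(1[5-9]\d{2}|20\d{2})\b")

def year_of(label):
    """First year found in a label, or None if there is none."""
    match = YEAR.search(label)
    return int(match.group(1)) if match else None

def keep_short_lags(links, max_lag=2):
    """Keep links where the cited paper is 0..max_lag years older than the citing one."""
    kept = []
    for value, citing, cited in links:
        y_citing, y_cited = year_of(citing), year_of(cited)
        if y_citing is None or y_cited is None:
            continue                       # no year found, so no lag to calculate
        lag = y_citing - y_cited
        if 0 <= lag <= max_lag:
            kept.append((lag, value, citing, cited))
    return kept

# Hypothetical labelled links: (WDC value, citing doc, cited doc)
links = [(12, "Author A, 2001", "Author B, 2000"),
         (30, "Author A, 2001", "Author C, 1973")]
print(keep_short_lags(links))
```

Only the first link survives in this example, since the second one reaches back 28 years.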
Time dimension is here!
…also, you can reduce co-citations by citation year lag

1. Select cocit569.coc and run Analyze/Calculate year lags in pairs
and answer Yes to add year lag values
2. Select cocit569.lag and to get year lags 0-5 years put 5 in Max number
Box and then run Edit out-files/Delete values high frequencies
3. Select cocit569.max, put 1/4/5 in The Box and run Edit out-file/Select
columns
4. Now cocit569.col has co-citation values only for pairs no older than 5
years!
5. Select cocit569.col and run Mapping/Create net-file for Pajek
6. Also select cocit569.cit and run Mapping/Create vec-file
7. Go to Pajek and open the net-file and the vec-file!
The same graph as the previous one, but now
ordered in year layers and edited using
Inkscape
The End
