
The current issue and full text archive of this journal is available on Emerald Insight at:

https://www.emerald.com/insight/2050-3806.htm

Rotten web citations cited in scholarly journals: use of time travel for retrieval

B. Niveditha and Mallinath Kumbar
Department of Studies in Library and Information Science, University of Mysore, Mysuru, India, and
B.T. Sampath Kumar
Department of Studies and Research in Library and Information Science, Tumkur University, Tumkur, India

Received 15 May 2021
Revised 23 August 2021, 16 October 2021
Accepted 21 October 2021

Abstract
Purpose – The present study compares the use of web citations as references in leading scholarly journals in
Library and Information Science (LIS) and Communication and Media Studies (CMS). A total of 20 journals
(10 each from LIS and CMS), published between 2008 and 2017, were selected based on their publishing
history and reputation.
Design/methodology/approach – The present study compares the use of web citations as references in
leading scholarly journals in LIS and CMS. A PHP script was used to crawl the Uniform Resource Locators
(URLs) collected from the reference list. A total of 12,251 articles were downloaded and 555,428 references were
extracted. Of the 555,428 references, 102,718 web citations were checked for their accessibility.
Findings – The research findings indicated that 76.90% of URLs from LIS journals and 84.32% of URLs from
CMS journals were accessible and the others were rotten. The majority of errors were due to the HTTP 404
error code (not found) in both disciplines. The study also tried to retrieve the rotten URLs through
Time Travel, which revived 61.76% of the rotten URLs in LIS journal articles and 65.46% in CMS journal
articles.
Originality/value – This is an in-depth and comprehensive comparative study on the availability of web
citations in LIS and CMS journal articles spanning a period of 10 years. The findings of the study will be
helpful to authors, publishers and editorial staff in ensuring that web citations remain accessible in the future.
Keywords Web citations, URLs, Library and Information Science, Communication and Media Studies,
HTTP errors, Time Travel
Paper type Research paper

Introduction
The World Wide Web (Web), which is based on hypermedia, was invented to access, organize
and share information and has now become a crucial means for conducting scientific research.
Journal articles contain references that appear at the end of the publication. The term
“reference” refers to a publication that is listed in the reference section of a citing publication
(Ding et al., 2014). The references provide an opportunity for researchers to explore and expand
their areas of research. The development of the Internet and the World Wide Web during the
past decade has had a profound impact on the references. With the vast quantity and easily
accessible documentation available on the Web, many authors often use web citations as part of
the attribution process when it comes to acknowledging supporting material in their
publications (Germaine, 2000). Web citations are the mentions of Uniform Resource Locators
(URLs) of web pages. Web citations are also termed as “web references”, “online citations”,
“URL citations” or “electronic references”. The web citations make it easier for readers to access
the resources cited in the references. But as the Web is extremely ephemeral, most of its
information becomes unavailable and is lost forever after a short period. The web resources
cited become inaccessible or rotten in the future. The rotten web citations pose a severe issue for
the academic community. The references may have less academic value if the cited content is
not available to the readers. The preservation of digital materials is crucial to modern societies
because web publications are extremely transient. To overcome the problem of rotten web
citations, preserving electronic resources has become inevitable to prevail over the accessibility
of the resources. Web archives help in preserving web resources permanently and enable
long-term access to the web citations. In this context, the present study has attempted to check the
accessibility of web citations in two disciplines during the period 2008–2017 using a PHP script.
Apart from checking the availability of web citations, the script also obtains the characteristic
features of URLs like the file format, top-level domain, path depth and character length. This
was obtained to determine how characteristic features of URLs influence their accessibility. The
study also aims at retrieving the rotten web citations cited in scholarly articles through Time
Travel. The Time Travel memento is a multi-archive recovery service, which can retrieve rotten
web citations from multiple web archives.

Aslib Journal of Information Management, Vol. 74 No. 2, 2022, pp. 225-243
© Emerald Publishing Limited, 2050-3806, DOI 10.1108/AJIM-05-2021-0139

Literature review
The Internet and the World Wide Web are now among the main communication tools for
academicians, as they have made the articles in scholarly journals available electronically
and in open access format. This has changed the research publishing landscape, with a shift
from traditional print publication to electronic publication such as e-books, e-journals,
e-theses and dissertations and e-prints of research articles. Electronic publication has
increased the scope for researchers and authors in various subject fields and
stimulated their research productivity (Saberi and Abedi, 2012). The Web has become the
first choice for academicians to search for information (Zhao and Logan, 2002). As there
are more electronic resources on the Web, authors refer to more and more web resources to
increase their research productivity. Isfandyari-Moghaddam and Saberi (2010) have stated
that the Web has also influenced the citing behaviour of researchers and this, in turn, has
influenced the growth of web citations in the form of web links or DOIs at the end of the
reference. It has become necessary for researchers to focus attention on the frequency with
which the authors use web information to document their scholarly research (Casserly and
Bird, 2003).
Citations to web resources have been studied since as early as the mid-1990s. Researchers
noticed that the proportion of web citations in scholarly literature has been growing. Harter
and Kim (1996) analysed 4,317 references cited in 279 articles by 74 e-journals and found only
1.9% of the references were electronic resources. Herring (2002) found that web citations
accounted for 16% out of the total 4,289 unique references. The findings in the study reflect
the fact that a radical change in information-seeking behaviour and information resource use
is taking place as scholars and researchers are becoming more comfortable and familiar with
the resources available through the Web.
Web citations have become common in scholarly publications in all disciplines (Sampath
Kumar and Prithviraj Raj, 2012). For instance, Dimitrova and Bugeja (2006), in their study on
six leading communication journals, found a total of 1,600 online citations. Their number
increased starting from 276 online citations in 2000, 300 in 2001, 485 in 2002 and 539 in 2003.
In other words, the number of online citations in articles from 2000 to 2003 almost doubled.
Thorp and Brown (2007) made a comprehensive analysis of web references in the Annals of
Emergency Medicine published in 2000, 2003 and 2005 and obtained a total of 586 web
references from the 15,745 references cited. Russell and Kane (2008) observed the citation
pattern in two history journals published between 2000 and 2005 and found that there were a
total of 510 web citations with an average of 3.9 per article. Mardani (2012) surveyed the
available web citations in chemistry articles and discovered that there were a total of 46,762
(24.9%) web citations extracted from 187,823 available citations.
LIS authors refer to web resources as part of their increased research productivity, and this
has increased the number of web citations in scholarly papers in LIS (Zhao and Logan, 2002).
Many studies have investigated the use of URL citations in LIS scholarly journals. Mêgnigbêto
(2006) studied the use of web resources among undergraduate students’ dissertations in LIS
from 1997 to 2004. The total number of web citations over the period 1997–2004 is 91 from 25
dissertations. This yielded a ratio of 3.64 web citations per dissertation. Riahinia et al. (2011)
considered six LIS journals during 2005–2008 for their study. Of the 37,791 citations, 4,840
(12.8%) were web citations. Sampath Kumar and Manoj Kumar (2012) conducted a study on
URL citation in two LIS open access scholarly journals between 1996 and 2009 and found that
2,890 (18.77%) were URL citations. Vinay Kumar et al. (2015) investigated citations cited in two
LIS journal articles published between 2008 and 2012. The study found that 23.81% (2,477 out
of 10,400 references) of URLs were cited in the journal articles. In the same vein, Chikate and
Patil (2009) explored web citations in conference proceedings of LIS. They stated that web
resources were becoming the preferred source of information for LIS professionals.
Though many comparative studies were made to study the citation behaviour among
various disciplines (Vaughan and Shaw, 2005; Chen et al., 2009; Yang et al., 2010; Yang et al.,
2012), a comprehensive interdisciplinary comparative study with a longer period is yet to be
undertaken. This would give an insight into the web citation behaviour of various domains
(Riahinia et al., 2011).
Although the use of web citations in scholarly writing is increasing, web resources are
constantly threatened by decay and disappearance, which poses an emerging challenge
(Isfandyari-Moghaddam and Saberi, 2011). The reasons for the non-
persistence of web citations were failure to maintain old links while restructuring websites
(Lawrence et al., 2001), broken links and restructuring the file hierarchy by some providers
(Markwell and Brooks, 2003), server problems and invalid URL hostname or paths
(Spinellis, 2003).
The domain names associated with missing web citations are also discussed in many
studies. Sampath Kumar and Manoj Kumar (2012) found that the top-level domain having the
greatest number of missing URLs was the commercial domain (.com). The finding is also in
line with those of Dimitrova and Bugeja (2007b), Goh and Ng (2007) and Saberi and Abedi
(2012) who reported that the commercial domain was among those with poorer stability and
persistence. The error message “HTTP 404” accounted for the majority of all inactive URLs,
and this was reported in the studies of Sadat-Moosavi et al. (2012) and Jalalifard et al. (2013).
However, the missing web citations could be retrieved through Internet archives. Many
studies have attempted to retrieve the inaccessible web citations using various search
engines and Wayback Machine (Dimitrova and Bugeja, 2007b; Tajeddini et al., 2011; Sampath
Kumar et al., 2015). A recent study also used Time Travel to retrieve missing URLs (Vinay
Kumar and Sushmitha, 2019). It was noted that the accessibility rate increased after the
retrieval of missing web citations.
Though the Internet archives retrieved some lapsed web citations, some web citations
could not be located. This is because the archives’ purpose is to resurrect dead web pages
rather than preserve information in a form of use for scholars (Dimitrova and Bugeja, 2007b).
Russell and Kane (2008) stated that the current archival methods ameliorate, but do not solve,
the problem of link rot. The Internet archives thus should be used in conjunction with various
web preservation strategies. Wren et al. (2006), Casserly and Bird (2003) and Dimitrova and
Bugeja (2007a) suggested that publishers, editors and authors should work together through
systematic checking of the web citations before publication, getting backup of cited
information and using the more stable file formats and domains. Saberi and Abedi (2012)
reported that the use of the Digital Object Identifier (DOI) system and Uniform Resource
Names (URNs) was the best solution to prevent the decay or disappearance of web citations.
Sampath Kumar et al. (2015) stated that URL citations used in the reference list should include
a detailed bibliographical description. Publishers should require authors to adhere to the
citation policies, styles and formats established by their journals and should archive the
online citations cited in the articles they publish. Rumsey (2002) recommended that the
authors should provide parallel print citations where possible. These suggestions would be
useful for authors, editorial staff and publishers, and they need to work together to improve
existing citation conventions, promote URL use and ensure that cited resources are accessible
to future researchers.
This present study aims to extend the above-said studies by comparing the use and
accessibility of web citations in two disciplines through the PHP script and retrieving the
rotten URLs through Time Travel.

Research questions
The study has been conducted with the following research questions:
RQ1. What percentage of web citations is used in the journal articles of LIS and CMS?
RQ2. What percentage of URLs is rotten and which is the most prominent HTTP
error code?
RQ3. What percentage of URLs is retrieved through Time Travel?

Hypotheses of the study


Based on the above research questions, the following hypotheses were formulated:
H1. The use of web citations and the year are positively correlated.
H2. The age of publication and the percentage of rotten URLs are positively correlated.
H3. The path depth and percentage of retrieved URLs are positively correlated.
H4. The character length and percentage of retrieved URLs are positively correlated.

Research methodology
Selection of journals
For the present study, data were drawn from 20 leading scholarly journals: 10 from LIS and
10 from CMS. The journals were selected based on their high impact factors as per Clarivate
Analytics’ 2018 “Journal Citation Report”. The journals selected for the current study are
listed in Table 1.

Selection of articles and references


All the research articles published during the 10-year period, that is, from 2008 to 2017, were
taken up for the study. Editorial notes, book reviews and short communication were
excluded. The references that were adjoined at the end of each article were considered for the
study. A total of 555,428 references were selected from 12,251 articles published in the 20
journals.

Extraction of URLs
The references that contained web links and DOIs were extracted, as the study deals with
their accessibility. DOIs were first resolved to URLs using the syntax https://dx.doi.org/.
For example, the DOI name 10.1010.1234/567 would be resolved through the address
https://dx.doi.org/10.1010.1234/567. Similarly, arXiv identifiers were resolved to URLs
using the syntax https://arxiv.org/. A total of 102,718 URLs were extracted
for checking their availability.
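
The resolution step described above can be sketched in a few lines. This is an illustrative Python sketch, not the authors' PHP script: the function name to_url, the DOI pattern and the "/abs/" path under https://arxiv.org/ are assumptions made for the example; only the https://dx.doi.org/ prefix comes from the text.

```python
import re

DOI_RESOLVER = "https://dx.doi.org/"        # prefix named in the text
ARXIV_RESOLVER = "https://arxiv.org/abs/"   # assumption: the text gives only https://arxiv.org/

# Assumption: a loose pattern for DOI names ("10.", a numeric prefix, "/", a suffix).
DOI_PATTERN = re.compile(r"^10\.\d{4,9}(\.\d+)*/\S+$")


def to_url(identifier):
    """Turn a web link, DOI name or arXiv identifier into a URL that can be tested."""
    ident = identifier.strip()
    lowered = ident.lower()
    if lowered.startswith(("http://", "https://")):
        return ident  # already a URL, keep unchanged
    if lowered.startswith("arxiv:"):
        return ARXIV_RESOLVER + ident[len("arxiv:"):].strip()
    if lowered.startswith("doi:"):
        ident = ident[len("doi:"):].strip()
    if DOI_PATTERN.match(ident):
        return DOI_RESOLVER + ident
    raise ValueError(f"not a URL, DOI or arXiv identifier: {identifier!r}")


print(to_url("10.1010.1234/567"))
```

Anything that is neither a link nor a recognisable identifier raises an error rather than being guessed at, so that such references can be inspected by hand.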
Table 1. Journals selected for the study

Library and Information Science
Impact factor  Journal
3.484          Journal of Informetrics (JOI)
3.444          Information Processing and Management (IPM)
2.835          Journal of the Association for Information Science and Technology (JASIST)
2.173          Scientometrics
1.626          College and Research Libraries (CRL)
1.541          Journal of the Medical Library Association (JMLA)
1.473          Portal: Libraries and the Academy (Portal)
1.461          Aslib Journal of Information Management (AJIM)
1.459          Journal of Academic Librarianship (JAL)
1.372          Library and Information Science Research (LISR)

Communication and Media Studies
Impact factor  Journal
4              Journal of Computer-Mediated Communication (JCMC)
3.729          Journal of Communication (JOC)
3.391          Communication Research (CR)
3.121          New Media and Society (NMS)
3.084          Information, Communication and Society (ICS)
2.88           Journal of Advertising (JOA)
2.738          Political Communication (PC)
2.733          Communication Theory (CT)
2.57           Media Psychology (MP)
2.452          Public Understanding of Science (PUOS)

Testing URLs and examining their lexical features


A PHP script was developed to test URLs in bulk. The script uses the cURL library, a standard
PHP extension, to check for URL availability. URLs that are not available are
documented as rotten URLs. The script also determines the error code associated with each
rotten URL. The lexical features of URLs, such as file format, top-level domain, path depth and
character length, were determined to see how they influence the decay of URLs. The
path depth of a URL, or URL depth, is the number of directory levels in the URL. In the
URL http://example.com/level-1/level-2/level-3/, each subdirectory after the domain name
“example.com” corresponds to a particular path depth: the subdirectory level-1 has a path
depth of 1, level-2 has a path depth of 2 and so on. A greater number of levels can
increase the complexity of a URL and reduce its accessibility, as there is an increased
possibility of a particular subdirectory being removed or renamed. Similarly, URLs should not
exceed a certain number of characters, to make crawling easier for search engines. URL length
is usually not considered a ranking factor, but shorter URLs are believed to be more
convenient for users to understand and much easier to share or bookmark.
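
The two tasks described above, testing availability and extracting lexical features, can be sketched as follows. This is a minimal Python illustration using only the standard library, not the authors' PHP/cURL script. The function names, the use of a HEAD request, the 10-second timeout and the rule that every non-empty path segment (including a trailing file name) counts towards path depth are all assumptions made for the example.

```python
import os
import urllib.error
import urllib.request
from urllib.parse import urlsplit


def lexical_features(url):
    """Return the four lexical features discussed in the text for one URL."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Assumption: every non-empty path segment counts as one level,
    # including a trailing file name.
    segments = [segment for segment in parts.path.split("/") if segment]
    last_segment = segments[-1] if segments else ""
    extension = os.path.splitext(last_segment)[1].lstrip(".").lower()
    return {
        "file_format": extension,
        "top_level_domain": host.rsplit(".", 1)[-1] if "." in host else "",
        "path_depth": len(segments),
        "character_length": len(url),
    }


def fetch_status(url, timeout=10.0):
    """Return the HTTP status code for url, or None if no response arrives."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # 4xx and 5xx responses arrive as exceptions
    except (OSError, ValueError):
        return None  # DNS failure, refused connection, timeout or malformed URL


print(lexical_features("http://example.com/level-1/level-2/level-3/"))
```

Under these assumptions the example URL from the text has a path depth of 3. A real crawl would also need rate limiting and retries, which are left out here.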

Retrieval of rotten URLs


The study used Time Travel (http://timetravel.mementoweb.org/) to find whether the URLs
were archived or not. Time Travel retrieves rotten URLs that are archived in the Internet
Archive, the Library of Congress Web Archive, Archive-It, Perma.cc and so forth. The URLs that
were not archived were considered missing URLs.
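
As a rough illustration of how such a lookup might be automated, the sketch below builds a request address for one rotten URL. It is not taken from the study. The "/api/json/<timestamp>/<url>" path pattern is an assumption about the Time Travel service and should be checked against its documentation; the function name is hypothetical.

```python
from datetime import datetime

# Assumption: base address and path pattern of a JSON lookup on the
# Time Travel service named in the text; verify before relying on it.
TIMETRAVEL_JSON_API = "http://timetravel.mementoweb.org/api/json"


def build_lookup_url(rotten_url, when):
    """Build a lookup address asking for archived copies close to `when`."""
    timestamp = when.strftime("%Y%m%d%H%M%S")
    return f"{TIMETRAVEL_JSON_API}/{timestamp}/{rotten_url}"


print(build_lookup_url("http://example.com/page", datetime(2020, 4, 1)))
```

A URL for which such a lookup returns no archived copy would fall into the "missing" category described above.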

Results
Year-wise distribution of articles, references and web citations
A total of 7,986 and 4,265 articles were published in Library and Information Science (LIS) and
Communication and Media Studies (CMS) journals, respectively, during the period 2008–2017.
The data presented in Table 2 show that the highest number of articles in LIS journals was
published in 2016 (959) and the lowest in 2008 (637). Although the number of articles rose
steadily in the preceding years, there was a slight decline in 2017, from 959 to 951. In a trend
similar to the LIS journals, the CMS journals had the highest number of articles in 2016
(539) and the lowest in 2008 (337). There was a dip in the number of articles in CMS journals
during the years 2010 and 2017. Overall, it is evident that the number of articles in both
disciplines has grown from a low in 2008 to a high in 2016. The articles in LIS journals
contained a total of 324,636 references, which is higher than the 230,792 references in CMS
journals.
Table 2 also summarizes the year-wise distribution of web citations in LIS and CMS
journal articles. This answers research question 1, which was to find the percentage of web
citations used in the journal articles of LIS and CMS. It can be seen that the number of web
citations in LIS journal articles (51,839) is more than in CMS journal articles (50,879). The rate
of use of web citations corresponds to the use of references. Pearson’s correlation analysis
was performed, and it was found that the number of references and web citations was
positively correlated in LIS journals (r(8) = 0.965, p < 0.001) and CMS journals (r(8) = 0.960,
p < 0.001). As the p-value is less than 0.05, there is statistical evidence that the number of
web citations has grown consistently with the total number of references. The percentage of
web citations by year has increased from a low of 11.21% (LIS) and 7.22% (CMS) in 2008
to a high of 22.80% and 37.22%, respectively, in 2017. The percentage of web citations from the
total number of references is higher in CMS journal articles (22.05%) than in LIS journal articles
(15.97%). Pearson’s correlation analysis shows that there is a positive correlation
between the year and the percentage of web citations in LIS journals (r(8) = 0.919, p < 0.001)
and in CMS journals (r(8) = 0.968, p < 0.001). The p-value of less than 0.05 indicates that the
percentage of web citations in articles has increased from 2008 to 2017 in both disciplines.
The results are in line with studies that indicated the use of web citations had increased
significantly from 1996 to 2009 in LIS journal articles (Sampath Kumar and Manoj Kumar,
2012) and from 2001 to 2006 in CMS journal articles (Zhang, 2007).
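
The correlation between year and percentage of web citations can be recomputed from the yearly percentages in Table 2. The sketch below is a plain-Python illustration and not the authors' analysis; the function name is an assumption. Because the input percentages are rounded to two decimals, small differences from the published coefficients would be possible, but here both values round to the reported 0.919 and 0.968. With ten yearly observations the degrees of freedom are 10 - 2 = 8, hence the notation r(8).

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


years = list(range(2008, 2018))
# Percentage of web citations per year, copied from Table 2.
lis_pct = [11.21, 12.58, 11.58, 12.85, 13.75, 14.32, 15.03, 16.60, 20.55, 22.80]
cms_pct = [7.22, 9.38, 9.25, 8.70, 17.45, 25.34, 28.50, 29.66, 32.94, 37.22]

print(round(pearson_r(years, lis_pct), 3))
print(round(pearson_r(years, cms_pct), 3))
```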

Table 2. Year-wise distribution of articles, references and web citations

       Library and Information Science               Communication and Media Studies
Year   Articles  References  Web citations       %   Articles  References  Web citations       %
2008        637      22,064          2,474   11.21        337      18,226          1,316    7.22
2009        667      23,566          2,965   12.58        380      20,615          1,933    9.38
2010        705      25,556          2,960   11.58        364      19,707          1,823    9.25
2011        708      26,859          3,452   12.85        385      20,823          1,811    8.70
2012        752      28,495          3,919   13.75        416      21,399          3,734   17.45
2013        819      32,661          4,676   14.32        424      22,384          5,673   25.34
2014        864      35,440          5,327   15.03        464      25,622          7,303   28.50
2015        924      41,811          6,942   16.60        484      26,353          7,815   29.66
2016        959      43,662          8,974   20.55        539      29,089          9,581   32.94
2017        951      44,522         10,150   22.80        472      26,574          9,890   37.22
Total     7,986     324,636         51,839   15.97      4,265     230,792         50,879   22.05

Journal-wise distribution of articles, references and web citations

Table 3 shows that the LIS journal Scientometrics has the highest number of articles (2,575) as
well as references (96,137). This is followed by the Journal of the Association for Information
Science and Technology (JASIST) with 1,744 articles and 87,468 references and by New Media
and Society, a CMS journal, with 794 articles and 40,663 references. The lowest numbers of
articles were found in Media Psychology (212) and Communication Theory (213), both of which
are CMS journals. The lowest numbers of references were found in the Journal of the Medical
Library Association (JMLA) (6,293) and Portal (10,412). The references in Scientometrics and
JASIST accounted for 56% of the total cited references in LIS journals.
The total number of web citations was highest in Scientometrics (14,261), followed by
Information, Communication and Society (13,233) and JASIST (10,301). Low numbers of web
citations were noted in the Journal of Advertising (940) and JMLA (2,255). The highest
percentage of web citations was found in a CMS journal, Information, Communication and
Society, with 38.05% of references citing a web source, followed by the Journal of the Medical
Library Association (35.83%). A low percentage of web citations was noticed in the Journal of
Advertising (4.84%).

Table 3. Journal-wise distribution of articles, references and web citations

Library and Information Science
Journal         Articles  References  Web citations       %
JMLA                 241       6,293          2,255   35.83
JAL                  698      22,658          6,999   30.89
Portal               288      10,412          2,906   27.91
LISR                 319      16,302          3,369   20.67
CRL                  376      13,376          2,722   20.35
AJIM                 378      15,536          2,682   17.26
Scientometrics     2,575      96,137         14,261   14.83
JOI                  647      24,901          3,546   14.24
JASIST             1,744      87,468         10,301   11.78
IPM                  720      31,553          2,798    8.87
Total              7,986     324,636         51,839   15.97

Communication and Media Studies
Journal         Articles  References  Web citations       %
ICS                  704      34,782         13,233   38.05
JOC                  469      25,811          8,631   33.44
MP                   212      12,495          4,147   33.19
JCMC                 337      17,946          4,548   25.34
CT                   213      15,348          3,274   21.33
CR                   401      24,503          4,781   19.51
PC                   254      14,715          2,396   16.28
NMS                  794      40,663          6,468   15.91
PUOS                 531      25,102          2,461    9.80
JOA                  350      19,427            940    4.84
Total              4,265     230,792         50,879   22.05

Distribution of URLs and DOIs


The permanence of web citation is of major concern to academicians, and the use of DOIs
instead of URLs can avoid its decay. The DOI is defined as a character string that is used to
identify a scholarly publication in the digital environment uniquely. Even if the URL changes,
the DOI does not change. Hence, the DOIs are considered persistent identifiers. Apart from
DOIs, arXiv and WoS identifiers were also used in references. The arXiv identifier is given by
arXiv, which is an open-access archive for pre-print scholarly articles operated by Cornell
University. The Web of Science unique identifier (UT) provided by Thomson Reuters is also
an alphanumeric character string.
Figure 1 shows the distribution of URLs and DOIs in LIS journal articles. The journal
articles had 31,291 (60.36%) URLs, 19,640 (37.89%) DOIs and 908 (1.75%) arXiv
and WoS identifiers cited in the references. In contrast, as Figure 2 shows, CMS journal
articles had 19,946 (39.20%) URLs, 30,868 (60.67%) DOIs and 65 (0.13%) arXiv identifiers.
It can be noted that there are more DOIs in CMS journal articles than in LIS
journal articles.

Figure 1. Distribution of URLs and DOIs in library and information science journals
(URLs 60.36%, DOIs 37.89%, Others 1.75%)

Figure 2. Distribution of URLs and DOIs in communication and media studies journals
(URLs 39.20%, DOIs 60.67%, Others 0.13%)

Year-wise distribution of accessible and rotten URLs

Though the web has eased information access, missing web citations are a major concern for
researchers (Sadat-Moosavi et al., 2012). The data presented in Table 4 show the distribution
of accessible and rotten URLs, which answers research question 2. In LIS journal articles, out
of the 51,839 URLs, 39,866 (76.90%) were accessible and the remaining 11,973 (23.10%)
encountered accessibility errors. In comparison, of the 50,879 URLs in CMS journal articles,
42,899 (84.32%) were accessible and 7,980 (15.68%) were rotten. The percentage of rotten
URLs was highest in 2008 (50.57%) in LIS journal articles and in 2009 (51.68%) in CMS
journal articles. Overall, CMS journal articles had fewer rotten URLs than LIS journal
articles. This may be attributed to the larger number of DOIs used in CMS journal articles.
Pearson’s correlation analysis indicated that there is a positive correlation between the age
of publication and the percentage of rotten URLs in LIS journal articles (r(8) = 0.997,
p < 0.001) and CMS journal articles (r(8) = 0.928, p < 0.001). As the p-value is less than 0.05
in both disciplines, it can be inferred that URLs cited earlier are more likely to be rotten,
which can also be seen in previous studies (Sampath Kumar and Manoj Kumar, 2012; Sampath
Kumar and Prithvi Raj, 2015) (see Figure 3).

Table 4. Year-wise distribution of accessible and rotten URLs

       Library and Information Science                  Communication and Media Studies
Year   Total URLs  Accessible       %  Rotten       %   Total URLs  Accessible       %  Rotten       %
2008        2,474       1,223   49.43   1,251   50.57        1,316         728   55.32     588   44.68
2009        2,965       1,608   54.23   1,357   45.77        1,933         934   48.32     999   51.68
2010        2,960       1,752   59.19   1,208   40.81        1,823       1,040   57.05     783   42.95
2011        3,452       2,292   66.40   1,160   33.60        1,811       1,012   55.88     799   44.12
2012        3,919       2,703   68.97   1,216   31.03        3,734       2,894   77.50     840   22.50
2013        4,676       3,476   74.34   1,200   25.66        5,673       4,921   86.74     752   13.26
2014        5,327       4,083   76.65   1,244   23.35        7,303       6,430   88.05     873   11.95
2015        6,942       5,780   83.26   1,162   16.74        7,815       7,056   90.29     759    9.71
2016        8,974       7,783   86.73   1,191   13.27        9,581       8,676   90.55     905    9.45
2017       10,150       9,166   90.31     984    9.69        9,890       9,208   93.10     682    6.90
Total      51,839      39,866   76.90  11,973   23.10       50,879      42,899   84.32   7,980   15.68

Journal-wise distribution of accessible and rotten URLs


It is observed from Table 5 that among the two disciplines, CMS journals, Media Psychology
(6%), Journal of Communication (7.33%) and Communication Theory (9.32%) had less
number of rotten URLs. A large number of rotten URLs were found in the journal Portal
(44.46%), which is a Library Science journal. This is followed by New Media and Society
AJIM LIS journal articles CMS journal articles
74,2
12000 60 12000 60

10000 50 10000 50
234

8000 40 8000 40

NUMBER OF URLS
NUMBER OF URLS

PERCENTAGE
PERCENTAGE
6000 30 6000 30

4000 20 4000 20

2000 10 2000 10

0 0 0 0
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017

YEAR YEAR
RoƩen URLs RoƩen URLs
Figure 3.
Accessible and Accessible URLs Accessible URLs
rotten URLs Percentage of RoƩen URLs Percentage of RoƩen URLs

(38.67%), followed by the Journal of Advertising (37.23%) and Public Understanding of


Science (33.69%), which are CMS journals.

Distribution of HTTP error codes


Table 6 presents the various HTTP error codes encountered by the rotten URLs. The HTTP
errors encountered are the result of the URLs tested once during April 2020 at a specific client.
The results presented here answer research question 2. The common HTTP error messages
encountered are explained below.
(1) HTTP 400 (Bad Request): This error occurs due to the server not being able to
process because of some client error. This may occur due to syntax being invalid, size
being too large and so forth.
(2) HTTP 403 (Forbidden): This error occurs when the server understands the request
but refuses to authorize it. The client does not possess access rights to the content;
hence, it is unauthorized, so the server refuses to give the requested resource. Unlike
401, the client’s identity is known to the server.
Library and Information Science Communication and Media Studies
Journal Total URLs Accessible URLs % Rotten URLs % Journal Total URLs Accessible URLs % Rotten URLs %

JOI 3,546 3,163 89.2 383 10.8 JCMC 4,548 3,861 84.89 687 15.11
IPM 2,798 2,236 79.91 562 20.09 JOC 8,631 7,998 92.67 633 7.33
JASIST 10,301 7,842 76.13 2,459 23.87 CR 4,781 4,323 90.42 458 9.58
Scientometrics 14,261 11,956 83.84 2,305 16.16 NMS 6,468 3,967 61.33 2,501 38.67
CRL 2,722 1,864 68.48 858 31.52 ICS 13,233 11,545 87.24 1,688 12.76
JMLA 2,255 1,751 77.65 504 22.35 JOA 940 590 62.77 350 37.23
Portal 2,906 1,614 55.54 1,292 44.46 PC 2,396 2,116 88.31 280 11.69
AJIM 2,682 1804 67.26 878 32.74 CT 3,274 2,969 90.68 305 9.32
JAL 6,999 5,187 74.11 1,812 25.89 MP 4,147 3,898 94 249 6.00
LISR 3,369 2,449 72.69 920 27.31 PUOS 2,461 1,632 66.31 829 33.69
Total 51,839 39,866 76.90 11,973 23.10 Total 50,879 42,899 84.32 7,980 15.68
scholarly
journals

235
citations in

distribution of
accessible and
Rotten web

rotten URLs
Table 5.
Journal-wise
AJIM Library and Information Science Communication and Media Studies
74,2 HTTP error codes Number of rotten URLs % Number of rotten URLs %

400 535 4.47 159 1.99


401 17 0.14 10 0.13
403 644 5.38 493 6.18
404 9,949 83.1 6,735 84.4
236 405 41 0.34 53 0.66
410 65 0.54 61 0.76
416 71 0.59 32 0.4
429 11 0.09 2 0.03
500 474 3.96 282 3.53
502 14 0.12 13 0.16
Table 6. 503 110 0.92 98 1.23
Distribution of HTTP Others 42 0.35 42 0.53
error code Total 11,973 100 7,980 100

(3) HTTP 404 (Page Not Found): This error occurs when the server does not find the
target resource or when it is not ready to reveal that the resource exists. The reason of
this error may be due to modified URLs or removal or relocation of files.
(4) HTTP 500 (Internal Server Error): This occurs when the server encounters an
unexpected condition/situation and does not know how to handle it; thus, it fails to
fulfill the request.
(5) HTTP 502 (Bad Gateway): This occurs when the server, while acting as a gateway or
proxy, receives an invalid response from the upstream server it contacted in order
to fulfill a request from the client.
(6) HTTP 503 (Service Unavailable): This error may be noticed when the server is not
capable of handling the requests by the client. This may be due to overload or
scheduled maintenance, which is temporary and may be alleviated after some time.
HTTP error codes such as 406, 408, 409, 415, 440, 456, 463, 501, 504, 505 and 530 are
grouped under the “Others” category.
Of the 11,973 rotten URLs in LIS journal articles, 83.10% returned the HTTP 404 error
(Not Found), followed by HTTP 403 (5.38%) and HTTP 500 (3.96%). The HTTP error codes in
the “Others” category accounted for 0.35%. Of the 7,980 rotten URLs in CMS, 84.40%
returned the HTTP 404 error, followed by HTTP 403 (6.18%) and HTTP 500 (3.53%). The HTTP
error codes in the “Others” category contributed 0.53% of the total error codes. These
results are comparable with those of Goh and Ng (2007) and Sampath Kumar and Manoj Kumar
(2012), where most of the errors were also due to the HTTP 404 error code.
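
The grouping of error codes described above can be illustrated with a minimal Python sketch. It is not the PHP script used in the study; the function names and the sample list of status codes are hypothetical and serve only to show how codes outside the individually reported set fall into “Others” and how the percentages are derived.

```python
from collections import Counter

# Codes reported individually in Table 6; every other code is grouped as "Others".
NAMED_CODES = {400, 401, 403, 404, 405, 410, 416, 429, 500, 502, 503}


def categorize(status_code: int) -> str:
    """Return the Table 6 row label for one HTTP error status code."""
    return str(status_code) if status_code in NAMED_CODES else "Others"


def distribution(status_codes):
    """Map each category to (count, percentage of all rotten URLs)."""
    counts = Counter(categorize(code) for code in status_codes)
    total = sum(counts.values())
    return {label: (n, round(100 * n / total, 2)) for label, n in counts.items()}


# Hypothetical crawl result: one status code per rotten URL.
sample = [404, 404, 404, 403, 500, 504, 404, 410]
print(distribution(sample))
```

Applied to the counts in Table 6, the same arithmetic gives 100 × 9,949 / 11,973 ≈ 83.10% for HTTP 404 in LIS.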

Year-wise distribution of retrieved URLs


Table 7 addresses research question 3, which was to determine the percentage of URLs
retrieved through Time Travel. In CMS, 65.46% of the rotten URLs were retrieved from
various web archives through Time Travel, which is higher than the 61.76% retrieved in
LIS. The result is consistent with the study by Vinay Kumar and Sampath Kumar (2019), who
attempted to retrieve inaccessible URLs in an LIS journal using Time Travel and recouped
almost 60% of the inaccessible web citations. In LIS journal articles, the percentage of
retrieved URLs varied from a low of 56.50% in 2017 to a high of 66.67% in 2008. Though the
percentage of archived URLs showed a decreasing trend, it rose slightly in 2010, 2013 and
2014. Similarly, in CMS journal articles, the percentage of retrieved URLs varied from a
low of 59.24% in 2017 to a high of 68.37% in 2008, with a slight rise in 2011 and 2012.
This shows that URLs cited in older articles are retrieved more often than those cited in
newer ones. The web archives have thus preserved much of the web content so that it
remains available for future readers.

Table 7. Year-wise distribution of retrieved URLs

        Library and Information Science            Communication and Media Studies
Year    Rotten URLs   Archived URLs   Percentage   Rotten URLs   Archived URLs   Percentage
2008    1,251         834             66.67        588           402             68.37
2009    1,357         872             64.26        999           671             67.17
2010    1,208         789             65.31        783           515             65.77
2011    1,160         723             62.33        799           527             65.96
2012    1,216         730             60.03        840           568             67.62
2013    1,200         728             60.67        752           502             66.76
2014    1,244         767             61.66        873           576             65.98
2015    1,162         702             60.41        759           498             65.61
2016    1,191         693             58.19        905           561             61.99
2017    984           556             56.50        682           404             59.24
Total   11,973        7,394           61.76        7,980         5,224           65.46
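
A lookup of a rotten URL through Time Travel might be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure used in the study: the endpoint pattern in API_BASE and the "mementos"/"closest"/"uri" response fields are assumed from the public Memento Time Travel JSON API and should be checked against its documentation, and all function names are hypothetical.

```python
import json
import urllib.request
from datetime import datetime

# Assumption: endpoint pattern of the Memento Time Travel JSON API.
API_BASE = "http://timetravel.mementoweb.org/api/json"


def build_query(url: str, when: datetime) -> str:
    """Build the lookup URL for the archived copy closest to `when`."""
    return f"{API_BASE}/{when.strftime('%Y%m%d%H%M%S')}/{url}"


def closest_memento(payload: dict):
    """Return the first archived URI of the closest memento, or None.

    Assumes a response shaped like {"mementos": {"closest": {"uri": [...]}}}.
    """
    closest = payload.get("mementos", {}).get("closest", {})
    uris = closest.get("uri") or []
    return uris[0] if uris else None


def lookup(url: str, when: datetime, timeout: float = 30.0):
    """Query the service; treat network or parsing failures as 'not archived'."""
    try:
        with urllib.request.urlopen(build_query(url, when), timeout=timeout) as resp:
            return closest_memento(json.load(resp))
    except (OSError, ValueError):
        return None
```

A rotten URL would count as retrieved when lookup returns an archived address rather than None. The date passed as `when` is a design choice; the publication date of the citing article is one plausible option.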

Journal-wise distribution of retrieved URLs


Table 8 shows that the highest percentages of URLs retrieved through web archives are
found in CMS journals: Information Communication Society (71.56%), followed by the Journal
of Communication (70.46%) and Communication Research (70.09%). The lowest percentages are
found in LIS journals: the Journal of Informetrics (45.43%), followed by Information
Processing and Management (53.02%) and LISR (55.87%). It can also be observed that for all
the journals in both disciplines, except for the Journal of Informetrics in LIS, more than
50% of the rotten URLs were retrieved from web archives.

Table 8. Journal-wise distribution of retrieved URLs

Library and Information Science
Journal          Rotten URLs   Archived URLs   Percentage
JOI              383           174             45.43
IPM              562           298             53.02
JASIST           2,459         1,618           65.80
Scientometrics   2,305         1,598           69.33
CRL              858           517             60.26
JMLA             504           306             60.71
Portal           1,292         734             56.81
AJIM             878           579             65.95
JAL              1,812         1,056           58.28
LISR             920           514             55.87
Total            11,973        7,394           61.76

Communication and Media Studies
Journal          Rotten URLs   Archived URLs   Percentage
JCMC             687           400             58.22
JOC              633           446             70.46
CR               458           321             70.09
NMS              2,501         1,566           62.61
ICS              1,688         1,208           71.56
JOA              350           201             57.43
PC               280           193             68.93
CT               305           213             69.84
MP               249           169             67.87
PUOS             829           507             61.16
Total            7,980         5,224           65.46

File formats associated with rotten and retrieved URLs

The data in Table 9 indicate that in LIS journal articles, the highest percentage of
retrieved URLs belonged to the .cfm file format (74.54%), followed by URLs with the .asp
file format (70.62%) and the .php file format (65.22%). The lowest percentages of
retrieval are associated with URLs having the .cgi file format (58.14%), the .html file
format (59.64%) and the .doc file format (61.25%). In CMS journal articles, out of the
7,980 rotten URLs, the highest percentage of retrieved URLs belonged to the .asp file
format (74.66%), followed by URLs with the .cfm file format (72.22%) and the .php file
format (71.37%). The lowest percentages of retrieval are associated with URLs having the
.cgi file format (63.64%), the .html file format (64.04%) and the .pdf file format
(64.19%).

Table 9. File formats associated with rotten and retrieved URLs

               Library and Information Science            Communication and Media Studies
File format    Rotten URLs   Archived URLs   Percentage   Rotten URLs   Archived URLs   Percentage
.asp           514           363             70.62        584           436             74.66
.cfm           216           161             74.54        90            65              72.22
.cgi           43            25              58.14        11            7               63.64
.doc           80            49              61.25        32            21              65.63
.html          6,551         3,907           59.64        4,755         3,045           64.04
.jsp           67            42              62.69        35            23              65.71
.pdf           3,815         2,388           62.60        1,910         1,226           64.19
.php           483           315             65.22        524           374             71.37
Others         204           144             70.59        39            27              69.23
Total          11,973        7,394           61.76        7,980         5,224           65.46
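
Assigning a URL to one of the file-format classes in Table 9 can be sketched in Python as below. The study does not state how URLs without a file extension, or with variants such as .htm, were classified; this sketch places them in “Others”, which is an assumption, and the function name is hypothetical.

```python
import posixpath
from urllib.parse import urlparse

# File formats listed individually in Table 9.
TRACKED_FORMATS = {".asp", ".cfm", ".cgi", ".doc", ".html", ".jsp", ".pdf", ".php"}


def file_format(url: str) -> str:
    """Return the Table 9 class for a URL based on its path extension."""
    path = urlparse(url).path               # query string and fragment are ignored
    ext = posixpath.splitext(path)[1].lower()
    # Assumption: extensionless URLs and unlisted extensions count as "Others".
    return ext if ext in TRACKED_FORMATS else "Others"


print(file_format("http://example.org/docs/report.PDF?x=1"))
print(file_format("http://example.org/"))
```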

Top-level domain associated with rotten and retrieved URLs


It can be observed from Table 10 that in LIS journal articles, the highest percentage of
archived URLs had the top-level domain .info (68.42%), followed by .int (66.67%), .gov
(64.01%) and .org (62.97%). Lower percentages of retrieved URLs belonged to the .net
(56.90%), .edu (58.57%), .com (59.73%) and country code (62.90%) top-level domains. In CMS
journal articles, 80% of URLs with the top-level domain .info were archived, followed by
75.29% of URLs with the .gov top-level domain, 72.73% with the .int top-level domain and
69.72% with country code top-level domains. The lowest percentage of retrieved URLs had
the .com top-level domain (60.29%), followed by the .net (63.48%), .org (65.40%) and .edu
(66.61%) top-level domains.
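
The top-level domain classes used above could be derived from a URL along the following lines. This is a sketch only; treating any two-letter alphabetic label as a country code is an assumption, and the function name is hypothetical.

```python
from urllib.parse import urlparse

# Generic top-level domains listed individually in Table 10.
GENERIC_TLDS = {"com", "edu", "gov", "info", "int", "net", "org"}


def tld_category(url: str) -> str:
    """Return the Table 10 class for the host name of a URL."""
    host = urlparse(url).hostname or ""     # None when the URL has no scheme
    label = host.rsplit(".", 1)[-1].lower()
    if label in GENERIC_TLDS:
        return "." + label
    # Assumption: two-letter alphabetic labels are country code domains.
    if len(label) == 2 and label.isalpha():
        return "country code"
    return "Others"


print(tld_category("http://ijie.um.edu.my/index.php/MJLIS/article/view/6710"))
```

Note that a host such as ijie.um.edu.my is classed by its final label (.my), so it counts as a country code domain rather than as .edu.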

Table 10. Top-level domain associated with rotten and retrieved URLs

                    Library and Information Science            Communication and Media Studies
Top-level domain    Rotten URLs   Archived URLs   Percentage   Rotten URLs   Archived URLs   Percentage
.com                1,999         1,194           59.73        2,070         1,248           60.29
country code        3,016         1,897           62.90        1,780         1,241           69.72
.edu                1,675         981             58.57        614           409             66.61
.gov                603           386             64.01        263           198             75.29
.info               95            65              68.42        20            16              80.00
.int                45            30              66.67        33            24              72.73
.net                297           169             56.90        293           186             63.48
.org                4,234         2,666           62.97        2,899         1,896           65.40
Others              9             6               66.67        8             6               75.00
Total               11,973        7,394           61.76        7,980         5,224           65.46

Path depth associated with rotten and retrieved URLs

It can be noted from Table 11 that in LIS journal articles, the highest retrieval
percentage (69.74%) was for URLs with path depth 4, followed by URLs with path depth 5
(67.95%) and path depth 3 (63.16%). Low retrieval percentages were associated with path
depth 2 (55.96%), followed by path depth 0 (56.34%) and path depth 6 (56.70%). In CMS
journal articles, 70.05% of rotten URLs with path depth 3 were retrieved, followed by
68.95% at path depth 4 and 66.53% at path depth 2. Low retrieval percentages were found at
path depth 0 (54.39%), followed by path depth 7 (56.95%) and path depth 1 (57.24%). A
correlation analysis was performed to examine the relationship between the path depth and
the percentage of URLs retrieved through web archives. The correlation is positive but
weak in both disciplines: r(8) = 0.192, p = 0.595 in LIS and r(8) = 0.102, p = 0.779 in
CMS. As the p-values are greater than 0.05, there is no statistical evidence of a
relationship between the path depth of URLs and the percentage of retrieved URLs. Previous
studies also showed that the relationship between the path depth of URLs and recovered URL
citations was not statistically significant (Sampath Kumar and Vinay Kumar, 2013; Sampath
Kumar et al., 2015).

Table 11. Path depth associated with rotten and retrieved URLs

              Library and Information Science            Communication and Media Studies
Path depth    Rotten URLs   Archived URLs   Percentage   Rotten URLs   Archived URLs   Percentage
PD = 0        268           151             56.34        171           93              54.39
PD = 1        1,252         746             59.58        877           502             57.24
PD = 2        3,247         1,817           55.96        2,357         1,568           66.53
PD = 3        2,880         1,819           63.16        1,746         1,223           70.05
PD = 4        2,085         1,454           69.74        1,298         895             68.95
PD = 5        1,036         704             67.95        760           483             63.55
PD = 6        552           313             56.70        389           229             58.87
PD = 7        356           208             58.43        151           86              56.95
PD = 8        101           63              62.38        60            38              63.33
PD > 8        196           119             60.71        171           107             62.57
Total         11,973        7,394           61.76        7,980         5,224           65.46
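
The path depth measure and the correlation reported for it can be sketched in Python as follows. Two assumptions are made that the text does not confirm: path depth is counted as the number of non-empty segments in the URL path, and the ten path depth classes are coded as ranks 0 to 9 (with “PD > 8” as 9). Under these assumptions the coefficient computed from the Table 11 percentages comes out close to the reported values.

```python
import math
from urllib.parse import urlparse


def path_depth(url: str) -> int:
    """Count non-empty path segments (assumed definition of path depth)."""
    return len([seg for seg in urlparse(url).path.split("/") if seg])


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Path depth classes coded 0..9 (assumption) and retrieval percentages from Table 11.
depth_rank = list(range(10))
lis_pct = [56.34, 59.58, 55.96, 63.16, 69.74, 67.95, 56.70, 58.43, 62.38, 60.71]
cms_pct = [54.39, 57.24, 66.53, 70.05, 68.95, 63.55, 58.87, 56.95, 63.33, 62.57]

print(path_depth("http://example.org/a/b/c.html"))
print(round(pearson_r(depth_rank, lis_pct), 3), round(pearson_r(depth_rank, cms_pct), 3))
```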

Character length associated with rotten and retrieved URLs


Table 12 presents the retrieval of rotten URLs through web archives by character length.
In LIS journal articles, the highest retrieval percentage was for URLs with a character
length of less than 20 (70.67%), followed by URLs with character length 51–60 (67.69%) and
URLs with character length 41–50 (63.95%). Low retrieval percentages were found for URLs
with character length 81–90 (56.99%), 91–100 (57.94%) and 71–80 (58.27%), and the lowest
for URLs longer than 100 characters (55.95%). In CMS journal articles, the highest
retrieval percentage was also for URLs with a character length of less than 20 (75%),
followed by URLs with character length 61–70 (69.21%) and 41–50 (67.67%). Low retrieval
percentages were found for character lengths 91–100 (61.92%), 81–90 (62.52%) and 71–80
(63.67%), and the lowest for URLs longer than 100 characters (59.81%). The correlation
test indicates a negative relationship between the character length and the percentage of
URLs retrieved through web archives, that is, shorter URLs are retrieved more often:
r(8) = -0.716, p = 0.019 in LIS and r(8) = -0.740, p = 0.014 in CMS. As the p-values are
less than 0.05, there is statistical evidence of an association between the character
length of URLs and the percentage of retrieved URLs. The results are commensurate with the
study by Sampath Kumar et al. (2015), who recovered vanished online citations in three
journals published by Emerald.

Table 12. Character length associated with rotten and retrieved URLs

                    Library and Information Science            Communication and Media Studies
Character length    Rotten URLs   Archived URLs   Percentage   Rotten URLs   Archived URLs   Percentage
<20                 75            53              70.67        28            21              75.00
21–30               555           332             59.82        318           204             64.15
31–40               1,368         828             60.53        864           563             65.16
41–50               2,175         1,391           63.95        1,497         1,013           67.67
51–60               2,343         1,586           67.69        1,520         1,011           66.51
61–70               1,863         1,144           61.41        1,153         798             69.21
71–80               1,270         740             58.27        900           573             63.67
81–90               865           493             56.99        611           382             62.52
91–100              535           310             57.94        365           226             61.92
>100                924           517             55.95        724           433             59.81
Total               11,973        7,394           61.76        7,980         5,224           65.46
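
The significance statements for the two correlation analyses can be checked with a short Python sketch. It converts a correlation coefficient with 8 degrees of freedom into a t statistic and compares it with the two-tailed 5% critical value of 2.306. The binning function is hypothetical: Table 12 does not say where a URL of exactly 20 characters falls, so placing it in the first class is an assumption, and the labels use a plain hyphen.

```python
import math

T_CRITICAL_DF8 = 2.306  # two-tailed critical t value for alpha = 0.05, df = 8


def length_bin(url: str) -> str:
    """Assign a URL to a Table 12 character-length class."""
    n = len(url)
    if n <= 20:                 # assumption: exactly 20 characters counts as "<20"
        return "<20"
    if n > 100:
        return ">100"
    low = ((n - 1) // 10) * 10 + 1
    return f"{low}-{low + 9}"


def t_statistic(r: float, df: int) -> float:
    """t statistic for testing a Pearson correlation against zero."""
    return r * math.sqrt(df / (1.0 - r * r))


# Character length in LIS (negative correlation of magnitude 0.716) is significant.
print(abs(t_statistic(-0.716, 8)) > T_CRITICAL_DF8)
# Path depth in LIS (r = 0.192) is not.
print(abs(t_statistic(0.192, 8)) > T_CRITICAL_DF8)
```

With r = -0.716 the statistic is about -2.90, beyond the critical value, whereas r = 0.192 gives about 0.55, which agrees with the reported p-values falling on either side of 0.05.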

Testing of hypotheses
Table 13 presents the formulated hypotheses, the statistical test applied to verify the
hypotheses and the results. It can be seen from the table that only one hypothesis was not
supported by the study results.

Table 13. Testing of hypotheses

                                                                          p-value          Result
Hypotheses                                                                LIS     CMS      LIS             CMS
The use of web citations and the year are positively correlated           0.000   0.000    Supported       Supported
The age of publication and the percentage of rotten URLs are              0.000   0.000    Supported       Supported
  positively correlated
The path depth and percentage of retrieved URLs are positively            0.595   0.779    Not supported   Not supported
  correlated
The character length and percentage of retrieved URLs are positively      0.019   0.014    Supported       Supported
  correlated

Discussion and conclusion

The access and use of scientific information available on the World Wide Web have
contributed to the use of web citations in scholarly journals. The present study confirms
the growing use of URLs in the references cited in the two disciplines during 2008–2017.
The lack of persistence of URLs may hinder access to information on the Web. Web citations
become unusable to researchers if they disappear over time, which may happen when a URL
moves to a new location or when its content changes. The disappearance of URLs may be
avoided by the use of a DOI. However, the use of DOIs, which are considered to be
permanent and stable identifiers, does not assure that the content of the document remains
unchanged. The content of the resource can be changed or modified at any time while the
DOI of the document remains the same, so the resource with the old content may not be
available to authors or researchers. Hence, the use of Internet web archives along with
search engines can help to retrieve the original document. Editors and publishers should
also take up the responsibility of checking the availability of URLs, and archival of URLs
should be done by the authors as well as the publishers. Many online archives such as the
Internet Archive and WebCite are available free of charge to archive a web page or to
access one that returns an error. The publishers of scholarly work should develop
guidelines to maintain the stability of web documents. The professional and academic
disciplines need to develop new citation conventions, such as giving a link to the
archived copy of the cited work, so that future researchers can access the original cited
work. Journal publishers and editors need to instruct authors about the new conventions as
and when they are established. It is thus the responsibility of the authors, publishers
and the editorial team to make certain that the references cited in a scholarly work
remain available to academicians without hindrance.

References
Casserly, M.F. and Bird, J.E. (2003), “Web citation availability: analysis and implication for
scholarship”, College and Research Libraries, Vol. 64 No. 4, pp. 300-317, doi: 10.5860/crl.64.4.300.
Chen, C., Sun, K., Wu, G., Tang, Q., Qin, J., Chiu, K., Fu, Y., Wang, X. and Liua, J. (2009), “The impact of
internet resources on scholarly communication: a citation analysis”, Scientometrics, Vol. 81
No. 2, pp. 459-474, doi: 10.1007/s11192-008-2180-y.
Chikate, R.V. and Patil, S.K. (2009), “Measuring impact of web sources in ILA Conference Proceedings:
a citation analysis”, Library Herald, Vol. 47 No. 2, pp. 142-154.
Dimitrova, D.V. and Bugeja, M. (2006), “Consider the source: predictors of online citation permanence
in communication journals”, Portal: Libraries and the Academy, Vol. 6 No. 3, pp. 269-283,
doi: 10.1353/pla.2006.0032.
Dimitrova, D.V. and Bugeja, M. (2007a), “The half-life of Internet references cited in communication
journals”, New Media and Society, Vol. 9 No. 9, pp. 811-826, doi: 10.1177/1461444807081226.
Dimitrova, D.V. and Bugeja, M. (2007b), “Raising the dead: recovery of decayed online citations”,
American Communication Journal, Vol. 9 No. 2, pp. 1-14.
Ding, Y., Zhang, G., Chambers, T., Song, M., Wang, X. and Zhai, C. (2014), “Content-based citation
analysis: the next generation of citation analysis”, Journal of the Association for Information
Science and Technology, Vol. 65 No. 9, pp. 1820-1833, doi: 10.1002/asi.23256.
Germain, C.A. (2000), “URLs: uniform resource locators or unreliable resource locators”, College and
Research Libraries, Vol. 61 No. 4, pp. 359-365, doi: 10.5860/crl.61.4.359.
Goh, D.H.L. and Ng, P.K.N. (2007), “Link decay in leading information science journals”, Journal of the
American Society for Information Science and Technology, Vol. 58 No. 1, pp. 15-24,
doi: 10.1002/asi.20513.
Harter, S.P. and Kim, H.J. (1996), “Electronic journals and scholarly communication: a citation and
reference study”, Information Research, Vol. 2 No. 1, available at: http://InformationR.net/ir/2-1/
paper9a.html.
Herring, D.H. (2002), “Use of electronic resources in scholarly electronic journals: a citation analysis”,
College and Research Libraries, Vol. 63 No. 4, pp. 334-340, doi: 10.5860/crl.63.4.334.
Isfandyari-Moghaddam, A. and Saberi, M.K. (2011), “The life and death of URLs: the case of journal of
the medical library association”, Library Philosophy and Practice, (July), (annual volume),
available at: www.webpages.uidaho.edu/~mbolin/moghaddam-saberi.htm.
Isfandyari-Moghaddam, A., Saberi, M.K. and Mohammad Esmaeel, S. (2010), “Availability and half-life
of web references cited in Information research journal: a citation study”, International Journals
of Information Science Management, Vol. 8 No. 2, pp. 57-75.
Jalalifard, M., Norouzi, Y. and Isfandyari-Moghaddam, A. (2013), “Analyzing web citations availability
and half-life in medical journals: a case study in an Iranian university”, Aslib Proceedings,
Vol. 65 No. 3, pp. 242-261, doi: 10.1108/00012531311330638.
Lawrence, S., Coetzee, F., Glover, E., Pennock, D.M., Flake, G. and Nielsen, F. (2001), “Persistence of
web references in scientific research”, IEEE Computer, Vol. 34 No. 2, pp. 26-31,
doi: 10.1109/2.901164.
Mardani, A. (2012), “An investigation of the web citations in Iran’s chemistry articles in SCI”, Library
Review, Vol. 61 No. 1, pp. 18-29, doi: 10.1108/00242531211207398.
Markwell, J. and Brooks, D.W. (2003), “Linkrot limits the usefulness of the web-based educational
material in biochemistry and molecular biology”, Biochemistry and Molecular Biology
Education, Vol. 31 No. 1, pp. 69-72, doi: 10.1002/bmb.2003.494031010165.
Mêgnigbêto, E. (2006), “Internet-based resources citation by undergraduate students”, International
Information and Library Review, Vol. 38 No. 2, pp. 49-55, doi: 10.1016/j.iilr.2006.04.001.
Riahinia, N., Zandian, F. and Azimi, A. (2011), “Web citation persistence over time: a retrospective
study”, The Electronic Library, Vol. 29 No. 5, pp. 609-620, doi: 10.1108/02640471111177053.
Rumsey, M. (2002), “Runaway train: problems of permanence, accessibility, and stability in the use of
web sources in law review citations”, Law Library Journal, Vol. 94 No. 2, pp. 27-39.
Russell, E. and Kane, J. (2008), “The missing link: assessing the reliability of internet citations in
history journals”, Technology and Culture, Vol. 49 No. 2, pp. 420-429.
Saberi, M.K. and Abedi, H. (2012), “Accessibility and decay of web citations in five open access ISI
journals”, Internet Research, Vol. 22 No. 2, pp. 234-247, doi: 10.1108/10662241211214584.
Sadat-Moosavi, A., Isfandyari-Moghaddam, A. and Tajeddini, O. (2012), “Accessibility of online
resources cited in scholarly LIS journals: a study of emerald ISI-ranked journals”, Aslib
Proceedings, Vol. 64 No. 2, pp. 178-192, doi: 10.1108/00012531211215196.
Sampath Kumar, B.T. and Manoj Kumar, K.S. (2012), “Persistence and half-life of URL citations cited
in LIS open access journals”, Aslib Proceedings, Vol. 64 No. 4, pp. 405-422,
doi: 10.1108/00012531211244752.
Sampath Kumar, B.T. and Prithvi Raj, K.R. (2012), “Availability and persistence of web citations in
Indian LIS literature”, The Electronic Library, Vol. 30 No. 1, pp. 19-32,
doi: 10.1108/02640471211204042.
Sampath Kumar, B.T. and Prithvi Raj, K.R. (2015), “Bringing life to dead: role of Wayback Machine in
retrieving vanished URLs”, Journal of Information Science, Vol. 41 No. 1, pp. 71-81,
doi: 10.1177/0165551514552752.
Sampath Kumar, B.T. and Vinay Kumar, D. (2013), “HTTP 404-page (not) found: recovery of decayed
URL citations”, Journal of Informetrics, Vol. 7 No. 1, pp. 145-157, doi: 10.1016/j.joi.2012.09.007.
Sampath Kumar, B.T., Vinay Kumar, D. and Prithvi Raj, K.R. (2015), “Wayback machine:
reincarnation to vanished online citations”, Program, Vol. 49 No. 2, pp. 205-223,
doi: 10.1108/PROG-07-2013-0039.
Spinellis, D. (2003), “The decay and failures of web references”, Communications of the ACM, Vol. 46
No. 1, pp. 71-77, doi: 10.1145/602421.602422.
Tajeddini, O., Azimi, A., Sadat-Moosavi, A. and Sharif-Moghaddam, H. (2011), “Death of web citations:
a serious alarm for authors”, Malaysian Journal of Library and Information Science, Vol. 16
No. 3, pp. 17-29, available at: http://ijie.um.edu.my/index.php/MJLIS/article/view/6710.
Thorp, A.W. and Brown, L. (2007), “Accessibility of internet references in Annals of emergency
medicine: is it time to require archiving?”, Annals of Emergency Medicine, Vol. 50 No. 2,
pp. 188-192, doi: 10.1016/j.annemergmed.2006.11.019.
Vaughan, L. and Shaw, D. (2005), “Web citation data for impact assessment: a comparison of four
science disciplines”, Journal of the American Society for Information Science and Technology,
Vol. 56 No. 10, pp. 1075-1087, doi: 10.1002/asi.20199.
Vinay Kumar, D. and Sampath Kumar, B.T. (2019), “Recouping the missing web citations in library hi-
tech journal”, Journal of Indian Library Association, Vol. 55 No. 4, pp. 1-8.
Vinay Kumar, D. and Sushmitha, M. (2019), “Recovery of missing URLs cited in Annals of library and
information studies: a study of time travel”, Annals of Library and Information Studies, Vol. 66
No. 1, pp. 24-32.
Vinay Kumar, D., Sampath Kumar, B.T. and Parameshwarappa, D.R. (2015), “URLs link rot:
implications for electronic publishing”, World Digital Libraries - An International Journal, Vol. 8
No. 1, pp. 59-66, doi: 10.18329/09757597/2015/8105.
Wren, J.D., Johnson, K.R., Crockett, D.M., Heilig, L.F., Schilling, L.M. and Dellavalle, R.P. (2006),
“Uniform resource locator decay in dermatology journals: author attitudes and preservation
practices”, Archives of Dermatology, Vol. 142, pp. 1147-1152, doi: 10.1001/archderm.142.9.1147.
Yang, S., Qiu, J. and Xiong, Z. (2010), “An empirical study on the utilization of web academic resources
in humanities and social sciences based on web citations”, Scientometrics, Vol. 84 No. 1, pp. 1-19,
doi: 10.1007/s11192-009-0142-7.
Yang, S., Han, R., Ding, J. and Song, Y. (2012), “The distribution of Web citations”, Information
Processing and Management, Vol. 48 No. 4, pp. 779-790, doi: 10.1016/j.ipm.2011.10.002.
Zhang, Y. (2007), “The effect of open access on citation impact: a comparison study based on web
citation analysis”, Libri, Vol. 56 No. 3, pp. 145-156, doi: 10.1515/LIBR.2006.145.
Zhao, D. and Logan, E. (2002), “Citation analysis using scientific publications on the Web as data
source: a case study in the XML research area”, Scientometrics, Vol. 54 No. 3, pp. 449-472, doi:
10.1023/A:1016090601710.

Further reading
Niveditha, B. and Kumbar, M. (2020a), “Web citation analysis of library and information science and
communication and Media studies journals: a comparative study”, COLLNET Journal of
Scientometrics and Information Management, Vol. 14 No. 2, pp. 335-348,
doi: 10.1080/09737766.2021.1915721.
Niveditha, B. and Kumbar, M. (2020b), “Accessibility and characteristics of web citations in journal of
computer-mediated communication during 2008-2017”, Journal of Indian Library Association,
Vol. 56 No. 2, pp. 39-50.

Corresponding author
B. Niveditha can be contacted at: niveditha.jb@gmail.com
