Tsukioka 2017


Accepted Manuscript

Investor sentiment extracted from internet stock message boards and IPO puzzles

Yasutomo Tsukioka, Junya Yanagi, Teruko Takada

PII: S1059-0560(17)30817-1
DOI: 10.1016/j.iref.2017.10.025
Reference: REVECO 1528

To appear in: International Review of Economics and Finance

Received Date: 18 April 2016


Revised Date: 11 October 2017
Accepted Date: 29 October 2017

Please cite this article as: Tsukioka Y., Yanagi J. & Takada T., Investor sentiment extracted from internet
stock message boards and IPO puzzles, International Review of Economics and Finance (2017), doi:
10.1016/j.iref.2017.10.025.

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to
our customers we are providing this early version of the manuscript. The manuscript will undergo
copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please
note that during the production process errors may be discovered which could affect the content, and all
legal disclaimers that apply to the journal pertain.

Investor Sentiment Extracted from Internet Stock Message Boards and IPO Puzzles

Yasutomo Tsukioka
School of Business Administration, Kwansei Gakuin University
Postal address: 1-155 Uegahara Ichiban-cho, Nishinomiya, Hyogo 662-8501, Japan
Phone: +81 7 9854 6340
E-mail: tsukioka@kwansei.ac.jp (corresponding author)

Junya Yanagi
Graduate School of Business, Osaka City University
Postal address: 3-3-138 Sugimoto, Sumiyoshi, Osaka 558-8585, Japan
Phone: +81 - [0]6 - 6605 - 2226
E-mail: sweetzunya@gmail.com

Teruko Takada
Graduate School of Business, Osaka City University
Postal address: 3-3-138 Sugimoto, Sumiyoshi, Osaka 558-8585, Japan
Phone: +81 - [0]6 - 6605 - 2226
E-mail: takada@bus.osaka-cu.ac.jp

ABSTRACT

High initial returns and long-run underperformance of initial public offerings
(IPOs) represent puzzles in corporate finance. We investigate how investor
sentiment affects these puzzles. Each issuing firm’s pre-IPO investor sentiment
is extracted from Yahoo! Japan Finance message boards using text mining and
support vector machine classification. Message data on 654 Japanese IPOs from
2001-2010 are used to measure investor attention and sentiment. We find that
high investor attention and bullish investor sentiment positively affect IPO
offer prices and initial returns, leading to subsequent price declines. This
suggests that excessive investor optimism can partially explain the IPO puzzles.

Keywords: investor sentiment, message board, text mining, IPO, investor attention

JEL Classification: G02; G14

1. Introduction

Initial public offering (IPO) firms reportedly exhibit high initial returns and
long-term underperformance, phenomena known as “IPO puzzles.” According to
Jenkinson and Ljungqvist (2001), these IPO puzzles are caused by the following
factors based on rational models: asymmetric information, institutional
explanations, or ownership and control. Ritter and Welch (2002) indicate that
investor sentiment influences IPO puzzles. While researchers have recognized
the importance of studying the effect of investor sentiment on IPO puzzles,
investigations have been limited due to the scarcity of information on investor
sentiment relative to IPOs. Europe’s gray market, which is a private securities
market organized by securities companies, provides investors with opportunities
to trade IPO shares until the IPO date. Cornelli, Goldreich, and Ljungqvist
(2006) and Dorn (2009) capture investor sentiment using gray market prices and
illustrate how investor sentiment causes IPO puzzles. One problem with the gray
market is that its data are available only in Europe. Another problem is
that the gray market is an over-the-counter market, and it is thus difficult to
determine whether it directly reflects genuine investor sentiment.

We exploit standard technologies to extract investor sentiment directly and
examine the effect of investor sentiment on IPO puzzles, rather than using gray
market prices. Recent advances in information technology have provided us with
various Internet-based social media, such as stock message boards, blogs,
Twitter, and Facebook.¹ Further, increases in computing power and advances in
data-mining techniques have enabled the extraction of opinions or sentiment
from text data. The Yahoo! Finance message board is one of the most attractive
data sources for direct analyses of investor sentiment. Yahoo! Japan Finance
(YJF) message boards are particularly attractive among those from various
countries, as they have many features well suited to measuring investor
sentiment, including a unique forum in which investors can post messages before
IPOs. Moreover, each listing firm has only one thread and we can obtain and
observe posted messages in time-series order.

This study aims to clarify the effect of investor sentiment on IPO puzzles,
based on a sufficiently large text-based dataset for investor sentiment analysis,
by extracting data from YJF message boards. Substantial data can be derived
from YJF message boards, and it is difficult to exploit all the information by
manually classifying posted words. We therefore apply text-mining and support
vector machine (SVM) techniques to classify sentiment and fully exploit this
massive amount of data. The SVM technique is effective in this regard and can be
used to extract opinions from Japanese text data on Internet message boards and
newspapers (Maruyama, Umehara, Suwa, & Ota, 2008; Okada, Yamasaki,
Sakakibara, & Yamasaki, 2013).² This large amount of text information can
clarify the effects of investor sentiment on IPO puzzles, which have been
difficult to investigate through conventional approaches.

This study represents the first attempt in the IPO literature to extract investor
sentiment from Internet stock message boards. Additionally, while we limit our
data period only to the time before IPOs, the size of our dataset is comparable to
those of previous studies that used Internet message boards in non-IPO
literature. We collected 129,676 messages about 654 firms over a 10-year study
period. In comparison, Antweiler and Frank (2004) collected 1.5 million
messages about 45 firms during one year, and Das and Chen (2007) collected
145,110 messages about 24 firms over two months.

We calculate the indicator of bullish investor sentiment as follows: The
posted messages on the message board are first processed and cleaned
through text mining, and then classified into bullish, neutral, and bearish
messages by an SVM classifier to obtain a bullishness indicator, defined as the
percentage of bullish or bearish messages on each IPO firm’s thread. We also
use the number of messages as a proxy for the degree of investor attention.

Our findings are as follows: Investor bullishness and attention both positively
affect the offer prices and initial returns, while higher investor attention leads to
lower post-IPO stock returns. These findings indicate the possibility that
investors’ excessive optimism causes the IPO puzzles of high initial returns and
long-run underperformance. This is the first empirical explanation based on
Internet stock message boards of the role of investor sentiment regarding IPO
puzzles.

The rest of this paper is organized as follows: Section 2 describes the
characteristics of the YJF message boards and the Japanese IPO process. In
Section 3, we review the related literature and develop our hypotheses. Section
4 presents our research design. Section 5 describes the data and provides a
statistical summary. Section 6 presents our empirical results, and Section 7
concludes the paper.

2. Internet message boards and the Japanese IPO process

Internet social media have changed how information is delivered and
acquired, enabling investors to search for information regarding other investors’
opinions and actions. Internet message boards, in particular, have many
favorable characteristics for investors: they can use these boards to discuss
their opinions, as well as gather information about other investors’ opinions and
behaviors.

Among the alternatives, YJF message boards have many attractive
characteristics in terms of measuring each issuing firm’s pre-IPO investor
sentiment. First, YJF message boards are considered the best among the many
Internet stock message boards in Japan, in terms of their history and size. They
are the oldest Internet stock message boards in Japan, established in July 1998.
They are also the largest message boards in terms of the number of messages
posted.

Second, YJF message boards have two attractive features not observed in
other countries’ Yahoo! counterparts: 1) each listing firm has only one thread,
and 2) each thread posts messages in time-series order, and begins
approximately one month before the IPO. By contrast, on the United States’
Yahoo! Finance message boards, each listing firm has many threads, and most are
generated only after the IPO.

Table 1 illustrates an example of a firm thread. Message data comprise
the post number, date, time, poster ID, title, and comment.³ Since Yahoo!
Japan posts the first message by default, the second message is identified as
the first real message. The first message is typically posted before the filing
range is set or book-building starts. The last message of the day immediately
prior to the IPO is noted as the final message.

Figure 1 displays the timeline for the IPO and its distinct periods. The
Japanese IPO process has certain unique features. First, both individual and
institutional investors reveal their bid prices in the filing range to the underwriter
under the book-building process. Second, even if investor sentiment and
demand are strong in the book-building period, the filing range is not adjusted,
and the offer price is always determined within the filing range. The offer price
does not change once it has been set in the filing range. Third, aftermarket
trading begins almost one week after the offer price is set. Finally, in case of
excessive buying orders, the first aftermarket trade in Japan is not necessarily
executed on the IPO date. This is mainly due to the absence of market
makers, the lack of a daily trading limit, and the Itayose method, which allows
the first trading price to change until the bid and ask are matched. By contrast,
in the United States, the offer price can be set outside the filing range,
aftermarket trading starts one day after the offer price is set, and the first
aftermarket trade occurs on the IPO date. Therefore, for Japan, we consider two
periods: Phase 1, from the time at which the first message is posted until the end
of book-building; and Phase 2, from the end of book-building until just before
the IPO. The pre-IPO phase thus includes both Phases 1 and 2.

[Please insert Table 1]

[Please insert Fig. 1]

3. Related literature and hypothesis development

Investor sentiment is thought to influence the IPO puzzles of high initial
returns and low post-IPO stock returns (Ritter & Welch, 2002). A few studies
have investigated the relationship between investor sentiment and IPO puzzles,
and these can be divided into two groups depending on the data on which they
are based.

One approach uses book-building data, or gray market data that are only
available in Europe. Derrien (2005) indicates that investors’ demand in
book-building positively relates to pre-IPO market conditions. Further, he finds
that investor demand positively relates to offer prices and initial returns, and
negatively relates to post-IPO stock returns. Cornelli et al. (2006) and Dorn
(2009) measure investor sentiment using gray market prices in Europe to
investigate the causes of IPO puzzles. Cornelli et al. (2006) theoretically and
empirically demonstrate that offer prices and initial returns increase with the
gray market price, and that stock returns decrease with the gray market price.

The other approach measures investor demand or attention from other data
sources, which can be applied more generally, including to non-European
countries. Chan (2014) measures investor demand immediately after IPOs using the
Trading and Quote data from the first trading day. Da, Engelberg, and Gao (2011)
measure investor attention using Google search frequencies, and find that
investor attention positively relates to initial returns and negatively relates
to post-IPO stock returns.⁴ Jegadeesh and Wu (2013) present a new approach
concerning words’ tone in financial reports, and find that tone negatively
relates to initial returns.

We extract the pre-IPO investor sentiment from YJF message boards. Those
who post on Internet stock message boards are assumed to be small individual
investors; this assumption is consistent with that of Cornelli et al. (2006)
concerning gray market investors. Message posting on the YJF message boards
starts before the filing range is set.⁵ Managers and underwriters can monitor the
messages posted on these Internet stock message boards to update their
expectations concerning small investors’ demand and valuations. We consider
that offer prices may be affected by investor sentiment; in other words, the more
bullish the Phase 1 investor sentiment, the higher the offer price. This is
because both managers and investors would assume that small investors’
demand and valuations would be higher. This leads to our first empirical
hypothesis:

Hypothesis 1: Investor sentiment in Phase 1 positively relates to the offer price.

The effects of investor sentiment on book-building and offer price are limited
under the Japanese IPO process, in that the filing range is not adjusted even if
investor sentiment and demand are strong. Investors reveal their bid prices and
volumes during the book-building period, and most of the IPO firms’ offer prices
are determined at the filing range’s maximum point. Additionally, underwriters
have offsetting incentives. On the one hand, they have an incentive to set the
filing range at higher levels, because their fees depend on the money raised from
IPOs. On the other hand, they have an incentive to lower the filing range and
offer price to avoid the risk that the IPO stock will not sell out. Thus, the
offer price does not fully reflect investor sentiment.⁶ Therefore, the closing
price on the first trading day increases with bullish investor sentiment. In
other words, the more bullish the investor sentiment, the higher the initial
returns. Additionally, investor sentiment might change after book-building, in
Phase 2. This leads to our second empirical hypothesis:

Hypothesis 2: Investor sentiment in the pre-IPO phase or Phase 2 positively relates to initial returns.

Newly listed firms are found to have difficulty in maintaining high first-day
trading prices. Ritter (1991) and Loughran and Ritter (1997) find that IPO firms
exhibit long-run underperformance. Ritter and Welch (2002) and Ljungqvist,
Nanda, and Singh (2006) conjecture that excessive optimism leads to high
first-day returns and eventually to stock price reversal, which causes long-run
underperformance. We assume that high trading prices pushed up by investor
sentiment prior to the IPO will converge to the fundamental value following the
IPO. This leads to our third hypothesis:

Hypothesis 3: Investor sentiment in the pre-IPO phase or Phase 2 negatively relates to post-IPO stock performance.



4. Methodology

4.1 Measurement of investor sentiment


The message data collected from the YJF message boards are classified into
three investor sentiment categories (bullish, bearish, or neutral) through three
text-mining and SVM classification steps: 1) the collection and preprocessing of
raw message data, 2) the preparation of training data for SVM classification
based on part of the preprocessed data, and 3) sentiment classification through
text-mining and SVM.

The first step involves the collection and preprocessing of raw message data.
We downloaded these raw message data from the YJF message boards using a
Perl program written by the authors. The raw message data include the posting
date and time, title, and comment. We removed noise by deleting all punctuation
marks and non-Japanese text characters. Training data sets should preferably
contain as much relevant information as possible, and noise elimination is
reported to significantly improve the results (Li & Shi, 2002; Yi, Liu, & Li,
2003).
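
The noise-removal step can be illustrated with a minimal Python sketch. It is not the authors' Perl program: the function name `clean_message` and the exact Unicode ranges (hiragana, katakana, and the basic CJK ideograph block) are assumptions made for illustration, since the text does not specify which characters count as Japanese.

```python
import re

# Assumption: "Japanese text characters" are hiragana (U+3041-U+3096),
# katakana (U+30A1-U+30FA plus the long-vowel mark U+30FC), and the basic
# CJK ideograph block (U+4E00-U+9FFF). Everything else is treated as noise.
NON_JAPANESE = re.compile(r"[^\u3041-\u3096\u30A1-\u30FA\u30FC\u4E00-\u9FFF]")


def clean_message(text: str) -> str:
    """Delete punctuation, whitespace, digits, and non-Japanese letters."""
    return NON_JAPANESE.sub("", text)


if __name__ == "__main__":
    raw = "明日は買い!! IPO 2010 (笑)"  # hypothetical post
    print(clean_message(raw))  # prints 明日は買い笑
```

Under these assumed ranges, Latin tickers and digits are dropped along with punctuation, which matches the description above but may be stricter than what the authors actually did.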

We prepare the training data sets and construct the SVM in the second step.
First, we randomly select 1,000 messages from the collected message data in
each of Phases 1 and 2.⁷ Each collection of 1,000 messages forms a training
data set. As our message data are not assigned “positive” or “negative”
tags, we manually classify the training data into bullish, bearish, or neutral
messages following Antweiler and Frank (2004) and Das and Chen (2007).
Those messages that represent optimism about future firm performance and
market conditions are classified as bullish, while those that represent
pessimistic opinions are classified as bearish. The messages that represent no
clear sentiment about future firm performance and market conditions are
classified as neutral. Subsequently, the keywords that significantly relate to
bullish or bearish sentiment, such as “sell,” “buy,” and “high,” are selected from
the classified messages. Phase 1 includes 260 selected keywords, and Phase 2
includes 255. The targeted parts of speech are restricted to nouns, verbs, and
adjectives, and include those used for negation. We use the SVM to classify
messages as bullish, bearish, or neutral based on the combination and
frequency of keywords in the message data.

Table 2 lists the 10 most frequently appearing keywords in bullish
and bearish messages, from the training data in Phases 1 and 2.⁸ The
frequencies of these keywords play an important role in SVM classification, as
the SVM decision function classifies the messages based on the frequencies of
keywords in each message. The combination of keywords also matters in the SVM
decision function. The meaning and order of the listed
words imply the central concerns of many messages, by sentiment and phase.
The negation word for bearish messages in both phases ranked higher than the
word “buy,” suggesting an unwillingness to buy. This is not the case in bullish
messages, in which the word “buy” ranked third in Phase 1, with the negation
word not being ranked, and preceded the negation word in Phase 2, suggesting
a willingness to buy. A change in sentiment can be observed from Phases 1 to 2.
The term meaning “first trading price below the offer price” appears in bearish
messages in both phases, and its ranking increases, from tenth in Phase 1 to
second in Phase 2. This implies increased concerns regarding the risk of the first
trading price falling below the offer price.

[Please insert Table 2]


In the third step, the constructed SVM classifies all the messages.
First, all preprocessed messages are parsed using RMeCab, with 167 words
added to a special dictionary to exclude troublesome or noisy information. Mori
and Neubig (2014) demonstrate that adding words to a dictionary improves the
tokenizer’s accuracy. As Japanese sentences do not include spaces between
words, a tokenizer is necessary to parse the text prior to the text analysis. We
use RMeCab, an R package for Japanese morphological analyses, to split the
message data into the parts of speech. We then convert all messages into a
keyword-frequency matrix. The SVM classifies all the messages into three
sentiment classes by learning the keywords’ frequencies.
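
The conversion of tokenized messages into a keyword-frequency matrix can be sketched as follows. This is an illustration in Python rather than the authors' R code; it assumes the messages have already been tokenized (the paper uses RMeCab for that step), and the function name, the English stand-in tokens, and the four-keyword list are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def keyword_frequency_matrix(
    tokenized_messages: Sequence[Sequence[str]],
    keywords: Sequence[str],
) -> List[List[int]]:
    """Return one row per message and one column per selected keyword."""
    matrix = []
    for tokens in tokenized_messages:
        counts = Counter(tokens)
        # Counter returns 0 for keywords that do not occur in the message.
        matrix.append([counts[word] for word in keywords])
    return matrix


if __name__ == "__main__":
    # Hypothetical, already-tokenized messages (English stand-ins).
    messages = [
        ["buy", "high", "buy"],
        ["sell", "not", "buy"],
        ["listing", "date"],
    ]
    keywords = ["buy", "sell", "high", "not"]
    for row in keyword_frequency_matrix(messages, keywords):
        print(row)
    # [2, 0, 1, 0]
    # [1, 1, 0, 1]
    # [0, 0, 0, 0]
```

Each row is the feature vector that a classifier would receive for one message; a message with no selected keywords yields an all-zero row.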


The SVM was developed by Vapnik (1995) and has been applied in various
research fields, where it has shown excellent classification performance. Beyond
the IPO-related literature, several works extract investor sentiment from
Internet stock message boards or news articles, such as those by Antweiler and
Frank (2004), Maruyama et al. (2008), and Okada et al. (2013). These works use
SVM in text classification.

The basic idea of SVM is to project the input data to a high-dimensional
feature space using an implicit mapping, $\Phi$, often referred to as the “kernel
trick,” to construct the maximum-margin separating hyperplane among the
different data classes. Let $k$ be the number of classes. We are given $N$
training data $(x_1, y_1), \ldots, (x_N, y_N)$, where $y_i \in \{1, \ldots, k\}$
is the class label of $x_i$ for $i = 1, \ldots, N$. In our problem, $x_i$ is the
vector of keyword frequencies in each message, and the class label $y_i$ is from
the set {bull, neutral, bear}.

We conduct three-class SVM classification using the multi-class
classification method proposed by Crammer and Singer (2002), which solves the
following single optimization problem given the data from all classes:

\[
\min_{\{w_n\},\,\xi}\ \frac{1}{2}\sum_{n=1}^{k}\lVert w_n\rVert^{2} + \frac{C}{N}\sum_{i=1}^{N}\xi_i
\]
\[
\text{subject to}\quad \langle \Phi(x_i), w_{y_i}\rangle - \langle \Phi(x_i), w_n\rangle \ge b_i^{n} - \xi_i \quad (i = 1, \ldots, N),
\]
\[
b_i^{n} = 1 - \delta_{y_i,n},
\]

where $\xi_i$ is the slack variable, and $C$ controls the slack variable penalty
and misclassification. Then the decision function is

\[
\operatorname*{argmax}_{n = 1, \ldots, k}\ \langle \Phi(x_i), w_n\rangle .
\]

The SVM is implemented with the ksvm function from the kernlab package
(Karatzoglou, Smola, Hornik, & Zeileis, 2004) in R version 3.1.0. A Gaussian
radial basis function kernel is used, $K(x_i, x_j) = \exp(-\sigma \lVert x_i - x_j \rVert^2)$,
and $\sigma$ is set based on the heuristic described by Caputo, Sim, Furesjo, and
Smola (2002). The cost parameter $C$ is set to one.
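
To make the kernel and the decision rule concrete, the following Python sketch evaluates the Gaussian radial basis function kernel for two keyword-frequency vectors and picks the class with the largest score. It does not train an SVM and is not the kernlab implementation; the function names, the value of sigma, and the per-class scores are hypothetical placeholders.

```python
import math
from typing import Dict, Sequence


def rbf_kernel(x: Sequence[float], z: Sequence[float], sigma: float) -> float:
    """Gaussian RBF kernel: K(x, z) = exp(-sigma * ||x - z||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sigma * squared_distance)


def predict_class(scores: Dict[str, float]) -> str:
    """Argmax decision rule: return the class label with the largest score."""
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Two hypothetical keyword-frequency vectors; squared distance is 3.
    print(round(rbf_kernel([2, 0, 1], [1, 1, 0], sigma=0.5), 4))  # 0.2231
    # Hypothetical per-class scores for one message.
    print(predict_class({"bull": 0.4, "neutral": 0.1, "bear": -0.2}))  # bull
```

The kernel equals one for identical vectors and decays toward zero as the keyword profiles of two messages diverge, with sigma controlling how quickly.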

Table 3 compares the SVM and manual classification results from Phases 1
and 2. The training data sets exhibit a very small prediction error.

[Please insert Table 3]

We calculate three proxy variables for investor sentiment, following the
works of Antweiler and Frank (2004) and Maruyama et al. (2008). First, the
natural logarithm of the number of messages, Ln (Number of messages), is
considered a proxy for investor attention. As the number of messages increases,
IPO firms are more likely to attract attention and become more familiar to
investors.

Second, the Bullishness index is defined as follows:

\[
\text{Bullishness index}_i = \ln\left[\frac{1 + \text{the number of bullish messages}_i}{1 + \text{the number of bearish messages}_i}\right] \tag{1}
\]

We assume that a positive (or negative) Bullishness index reflects bullish (or
bearish) investor sentiment.

Third, the Agreement index is defined as follows:

\[
\text{Agreement index}_i = \frac{\left|\text{the number of bullish messages}_i - \text{the number of bearish messages}_i\right|}{\text{the number of bullish messages}_i + \text{the number of bearish messages}_i} \tag{2}
\]

The Agreement index ranges from zero to one. When the Agreement index is
close to one (or zero), the investor sentiment variance is small (or large).
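
The three proxies can be computed directly from message counts, as in the following Python sketch. The function names are hypothetical, and returning `None` when a thread has no bullish or bearish messages is an assumption: Equation (2) is undefined in that case and the text does not say how it is handled.

```python
import math
from typing import Optional


def attention(n_messages: int) -> float:
    """Ln (Number of messages); requires at least one message."""
    return math.log(n_messages)


def bullishness_index(n_bullish: int, n_bearish: int) -> float:
    """Equation (1): ln[(1 + bullish) / (1 + bearish)]."""
    return math.log((1 + n_bullish) / (1 + n_bearish))


def agreement_index(n_bullish: int, n_bearish: int) -> Optional[float]:
    """Equation (2): |bullish - bearish| / (bullish + bearish)."""
    total = n_bullish + n_bearish
    if total == 0:
        return None  # Assumption: undefined when no message carries sentiment.
    return abs(n_bullish - n_bearish) / total


if __name__ == "__main__":
    # Hypothetical thread with 30 bullish and 10 bearish messages.
    print(round(bullishness_index(30, 10), 3))  # 1.036
    print(agreement_index(30, 10))              # 0.5
```

With equal numbers of bullish and bearish messages, the Bullishness index is zero and the Agreement index is zero, the case of maximum disagreement.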

4.2 Empirical analysis methods

The previous subsection discussed the measurement of investor sentiment.
This subsection explains the regression models designed to test the three
hypotheses regarding the relationship between investor sentiment and IPO
puzzles.
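
We investigate Hypothesis 1 by estimating the following logistic regression
model:

\[
\begin{aligned}
\text{Price revision dummy}_i ={}& \alpha_0 + \beta_1 IS_{i,\text{Phase 1}} + \beta_2\,\text{Market condition}_i + \beta_3 SSR_i \\
&+ \beta_4 AST_i + \beta_5 DEBT_i + \beta_6 AGE_i + \beta_7\,\text{VC backed dummy}_i \\
&+ \beta_8\,\text{Underwriter dummy}_i + \beta_9\,\text{Gross proceeds}_i + \varepsilon_i
\end{aligned}
\tag{3}
\]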

The dependent variable in Equation (3) is the Price revision dummy, which
equals one if the offer price is determined at the filing range’s maximum point,
and zero otherwise. The explanatory variables of interest are the proxy variables
for investor sentiment (IS). Equation (3) substitutes Ln (Number of messages),
Bullishness index, and Agreement index for IS. Other effects on Price revision
are controlled by including the following variables in the equation. Market
condition is the buy and hold return (BHR) in the reference portfolio over the
60-day period preceding the IPOs. We capture industry trends by constructing a
reference portfolio based on the NIKKEI medium-classification industry codes.
Each reference portfolio includes all firms within the same industry code as the
IPO firms. Further, BHR is calculated as a value-weighted return, and firms in
the reference portfolio must have been listed for at least three years to
restrict any new-listing bias. Ljungqvist and Wilhelm (2003) argue that the
secondary sales of managers and venture capital positively relate to the offer
price. We include secondary sales shares divided by outstanding shares (SSR).
AST is the natural logarithm of the total assets and DEBT is the ratio of debt to
total assets. AGE is the natural logarithm of firm age, which is defined as the
number of years between the founding and IPO dates. The VC backed dummy
represents venture capital backed IPOs and assumes a value of one if true, and
zero otherwise.⁹ The Underwriter dummy equals one if the lead underwriter
includes one of the big three Japanese securities companies, “Nomura,” “Daiwa,”
or “Nikko,” and zero otherwise.¹⁰ The Gross proceeds is the natural logarithm of
the offering price multiplied by the number of shares offered. All of our models
include industry, year, and market dummies.¹¹ Our equations do not include a
control variable for news articles, as we believe news has only a slight effect on
our results. Most firms are not covered by newspapers (Fang & Peress, 2009).
Among the newspaper articles that do cover IPOs, some report only factual
information, such as the IPO date.

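We test Hypothesis 2 using the following OLS regression model:

\[
\begin{aligned}
\text{Initial return}_i ={}& \alpha_0 + \beta_1 IS_{i,\text{Pre-IPO phase or Phase 2}} + \beta_2\,\text{Price revision}_i \\
&+ \beta_3\,\text{Market condition}_i + \beta_4 SSR_i + \beta_5 AST_i + \beta_6 DEBT_i + \beta_7 AGE_i \\
&+ \beta_8\,\text{VC backed dummy}_i + \beta_9\,\text{Underwriter dummy}_i \\
&+ \beta_{10}\,\text{Gross proceeds}_i + \varepsilon_i
\end{aligned}
\tag{4}
\]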

In Equation (4), the dependent variable is the Initial return. Let $P_0$ denote
the closing price on the first trading day, and $P_{offer}$ be the offer price;
the Initial return is defined as $(P_0 / P_{offer}) - 1$. We substitute Ln (Number
of messages), Bullishness index, and Agreement index for IS for each period (the
pre-IPO phase or Phase 2) in Equation (4). The control variables in Equation (4)
are the same as those in Equation (3).¹² Hanley (1993) demonstrates that Price
revision, which is the offer price divided by the midpoint of the filing range
minus one, positively associates with Initial return. Thus, we add Price revision
to Equation (4).
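
The price-based variables defined so far reduce to simple ratios, sketched below in Python. The function names and the sample prices are hypothetical; the formulas follow the definitions of Initial return, Price revision, and the Price revision dummy given in the text.

```python
def initial_return(first_day_close: float, offer_price: float) -> float:
    """Initial return = (P_0 / P_offer) - 1."""
    return first_day_close / offer_price - 1.0


def price_revision(offer_price: float, range_low: float, range_high: float) -> float:
    """Offer price divided by the midpoint of the filing range, minus one."""
    midpoint = (range_low + range_high) / 2.0
    return offer_price / midpoint - 1.0


def price_revision_dummy(offer_price: float, range_high: float) -> int:
    """One if the offer price is set at the top of the filing range, else zero."""
    return 1 if offer_price == range_high else 0


if __name__ == "__main__":
    # Hypothetical IPO: filing range 900-1,100 yen, offer 1,100, first close 1,650.
    print(round(price_revision(1100, 900, 1100), 4))  # 0.1
    print(price_revision_dummy(1100, 1100))           # 1
    print(initial_return(1650, 1100))                 # 0.5
```

Because the Japanese offer price never leaves the filing range, Price revision in this setting is bounded by the range's half-width relative to its midpoint.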
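
Hypothesis 3 is tested by estimating the following OLS regression model:

\[
\begin{aligned}
BHAR_i ={}& \alpha_0 + \beta_1 IS_{i,\text{Pre-IPO phase or Phase 2}} + \beta_2 AST_i + \beta_3 DEBT_i + \beta_4 AGE_i \\
&+ \beta_5\,\text{VC backed dummy}_i + \beta_6\,\text{Underwriter dummy}_i \\
&+ \beta_7\,\text{Gross proceeds}_i + \varepsilon_i
\end{aligned}
\tag{5}
\]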

The dependent variable in Equation (5) is the buy and hold abnormal return
(BHAR), defined as the BHR of issuing firms either 250 or 500 days after the IPO
minus those of a control firm. We match an IPO firm to a control firm in the same
industry and with the closest market value at 21 days after the IPO date.¹³ When
the control firm is delisted, the TOPIX return is spliced in. Equation (5)
substitutes Ln (Number of messages), Bullishness index, and Agreement index for
IS. We use the control variables AST, DEBT, and AGE.
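
The BHAR calculation can be sketched in Python as follows. The text does not spell out the BHR formula, so the standard compounding of daily returns is assumed here; the function names and the two-day return series are hypothetical, and the splicing of TOPIX returns for delisted control firms is not modeled.

```python
from typing import Sequence


def buy_and_hold_return(daily_returns: Sequence[float]) -> float:
    """Assumed definition: BHR = prod(1 + r_t) - 1 over the holding period."""
    growth = 1.0
    for r in daily_returns:
        growth *= 1.0 + r
    return growth - 1.0


def bhar(ipo_returns: Sequence[float], control_returns: Sequence[float]) -> float:
    """BHAR = BHR of the IPO firm minus BHR of the matched control firm."""
    return buy_and_hold_return(ipo_returns) - buy_and_hold_return(control_returns)


if __name__ == "__main__":
    # Hypothetical two-day return series; the paper uses 250 or 500 days.
    ipo = [0.10, -0.50]      # BHR = 1.1 * 0.5 - 1 = -0.45
    control = [0.00, 0.10]   # BHR = 1.0 * 1.1 - 1 =  0.10
    print(round(bhar(ipo, control), 4))  # -0.55
```

A negative BHAR means the IPO firm underperformed its matched control firm over the holding period.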

5. Data and sample statistics

5.1 Sample selection

We collect an initial list and the details of IPO firms from “Trader’s Web,”
“eol,” and “Japanese IPO White papers.” Message data are obtained from the
YJF message boards for the 2001-2010 period. Our IPO firms’ financial data and
stock prices are sourced from Nikkei NEEDS Financial QUEST.

For our final sample, we selected 654 IPO firms that satisfy the following
requirements: 1) the availability of messages posted before the filing range has
been set, 2) no IPO suspension or cancelation, 3) the availability of financial and
stock price data, and 4) exclusion of firms in the finance sector. Our final
sample includes 129,676 messages after those including URLs are removed to avoid
spam messages.

5.2 Descriptive statistics

Panel A in Table 4 provides summary statistics for the IPO firms’
characteristics. The offer prices are higher than the filing range’s midpoint, and
86% of the issuing firms have selected the filing range’s maximum price as the
offer price. The median of initial returns is 32.88%. The median of BHAR for 250
days after IPOs is −18.65%, and 63% of the IPO firms have negative returns.

Panel B in Table 4 presents descriptive statistics for investor sentiment in
each period: the pre-IPO phase, Phase 1, and Phase 2. An average of 198
messages were posted in the pre-IPO phase, of which 18.3 are bullish and 17.8
are bearish. The Bullishness index is positive in both the pre-IPO phase and
Phase 2. A positive Bullishness index is consistent with Zhang and Swanson’s
(2010) argument, in that the messages on Internet stock message boards are
likely to be bullish, as shareholders tend to post bullish messages. However, in
Phase 1, the Bullishness index is negative, and the Agreement index is greater
than that in Phase 2. This result might be partially derived from the small number
of messages in Phase 1, being less than one-third of those in Phase 2.¹⁴

Table 5 reports the variables’ Pearson and Spearman correlations.
Although there is a significant and positive correlation between AST and both
DEBT and AGE, removing them does not alter the results, so we retain these
variables in our equations. The correlation of the number of messages between
Phases 1 and 2 is greater than 0.6. Thus, investor attention changes in the
same direction and is stable before IPOs. By contrast, the correlation of
Bullishness between Phases 1 and 2 is low and not significant; investor
bullishness differs between Phases 1 and 2. This result implies that investors
focus on the offer price to be set and their own allocated shares in Phase 1,
and are then concerned with the first trading price in Phase 2.

[Please insert Table 4]

6. Empirical results

This section tests our three hypotheses. Hypothesis 1 predicts a positive
relationship between investor sentiment in Phase 1 and the Price revision
dummy, and we test it by estimating the logistic regression in Equation (3).
Table 6 reports the results, in which we use IS in Phase 1. 15 Column (1) shows
that the coefficient of Ln (Number of messages) is positive and statistically
significant, which suggests that investor attention positively relates to price
revision. Column (2) shows that the coefficient of the Bullishness index is
positive and statistically significant at the 10% level, which suggests that
bullish investor sentiment positively relates to price revision. In addition,
column (4) shows that the coefficients of Ln (Number of messages) and the
Bullishness index are both positive and significant, which suggests that both
investor attention and bullish investor sentiment positively affect price
revision. These results support Hypothesis 1 and imply that investor attention
and bullish investor sentiment motivate the issuing firm and underwriter to set
higher offer prices. Additionally, we find a significant positive relationship
between the Price revision dummy and Market condition.
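
For readers who want to gauge the size of the logistic coefficients, the short Python sketch below builds the Price revision dummy as described in the notes to Table 6 and converts a logit coefficient into an odds ratio. It assumes that Table 6 reports coefficients on the log-odds scale; the function names and the example prices are hypothetical.

```python
import math


def price_revision_dummy(offer_price: float, range_max: float) -> int:
    """1 if the offer price is set at the filing range's maximum, else 0."""
    return 1 if math.isclose(offer_price, range_max) else 0


def odds_ratio(logit_coefficient: float) -> float:
    """Multiplicative change in the odds for a one-unit rise in the regressor,
    holding the other regressors fixed (assumes a log-odds coefficient)."""
    return math.exp(logit_coefficient)


if __name__ == "__main__":
    # Hypothetical filing range with a maximum of 1,200 yen.
    print(price_revision_dummy(1200.0, 1200.0))
    print(price_revision_dummy(1100.0, 1200.0))
    # Coefficients from columns (1) and (2) of Table 6.
    print(round(odds_ratio(0.868), 2))
    print(round(odds_ratio(0.384), 2))
```

Read this way, and only under the log-odds assumption, a one-unit increase in Ln (Number of messages) would correspond to roughly 2.4 times higher odds of pricing at the top of the filing range.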


Hypothesis 2, which predicts that investor sentiment positively relates to
initial returns, is tested using the regression model in Equation (4). 16 The
results are reported in Table 7. Panel A uses the pre-IPO phase’s sentiment
variables as IS, and Panel B uses the Phase 2 sentiment variables. Columns (1)
and (4) of both Panels A and B show a positive and statistically significant
relationship between Ln (Number of messages) and Initial return, which suggests
that investor attention leads to high initial returns. Columns (2) and (4) of
both Panels A and B reveal that the Bullishness index has a positive and
statistically significant relationship with Initial return, which suggests that
bullish investor sentiment prior to the IPO contributes to high initial returns.
Column (3) of both Panels A and B shows that the Agreement index is
significantly negatively related to Initial return; however, in column (4) of
both panels, the coefficient of the Agreement index is positive and not
statistically significant. In addition, when we estimate the regression in
Equation (4) using a sub-sample in which the number of classified bull / bear
messages is above the median, we find that the coefficient of the Agreement
index is positive; the effect of the Agreement index on Initial return is
therefore mixed. 17 These results are consistent with Hypothesis 2 and suggest
that investor attention and bullish investor sentiment prior to the IPO
increase the first trading price.
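
The following Python sketch illustrates the two ingredients behind Table 7: the Initial return as defined in the table notes, and a heteroscedasticity-consistent t-statistic in the spirit of White (1980). To stay short it uses a single regressor with an intercept and the simplest (HC0) variant, which is an assumption; the paper's Equation (4) includes many controls, and the data below are hypothetical.

```python
def initial_return(first_day_close: float, offer_price: float) -> float:
    """Closing price on the first trading day divided by the offer price, minus one."""
    return first_day_close / offer_price - 1.0


def ols_with_white_se(x, y):
    """Simple regression y = a + b*x with an HC0 (White) standard error for b.

    Returns (slope, intercept, robust_se, robust_t).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # HC0 sandwich variance of the slope: sum((x - mean)^2 * e^2) / Sxx^2.
    hc0_var = sum(((xi - mean_x) ** 2) * (e ** 2)
                  for xi, e in zip(x, residuals)) / sxx ** 2
    robust_se = hc0_var ** 0.5
    robust_t = slope / robust_se if robust_se > 0 else float("inf")
    return slope, intercept, robust_se, robust_t


if __name__ == "__main__":
    # Hypothetical IPO: offer price 1,000, first-day close 1,500.
    print(initial_return(1500.0, 1000.0))
    # Hypothetical attention measure (x) and initial returns in percent (y).
    x = [1.0, 2.0, 3.0, 4.0]
    y = [2.1, 3.9, 6.2, 7.8]
    print(ols_with_white_se(x, y))
```

The robust standard error leaves the coefficient estimate unchanged and only adjusts its sampling variance, which is why the tables can report ordinary OLS coefficients alongside White-adjusted t-statistics.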

Table 8 reports the results of the OLS regression in Equation (5), which tests
Hypothesis 3. This hypothesis predicts a negative relationship between investor
sentiment and post-IPO stock returns. In Equation (5), the dependent variable
is BHAR, which is computed as the IPO firm’s BHR over 250 or 500 days after the
IPO minus the BHR of a control firm in the same industry and with the closest
market value at 21 days after the IPO date. 18 Panel A in Table 8 presents the
results of Equation (5) when the pre-IPO phase’s sentiment variables are used
for IS. Columns (1), (4), (5), and (8) of Panel A reveal that the coefficient
of Ln (Number of messages) is negative and significant for both horizons: 250
and 500 days after the IPO. Similarly, columns (1), (4), (5), and (8) of Panel B
in Table 8, in which the Phase 2 sentiment variables are used as IS, show
negative coefficients of Ln (Number of messages) for the BHAR with 250-day and
500-day horizons, although these coefficients are not statistically
significant. 19 These results support Hypothesis 3 and suggest that investor
attention leads to low post-IPO stock returns. Additionally, columns (2) and
(4) of both Panels A and B in Table 8 indicate a positive relationship between
the Bullishness index and the BHAR with a 250-day horizon. Columns (6) and (8)
of Panel B show that the coefficient of the Bullishness index is positive,
whereas columns (6) and (8) of Panel A show that it is negative. These results
do not consistently support Hypothesis 3, but suggest that investor bullishness
partially predicts post-IPO stock returns. Adding Initial return to the model
yields a positive and statistically significant relationship between Initial
return and the BHAR in both periods. These results suggest that post-IPO
underperformance is caused by investor attention prior to the IPO and by the
first trading day’s high closing price, which is influenced by investor
sentiment prior to the IPO.
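
The BHAR construction described in this section can be sketched in a few lines of Python: compound each firm's daily returns into a buy-and-hold return (BHR) and subtract the control firm's BHR from the IPO firm's. The function names, the daily-return inputs, and the assumption that both series cover the same holding window are illustrative only.

```python
def buy_and_hold_return(daily_returns):
    """Compound a sequence of simple daily returns into one holding-period return."""
    growth = 1.0
    for r in daily_returns:
        growth *= 1.0 + r
    return growth - 1.0


def bhar(ipo_daily_returns, control_daily_returns):
    """Buy-and-hold abnormal return: IPO firm's BHR minus the control firm's BHR."""
    return (buy_and_hold_return(ipo_daily_returns)
            - buy_and_hold_return(control_daily_returns))


if __name__ == "__main__":
    # Hypothetical two-day window; the paper uses 250- and 500-day horizons.
    ipo = [0.10, -0.10]
    control = [0.02, 0.01]
    print(round(buy_and_hold_return(ipo), 4))
    print(round(bhar(ipo, control), 4))
```

Compounding matters here: a 10% gain followed by a 10% loss leaves a buy-and-hold investor slightly below the starting value, so summing daily returns would overstate performance over long horizons.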

[Please insert Table 5]

[Please insert Table 6]

[Please insert Table 7]

[Please insert Table 8]

7. Conclusion

This study investigated whether pre-IPO investor sentiment on Internet stock
message boards relates to the IPO puzzles of high initial returns and low
post-IPO stock returns. Investor sentiment for each issuing firm was measured
from YJF message boards by using text-mining techniques and SVM.


We found the following three relationships. First, investor attention and
sentiment positively relate to the likelihood that IPO firms set their offer
price at the filing range’s maximum point. Second, high investor attention and
bullish sentiment relate to higher initial returns. Finally, investor attention
negatively relates to post-IPO stock returns.

These findings suggest that investors’ excessive attention and bullishness
lead issuers to set the offer price at the filing range’s maximum point and
raise the first trading price, and that the trading price pushed up by pre-IPO
investor sentiment then falls. Our evidence is the first empirical
demonstration of this relationship between investor sentiment on Internet stock
message boards and IPO puzzles. Our results suggest that the extent of initial
returns and post-IPO stock performance can be predicted from pre-IPO sentiment.

This study focused on the static relationship between investor sentiment and
stock price behavior after IPOs; extending this analysis dynamically is a
direction for future research. Takada and Kitajima (2014) demonstrated that the
SVM classification of price trends significantly outperformed the buy-and-hold
strategy. Combining this approach with continuously measured investor sentiment
could detect dynamic changes in the relationship between investor sentiment and
price trends after IPOs.

References
Antweiler, W., & Frank, M. Z. (2004). Is all that talk just noise? The
information content of Internet stock message boards. Journal of Finance,
59(3), 1259-1294.

Barber, B. M., & Lyon, J. D. (1997). Detecting long-run abnormal stock returns:
The empirical power and specification of test statistics. Journal of Financial
Economics, 43(3), 341-372.

Benveniste, L. M., & Spindt, P. A. (1989). How investment bankers determine the
offer price and allocation of new issues. Journal of Financial Economics,
24(2), 343-361.

Bollen, J., Mao, H., & Zeng, X. (2011). Twitter mood predicts the stock market.
Journal of Computational Science, 2(1), 1-8.

Brav, A., & Gompers, P. A. (1997). Myth or reality? The long-run
underperformance of initial public offerings: evidence from venture and
nonventure capital-backed companies. Journal of Finance, 52(5), 1791-1821.

Caputo, B., Sim, K., Furesjo, F., & Smola, A. (2002). Appearance-based object
recognition using SVMs: Which kernel should I use? Proc of NIPS Workshop on
Statistical Methods for Computational Experiments in Visual Processing and
Computer Vision.

Carter, R. B., Dark, F. H., & Singh, A. K. (1998). Underwriter reputation,
initial returns, and the long-run performance of IPO stocks. Journal of
Finance, 53(1), 285-311.

Chan, Y. (2014). How does retail sentiment affect IPO returns? Evidence from
the Internet bubble period. International Review of Economics and Finance, 29,
235-248.

Cornelli, F., Goldreich, D., & Ljungqvist, A. (2006). Investor sentiment and
pre-IPO markets. Journal of Finance, 61(3), 1187-1216.

Crammer, K., & Singer, Y. (2002). On the learnability and design of output
codes for multiclass problems. Machine Learning, 47(2-3), 201-233.

Da, Z., Engelberg, J., & Gao, P. (2011). In search of attention. Journal of
Finance, 66(5), 1461-1499.

Das, S. R., & Chen, M. Y. (2007). Yahoo! for Amazon: Sentiment extraction from
small talk on the web. Management Science, 53(9), 1375-1388.

Derrien, F. (2005). IPO pricing in “hot” market conditions: Who leaves money on
the table? Journal of Finance, 60(1), 487-521.

Dorn, D. (2009). Does sentiment drive the retail demand for IPOs? Journal of
Financial and Quantitative Analysis, 44(1), 85-108.

Fang, L., & Peress, J. (2009). Media coverage and the cross-section of stock
returns. Journal of Finance, 64(5), 2023-2052.

Hanley, K. W. (1993). The underpricing of initial public offerings and the
partial adjustment phenomenon. Journal of Financial Economics, 34(2), 231-250.

Jegadeesh, N., & Wu, D. (2013). Word power: A new approach for content
analysis. Journal of Financial Economics, 110(3), 712-729.

Jenkinson, T., & Ljungqvist, A. (2001). Going public: The theory and evidence
on how companies raise equity finance. New York: Oxford University Press.

Karatzoglou, A., Smola, A., Hornik, K., & Zeileis, A. (2004). Kernlab - An S4
package for kernel methods in R. Journal of Statistical Software, 11(9), 1-20.

Li, X., & Shi, Z. (2002). Innovating web page classification through reducing
noise. Journal of Computer Science and Technology, 17(1), 9-17.

Ljungqvist, A., & Wilhelm, W. J. (2003). IPO pricing in the dot-com bubble.
Journal of Finance, 58(2), 723-752.

Ljungqvist, A., Nanda, V., & Singh, R. (2006). Hot markets, investor sentiment,
and IPO pricing. Journal of Business, 79(4), 1667-1702.


Loughran, T., & Ritter, J. R. (1997). The operating performance of firms
conducting seasoned equity offerings. Journal of Finance, 52(5), 1823-1850.

Maruyama, K., Umehara, E., Suwa, H., & Ota, T. (2008). Relationship between
Internet message board content and stock markets. Securities Analysts Journal,
46, 110-127.

Mori, S., & Neubig, G. (2014). Language resource addition: dictionary or
corpus? Proceedings of the Ninth International Conference on Language Resource
and Evaluation.

Okada, K., Yamasaki, T., Sakakibara, S., & Yamasaki, T. (2013). Stock market
seasonality and investor sentiment -a text mining approach-. Securities
Analysts Journal, 51, 96-105.

Ritter, J. R. (1991). The long-run performance of initial public offerings.
Journal of Finance, 46(1), 3-27.

Ritter, J. R., & Welch, I. (2002). A review of IPO activity, pricing, and
allocations. Journal of Finance, 57(4), 1795-1828.

Takada, T., & Kitajima, T. (2014). Phase classification of stock markets by
support vector machine. OCU-GSB Working Paper No. 201408.

Vapnik, V. N. (1995). The nature of statistical learning theory. New York:
Springer-Verlag.

White, H. (1980). A heteroskedasticity-consistent covariance matrix estimator
and a direct test for heteroskedasticity. Econometrica, 48(4), 817-838.

Yi, L., Liu, B., & Li, X. (2003). Eliminating noisy information in web pages
for data mining. Proceedings of the Ninth ACM SIGKDD International Conference
on Knowledge Discovery and Data Mining.

Zhang, Y., & Swanson, P. E. (2010). Are day traders bias free? Evidence from
Internet stock message boards. Journal of Economics and Finance, 34(1), 96-112.

Acknowledgements

We are grateful to the Editor, Professor Carl R. Chen, and two anonymous
referees for their helpful comments. We thank participants of Finance Camp
2014 (Japan Financial Association) for useful suggestions on an earlier version
of this paper. This work was supported by JSPS KAKENHI, Grant Number
26885065 for the first author, and JSPS KAKENHI, Grant Number 23330108 for
the last author.

1
Bollen, Mao, and Zeng (2011) measure public mood from tweets to predict the stock
market. However, Twitter does not separate the measurement of investor sentiment for
each firm.
2
Maruyama et al. (2008) extract investor sentiment from YJF message boards and
indicate a correlation between investor bullishness and daily returns. Okada et al.
(2013) extract sentiment from Nikkei newspapers to discover sentiment seasonality in
Japan.
3
Each user is identified by the first three characters of the provided user ID. We treat
cases in which one user possesses two user IDs as two different users.
4
Both investors and consumers search by company name using the Google search
engine. The measurement of investor attention based on a search volume index in
Google may therefore include not only investor attention, but also consumer interest.
5
The investors in our sample posted the first message on the message board an
average of 27 days before the IPO date.
6
The “partial adjustment phenomenon” proposed by Benveniste and Spindt (1989)
and Hanley (1993) suggests that underwriters would compensate investors who offer
truthful information by allocating them more shares, and shares with high initial returns.
These authors indicated that the offer prices or price revisions positively relate to
initial returns.
7
We randomly selected 100 messages from each year between 2001 and 2010.
8
Removal of “do,” “be,” and “become” from our keywords did not significantly change
our results.
9
Brav and Gompers (1997) demonstrate that VC-backed IPOs outperform non-VC-backed
IPOs.
10
Carter, Dark, and Singh (1998) find that IPOs managed by more prestigious
underwriters have lower initial returns and less post-IPO underperformance.
11
It has been documented that the post-IPO stock returns of issuing firms in the
second-board stock market are lower than those of issuing firms in the main market.
12
Ljungqvist and Wilhelm (2003) demonstrate that initial returns decrease with the
secondary sales of managers and venture capital.
13
Although Barber and Lyon (1997) indicate that an appropriate control firm is similar
in size (market value) and book-to-market, we cannot obtain an accurate
book-to-market before IPO.
14
In cases where some IPO firms have no classified bull / bear messages in Phase 1,
we cannot obtain an Agreement index, and thus exclude this sample from the analysis.
15
We obtained results similar to those in Table 6 when we estimated two alternative
logistic regressions for Equation (3): the first model uses investor sentiment in the
book-building period instead of Phase 1, and the second model uses a binary dependent
variable, which equals one if the offer price is determined to be greater than the
midpoint of the filing range, and zero otherwise.
16
We obtain results similar to those in Table 7 by estimating Equation (4), in which the
dependent variables are the closing prices for the five trading days after the IPO date,
divided by the offer price minus one.
17
Sub-sample analysis indicates that Ln (Number of messages) and Bullishness index
are significantly positively related to Initial return.
18
We obtain similar results when we use the TOPIX and JASDAQ indexes and the
industry portfolio as alternatives to the control firm to control for BHAR, and when
BHAR is measured 125 or 750 days after the IPOs. Moreover, we verify the results by
winsorizing all the variables at the 1st and 99th percentiles.
19
When we control the BHR of a control firm with closest market value or reference
portfolio, we find a statistically significant negative relationship between Ln (Number of
messages) in Phase 2 and post-IPO stock returns.

Table 1: Example of message data

No | Date | Time | ID | Title | Comment
No.2 | 2007/2/26 | 19:01 | mat | トップバッター (The first batter) | 最近の新興では珍しい単位株1000株。吉とでるか、凶と出るか。 (A thousand unit is a rare case as recent emerging stocks. Good or bad?)
No.3 | 2007/2/26 | 20:57 | tre | 2番目 (Second) | 当たれ〜!! (I want IPO shares!!)
No.4 | 2007/3/1 | 16:47 | sai | 4 | 444
No.5 | 2007/3/1 | 21:53 | kim | 5 | USJよりもこっちが正解!!!!当たってー!!! (This is better than USJ!!!! I want it!!!)
No.6 | 2007/3/2 | 21:53 | tka | 6?7? | 欲しいなぁ。。。チョイと他のBBが終わるまで資金不足。マネは使い勝手が悪い。。。 (I want it… My fund is insufficient until the other IPO ends. My security company is inconvenient...)

English translations of the original Japanese texts are shown in parentheses.
Since the No.1 message is posted by Yahoo! Japan, the No.2 message is
considered the first message.

Table 2: Frequency of keywords in bullish and bearish messages within training data

Rank | Phase 1 bullish: Word | Frequency | Phase 1 bearish: Word | Frequency | Phase 2 bullish: Word | Frequency | Phase 2 bearish: Word | Frequency
1 | する (do) | 87 | する (do) | 127 | する (do) | 113 | する (do) | 123
2 | いる (be) | 37 | いる (be) | 47 | ある (be) | 51 | 初値 (first trading price) | 51
3 | 買う (buy) | 29 | ある (be) | 31 | いる (be) | 51 | いる (be) | 47
4 | 倍 (times) | 23 | なる (become) | 27 | 初値 (first trading price) | 49 | なる (become) | 36
5 | なる (become) | 20 | ない (negation) | 25 | なる (become) | 43 | 公募割れ (first trading price below the offer price) | 35
6 | ある (be) | 18 | 会社 (firm) | 19 | 倍 (times) | 38 | ない (negation) | 31
7 | 会社 (firm) | 14 | 買う (buy) | 19 | 思う (expect) | 36 | 買う (buy) | 27
8 | 初値 (first trading price) | 13 | 倍 (times) | 18 | 買う (buy) | 31 | てる (ing) | 26
9 | 期待 (expect) | 12 | 高い (relatively expensive) | 18 | ない (negation) | 24 | ある (be) | 24
10 | いい (nice) | 11 | 公募割れ (first trading price below the offer price) | 17 | 売る (sell) | 23 | 銘柄 (firm) | 24

English translations of the original Japanese texts are shown in parentheses.

Table 3: SVM classification accuracy within the training data
Rows: manual classification. Columns: classification by SVM.

Manual classification | Phase 1: Bearish | Phase 1: Neutral | Phase 1: Bullish | Phase 2: Bearish | Phase 2: Neutral | Phase 2: Bullish
Bearish | 121 | 2 | 0 | 134 | 4 | 1
Neutral | 0 | 777 | 1 | 2 | 717 | 0
Bullish | 0 | 13 | 86 | 1 | 11 | 130

This table provides comparative results of manual classification versus
classification by SVM within the training data.
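
As a reading aid for Table 3, the Python sketch below computes the overall in-sample accuracy and the per-class recall implied by the two confusion matrices. The counts are copied from the table; the variable and function names are hypothetical, and the figures describe fit on the training data rather than out-of-sample performance.

```python
# Rows: manual label (Bearish, Neutral, Bullish); columns: SVM label, same order.
PHASE_1 = [[121, 2, 0], [0, 777, 1], [0, 13, 86]]
PHASE_2 = [[134, 4, 1], [2, 717, 0], [1, 11, 130]]
LABELS = ["Bearish", "Neutral", "Bullish"]


def accuracy(matrix):
    """Share of messages on the diagonal (SVM label equals manual label)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def recall_by_class(matrix):
    """For each manual class, the share of its messages the SVM labels the same way."""
    return {LABELS[i]: matrix[i][i] / sum(matrix[i]) for i in range(len(matrix))}


if __name__ == "__main__":
    for name, matrix in [("Phase 1", PHASE_1), ("Phase 2", PHASE_2)]:
        print(name, round(accuracy(matrix), 3), recall_by_class(matrix))
```

Most of the disagreement comes from manually bullish messages that the SVM labels neutral (13 in Phase 1 and 11 in Phase 2), so bullish recall is the weakest of the three classes in both phases.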

Table 4: Summary statistics
Panel A: IPO and issuing firm characteristics
Variable | Mean | Median | SD | 1st Quartile | 3rd Quartile
AST | 8.460 | 8.263 | 1.442 | 7.476 | 9.272
DEBT | 0.581 | 0.618 | 0.220 | 0.412 | 0.763
AGE | 2.807 | 2.867 | 0.785 | 2.180 | 3.414
Price revision (%) | 5.683 | 6.250 | 5.520 | 4.027 | 8.409
Initial return (%) | 64.145 | 32.878 | 98.004 | 2.798 | 95.590
250-day BHAR (%) | -3.676 | -18.645 | 114.283 | -48.973 | 22.692
500-day BHAR (%) | 0.820 | -26.519 | 179.827 | -64.793 | 27.517
SSR (%) | 8.567 | 7.792 | 6.899 | 4.281 | 11.581

Panel B: Message board statistics and investor sentiment
Variable | Period | Mean | Median | SD | 1st Quartile | 3rd Quartile
Number of messages | Pre-IPO phase | 198.4 | 116.0 | 264.6 | 67.0 | 214.8
Number of messages | Phase 1 | 45.3 | 27.0 | 63.0 | 16.0 | 49.0
Number of messages | Phase 2 | 153.1 | 88.0 | 216.5 | 46.0 | 169.0
Ln (Number of messages) | Pre-IPO phase | 4.838 | 4.754 | 0.905 | 4.205 | 5.369
Ln (Number of messages) | Phase 1 | 3.380 | 3.296 | 0.874 | 2.773 | 3.892
Ln (Number of messages) | Phase 2 | 4.504 | 4.477 | 1.004 | 3.829 | 5.130
Bullishness index | Pre-IPO phase | 0.038 | 0.041 | 0.561 | -0.288 | 0.367
Bullishness index | Phase 1 | -0.104 | 0.000 | 0.732 | -0.693 | 0.405
Bullishness index | Phase 2 | 0.082 | 0.076 | 0.588 | -0.288 | 0.405
Agreement index | Pre-IPO phase | 0.242 | 0.188 | 0.210 | 0.091 | 0.333
Agreement index | Phase 1 | 0.556 | 0.500 | 0.396 | 0.200 | 1.000
Agreement index | Phase 2 | 0.268 | 0.200 | 0.234 | 0.102 | 0.343

Panel A shows the descriptive statistics for the IPO firms in the data. The variables are
the natural logarithm of total assets (AST); debt divided by total assets (DEBT);
the natural logarithm of firm age (AGE); the offer price normalized by the midpoint of
the filing range (Price revision); the closing price on the first trading day normalized
by the offer price minus one (Initial return); the IPO firm’s return over 250 or 500 days
after the IPO minus that of a control firm in the same industry and with the closest
market value at 21 days after the IPO date (BHAR); and secondary sales shares
divided by outstanding shares (SSR). Panel B shows statistics of the message boards
and investor sentiment proxies.

Table 5: Correlations among variables

Variable | (1) | (2) | (3) | (4) | (5) | (6) | (7) | (8) | (9) | (10) | (11)
(1) Phase 1 Ln (Number of messages) | | -0.062 | -0.412*** | 0.643*** | -0.001 | -0.242*** | 0.282*** | -0.066* | -0.062 | -0.095** | -0.259***
(2) Phase 1 Bullishness | -0.077** | | -0.092** | -0.016 | 0.063 | -0.021 | -0.028 | -0.014 | -0.167*** | -0.084** | -0.054
(3) Phase 1 Agreement | -0.436*** | -0.075* | | -0.297*** | 0.027 | 0.181*** | -0.070* | 0.025 | -0.052 | 0.034 | 0.116***
(4) Phase 2 Ln (Number of messages) | 0.676*** | -0.021 | -0.320*** | | 0.038 | -0.381*** | 0.264*** | 0.015 | -0.083** | -0.115*** | -0.304***
(5) Phase 2 Bullishness | 0.006 | 0.064 | 0.020 | 0.062 | | 0.171*** | 0.090** | -0.028 | -0.061 | 0.017 | 0.058
(6) Phase 2 Agreement | -0.275*** | -0.036 | 0.182*** | -0.445*** | 0.081** | | -0.138*** | 0.023 | 0.005 | 0.053 | 0.118***
(7) Market condition | 0.281*** | -0.035 | -0.076* | 0.260*** | 0.095** | -0.148*** | | 0.010 | -0.036 | 0.002 | -0.032
(8) SSR | -0.039 | -0.047 | 0.019 | 0.044 | -0.011 | 0.000 | 0.000 | | 0.184*** | -0.028 | 0.217***
(9) AST | 0.056 | -0.161*** | -0.073* | 0.053 | -0.069* | -0.005 | -0.015 | 0.232*** | | 0.467*** | 0.569***
(10) DEBT | -0.090** | -0.087** | 0.036 | -0.107*** | 0.047 | 0.020 | -0.017 | -0.004 | 0.403*** | | 0.195***
(11) AGE | -0.257*** | -0.033 | 0.120*** | -0.302*** | 0.042 | 0.148*** | -0.046 | 0.187*** | 0.502*** | 0.220*** |

This table indicates the Pearson correlations (lower triangle) and the Spearman correlations (upper triangle); ***, **, and * indicate significance at the 1%, 5%, and 10%
levels, respectively.

Table 6: Investor sentiment in Phase 1 and price revision
Dependent variable: Price revision dummy
IS (investor sentiment proxies): Phase 1

Variable | (1) | (2) | (3) | (4)
Ln (Number of messages) | 0.868*** | | | 0.950***
 | (3.423) | | | (2.960)
Bullishness index | | 0.384* | | 0.412*
 | | (1.891) | | (1.868)
Agreement index | | | -0.323 | 0.355
 | | | (-0.724) | (0.721)
Market condition | 0.070*** | 0.087*** | 0.073*** | 0.062***
 | (4.150) | (5.388) | (4.205) | (3.367)
SSR | -0.055* | -0.080*** | -0.069** | -0.048
 | (-1.742) | (-2.639) | (-2.055) | (-1.385)
AST | -0.475** | -0.552** | -0.651*** | -0.562**
 | (-2.093) | (-2.499) | (-2.702) | (-2.247)
DEBT | 1.037 | 1.127 | 0.945 | 1.187
 | (1.268) | (1.403) | (1.053) | (1.270)
AGE | 0.263 | 0.199 | 0.298 | 0.317
 | (1.049) | (0.796) | (1.118) | (1.176)
VC backed dummy | 0.593* | 0.544 | 0.461 | 0.491
 | (1.685) | (1.586) | (1.232) | (1.270)
Underwriter dummy | -0.897** | -0.819** | -0.408 | -0.403
 | (-2.433) | (-2.236) | (-1.053) | (-1.007)
Gross proceeds | -0.231 | 0.168 | 0.109 | -0.194
 | (-1.005) | (0.825) | (0.486) | (-0.744)
Intercept | 5.910 | 1.676 | 22.129 | 23.576
 | (1.618) | (0.486) | (0.004) | (0.004)
Pseudo R2 | 0.420 | 0.402 | 0.365 | 0.397
N | 654 | 654 | 571 | 571

This table presents the results of the logistic regression in Equation (3). The
dependent variable is the Price revision dummy, which equals one if the offer price is
determined at the filing range’s maximum point and zero otherwise. The independent
variables are the investor sentiment proxies (IS); the return of the reference portfolio,
constructed of firms that belong to the same industry as the IPO firms over the 60 days
prior to the IPOs (Market condition); secondary sales shares divided by outstanding
shares (SSR); the natural logarithm of total assets (AST); debt divided by total assets
(DEBT); the natural logarithm of firm age (AGE); a VC backed dummy that
assumes a value of one if the firm is VC backed, and zero otherwise; an Underwriter
dummy, which equals one if the lead underwriter includes “Nomura,” “Daiwa,” or
“Nikko,” and zero otherwise; and the natural logarithm of the offering price multiplied
by the number of shares offered (Gross proceeds). Our models include an industry
dummy, year dummy, and market dummy. We report z-statistics in parentheses. ***, **,
and * indicate significance at the 1%, 5%, and 10% levels, respectively.

Table 7: Investor Sentiment and Initial Returns
Panel A: Investor sentiment before IPOs and initial returns
Dependent variable: Initial return
IS (investor sentiment proxies): pre-IPO phase

Variable | (1) | (2) | (3) | (4)
Ln (Number of messages) | 37.863*** | | | 36.229***
 | (6.859) | | | (6.467)
Bullishness index | | 23.790*** | | 19.620***
 | | (4.967) | | (4.123)
Agreement index | | | -28.992** | 2.481
 | | | (-2.125) | (0.181)
Price revision | 0.138 | 0.860 | 0.893 | 0.126
 | (0.207) | (1.353) | (1.379) | (0.192)
Market condition | 1.276*** | 1.546*** | 1.566*** | 1.256***
 | (4.417) | (5.424) | (5.487) | (4.346)
SSR | -0.362 | -1.250** | -1.170* | -0.428
 | (-0.584) | (-2.048) | (-1.893) | (-0.694)
AST | -7.725* | -8.206* | -9.003** | -6.751
 | (-1.831) | (-1.880) | (-2.034) | (-1.612)
DEBT | -24.844 | -34.150* | -33.817* | -25.825
 | (-1.392) | (-1.904) | (-1.848) | (-1.477)
AGE | -5.203 | -13.823*** | -11.366** | -7.244
 | (-1.084) | (-2.729) | (-2.259) | (-1.501)
VC backed dummy | -0.141 | 4.941 | 3.752 | 0.242
 | (-0.019) | (0.678) | (0.501) | (0.034)
Underwriter dummy | 20.214*** | 16.609** | 19.047*** | 18.333***
 | (3.034) | (2.402) | (2.756) | (2.764)
Gross proceeds | -29.481*** | -10.112** | -12.533*** | -27.693***
 | (-5.319) | (-2.194) | (-2.641) | (-5.022)
Intercept | 569.491*** | 373.110*** | 427.511*** | 539.654***
 | (6.528) | (4.727) | (5.141) | (6.219)
Adj. R2 | 0.324 | 0.285 | 0.270 | 0.334
N | 654 | 654 | 654 | 654

Panel B: Investor sentiment in Phase 2 and initial returns
Dependent variable: Initial return
IS (investor sentiment proxies): Phase 2

Variable | (1) | (2) | (3) | (4)
Ln (Number of messages) | 32.061*** | | | 32.492***
 | (6.614) | | | (6.342)
Bullishness index | | 18.612*** | | 13.682***
 | | (3.859) | | (2.838)
Agreement index | | | -27.595** | 5.377
 | | | (-2.145) | (0.411)
Price revision | 0.231 | 0.900 | 0.926 | 0.284
 | (0.346) | (1.401) | (1.419) | (0.429)
Market condition | 1.374*** | 1.498*** | 1.596*** | 1.322***
 | (4.825) | (5.233) | (5.604) | (4.635)
SSR | -0.504 | -1.246** | -1.110* | -0.541
 | (-0.812) | (-2.021) | (-1.772) | (-0.866)
AST | -7.255* | -8.809** | -9.217** | -7.315*
 | (-1.702) | (-2.015) | (-2.052) | (-1.713)
DEBT | -26.827 | -34.643* | -32.618* | -28.071
 | (-1.501) | (-1.920) | (-1.766) | (-1.583)
AGE | -5.680 | -13.246*** | -11.255** | -6.627
 | (-1.173) | (-2.619) | (-2.249) | (-1.357)
VC backed dummy | 1.244 | 4.604 | 3.772 | 0.542
 | (0.171) | (0.629) | (0.497) | (0.074)
Underwriter dummy | 20.299*** | 17.374** | 18.635*** | 20.185***
 | (3.039) | (2.500) | (2.644) | (2.977)
Gross proceeds | -28.169*** | -10.266** | -12.677*** | -27.084***
 | (-5.125) | (-2.201) | (-2.635) | (-4.891)
Intercept | 584.989*** | 377.235*** | 431.293*** | 563.916***
 | (6.593) | (4.725) | (5.146) | (6.305)
Adj. R2 | 0.319 | 0.279 | 0.270 | 0.325
N | 654 | 654 | 654 | 654

This table presents the results of the OLS regression in Equation (4). The dependent
variable is the closing price on the first trading day normalized by the offer price minus
one (Initial return). The independent variables are the investor sentiment proxies (IS);
the offer price normalized by the midpoint of the filing range (Price revision); the
returns from the reference portfolio constructed from firms that belong to the same
industry as the IPO firms over the 60 days prior to the IPOs (Market condition);
secondary sales shares divided by outstanding shares (SSR); the natural logarithm of
total assets (AST); debt divided by total assets (DEBT); the natural logarithm of firm
age (AGE); a VC backed dummy that assumes a value of one if the firm is VC backed,
and zero otherwise; an Underwriter dummy, which equals one if the lead underwriter
includes “Nomura,” “Daiwa,” or “Nikko,” and zero otherwise; and the natural logarithm
of the offering price multiplied by the number of shares offered (Gross proceeds). Our
models include an industry dummy, year dummy, and market dummy. The lower step of
each row indicates the t-statistics adjusted for heteroscedasticity errors (White, 1980).
***, **, and * indicate significance at the 1%, 5%, and 10% levels, respectively.

Table 8: Investor sentiment and post-IPO stock returns


Panel A: Investor sentiment before IPOs and post-IPO stock returns
Dependent variable: 250-day BHAR in columns (1) to (4); 500-day BHAR in columns (5) to (8)
IS (investor sentiment proxies): pre-IPO phase

Variable | (1) | (2) | (3) | (4) | (5) | (6) | (7) | (8)
Ln (Number of messages) | -10.862** | | | -12.441** | -15.516** | | | -13.855*
 | (-2.185) | | | (-2.392) | (-2.095) | | | (-1.722)
Bullishness index | | 6.095 | | 7.648 | | -3.675 | | -1.945
 | | (1.154) | | (1.410) | | (-0.372) | | (-0.192)
Agreement index | | | 1.889 | -9.655 | | | 29.382 | 16.527
 | | | (0.122) | (-0.615) | | | (1.183) | (0.631)
AST | 7.603 | 8.703* | 8.238* | 8.219* | 16.486* | 17.181* | 16.917* | 16.156*
 | (1.603) | (1.814) | (1.728) | (1.700) | (1.788) | (1.924) | (1.881) | (1.812)
DEBT | 28.261* | 29.256* | 29.637* | 27.471* | -4.835 | -2.701 | -2.445 | -4.254
 | (1.711) | (1.765) | (1.783) | (1.660) | (-0.174) | (-0.098) | (-0.089) | (-0.155)
AGE | -24.777*** | -23.012*** | -22.500*** | -25.642*** | -40.131*** | -36.510*** | -37.293*** | -39.863***
 | (-3.604) | (-3.398) | (-3.379) | (-3.671) | (-3.671) | (-3.326) | (-3.464) | (-3.562)
VC backed dummy | 4.137 | 2.341 | 2.389 | 4.051 | -0.060 | -2.682 | -1.490 | 0.313
 | (0.571) | (0.327) | (0.333) | (0.558) | (-0.005) | (-0.230) | (-0.126) | (0.027)
Underwriter dummy | 0.210 | 0.647 | 1.180 | -0.522 | -1.332 | 0.417 | -0.244 | -1.188
 | (0.026) | (0.081) | (0.146) | (-0.067) | (-0.109) | (0.034) | (-0.020) | (-0.096)
Gross proceeds | 0.513 | -3.997 | -4.138 | 1.070 | -5.607 | -12.499* | -11.124 | -5.703
 | (0.111) | (-0.972) | (-0.960) | (0.228) | (-0.667) | (-1.680) | (-1.533) | (-0.663)
Intercept | 22.316 | 53.637 | 57.687 | 18.858 | 163.699 | 220.372* | 188.470 | 155.309
 | (0.324) | (0.785) | (0.784) | (0.255) | (1.344) | (1.850) | (1.570) | (1.245)
Adj. R2 | 0.130 | 0.125 | 0.124 | 0.129 | 0.133 | 0.128 | 0.130 | 0.130
N | 654 | 654 | 654 | 654 | 654 | 654 | 654 | 654

Panel B: Investor sentiment in Phase 2 and post-IPO stock returns
Dependent variable: 250-day BHAR in columns (1) to (4); 500-day BHAR in columns (5) to (8)
IS (investor sentiment proxies): Phase 2

Variable | (1) | (2) | (3) | (4) | (5) | (6) | (7) | (8)
Ln (Number of messages) | -6.621 | | | -8.067 | -9.704 | | | -10.412
 | (-1.461) | | | (-1.599) | (-1.385) | | | (-1.323)
Bullishness index | | 9.258* | | 10.483* | | 4.235 | | 5.859
 | | (1.726) | | (1.961) | | (0.484) | | (0.667)
Agreement index | | | 13.780 | 2.406 | | | 12.828 | -0.202
 | | | (0.940) | (0.159) | | | (0.557) | (-0.008)
AST | 7.723 | 8.819* | 8.209* | 8.443* | 16.638* | 17.691* | 17.224* | 17.049*
 | (1.614) | (1.856) | (1.689) | (1.721) | (1.803) | (1.949) | (1.839) | (1.824)
DEBT | 28.834* | 28.539* | 29.181* | 27.346 | -4.044 | -3.401 | -3.630 | -5.165
 | (1.745) | (1.714) | (1.746) | (1.635) | (-0.146) | (-0.122) | (-0.130) | (-0.183)
AGE | -23.973*** | -23.266*** | -22.749*** | -25.263*** | -39.038*** | -37.200*** | -37.021*** | -39.640***
 | (-3.544) | (-3.426) | (-3.375) | (-3.642) | (-3.653) | (-3.369) | (-3.402) | (-3.614)
VC backed dummy | 3.324 | 2.154 | 2.791 | 3.557 | -1.185 | -2.738 | -1.961 | -0.786
 | (0.464) | (0.303) | (0.388) | (0.494) | (-0.102) | (-0.235) | (-0.166) | (-0.067)
Underwriter dummy | 0.520 | 0.501 | 1.016 | -0.849 | -0.914 | -0.237 | 0.662 | -1.152
 | (0.065) | (0.062) | (0.124) | (-0.107) | (-0.075) | (-0.019) | (0.053) | (-0.094)
Gross proceeds | -1.048 | -3.894 | -3.637 | -0.022 | -7.719 | -12.217* | -11.783 | -7.308
 | (-0.220) | (-0.942) | (-0.835) | (-0.005) | (-0.923) | (-1.651) | (-1.500) | (-0.867)
Intercept | 28.568 | 51.057 | 45.070 | 10.532 | 171.484 | 212.964* | 203.129 | 163.769
 | (0.402) | (0.746) | (0.626) | (0.143) | (1.386) | (1.804) | (1.423) | (1.284)
Adj. R2 | 0.127 | 0.127 | 0.124 | 0.127 | 0.130 | 0.129 | 0.128 | 0.127
N | 654 | 654 | 654 | 654 | 654 | 654 | 654 | 654

This table presents the results of the OLS regression in Equation (5). The dependent variable is the IPO firm’s 250-day or 500-day post-IPO BHR minus that of a control firm
in the same industry and with the closest market value at 21 days after the IPO date (BHAR). The independent variables are the investor sentiment proxies (IS); the natural
logarithm of total assets (AST); debt divided by total assets (DEBT); the natural logarithm of firm age (AGE); a VC backed dummy that assumes a value of one if the
firm is VC backed, and zero otherwise; an Underwriter dummy, which equals one if the lead underwriter includes “Nomura,” “Daiwa,” or “Nikko,” and zero otherwise;
and the natural logarithm of the offering price multiplied by the number of shares offered (Gross proceeds). Our models include an industry dummy, year dummy, and
market dummy. The lower step of each row indicates the t-statistics adjusted for heteroscedasticity errors (White, 1980). ***, **, and * indicate significance at the 1%, 5%,
and 10% levels, respectively.

Fig. 1: Timeline for IPOs

The No.2 message posted to the thread is considered the first message of the
series, as the No.1 message is posted by Yahoo! Japan.

We examine the effect of investor sentiment on initial returns and post-IPO
stock returns.

This is the first study using text data on message boards to analyze IPO behavior.

Automatic categorization by text mining and SVM made it possible to analyze
129,676 messages.

Pre-IPO investor sentiment is measured based on categorized messages on each
firm’s thread.

We find that excessive optimism leads to high initial returns and long-run
underperformance.