Tsukioka 2017
Investor sentiment extracted from internet stock message boards and IPO puzzles
PII: S1059-0560(17)30817-1
DOI: 10.1016/j.iref.2017.10.025
Reference: REVECO 1528
Please cite this article as: Tsukioka Y., Yanagi J. & Takada T., Investor sentiment extracted from internet
stock message boards and IPO puzzles, International Review of Economics and Finance (2017), doi:
10.1016/j.iref.2017.10.025.
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to
our customers we are providing this early version of the manuscript. The manuscript will undergo
copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please
note that during the production process errors may be discovered which could affect the content, and all
legal disclaimers that apply to the journal pertain.
ACCEPTED MANUSCRIPT
Yasutomo Tsukioka
School of Business Administration, Kwansei Gakuin University
Phone: +81 7 9854 6340
E-mail: tsukioka@kwansei.ac.jp (corresponding author)
Junya Yanagi
Graduate School of Business, Osaka City University
Postal address: 3-3-138 Sugimoto, Sumiyoshi, Osaka 558-8585, Japan
Phone: +81 - [0]6 - 6605 - 2226
E-mail: sweetzunya@gmail.com
Teruko Takada
Graduate School of Business, Osaka City University
E-mail: takada@bus.osaka-cu.ac.jp
ABSTRACT
sentiment affects these puzzles. Pre-IPO investor sentiment on each issuing
firm is extracted from Yahoo! Japan Finance message boards using text
mining and support vector machine classification. Message data on 654
Japanese IPOs from 2001 to 2010 are used to measure investor attention and
sentiment. We find that high investor attention and bullish investor sentiment
positively affect IPO offer prices and initial returns, leading to subsequent price
declines. This suggests that excessive investor optimism can partially explain
attention
1. Introduction
Initial public offering (IPO) firms reportedly exhibit high initial returns and
Jenkinson and Ljungqvist (2001), these IPO puzzles are caused by the following
factors based on rational models: asymmetric information, institutional
explanation, or ownership and control. Ritter and Welch (2002) indicate that
investigations have been limited due to the scarcity of information on investor
market organized by security companies, provides investors with opportunities
to trade IPO shares until the IPO date. Cornelli, Goldreich, and Ljungqvist
(2006) and Dorn (2009) capture investor sentiment using gray market prices and
illustrate how investor sentiment causes IPO puzzles. One problem with the gray
market is that its data availability is limited only to Europe. Another problem is
examine the effect of investor sentiment on IPO puzzles, rather than using gray
from text data. The Yahoo! Finance message board is one of the most attractive
data sources for direct analyses of investor sentiment. Yahoo! Japan Finance
(YJF) message boards are particularly attractive among those from various
other countries, as this board has many effective features by which to measure
messages before IPOs. Moreover, each listing firm has only one thread and we
This study aims to clarify the effect of investor sentiment on IPO puzzles,
by extracting data from YJF message boards. Substantial data can be derived
from YJF message boards, and it is difficult to exploit all the information by
manually classifying posted words. We can apply text-mining and support vector
machine (SVM) techniques to classify sentiment and fully exploit this massive
amount of data. The SVM technique is effective in this regard and can be used to
extract opinions from Japanese text data on Internet message boards and
Sakakibara, & Yamasaki, 2013). 2 This large amount of text information can
clarify the effects of investor sentiment on IPO puzzles, which have been difficult
This study represents the first attempt in the IPO literature to extract investor
sentiment from Internet stock message boards. Additionally, while we limit our
data period only to the time before IPOs, the size of our dataset is comparable to
literature. We collected 129,676 messages about 654 firms over a 10-year study
posted messages on the message board are first processed and cleaned
through text mining, and classified into bullish, neutral, and bearish messages
percentage of bullish or bearish messages on each IPO firm’s thread. We also
measure the degree of investor attention through the number of messages as a
proxy.
Our findings are as follows: Investor bullishness and attention both positively
affect the offer prices and initial returns, while higher investor attention leads to
lower post-IPO stock returns. These findings indicate the possibility that
investors’ excessive optimism causes the IPO puzzles of high initial returns and
Internet stock message boards of the role of investor sentiment regarding IPO
puzzles.
characteristics of the YJF message boards and the Japanese IPO process. In
Section 3, we review the related literature and develop our hypotheses. Section
4 presents our research design. Section 5 describes the data and provides a
acquired, enabling investors to search for information regarding other investors’
favorable characteristics for investors: they can use these boards to discuss
their opinions, as well as gather information about other investors’ opinions and
behaviors.
Among the alternatives, YJF message boards have many attractive
sentiment. First, YJF message boards are considered the best among the many
Internet stock message boards in Japan, in terms of their history and size. They
are the oldest Internet stock message boards in Japan, established in July 1998.
They are also the largest message boards in terms of the number of messages
posted.
Second, YJF message boards have two attractive features not observed in
other countries’ Yahoo! counterparts: 1) each listing firm has only one thread,
approximately one month before the IPO. For example, each listing firm in the
United States’ Yahoo! Finance message boards has many threads, and most are
of the post number, date, time, poster ID, title, and comment. 3 Since Yahoo!
Japan posts the first message by default, the second message is identified as
the first real message. The first message is typically posted before the filing
range is set or book-building starts. The last message of the day immediately
Figure 1 displays the timeline for the IPO and its distinct periods. The
Japanese IPO process has certain unique features. First, both individual and
institutional investors reveal their bid prices in the filing range to the underwriter
under the book-building process. Second, even if investor sentiment and
demand are strong in the book-building period, the filing range is not adjusted,
and the offer price is always determined within the filing range. The offer price
does not change once it has been set in the filing range. Third, aftermarket
trading begins almost one week after the offer price is set. Finally, in case of
excessive buying orders, the first aftermarket trade in Japan is not necessarily
executed on the IPO date. This is mainly due to the non-existence of market
makers, no daily trading limit, and the Itayose method, which allows changing
the first trading price until the bid and ask are matched. By contrast, in the
United States, the offer price can be set outside the filing range, the aftermarket
trading starts one day after the offer price is set, and the first aftermarket trading
occurs on the IPO date. Therefore, for Japan, we consider two periods: Phase 1,
from the time at which the first message is posted until the end of book-building;
and Phase 2, from the end of book-building until just before the IPO. The
[Please insert Fig. 1]
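The two-phase split described above can be sketched in a few lines of Python. This is only an illustration: the function name, its arguments, and the treatment of a message posted on the last book-building day (counted here as Phase 1) are assumptions, not details taken from the paper.

```python
from datetime import date
from typing import Optional


def assign_phase(posted: date, bookbuilding_end: date, ipo_date: date) -> Optional[int]:
    """Return 1 or 2 for a pre-IPO message, or None if it falls outside the pre-IPO window."""
    if posted >= ipo_date:
        # Messages posted on or after the IPO date are not part of the pre-IPO phase.
        return None
    # Assumption: the last day of book-building still belongs to Phase 1.
    return 1 if posted <= bookbuilding_end else 2
```

For a hypothetical IPO whose book-building ends on 10 March 2005 and whose listing date is 18 March 2005, a message posted on 11 March would fall in Phase 2.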
3. Related literature and hypothesis development
Investor sentiment is thought to influence the IPO puzzles of high initial
returns and low post-IPO stock returns (Ritter & Welch, 2002). A few studies
have investigated the relationship between investor sentiment and IPO puzzles,
and these can be divided into two groups depending on the data on which they
are based.
One approach uses book-building data, or gray market data that are only
that investor demand positively relates to offer prices and initial returns, and
negatively relates to post-IPO stock returns. Cornelli et al. (2006) and Dorn
investigate the causes of IPO puzzles. Cornelli et al. (2006) theoretically and
empirically demonstrate that offer prices and initial returns increase with the
gray market price, and that stock returns decrease with the gray market price.
The other approach measures investor demand or attention from special data,
(2014) measures investor demand immediately after IPOs using the Trading and
Quote data from the first trading day. Da, Engelberg, and Gao (2011) measure
investor attention using Google search frequencies, and find that investor
attention positively relates to initial returns and negatively relates to post-IPO
stock returns. 4 Jegadeesh and Wu (2013) present a new approach concerning
words’ tone in financial reports, and find that tone negatively relates to initial
returns.
We extract the pre-IPO investor sentiment from YJF message boards. Those
who post on Internet stock message boards are assumed to be small individual
concerning gray market investors. Message posting on the YJF message boards
starts before the filing range is set. 5 Managers and underwriters can monitor the
that offer prices may be affected by investor sentiment; in other words, the more
bullish the Phase 1 investor sentiment, the higher the offer price. This is
because both managers and investors would assume that small investors’
demand and valuations would be higher. This leads to our first empirical
hypothesis:
price.
The effects of investor sentiment on book-building and offer price are limited
under the Japanese IPO process, in that the filing range is not adjusted even if
investor sentiment and demand are strong. Investors reveal their bid prices and
volumes during the book-building period, and most of the IPO firms’ offer prices
are determined at the filing range’s maximum point. Additionally, underwriters
have offsetting incentives. On the one hand, they have an incentive to set the filing range at higher levels,
because their fees depend on raising money from IPOs. On the other hand, they have an
incentive to lower the filing range and offer price to avoid the risk that the IPO
stock will not sell out. Thus, the offer price does not fully reflect investor
sentiment. 6 Therefore, the closing price on the first trading day increases with
bullish investor sentiment. In other words, the more bullish the investor
sentiment, the higher the initial returns. Additionally, investor sentiment might
hypothesis:
Newly listed firms are found to have difficulty in maintaining high first-day
trading prices. Ritter (1991) and Loughran and Ritter (1997) find that IPO firms
Nanda, and Singh (2006) conjecture that excessive optimism leads to high
first-day returns and eventually to stock price reversal, which causes long-run
sentiment prior to the IPO will converge to the fundamental value following the
IPO. This leads to our third hypothesis:
Hypothesis 3: Investor sentiment in the pre-IPO phase or Phase 2
4. Methodology
The message data collected from the YJF message boards are classified into
raw message data, 2) the preparation of training data for SVM classification
The first step involves the collection and preprocessing of raw message data.
We downloaded these raw message data from the YJF message boards using
a Perl program written by the authors. We collected the raw message data,
which include the posting date and time, title, and comment. We deleted noise
data sets should preferably contain as much relevant information as possible,
and noise elimination is reported to significantly improve the results (Li & Shi,
2002; Yi, Liu, & Li, 2003).
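One noise-elimination rule that the paper states explicitly is that messages including URLs are removed from the sample. A minimal Python sketch of such a filter follows. The regular expression and the dictionary field names (`title`, `comment`) are illustrative assumptions; the authors' own preprocessing was written in Perl and is not reproduced here.

```python
import re

# Assumption: a URL is anything starting with http://, https://, or www.
URL_PATTERN = re.compile(r"(https?://|www\.)\S+", re.IGNORECASE)


def drop_url_messages(messages):
    """Keep only messages whose title and comment contain no URL."""
    return [
        m for m in messages
        if not URL_PATTERN.search(m["title"]) and not URL_PATTERN.search(m["comment"])
    ]
```

Each message is assumed to be a dictionary with at least a `title` and a `comment` string, mirroring the fields the paper says were collected.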
We prepare the training data sets and construct the SVM in the second step.
First, we randomly select 1,000 messages from the collected message data in
each of Phases 1 and 2. 7 Each collection of 1,000 messages forms a
training data set. As our message data are not assigned “positive” or “negative”
tags, we manually classify the training data into bullish, bearish, or neutral
messages following Antweiler and Frank (2004) and Das and Chen (2007).
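The random selection of training messages (100 messages from each year between 2001 and 2010, according to the accompanying footnote) could be drawn as in the following Python sketch. The `year` field, the fixed seed, and the fallback for years with fewer than 100 messages are assumptions made for illustration.

```python
import random
from collections import defaultdict


def sample_training_messages(messages, per_year=100, years=range(2001, 2011), seed=0):
    """Draw up to `per_year` messages at random from each year in `years`."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    by_year = defaultdict(list)
    for m in messages:
        by_year[m["year"]].append(m)
    sample = []
    for year in years:
        pool = by_year[year]
        # Assumption: if a year has fewer than per_year messages, take all of them.
        sample.extend(rng.sample(pool, min(per_year, len(pool))))
    return sample
```

With at least 100 messages available in every year, the call returns 1,000 messages, matching the size of each training data set.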
Those messages that represent optimism about future firm performance and
clear sentiment about future firm performance and market conditions are
bullish or bearish sentiment, such as “sell,” “buy,” and “high,” are selected from
the classified messages. Phase 1 includes 260 selected keywords, and Phase 2
includes 255. The targeted parts of speech are restricted to nouns, verbs, and
adjectives, and include those used for negation. We use the SVM to classify
Table 2 illustrates the top 10 most frequently appearing keywords in bullish
and bearish messages, from the training data in Phases 1 and 2. 8 The
frequencies of these keywords play an important role in SVM classification, as
the SVM classifies the messages based on the frequencies of keywords in each
message in the decision function. The combination of keywords has important
meanings in the SVM decision function. The meaning and order of the listed
words imply the central concerns of many messages, by sentiment and phase.
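The keyword-frequency representation on which the classifier operates can be illustrated with a short Python sketch. It assumes that each message has already been split into tokens by a morphological analyzer; the English tokens and the three-word keyword list below are placeholders, not the paper's actual Japanese keywords.

```python
from collections import Counter


def keyword_frequency_vector(tokens, keywords):
    """Count how often each selected keyword occurs in one tokenized message."""
    counts = Counter(tokens)
    return [counts[k] for k in keywords]


def frequency_matrix(tokenized_messages, keywords):
    """Stack one keyword-frequency row per message."""
    return [keyword_frequency_vector(tokens, keywords) for tokens in tokenized_messages]
```

For the placeholder keywords `["buy", "sell", "high"]`, the message tokens `["buy", "buy", "hold", "high"]` map to the row `[2, 0, 1]`; tokens outside the keyword list are ignored.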
The negation word for bearish messages in both phases ranked higher than the
word “buy,” suggesting an unwillingness to buy. This is not the case in bullish
messages, in which the word “buy” ranked third in Phase 1, with the negation
word not being ranked, and preceded the negation word in Phase 2, suggesting
The term meaning “first trading price below the offer price” appears in bearish
messages in both phases, and its ranking increases, from tenth in Phase 1 to
second in Phase 2. This implies increased concerns regarding the risk of the first
The third step involves the constructed SVM classifying all the messages.
First, all preprocessed messages are parsed using RMeCab, and our special
and Neubig (2014) demonstrate that adding words to a dictionary improves the
words, a tokenizer is necessary to parse the text prior to the text analysis. We
use RMeCab, an R package for Japanese morphological analyses, to split the
message data into the parts of speech. We converted all messages into a
frequency of keywords matrix. The SVM classifies all the messages into three
The SVM was developed by Vapnik (1995), and it has been applied in various
extract investor sentiment from Internet stock message boards or news articles,
such as those by Antweiler and Frank (2004), Maruyama et al. (2008), and
construct the maximum-margin separating hyperplane among the different
data classes. Let $k$ be the number of classes. We are given $N$ training data
$(x_1, y_1), \ldots, (x_N, y_N)$, where $i = 1, \ldots, N$ and $y_i \in \{1, \ldots, k\}$ is the
class label of $x_i$. In our problem, $x_i$
is the expressed frequency of keywords in each message, and three class labels
classification method proposed by Crammer and Singer (2002) which solves the
following single optimization problem given the data from all classes:
$$
\min_{w,\,\xi} \; \frac{1}{2} \sum_{m=1}^{k} \lVert w_m \rVert^{2} + C \sum_{i=1}^{N} \xi_i
$$
where
ξ is the slack variable, and C controls the slack variable penalty and
(Karatzoglou, Smola, Hornik, & Zeileis, 2004) in R version 3.1.0. A Gaussian
based on the heuristic described by Caputo, Sim, Furesjo, and Smola (2002).
Table 3 compares the SVM and manual classification results from Phases 1
and 2. The training data sets exhibit a very small prediction error.
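The comparison between manual and SVM labels amounts to a misclassification rate and a table of label pairs. A minimal Python sketch follows; the function names and label strings are hypothetical, and the example labels in the usage note are invented rather than taken from Table 3.

```python
from collections import Counter


def prediction_error(manual_labels, svm_labels):
    """Share of messages whose SVM label differs from the manual label."""
    if len(manual_labels) != len(svm_labels):
        raise ValueError("label lists must have the same length")
    if not manual_labels:
        raise ValueError("at least one labeled message is required")
    wrong = sum(1 for manual, svm in zip(manual_labels, svm_labels) if manual != svm)
    return wrong / len(manual_labels)


def confusion_counts(manual_labels, svm_labels):
    """Count each (manual label, SVM label) pair, as in a confusion table."""
    return Counter(zip(manual_labels, svm_labels))
```

For four invented messages labeled manually as bullish, neutral, bearish, neutral and by the classifier as bullish, neutral, neutral, neutral, the prediction error is 0.25.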
We calculate three proxy variables for investor sentiment, following the
works of Antweiler and Frank (2004) and Maruyama et al. (2008). First, the
natural logarithm of the number of messages, Ln (Number of messages), is
considered a proxy for investor attention. As the number of messages increases,
IPO firms are more likely to attract attention and become more familiar to
investors.
We assume that a positive (or negative) Bullishness index reflects bullish (or
$$
\text{Agreement index}_i = \frac{\left| \text{the number of bullish messages}_i - \text{the number of bearish messages}_i \right|}{\text{the number of bullish messages}_i + \text{the number of bearish messages}_i} \qquad (2)
$$
The Agreement index ranges from zero to one. When the Agreement index is
close to one (or zero), the investor sentiment variance is small (or large).
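The attention proxy and the Agreement index in Equation (2) can be computed directly from message counts. The Python sketch below uses hypothetical function names. Returning `None` when a firm has no classified bullish or bearish messages mirrors the paper's footnote stating that the Agreement index cannot be obtained in that case.

```python
import math


def ln_number_of_messages(n_messages):
    """Attention proxy: natural logarithm of the number of messages."""
    if n_messages <= 0:
        raise ValueError("the number of messages must be positive")
    return math.log(n_messages)


def agreement_index(n_bullish, n_bearish):
    """Agreement index: |bullish - bearish| / (bullish + bearish), in [0, 1]."""
    total = n_bullish + n_bearish
    if total == 0:
        # Undefined without any bullish or bearish message.
        return None
    return abs(n_bullish - n_bearish) / total
```

With 30 bullish and 10 bearish messages the index is 0.5; with equal counts it is 0, and when all classified messages share one direction it is 1.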
The previous subsection discussed the measurement of investor sentiment.
This subsection explains the regression models designed to test the three
hypotheses regarding the relationship between investor sentiment and IPO
puzzles.
We investigate Hypothesis 1 by estimating the following logistic regression
model:
The dependent variable in Equation (3) is the Price revision dummy, which
equals one if the offer price is determined at the filing range’s maximum point,
and zero otherwise. We also focus on proxy variables for investor sentiment (IS).
Agreement index for IS. Other effects on Price revision are controlled by including
the following variables in the equation. Market condition is the buy and hold
return (BHR) in the reference portfolio over the 60-day period preceding the
IPOs. We capture industry trends by constructing a reference portfolio based on
includes all firms within the same industry code as the IPO firms. Further, BHR is
have been listed for at least three years to restrict any new-listing bias.
Ljungqvist and Wilhelm (2003) argue that the secondary sales of managers and
venture capital positively relate to the offer price. We include secondary sales
shares divided by outstanding shares (SSR). AST is the natural logarithm of the
total assets and DEBT is the ratio of debt to total assets. AGE is the natural
logarithm of firm age, which is defined as the number of years between the
founding and IPO dates. The VC backed dummy represents venture capital
backed IPOs and assumes a value of one if true, and zero otherwise. 9 The
Underwriter dummy equals one if the lead underwriter includes one of the big
otherwise. 10 The Gross proceeds is the natural logarithm of the offering price
multiplied by the number of shares offered. All of our models include industry,
year, and market dummies. 11 Our equations do not include a control variable for
news articles, as we believe news has only a slight effect on our results. Most
firms are not covered by newspapers (Fang & Peress, 2009). Among the
newspaper articles that do cover IPOs, some report only factual information,
In Equation (4), the dependent variable is the Initial return. Let P 0 denote the
closing price on the first trading day, and P offer be the offer price; the Initial
return is defined as (P 0 / P offer ) - 1. We substitute Ln (Number of messages),
Bullishness index, and Agreement index for IS in each period of the Pre-IPO
phase, or Phase 2 in Equation (4). The control variables in Equation (4) are the
same as those in Equation (3). 12 Hanley (1993) demonstrates that Price revision,
which is the offer price divided by the midpoint of the filing range minus one,
positively associates with Initial return. Thus, we add Price revision to Equation
(4).
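The pricing variables defined in the text translate into three short functions. This Python sketch follows the stated definitions (Initial return as the first-day closing price over the offer price minus one, Price revision as the offer price over the filing-range midpoint minus one, and the Price revision dummy equal to one when the offer price sits at the top of the filing range); the function and argument names are illustrative.

```python
def initial_return(first_day_close, offer_price):
    """Initial return: first-day closing price divided by the offer price, minus one."""
    return first_day_close / offer_price - 1


def price_revision(offer_price, range_low, range_high):
    """Price revision: offer price divided by the filing-range midpoint, minus one."""
    midpoint = (range_low + range_high) / 2
    return offer_price / midpoint - 1


def price_revision_dummy(offer_price, range_high):
    """One if the offer price is set at the maximum of the filing range, zero otherwise."""
    return 1 if offer_price == range_high else 0
```

For a hypothetical filing range of 900 to 1,100 yen, an offer price of 1,100 yen, and a first-day close of 1,650 yen, the Price revision is about 0.10, the dummy equals one, and the Initial return is 0.5.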
The dependent variable in Equation (5) is the buy and hold abnormal return
(BHAR), defined as the BHR of issuing firms either 250 or 500 days after the IPO
minus those of a control firm. We match an IPO firm to a control firm in the same
industry and with the closest market value at 21 days after the IPO date. 13 When
the control firm is delisted, TOPIX is spliced in. Equation (5) substitutes the IS with
5. Data and sample statistics
5.1 Sample selection
We collect an initial list and the details of IPO firms from “Trader’s Web,”
“eol,” and “Japanese IPO White papers.” Message data are obtained from the
YJF message boards for the 2001-2010 period. Our IPO firms’ financial data and
For our final sample, we selected 654 IPO firms that satisfy the following
requirements: 1) the availability of messages posted before the filing range has
stock price data, and 4) non-inclusion of the finance sector. Our final sample
includes 129,676 messages, as those including URLs are removed from our
86% of the issuing firms have selected the filing range’s maximum price as the
offer price. The median of initial returns is 32.88%. The median of BHAR for 250
days after IPOs is −18.65%, and 63% of the IPO firms have negative returns.
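The buy and hold abnormal return (BHAR) defined in the research design is the IPO firm's buy and hold return minus that of its control firm over the same horizon. A minimal Python sketch follows, assuming that each BHR is obtained by compounding daily simple returns; the compounding convention, the starting day, and the function names are assumptions rather than details confirmed by the paper.

```python
def buy_and_hold_return(daily_returns):
    """Compound daily simple returns into one buy and hold return (BHR)."""
    growth = 1.0
    for r in daily_returns:
        growth *= 1.0 + r
    return growth - 1.0


def bhar(ipo_daily_returns, control_daily_returns):
    """BHAR: BHR of the IPO firm minus BHR of the control firm over the same days."""
    if len(ipo_daily_returns) != len(control_daily_returns):
        raise ValueError("both return series must cover the same number of days")
    return buy_and_hold_return(ipo_daily_returns) - buy_and_hold_return(control_daily_returns)
```

For the 250-day horizon, each list would hold 250 daily returns; a negative result indicates that the IPO firm underperformed its control firm.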
Panel B in Table 4 presents descriptive statistics for investor sentiment in
each period: the pre-IPO phase, Phase 1, and Phase 2. An average of 198
messages were posted in the pre-IPO phase, of which 18.3 are bullish and 17.8
are bearish. The Bullishness index is positive in both the pre-IPO phase and
Phase 2. A positive Bullishness index is consistent with Zhang and Swanson’s
(2010) argument, in that the messages on Internet stock message boards are
Phase 1, the Bullishness index is negative, and the Agreement index is greater
than that in Phase 2. This result might be partially derived from the small number
Although there is a significant and positive correlation between AST and both
DEBT and AGE, removing them does not alter the results, so we retain these
Phases 1 and 2 is greater than 0.6. Further, investor attention changes in the
same direction and is stable before IPOs. In other words, the correlation of
bullishness differs between Phases 1 and 2. This result implies that if investors
focus on the set offer price and own allocated shares in Phase 1, they are then
6. Empirical results
This section tests our three hypotheses. Hypothesis 1 predicts a positive
relationship between investor sentiment in Phase 1 and the Price revision
dummy, and we test it by estimating the logistic regression in Equation (3). Table
6 provides the results of logistic regression in Equation (3), in which we use IS in
are positive and statistically significant, and suggests that investor attention
positively relates to price revision. Column (2) shows that the coefficient of the
(4) shows that the coefficient of the Ln (Number of messages) and Bullishness
index is positive and significant, and suggests that both investor attention and
bullish investor sentiment positively affect price revision. These results support
Hypothesis 1 and imply that investor attention and bullish investor sentiment
motivate the issuing firm and underwriter to set higher offering prices.
predicts that investor sentiment positively relates to initial returns. The results of
Equation (4) are reported in Table 7. The OLS regression Equation (4) from
Panel A in Table 7 assigns the pre-IPO phase’s sentiment variables to IS, and
that in Panel B inserts the sentiment variables in Phase 2 for IS. Columns (1)
and (4) of both Panels A and B show that a positive and statistically significant
relationship exists between Ln (Number of messages) and Initial return, and
suggest that investor attention leads to high initial returns. Columns (2) and (4) of
both Panels A and B reveal that the Bullishness index has a positive and
statistically significant relationship with Initial return, and suggest that bullish
investor sentiment prior to IPO contributes to high initial returns. Column (3) of
both Panels A and B shows that the Agreement index significantly negatively
relates to Initial return; however, column (4) of both Panels A and B shows that
estimate the regression in equation (4) using a sub-sample that satisfies the
criterion that the number of classified bull / bear messages is above the
median, we find that the coefficient of Agreement index is positive and the effect
with Hypothesis 2 and suggest that investor attention and bullish investor
Table 8 reports the results of the OLS regression in Equation (5) in testing
variable is BHAR, which is computed by subtracting the BHR of the control firm
in the same industry and the closest market value at 21 days after the IPO date,
based on returns 250 or 500 days after IPO. 18 Panel A in Table 8 illustrates the
results of Equation (5), in which the pre-IPO phase’s sentiment variables are
used for IS. Columns (1), (4), (5), and (8) of Panel A reveal that the coefficient
of Ln (Number of messages) is negative and significant for both horizons: 250
and 500 days after IPO. Similarly, Columns (1), (4), (5), and (8) of Panel B in
Table 8, in which the Phase 2 sentiment variables are used as IS, reveal a
negative relationship between Ln (Number of messages) and the BHAR with
suggest that investor attention leads to low post-IPO stock returns. Additionally,
columns (2) and (4) of both Panels A and B in Table 8 indicate a positive
relationship between the Bullishness index and the BHAR with a 250-day
horizon. Columns (6) and (8) of Panel B show that the coefficient of the
Bullishness index is positive. However, columns (6) and (8) of Panel A show that
predicts post-IPO stock returns. Adding Initial return to the model obtains a
positive and statistically significant relationship between Initial return and the
caused by investor attention prior to the IPO, and the first trading day’s high
closing price that is influenced by investor sentiment prior to the IPO.
[Please insert Table 7]
7. Conclusion
This study investigated whether pre-IPO investor sentiment on Internet stock
message boards relates to the IPO puzzles of high initial returns and low
post-IPO stock returns. Investor sentiment for each issuing firm was measured
sentiment positively relate to the likelihood that IPO firms set their offer price at
the filing range’s maximum point. Second, high investor attention and bullish
lead to determining the offer price at the filing range’s maximum point and
increase the first trading price; and that the trading price pushed up by pre-IPO
investor sentiment then falls. Our evidence is the first empirical demonstration of
this relationship between investor sentiment on Internet stock message boards
and IPO puzzles. Our results suggest that the extent of initial returns and
This study focused on the static relationship between investor sentiment and
stock price behavior after IPOs, but this analysis could be dynamically extended
SVM price trends’ classification performance, which significantly outperformed
the buy-and-hold strategy. A combination of this and continuously measured
investor sentiment could detect dynamic changes in the relationship between
investor sentiment and price trends after IPOs.
References
Antweiler, W., & Frank, M. Z. (2004). Is all that talk just noise? The information
1259-1294.
Barber, B. M., & Lyon, J. D. (1997). Detecting long-run abnormal stock returns:
Benveniste, L. M., & Spindt, P. A. (1989). How investment bankers determine the
24(2), 343-361.
Bollen, J., Mao, H., & Zeng, X. (2011). Twitter mood predicts the stock market.
Journal of Computational Science, 2(1), 1-8.
1791-1821.
Caputo, B., Sim, K., Furesjo, F., & Smola, A. (2002). Appearance-based object
recognition using SVMs: Which kernel should I use? Proc of NIPS Workshop
on Statistical Methods for Computational Experiments in Visual Processing
and Computer Vision.
Carter, R. B., Dark, F. H., & Singh, A. K. (1998). Underwriter reputation, initial
53(1), 285-311.
Chan, Y. (2014). How does retail sentiment affect IPO returns? Evidence from
29, 235-248.
Cornelli, F., Goldreich, D., & Ljungqvist, A. (2006). Investor sentiment and
Crammer, K., & Singer, Y. (2002). On the learnability and design of output codes
Da, Z., Engelberg, J., & Gao, P. (2011). In search of attention. Journal of
Das, S. R., & Chen, M. Y. (2007). Yahoo! for Amazon: Sentiment extraction from
small talk on the web. Management Science, 53(9), 1375-1388.
Derrien, F. (2005). IPO pricing in “hot” market conditions: Who leaves money on
Dorn, D. (2009). Does sentiment drive the retail demand for IPOs? Journal of
Financial and Quantitative Analysis, 44(1), 85-108.
Fang, L., & Peress, J. (2009). Media coverage and the cross-section of stock
returns. Journal of Finance, 64(5), 2023-2052.
Hanley, K. W. (1993). The underpricing of initial public offerings and the partial
adjustment phenomenon. Journal of Financial Economics, 34(2), 231-250.
Jegadeesh, N., & Wu, D. (2013). Word power: A new approach for content
Jenkinson, T., & Ljungqvist, A. (2001). Going public: The theory and evidence on
how companies raise equity finance. New York: Oxford University Press.
Karatzoglou, A., Smola, A., Hornik, K., & Zeileis, A. (2004). Kernlab - An S4
1-20.
Li, X., & Shi, Z. (2002). Innovating web page classification through reducing
Ljungqvist, A., & Wilhelm, W. J. (2003). IPO pricing in the dot-com bubble.
Ljungqvist, A., Nanda, V., & Singh, R. (2006). Hot markets, investor sentiment,
1823-1850.
Maruyama, K., Umehara, E., Suwa, H., & Ota, T. (2008). Relationship between
Internet message board content and stock markets. Securities Analysts
Mori, S., & Neubig, G. (2014). Language resource addition: dictionary or
corpus? Proceedings of the Ninth International Conference on Language
Resources and Evaluation.
Okada, K., Yamasaki, T., Sakakibara, S., & Yamasaki, T. (2013). Stock market
Ritter, J. R., & Welch, I. (2002). A review of IPO activity, pricing, and allocations.
Springer-Verlag.
Zhang, Y., & Swanson, P. E. (2010). Are day traders bias free? Evidence from
Internet stock message boards. Journal of Economics and Finance, 34(1),
96-112.
Acknowledgements
We are grateful to the Editor, Professor Carl R. Chen, and two anonymous
referees for their helpful comments. We thank participants of Finance Camp
of this paper. This work was supported by JSPS KAKENHI, Grant Number
26885065 for the first author, and JSPS KAKENHI, Grant Number 23330108 for
1
Bollen, Mao, and Zeng (2011) measure public mood from tweets to predict the stock
market. However, Twitter does not separate the measurement of investor sentiment for
each firm.
2
Maruyama et al. (2008) extract investor sentiment from YJF message boards and
indicate a correlation between investor bullishness and daily returns. Okada et al.
(2013) extract sentiment from Nikkei newspapers to discover sentiment seasonality in
Japan.
3
Each user is identified by the first three characters of the provided user ID. We treat
cases in which one user possesses two user IDs as two different users.
4
Both investors and consumers search by company name using the Google search
engine. The measurement of investor attention based on a search volume index in
Google may therefore include not only investor attention, but also consumer interest.
5
The investors in our sample posted the first message on the message board an
average of 27 days before the IPO date.
6
The “partial adjustment phenomenon” proposed by Benveniste and Spindt (1989)
and Hanley (1993) suggests that underwriters would compensate investors who offer
truthful information by allocating them more shares, and shares with high initial returns.
These authors indicated that the offer prices or price revisions positively relate to
initial returns.
7
We randomly selected 100 messages from each year between 2001 and 2010.
8
Removing “do,” “be,” and “become” from our keywords did not significantly change
our results.
9
Brav and Gompers (1997) demonstrate that VC-backed IPOs outperform non-VC-backed
IPOs.
10
Carter, Dark, and Singh (1998) find that IPOs managed by more prestigious
underwriters have lower initial returns and less post-IPO underperformance.
11
It has been documented that the post-IPO stock returns of firms issuing on the
second-board stock market are lower than those of firms issuing on the main market.
12
Ljungqvist and Wilhelm (2003) demonstrate that initial returns decrease with the
secondary sales of managers and venture capital.
13
Although Barber and Lyon (1997) indicate that an appropriate control firm is similar
in size (market value) and book-to-market ratio, we cannot obtain an accurate
book-to-market ratio before the IPO.
14
When an IPO firm has no classified bull/bear messages in Phase 1, we cannot
compute an Agreement index, and thus exclude the firm from the analysis.
15
We obtained results similar to Table 6 when we estimated the logistic regressions
for Equation (3) in two other ways: the first model uses investor sentiment in the
book-building period instead of Phase 1, and the second model uses a binary dependent
variable, which equals one if the offer price is set above the midpoint of the
filing range, and zero otherwise.
16
We obtain results similar to Table 7 by estimating Equation (4), in which the
dependent variables are the closing prices for the five trading days after the IPO date,
18
We obtain similar results when we use the TOPIX and JASDAQ indexes and the
industry portfolio as alternatives to the control firm in computing BHAR, and when
BHAR is measured 125 or 750 days after the IPOs. Moreover, we verify the results by
winsorizing all the variables at the 1st and 99th percentiles.
19
When we control for the BHR of a control firm with the closest market value or of a
reference portfolio, we find a statistically significant negative relationship between Ln (Number of
Table 1: Example of message data
No.3    2007/2/26 20:57    tre    2番目 (Second)    当たれ〜!! (I want IPO shares!!)
No.4    2007/3/1 16:47     sai    4                 444
No.5    2007/3/1 21:53     kim    5                 USJよりもこっちが正解!!!!当たってー!!! (This is better than USJ!!!! I want it!!!)
No.6    2007/3/2 21:53     tka    6?7?              欲しいなぁ。。。チョイと他のBBが終わるまで資金不足。マネは使い勝手が悪い。。。 (I want it… My fund is insufficient until the other IPO ends. My security company is inconvenient...)
English translations of the original Japanese texts are shown in parentheses.
Since the No.1 message is posted by Yahoo! Japan, the No.2 message is
considered the first message.
Table 2: Frequency of keywords in bullish and bearish messages within training data
Rank    Phase 1 bullish (word, frequency)    Phase 1 bearish (word, frequency)    Phase 2 bullish (word, frequency)    Phase 2 bearish (word, frequency)
1       する (do)  87    する (do)  127    する (do)  113    する (do)  123
2       いる (be)  37    いる (be)  47    ある (be)  51    初値 (first trading price)  51
3       買う (buy)  29    ある (be)  31    いる (be)  51    いる (be)  47
4       倍 (times)  23    なる (become)  27    初値 (first trading price)  49    なる (become)  36
5       なる (become)  20    ない (negation)  25    なる (become)  43    公募割れ (first trading price below the offer price)  35
6       ある (be)  18    会社 (firm)  19    倍 (times)  38    ない (negation)  31
7       会社 (firm)  14    買う (buy)  19    思う (expect)  36    買う (buy)  27
8       [...] 初値 (first [...]
9       期待 (expect)  12    高い (relatively expensive)  18    ない (negation)  24    ある (be)  24
10      いい (nice)  11    公募割れ (first trading price below the offer price)  17    売る (sell)  23    銘柄 (firm)  24
English translations of the original Japanese texts are shown in parentheses.
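
A ranking like the one in Table 2 can be sketched by counting tokens separately for each manually assigned class. The snippet below is only an illustration: it assumes the messages are already tokenized (Japanese text would first need a morphological analyzer, which is not shown), and the function name `top_keywords` and the toy tokens are hypothetical, not taken from the paper.

```python
from collections import Counter


def top_keywords(tokenized_messages, labels, target_label, k=10):
    """Return the k most frequent tokens among messages labeled target_label."""
    counts = Counter()
    for tokens, label in zip(tokenized_messages, labels):
        if label == target_label:
            counts.update(tokens)
    return counts.most_common(k)


# Toy, made-up input: each message is a list of tokens with a manual label.
messages = [["buy", "expect", "buy"], ["sell", "expensive"], ["buy", "nice"]]
labels = ["bullish", "bearish", "bullish"]

for label in ("bullish", "bearish"):
    print(label, top_keywords(messages, labels, label, k=3))
```

Running the same count once per phase and per class would give the four word lists that Table 2 places side by side.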
Table 3: SVM classification accuracy within the training data
                                      Classification by SVM
                                 Phase 1                      Phase 2
Manual classification      Bearish  Neutral  Bullish    Bearish  Neutral  Bullish
Bearish                        121        2        0        134        4        1
Neutral                          0      777        1          2      717        0
Bullish                          0       13       86          1       11      130
This table provides comparative results of manual classification versus
classification by SVM within the training data.
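
The counts in Table 3 can be summarized as an overall accuracy plus per-class recall and precision. The sketch below uses only the counts reported in the table; the helper name `summarize` is ours, and the paper itself does not report these derived ratios in this section.

```python
import numpy as np

LABELS = ["Bearish", "Neutral", "Bullish"]

# Rows: manual classification; columns: classification by SVM (counts from Table 3).
PHASE1 = np.array([[121, 2, 0], [0, 777, 1], [0, 13, 86]])
PHASE2 = np.array([[134, 4, 1], [2, 717, 0], [1, 11, 130]])


def summarize(cm):
    """Return overall accuracy, per-class recall, and per-class precision."""
    correct = np.diag(cm)
    accuracy = correct.sum() / cm.sum()
    recall = correct / cm.sum(axis=1)     # share of each manual class recovered by the SVM
    precision = correct / cm.sum(axis=0)  # share of each SVM class matching the manual label
    return accuracy, recall, precision


for name, cm in (("Phase 1", PHASE1), ("Phase 2", PHASE2)):
    accuracy, recall, precision = summarize(cm)
    print(f"{name}: accuracy = {accuracy:.3f}")
    for label, r, p in zip(LABELS, recall, precision):
        print(f"  {label}: recall = {r:.3f}, precision = {p:.3f}")
```

Because these figures are computed within the training data, they describe in-sample fit rather than out-of-sample performance.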
Table 4: Summary statistics
Panel A: IPO and issuing firm characteristics
                                              Mean    Median        SD  1st quartile  3rd quartile
...
500-day BHAR (%)                             0.820   -26.519   179.827       -64.793        27.517
SSR (%)                                      8.567     7.792     6.899         4.281        11.581
Panel B: Message board statistics and investor sentiment
                                              Mean    Median        SD  1st quartile  3rd quartile
Number of messages       Pre-IPO phase       198.4     116.0     264.6          67.0         214.8
                         Phase 1              45.3      27.0      63.0          16.0          49.0
                         Phase 2             153.1      88.0     216.5          46.0         169.0
Ln (Number of messages)  Pre-IPO phase       4.838     4.754     0.905         4.205         5.369
                         Phase 1             3.380     3.296     0.874         2.773         3.892
                         Phase 2             4.504     4.477     1.004         3.829         5.130
Bullishness              Pre-IPO phase       0.038     0.041     0.561        -0.288         0.367
...
Agreement index          Phase 1             0.556     0.500     0.396         0.200         1.000
...
the natural logarithm of firm age (AGE); the offer price normalized by the midpoint of
the filing range (Price revision); the closing price on the first trading day normalized
by the offer price, minus one (Initial return); the IPO firm's return over 250 or 500 days
after the IPO minus that of a control firm in the same industry with the closest
market value 21 days after the IPO date (BHAR); and secondary sales shares
divided by outstanding shares (SSR). Panel B shows statistics of the message boards
and investor sentiment proxies.
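
The variable definitions in the notes to Table 4 translate directly into arithmetic. The following is a minimal sketch under stated assumptions: the function names and the example prices are hypothetical, and the exact start of the holding window used for BHAR is not specified in this section, so the caller is assumed to pass price series that already cover the intended window.

```python
def price_revision(offer_price, range_low, range_high):
    """Offer price normalized by the midpoint of the filing range."""
    midpoint = (range_low + range_high) / 2.0
    return offer_price / midpoint


def initial_return(first_day_close, offer_price):
    """First-day closing price normalized by the offer price, minus one."""
    return first_day_close / offer_price - 1.0


def buy_and_hold_return(prices):
    """Return from holding an asset from the first to the last price in the series."""
    return prices[-1] / prices[0] - 1.0


def bhar(ipo_prices, control_prices):
    """Buy-and-hold return of the IPO firm minus that of its control firm."""
    return buy_and_hold_return(ipo_prices) - buy_and_hold_return(control_prices)


# Made-up numbers for illustration only.
print(price_revision(offer_price=1000.0, range_low=800.0, range_high=1000.0))
print(initial_return(first_day_close=1500.0, offer_price=1000.0))
print(bhar(ipo_prices=[1500.0, 1200.0], control_prices=[100.0, 110.0]))
```

In this toy case the offer price sits above the range midpoint, the first-day return is positive, and the IPO firm subsequently underperforms its control firm, which is the pattern the paper associates with excessive optimism.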
                                      (1)         (2)         (3)         (4)         (5)         (6)         (7)         (8)         (9)         (10)        (11)
(1) Phase 1 Ln (Number of messages)   -           -0.062      -0.412***   0.643***    -0.001      -0.242***   0.282***    -0.066*     -0.062      -0.095**    -0.259***
(2) Phase 1 Bullishness               -0.077**    -           -0.092**    -0.016      0.063       -0.021      -0.028      -0.014      -0.167***   -0.084**    -0.054
(3) Phase 1 Agreement                 -0.436***   -0.075*     -           -0.297***   0.027       0.181***    -0.070*     0.025       -0.052      0.034       0.116***
(4) Phase 2 Ln (Number of messages)   0.676***    -0.021      -0.320***   -           0.038       -0.381***   0.264***    0.015       -0.083**    -0.115***   -0.304***
(5) Phase 2 Bullishness               0.006       0.064       0.020       0.062       -           0.171***    0.090**     -0.028      -0.061      0.017       0.058
(6) Phase 2 Agreement                 -0.275***   -0.036      0.182***    -0.445***   0.081**     -           -0.138***   0.023       0.005       0.053       0.118***
(7) Market condition                  0.281***    -0.035      -0.076*     0.260***    0.095**     -0.148***   -           0.010       -0.036      0.002       -0.032
(8) SSR                               -0.039      -0.047      0.019       0.044       -0.011      0.000       0.000       -           0.184***    -0.028      0.217***
(9) AST                               0.056       -0.161***   -0.073*     0.053       -0.069*     -0.005      -0.015      0.232***    -           0.467***    0.569***
(10) DEBT                             -0.090**    -0.087**    0.036       -0.107***   0.047       0.020       -0.017      -0.004      0.403***    -           0.195***
(11) AGE                              -0.257***   -0.033      0.120***    -0.302***   0.042       0.148***    -0.046      0.187***    0.502***    0.220***    -
This table reports the Pearson correlations (lower triangle) and the Spearman correlations (upper triangle); ***, **, and * indicate significance at the 1%, 5%, and 10%
levels, respectively.
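
A matrix laid out this way, with Pearson correlations below the diagonal and Spearman correlations above it, can be assembled as sketched below. This is an illustration rather than the authors' procedure: the function names are ours, the input is toy data, and Spearman correlations are computed as Pearson correlations of average ranks.

```python
import numpy as np


def average_ranks(x):
    """Rank the values of x from 1..n, giving tied values their mean rank."""
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x), dtype=float)
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0  # mean of positions i..j, 1-based
        i = j + 1
    return ranks


def pearson_spearman_matrix(data):
    """Pearson correlations in the lower triangle, Spearman in the upper triangle."""
    pearson = np.corrcoef(data, rowvar=False)
    ranked = np.column_stack([average_ranks(column) for column in data.T])
    spearman = np.corrcoef(ranked, rowvar=False)
    return np.tril(pearson, -1) + np.triu(spearman, 1) + np.eye(data.shape[1])


# Toy data: the second column is a monotone but nonlinear function of the first.
x = np.arange(1.0, 11.0)
toy = np.column_stack([x, x ** 3])
print(pearson_spearman_matrix(toy))
```

In the toy output the upper-triangle entry equals one because the relationship is perfectly monotone, while the lower-triangle Pearson entry is below one because the relationship is not linear.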
Table 6: Investor sentiment in Phase 1 and price revision
Dependent variable: Price revision dummy
                          (1)         (2)         (3)         (4)
IS (investor sentiment proxies), Phase 1
Ln (Number of messages)   0.868***                            0.950***
                          (3.423)                             (2.960)
Bullishness index                     0.384*                  0.412*
                                      (1.891)                 (1.868)
Agreement index                                   -0.323      0.355
                                                  (-0.724)    (0.721)
Market condition          0.070***    0.087***    0.073***    0.062***
                          (4.150)     (5.388)     (4.205)     (3.367)
SSR                       -0.055*     -0.080***   -0.069**    -0.048
                          (-1.742)    (-2.639)    (-2.055)    (-1.385)
AST                       -0.475**    -0.552**    -0.651***   -0.562**
                          (-2.093)    (-2.499)    (-2.702)    (-2.247)
DEBT                      1.037       1.127       0.945       1.187
                          (1.268)     (1.403)     (1.053)     (1.270)
AGE                       0.263       0.199       0.298       0.317
                          (1.049)     (0.796)     (1.118)     (1.176)
...
dependent variable is the Price revision dummy, which equals one if the offer price is
determined at the filing range’s maximum point and zero otherwise. The independent
variables are the investor sentiment proxies (IS); the return of the reference portfolio,
constructed of firms that belong to the same industry as the IPO firms over the 60 days
prior to the IPOs (Market condition); secondary sales shares divided by outstanding
shares (SSR); the natural logarithm of total assets (AST); debt divided by total assets
(DEBT); and the natural logarithm of firm age (AGE); a VC backed dummy that
assumes a value of one if the firm is VC backed, and zero otherwise; an Underwriter
dummy, which equals one if the lead underwriter includes “Nomura,” “Daiwa,” or
“Nikko,” and zero otherwise; and the natural logarithm of the offering price multiplied
by the number of shares offered (Gross proceeds). Our models include an industry
dummy, year dummy, and market dummy. We report z-statistics in parentheses. ***, **,
and * indicate significance at the 1%, 5%, and 10% levels, respectively.
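
To make the setup behind Table 6 concrete, the sketch below builds a price revision dummy and fits a logit model by Newton's method, reporting z-statistics. It is an assumption-laden illustration: the data are simulated, a single regressor stands in for the sentiment proxies and controls, and the function names are hypothetical. It does not reproduce the paper's Equation (3) or its estimates.

```python
import numpy as np


def price_revision_dummy(offer_price, range_high):
    """1.0 where the offer price is at the top of the filing range, else 0.0."""
    return np.isclose(offer_price, range_high).astype(float)


def fit_logit(X, y, n_iter=25):
    """Fit a logit model by Newton's method; return coefficients and z-statistics."""
    X = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        gradient = X.T @ (y - p)
        hessian = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hessian, gradient)
    # Covariance from the inverse information matrix at the final estimate.
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    cov = np.linalg.inv(X.T @ (X * (p * (1.0 - p))[:, None]))
    return beta, beta / np.sqrt(np.diag(cov))


# Simulated data only: higher "attention" raises the chance of pricing at the range top.
rng = np.random.default_rng(0)
n = 5000
attention = rng.normal(size=n)
prob_top = 1.0 / (1.0 + np.exp(-(-0.5 + 1.0 * attention)))
range_high = np.full(n, 1000.0)
offer_price = np.where(rng.random(n) < prob_top, 1000.0, 900.0)

y = price_revision_dummy(offer_price, range_high)
beta, z_stats = fit_logit(attention, y)
print("coefficients:", beta)
print("z-statistics:", z_stats)
```

A full specification would add the control variables and the industry, year, and market dummies as further columns of the regressor matrix.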
Table 7: Investor Sentiment and Initial Returns
Panel A: Investor sentiment before IPOs and initial returns
Dependent variable: Initial return
                          (1)         (2)         (3)         (4)
IS (investor sentiment proxies), pre-IPO phase
...
Bullishness index                     ...                     ...
                                      (4.967)                 (4.123)
Agreement index                                   -28.992**   2.481
                                                  (-2.125)    (0.181)
Price revision            0.138       0.860       0.893       0.126
                          (0.207)     (1.353)     (1.379)     (0.192)
Market condition          1.276***    1.546***    1.566***    1.256***
                          (4.417)     (5.424)     (5.487)     (4.346)
SSR                       -0.362      -1.250**    -1.170*     -0.428
                          (-0.584)    (-2.048)    (-1.893)    (-0.694)
AST                       -7.725*     -8.206*     -9.003**    -6.751
                          (-1.831)    (-1.880)    (-2.034)    (-1.612)
DEBT                      -24.844     -34.150*    -33.817*    -25.825
...
Panel B: Investor sentiment in Phase 2 and initial returns
Dependent variable: Initial return
                          (1)         (2)         (3)         (4)
IS (investor sentiment proxies), Phase 2
...
Agreement index                                   -27.595**   5.377
                                                  (-2.145)    (0.411)
Price revision            0.231       0.900       0.926       0.284
                          (0.346)     (1.401)     (1.419)     (0.429)
Market condition          1.374***    1.498***    1.596***    1.322***
                          (4.825)     (5.233)     (5.604)     (4.635)
SSR                       -0.504      -1.246**    -1.110*     -0.541
                          (-0.812)    (-2.021)    (-1.772)    (-0.866)
AST                       -7.255*     -8.809**    -9.217**    -7.315*
                          (-1.702)    (-2.015)    (-2.052)    (-1.713)
DEBT                      -26.827     -34.643*    -32.618*    -28.071
                          (-1.501)    (-1.920)    (-1.766)    (-1.583)
...
This table presents the results of the OLS regression in Equation (4). The dependent
variable is the closing price on the first trading day normalized by the offer price minus
one (Initial return). The independent variables are the investor sentiment proxies (IS);
the offer price normalized by the midpoint of the filing range (Price revision); the
returns from the reference portfolio constructed from firms that belong to the same
industry as the IPO firms over the 60 days prior to the IPOs (Market condition);
secondary sales shares divided by outstanding shares (SSR); the natural logarithm of
total assets (AST); debt divided by total assets (DEBT); the natural logarithm of firm
age (AGE); a VC backed dummy that assumes a value of one if the firm is VC backed,
and zero otherwise; an Underwriter dummy, which equals one if the lead underwriter
includes “Nomura,” “Daiwa,” or “Nikko,” and zero otherwise; and the natural logarithm
of the offering price multiplied by the number of shares offered (Gross proceeds). Our
models include an industry dummy, year dummy, and market dummy. The second line of
each row reports t-statistics based on heteroscedasticity-consistent standard errors (White, 1980).
***, **, and * indicate significance at the 1%, 5%, and 10% levels, respectively.
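
The t-statistics described in the notes rely on White (1980) heteroscedasticity-consistent standard errors. The sketch below shows one common form of that estimator (often labeled HC0) on simulated data. The function name, the single regressor, and the data-generating numbers are our assumptions; whether the authors applied any finite-sample adjustment is not stated in this section.

```python
import numpy as np


def ols_white(X, y):
    """OLS coefficients with White (HC0) heteroscedasticity-consistent t-statistics."""
    X = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ (X.T @ y)
    resid = y - X @ beta
    # Sandwich estimator: (X'X)^-1 [sum_i e_i^2 x_i x_i'] (X'X)^-1
    meat = X.T @ (X * (resid ** 2)[:, None])
    cov = xtx_inv @ meat @ xtx_inv
    return beta, beta / np.sqrt(np.diag(cov))


# Simulated data only: the error variance grows with the regressor's magnitude.
rng = np.random.default_rng(1)
n = 5000
sentiment = rng.normal(size=n)
noise = rng.normal(size=n) * (0.5 + np.abs(sentiment))
initial_ret = 2.0 + 3.0 * sentiment + noise

beta, t_stats = ols_white(sentiment, initial_ret)
print("coefficients:", beta)
print("robust t-statistics:", t_stats)
```

The point estimates are ordinary OLS; only the standard errors, and hence the t-statistics, change when the sandwich covariance replaces the classical one.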
Ln (Number of messages)   -10.862**                           -12.441**   -15.516**                           -13.855*
                          (-2.185)                            (-2.392)    (-2.095)                            (-1.722)
Bullishness index                     6.095                   7.648                   -3.675                  -1.945
                                      (1.154)                 (1.410)                 (-0.372)                (-0.192)
Agreement index                                   1.889       -9.655                              29.382      16.527
                                                  (0.122)     (-0.615)                            (1.183)     (0.631)
AST                       7.603       8.703*      8.238*      8.219*      16.486*     17.181*     16.917*     16.156*
                          (1.603)     (1.814)     (1.728)     (1.700)     (1.788)     (1.924)     (1.881)     (1.812)
DEBT                      28.261*     29.256*     29.637*     27.471*     -4.835      -2.701      -2.445      -4.254
                          (1.711)     (1.765)     (1.783)     (1.660)     (-0.174)    (-0.098)    (-0.089)    (-0.155)
AGE                       -24.777***  -23.012***  -22.500***  -25.642***  -40.131***  -36.510***  -37.293***  -39.863***
                          (-3.604)    (-3.398)    (-3.379)    (-3.671)    (-3.671)    (-3.326)    (-3.464)    (-3.562)
VC backed dummy           4.137       2.341       2.389       4.051       -0.060      -2.682      -1.490      0.313
                          (0.571)     (0.327)     (0.333)     (0.558)     (-0.005)    (-0.230)    (-0.126)    (0.027)
Underwriter dummy         0.210       0.647       1.180       -0.522      -1.332      0.417       -0.244      -1.188
                          (0.026)     (0.081)     (0.146)     (-0.067)    (-0.109)    (0.034)     (-0.020)    (-0.096)
Gross proceeds            0.513       -3.997      -4.138      1.070       -5.607      -12.499*    -11.124     -5.703
                          (0.111)     (-0.972)    (-0.960)    (0.228)     (-0.667)    (-1.680)    (-1.533)    (-0.663)
Intercept                 22.316      53.637      57.687      18.858      163.699     220.372*    188.470     155.309
...
Ln (Number of messages)   ...                                 ...         ...                                 ...
                          (-1.461)                            (-1.599)    (-1.385)                            (-1.323)
Bullishness index                     9.258*                  10.483*                 4.235                   5.859
                                      (1.726)                 (1.961)                 (0.484)                 (0.667)
Agreement index                                   13.780      2.406                               12.828      -0.202
                                                  (0.940)     (0.159)                             (0.557)     (-0.008)
AST                       ...         ...         ...         ...         ...         ...         ...         ...
                          (1.614)     (1.856)     (1.689)     (1.721)     (1.803)     (1.949)     (1.839)     (1.824)
DEBT                      28.834*     28.539*     29.181*     27.346      -4.044      -3.401      -3.630      -5.165
                          (1.745)     (1.714)     (1.746)     (1.635)     (-0.146)    (-0.122)    (-0.130)    (-0.183)
AGE                       -23.973***  -23.266***  -22.749***  -25.263***  -39.038***  -37.200***  -37.021***  -39.640***
                          (-3.544)    (-3.426)    (-3.375)    (-3.642)    (-3.653)    (-3.369)    (-3.402)    (-3.614)
VC backed dummy           3.324       2.154       2.791       3.557       -1.185      -2.738      -1.961      -0.786
                          (0.464)     (0.303)     (0.388)     (0.494)     (-0.102)    (-0.235)    (-0.166)    (-0.067)
Underwriter dummy         0.520       0.501       1.016       -0.849      -0.914      -0.237      0.662       -1.152
                          (0.065)     (0.062)     (0.124)     (-0.107)    (-0.075)    (-0.019)    (0.053)     (-0.094)
Gross proceeds            -1.048      -3.894      -3.637      -0.022      -7.719      -12.217*    -11.783     -7.308
                          (-0.220)    (-0.942)    (-0.835)    (-0.005)    (-0.923)    (-1.651)    (-1.500)    (-0.867)
Intercept                 28.568      51.057      45.070      10.532      171.484     212.964*    203.129     163.769
                          (0.402)     (0.746)     (0.626)     (0.143)     (1.386)     (1.804)     (1.423)     (1.284)
Adj. R2                   0.127       0.127       0.124       0.127       0.130       0.129       0.128       0.127
and the natural logarithm of the offering price multiplied by the number of shares offered (Gross proceeds). Our models include an industry dummy, year dummy, and
market dummy. The second line of each row reports t-statistics based on heteroscedasticity-consistent standard errors (White, 1980). ***, **, and * indicate significance at the 1%, 5%,
and 10% levels, respectively.
The No.2 message posted to the thread is considered the first message of the
series, as the No.1 message is posted by Yahoo! Japan.
This is the first study using text data on message boards to analyze IPO behavior.
messages.
Pre-IPO investor sentiment is measured based on categorized messages on each
firm’s thread.
We find that excessive optimism leads to high initial returns and long-run
underperformance.