Professional Documents
Culture Documents
False Discovery Rate Control
False Discovery Rate Control
Introduction
Data Preprocessing
1. Initial Data Check: The dataset contained columns such as 'Ticker Symbol', 'Period Ending',
and 'Estimated Shares Outstanding', among others. Initial examination revealed mixed data types
and some missing values.
2. Handling Missing Values: The dataset had missing values across several columns. All rows
containing any missing values were dropped to ensure the analysis was conducted on complete
cases.
3. Column Removal: Columns not relevant to the analysis, including 'Ticker Symbol', 'Unnamed:
0', 'Period Ending', and 'Estimated Shares Outstanding', were dropped to focus on numerical
financial metrics.
1. Scatter Plots were generated to explore relationships between pairs such as R&D Expenses vs.
Earnings Per Share and Total Revenue vs. Net Income. These plots revealed no clear linear
relationship in the former and a non-proportional relationship in the latter case.
2. Histograms were used to analyze the distribution of 'Earnings Per Share' and 'Gross Margin',
indicating a skewed distribution for EPS and a multimodal distribution for Gross Margin.
3. Correlation Analysis with a heatmap highlighted the varying degrees of correlation between
different financial metrics.
1. Model Creation: A linear regression model was developed to predict 'Estimated Shares
Outstanding' using financial metrics as predictors. The model demonstrated an R-squared value
of 0.854, indicating a strong ability to explain variability in the dependent variable.
2. P-Value Analysis: A histogram of the p-values from the model indicated three main groups,
suggesting varying levels of significance among the predictors.
1. Benjamini-Hochberg Procedure: Applied at a q-value of 0.1, this method estimated the number
of true discoveries while controlling the false discovery rate (FDR). The analysis identified a
significant number of predictors that were statistically meaningful.
2. Sensitivity Analysis: Varying the q-values in the BH procedure showed changes in the number
of true discoveries, indicating the model's sensitivity to the threshold of significance.
3. FDR Analysis with Interaction Terms: Applying the BH procedure to the model with
interaction terms at a q-value of 0.1 identified a larger number of significant predictors,
reinforcing the value of including interaction terms in uncovering more complex relationships.
Conclusion
The analysis demonstrated the intricate relationships between financial metrics and their impact
on estimated shares outstanding. Including interaction terms significantly enhanced the model's
ability to capture these relationships, as evidenced by the increased number of true discoveries
and improved model fit metrics. However, the complexity of the enhanced model necessitates
careful interpretation and validation to avoid overfitting and ensure the generalizability of the
findings. This report underscores the importance of thorough data preprocessing, exploratory
analysis, and sophisticated modeling techniques in financial data analysis.