Quiz: Predictive Analytics and Market Intelligence

Test your understanding of predictive analytics, machine learning models, market microstructure, and production deployment for investor relations intelligence.


1. What distinguishes "predictive analytics" from "descriptive analytics" in IR?

  A. Predictive analytics forecasts future outcomes using historical patterns while descriptive analytics reports what happened
  B. Predictive analytics only analyzes stock prices while descriptive analytics covers all metrics
  C. Predictive analytics requires no data while descriptive analytics needs extensive databases
  D. Predictive analytics and descriptive analytics are exactly the same thing
Show Answer

The correct answer is A. Predictive analytics leverages historical patterns to forecast future outcomes (likely investor questions, probability of estimate revisions, expected volatility), while descriptive analytics focuses on backward-looking reporting (tracking earnings performance, documenting shareholder composition). Predictive capabilities enable proactive engagement rather than reactive firefighting. Option B mischaracterizes scope—predictive analytics applies broadly, not just to stock prices. Option C is backwards—prediction requires MORE data than description. Option D is incorrect—they represent different analytics maturity stages.

Concept Tested: Predictive Analytics, Predictive IR Analytics

Bloom's Level: Understand

See: Section 1: Analytics Maturity Journey


2. When engineering features for earnings surprise prediction, which type of feature captures comparisons to similar companies?

  A. Temporal features showing lagged values and rolling statistics
  B. Derived financial features calculating growth rates and margins
  C. Engagement features tracking investor meeting frequency
  D. Comparative features like P/E ratio versus sector median and performance rankings within industry
Show Answer

The correct answer is D. Comparative features create peer-relative metrics (P/E ratio vs. sector median, revenue growth percentile within industry, company sentiment minus sector average) that contextualize company performance against competitors. These features often have strong predictive power because investors make relative judgments. Option A describes temporal features (time-based). Option B describes derived financial features (calculated from financials). Option C describes engagement features (relationship-based).
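
To make this concrete, here is a minimal sketch of comparative feature construction with pandas; the tickers, sectors, and metric values are hypothetical:

```python
import pandas as pd

# Hypothetical cross-section of companies with sector labels and raw metrics.
df = pd.DataFrame({
    "ticker": ["AAA", "BBB", "CCC", "DDD"],
    "sector": ["Tech", "Tech", "Energy", "Energy"],
    "pe_ratio": [28.0, 35.0, 11.0, 9.0],
    "rev_growth": [0.18, 0.25, 0.04, 0.07],
})

# Peer-relative features: contextualize each company against its sector.
df["pe_vs_sector_median"] = (
    df["pe_ratio"] - df.groupby("sector")["pe_ratio"].transform("median")
)
df["growth_pctile_in_sector"] = df.groupby("sector")["rev_growth"].rank(pct=True)

print(df[["ticker", "pe_vs_sector_median", "growth_pctile_in_sector"]])
```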

Concept Tested: Feature Engineering IR

Bloom's Level: Remember

See: Section 2: Predictive Analytics Foundations


3. A gradient boosting model achieves 85% accuracy on the training set but only 62% on the test set. What problem does this indicate?

  A. The model performs perfectly and should be deployed immediately
  B. The training set is too small to learn patterns
  C. Overfitting—the model learned noise specific to training data rather than generalizable patterns
  D. The model is too simple and needs more complexity
Show Answer

The correct answer is C. The large gap between training accuracy (85%) and test accuracy (62%) indicates overfitting—the model learned patterns specific to the training data (including noise) that don't generalize to new data. Solutions include regularization, simpler models, more training data, or ensemble methods. Option A is dangerous—this model would fail in production. Option B is partially true but doesn't explain the train-test gap. Option D is backwards—overfitting suggests the model is too complex for the data, not too simple.
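
A minimal sketch of diagnosing this gap with scikit-learn, using synthetic data in place of real IR features:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for engineered IR features.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# A deep, unconstrained ensemble is prone to memorizing training noise.
model = GradientBoostingClassifier(max_depth=8, n_estimators=500, random_state=0)
model.fit(X_train, y_train)

train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)
print(f"train={train_acc:.2f}, test={test_acc:.2f}")

# A large train-test gap signals overfitting: regularize and re-evaluate.
if train_acc - test_acc > 0.10:
    simpler = GradientBoostingClassifier(max_depth=2, n_estimators=100,
                                         subsample=0.8, random_state=0)
    simpler.fit(X_train, y_train)
    print(f"regularized test={simpler.score(X_test, y_test):.2f}")
```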

Concept Tested: Predictive Analytics, ML Model Calibration

Bloom's Level: Analyze

See: Section 8: Challenges and Best Practices


4. What is "data leakage" in predictive modeling and why is it problematic?

  A. When training data includes personally identifiable information
  B. When models consume too much memory during training
  C. When data storage systems have security vulnerabilities
  D. Using information in model training that wouldn't be available at prediction time, inflating apparent accuracy but failing in production
Show Answer

The correct answer is D. Data leakage occurs when training data includes information not available at prediction time (e.g., predicting Q4 ownership using Q4 stock returns unknown until after quarter-end). This inflates validation accuracy but causes production failure. Prevention requires strict temporal ordering and respecting information availability. Option A describes privacy issues, not leakage. Option B describes memory constraints. Option C describes security vulnerabilities, not the statistical concept of leakage.
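
One way to guard against leakage in quarterly data, sketched with pandas; the companies and return values are hypothetical:

```python
import pandas as pd

# Hypothetical quarterly panel: one row per (company, quarter).
df = pd.DataFrame({
    "company": ["AAA"] * 4 + ["BBB"] * 4,
    "quarter_end": pd.to_datetime(["2023-03-31", "2023-06-30",
                                   "2023-09-30", "2023-12-31"] * 2),
    "quarterly_return": [0.05, -0.02, 0.08, 0.01, 0.03, 0.04, -0.01, 0.06],
}).sort_values(["company", "quarter_end"])

# Lag features so each row uses only information known BEFORE its quarter;
# predicting Q4 ownership from Q4 returns would be leakage.
df["return_lag1"] = df.groupby("company")["quarterly_return"].shift(1)

# Split by time, never randomly: training data must strictly precede test data.
cutoff = pd.Timestamp("2023-09-30")
train = df[df["quarter_end"] <= cutoff]
test = df[df["quarter_end"] > cutoff]
```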

Concept Tested: Predictive Analytics, Feature Engineering IR

Bloom's Level: Understand

See: Section 8: Challenges and Best Practices


5. Which model type is BEST suited for capturing long-range temporal dependencies in sequential quarterly data like investor engagement patterns?

  A. Simple linear regression with no temporal awareness
  B. LSTM (Long Short-Term Memory) neural networks designed for sequential data with memory of past events
  C. Random forest treating each quarter independently
  D. Logistic regression for binary classification only
Show Answer

The correct answer is B. LSTM (Long Short-Term Memory) networks are recurrent neural networks specifically designed to capture temporal dependencies across multiple time steps, making them ideal for sequences like quarterly engagement data, where patterns such as "investor engagement increases following margin improvements, followed by position initiation two quarters later" matter. Option A ignores time completely. Option C (random forest) treats observations independently, missing temporal patterns. Option D mischaracterizes logistic regression (it can handle temporal features but isn't specialized for sequences).
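
A minimal Keras sketch of this architecture, assuming 8 trailing quarters of 5 features per company; the data here is random placeholder input:

```python
import numpy as np
from tensorflow import keras

# Placeholder input: 8 trailing quarters of 5 engagement/financial features
# per company, predicting whether a position is initiated next quarter.
X = np.random.rand(200, 8, 5).astype("float32")  # (samples, timesteps, features)
y = np.random.randint(0, 2, size=200)

model = keras.Sequential([
    keras.layers.Input(shape=(8, 5)),
    # The LSTM cell state carries information across quarters, capturing
    # patterns like "engagement rises, position initiation follows later".
    keras.layers.LSTM(32),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5, batch_size=32, verbose=0)
```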

Concept Tested: Deep Learning Forecasts, Neural Net Predictions

Bloom's Level: Apply

See: Section 2: Model Selection Framework


6. How do SHAP (SHapley Additive exPlanations) values help with model interpretability?

  A. SHAP values make models train faster by simplifying calculations
  B. SHAP values reduce the number of features needed for predictions
  C. SHAP values increase model accuracy through better optimization
  D. SHAP values assign each feature a contribution to individual predictions, enabling explanation of why the model made specific forecasts
Show Answer

The correct answer is D. SHAP values provide consistent, theoretically grounded explanations by assigning each feature a value representing its contribution to individual predictions. For example: "Model predicts Fidelity will increase position by 1.2%. Primary drivers: engagement frequency (+0.8%), positive sentiment (+0.5%), growth mandate alignment (+0.4%)." This explainability builds trust and enables strategic insights. Option A confuses interpretability with training speed. Option B describes feature selection, not interpretation. Option C describes optimization techniques, not explanation methods.
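
A minimal sketch with the shap library; the synthetic features stand in for drivers like engagement frequency, sentiment, and mandate alignment:

```python
import shap
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic regression task standing in for a position-change forecast.
X, y = make_regression(n_samples=300, n_features=5, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

# TreeExplainer computes SHAP values efficiently for tree ensembles.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Row 0's forecast decomposed into per-feature contributions; together with
# the base value, these sum to the model's prediction for that row.
print(shap_values[0])
```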

Concept Tested: Model Interpretability, Predictive IR Analytics

Bloom's Level: Understand

See: Section 5: Model Interpretability


7. What percentage of equity trading volume typically comes from algorithmic trading strategies?

  A. 60-70% of trading volume, with significant portions from high-frequency trading firms
  B. Less than 10% of trading volume
  C. Exactly 50% by regulatory requirement
  D. Algorithmic trading is prohibited in equity markets
Show Answer

The correct answer is A. Approximately 60-70% of equity trading volume comes from algorithmic strategies, with significant portions from high-frequency trading (HFT) firms. This reality impacts IR because algorithms respond rapidly to news sentiment, technical signals, and order flow imbalances—often moving stock prices before human investors process information. IR teams must account for this when timing disclosures and interpreting price movements. Option B dramatically understates algorithmic presence. Option C invents a non-existent regulatory requirement. Option D is false—algorithmic trading is legal and dominant.

Concept Tested: Algorithmic Trading Impact, Market Microstructure

Bloom's Level: Remember

See: Section 4: Market Microstructure


8. Your predictive model showed 74% accuracy historically but only 61% accuracy in the past 90 days. What is the most likely cause?

  A. The model is working better than ever and should be promoted
  B. Someone accidentally improved the model's performance
  C. Model drift—underlying data distributions or relationships have changed, requiring retraining or feature updates
  D. Historical accuracy was measured incorrectly and should be ignored
Show Answer

The correct answer is C. Declining accuracy from 74% to 61% indicates model drift—the underlying data distributions (data drift) or relationships between features and targets (concept drift) have changed. For example, feature distributions may have shifted (engagement frequency dropped), or investor behavior patterns may have evolved. Solutions include retraining with recent data, updating features, or collecting new data sources. Option A misreads declining performance as improvement. Option B is unlikely without intentional changes. Option D avoids addressing the real problem.
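
A minimal monitoring sketch that flags this kind of degradation; the prediction log is simulated to roughly match the 74% baseline and 61% recent accuracy in the question:

```python
import numpy as np

def drift_alert(y_true, y_pred, baseline_acc, window=90, tolerance=0.05):
    """Flag drift when accuracy over the most recent `window` predictions
    falls well below the accuracy measured at deployment time."""
    y_true = np.asarray(y_true)[-window:]
    y_pred = np.asarray(y_pred)[-window:]
    recent_acc = float((y_true == y_pred).mean())
    return recent_acc, (baseline_acc - recent_acc) > tolerance

# Simulated 90-day prediction log with roughly 61% accuracy.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 90)
y_pred = np.where(rng.random(90) < 0.61, y_true, 1 - y_true)

recent_acc, drifted = drift_alert(y_true, y_pred, baseline_acc=0.74)
print(f"recent accuracy={recent_acc:.2f}, drift detected={drifted}")
# On drift: retrain with recent data, refresh features, or add data sources.
```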

Concept Tested: Production Deployment, ML Model Calibration

Bloom's Level: Analyze

See: Section 6: Production Deployment and MLOps


9. When would an AutoML platform be most appropriate for IR predictive analytics?

  A. When you have only 50 data points and need a simple model
  B. When interpretability and full control over model architecture are the top priorities
  C. When you need to manually code every feature and hyperparameter
  D. When you have large datasets (10k+ samples), many potential features, and need rapid prototyping with ensemble methods
Show Answer

The correct answer is D. AutoML platforms excel with large datasets (10k+ samples), many potential features, and when ensemble methods are likely to outperform single custom models. They automate feature engineering, architecture search, and hyperparameter tuning—accelerating experimentation and establishing baselines. AutoML is valuable when data science resources are limited. Option A describes inadequate data for AutoML's automated approach. Option B identifies AutoML's weakness—less control and transparency. Option C is the opposite of AutoML's purpose (automation vs. manual coding).
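
For illustration, a sketch using the open-source FLAML library, one of several AutoML options; the dataset is synthetic and the time budget arbitrary:

```python
from flaml import AutoML
from sklearn.datasets import make_classification

# Synthetic stand-in for a large IR dataset: 10k+ samples, many features.
X, y = make_classification(n_samples=10_000, n_features=40, random_state=0)

automl = AutoML()
# FLAML searches learners and hyperparameters within the time budget,
# often settling on a tuned gradient-boosting model as a strong baseline.
automl.fit(X_train=X, y_train=y, task="classification", time_budget=60)
print(automl.best_estimator, automl.best_loss)
```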

Concept Tested: Predictive Analytics, Feature Engineering IR

Bloom's Level: Apply

See: Section 4: AutoML


10. What is "anomaly detection" used for in predictive IR analytics?

  A. Generating fake trading data for model training
  B. Identifying deviations from normal patterns like unusual trading volumes, information leakage, or block trades signaling sentiment shifts
  C. Automatically approving all regulatory filings without review
  D. Eliminating all data from analysis to simplify models
Show Answer

The correct answer is B. Anomaly detection algorithms (like Isolation Forest) identify deviations from normal behavior patterns, revealing unusual trading volumes before earnings (potential information leakage), block trades indicating institutional repositioning, algorithmic trading anomalies, or unusual options activity suggesting informed traders. These early warning signals enable proactive IR responses. Option A describes data synthesis, not anomaly detection. Option C dangerously suggests bypassing compliance. Option D confuses anomaly detection with data filtering.
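
A minimal Isolation Forest sketch with scikit-learn; the daily trading features are simulated, with one day spiking across all signals:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Simulated daily features: volume z-score, block-trade count, options ratio.
# Normal days cluster together; one day spikes across all three signals.
rng = np.random.default_rng(0)
normal_days = rng.normal(0, 1, size=(250, 3))
unusual_day = np.array([[6.0, 4.0, 5.0]])
X = np.vstack([normal_days, unusual_day])

detector = IsolationForest(contamination=0.01, random_state=0).fit(X)
flags = detector.predict(X)      # -1 = anomaly, 1 = normal
print(np.where(flags == -1)[0])  # the spike day (index 250) is flagged
```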

Concept Tested: Anomaly Detection AI, Risk Assessment AI

Bloom's Level: Understand

See: Section 3: Early Warning Systems


11. How should IR teams translate predictive model outputs into strategic actions?

  A. Use predictions to optimize roadshow targeting, prioritize engagement, inform guidance strategy, and prepare crisis responses
  B. Ignore all predictions and rely solely on intuition
  C. Share all raw model outputs directly with investors
  D. Deploy models but never review their recommendations
Show Answer

The correct answer is A. Effective translation connects predictions to specific IR actions: roadshow optimization (prioritize high-probability investors), engagement prioritization (proactive outreach to at-risk holders), guidance strategy (model different guidance scenarios), and crisis preparation (respond to elevated risk signals). Predictions inform decisions, but human judgment makes final strategic choices. Option B wastes analytical investment by ignoring insights. Option C could violate disclosure rules and confuse investors. Option D creates accountability gaps and allows errors to go unnoticed.

Concept Tested: Roadshow Optimization, Predictive IR Analytics

Bloom's Level: Apply

See: Section 7: Translating Predictions into Strategy


12. What is the primary challenge of "high-frequency trading" (HFT) for IR professionals?

  A. HFT always improves market stability with no downsides
  B. HFT is illegal and should be reported to authorities
  C. HFT provides permanent, reliable liquidity at all times
  D. HFT liquidity can disappear during stress, amplifying volatility through flash crashes and exaggerated short-term price swings
Show Answer

The correct answer is D. While HFT firms provide liquidity under normal conditions, challenges include liquidity mirages (HFT liquidity disappears during stress, amplifying volatility), flash crashes (HFT algorithms interacting can cause extreme short-term dislocations), and exaggerated volatility from rapid algorithmic reactions. IR teams must understand these dynamics when timing disclosures and interpreting price movements. Option A ignores HFT's volatility risks. Option B is incorrect—HFT is legal. Option C overstates reliability—HFT liquidity is conditional and can vanish quickly.

Concept Tested: High-Frequency Trading, Market Microstructure

Bloom's Level: Analyze

See: Section 4: Market Microstructure


Quiz Statistics

  • Total Questions: 12
  • Bloom's Taxonomy Distribution:
    • Remember: 2 questions (17%)
    • Understand: 4 questions (33%)
    • Apply: 3 questions (25%)
    • Analyze: 3 questions (25%)
  • Answer Distribution:
    • A: 3 questions (25%)
    • B: 2 questions (17%)
    • C: 2 questions (17%)
    • D: 5 questions (42%)
  • Concepts Covered: 12 of 38 chapter concepts (32%)
  • Estimated Completion Time: 20-25 minutes

Next Steps

After completing this quiz:

  1. Review the Chapter Summary to reinforce predictive analytics concepts
  2. Work through the Chapter Exercises for hands-on feature engineering and model evaluation practice
  3. Proceed to Chapter 9: Personalized and Real-Time Engagement