Research Methodology

Data sources, scoring models, signal definitions, and valuation assumptions behind every tool on The AI Strategist.

This page documents the data sources, models, and assumptions behind every analytical tool published on The AI Strategist. All tools draw on public data: SEC filings, market feeds, and reported financials. Where approximations are necessary, they are stated explicitly. Last updated: 2026-03-19.

1. AI Adoption Intelligence

AI Adoption Intelligence scores how publicly traded companies disclose and deploy artificial intelligence, measured from annual 10-K filings retrieved from the SEC EDGAR system. The scoring pipeline runs monthly and covers filings from 2020 to the present.

1.1 AI Adoption Score

The AI Adoption Score is a composite metric on a 0–100 scale:

ai_adoption_score = keyword_score × 0.30
                  + section_score × 0.20
                  + classification_score × 0.35
                  + breadth_score × 0.15

Component	Formula	Rationale
keyword_score	min(100, keyword_density × 5)	Density = keyword matches per 10,000 words. 20 mentions per 10k words yields a maximum score. Linear scaling.
section_score	min(100, Σ(section_weight) × 20)	AI mentioned in strategic sections (Business = 1.5, MD&A = 1.3) is weighted above defensive sections (Risk Factors = 0.7, Financials = 0.5).
classification_score	min(100, avg_category_weight × avg_commitment × 10)	An LLM classifies each AI-related passage by intent: strategic investment (2.0), operational deployment (1.5), exploratory mention (0.8), risk disclosure (0.5). Commitment is rated 1–5.
breadth_score	min(100, unique_keywords × 10)	Rewards diversity of AI terminology. A company mentioning 10 distinct AI terms reaches the maximum.

Weight rationale. Classification carries the highest weight (35%) because it captures intent: whether a company is investing in AI or merely disclosing it as a risk factor. Keywords (30%) measure raw signal strength. Section placement (20%) distinguishes strategic from defensive positioning. Breadth (15%) rewards firms engaged across multiple AI domains.
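As a worked illustration of the composite above, the following is a minimal Python sketch that applies the four component formulas and the stated weights. The function and parameter names are hypothetical and not taken from the pipeline, and it assumes that Σ(section_weight) has already been summed over the sections in which AI is mentioned.

```python
def cap_100(value: float) -> float:
    """Apply the min(100, ...) cap used by every component."""
    return min(100.0, value)


def ai_adoption_score(
    keyword_density: float,       # keyword matches per 10,000 words
    section_weight_sum: float,    # assumed: sum of weights of sections mentioning AI
    avg_category_weight: float,   # 0.5 to 2.0, from the intent classification
    avg_commitment: float,        # 1 to 5
    unique_keywords: int,         # distinct keyword terms matched
) -> float:
    keyword_score = cap_100(keyword_density * 5)
    section_score = cap_100(section_weight_sum * 20)
    classification_score = cap_100(avg_category_weight * avg_commitment * 10)
    breadth_score = cap_100(unique_keywords * 10)
    # Weights as documented: 30% / 20% / 35% / 15%.
    return (
        keyword_score * 0.30
        + section_score * 0.20
        + classification_score * 0.35
        + breadth_score * 0.15
    )


# Example: density 8, AI in Business (1.5) and MD&A (1.3),
# operational deployment (1.5) at commitment 3, six distinct terms.
score = ai_adoption_score(8, 2.8, 1.5, 3, 6)
print(round(score, 2))  # 47.95
```

In this example the components are 40, 56, 45, and 60, which contribute 12.0, 11.2, 15.75, and 9.0 points respectively.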

Keyword list (28 terms). artificial intelligence, machine learning, deep learning, neural network, natural language processing, NLP, large language model, LLM, GPT, generative AI, automation, robotics, computer vision, predictive analytics, data science, AI-powered, algorithm, cognitive, digital transformation, intelligent automation, foundation model, reinforcement learning, chatbot, transformer, RPA, robotic process automation, AI-enabled, AI-driven.
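One way keyword density and breadth could be measured is sketched below. This is an illustration only: the helper name `keyword_stats`, the whitespace-based word count, and case-insensitive matching are assumptions, and only a subset of the 28 terms is shown.

```python
import re

# Illustrative subset of the 28-term keyword list.
KEYWORDS = (
    "artificial intelligence",
    "machine learning",
    "NLP",
    "transformer",
    "AI-driven",
)


def keyword_stats(text: str, keywords=KEYWORDS) -> tuple[float, int]:
    """Return (matches per 10,000 words, number of distinct terms matched)."""
    word_count = len(text.split())  # assumed: whitespace tokenisation
    total_matches = 0
    unique_terms = 0
    for term in keywords:
        # \b anchors stop a term matching inside a longer word.
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        hits = len(pattern.findall(text))
        total_matches += hits
        if hits:
            unique_terms += 1
    density = total_matches * 10_000 / word_count if word_count else 0.0
    return density, unique_terms
```

Two caveats apply to this sketch. Overlapping terms in the full list (for example "automation" inside "intelligent automation") would be counted twice unless matches are de-duplicated, and strict word boundaries do not match plurals such as "transformers". How the pipeline handles either case is not documented here.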

Fiscal year derivation. 10-K filings made in Q1 (January through March) are assigned to the prior fiscal year. This convention holds for most companies but may misalign for those with non-calendar fiscal year-ends.
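The convention can be expressed in a few lines. The sketch below uses a hypothetical function name and assumes that filings made outside Q1 are assigned to the calendar year in which they are filed, which the rule above implies but does not state.

```python
from datetime import date


def fiscal_year_from_filing(filing_date: date) -> int:
    """Assign a 10-K to a fiscal year from its filing date."""
    # Filings made January through March report on the prior fiscal year.
    if filing_date.month <= 3:
        return filing_date.year - 1
    # Assumption: any other filing is assigned to its filing year.
    return filing_date.year


print(fiscal_year_from_filing(date(2025, 2, 14)))  # 2024
```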

1.2 Coverage Universe

Tier	Description	Count
Benchmark	Fortune 100 firms, used as sector-level peer context	~114
Coverage	Mid-market firms ($1B–$20B market cap), selected as active research targets	~30

Coverage companies were selected within Industrials and Financials based on market capitalisation, SEC filing availability, and relevance to AI-driven transformation (defence technology, industrial automation, fintech platforms, insurtech, PE fund infrastructure).

To add a company, append a row to comparables_universe.csv with tier=coverage. The company is scored automatically on the next pipeline run.

Known exclusions. WBA (Walgreens, no SEC CIK match); Q2 Holdings (SEC ticker is QTWO); one Dell filing failed to download.

1.3 Limitations

Keyword false positives. Terms such as “transformer” and “automation” may match non-AI contexts in industrial and energy filings. Word-boundary regex and section context mitigate but do not eliminate this.
LLM classification is optional. In the default --no-llm mode, the classification score defaults to the exploratory-mention weight (0.8) for all passages, which underweights companies with strong strategic AI commitments.
Survivorship bias. The universe reflects the current Fortune 100 snapshot. Companies that have dropped out of the index are not tracked.
Non-calendar fiscal years. The fiscal year derivation may misalign for retailers and others with January or February fiscal year-ends.
Small sector samples. Real Estate (7 companies) produces noisy percentile ranks. The top 20% threshold yields only one or two firms.

2. Deal Screener

The Deal Screener generates five investor-focused signals from the AI Adoption Intelligence pipeline. Signals are computed on the most recent 10-K filing per company and refreshed each time the scoring pipeline runs.

2.1 Signal Definitions

Signal	Criterion	Investor interpretation
AI_LEADER	AI Adoption Score at or above the sector 80th percentile	Established AI differentiation relative to sector peers
AI_MOMENTUM	Year-over-year score change at or above the sector 75th percentile	AI investment accelerating; potential emerging moat
AI_LAGGARD	AI Adoption Score at or below the sector 25th percentile	Transformation risk or acquisition opportunity
VALUE_PLUS_AI	AI_MOMENTUM flag + EV/EBITDA below the sector median	Under-appreciated AI adopter trading at a discount to peers
RISING_STAR	Coverage-tier company with a year-over-year gain of 10 points or more	Mid-market company catching up to benchmark peers

2.2 Threshold Rationale

The Leader threshold (80th percentile) uses a top-quintile screen, a standard institutional cut-off that identifies genuine differentiation rather than marginal differences.

The Momentum threshold (75th percentile of year-over-year change) captures companies actively increasing their AI investment, not simply those with historically high scores.

The Laggard threshold (25th percentile) flags the bottom quartile, which may represent either transformation risk for long holders or catch-up acquisition targets for PE buyers.

The Value + AI signal combines momentum with a below-median valuation. This is the core PE and long-biased equity signal: an improving AI profile in a company the market has not yet repriced.

The Rising Star threshold (10-point absolute gain) implies roughly a 1.5–2x increase in keyword density or a meaningful LLM classification upgrade. It is restricted to the mid-market coverage universe, where alpha potential is higher.

2.3 Sector Percentile Computation

All percentiles are computed within each sector, using the latest fiscal year only, which controls for sector-level differences and time trends. A company’s AI_LEADER status therefore reflects its position relative to current sector peers, not the full historical panel.

For small sectors such as Real Estate (7 companies), the top 20% threshold produces only one or two leaders. Percentile ranks in these sectors should be interpreted with caution.
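The five signal criteria can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the production logic: the row keys (`ticker`, `sector`, `tier`, `score`, `yoy_change`, `ev_ebitda`) are hypothetical, the rows are assumed to hold the latest fiscal year only, and the linear-interpolation percentile is one of several common definitions that the methodology does not specify.

```python
from collections import defaultdict
from statistics import median


def percentile(values, p):
    """Linear-interpolation percentile (p in 0..100) of a non-empty list."""
    xs = sorted(values)
    k = (len(xs) - 1) * p / 100
    lo = int(k)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (k - lo)


def flag_signals(rows):
    """Map each ticker to the set of signals it triggers within its sector."""
    by_sector = defaultdict(list)
    for row in rows:
        by_sector[row["sector"]].append(row)

    flags = {}
    for sector_rows in by_sector.values():
        scores = [r["score"] for r in sector_rows]
        changes = [r["yoy_change"] for r in sector_rows]
        leader_cut = percentile(scores, 80)
        laggard_cut = percentile(scores, 25)
        momentum_cut = percentile(changes, 75)
        ev_median = median([r["ev_ebitda"] for r in sector_rows])

        for r in sector_rows:
            signals = set()
            if r["score"] >= leader_cut:
                signals.add("AI_LEADER")
            if r["score"] <= laggard_cut:
                signals.add("AI_LAGGARD")
            if r["yoy_change"] >= momentum_cut:
                signals.add("AI_MOMENTUM")
                if r["ev_ebitda"] < ev_median:
                    signals.add("VALUE_PLUS_AI")
            if r["tier"] == "coverage" and r["yoy_change"] >= 10:
                signals.add("RISING_STAR")
            flags[r["ticker"]] = signals
    return flags
```

With only a handful of companies in a sector, the cut-offs computed this way sit between two adjacent observations, which is one reason small-sector ranks are noisy.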

3. Market Comps

Market Comps provides daily-refreshed valuation multiples and operating metrics for every company in the coverage universe, organised by sector. The tool is designed for comparable-company analysis: benchmarking a target against its publicly traded peers.

3.1 Data Sources

Source	Data	Refresh cadence
yfinance API	Share price, market capitalisation, shares outstanding, enterprise value, P/E, EV/EBITDA, EV/Revenue, margins, FCF yield	Daily
yfinance quarterly financials	Revenue by quarter (trailing actuals + analyst estimates)	Daily
SEC EDGAR	10-K filing metadata, CIK identifiers	As filed

3.2 Valuation Calculations

Current-year multiples use the latest daily snapshot from yfinance (market cap, enterprise value, trailing and forward P/E, EV/EBITDA, EV/Revenue).

Historical multiples (2020–2024) are approximated as follows:

estimated_market_cap = year_end_closing_price × current_shares_outstanding
estimated_EV         = estimated_market_cap × 1.1
ev_to_ebitda         = estimated_EV / annual_EBITDA
ev_to_revenue        = estimated_EV / annual_revenue
trailing_pe          = estimated_market_cap / annual_net_income

Assumptions:

Shares outstanding is held constant at the current figure. This does not account for historical buybacks or dilution.
Enterprise value is approximated as 1.1x market capitalisation, assuming net debt of roughly 10% of market cap. This understates EV for highly leveraged firms and overstates it for cash-rich technology companies.
Year-end price is the closing price on the last trading day of the calendar year, which may not align with non-December fiscal year-ends.
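The approximation above translates directly into code. The sketch below uses hypothetical function and parameter names; returning None when a denominator is zero or negative is an assumption about how undefined multiples might be handled, not documented behaviour.

```python
def historical_multiples(
    year_end_close: float,
    current_shares_outstanding: float,
    annual_ebitda: float,
    annual_revenue: float,
    annual_net_income: float,
    ev_factor: float = 1.1,  # EV approximated as 1.1x market cap
) -> dict:
    """Approximate historical valuation multiples for one company-year."""
    market_cap = year_end_close * current_shares_outstanding
    enterprise_value = market_cap * ev_factor

    def ratio(numerator: float, denominator: float):
        # Assumption: a multiple is undefined for non-positive denominators.
        return numerator / denominator if denominator > 0 else None

    return {
        "estimated_market_cap": market_cap,
        "estimated_EV": enterprise_value,
        "ev_to_ebitda": ratio(enterprise_value, annual_ebitda),
        "ev_to_revenue": ratio(enterprise_value, annual_revenue),
        "trailing_pe": ratio(market_cap, annual_net_income),
    }
```

For example, a $50 year-end price with 100 million shares gives an estimated market cap of $5.0B and an estimated EV of $5.5B; against $500M of EBITDA that is an EV/EBITDA of 11x.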

3.3 Revenue Forecasting

The Market Comps tool generates simple revenue forecasts using trailing quarterly growth rates projected forward:

Base case: trailing quarterly revenue growth rate applied to the next two quarters.
Bull case: 1.5x the trailing growth rate.
Bear case: 0.5x the trailing growth rate.

These are mechanical projections, not analyst consensus estimates. They indicate the range of outcomes implied by recent momentum, not a forecast of fundamental performance.
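A minimal sketch of the three scenarios follows. The methodology does not define how the trailing growth rate is measured, so this example assumes it is the simple average of quarter-over-quarter growth across the supplied quarters and that it compounds over the two forecast quarters. All names are hypothetical.

```python
def trailing_growth(quarterly_revenue: list[float]) -> float:
    """Assumed definition: mean quarter-over-quarter growth rate."""
    rates = [
        later / earlier - 1
        for earlier, later in zip(quarterly_revenue, quarterly_revenue[1:])
    ]
    return sum(rates) / len(rates)


def forecast_scenarios(quarterly_revenue: list[float], horizon: int = 2) -> dict:
    """Project revenue for the next `horizon` quarters under each case."""
    growth = trailing_growth(quarterly_revenue)
    last = quarterly_revenue[-1]
    multipliers = {"base": 1.0, "bull": 1.5, "bear": 0.5}
    return {
        case: [last * (1 + growth * m) ** q for q in range(1, horizon + 1)]
        for case, m in multipliers.items()
    }
```

With quarterly revenue of 100, 110, and 121, the trailing rate is 10%, so the base case projects roughly 133.1 and 146.4, the bull case (15%) roughly 139.2 and 160.0, and the bear case (5%) roughly 127.1 and 133.4. Under this reading, a negative trailing rate would make the 1.5x "bull" case decline faster than the "bear" case, so the labels are only meaningful when recent growth is positive.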

Implied CAGRs are computed for both the historical two-quarter growth trajectory and the forecast trajectory. Comparing the two flags acceleration or deceleration.

3.4 Outlier Detection

An outlier flag is applied to any company where a key valuation multiple (EV/EBITDA, trailing P/E, or EV/Revenue) falls more than 1.5 times the interquartile range (IQR) below the first quartile or above the third quartile of its sector. This is the standard Tukey fence method. Outliers are displayed with a visual badge in the table but are still included in median calculations.
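The Tukey fence check for a single multiple within one sector can be sketched as below. The function name is hypothetical, and the choice of the "inclusive" quartile method in Python's statistics module is an assumption, since the methodology does not say which quartile definition is used.

```python
from statistics import quantiles


def tukey_outliers(multiple_by_ticker: dict[str, float], k: float = 1.5) -> set[str]:
    """Return tickers whose multiple lies outside the Tukey fences."""
    values = list(multiple_by_ticker.values())
    # Quartiles of the sector; needs at least two observations.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return {
        ticker
        for ticker, value in multiple_by_ticker.items()
        if value < lower_fence or value > upper_fence
    }
```

For sector EV/EBITDA values of 8, 9, 10, 11, 12, and 40, the quartiles under this method are 9.25 and 11.75, the fences are 5.5 and 15.5, and only the company at 40x is flagged.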

3.5 Sector Pulse Commentary

Each sector table includes a short commentary block generated by an LLM from the latest sector data (median multiples, growth rates, notable movers). These commentaries are regenerated on each data refresh and stored in JSON files. They are descriptive summaries, not investment recommendations.

3.6 Limitations

EV approximation. The 1.1x multiplier is a rough heuristic. True enterprise value requires balance sheet data (market cap + total debt - cash and equivalents).
Constant shares outstanding. Historical buybacks, stock splits, and dilution from equity compensation are not reflected in historical market cap estimates.
Revenue forecasts are mechanical. They extrapolate recent trends and do not incorporate analyst estimates, guidance, or fundamental analysis.

4. Backtest Strategy

The AI Adoption Score can be backtested as a long/short signal.

Quintile long/short. Each year, rank all companies by AI Adoption Score. Go long the top quintile (Q5, equal-weighted) and short the bottom quintile (Q1, equal-weighted). The spread return is Q5 minus Q1, rebalanced annually.

Sector-neutral variant. Rank within each sector. Go long the top half of each sector, short the bottom half. This isolates the within-sector AI adoption effect from sector rotation.

Assumptions. Equal weighting within quintiles. No transaction costs, slippage, or shorting costs. No liquidity constraints. Annual rebalancing. Returns are illustrative; a real implementation would incur significant friction costs.
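For a single rebalancing year, the quintile spread can be sketched as follows. The names are hypothetical, and two details are assumptions: quintile size is taken as the number of companies divided by five, rounded down, and `returns` is assumed to hold each company's return over the holding year that follows the ranking.

```python
def quintile_spread(scores: dict[str, float], returns: dict[str, float]) -> float:
    """Equal-weighted Q5 minus Q1 return for one year, with no friction costs."""
    ranked = sorted(scores, key=scores.get)  # lowest score first
    size = len(ranked) // 5                  # assumed quintile size
    if size == 0:
        raise ValueError("need at least five companies to form quintiles")

    def mean_return(tickers):
        return sum(returns[t] for t in tickers) / len(tickers)

    bottom_quintile = ranked[:size]   # Q1: short leg
    top_quintile = ranked[-size:]     # Q5: long leg
    return mean_return(top_quintile) - mean_return(bottom_quintile)
```

Repeating this calculation for each year and chaining the results gives the annually rebalanced spread series. The sector-neutral variant would apply the same idea within each sector, splitting at the median instead of at the quintile boundaries.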

5. General Limitations

These limitations apply across all tools on the platform:

Public filings only. All analysis is based on what companies disclose in SEC filings and what is observable in market data. Private companies and non-US filers are not covered.
Survivorship bias. The Fortune 100 universe reflects today’s index composition. Companies that were previously in the index but have since been removed, acquired, or delisted are not in the historical panel.
Not investment advice. All tools, scores, and signals are for research and informational purposes. They do not constitute recommendations to buy, sell, or hold any security.

Source data: SEC EDGAR 10-K filings, yfinance market data, 2020 to present. AI scoring pipeline runs monthly. Market data refreshes daily.