What We Use AI For (and What We Don't): Inside ThetaLoop's Research Pipeline

ThetaLoop Research

AI in finance is everywhere right now. ChatGPT "stock picks" trend on consumer sites every week. Hedge funds announce AI-driven strategies. Anthropic, OpenAI, and Google have all rolled out finance-specific products in the last year.

Our position is unfashionable but simple: an LLM is a useful research aid and a poor signal generator. The rest of this article walks through what the engine actually computes, where Gemini contributes only prose, and why the line between "the engine decides" and "the AI explains" is built into the way ThetaLoop is wired.

The Short Answer

Three buckets, plain language:

  • Rule-based core — the X-Ray score (regime, momentum, volatility, candle patterns), the Theta Compass composite, the strike rules, the take-profit and circuit-breaker exits, and the track-record statistics on /tools. Same input → same output, every session.
  • Bounded AI use — Gemini drafts the macro paragraph attached to each daily research message and the daily morning briefing. Both are narrative summaries, not signals.
  • No AI involvement — which trades we publish, what strike, what expiry, when to close. Those decisions are made by deterministic rules; not a single one is generated by an LLM.

You can verify the rule-based outputs yourself on the Theta Compass and the X-Ray. The methodology behind every score is documented at /learn/methodology.

What "Rule-Based" Actually Means

Testable, repeatable arithmetic. Inputs go in; the same number comes out. There is no judgment call in the middle, no model to retrain, no prompt to tune.

The historical record on systematic put-selling is unusually well documented because the strategy is mechanical. The Cboe S&P 500 PutWrite Index — selling monthly puts on the S&P 500 since 1986 — returned about 9.5% annualized over its first 32 years, at roughly two-thirds the volatility of the S&P 500 itself, with a worst drawdown of −33% versus −51% for the index. Independent backtests at Tastytrade and Spintwig come to compatible conclusions across thousands of trades and many configurations.

What unites these references is that anyone with the same rule and the same price tape gets the same answer. ThetaLoop's engine sits in that lineage: a fixed strike-selection rule, a fixed take-profit, a fixed circuit breaker, fixed indicators applied to fresh market data. Each session, the same code runs against the new tape. Code does not "drift." Market data moves; the formula does not.
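As an illustration only, a fixed rule of this kind fits in a few lines of Python. The field names, thresholds, and function names below are hypothetical stand-ins, not ThetaLoop's actual parameters:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tape:
    """One session's market-data snapshot (hypothetical fields)."""
    close: float
    sma_200: float
    atm_iv: float

def select_strike(tape: Tape, otm_pct: float = 0.05) -> float:
    """Fixed strike rule: a constant percentage below the close.
    Same tape in, same strike out -- no model, no randomness."""
    return round(tape.close * (1 - otm_pct), 2)

def should_trade(tape: Tape) -> bool:
    """Fixed regime filter: only trade above the 200-day moving average."""
    return tape.close > tape.sma_200

tape = Tape(close=100.0, sma_200=95.0, atm_iv=0.22)
assert select_strike(tape) == 95.0  # deterministic: reruns cannot differ
assert should_trade(tape)
```

The point is not the specific threshold; it is that the function is pure arithmetic over the tape, so "drift" can only come from the data, never from the rule.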

How the Narrative Engine Works

The per-ticker explanations you read on every X-Ray page — the bottom-line paragraph, the technical breakdown — are not generated by a language model at render time. They come from a structured template system: pre-written variants for each branch of the score's underlying state, with a deterministic mapping that picks one. The same ticker always produces the same text. A score of 7.4 with a bullish regime always renders the same paragraph, whether you load the page now or next week or a search engine crawls it tomorrow.

That predictability is the point. Forecast vocabulary that regulators flag as off-limits in customer communications — "will," "expect," "target," "guarantee," "predict," "forecast" — is hard-coded out of the template library. A generative model can be prompted not to use those words; a fixed template structurally cannot use them. The explanation you see is the same explanation a regulator would see, the same one a search engine indexes, and the same one in our records six months from now.
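A minimal sketch of such a template mechanism, with hypothetical branch names and placeholder prose, makes the structural guarantee concrete:

```python
# Hypothetical sketch of a deterministic template system: a fixed mapping
# from discrete state to pre-written prose. No generative model involved.
TEMPLATES = {
    ("bullish", "high"): "Momentum is positive and the score is elevated.",
    ("bullish", "low"):  "Momentum is positive but the score remains modest.",
    ("bearish", "high"): "The score is elevated despite a negative regime.",
    ("bearish", "low"):  "Both the regime and the score point to caution.",
}

FORBIDDEN = {"will", "expect", "target", "guarantee", "predict", "forecast"}

# Because the library is fixed, forecast vocabulary can be checked once,
# offline, instead of being filtered at generation time.
assert all(
    not (FORBIDDEN & set(text.lower().split()))
    for text in TEMPLATES.values()
)

def explain(regime: str, score: float) -> str:
    """Deterministic mapping: same (regime, score) state, same text, always."""
    return TEMPLATES[(regime, "high" if score >= 7.0 else "low")]

assert explain("bullish", 7.4) == explain("bullish", 7.4)  # byte-identical
```

A prompted model can be asked to avoid the forbidden list; a fixed lookup table cannot violate it, which is the structural difference the paragraph above describes.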

Where AI Does Help (And Why It's Bounded)

The serious operators in this industry are explicit about the boundary. Morgan Stanley uses GPT-4 across 70,000+ proprietary research documents to help advisors find relevant material — and their Global Director of Research has stated she does not see a near-term path to letting the machine write the research call. Citadel's AI Assistant, in their own words, "does not decide what to buy or sell." BlackRock's Aladdin Copilot is contractually walled off from giving investment advice. Even Bridgewater, which markets the most aggressive AI-as-decision-maker frame in the industry, openly discloses that AI outputs "inevitably contain a degree of inaccuracy and error — potentially materially so."

The pattern across firms is consistent: LLMs handle summarization, synthesis, idea generation, and parsing unstructured text — not the trade decision. The reason is empirical. When researchers tested a leading frontier LLM with retrieval on real SEC-filing questions, it answered incorrectly or refused on roughly four out of five questions. Consumer-LLM benchmarks on everyday personal-finance questions tell a similar story.

ThetaLoop's bounded use of Gemini mirrors that posture. Gemini drafts the macro paragraph attached to each daily research message and the daily briefing — narrative summaries of regime context, what changed since yesterday, and how to read the day's setup. You can see the format on /preview. Gemini does not compute the X-Ray score, does not size the trade, does not select tickers, and does not override the take-profit or circuit-breaker logic. Words are AI; numbers are code.

Why This Distinction Matters

Three reasons, in plain language.

1. Regulation. US broker-dealer rules — the same ones every legitimate options newsletter operates under — prohibit predicting performance, projecting returns, or framing past results as a forecast in customer communications. The regulator has confirmed that those rules apply equally to text written by humans and to text generated by AI. An LLM that confidently asserts a price target or a forward return is, from the regulator's point of view, the firm's communication regardless of which model produced it. A deterministic rule-based score is auditable: same inputs, same output, every time.

2. Reproducibility. Modern LLMs are non-deterministic by design. Even with temperature set to zero, frontier models often produce different outputs on identical inputs across reruns — researchers have documented accuracy gaps of more than 70 percentage points between repeated runs of the same prompt. OpenAI's own documentation states the API is "non-deterministic by default." That is fine for a chat assistant. It is not fine for a number that customers, search engines, and our own records all need to match six months from now.

3. Fitness for purpose. Frontier LLMs are fast and useful for summarizing a regime in plain English. They are also expensive at scale, slow compared to code, and weak at arithmetic — multiple peer-reviewed studies show LLM accuracy collapsing when math problems get even slightly harder. Using a frontier model to compute the same scores across 600 tickers every night would replace deterministic arithmetic that runs in milliseconds with non-deterministic arithmetic that costs more and gets things wrong more often. There is no upside.
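The reproducibility claim in point 2 is mechanically checkable for a deterministic engine in a way it is not for a model. A hypothetical sketch (the scoring formula and field names are illustrative, not ThetaLoop's actual code): hash the canonical output and confirm that reruns collapse to a single digest.

```python
import hashlib
import json

def score_engine(tape: dict) -> dict:
    """Stand-in for a deterministic scoring function (hypothetical formula)."""
    momentum = tape["close"] / tape["sma_50"] - 1
    return {"score": round(5 + 10 * momentum, 2)}

def digest(output: dict) -> str:
    """Canonical hash of the output: identical inputs must hash identically."""
    blob = json.dumps(output, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()

tape = {"close": 105.0, "sma_50": 100.0}
runs = {digest(score_engine(tape)) for _ in range(1000)}
assert len(runs) == 1  # a deterministic engine: 1,000 reruns, one output
```

The equivalent test against an LLM endpoint routinely fails: rerunning the same prompt yields a set of several distinct digests, which is exactly the audit problem described above.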

When AI Is the Right Tool

The academic record on what LLMs actually do well is now mature. They are strong at summarization, explanation, classification, synthesis of unstructured text, and pattern recognition in language. They are weak at multi-digit arithmetic, time-series forecasting, calibrated probability, and run-to-run consistency. The most memorable recent illustration: a top Google model earned an IMO gold medal in 2025 — and reads analog clocks correctly only about half the time.

That gap maps almost perfectly onto where ThetaLoop uses Gemini and where it does not. Translating a regime score into plain English, narrating what changed in the macro tape, drafting the daily briefing — language tasks. Computing a Sharpe ratio, a confidence interval, a moving-average filter, the take-profit, or the circuit breaker — arithmetic. The right tool for each job is not a stylistic preference; it is what the data says.
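The arithmetic side of that split is ordinary code. As a hedged illustration — generic textbook formulas, not ThetaLoop's internal implementation — an annualized Sharpe ratio and a simple moving average are each a few deterministic lines:

```python
import math

def sharpe(daily_returns: list[float], risk_free_daily: float = 0.0) -> float:
    """Annualized Sharpe ratio: mean excess daily return over its sample
    standard deviation, scaled by sqrt(252) trading days."""
    excess = [r - risk_free_daily for r in daily_returns]
    mean = sum(excess) / len(excess)
    var = sum((r - mean) ** 2 for r in excess) / (len(excess) - 1)
    return mean / math.sqrt(var) * math.sqrt(252)

def sma(closes: list[float], window: int) -> float:
    """Simple moving average of the last `window` closes."""
    return sum(closes[-window:]) / window

assert sma([1.0, 2.0, 3.0, 4.0], 2) == 3.5
assert sharpe([0.01, 0.02, 0.01, 0.02]) > 0
```

Handing either computation to a language model would trade exact, millisecond arithmetic for token-by-token approximation — which is the asymmetry the benchmark literature above keeps finding.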

The Bottom Line

ThetaLoop is rule-based research with AI confined to the explanation layer. Numbers are deterministic; language is bounded. That distinction is not aesthetic — it is shaped by how regulators treat firm communications, by a growing body of research on LLM run-to-run stability, and by how every serious institutional shop has actually deployed AI in production.

The track record on /tools shows what those rules produced — confidence intervals, Sharpe ratios, the equity curve, peak-to-trough drawdowns. The methodology behind every score is documented in plain English at /learn/methodology. If you want LLM-driven trade alerts, plenty of services offer them. ThetaLoop deliberately does not.

Frequently Asked
Do LLM-generated trading signals carry hallucination risk?
Yes — and the evidence is well-benchmarked. When researchers tested a leading frontier LLM with retrieval on real SEC filings, it answered incorrectly or refused on roughly four out of five questions. Independent reviews of consumer chatbots on personal-finance questions report error rates around one in three. The Financial Stability Board has formally defined hallucination as a model "providing a seemingly confident but inaccurate response," and US broker-dealer regulators flag the same risk for customer communications written or assisted by AI.
What makes rule-based research more reliable than LLM-generated for options?
Two reasons: reproducibility and regulation. Frontier LLMs are non-deterministic by design — even with temperature set to zero, repeated runs of the same prompt have produced accuracy gaps of more than 70 percentage points in published research, and OpenAI itself states its API is "non-deterministic by default." Meanwhile, US broker-dealer rules prohibit predicting or projecting performance in customer communications regardless of whether a human or a model wrote the words. A deterministic rule — same inputs, same output, every time — is auditable in a way an LLM number is not.
Can a rule-based score change without code changes?
Yes — because the data tape moves, not the engine. A rule is a fixed function of market data. When new closes, options chains, or volatility readings arrive, the score updates — even though the formula behind it is byte-identical. That is the standard distinction between a deterministic engine driven by changing market data and a non-deterministic model whose output varies even when the input does not.
Industry examples of AI used as research aid (not signal generator)?
Most top-tier firms have publicly bounded AI to a research-aid role. Morgan Stanley uses GPT-4 across 70,000+ proprietary research documents for advisors; their Global Director of Research has said she does not see a near-term path to letting the machine write the research call. BlackRock's Aladdin Copilot is contractually walled off from giving investment advice. Citadel's AI Assistant "does not decide what to buy or sell." Even Bridgewater's more aggressive AIA stack openly discloses that AI outputs "inevitably contain a degree of inaccuracy and error — potentially materially so."
Related Topics
How the X-Ray Score Works · Cash-Secured Puts · Bull Put Spreads · Theta Decay · The VIX Decoded · The 200-Day Moving Average · Options Greeks for Put Sellers · Position Sizing for Put Sellers · Rolling Cash-Secured Puts · Options Assignment · The Wheel Strategy · IV Crush and Earnings · Covered Calls · The Cash-Secured Put Delta Cheat Sheet · How Much Capital Do You Need to Sell Cash-Secured Puts · Earnings Gap Repair · Deep ITM Put (>12% ITM) · Rolling Degradation · After Assignment · When to Cut Losses
Put this knowledge to work
🧭 Theta Compass 🔬 X-Ray
Reading is a start. Acting on it is another thing.
Our daily research lists the CSP and BPS trades we would actually take today — with strikes, risk metrics, and Telegram alerts.
EXPLORE OUR RESEARCH — 14 DAYS FREE
📊 Full Research Track Record — All Wins AND Losses
CSP Track Record ↗ · Bull Put Spreads ↗ · See a sample →
These free tools show market conditions and individual scores.
Our daily research goes deeper — covering CSP strike analysis, defined-risk Bull Put Spread alerts, and full risk metrics delivered via Telegram.