AI models have impressively large context windows now. You can paste tens of thousands of words and the model will "read" all of it.
But reading and comprehending are different things. And "tell me about this document" is still one of the weakest prompts you can give, even with a perfect context window.
This lesson covers how to get genuinely useful results from long documents — reports, contracts, research papers, transcripts, code files, anything substantial.
The Problem With Long-Context Prompting
More context isn't always better. A few issues emerge with large documents:
Attention dilution — Models tend to weight information at the beginning and end of a context more than the middle. Critical points buried in the middle of a 50-page document may get underweighted. This is sometimes called the "lost in the middle" problem.
Vague questions produce vague answers — "Summarize this" on a 100-page contract is almost as bad as no document at all. What do you need to know? For what purpose? At what detail level?
Hallucination under complexity — The more complex the document and the more open-ended the question, the higher the risk the model will fill gaps with plausible-sounding fabrications rather than admitting it doesn't know.
The solutions come down to one principle: be more specific about what you need, not just what you're providing.
Technique 1: Purpose-First Prompting
Always tell the model why you need information from this document. The purpose shapes what's relevant.
Instead of:
Here is our Q3 investor report. Summarize it.
Do this:
Here is our Q3 investor report. I'm preparing a 10-minute verbal briefing
for our board next week. They've already read the report.
Extract the 3 most important items they'll want to discuss — things that
require decisions, show significant variance from plan, or represent new
strategic developments. Skip anything that's routine or already expected.
The same document produces completely different output depending on why you need it.
Technique 2: Directed Extraction
Instead of asking the model to decide what matters, tell it exactly what to pull out.
Read the contract below and extract the following:
1. Key dates and deadlines (format as a table: Date | Milestone)
2. Payment terms — amounts, schedule, late payment penalties
3. Termination clauses — under what conditions can either party exit?
4. IP ownership — who owns what is created under this agreement?
5. Any clauses that seem unusual or warrant legal attention
If information for any of these isn't in the document, say so explicitly.
The explicit list keeps the model from summarizing whatever it finds interesting, which may not be what you care about. The instruction to say so when information is missing discourages fabrication.
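If you run the same extraction against many documents, the prompt can be assembled programmatically from your checklist of fields. A minimal sketch in Python; `build_extraction_prompt` is a hypothetical helper and the wording is illustrative, not a fixed template:

```python
def build_extraction_prompt(document: str, fields: list[str]) -> str:
    """Assemble a directed-extraction prompt from an explicit field list."""
    numbered = "\n".join(f"{i}. {f}" for i, f in enumerate(fields, 1))
    return (
        "Read the document below and extract the following:\n"
        f"{numbered}\n"
        "If information for any of these isn't in the document, say so "
        "explicitly.\n\n"
        "--- DOCUMENT ---\n"
        f"{document}"
    )


fields = [
    "Key dates and deadlines (as a table: Date | Milestone)",
    "Payment terms: amounts, schedule, penalties",
    "Termination clauses",
]
prompt = build_extraction_prompt("Sample contract text.", fields)
```

Keeping the field list as data means the "if not present, say so" guard is never accidentally dropped when someone edits the prompt.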
Technique 3: Chunking for Deep Analysis
For very long documents where you need thorough analysis of every section, don't paste everything at once. Work section by section.
I'm going to share a business plan in several parts. For each part,
hold your analysis until I've shared everything. Just confirm you've
received each part.
[Part 1 of 4]
[paste]
Once you've shared all parts:
Now that you have the full document, analyze the business plan. Focus on:
- Strength of the market opportunity argument
- Realism of the financial projections
- The weakest assumptions in the model
This approach lets you get comprehensive analysis without the attention problems that come with dumping everything at once.
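The splitting step is easy to script. A sketch, assuming character count as a crude stand-in for a token budget; `chunk_document` and `part_messages` are hypothetical names, not library functions:

```python
def chunk_document(text: str, max_chars: int = 8000) -> list[str]:
    """Split a document on paragraph boundaries so that no chunk exceeds
    max_chars. A single paragraph longer than max_chars becomes its own
    (oversized) chunk rather than being cut mid-paragraph."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) > max_chars and current:
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def part_messages(chunks: list[str]) -> list[str]:
    """Wrap chunks in the 'hold your analysis' framing shown above."""
    total = len(chunks)
    header = (
        f"I'm going to share a document in {total} parts. Hold your "
        "analysis until I've shared everything; just confirm receipt "
        "of each part."
    )
    return [header] + [
        f"[Part {i} of {total}]\n{chunk}"
        for i, chunk in enumerate(chunks, 1)
    ]
```

Splitting on paragraph boundaries (rather than a fixed character offset) keeps each chunk coherent, which matters more than hitting the size budget exactly.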
Technique 4: Quote-and-Verify
For documents where accuracy matters (legal, financial, medical), always ask the model to quote the specific text it's basing its answer on.
Review this employment contract and tell me: does this contract include a
non-compete clause? If yes, quote the exact language and explain what it
means in plain terms.
This does two things. First, it forces the model to ground its answer in the actual text rather than in a loose paraphrase or its general training knowledge. Second, it lets you verify independently: you can search the document for the quoted passage and check that the interpretation is accurate.
Always verify important claims from documents against the primary source.
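The verification half can even be automated: check that the quoted passage actually appears in the source, ignoring whitespace and line-break differences. A small sketch; `quote_in_source` is a hypothetical helper:

```python
import re


def quote_in_source(quote: str, source: str) -> bool:
    """Return True if the quoted passage appears verbatim in the source,
    ignoring differences in whitespace, line breaks, and case."""
    def norm(s: str) -> str:
        return re.sub(r"\s+", " ", s).strip().lower()
    return norm(quote) in norm(source)
```

If this check fails, the model may have paraphrased or invented the clause; ask it to re-quote, or read the document yourself. Passing the check verifies the quote exists, not that the interpretation is correct, so it complements rather than replaces your own review.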
Technique 5: Q&A Against a Document
One of the most useful long-document workflows: paste the document, then ask specific questions.
I'm going to paste a research paper below. After reading it, I'll ask
you specific questions about it. Only answer based on what's in the paper —
if a question can't be answered from the paper, say so.
[paste paper]
Questions:
1. What was the sample size and how were participants recruited?
2. What were the three main findings?
3. What limitations did the authors acknowledge?
4. What follow-up research do they suggest?
This is more reliable than "summarize" because the questions target the specific information you need, and the constraint to only use the document prevents hallucination.
Technique 6: Comparative Analysis
When comparing multiple documents (different proposals, multiple contract versions, competitive reports), be explicit about what dimension to compare on.
I have two vendor proposals for the same project. I'll paste them below.
Compare them across these dimensions only:
1. Total cost and payment structure
2. Implementation timeline
3. Support and SLA commitments
4. Red flags or concerning omissions
Format as a side-by-side table where possible.
[Proposal A]
[paste]
[Proposal B]
[paste]
Without explicit comparison dimensions, you'll get a generic "Proposal A does X while Proposal B does Y" that doesn't help you decide.
Working Around Context Limits
Even with large context windows, you'll hit limits with very long documents. Options:
Summarize first, analyze second: Ask the model to summarize each major section (in a fresh conversation where you paste one section at a time), then combine those summaries and do your full analysis on the combined summary.
Be selective about what you paste: You don't always need the full document. A 200-page contract likely has 5 sections you actually care about. Paste those.
Use RAG for very large corpora: If you're regularly working with a collection of documents (a company knowledge base, a legal document library), Retrieval-Augmented Generation (RAG) architectures let you query across documents without pasting them all. This is a technical setup — covered in the Advanced Track — but worth knowing about if your use case involves large document collections.
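The summarize-first workflow is a simple two-pass map-reduce and can be scripted. A sketch, assuming a `call_llm(prompt) -> str` wrapper around whichever model API you use (the function name and prompt wording are illustrative):

```python
from typing import Callable


def summarize_then_analyze(
    sections: dict[str, str],
    question: str,
    call_llm: Callable[[str], str],
) -> str:
    # Pass 1 (map): summarize each section in isolation.
    summaries = {
        title: call_llm(
            "Summarize the following section of a longer document. "
            "Keep every figure, date, and commitment; drop filler.\n\n"
            f"## {title}\n{body}"
        )
        for title, body in sections.items()
    }
    # Pass 2 (reduce): analyze the combined section summaries.
    combined = "\n\n".join(f"## {t}\n{s}" for t, s in summaries.items())
    return call_llm(
        "The text below is a section-by-section summary of a longer "
        f"document. Based only on it, answer:\n{question}\n\n{combined}"
    )
```

Note the trade-off: the reduce pass only sees the summaries, so anything the map pass drops is invisible to the final analysis. That is why the map prompt insists on keeping figures, dates, and commitments.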
A Mental Checklist for Long-Document Prompts
Before sending a long-document prompt, ask:
- [ ] Did I explain why I need this, not just what the document contains?
- [ ] Am I asking for specific extraction, or just "summarize"?
- [ ] For accuracy-critical information, did I ask the model to quote its sources?
- [ ] If I'm comparing documents, did I specify the comparison dimensions?
- [ ] Did I include instructions for what to do when information is missing or uncertain?
Key Takeaways
- Purpose-first prompting: tell the model why you need information, not just what the document contains
- Use directed extraction with explicit lists of what to pull out
- For critical documents, require the model to quote specific text it's referencing
- Chunk very long documents rather than pasting everything at once
- Be prepared to verify — long documents increase hallucination risk
Next up: working with multiple formats of input — not just text — and how to prompt when you have images, files, or mixed content. Multimodal Prompting →