Most AI code review prompts are useless. "Review this code" gets you a list of obvious style suggestions and a paragraph about error handling that was already fine. You already know to add comments. You need the model to find the bug you missed.
Getting AI to do genuinely useful code review requires being specific about what you want it to look for and giving it the right context. Here's what works.
The problem with generic code review prompts
When you ask an LLM to "review this code," it optimizes for looking thorough. It'll mention:
- Variable naming
- Missing docstrings
- Error handling suggestions
- General best practices
These aren't wrong, but they're rarely why code breaks in production. The bugs that matter are logic errors, race conditions, off-by-one errors, auth bypasses, and incorrect assumptions about inputs.
You need to tell the model what to look for, or it'll give you the feedback equivalent of running a linter.
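To make the difference concrete, here's a hypothetical sketch of the kind of bug a generic review sails past: clean names, a docstring, nothing for a linter to flag, and wrong output for every caller (the `paginate` function and its 1-indexed convention are invented for illustration).

```python
def paginate(items, page, page_size):
    """Return the given 1-indexed page of items."""
    # Off-by-one a style review won't catch: callers pass 1-indexed
    # pages, but this math treats `page` as 0-indexed, so page 1
    # returns the second page's data and the first page is never served.
    start = page * page_size
    return items[start:start + page_size]


def paginate_fixed(items, page, page_size):
    """Return the given 1-indexed page of items (corrected)."""
    start = (page - 1) * page_size
    return items[start:start + page_size]
```

A targeted prompt that asks for off-by-one errors in slice operations has a real chance of finding this; "review this code" usually doesn't.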
Security-focused review
For any code handling user input, authentication, or data persistence:
```
Review this code for security vulnerabilities. Specifically look for:

1. Injection risks (SQL, command, LDAP, XSS, SSTI)
2. Authentication and authorization bypasses — can a user access data or actions they shouldn't?
3. Insecure deserialization or unsafe eval/exec calls
4. Sensitive data in logs, responses, or error messages
5. Race conditions in auth flows (TOCTOU issues)
6. Missing input validation or reliance on client-provided data for trust decisions
7. Hardcoded secrets, tokens, or credentials

For each issue: describe the vulnerability, explain how it could be exploited, and suggest the fix.
Skip style issues and general best practices — only security concerns.

[paste code]
```
The explicit instruction to skip style issues matters. Without it, the model buries the security findings in noise.
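As a concrete illustration of item 1, here's a hedged sketch (using `sqlite3` and an invented `users` table) of the kind of finding this prompt should surface, alongside the parameterized fix:

```python
import sqlite3


def get_user_unsafe(conn, username):
    # Injection risk: username is interpolated directly into the SQL
    # string, so input like "x' OR '1'='1" rewrites the query.
    return conn.execute(
        f"SELECT id, name FROM users WHERE name = '{username}'"
    ).fetchall()


def get_user_safe(conn, username):
    # Fix: parameterized query — the driver treats username as data,
    # never as SQL, regardless of its contents.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (username,)
    ).fetchall()
```

A good review response looks like the comments above: the vulnerability, how the payload exploits it, and the parameterized replacement.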
Logic and correctness review
For algorithmic code or business logic:
```
Review this code for logical errors and incorrect behavior.

Context: [brief description of what this function/module is supposed to do]

Look for:
1. Cases where the output is wrong even if no exception is thrown
2. Off-by-one errors in loops, slice operations, or boundary conditions
3. Incorrect handling of edge cases: empty inputs, null/None/undefined, zero values, negative numbers
4. Assumptions about input state that could be violated by callers
5. Logic that silently does the wrong thing (wrong branch taken, condition inverted, etc.)

For each issue: describe what the code does vs. what it should do, and which inputs would trigger the wrong behavior.

[paste code]
```
This prompt works especially well for functions with complex conditional logic or data transformations.
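A small invented example of item 5 — logic that silently does the wrong thing without ever raising — is Python's falsy-zero trap:

```python
def apply_discount(price, pct):
    # Intended: default to a 10% discount when none is given.
    # Silent bug: `or` treats 0 as falsy, so an explicit 0% discount
    # becomes the 10% default — no exception, just a wrong price.
    rate = pct or 10
    return price * (1 - rate / 100)


def apply_discount_fixed(price, pct=None):
    # Fix: test for "not provided" explicitly instead of truthiness.
    rate = 10 if pct is None else pct
    return price * (1 - rate / 100)
```

This is exactly the "which inputs would trigger the wrong behavior" question the prompt asks for: the answer here is `pct == 0`, an input that looks perfectly valid.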
Concurrency and race condition review
For async code, multi-threaded services, or anything involving shared state:
```
Review this code for concurrency issues and race conditions.

This code runs in [describe context: e.g., "a multi-threaded web server", "async Python with asyncio", "a Node.js event loop with multiple concurrent requests"].

Look for:
1. Shared mutable state accessed from multiple threads/coroutines without proper locking
2. TOCTOU (time-of-check to time-of-use) patterns — checking a condition then acting on it without holding a lock
3. Non-atomic read-modify-write operations
4. Deadlock potential from nested locking or lock ordering issues
5. Async operations that assume sequential ordering of concurrent tasks
6. Database operations that need transactions but don't have them

For each issue: describe the race condition, the failure scenario (what bad state could occur), and the fix.

[paste code]
```
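To illustrate the TOCTOU pattern in item 2, here's a minimal invented `Account` sketch: the check and the update must happen under one lock, or two concurrent withdrawals can both pass the balance check and drive the balance negative.

```python
import threading


class Account:
    def __init__(self, balance):
        self.balance = balance
        self._lock = threading.Lock()

    def withdraw_unsafe(self, amount):
        # TOCTOU race: another thread can withdraw between the check
        # and the update, so both see a sufficient balance and both
        # subtract — the balance can go negative.
        if self.balance >= amount:
            self.balance -= amount
            return True
        return False

    def withdraw_safe(self, amount):
        # Fix: hold the lock across the check *and* the update,
        # making the whole operation atomic.
        with self._lock:
            if self.balance >= amount:
                self.balance -= amount
                return True
            return False
```

The failure scenario is exactly what the prompt asks the model to articulate: which interleaving of threads produces the bad state, and why moving both steps under the lock eliminates it.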
Performance review
When you suspect a performance issue but aren't sure where:
```
Review this code for performance problems.

Context: [describe the scale — e.g., "this runs on every API request, ~500 req/s", "this processes a list of ~50,000 records nightly"]

Look for:
1. N+1 query patterns — database queries inside loops
2. Unnecessary work in hot paths — expensive operations that could be cached or moved outside loops
3. Inefficient data structures for the access patterns used
4. Memory inefficiency — accumulating large intermediate collections unnecessarily
5. Missing indexes (if database queries are visible)
6. Blocking I/O where async would help

For each issue: estimate the relative impact (high/medium/low) and suggest the fix. Ignore micro-optimizations that won't matter at scale.

[paste code]
```
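Item 1 is the most common finding in practice. Here's a minimal sketch with `sqlite3` (the `users` table and column names are invented) of the pattern and its batched fix:

```python
import sqlite3


def emails_n_plus_one(conn, user_ids):
    # N+1 pattern: one round-trip to the database per user id.
    out = []
    for uid in user_ids:
        row = conn.execute(
            "SELECT email FROM users WHERE id = ?", (uid,)
        ).fetchone()
        out.append(row[0])
    return out


def emails_batched(conn, user_ids):
    # Fix: a single query for all ids. The f-string only inserts
    # "?" placeholders, so the values themselves stay parameterized.
    placeholders = ",".join("?" * len(user_ids))
    rows = conn.execute(
        f"SELECT id, email FROM users WHERE id IN ({placeholders})", user_ids
    ).fetchall()
    by_id = dict(rows)
    return [by_id[uid] for uid in user_ids]
```

At 500 req/s with 50 ids per request, the first version issues 25,000 queries per second; the second issues 500. That's the high/medium/low impact estimate the prompt asks for.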
Pull request diff review
When reviewing a PR rather than isolated code, you need the model to understand what changed and why it matters:
```
I'm reviewing a pull request. Here's the diff:

[paste diff]

Context about what this change is supposed to do:
[1-2 sentences describing the intent of the PR]

Review focus:
1. Does the change correctly implement the stated intent? Are there cases where it doesn't?
2. Does the change introduce any regressions — does it break behavior that existed before?
3. Are there edge cases the author didn't account for?
4. For any deleted code: is anything being removed that was load-bearing for a non-obvious reason?

Note: don't comment on code that wasn't changed. Focus only on what this diff introduces or modifies.
```
The "don't comment on unchanged code" instruction prevents a common failure mode where the model starts reviewing the entire file instead of the change.
Integration and interface review
For code that connects to external services, APIs, or between modules:
```
Review this code for integration issues.

This code [describe what it integrates with — e.g., "calls a third-party payment API", "reads from a Kafka topic", "interfaces between our auth service and user service"].

Look for:
1. Missing error handling for network failures, timeouts, or unexpected response codes
2. Assumptions about response structure that could fail if the external service changes behavior
3. No retry logic where retries would be appropriate (and no circuit breaker where retries would make things worse)
4. Missing idempotency handling — if this request fires twice, do bad things happen?
5. Credentials or API keys passed incorrectly or logged
6. Missing pagination handling for endpoints that return paginated results

[paste code]
```
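For item 3, here's a hedged sketch of retry-with-exponential-backoff for transient failures (`call_with_retries` and its defaults are invented; production code would add jitter, cap the total elapsed time, and only retry idempotent requests):

```python
import time


def call_with_retries(fn, retries=3, base_delay=0.01,
                      retriable=(TimeoutError, ConnectionError)):
    """Call fn(), retrying transient failures with exponential backoff."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except retriable:
            if attempt == retries:
                raise  # out of retries: surface the original error
            # Back off: base_delay, 2x, 4x, ...
            time.sleep(base_delay * (2 ** attempt))
```

Note the interaction with item 4: retrying a non-idempotent request (e.g. a payment charge) after an ambiguous timeout is exactly how duplicate side effects happen, which is why the two checks belong in the same review.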
The context sandwich
The prompts above get better when you add more context around the code. Things worth including:
What the code is supposed to do: A one-sentence description of intent helps the model catch when the implementation diverges from it.
Known constraints: "This must be idempotent" or "this runs with read-only DB access" gives the model constraints to check against.
Related code: If a function calls into other functions whose behavior matters, include them. "Also including the validate_user function it calls" prevents the model from making incorrect assumptions.
What you already checked: "I've already verified the SQL queries use parameterized inputs" saves time on things you know are fine.
Asking for a second pass
After an initial review, ask the model to look harder at specific areas:
```
You identified [X] as a potential issue. Let's go deeper on this.

Walk through the execution path step by step for the case where [specific condition].

What is the actual state of [variable/object] at each step, and where does the behavior become incorrect?
```
This forces the model from "here's a pattern that looks suspicious" to "here's exactly what happens and why it's wrong." The specificity either confirms the bug or rules it out.
When AI review is most valuable
AI code review pays off most when:
- You're reviewing code in a domain you're less familiar with (the model knows security anti-patterns even if you don't)
- You're the only reviewer and want a second perspective before merging
- The PR is large and you want to identify which sections deserve your closest attention
- You're doing a post-mortem and want to understand how a bug could have been caught earlier
It's less useful for:
- Simple changes where your own review is clearly sufficient
- Code in unusual internal frameworks the model won't understand without extensive context
- Cases where the correctness depends on business logic that lives entirely outside the code being reviewed
For building more systematic review into your workflow, the prompt chaining lesson covers how to structure multi-step analysis tasks like this. And if you're using Claude Code for interactive debugging, the debugging with Claude Code post covers the interactive workflow.



