India has somewhere north of 250,000 QA engineers and SDETs. It's one of the largest concentrations of testing talent in the world — and it's also one of the roles most directly disrupted by AI coding tools. Not replaced, but reshaped. Claude Code doesn't eliminate QA work; it eliminates the part of QA work that nobody liked: writing boilerplate test scaffolding from scratch.
If you spend your days writing pytest suites, Postman collections, or Playwright test stubs, Claude Code can cut that time by 60–80%. Here's a practical guide to setting it up for real QA work — not toy examples, but the messy, schema-heavy, microservices-era testing work that Indian SDETs actually do.
What Claude Code can actually do for QA
Before diving in, let's be honest about where it's strong and where it isn't.
| Task | Claude Code capability | Notes |
|---|---|---|
| Generate pytest unit tests | Excellent | Needs source code or function signature |
| API test collection (Bruno/Postman) | Good | Needs OpenAPI spec or endpoint examples |
| Selenium/Playwright stubs | Decent | Better with page structure context |
| Test data generation | Excellent | Especially with schema files |
| Avro contract testing | Good | Needs .avsc schema file in context |
| Performance test scripts (k6/Locust) | Good | Needs load profile context |
| Mutation testing analysis | Fair | Can explain surviving mutants but won't fix tests automatically |
| Runtime behavior prediction | Poor | Can't know what a service does at runtime |
The pattern: give it structure (code, schemas, specs) and it performs well. Ask it to infer runtime behavior without context and it'll hallucinate.
Setting up CLAUDE.md for a QA project
The single most important thing you can do before writing a single test with Claude Code is create a CLAUDE.md file. This is a markdown file at the root of your test project that Claude Code reads at the start of every session. It tells Claude what stack you're using, what conventions to follow, and what to avoid.
Without it, you'll get generic tests that don't match your patterns. With it, every generated test looks like something your team actually wrote.
What to put in CLAUDE.md
Here's a template for a typical Python/pytest QA project:
# QA Project Context
## Stack
- Python 3.11
- pytest 7.x with pytest-asyncio for async tests
- httpx for API calls (not requests)
- Bruno for API collections (not Postman)
- Faker library for test data generation
- SQLAlchemy for database fixtures
## Test patterns we use
- Arrange-Act-Assert (AAA) — always use these comment sections
- Fixtures defined in conftest.py (not inline in test files)
- Test class names: TestFeatureName (no underscores)
- Test function names: test_should_[expected_behavior]_when_[condition]
- Always parametrize when testing multiple input variants
- Use pytest.mark.integration for tests that hit external services
## Database
- Use factory_boy for model factories, not raw SQL inserts
- Always roll back transactions in teardown (use session fixture from conftest)
- Schema: PostgreSQL 15, see /schemas/ directory
## What NOT to do
- Do not use unittest.TestCase (we use pure pytest)
- Do not hardcode test data inline — use Faker or factory_boy
- Do not use time.sleep() — use pytest-asyncio with proper await
- Do not write tests that depend on execution order
## API base URL
- Local: http://localhost:8000
- The service mounts at /api/v2/ (not /api/v1/)
If you're writing this for the first time, spend 20 minutes on it. It'll save you hours of editing generated tests to match your conventions.
Adding schema context for Kafka/Avro tests
If your service publishes Avro events, paste the schema file content into CLAUDE.md or reference it explicitly in your prompts:
## Kafka schemas
See /schemas/avro/ directory. Key schemas:
- payment_initiated.avsc — emitted when payment flow starts
- payment_completed.avsc — emitted on successful capture
- kyc_status_changed.avsc — emitted by KYC service
Always load the schema from file; don't hardcode field names inline.
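In the tests themselves, "load the schema from file" usually means a small session-scoped fixture. Here's a minimal sketch, assuming fastavro is installed and the directory layout above; the paths and fixture name are illustrative, not a fixed convention:

```python
# conftest.py sketch: parse the .avsc once per session so tests reference
# fields through the schema instead of hardcoding names.
from pathlib import Path

import pytest
from fastavro.schema import load_schema

SCHEMA_DIR = Path(__file__).resolve().parent.parent / "schemas" / "avro"


@pytest.fixture(scope="session")
def payment_initiated_schema():
    # load_schema parses and resolves the Avro schema file
    return load_schema(str(SCHEMA_DIR / "payment_initiated.avsc"))
```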
Generating pytest tests — real examples
From a FastAPI endpoint
Say you have this endpoint in your service:
@router.post("/payments/refund")
async def initiate_refund(
request: RefundRequest,
db: AsyncSession = Depends(get_db),
current_user: User = Depends(get_current_user)
) -> RefundResponse:
"""Initiate a refund for a completed payment."""
payment = await get_payment(db, request.payment_id)
if payment is None:
raise HTTPException(status_code=404, detail="Payment not found")
if payment.status != PaymentStatus.COMPLETED:
raise HTTPException(status_code=400, detail="Only completed payments can be refunded")
if payment.user_id != current_user.id:
raise HTTPException(status_code=403, detail="Cannot refund another user's payment")
refund = await create_refund(db, payment, request.amount)
return RefundResponse.from_orm(refund)
The Claude Code prompt to generate tests:
Generate a pytest test file for the POST /payments/refund endpoint above.
Requirements:
- Cover: happy path, payment not found (404), wrong status (400), wrong user (403)
- Use the AAA pattern with comment sections
- Mock get_payment and create_refund at the service layer (not DB layer)
- Use httpx AsyncClient with the FastAPI app (see our TestClient setup in conftest.py)
- Use pytest.mark.asyncio for all async tests
- Follow naming convention: TestInitiateRefund class, test_should_[behavior]_when_[condition] methods
- Generate test data with Faker where needed
Example of what Claude Code generates:
import pytest
from httpx import AsyncClient
from unittest.mock import AsyncMock, patch
from faker import Faker
fake = Faker()
class TestInitiateRefund:
@pytest.mark.asyncio
async def test_should_return_refund_when_payment_is_completed(
self, async_client: AsyncClient, authenticated_user
):
# Arrange
payment_id = fake.uuid4()
mock_payment = MockPayment(
id=payment_id,
status=PaymentStatus.COMPLETED,
user_id=authenticated_user.id,
amount=1000
)
mock_refund = MockRefund(payment_id=payment_id, amount=500)
with patch("app.routers.payments.get_payment", return_value=mock_payment), \
patch("app.routers.payments.create_refund", return_value=mock_refund):
# Act
response = await async_client.post(
"/api/v2/payments/refund",
json={"payment_id": payment_id, "amount": 500}
)
# Assert
assert response.status_code == 200
assert response.json()["payment_id"] == payment_id
@pytest.mark.asyncio
async def test_should_return_404_when_payment_not_found(
self, async_client: AsyncClient, authenticated_user
):
# Arrange
with patch("app.routers.payments.get_payment", return_value=None):
# Act
response = await async_client.post(
"/api/v2/payments/refund",
json={"payment_id": fake.uuid4(), "amount": 500}
)
# Assert
assert response.status_code == 404
assert "not found" in response.json()["detail"].lower()
Not perfect — you'll need to adjust the MockPayment and MockRefund classes to match your actual models. But the structure, the mock locations, the naming: all correct on first pass.
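If you don't already have lightweight stand-ins for those models, a minimal version can be plain dataclasses. This is a sketch only; the field names mirror what the generated test uses, and the import path for PaymentStatus is a placeholder for wherever your service defines it:

```python
# Illustrative stand-ins: replace with your real models or factory_boy factories.
from dataclasses import dataclass

from app.models import PaymentStatus  # hypothetical path, adjust to your project


@dataclass
class MockPayment:
    id: str
    status: PaymentStatus
    user_id: str
    amount: int


@dataclass
class MockRefund:
    payment_id: str
    amount: int
```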
From an OpenAPI spec
If you have an OpenAPI spec, Claude Code can generate a full Bruno collection:
I have an OpenAPI 3.0 spec for our payments service (pasted below / in openapi.yaml).
Generate a Bruno collection that:
- Creates one folder per tag in the spec
- Creates one .bru file per endpoint
- Uses {{baseUrl}} variable for the host
- Includes example request bodies from the spec's examples field
- Adds a pre-request script to set Authorization: Bearer {{authToken}}
- For POST /payments/refund specifically, add a test script that asserts status 200 and response.payment_id is present
[paste spec here]
Bruno collections are just files in a folder — Claude Code can write them directly to your repo.
The agentic QA workflow
This is where Claude Code becomes genuinely powerful rather than just a glorified snippet generator. Here's an agentic loop that runs as part of your PR review process:
- Claude Code reads the PR diff (via `git diff main...HEAD`)
- Identifies changed or new functions
- Generates or updates corresponding tests
- Runs `pytest` to check they pass
- Iterates on failures until tests are green
The exact Claude Code commands to run this:
# Start Claude Code in the repo root
claude
# Then in the Claude Code session:
> Read the output of `git diff main...HEAD` and identify all changed Python functions and methods.
> For each changed function, check if a corresponding test exists in tests/.
> Generate tests for any function that has no coverage or whose tests don't cover the new code paths.
> Run pytest on the new test files only: pytest tests/test_payments.py -v
> Fix any failures and re-run until all tests pass.
You can also run this headlessly in CI:
claude -p "Read git diff, generate tests for uncovered changed functions, run pytest, iterate until green. Output a summary of tests added."
Set this up as a GitHub Actions step and your team gets auto-generated test suggestions on every PR. Engineers still review and merge them — but the boilerplate is done.
Prompts that work well for QA tasks
These are copy-paste ready. Adjust to your stack.
1. Generate parametrized tests for input validation
Generate parametrized pytest tests for the [function/endpoint] above.
Create a parametrize decorator covering: valid input (should pass), empty string, None, too-long string, SQL injection attempt, XSS payload, Unicode edge cases (Tamil, Devanagari characters).
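The output depends on your endpoint, but the shape Claude Code typically produces looks roughly like this. The endpoint, field name, and expected status codes below are assumptions for illustration:

```python
# Sketch of a parametrized validation test. Adjust the endpoint, payload
# field, and expected codes to your actual API.
import pytest


@pytest.mark.parametrize(
    "description, expected_status",
    [
        ("A Valid Merchant Name", 200),           # valid input
        ("", 422),                                # empty string
        (None, 422),                              # null value
        ("x" * 10_000, 422),                      # too-long string
        ("'; DROP TABLE payments; --", 422),      # SQL injection attempt
        ("<script>alert(1)</script>", 422),       # XSS payload
        ("வணக்கம் நிறுவனம்", 200),                 # Tamil characters
        ("भुगतान सेवा", 200),                      # Devanagari characters
    ],
)
@pytest.mark.asyncio
async def test_should_validate_description_when_initiating_payment(
    async_client, description, expected_status
):
    # Act
    response = await async_client.post(
        "/api/v2/payments/initiate",
        json={"description": description, "amount": 100},
    )
    # Assert
    assert response.status_code == expected_status
```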
2. Generate test data factory
Generate a factory_boy factory class for the [ModelName] model above.
Include all required fields. Use Faker for string fields, realistic ranges for numeric fields. Add subfactories for related models [list them].
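A typical result looks like the sketch below. The Payment and User models, their fields, and the import path are assumptions; align them with your actual SQLAlchemy models and bind the session in conftest.py:

```python
# Illustrative factory_boy factories for a SQLAlchemy-backed test suite.
import factory
from factory.alchemy import SQLAlchemyModelFactory

from app.models import Payment, User  # hypothetical import path


class UserFactory(SQLAlchemyModelFactory):
    class Meta:
        model = User
        sqlalchemy_session = None  # bound to the conftest session at setup
        sqlalchemy_session_persistence = "flush"

    id = factory.Faker("uuid4")
    email = factory.Faker("email")


class PaymentFactory(SQLAlchemyModelFactory):
    class Meta:
        model = Payment
        sqlalchemy_session = None  # bound to the conftest session at setup
        sqlalchemy_session_persistence = "flush"

    id = factory.Faker("uuid4")
    user = factory.SubFactory(UserFactory)
    amount = factory.Faker("pyint", min_value=100, max_value=500_000)
    currency = "INR"
    status = "COMPLETED"
```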
3. Generate Avro schema tests
Given the Avro schema in /schemas/avro/payment_initiated.avsc, generate pytest tests that:
- Validate the schema parses correctly
- Test that a sample payload serializes/deserializes without data loss
- Test that serializing a payload with missing required fields raises an error
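With fastavro (one option; any Avro library works), the round-trip part of that prompt usually comes back looking like this. The schema path and payload fields are assumptions based on the payment_initiated example above:

```python
# Round-trip sketch: serialize a record with the schema, read it back,
# and assert nothing was lost or coerced.
import io

from fastavro import schemaless_reader, schemaless_writer
from fastavro.schema import load_schema


def test_payment_initiated_round_trip_preserves_data():
    # Arrange
    schema = load_schema("schemas/avro/payment_initiated.avsc")
    record = {"payment_id": "pay_123", "amount": 50000, "currency": "INR"}

    # Act: serialize, then deserialize
    buffer = io.BytesIO()
    schemaless_writer(buffer, schema, record)
    buffer.seek(0)
    decoded = schemaless_reader(buffer, schema)

    # Assert: no data loss
    assert decoded == record
```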
4. Generate API contract test
I have two services: payment-service (producer) and notification-service (consumer). The notification-service expects the payment_completed event to have fields: payment_id (string), amount (int), currency (string, 3-char), user_email (string).
Generate a pytest contract test that verifies the payment-service schema matches these consumer expectations.
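A minimal sketch of that contract check is below. The field names and types come from the prompt; the schema path and the simple string-typed fields are assumptions (union or logical types need extra handling):

```python
# Consumer-driven contract sketch: assert the producer's Avro schema exposes
# every field the notification-service relies on, with the expected type.
from fastavro.schema import load_schema

CONSUMER_EXPECTATIONS = {
    "payment_id": "string",
    "amount": "int",
    "currency": "string",
    "user_email": "string",
}


def test_payment_completed_schema_satisfies_notification_consumer():
    schema = load_schema("schemas/avro/payment_completed.avsc")  # assumed path
    producer_fields = {f["name"]: f["type"] for f in schema["fields"]}

    for field, expected_type in CONSUMER_EXPECTATIONS.items():
        assert field in producer_fields, f"Producer schema missing field: {field}"
        assert producer_fields[field] == expected_type, (
            f"{field}: expected {expected_type}, got {producer_fields[field]}"
        )
```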
5. Generate load test (k6)
Generate a k6 load test script for the POST /payments/initiate endpoint.
Profile: ramp from 0 to 100 VUs over 30s, hold 100 VUs for 2 minutes, ramp down.
Thresholds: p95 < 500ms, error rate < 1%.
Use realistic payload from the example above.
6. Generate BDD scenarios (Gherkin)
Write Gherkin feature file scenarios for the refund flow. Cover happy path, insufficient balance, payment not eligible. Use Indian rupee amounts and realistic merchant names.
7. Generate test for async Celery task
Write a pytest test for the send_refund_notification Celery task. Mock the email service. Verify the task is called with correct arguments when refund status changes to COMPLETED. Use pytest-celery fixtures.
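The essence of that test looks like the sketch below. The task module, the email client patch target, and the task's keyword arguments are all assumptions; the point is mocking the side effect and running the task eagerly:

```python
# Celery task test sketch: no broker needed, .apply() runs the task in-process.
from unittest.mock import patch

from app.tasks import send_refund_notification  # hypothetical import path


def test_should_send_email_when_refund_status_is_completed():
    # Arrange: patch the email service so no real mail is sent
    # ("app.tasks.email_client" is an assumed patch target)
    with patch("app.tasks.email_client") as mock_email:
        # Act: execute the task eagerly with the arguments under test
        send_refund_notification.apply(
            kwargs={"refund_id": "rf_123", "status": "COMPLETED"}
        )

    # Assert: exactly one notification went out for a completed refund
    mock_email.send.assert_called_once()
```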
8. Generate negative tests from error codes
Our payment service returns these error codes: INVALID_VPA, INSUFFICIENT_FUNDS, UPI_LIMIT_EXCEEDED, BANK_DOWN, DUPLICATE_TRANSACTION.
Generate pytest tests that verify each error code is returned under the correct conditions. Mock the UPI gateway at the httpx call level.
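One shape this can take is stubbing the UPI gateway at the httpx layer with respx. The library choice, the gateway URL, the error payload format, and the status-code mapping below are all assumptions to adapt to your service:

```python
# Parametrized negative-path sketch: stub the outbound gateway call and
# assert the service surfaces each error code correctly.
import httpx
import pytest
import respx


@pytest.mark.parametrize(
    "gateway_error, expected_status",
    [
        ("INVALID_VPA", 400),
        ("INSUFFICIENT_FUNDS", 402),
        ("UPI_LIMIT_EXCEEDED", 429),
        ("BANK_DOWN", 502),
        ("DUPLICATE_TRANSACTION", 409),
    ],
)
@pytest.mark.asyncio
@respx.mock
async def test_should_surface_gateway_error_code(
    async_client, gateway_error, expected_status
):
    # Arrange: stub the UPI gateway endpoint (URL is an assumption)
    respx.post("https://upi-gateway.example.com/collect").mock(
        return_value=httpx.Response(400, json={"error_code": gateway_error})
    )

    # Act
    response = await async_client.post(
        "/api/v2/payments/initiate",
        json={"vpa": "merchant@upi", "amount": 100},
    )

    # Assert: the service maps the gateway error to its own status and code
    assert response.status_code == expected_status
    assert response.json()["error_code"] == gateway_error
```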
💡 Want to go deeper? The prompting patterns behind these — chain-of-thought, constrained generation, few-shot — are covered in our Advanced track.
Limitations to know
Claude Code doesn't know runtime behavior. It generates tests based on code structure and your instructions. If your UPI gateway behaves unexpectedly at runtime, Claude Code can't know that. Always run generated tests against a real test environment before trusting coverage numbers.
CLAUDE.md quality determines output quality. I've seen teams skip the CLAUDE.md setup and then complain that Claude Code generates "garbage tests." Every bad test I've seen traces back to missing context — wrong mock locations, wrong assertion style, unknown test utilities. Put the time into CLAUDE.md.
Review all generated auth and security tests. Generated tests for auth endpoints often have subtle gaps — they test status codes but not session invalidation, or they mock at the wrong layer. Security test coverage is not something to delegate entirely to AI.
Token limits on large test suites. Claude Code's context window is large but not unlimited. For test files over ~2,000 lines, split into modules before asking it to work across the full file.
Next steps
- How to write a CLAUDE.md for any project — the most important setup step
- Cursor AI prompt engineering guide — if you prefer IDE-integrated coding
- Evaluation frameworks for AI systems — how to measure if your AI-generated tests are actually good