Serverless is great for websites. For AI workloads, it's a constant fight.
The problems are predictable: a GPT-4o call for a complex prompt takes 15–25 seconds. Vercel's default timeout is 10 seconds. You upgrade to Pro for the 60-second limit, then a chain-of-thought response exceeds that too. Streaming works — until a cold start adds 2–3 seconds before the first token arrives. And running a local model (Ollama, llama.cpp) isn't possible at all on serverless.
A VPS removes all of this. No timeouts, no cold starts, persistent connections to LLM APIs, and the option to run a local model alongside your app for zero-cost inference.
This is how I deploy AI apps on a Hostinger KVM 2 VPS.
## Why AI Apps Need a Real Server
| Concern | Serverless (Vercel Hobby) | Serverless (Vercel Pro) | VPS |
|---|---|---|---|
| Max response time | 10s | 60s | Unlimited |
| Cold start latency | 500ms–3s | 500ms–3s | None |
| SSE streaming | Limited | Yes | Yes |
| Local model (Ollama) | No | No | Yes |
| Persistent connections | No | No | Yes |
| Monthly cost (₹) | Free | ~₹8,000+ | ~₹600–900 |
For a simple AI chatbot making fast API calls, serverless is fine. For anything with long chains, multi-step agents, RAG pipelines, or local model inference — you need a persistent server.
## Prerequisites
- Hostinger KVM 2 VPS with Ubuntu 22.04
- A domain pointed at your VPS IP (A record)
- Your Next.js AI app in a GitHub repo
- OpenAI / Anthropic API keys ready
## Step 1: Initial Server Setup
SSH in as root:
```bash
ssh root@YOUR_VPS_IP
```
Create a non-root user and configure the firewall:
```bash
adduser deploy
usermod -aG sudo deploy

# Copy SSH key to deploy user
mkdir /home/deploy/.ssh
cp ~/.ssh/authorized_keys /home/deploy/.ssh/
chown -R deploy:deploy /home/deploy/.ssh
chmod 700 /home/deploy/.ssh && chmod 600 /home/deploy/.ssh/authorized_keys

# Firewall
ufw allow OpenSSH
ufw allow 80
ufw allow 443
ufw enable
```
Log in as deploy for all remaining steps:
```bash
ssh deploy@YOUR_VPS_IP
```
## Step 2: Install Node.js via NVM
```bash
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.7/install.sh | bash
source ~/.bashrc
nvm install 20
nvm alias default 20
node --version   # v20.x.x
```
Install PM2 — keeps your AI app running and restarts on crash:
```bash
npm install -g pm2
pm2 startup
# Run the sudo command it prints
```
## Step 3: Clone Your App and Configure AI Keys
```bash
cd /home/deploy
git clone https://github.com/YOUR_USERNAME/YOUR_REPO.git app
cd app
npm install
```
Create your environment file with your LLM API keys:
```bash
nano .env.production
```

```bash
# LLM providers
OPENAI_API_KEY=sk-proj-...
ANTHROPIC_API_KEY=sk-ant-api03-...

# Optional: local Ollama (if you set it up in Step 6)
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=llama3.1:8b

# App config
NEXT_PUBLIC_SITE_URL=https://yourdomain.com
NODE_ENV=production
```
Secure the file so only your user can read it:
```bash
chmod 600 .env.production
```
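A typo'd or missing key in this file fails silently until the first API call. A small startup check can catch that at boot instead; a hypothetical sketch (`missingEnvKeys` is not part of any library, just an illustration):

```typescript
// Hypothetical startup check: returns the names of required variables
// that are missing or blank, so the app can fail fast at boot instead
// of erroring on the first OpenAI call.
function missingEnvKeys(
  env: Record<string, string | undefined>,
  required: string[],
): string[] {
  return required.filter((key) => !env[key] || env[key]!.trim() === "");
}

// Usage sketch, e.g. near the top of app/api/chat/route.ts:
// const missing = missingEnvKeys(process.env, ["OPENAI_API_KEY", "ANTHROPIC_API_KEY"]);
// if (missing.length) throw new Error(`Missing env vars: ${missing.join(", ")}`);
```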
Build and start with PM2:
```bash
npm run build
pm2 start npm --name "ai-app" -- start
pm2 save
pm2 logs ai-app --lines 20   # confirm it started
```
## Step 4: Nginx with Streaming Support
This is the critical part most guides skip. AI streaming responses (SSE / ReadableStream) require Nginx to not buffer the response. Without this, the user sees nothing until the entire AI response is complete — defeating the purpose of streaming.
Install Nginx:
```bash
sudo apt install nginx -y
sudo systemctl enable nginx
```
Create your site config:
```bash
sudo nano /etc/nginx/sites-available/yourdomain.com
```

```nginx
server {
    listen 80;
    server_name yourdomain.com www.yourdomain.com;

    # Increase timeouts for long AI requests
    proxy_read_timeout 300s;
    proxy_connect_timeout 10s;
    proxy_send_timeout 300s;

    location / {
        proxy_pass http://localhost:3000;
        proxy_http_version 1.1;

        # Required for WebSocket / streaming connections
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # CRITICAL: disable buffering for SSE streaming
        proxy_buffering off;
        proxy_cache off;
        add_header X-Accel-Buffering no;
    }

    # Static assets are served directly from disk, so they never touch
    # the proxy (or its buffering settings) and can be cached aggressively
    location /_next/static/ {
        alias /home/deploy/app/.next/static/;
        expires 1y;
        add_header Cache-Control "public, immutable";
    }
}
```
Enable and test:
```bash
sudo ln -s /etc/nginx/sites-available/yourdomain.com /etc/nginx/sites-enabled/
sudo nginx -t                # must print "ok"
sudo systemctl reload nginx
```
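On the client side, the stream arrives as raw `text/event-stream` chunks that can split anywhere, even mid-JSON, so the reader has to buffer across chunk boundaries. A minimal parser sketch, assuming each event is a `data: {...}` line terminated by a blank line (the format a typical SSE chat endpoint emits):

```typescript
// Minimal SSE accumulator: feed it raw chunks, get back the payloads of
// completed events. Chunks may split mid-line, so we buffer internally.
class SseParser {
  private buffer = "";

  // Returns the `data:` payloads of any events completed by this chunk.
  push(chunk: string): string[] {
    this.buffer += chunk;
    const events: string[] = [];
    let idx: number;
    // Each SSE event is terminated by a blank line ("\n\n").
    while ((idx = this.buffer.indexOf("\n\n")) !== -1) {
      const raw = this.buffer.slice(0, idx);
      this.buffer = this.buffer.slice(idx + 2);
      for (const line of raw.split("\n")) {
        if (line.startsWith("data: ")) events.push(line.slice(6));
      }
    }
    return events;
  }
}

// Usage sketch: reassemble streamed text from events like {"text":"..."}
const parser = new SseParser();
const chunks = ['data: {"text":"Hel', 'lo"}\n\ndata: {"text":" world"}\n\n'];
let text = "";
for (const c of chunks) {
  for (const payload of parser.push(c)) {
    text += JSON.parse(payload).text;
  }
}
// text is now "Hello world"
```

If streaming appears to stall with this in place, the usual culprit is the Nginx buffering config above, not the client.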
## Step 5: SSL with Let's Encrypt
```bash
sudo apt install certbot python3-certbot-nginx -y
sudo certbot --nginx -d yourdomain.com -d www.yourdomain.com
```
Select option 2 to redirect all HTTP to HTTPS. Certbot auto-renews via a systemd timer. Test it:
```bash
sudo certbot renew --dry-run
```
## Step 6 (Optional): Add Ollama for Local LLM Inference
The KVM 2's 8 GB RAM can run a quantised Llama 3.1 8B (Q4, ~5 GB) alongside your app. This gives you zero-cost inference for routine tasks — only complex requests hit the paid API.
```bash
curl -fsSL https://ollama.com/install.sh | sh
sudo systemctl enable ollama
sudo systemctl start ollama
ollama pull llama3.1:8b
```
Verify Ollama is running:
```bash
curl http://localhost:11434/api/tags   # should list your models
```
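Since Ollama is optional, it helps not to hard-code `localhost:11434` in the app. One approach is a tiny resolver over the env vars from Step 3 (a hypothetical helper, not a library API):

```typescript
type ModelTarget =
  | { kind: "ollama"; baseUrl: string; model: string }
  | { kind: "openai" };

// Prefer local Ollama only when both optional env vars from Step 3 are
// set; otherwise fall back to the paid API.
function resolveModelTarget(
  env: Record<string, string | undefined>,
): ModelTarget {
  const baseUrl = env.OLLAMA_BASE_URL;
  const model = env.OLLAMA_MODEL;
  if (baseUrl && model) return { kind: "ollama", baseUrl, model };
  return { kind: "openai" };
}
```

This keeps the Ollama setup genuinely optional: delete the two env vars and every request routes to the paid API with no code change.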
In your Next.js API route, call Ollama for simple tasks and OpenAI/Anthropic for complex ones:
```typescript
// app/api/chat/route.ts
import OpenAI from "openai";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

export async function POST(req: Request) {
  const { messages, useLocal } = await req.json();

  // Route cheap/simple tasks to local Ollama
  if (useLocal) {
    const res = await fetch("http://localhost:11434/api/chat", {
      method: "POST",
      body: JSON.stringify({
        model: "llama3.1:8b",
        messages,
        stream: false,
      }),
    });
    const data = await res.json();
    return Response.json({ content: data.message.content });
  }

  // Complex tasks use OpenAI with streaming
  const stream = await openai.chat.completions.create({
    model: "gpt-4o",
    messages,
    stream: true,
  });

  const encoder = new TextEncoder();
  const readable = new ReadableStream({
    async start(controller) {
      for await (const chunk of stream) {
        const text = chunk.choices[0]?.delta?.content ?? "";
        controller.enqueue(encoder.encode(`data: ${JSON.stringify({ text })}\n\n`));
      }
      controller.close();
    },
  });

  return new Response(readable, {
    headers: {
      "Content-Type": "text/event-stream",
      "Cache-Control": "no-cache",
      Connection: "keep-alive",
    },
  });
}
```
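The route above trusts the client's `useLocal` flag, which means a client could force every request onto the expensive model. A server-side heuristic is safer; a rough sketch (the thresholds are arbitrary assumptions to tune against your own traffic):

```typescript
type ChatMessage = { role: string; content: string };

// Rough heuristic: short, early-conversation prompts go to the local 8B
// model; long or heavily multi-turn conversations go to the paid API.
// Both cutoffs are illustrative assumptions, not recommendations.
function shouldUseLocal(messages: ChatMessage[]): boolean {
  const totalChars = messages.reduce((n, m) => n + m.content.length, 0);
  const userTurns = messages.filter((m) => m.role === "user").length;
  return userTurns <= 2 && totalChars < 2000;
}
```

You could then write `if (shouldUseLocal(messages))` in the route instead of reading the flag from the request body.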
## Step 7: Automatic Deploys via GitHub Actions
Generate a deploy key on the server:
```bash
ssh-keygen -t ed25519 -C "github-deploy" -f ~/.ssh/github_deploy -N ""
cat ~/.ssh/github_deploy.pub >> ~/.ssh/authorized_keys
cat ~/.ssh/github_deploy               # copy the private key
```
Add to GitHub repo → Settings → Secrets → Actions:
- `VPS_HOST`: your server IP
- `VPS_USER`: `deploy`
- `VPS_SSH_KEY`: the private key output above
Create `.github/workflows/deploy.yml` in your repo:

```yaml
name: Deploy AI App

on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Deploy to Hostinger VPS
        uses: appleboy/ssh-action@v1.0.3
        with:
          host: ${{ secrets.VPS_HOST }}
          username: ${{ secrets.VPS_USER }}
          key: ${{ secrets.VPS_SSH_KEY }}
          script: |
            cd /home/deploy/app
            git pull origin main
            npm ci --production=false
            npm run build
            pm2 restart ai-app
            pm2 save
```
Every push to main triggers an automatic redeploy.
## Cost Comparison: VPS vs Serverless for AI
| Setup | Monthly Cost (₹) | AI Timeout | Streaming | Local LLM |
|---|---|---|---|---|
| Vercel Hobby | Free | 10s | Limited | No |
| Vercel Pro | ~₹8,000 | 60s | Yes | No |
| Render Starter | ~₹1,700 | 30s | Yes | No |
| Hostinger KVM 2 | ~₹600–900 | None | Yes | Yes |
The VPS is cheaper than Vercel Pro and removes every serverless constraint that makes AI apps painful.
## Monitoring Your AI App
```bash
pm2 status       # all processes
pm2 logs ai-app  # live logs (watch for API errors)
pm2 monit        # CPU + memory per process
free -h          # RAM — watch this if running Ollama
```
Watch for:
- 429 errors in logs → you're hitting OpenAI/Anthropic rate limits
- RAM over 7 GB → Ollama + app + OS is nearing the KVM 2 ceiling
- Nginx 504 → `proxy_read_timeout` needs to be higher for that endpoint
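For the 429s specifically, a small in-process limiter in front of the provider call can smooth bursts before the provider rejects them. A minimal sliding-window sketch (the 60-per-minute figure is an assumption to match against your provider tier, and it only works while the app runs as a single PM2 process):

```typescript
// Sliding-window limiter: allows at most `limit` calls per `windowMs`.
// The clock is injectable so the behaviour is easy to test.
class SlidingWindowLimiter {
  private timestamps: number[] = [];

  constructor(
    private limit: number,
    private windowMs: number,
    private now: () => number = Date.now,
  ) {}

  // Returns true if the call is allowed (and records it), false otherwise.
  tryAcquire(): boolean {
    const t = this.now();
    // Drop timestamps that have fallen out of the window.
    this.timestamps = this.timestamps.filter((ts) => t - ts < this.windowMs);
    if (this.timestamps.length >= this.limit) return false;
    this.timestamps.push(t);
    return true;
  }
}

// Usage sketch: gate the OpenAI call and return 429 from your own route
// instead of burning a provider request that will be rejected anyway.
const limiter = new SlidingWindowLimiter(60, 60_000);
// if (!limiter.tryAcquire()) return new Response("Busy", { status: 429 });
```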
## What You've Built
At the end of this guide you have:
- An AI app running on your own infrastructure with no timeout ceiling
- SSE streaming that works correctly (Nginx buffering disabled)
- API keys secured in a 600-permission .env file, never in git
- Optional local Ollama for zero-cost inference on the same machine
- Auto-deploy on every git push to main
- Total cost: ~₹600–900/month for the VPS
For any serious AI app — multi-step agents, RAG pipelines, LLM routers, or anything that streams — a VPS beats serverless on cost and capability.
Get Hostinger KVM 2 VPS (affiliate link — same price, supports this site)
Also see: Self-Hosting OpenClaw on Hostinger →



