Serverless is great for websites. For AI workloads, it's a constant fight.
The problems are predictable: a GPT-4o call for a complex prompt takes 15–25 seconds. Vercel's default timeout is 10 seconds. You upgrade to Pro for the 60-second limit, then a chain-of-thought response exceeds that too. Streaming works — until a cold start adds 2–3 seconds before the first token arrives. And running a local model (Ollama, llama.cpp) isn't possible at all on serverless.
A VPS removes all of this. No timeouts, no cold starts, persistent connections to LLM APIs, and the option to run a local model alongside your app for zero-cost inference.
This is how I deploy AI apps on a Hostinger KVM 2 VPS.
## Why AI Apps Need a Real Server
| Concern | Serverless (Vercel Hobby) | Serverless (Vercel Pro) | VPS |
|---|---|---|---|
| Max response time | 10s | 60s | Unlimited |
| Cold start latency | 500ms–3s | 500ms–3s | None |
| SSE streaming | Limited | Yes | Yes |
| Local model (Ollama) | No | No | Yes |
| Persistent connections | No | No | Yes |
| Monthly cost (₹) | Free | ~₹8,000+ | ~₹600–900 |
For a simple AI chatbot making fast API calls, serverless is fine. For anything with long chains, multi-step agents, RAG pipelines, or local model inference — you need a persistent server.
## Prerequisites
- Hostinger KVM 2 VPS with Ubuntu 22.04
- A domain pointed at your VPS IP (A record)
- Your Next.js AI app in a GitHub repo
- OpenAI / Anthropic API keys ready
## Step 1: Initial Server Setup
SSH in as root:
```bash
ssh root@YOUR_VPS_IP
```
Create a non-root user and configure the firewall:
```bash
adduser deploy
usermod -aG sudo deploy

# Copy SSH key to deploy user
mkdir /home/deploy/.ssh
cp ~/.ssh/authorized_keys /home/deploy/.ssh/
chown -R deploy:deploy /home/deploy/.ssh
chmod 700 /home/deploy/.ssh && chmod 600 /home/deploy/.ssh/authorized_keys

# Firewall
ufw allow OpenSSH
ufw allow 80
ufw allow 443
ufw enable
```
Log in as deploy for all remaining steps:
```bash
ssh deploy@YOUR_VPS_IP
```
## Step 2: Install Node.js via NVM
```bash
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.7/install.sh | bash
source ~/.bashrc
nvm install 20
nvm alias default 20
node --version   # v20.x.x
```
Install PM2 — keeps your AI app running and restarts on crash:
```bash
npm install -g pm2
pm2 startup
# Run the sudo command it prints
```
## Step 3: Clone Your App and Configure AI Keys
```bash
cd /home/deploy
git clone https://github.com/YOUR_USERNAME/YOUR_REPO.git app
cd app
npm install
```
Create your environment file with your LLM API keys:
```bash
nano .env.production
```

```bash
# LLM providers
OPENAI_API_KEY=sk-proj-...
ANTHROPIC_API_KEY=sk-ant-api03-...

# Optional: local Ollama (if you set it up in Step 6)
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=llama3.1:8b

# App config
NEXT_PUBLIC_SITE_URL=https://yourdomain.com
NODE_ENV=production
```
Secure the file so only your user can read it:
```bash
chmod 600 .env.production
```
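A typo'd or missing key in this file fails silently until the first API call. A small startup check can catch that at boot instead; a hypothetical sketch (`missingEnvKeys` is not part of any library, just an illustration):

```typescript
// Hypothetical startup check: returns the names of required variables
// that are missing or blank, so the app can fail fast at boot instead
// of erroring on the first OpenAI call.
function missingEnvKeys(
  env: Record<string, string | undefined>,
  required: string[],
): string[] {
  return required.filter((key) => !env[key] || env[key]!.trim() === "");
}

// Usage sketch, e.g. near the top of app/api/chat/route.ts:
// const missing = missingEnvKeys(process.env, ["OPENAI_API_KEY", "ANTHROPIC_API_KEY"]);
// if (missing.length) throw new Error(`Missing env vars: ${missing.join(", ")}`);
```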
Build and start with PM2:
```bash
npm run build
pm2 start npm --name "ai-app" -- start
pm2 save
pm2 logs ai-app --lines 20   # confirm it started
```
## Step 4: Nginx with Streaming Support
This is the critical part most guides skip. AI streaming responses (SSE / ReadableStream) require Nginx to not buffer the response. Without this, the user sees nothing until the entire AI response is complete — defeating the purpose of streaming.
Install Nginx:
```bash
sudo apt install nginx -y
sudo systemctl enable nginx
```
Create your site config:
```bash
sudo nano /etc/nginx/sites-available/yourdomain.com
```

```nginx
server {
    listen 80;
    server_name yourdomain.com www.yourdomain.com;

    # Increase timeouts for long AI requests
    proxy_read_timeout 300s;
    proxy_connect_timeout 10s;
    proxy_send_timeout 300s;

    location / {
        proxy_pass http://localhost:3000;
        proxy_http_version 1.1;

        # Required for WebSocket / streaming connections
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # CRITICAL: disable buffering for SSE streaming
        proxy_buffering off;
        proxy_cache off;
        add_header X-Accel-Buffering no;
    }

    # Static assets are served directly from disk, so they never touch
    # the proxy (or its buffering settings) and can be cached aggressively
    location /_next/static/ {
        alias /home/deploy/app/.next/static/;
        expires 1y;
        add_header Cache-Control "public, immutable";
    }
}
```
Enable and test:
```bash
sudo ln -s /etc/nginx/sites-available/yourdomain.com /etc/nginx/sites-enabled/
sudo nginx -t                # must print "ok"
sudo systemctl reload nginx
```
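On the client side, the stream arrives as raw `text/event-stream` chunks that can split anywhere, even mid-JSON, so the reader has to buffer across chunk boundaries. A minimal parser sketch, assuming each event is a `data: {...}` line terminated by a blank line (the format a typical SSE chat endpoint emits):

```typescript
// Minimal SSE accumulator: feed it raw chunks, get back the payloads of
// completed events. Chunks may split mid-line, so we buffer internally.
class SseParser {
  private buffer = "";

  // Returns the `data:` payloads of any events completed by this chunk.
  push(chunk: string): string[] {
    this.buffer += chunk;
    const events: string[] = [];
    let idx: number;
    // Each SSE event is terminated by a blank line ("\n\n").
    while ((idx = this.buffer.indexOf("\n\n")) !== -1) {
      const raw = this.buffer.slice(0, idx);
      this.buffer = this.buffer.slice(idx + 2);
      for (const line of raw.split("\n")) {
        if (line.startsWith("data: ")) events.push(line.slice(6));
      }
    }
    return events;
  }
}

// Usage sketch: reassemble streamed text from events like {"text":"..."}
const parser = new SseParser();
const chunks = ['data: {"text":"Hel', 'lo"}\n\ndata: {"text":" world"}\n\n'];
let text = "";
for (const c of chunks) {
  for (const payload of parser.push(c)) {
    text += JSON.parse(payload).text;
  }
}
// text is now "Hello world"
```

If streaming appears to stall with this in place, the usual culprit is the Nginx buffering config above, not the client.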
## Step 5: SSL with Let's Encrypt
```bash
sudo apt install certbot python3-certbot-nginx -y
sudo certbot --nginx -d yourdomain.com -d www.yourdomain.com
```
Select option 2 to redirect all HTTP to HTTPS. Certbot auto-renews via a systemd timer. Test it:
```bash
sudo certbot renew --dry-run
```
## Step 6 (Optional): Add Ollama for Local LLM Inference
The KVM 2's 8 GB RAM can run a quantised Llama 3.1 8B (Q4, ~5 GB) alongside your app. This gives you zero-cost inference for routine tasks — only complex requests hit the paid API.
```bash
curl -fsSL https://ollama.com/install.sh | sh
sudo systemctl enable ollama
sudo systemctl start ollama
ollama pull llama3.1:8b
```
Verify Ollama is running:
```bash
curl http://localhost:11434/api/tags   # should list your models
```
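Since Ollama is optional, it helps not to hard-code `localhost:11434` in the app. One approach is a tiny resolver over the env vars from Step 3 (a hypothetical helper, not a library API):

```typescript
type ModelTarget =
  | { kind: "ollama"; baseUrl: string; model: string }
  | { kind: "openai" };

// Prefer local Ollama only when both optional env vars from Step 3 are
// set; otherwise fall back to the paid API.
function resolveModelTarget(
  env: Record<string, string | undefined>,
): ModelTarget {
  const baseUrl = env.OLLAMA_BASE_URL;
  const model = env.OLLAMA_MODEL;
  if (baseUrl && model) return { kind: "ollama", baseUrl, model };
  return { kind: "openai" };
}
```

This keeps the Ollama setup genuinely optional: delete the two env vars and every request routes to the paid API with no code change.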
In your Next.js API route, call Ollama for simple tasks and OpenAI/Anthropic for complex ones:
```typescript
// app/api/chat/route.ts
import OpenAI from "openai";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

export async function POST(req: Request) {
  const { messages, useLocal } = await req.json();

  // Route cheap/simple tasks to local Ollama
  if (useLocal) {
    const res = await fetch("http://localhost:11434/api/chat", {
      method: "POST",
      body: JSON.stringify({
        model: "llama3.1:8b",
        messages,
        stream: false,
      }),
    });
    const data = await res.json();
    return Response.json({ content: data.message.content });
  }

  // Complex tasks use OpenAI with streaming
  const stream = await openai.chat.completions.create({
    model: "gpt-4o",
    messages,
    stream: true,
  });

  const encoder = new TextEncoder();
  const readable = new ReadableStream({
    async start(controller) {
      for await (const chunk of stream) {
        const text = chunk.choices[0]?.delta?.content ?? "";
        controller.enqueue(encoder.encode(`data: ${JSON.stringify({ text })}\n\n`));
      }
      controller.close();
    },
  });

  return new Response(readable, {
    headers: {
      "Content-Type": "text/event-stream",
      "Cache-Control": "no-cache",
      Connection: "keep-alive",
    },
  });
}
```
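The route above trusts the client's `useLocal` flag, which means a client could force every request onto the expensive model. A server-side heuristic is safer; a rough sketch (the thresholds are arbitrary assumptions to tune against your own traffic):

```typescript
type ChatMessage = { role: string; content: string };

// Rough heuristic: short, early-conversation prompts go to the local 8B
// model; long or heavily multi-turn conversations go to the paid API.
// Both cutoffs are illustrative assumptions, not recommendations.
function shouldUseLocal(messages: ChatMessage[]): boolean {
  const totalChars = messages.reduce((n, m) => n + m.content.length, 0);
  const userTurns = messages.filter((m) => m.role === "user").length;
  return userTurns <= 2 && totalChars < 2000;
}
```

You could then write `if (shouldUseLocal(messages))` in the route instead of reading the flag from the request body.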
## Step 7: Automatic Deploys via GitHub Actions
Generate a deploy key on the server:
```bash
ssh-keygen -t ed25519 -C "github-deploy" -f ~/.ssh/github_deploy -N ""
cat ~/.ssh/github_deploy.pub >> ~/.ssh/authorized_keys
cat ~/.ssh/github_deploy               # copy the private key
```
Add to GitHub repo → Settings → Secrets → Actions:
- `VPS_HOST`: your server IP
- `VPS_USER`: `deploy`
- `VPS_SSH_KEY`: the private key output above
Create `.github/workflows/deploy.yml` in your repo:

```yaml
name: Deploy AI App

on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Deploy to Hostinger VPS
        uses: appleboy/ssh-action@v1.0.3
        with:
          host: ${{ secrets.VPS_HOST }}
          username: ${{ secrets.VPS_USER }}
          key: ${{ secrets.VPS_SSH_KEY }}
          script: |
            cd /home/deploy/app
            git pull origin main
            npm ci --production=false
            npm run build
            pm2 restart ai-app
            pm2 save
```
Every push to main triggers an automatic redeploy.
## Cost Comparison: VPS vs Serverless for AI
| Setup | Monthly Cost (₹) | AI Timeout | Streaming | Local LLM |
|---|---|---|---|---|
| Vercel Hobby | Free | 10s | Limited | No |
| Vercel Pro | ~₹8,000 | 60s | Yes | No |
| Render Starter | ~₹1,700 | 30s | Yes | No |
| Hostinger KVM 2 | ~₹600–900 | None | Yes | Yes |
The VPS is cheaper than Vercel Pro and removes every serverless constraint that makes AI apps painful.
## Monitoring Your AI App
```bash
pm2 status       # all processes
pm2 logs ai-app  # live logs (watch for API errors)
pm2 monit        # CPU + memory per process
free -h          # RAM — watch this if running Ollama
```
Watch for:
- 429 errors in logs → you're hitting OpenAI/Anthropic rate limits
- RAM over 7 GB → Ollama + app + OS is nearing the KVM 2 ceiling
- Nginx 504 → `proxy_read_timeout` needs to be higher for that endpoint
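For the 429s specifically, a small in-process limiter in front of the provider call can smooth bursts before the provider rejects them. A minimal sliding-window sketch (the 60-per-minute figure is an assumption to match against your provider tier, and it only works while the app runs as a single PM2 process):

```typescript
// Sliding-window limiter: allows at most `limit` calls per `windowMs`.
// The clock is injectable so the behaviour is easy to test.
class SlidingWindowLimiter {
  private timestamps: number[] = [];

  constructor(
    private limit: number,
    private windowMs: number,
    private now: () => number = Date.now,
  ) {}

  // Returns true if the call is allowed (and records it), false otherwise.
  tryAcquire(): boolean {
    const t = this.now();
    // Drop timestamps that have fallen out of the window.
    this.timestamps = this.timestamps.filter((ts) => t - ts < this.windowMs);
    if (this.timestamps.length >= this.limit) return false;
    this.timestamps.push(t);
    return true;
  }
}

// Usage sketch: gate the OpenAI call and return 429 from your own route
// instead of burning a provider request that will be rejected anyway.
const limiter = new SlidingWindowLimiter(60, 60_000);
// if (!limiter.tryAcquire()) return new Response("Busy", { status: 429 });
```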
## What You've Built
At the end of this guide you have:
- An AI app running on your own infrastructure with no timeout ceiling
- SSE streaming that works correctly (Nginx buffering disabled)
- API keys secured in a 600-permission .env file, never in git
- Optional local Ollama for zero-cost inference on the same machine
- Auto-deploy on every git push to main
- Total cost: ~₹600–900/month for the VPS
For any serious AI app — multi-step agents, RAG pipelines, LLM routers, or anything that streams — a VPS beats serverless on cost and capability.
Get Hostinger KVM 2 VPS (affiliate link — same price, supports this site)
Also see: Self-Hosting OpenClaw on Hostinger →



