FreeJobAlert: running a top-1000 Indian site.

10M+
Monthly visitors
99.98%
Uptime (12-mo avg)
< 200ms
Global TTFB
15 yrs
In production

FreeJobAlert is one of India's largest government-jobs portals. Launched in 2011, it has grown into a content network covering central and state government recruitments, exam results, admit cards, and career guidance — all of it needing to be live, accurate, and fast on the day a notification drops, because that's when a large share of those ten million monthly visitors arrives at once.

What makes this site interesting technically isn't the technology. It's the operational discipline required to keep a traffic-spiky content site up for fifteen years while the team behind it stays small. The ingredients are boring — Cloudflare in front, tuned Nginx on a Hetzner VPS, MariaDB replication, PHP 8.x with aggressive OPcache, a custom publishing pipeline, and cron-driven content workers. But each piece is tuned specifically for the traffic pattern of a jobs portal, and each piece has been through multiple rounds of being broken and fixed in production.

Traffic spikes that kill generic WordPress sites

When a major government notification drops, traffic can go from 200 concurrent users to 15,000 within a few minutes. A default WordPress setup on shared hosting dies under that load. FreeJobAlert absorbs it because the edge cache is doing 95%+ of the work, the origin is serving fully-warm pages out of OPcache, and the database is shielded by query-result caching for everything that doesn't need to be real-time.
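To make the arithmetic concrete, here's a back-of-envelope sketch of what edge caching does to origin load. The per-user request rate and the 95% hit ratio are illustrative assumptions, not measured figures:

```python
def origin_load(concurrent_users, reqs_per_user_per_min=6, edge_hit_ratio=0.95):
    """Requests/second reaching the origin after the edge cache takes
    its share. Both default parameters are illustrative assumptions."""
    total_rps = concurrent_users * reqs_per_user_per_min / 60
    return total_rps * (1 - edge_hit_ratio)

# A 75x jump in concurrent users becomes a ~75 rps origin problem,
# not a ~1,500 rps one.
quiet = origin_load(200)      # ~1 rps at the origin
spike = origin_load(15_000)   # ~75 rps at the origin
```

The point of the sketch: the origin never sees the spike at full strength, so a single tuned VPS is enough where a cluster would otherwise be needed.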

What we learned here that we transfer to clients

  • Traffic planning is not about average load; it's about the peak-to-average ratio. Content sites have ratios of 10–50x. You size for peaks.
  • Edge caching discipline beats backend performance optimization. Every millisecond saved at the edge is a millisecond you don't have to save at the origin.
  • Boring databases, boring stacks. MariaDB with sensible defaults beats any exotic data store for 99% of content workloads.
  • Observability isn't optional. You cannot run a 10M/mo site blind. Every spike, every slow query, every failing cron needs to be visible within 60 seconds.

ClawdBot: a production AI content pipeline.

v5.3
Production version
~90s
PDF → article
Multi
Model architecture
Local
Inference capable

ClawdBot started in 2024 as a weekend script to parse government recruitment PDFs and has evolved through five major versions into a full AI content pipeline that now runs parts of the FreeJobAlert publishing operation. It may be the most battle-tested AI content system of its kind running without venture backing, and it has to be: the alternative is publicly getting salary figures and exam dates wrong in front of millions of readers.

The pipeline ingests source material (government PDFs, notification URLs, RSS feeds), runs structured extraction through multiple LLMs in parallel, composes articles from templated sections with traceable source spans, generates brand-consistent imagery via ImageMagick, and ships output through a human-review bridge into a WordPress-derived CMS. The whole thing runs on the same VPS that serves the site — no separate ML infrastructure, no Kubernetes, no MLOps platform. Just Python, systemd, and careful engineering.

The multi-model architecture

Every serious production AI pipeline we've seen eventually converges on multi-model. The reason is brutal: any single LLM will have days where it gets something important wrong on your specific content, and if your pipeline depends on that one model, those days ship bad articles. ClawdBot runs extraction through Gemini, Claude, and Qwen in parallel, compares outputs at the field level, publishes where they agree, escalates to review where they don't.
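A minimal sketch of the field-level comparison step. The field names, values, and agreement rule (unanimity) are illustrative; the real pipeline's schema and thresholds differ:

```python
def merge_extractions(outputs):
    """Compare structured extractions field by field. Fields where every
    model agrees are auto-approved; any disagreement is escalated to
    human review with all candidate values attached."""
    approved, escalated = {}, {}
    for field in set().union(*outputs):
        values = [o.get(field) for o in outputs]
        if all(v == values[0] for v in values[1:]):
            approved[field] = values[0]
        else:
            escalated[field] = values
    return approved, escalated

# Invented example outputs from three models:
gemini = {"posts": 1200, "last_date": "2025-08-14"}
claude = {"posts": 1200, "last_date": "2025-08-15"}
qwen   = {"posts": 1200, "last_date": "2025-08-14"}
approved, escalated = merge_extractions([gemini, claude, qwen])
# approved contains "posts"; "last_date" goes to review.
```

The design choice worth copying is the granularity: agreement is checked per field, not per article, so one disputed date doesn't block an otherwise-clean extraction.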

v5.3 and local inference

Version 5.3 introduced Ollama with Qwen 2.5 as the default model for the bulk of the pipeline, with hosted models used only for the hardest extraction steps. This cut per-article LLM costs to essentially zero for the high-volume workloads while keeping the quality floor intact. The pattern — local for bulk, hosted for hard — is now the default architecture we propose for any client AI build.
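The routing rule behind "local for bulk, hosted for hard" fits in a few lines. The step names, confidence threshold, and the hosted-model placeholder below are assumptions for illustration, not ClawdBot's actual configuration:

```python
LOCAL_MODEL = "qwen2.5"       # served locally via Ollama
HOSTED_MODEL = "hosted-llm"   # placeholder for a hosted frontier model

# Which steps count as "hard" is a judgment call; these are examples.
HARD_STEPS = {"table_extraction", "eligibility_rules"}

def pick_model(step, confidence=None):
    """Local model for bulk work; hosted model for hard steps and
    low-confidence retries. The 0.8 threshold is an assumption."""
    if step in HARD_STEPS:
        return HOSTED_MODEL
    if confidence is not None and confidence < 0.8:
        return HOSTED_MODEL
    return LOCAL_MODEL
```

Usage follows the cost logic: `pick_model("summary")` stays local and costs nothing per call; `pick_model("table_extraction")` or a retry with low confidence escalates to the hosted model.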

What transfers to client builds

  • Source-spanning and traceability from day one. No "trust the model"; every claim tied to source.
  • Review-first UI design. The editor's workflow is more important than the model's output format.
  • Explicit confidence scoring. The pipeline knows when it's uncertain and says so.
  • Hybrid inference architecture. Local models do the boring bulk; hosted models do the hard parts.

Exodus: 40+ sites off cPanel in a month.

40+
Sites migrated
0
Data loss incidents
~60%
Hosting cost cut
< 5 min
Avg downtime/site

In spring 2026 we ran a migration program moving a complete content network off two cPanel servers — one of which was end-of-life with a hard deadline from the provider — onto a single properly-provisioned Hetzner VPS. Forty-plus sites ranging from small niche content properties to a high-traffic WordPress site, all moved inside a month, with zero data loss and cumulative downtime measured in minutes per site.

The operational lessons from this project now form the backbone of the productized migration service we offer clients. Every migration pattern we recommend, every gotcha we flag on scoping calls, every line item in the SOW — all of it comes from this cohort of forty sites and the problems we hit on each of them.

The problems that actually come up

Dumps from cPanel servers running MySQL 8 default to the utf8mb4_0900_ai_ci collation, which MariaDB 10/11 doesn't support. Miss this and the import fails partway through with an unknown-collation error (or, run with --force, silently skips statements). We now normalize every dump with a sed pass before import, every time.
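The actual pass is a one-line sed; here's the equivalent transform in Python. The target collation utf8mb4_unicode_ci is our choice, not the only valid one:

```python
import re

def normalize_dump(sql: str) -> str:
    """Swap MySQL-8-only utf8mb4_0900_* collations for one MariaDB
    understands before feeding the dump to the import."""
    return re.sub(r"utf8mb4_0900_\w+", "utf8mb4_unicode_ci", sql)

line = "DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;"
# normalize_dump(line) rewrites the COLLATE clause to utf8mb4_unicode_ci.
```

Run it over the whole dump file before import, never after: fixing collations post-import means rebuilding tables.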

Email is the biggest landmine. Every site we've migrated has had at least one forgotten contact form or transactional email flow that broke after cutover because it was relying on the cPanel mail server. We now force an email inventory as a prerequisite to the migration quote.

Cron jobs move easily. Scheduled scripts that depend on cPanel-specific environment variables do not. We audit every cron for host-specific assumptions before porting.
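A sketch of the kind of audit we run over each crontab. The suspect patterns below are illustrative examples of cPanel-isms, not an exhaustive list:

```python
# Illustrative patterns; extend per host before trusting the audit.
SUSPECT_PATTERNS = ("/usr/local/cpanel", "ea-php", "/home/")

def audit_cron(crontab_text):
    """Return (line_number, line) pairs that reference host-specific
    paths or binaries and need rewriting before the move."""
    flagged = []
    for lineno, line in enumerate(crontab_text.splitlines(), 1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comments are safe to ignore
        if any(p in stripped for p in SUSPECT_PATTERNS):
            flagged.append((lineno, stripped))
    return flagged
```

Anything flagged gets rewritten against the new host's paths and PHP binaries before the crontab is ported; anything clean moves verbatim.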

What this produced for the client

  • Hosting cost reduction of roughly 60% (two large cPanel servers to one tuned VPS).
  • Performance improvement: median TTFB across the network dropped from ~900ms to ~180ms.
  • Operational simplification: one server to manage, one monitoring dashboard, one backup routine, one SSL setup.
  • Security posture upgrade: from the cPanel default (a minefield) to a hardened Ubuntu box with fail2ban, UFW, and key-only SSH.

OpportunityRadar: screening 6,600 US stocks daily.

6,600
US tickers screened
100-pt
Composite score
Daily
Screening cadence
Telegram
Delivery channel

OpportunityRadar is an internal-plus-friends US equities screening system that pulls market data for roughly 6,600 US-listed tickers daily, runs them through a 100-point composite scoring model, and delivers ranked opportunities via Telegram and a dashboard. Not a content business — a data engineering one — but the same principles apply: reliable ingestion, tight operations, no drama.

Flask API on port 5055, systemd-managed, backed by PostgreSQL and driven by a cron-based pipeline that fans out to market data providers, collects results, runs the scoring model, writes to the database, and pushes the top-ranked tickers to Telegram for subscribed users. The whole system runs on a single VPS and costs less to operate than most people's Netflix subscription.
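The scoring step reduces to a weighted sum of per-factor subscores onto a 100-point scale. A sketch with invented factor names and weights (the real model's factors and weights differ):

```python
# Invented factor names and weights summing to 100; illustrative only.
WEIGHTS = {"momentum": 30, "value": 25, "quality": 25, "liquidity": 20}

def composite_score(factors):
    """Weighted sum of per-factor subscores (each 0.0-1.0) onto a
    100-point scale. Missing factors score zero."""
    return round(sum(w * factors.get(name, 0.0) for name, w in WEIGHTS.items()), 1)
```

The dictionary of weights is the whole model surface: tuning the screener means editing one mapping, and every ticker's score stays explainable as a sum of named parts.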

What this demonstrates

  • We're equally comfortable with content infrastructure and data infrastructure. Not all our work is PDFs and WordPress.
  • Single-node systems done well outperform multi-node systems done poorly. Complexity is not a sign of seriousness.
  • Boring observability — logs to files, Prometheus for metrics, Telegram for alerts — beats ornate observability that doesn't get read.

Shall we do this for you?

These are our case studies because they're what we run ourselves. The same engineers and the same playbooks are available to work on your infrastructure, your content pipeline, or your migration off whatever you're trying to escape.