How to Build a News Aggregator with the OpenAI API and Next.js (2025)

Prerequisites and Project Setup Checklist

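Before writing a single line of application code, confirm your environment matches the versions below. Mixing up the OpenAI Node SDK v3 and v4 is a common source of broken tutorials: v4 replaced the Configuration and OpenAIApi classes with a single OpenAI client, and calls such as openai.createChatCompletion() became openai.chat.completions.create().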

| Tool | Version | Purpose |
|---|---|---|
| Node.js | 20.x LTS | Runtime for Next.js and SDK |
| Next.js | 14.2+ (App Router) | Full-stack framework |
| OpenAI Node SDK | 4.x (openai package) | GPT-4o-mini & embeddings |
| Supabase JS | 2.x (@supabase/supabase-js) | Database + pgvector storage |
| TypeScript | 5.x | Type safety across the project |
| NewsAPI account | Free tier | Live article data source |

Prerequisites checklist:

  • [ ] Node.js 20 LTS installed (node -v confirms v20.x.x)
  • [ ] OpenAI account with API key generated at platform.openai.com
  • [ ] NewsAPI key from newsapi.org (free tier: 100 req/day, development only)
  • [ ] Supabase project created at supabase.com, with its project URL, anon key, and service role key at hand
  • [ ] Git initialized and .env.local added to .gitignore
  • [ ] Familiarity with React Server Components and the App Router mental model

Scaffold the project:

npx create-next-app@latest news-aggregator \
  --typescript \
  --tailwind \
  --eslint \
  --app \
  --src-dir \
  --import-alias "@/*"
cd news-aggregator
npm install openai @supabase/supabase-js

Create .env.local in the project root:

OPENAI_API_KEY=sk-...
NEWSAPI_KEY=your_newsapi_key_here
NEXT_PUBLIC_SUPABASE_URL=https://your-project.supabase.co
NEXT_PUBLIC_SUPABASE_ANON_KEY=your_anon_key_here
SUPABASE_SERVICE_ROLE_KEY=your_service_role_key_here
NEXT_PUBLIC_SITE_URL=http://localhost:3000

Note: Use SUPABASE_SERVICE_ROLE_KEY (not the anon key) for server-side upserts and vector operations so Row Level Security doesn't block your writes. Never expose this key in client components.
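A missing variable otherwise tends to surface much later as a confusing 401 or an undefined URL. A minimal sketch of a fail-fast check, assuming a hypothetical helper file such as src/lib/env.ts; it takes the env object as a parameter so it can be tested without touching process.env, and it also rejects any NEXT_PUBLIC_ name that looks like a service role key:

```typescript
// Hypothetical helper (e.g. src/lib/env.ts): fail fast when server-side variables are missing.
const REQUIRED_SERVER_VARS = [
  'OPENAI_API_KEY',
  'NEWSAPI_KEY',
  'NEXT_PUBLIC_SUPABASE_URL',
  'SUPABASE_SERVICE_ROLE_KEY',
] as const;

type RequiredVar = (typeof REQUIRED_SERVER_VARS)[number];

export function readServerEnv(
  env: Record<string, string | undefined>
): Record<RequiredVar, string> {
  const missing = REQUIRED_SERVER_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }

  // NEXT_PUBLIC_ values are inlined into client bundles, so a service role key must never use that prefix
  const leaked = Object.keys(env).filter(
    (name) => name.startsWith('NEXT_PUBLIC_') && name.includes('SERVICE_ROLE')
  );
  if (leaked.length > 0) {
    throw new Error(`Service role key must not use a NEXT_PUBLIC_ name: ${leaked.join(', ')}`);
  }

  const result = {} as Record<RequiredVar, string>;
  for (const name of REQUIRED_SERVER_VARS) result[name] = env[name] as string;
  return result;
}
```

In a server-only module you would call it once, for example const env = readServerEnv(process.env).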

Estimated time: 90 minutes


Step 1: Fetching Live News Articles from a News API

Your aggregator is only as good as its data source. You need a reliable, typed pipeline that fetches articles server-side. Never call NewsAPI from the browser: the free Developer plan rejects browser requests from origins other than localhost, and a client-side call would expose your API key.

Choosing a news data source

NewsAPI gives you structured JSON with title, description, url, urlToImage, publishedAt, and source.name, exactly the fields you'll normalize. GNews offers a similar schema and a comparable free tier (also 100 requests/day). RSS feeds via the rss-parser npm package are completely free but require you to discover and maintain feed URLs manually. For this tutorial, NewsAPI wins on developer experience; swapping it for GNews mostly means changing the endpoint URL and a couple of field names in the mapping.

Building the fetch utility

Create src/lib/newsapi.ts:

export interface Article {
  id: string; // Empty until stored; Supabase generates a UUID on insert
  title: string;
  description: string;
  url: string;
  imageUrl: string | null;
  source: string;
  publishedAt: string; // ISO 8601
  // Populated after OpenAI processing:
  summary?: string;
  categoryTags?: string[];
  relevanceScore?: number;
  embedding?: number[];
}

const BASE_URL = 'https://newsapi.org/v2';

export async function fetchTopHeadlines(
  category: string = 'technology',
  country: string = 'us',
  pageSize: number = 20
): Promise<Article[]> {
  const url = new URL(`${BASE_URL}/top-headlines`);
  url.searchParams.set('category', category);
  url.searchParams.set('country', country);
  url.searchParams.set('pageSize', String(pageSize));
  url.searchParams.set('apiKey', process.env.NEWSAPI_KEY!);

  const res = await fetch(url.toString(), {
    next: { revalidate: 3600 }, // Cache at the fetch level too
  });

  if (!res.ok) {
    throw new Error(`NewsAPI error: ${res.status} ${await res.text()}`);
  }

  const data = await res.json();

  if (data.status !== 'ok') {
    throw new Error(`NewsAPI returned status: ${data.status} — ${data.message}`);
  }

  return (data.articles as any[]).map((a) => ({
    id: '', // Placeholder: the articles table generates the UUID (gen_random_uuid())
    title: a.title ?? 'Untitled',
    description: a.description ?? '',
    url: a.url,
    imageUrl: a.urlToImage ?? null,
    source: a.source?.name ?? 'Unknown',
    publishedAt: a.publishedAt,
  }));
}

The next: { revalidate: 3600 } option on the raw fetch call is your first caching layer: Next.js keeps the NewsAPI response in its Data Cache and refetches it at most once an hour, so repeated requests for the same category don't eat into your 100-request daily quota.
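One thing the fetch utility doesn't guard against: a single response may contain the same URL twice, and Postgres rejects an upsert batch that touches the same conflict key more than once ("ON CONFLICT DO UPDATE command cannot affect row a second time"). A small sketch of a filter that keeps the first occurrence of each URL; dedupeByUrl is a hypothetical name, not part of any library:

```typescript
// Minimal shape this helper needs; Article from src/lib/newsapi.ts satisfies it.
interface HasUrl {
  url: string;
}

// Hypothetical helper: keep the first article for each URL and drop entries with no URL,
// so one upsert batch never touches the same `url` conflict key twice.
export function dedupeByUrl<T extends HasUrl>(articles: T[]): T[] {
  const seen = new Set<string>();
  const unique: T[] = [];
  for (const article of articles) {
    if (!article.url || seen.has(article.url)) continue;
    seen.add(article.url);
    unique.push(article);
  }
  return unique;
}
```

Calling it on the raw headlines, as in processArticlesBatch(dedupeByUrl(raw)), also avoids paying OpenAI twice for the same story.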


Step 2: Summarizing and Ranking Articles with the OpenAI API

Raw headlines and two-sentence descriptions aren't useful for a curated feed. GPT-4o-mini turns each article into a 2-3 sentence summary, attaches semantic category tags, and scores relevance—all for a fraction of a cent per article.

Designing the prompt

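The key is requesting JSON output with response_format: { type: 'json_object' }. JSON mode makes the model return a single JSON object with no markdown fences or surrounding prose, so you can JSON.parse() the output directly without fragile regex stripping. It guarantees valid syntax, not your schema, and a response cut off at max_tokens can still be invalid. The API also requires the word "JSON" to appear in your messages when this mode is on; the system prompt below satisfies that.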
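Because JSON mode checks syntax but not types or ranges, it can help to normalize the parsed object before storing it. A hedged sketch of a stricter normalizer; normalizeAiResult and its rules (lowercase, deduplicated tags capped at four, score clamped to 0.0-1.0) are assumptions drawn from the prompt's instructions, not something the API enforces:

```typescript
// Shape returned by the hypothetical normalizer (mirrors processArticleWithAI's return type).
export interface AiResult {
  summary: string;
  categoryTags: string[];
  relevanceScore: number;
}

// Parse the model's text, returning {} for invalid JSON or non-object values.
function parseObject(raw: string | null | undefined): Record<string, unknown> {
  try {
    const value: unknown = JSON.parse(raw ?? '{}');
    return value !== null && typeof value === 'object' && !Array.isArray(value)
      ? (value as Record<string, unknown>)
      : {};
  } catch {
    return {}; // e.g. output truncated at max_tokens (finish_reason "length")
  }
}

// Hypothetical helper: lowercase and dedupe tags (at most 4), clamp the score to [0, 1].
export function normalizeAiResult(raw: string | null | undefined): AiResult {
  const obj = parseObject(raw);
  const rawSummary = obj.summary;
  const rawTags = obj.category_tags;
  const rawScore = obj.relevance_score;

  const summary = typeof rawSummary === 'string' ? rawSummary.trim() : '';

  const tags = Array.isArray(rawTags)
    ? rawTags
        .filter((t): t is string => typeof t === 'string')
        .map((t) => t.trim().toLowerCase())
        .filter((t) => t.length > 0)
    : [];
  const categoryTags = Array.from(new Set(tags)).slice(0, 4); // Dedupe, then cap at 4

  const score = typeof rawScore === 'number' && Number.isFinite(rawScore) ? rawScore : 0.5;
  const relevanceScore = Math.min(1, Math.max(0, score)); // Clamp to [0, 1]

  return { summary, categoryTags, relevanceScore };
}
```

Inside processArticleWithAI you could return normalizeAiResult(completion.choices[0]?.message.content) instead of parsing inline.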

Create src/lib/openai.ts:

import OpenAI from 'openai';
import { Article } from './newsapi';

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

const SYSTEM_PROMPT = `You are a news curation assistant. Given an article title and description,
return a JSON object with exactly these fields:
- summary: string (2-3 sentences, plain English, no jargon)
- category_tags: string[] (2-4 lowercase tags, e.g. ["ai", "startups"])
- relevance_score: number (0.0 to 1.0, how interesting this is to a tech-savvy reader)

Return ONLY valid JSON. No markdown, no explanation.`;

export async function processArticleWithAI(
  article: Pick<Article, 'title' | 'description'>
): Promise<{ summary: string; categoryTags: string[]; relevanceScore: number }> {
  const userMessage = `Title: ${article.title}\nDescription: ${article.description}`;

  const completion = await openai.chat.completions.create({
    model: 'gpt-4o-mini',
    response_format: { type: 'json_object' },
    messages: [
      { role: 'system', content: SYSTEM_PROMPT },
      { role: 'user', content: userMessage },
    ],
    max_tokens: 300,
    temperature: 0.3, // Low temp = consistent structured output
  });

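  const raw = completion.choices[0]?.message.content ?? '{}';

  let parsed: any = {};
  try {
    parsed = JSON.parse(raw);
  } catch {
    // JSON mode guarantees syntax, but output cut off at max_tokens can still be invalid
  }

  return {
    summary: typeof parsed.summary === 'string' ? parsed.summary : '',
    categoryTags: Array.isArray(parsed.category_tags) ? parsed.category_tags : [],
    relevanceScore: typeof parsed.relevance_score === 'number' ? parsed.relevance_score : 0.5,
  };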
}

export async function processArticlesBatch(articles: Article[]): Promise<Article[]> {
  // Process with a concurrency limit to avoid 429s
  const results: Article[] = [];
  const CONCURRENCY = 5;

  for (let i = 0; i < articles.length; i += CONCURRENCY) {
    const batch = articles.slice(i, i + CONCURRENCY);
    const processed = await Promise.all(
      batch.map(async (article) => {
        const aiData = await processArticleWithAI(article);
        return { ...article, ...aiData };
      })
    );
    results.push(...processed);
    if (i + CONCURRENCY < articles.length) {
      await new Promise((r) => setTimeout(r, 1000)); // 1s pause between batches
    }
  }
  return results;
}

Note: Streaming (stream: true) is useful for chat UIs but counterproductive here—you can't JSON.parse() a partial stream. Non-streaming with response_format: json_object is the right call for batch processing pipelines.
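The fixed-batch loop in processArticlesBatch waits for the slowest request in each group of five before starting the next group. A common alternative, sketched here as an assumption rather than the tutorial's method, is a small worker pool that keeps up to N requests in flight at all times; mapWithConcurrency is a hypothetical name:

```typescript
// Hypothetical helper: run `fn` over `items` with at most `limit` calls in flight,
// preserving input order in the results.
export async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;

  // Each worker claims the next unprocessed index until none remain.
  // `next++` runs synchronously between awaits, so no two workers get the same index.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await fn(items[index], index);
    }
  }

  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

For example, mapWithConcurrency(articles, 5, async (a) => ({ ...a, ...(await processArticleWithAI(a)) })). This caps requests in flight, not requests per minute, so on low rate-limit tiers the fixed pause in the original loop may still be the safer choice.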


Step 3: Storing and Caching Results with Supabase

Without persistence, every page load would cost you OpenAI API credits. Supabase stores processed articles so the AI layer only runs on genuinely new content.

Creating the articles table

Run this SQL in your Supabase SQL editor:

CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS articles (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  url TEXT UNIQUE NOT NULL,
  title TEXT NOT NULL,
  description TEXT,
  summary TEXT,
  image_url TEXT,
  source TEXT,
  published_at TIMESTAMPTZ,
  category_tags TEXT[],
  relevance_score FLOAT DEFAULT 0.5,
  embedding vector(1536),
  created_at TIMESTAMPTZ DEFAULT NOW()
);

-- HNSW can be built on an empty table; ivfflat picks its lists from rows that already exist
CREATE INDEX ON articles USING hnsw (embedding vector_cosine_ops);

Upserting processed articles

Create src/lib/supabase.ts:

import { createClient } from '@supabase/supabase-js';
import { Article } from './newsapi';

// Server-side client with service role key
export const supabaseAdmin = createClient(
  process.env.NEXT_PUBLIC_SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY!
);

export async function upsertArticles(articles: Article[]): Promise<void> {
  const rows = articles.map((a) => ({
    url: a.url,
    title: a.title,
    description: a.description,
    summary: a.summary,
    image_url: a.imageUrl,
    source: a.source,
    published_at: a.publishedAt,
    category_tags: a.categoryTags,
    relevance_score: a.relevanceScore,
    embedding: a.embedding ? JSON.stringify(a.embedding) : null,
  }));

  const { error } = await supabaseAdmin
    .from('articles')
    .upsert(rows, {
      onConflict: 'url',           // Deduplicate by canonical URL
      ignoreDuplicates: false,     // Update existing rows with fresh AI data
    });

  if (error) throw new Error(`Supabase upsert failed: ${error.message}`);
}

Now create the Route Handler at src/app/api/articles/route.ts:

import { NextResponse } from 'next/server';
import { fetchTopHeadlines } from '@/lib/newsapi';
import { processArticlesBatch } from '@/lib/openai';
import { upsertArticles, supabaseAdmin } from '@/lib/supabase';

export const dynamic = 'force-dynamic'; // Reads request.url, so route-level caching doesn't apply; Supabase is the cache

export async function GET(request: Request) {
  const { searchParams } = new URL(request.url);
  const category = searchParams.get('category') ?? 'technology';

  // 1. Try to serve from cache first
  const { data: cached } = await supabaseAdmin
    .from('articles')
    .select('*')
    .contains('category_tags', [category])
    .order('relevance_score', { ascending: false })
    .limit(20);

  if (cached && cached.length >= 10) {
    return NextResponse.json(cached);
  }

  // 2. Fetch fresh, process with AI, store
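  const raw = await fetchTopHeadlines(category);
  const processed = (await processArticlesBatch(raw)).map((a) => ({
    ...a,
    // Include the requested category so the cache lookup above can find these rows later
    categoryTags: Array.from(new Set([category, ...(a.categoryTags ?? [])])),
  }));
  await upsertArticles(processed);

  // Re-read the stored rows so fresh and cached responses share the same snake_case shape
  // (ArticleCard reads relevance_score, category_tags, and published_at)
  const { data: stored, error } = await supabaseAdmin
    .from('articles')
    .select('*')
    .in('url', processed.map((a) => a.url))
    .order('relevance_score', { ascending: false });

  if (error) return NextResponse.json({ error: error.message }, { status: 500 });
  return NextResponse.json(stored);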
}
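As written, the cache check never expires: once ten rows match a category, the handler keeps serving them and never fetches newer headlines. One option, sketched under the assumption that you compare against the created_at column from the schema above, is to treat the cache as fresh only when enough rows exist and the newest one is recent; isCacheFresh and the one-hour default are hypothetical:

```typescript
// Minimal row shape: created_at comes back from Supabase as an ISO 8601 string.
interface CachedRow {
  created_at: string;
}

// Hypothetical helper: the cache counts as fresh only if it has at least `minCount` rows
// and the most recently inserted one is no older than `maxAgeMs`.
export function isCacheFresh(
  rows: CachedRow[] | null,
  { minCount = 10, maxAgeMs = 60 * 60 * 1000, now = Date.now() } = {}
): boolean {
  if (!rows || rows.length < minCount) return false;
  const newest = Math.max(...rows.map((r) => Date.parse(r.created_at)));
  // Unparseable timestamps produce NaN, which is treated as stale
  return Number.isFinite(newest) && now - newest <= maxAgeMs;
}
```

In the Route Handler, if (isCacheFresh(cached)) would replace the length check. Note that created_at is only set on insert, so this measures when new articles last arrived, not when existing rows were last updated.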

Step 4: Building the Frontend Feed UI in Next.js App Router

With data flowing from Supabase through the Route Handler, the UI layer is straightforward. Server Components fetch the data, a client component handles filtering, and ArticleCard renders the AI-enriched content.

Server Component page

Create src/app/page.tsx:

import { Suspense } from 'react';
import ArticleFeed from '@/components/ArticleFeed';
import CategoryFilter from '@/components/CategoryFilter';

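export default async function HomePage({
  searchParams,
}: {
  // Next.js 15 passes searchParams as a Promise; awaiting a plain object (Next.js 14) also works
  searchParams: Promise<{ category?: string }>;
}) {
  const { category = 'technology' } = await searchParams;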

  return (
    <main className="max-w-4xl mx-auto px-4 py-8">
      <h1 className="text-3xl font-bold mb-6">AI News Feed</h1>
      <CategoryFilter />
      <Suspense fallback={<div>Loading articles...</div>}>
        <ArticleFeed category={category} />
      </Suspense>
    </main>
  );
}
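The page renders a CategoryFilter component whose code isn't shown here. Whatever UI you give it, the filter and the Route Handler should accept the same values. A hedged sketch of shared helpers, assuming a hypothetical src/lib/categories.ts and NewsAPI's seven top-headlines categories:

```typescript
// NewsAPI's top-headlines endpoint accepts these category values.
export const NEWS_CATEGORIES = [
  'business',
  'entertainment',
  'general',
  'health',
  'science',
  'sports',
  'technology',
] as const;

export type NewsCategory = (typeof NEWS_CATEGORIES)[number];

// Hypothetical helper: map any user-supplied value (e.g. ?category=...) to a valid category.
export function parseCategory(
  value: string | null | undefined,
  fallback: NewsCategory = 'technology'
): NewsCategory {
  const normalized = (value ?? '').trim().toLowerCase();
  return (NEWS_CATEGORIES as readonly string[]).includes(normalized)
    ? (normalized as NewsCategory)
    : fallback;
}

// Hypothetical helper: the link a CategoryFilter entry could point to.
export function categoryHref(category: NewsCategory): string {
  return `/?category=${encodeURIComponent(category)}`;
}
```

In the Route Handler, const category = parseCategory(searchParams.get('category')) keeps arbitrary input away from NewsAPI. CategoryFilter can then render one link per entry in NEWS_CATEGORIES pointing at categoryHref(c); plain links need no 'use client', because navigating re-renders the page with the new searchParams.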

Create src/components/ArticleFeed.tsx:

import ArticleCard from './ArticleCard';

async function getArticles(category: string) {
  const res = await fetch(
    `${process.env.NEXT_PUBLIC_SITE_URL}/api/articles?category=${encodeURIComponent(category)}`,
    { next: { revalidate: 3600 } }
  );
  if (!res.ok) throw new Error('Failed to fetch articles');
  return res.json();
}

export default async function ArticleFeed({ category }: { category: string }) {
  const articles = await getArticles(category);

  return (
    <div className="space-y-4 mt-6">
      {articles.map((article: any) => (
        <ArticleCard key={article.url} article={article} />
      ))}
    </div>
  );
}

Create src/components/ArticleCard.tsx:

export default function ArticleCard({ article }: { article: any }) {
  const score = article.relevance_score ?? 0;
  const badgeColor =
    score >= 0.8
      ? 'bg-green-100 text-green-800'
      : score >= 0.5
      ? 'bg-yellow-100 text-yellow-800'
      : 'bg-red-100 text-red-800';

  return (
    <article className="border rounded-xl p-5 hover:shadow-md transition-shadow">
      <div className="flex items-start justify-between gap-4">
        <a href={article.url} target="_blank" rel="noopener noreferrer" className="text-lg font-semibold hover:underline">
          {article.title}
        </a>
        <span className={`text-xs font-medium px-2 py-1 rounded-full shrink-0 ${badgeColor}`}>
          {Math.round(score * 100)}%
        </span>
      </div>
      <p className="text-sm text-gray-600 mt-2">{article.summary}</p>
      <div className="flex gap-2 mt-3 flex-wrap">
        {(article.category_tags ?? []).map((tag: string) => (
          <span key={tag} className="text-xs bg-gray-100 text-gray-700 px-2 py-0.5 rounded">
            #{tag}
          </span>
        ))}
      </div>
      <p className="text-xs text-gray-400 mt-2">{article.source} · {new Date(article.published_at).toLocaleDateString()}</p>
    </article>
  );
}

Step 5: Adding an AI-Powered Search with Embeddings

Keyword search breaks down with synonyms and paraphrased queries. Semantic search using text-embedding-3-small and Supabase's pgvector extension finds articles that mean what the user is looking for, not just articles containing the exact words.
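Under the hood, pgvector's <=> operator returns cosine distance, which is 1 minus cosine similarity; that is why match_documents below computes 1 - (embedding <=> query_embedding). A small TypeScript version of the same arithmetic, handy for sanity-checking scores locally (the function names are illustrative):

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|). pgvector's `<=>` returns 1 minus this value.
export function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('Vectors must have the same dimension');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Undefined for zero vectors; this sketch chooses to treat them as unrelated
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Cosine distance, the quantity `<=>` computes.
export function cosineDistance(a: number[], b: number[]): number {
  return 1 - cosineSimilarity(a, b);
}
```

OpenAI embeddings are normalized to length 1, so for them the dot product alone gives the same similarity; the full formula is kept so the helper also works on arbitrary vectors.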

Create the SQL similarity function

Run this in Supabase SQL editor:

CREATE OR REPLACE FUNCTION match_documents(
  query_embedding vector(1536),
  match_threshold float DEFAULT 0.7,
  match_count int DEFAULT 10
)
RETURNS TABLE (
  id uuid,
  url text,
  title text,
  summary text,
  category_tags text[],
  relevance_score float,
  similarity float
)
LANGUAGE sql STABLE
AS $$
  SELECT
    id, url, title, summary, category_tags, relevance_score,
    1 - (embedding <=> query_embedding) AS similarity
  FROM articles
  WHERE 1 - (embedding <=> query_embedding) > match_threshold
  ORDER BY similarity DESC
  LIMIT match_count;
$$;

Search Route Handler and embedding generation

Create src/app/api/search/route.ts:

import { NextResponse } from 'next/server';
import OpenAI from 'openai';
import { supabaseAdmin } from '@/lib/supabase';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

export async function GET(request: Request) {
  const { searchParams } = new URL(request.url);
  const query = searchParams.get('q');
  if (!query) return NextResponse.json({ error: 'Missing query' }, { status: 400 });

  // Generate embedding for the search query
  const embeddingResponse = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: query,
  });
  const queryEmbedding = embeddingResponse.data[0].embedding;

  // Perform cosine similarity search via pgvector
  const { data, error } = await supabaseAdmin.rpc('match_documents', {
    query_embedding: queryEmbedding,
    match_threshold: 0.7,
    match_count: 10,
  });

  if (error) return NextResponse.json({ error: error.message }, { status: 500 });
  return NextResponse.json(data);
}

For embeddings to work, add a generateEmbedding helper to src/lib/openai.ts (it reuses the openai client defined there) and call it from processArticlesBatch:

export async function generateEmbedding(text: string): Promise<number[]> {
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: text.slice(0, 8000), // Crude guard: 8,000 characters stays well under the 8,191-token input limit
  });
  return response.data[0].embedding;
}

Then in each batch loop, after processArticleWithAI, call generateEmbedding(article.title + ' ' + aiData.summary) and attach the result to article.embedding.
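Put together, each article's processing step becomes two calls. A minimal sketch of that per-article function, with the two OpenAI-backed helpers passed in as parameters; that injection is an assumption made here so the logic runs without network access, and in src/lib/openai.ts you would pass processArticleWithAI and generateEmbedding directly:

```typescript
interface ArticleInput {
  title: string;
  description: string;
}

interface AiFields {
  summary: string;
  categoryTags: string[];
  relevanceScore: number;
}

// The two OpenAI-backed functions this step depends on.
interface EnrichDeps {
  processArticleWithAI: (a: ArticleInput) => Promise<AiFields>;
  generateEmbedding: (text: string) => Promise<number[]>;
}

// Hypothetical helper: summarize first, then embed "title summary" and attach both results.
export async function enrichArticle<T extends ArticleInput>(
  article: T,
  deps: EnrichDeps
): Promise<T & AiFields & { embedding: number[] }> {
  const aiData = await deps.processArticleWithAI(article);
  const embedding = await deps.generateEmbedding(`${article.title} ${aiData.summary}`);
  return { ...article, ...aiData, embedding };
}
```

Inside the batch loop, batch.map((article) => enrichArticle(article, { processArticleWithAI, generateEmbedding })) would replace the existing map callback.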

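Note: text-embedding-3-small produces 1536-dimensional vectors and costs $0.02 per million tokens, so embedding the title and summary of 100 articles (roughly 5,000 to 10,000 tokens) costs around $0.0001 to $0.0002. text-embedding-3-large (3072 dimensions) scores higher on OpenAI's benchmarks, but for short headlines and summaries the small model is usually adequate, and switching would mean changing the vector(1536) columns; pgvector's HNSW and IVFFlat indexes also only support vector columns of up to 2,000 dimensions. If searches return nothing, try lowering match_threshold: cosine scores from the text-embedding-3 models tend to fall below 0.7 even for relevant matches.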


Common Issues & Fixes

| Error | Root cause | Fix |
|---|---|---|
| 429 Too Many Requests from OpenAI | Hitting RPM/TPM limits during bulk processing | Add a concurrency limit and a 1s delay between batches (shown above) |
| CORS error calling NewsAPI from the browser | NewsAPI blocks browser origins other than localhost on the free tier | Always call NewsAPI inside Route Handlers or Server Components |
| extension "vector" does not exist | pgvector not enabled in the Supabase project | Run CREATE EXTENSION IF NOT EXISTS vector; in the SQL editor |
| openai import errors in a Client Component | The SDK is meant for server use and refuses to run in the browser by default | Move all OpenAI calls to server files (Route Handlers, Server Components, Server Actions) |

Error: 429 from OpenAI during bulk article processing

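OpenAI rate limits depend on your usage tier and change over time, so check the limits page in your OpenAI account settings for the current gpt-4o-mini RPM and TPM values. Lower tiers, and the free tier in particular, allow far fewer requests per minute than paid tiers, so firing all 20 requests at once can trip the limit. The processArticlesBatch function above caps concurrency at 5 and pauses 1 second between batches. For 20 articles that is 4 batches with 3 pauses: about 3 seconds of deliberate waiting plus the model latency of each batch.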
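The openai v4 client already retries 429 and 5xx responses on its own (twice by default, configurable with the maxRetries constructor option), so raising that number is the simplest lever. If you want the same behaviour around other calls, such as fetchTopHeadlines, a generic exponential-backoff wrapper is a reasonable sketch; withRetry and its defaults are assumptions, not part of either SDK:

```typescript
interface RetryOptions {
  retries?: number; // Extra attempts after the first one
  baseDelayMs?: number; // Delay before the first retry; doubles each time
  shouldRetry?: (err: unknown) => boolean;
}

// Hypothetical helper: retry an async call with exponential backoff plus jitter.
export async function withRetry<T>(
  fn: () => Promise<T>,
  { retries = 3, baseDelayMs = 500, shouldRetry = () => true }: RetryOptions = {}
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= retries || !shouldRetry(err)) throw err;
      // 500 ms, 1 s, 2 s, ... plus up to 100 ms of random jitter
      const delay = baseDelayMs * 2 ** attempt + Math.random() * 100;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

For example, withRetry(() => fetchTopHeadlines(category), { shouldRetry: (e) => String(e).includes('429') }) retries only when the error thrown by fetchTopHeadlines mentions a 429 status.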

Error: NewsAPI returns CORS error in the browser console

NewsAPI sets Access-Control-Allow-Origin to localhost only on the developer plan. If you see CORS errors, you've accidentally called fetchTopHeadlines inside a Client Component or from useEffect. Move it to a Server Component, Server Action, or Route Handler.

Error: pgvector extension not found when running similarity search

pgvector is not enabled by default in Supabase projects. Open the Supabase dashboard → SQL Editor and run:

CREATE EXTENSION IF NOT EXISTS vector;

Then re-create your table if vector column types were rejected during initial setup.

Error: OpenAI SDK fails in a Client Component

The openai package is designed for server-side use. In a file marked 'use client', the v4 client refuses to run in the browser unless you pass dangerouslyAllowBrowser: true, and doing so would ship your secret key to every visitor (non-NEXT_PUBLIC_ variables such as OPENAI_API_KEY are not available in client bundles anyway). Solution: keep a clean boundary. All OpenAI logic lives in src/lib/openai.ts (server-only). If you want to expose a type from that file, move the type to a separate src/types/article.ts that has no runtime imports.


FAQ

Q: How much does it cost to summarize 100 articles per day with gpt-4o-mini?

GPT-4o-mini is priced at $0.15 per million input tokens and $0.60 per million output tokens. A typical article title + description is ~150 tokens input; a summary + tags + score is ~100 tokens output. For 100 articles daily:

| Token type | Tokens/day | Cost/day | Cost/month |
|---|---|---|---|
| Input (150 × 100) | 15,000 | $0.0023 | ~$0.07 |
| Output (100 × 100) | 10,000 | $0.0060 | ~$0.18 |
| Total | 25,000 | $0.0083 | ~$0.25 |
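The table's arithmetic as a small helper you can rerun with your own volumes; the prices are the per-million-token figures quoted above and may change, and estimateCost is an illustrative name:

```typescript
interface CostInputs {
  articlesPerDay: number;
  inputTokensPerArticle: number;
  outputTokensPerArticle: number;
  inputPricePerMillion: number; // USD per 1M input tokens
  outputPricePerMillion: number; // USD per 1M output tokens
  daysPerMonth?: number;
}

// Cost in USD = tokens / 1,000,000 * price per million tokens, summed over input and output.
export function estimateCost(c: CostInputs): { perDay: number; perMonth: number } {
  const inputTokens = c.articlesPerDay * c.inputTokensPerArticle;
  const outputTokens = c.articlesPerDay * c.outputTokensPerArticle;
  const perDay =
    (inputTokens / 1_000_000) * c.inputPricePerMillion +
    (outputTokens / 1_000_000) * c.outputPricePerMillion;
  return { perDay, perMonth: perDay * (c.daysPerMonth ?? 30) };
}
```

With the table's inputs (100 articles, 150 input and 100 output tokens each, $0.15 and $0.60 per million), perDay comes out to $0.00825 and perMonth to $0.2475.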

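On a cache hit, the Route Handler serves articles straight from Supabase and makes no OpenAI calls. On a cache miss, though, processArticlesBatch runs on every fetched article, including ones already stored, so you pay again for those unless you filter them out first.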
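To make a cache miss send only not-yet-stored articles to OpenAI, look up which fetched URLs already exist and drop them before processing. A hedged sketch of the filtering step; the lookup shown in the comment uses supabase-js's in() filter, and filterNewArticles is a hypothetical name:

```typescript
interface HasUrl {
  url: string;
}

// Hypothetical helper: keep only articles whose URL is not already stored.
// `storedUrls` would come from a query such as:
//   supabaseAdmin.from('articles').select('url').in('url', articles.map((a) => a.url))
export function filterNewArticles<T extends HasUrl>(
  articles: T[],
  storedUrls: Iterable<string>
): T[] {
  const stored = new Set(storedUrls);
  return articles.filter((a) => !stored.has(a.url));
}
```

You would then call processArticlesBatch on the filtered list, so existing rows keep their earlier summaries and cost nothing extra.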

Q: Can I replace NewsAPI with an RSS feed parser for free tier usage?

Yes. Install rss-parser (npm install rss-parser) and replace fetchTopHeadlines with a function that reads from any public RSS endpoint:

import Parser from 'rss-parser';
const parser = new Parser();
const feed = await parser.parseURL('https://feeds.arstechnica.com/arstechnica/index');

The returned feed.items map onto your Article interface with a few renames: link becomes url, and isoDate becomes publishedAt. The tradeoff: you must maintain a list of feed URLs yourself, and some feeds omit full descriptions. Combine 3-5 high-quality RSS feeds to approximate NewsAPI coverage without any API key.
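A sketch of that mapping, assuming the item fields rss-parser commonly returns (title, link, contentSnippet, isoDate, enclosure). The local RssItem type is a hand-written subset for illustration, not the package's own type, and rssItemsToArticles is a hypothetical name; you would call it as rssItemsToArticles(feed.items, feed.title ?? 'RSS'):

```typescript
// Hand-written subset of an rss-parser item (assumption; check the fields your feeds provide).
interface RssItem {
  title?: string;
  link?: string;
  contentSnippet?: string;
  isoDate?: string;
  enclosure?: { url?: string };
}

// Same shape as the Article interface in src/lib/newsapi.ts (AI fields omitted).
interface Article {
  id: string;
  title: string;
  description: string;
  url: string;
  imageUrl: string | null;
  source: string;
  publishedAt: string;
}

// Hypothetical mapper: rename RSS fields onto Article and drop items without a link.
export function rssItemsToArticles(items: RssItem[], feedTitle: string): Article[] {
  return items
    .filter((item): item is RssItem & { link: string } => Boolean(item.link))
    .map((item) => ({
      id: '', // Assigned by the database on insert, as with NewsAPI articles
      title: item.title ?? 'Untitled',
      description: item.contentSnippet ?? '',
      url: item.link,
      imageUrl: item.enclosure?.url ?? null,
      source: feedTitle,
      publishedAt: item.isoDate ?? new Date().toISOString(),
    }));
}
```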

Q: How do I deploy this aggregator to Vercel with environment variables?

Install the Vercel CLI (npm i -g vercel) and run from your project root:

vercel --prod \
  --env OPENAI_API_KEY=sk-... \
  --env NEWSAPI_KEY=your_key \
  --env SUPABASE_SERVICE_ROLE_KEY=your_service_key \
  --build-env NEXT_PUBLIC_SUPABASE_URL=https://xxx.supabase.co \
  --build-env NEXT_PUBLIC_SUPABASE_ANON_KEY=your_anon_key \
  --build-env NEXT_PUBLIC_SITE_URL=https://your-app.vercel.app

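Alternatively, add the variables in the Vercel dashboard under Settings → Environment Variables, then redeploy. The dashboard is the safer route for NEXT_PUBLIC_ variables: Next.js inlines them at build time, while the CLI's --env flag only sets runtime variables (--build-env covers the build step). Remember to set NEXT_PUBLIC_SITE_URL to your production domain, because the ArticleFeed server component uses it to construct the internal API URL. Vercel also sets VERCEL_URL automatically on every deployment, so you can fall back to it: process.env.NEXT_PUBLIC_SITE_URL ?? `https://${process.env.VERCEL_URL}`.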

Recommended Tools