How to Use Gemini 2.5 Flash API in Node.js Apps 2025

Prerequisites and Setup Checklist

Before writing a single line of code, confirm your environment matches these requirements. Missing any one of them is a common reason for getting stuck in the first few minutes.

Required Node.js version and package manager

The SDK requires Node.js 18 or higher because it relies on modern runtime features such as the built-in fetch API. Run node -v to confirm. npm, yarn, and pnpm all work fine; the examples below use npm.
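If you'd rather have the app itself refuse to start on an old runtime, a small guard can do the same check as node -v. This is a minimal sketch; parseNodeMajor and assertMinimumNode are hypothetical helper names, not part of any SDK.

```javascript
// Hypothetical helper: extract the major version from a Node.js version string.
function parseNodeMajor(versionString) {
  // Accepts forms like "v18.17.0" or "20.11.1"
  const match = /^v?(\d+)\./.exec(versionString);
  if (!match) {
    throw new Error(`Unrecognized Node.js version string: ${versionString}`);
  }
  return Number(match[1]);
}

// Hypothetical helper: throw if the running Node.js is older than minMajor.
function assertMinimumNode(minMajor, versionString = process.version) {
  const major = parseNodeMajor(versionString);
  if (major < minMajor) {
    throw new Error(`Node.js ${minMajor}+ is required, but found ${versionString}.`);
  }
  return major;
}
```

Calling assertMinimumNode(18) at the top of your entry file fails fast with a readable message instead of an obscure runtime error later.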

Getting your Gemini API key from Google AI Studio

Head to Google AI Studio and click Get API key. The free tier for Gemini 2.5 Flash is rate-limited; this guide assumes 15 requests per minute (RPM), 1 million tokens per minute (TPM), and 1,500 requests per day, which is enough for development and light use. Google adjusts these quotas per model, so confirm the current numbers on the official rate limits page. The paid tier (AI Studio billing or Google Cloud Vertex AI) raises these limits rather than removing them, and Vertex AI adds SLAs. Keep your key secret; never commit it to source control.

Installing the Google Generative AI SDK

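The package used throughout this guide is @google/generative-ai, published by Google on npm. Google has since released a newer SDK, @google/genai, and steers new projects toward it, so check the package's npm page for its current support status before starting a long-lived project. There's no need for any third-party wrapper.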

Checklist:

  • [x] Node.js 18+ installed (node -v ≥ 18.0.0)
  • [x] npm 9+, yarn 1.22+, or pnpm 8+ available
  • [x] Gemini API key from Google AI Studio
  • [x] @google/generative-ai package installed
  • [x] dotenv package installed for secure key storage
  • [x] A .env file added to .gitignore

Note: Free-tier quotas are enforced per minute (RPM and TPM) and per day (requests per day). If you're building a product with real users, enable billing in AI Studio or move to Vertex AI before launch.

Estimated time: 25 minutes


Step 1: Initialize the Gemini Client in Node.js

Setting up the client correctly from the start saves you from cryptic authentication errors later. The pattern here — environment variable for the key, a single shared client instance — is what you'll use in every project.

Installing @google/generative-ai via npm

npm install @google/generative-ai dotenv

Storing your API key securely with dotenv

Create a .env file in your project root:

GEMINI_API_KEY=your_actual_api_key_here

Add .env to .gitignore immediately:

echo ".env" >> .gitignore

Instantiating the GenerativeModel for gemini-2.5-flash

// src/geminiClient.js
import { GoogleGenerativeAI } from '@google/generative-ai';
import 'dotenv/config';

if (!process.env.GEMINI_API_KEY) {
  throw new Error('GEMINI_API_KEY is not set in environment variables.');
}

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);

// Use 'gemini-2.5-flash' — not 'gemini-2.5-flash-latest' or any other variant
export const model = genAI.getGenerativeModel({ model: 'gemini-2.5-flash' });

export default genAI;

The model string 'gemini-2.5-flash' must match exactly. A typo typically produces a 404 "model not found" error whose message can be confusing. Export model as a named export so every other module can import it without re-instantiating the client.

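Note: The import syntax above requires ES modules, so either set "type": "module" in package.json or give the files an .mjs extension. If you're using CommonJS (require) instead, replace the imports with const { GoogleGenerativeAI } = require('@google/generative-ai') and require('dotenv').config(), and replace the export statements with module.exports. The SDK supports both module systems.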


Step 2: Send Your First Text Generation Request

With the client initialized, you can now send a prompt and get a completion. Understanding the response object structure upfront prevents hours of debugging undefined values later.

Calling generateContent with a plain text prompt

// src/textGeneration.js
import { model } from './geminiClient.js';

async function generateText(prompt) {
  try {
    const result = await model.generateContent(prompt);
    const response = result.response;

    // The .text() helper is the fast path — use it unless you need metadata
    const text = response.text();
    console.log('Generated text:', text);
    return text;
  } catch (error) {
    console.error('Error generating content:', error.message);
    throw error;
  }
}

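// Run directly for testing; the catch avoids an unhandled rejection when generateText rethrows
generateText('Explain the difference between TCP and UDP in two sentences.')
  .catch(() => { process.exitCode = 1; });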

Parsing the response object and extracting text

The response object has more depth than .text() exposes. Here's how to traverse it manually when you need finish reasons, safety ratings, or token counts:

async function generateTextVerbose(prompt) {
  const result = await model.generateContent(prompt);
  const response = result.response;

  // Manual traversal of candidates (the array can be missing if the prompt was blocked)
  const candidate = response.candidates?.[0];
  if (!candidate) {
    throw new Error('No candidates returned; check response.promptFeedback.');
  }
  const finishReason = candidate.finishReason; // 'STOP', 'SAFETY', 'MAX_TOKENS', etc.
  const textContent = candidate.content?.parts?.[0]?.text ?? '';

  console.log('Finish reason:', finishReason);
  console.log('Text:', textContent);

  // Usage metadata is available on the response object, but guard against it being absent
  const usage = response.usageMetadata;
  console.log(`Tokens used: prompt ${usage?.promptTokenCount}, output ${usage?.candidatesTokenCount}`);

  return textContent;
}
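A candidate can contain more than one part, so reading only parts[0] may drop text. The sketch below gathers every text part and the metadata fields mentioned above into one plain object. It assumes the response shape used in this section (candidates, content.parts, usageMetadata); summarizeResponse is a hypothetical helper name.

```javascript
// Hypothetical helper: collect text and metadata from a generateContent response
// without assuming that candidates, parts, or usageMetadata are present.
function summarizeResponse(response) {
  const candidate = response?.candidates?.[0];
  const parts = candidate?.content?.parts ?? [];

  // Join only the parts that actually hold text.
  const text = parts
    .filter((part) => typeof part.text === 'string')
    .map((part) => part.text)
    .join('');

  return {
    text,
    finishReason: candidate?.finishReason ?? null,
    promptTokens: response?.usageMetadata?.promptTokenCount ?? null,
    outputTokens: response?.usageMetadata?.candidatesTokenCount ?? null,
  };
}
```

Usage would look like const summary = summarizeResponse(result.response); the function never throws on a blocked or empty response, so the caller can decide what to do with an empty text field.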

Handling async/await vs. Promise chains

Always use async/await with try/catch rather than .then().catch() chains. The SDK throws typed errors that carry HTTP status codes, making them easy to handle granularly. You'll see specific examples of this in the Common Issues section.
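As a rough illustration of granular handling, the sketch below maps an error's HTTP status to a next step. It assumes the thrown error exposes a numeric status property, as the retry example later in this guide does. The action names and the choice of which statuses to retry are a hypothetical policy, not something the SDK defines.

```javascript
// Hypothetical policy: statuses that are usually worth retrying.
const RETRYABLE_STATUSES = new Set([429, 500, 503]);

// Map an error to a suggested action based on its HTTP status, if it has one.
function classifyError(error) {
  const status = typeof error?.status === 'number' ? error.status : null;
  let action = 'rethrow';
  if (status !== null && RETRYABLE_STATUSES.has(status)) {
    action = 'retry';
  } else if (status === 403) {
    action = 'check-api-key';
  } else if (status === 400 || status === 404) {
    action = 'fix-request';
  }
  return { status, action };
}

// Wrap any async call so the caller gets a result object instead of an exception.
async function callWithHandling(fn) {
  try {
    return { ok: true, value: await fn() };
  } catch (error) {
    return { ok: false, message: error?.message ?? String(error), ...classifyError(error) };
  }
}
```

With this in place, a call such as callWithHandling(() => model.generateContent(prompt)) returns an object you can branch on, which keeps the try/catch logic in one spot.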


Step 3: Enable Streaming Responses for Real-Time Output

For any user-facing application (a chatbot UI, a CLI tool, a writing assistant), streaming is close to non-negotiable. Without it, users can stare at a blank screen for several seconds (3–8 seconds is common for longer outputs) while the full response is generated. With streaming, the first chunk usually arrives much sooner, often in well under a second.

Why streaming matters for latency-sensitive apps

generateContent waits for the entire response before returning. generateContentStream returns an async iterable that yields chunks as the model produces them. The total generation time is the same; the perceived latency is dramatically lower.
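The consuming loop is the same wherever the chunks end up. The sketch below takes the stream result and a write callback, so it can target stdout, an HTTP response, or a test buffer. It assumes the result exposes a stream async iterable whose chunks have a text() method, as in the Express example in this step; streamToWriter and fakeStreamResult are hypothetical names, and fakeStreamResult only stands in for the real API call.

```javascript
// Consume a streamed result chunk by chunk and forward each piece to `write`.
async function streamToWriter(streamResult, write) {
  let fullText = '';
  for await (const chunk of streamResult.stream) {
    const chunkText = chunk.text();
    if (chunkText) {
      write(chunkText);      // forward each piece as soon as it arrives
      fullText += chunkText; // keep a copy for logging or caching
    }
  }
  return fullText;
}

// Stand-in for model.generateContentStream(), used only to demonstrate the loop.
function fakeStreamResult(pieces) {
  async function* generate() {
    for (const piece of pieces) {
      yield { text: () => piece };
    }
  }
  return { stream: generate() };
}
```

For a CLI tool, the real call would be along the lines of streamToWriter(await model.generateContentStream(prompt), (s) => process.stdout.write(s)).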

Using generateContentStream in Node.js

Piping streamed chunks to an Express response or stdout

// src/streamingServer.js
import express from 'express';
import { model } from './geminiClient.js';

const app = express();
app.use(express.json());

app.post('/generate-stream', async (req, res) => {
  const { prompt } = req.body;

  if (!prompt) {
    return res.status(400).json({ error: 'prompt is required' });
  }

  // Set headers before streaming begins
  res.setHeader('Content-Type', 'text/plain; charset=utf-8');
  res.setHeader('Transfer-Encoding', 'chunked');
  res.setHeader('X-Content-Type-Options', 'nosniff');

  try {
    const streamResult = await model.generateContentStream(prompt);

    for await (const chunk of streamResult.stream) {
      const chunkText = chunk.text();
      if (chunkText) {
        res.write(chunkText);
      }
    }

    // Final aggregated response available here if you need metadata
    const finalResponse = await streamResult.response;
    console.log('Total tokens:', finalResponse.usageMetadata?.totalTokenCount);

    res.end();
  } catch (error) {
    console.error('Streaming error:', error.message);
    if (!res.headersSent) {
      res.status(500).json({ error: 'Generation failed' });
    } else {
      res.end();
    }
  }
});

app.listen(3000, () => console.log('Server running on http://localhost:3000'));

Install Express first:

npm install express

Test it with:

curl -X POST http://localhost:3000/generate-stream -H 'Content-Type: application/json' -d '{"prompt":"Write a haiku about Node.js"}'

Note: Always check res.headersSent before calling res.status() inside a catch block when streaming. Once you've called res.write(), headers are committed and you can't send a 500 status — you can only call res.end().


Step 4: Build a Multi-Turn Chat Session

Single-turn generateContent calls are stateless — the model has no memory of previous exchanges. For anything conversational, you need a ChatSession, which maintains history automatically and sends it with every message.

Creating a ChatSession with startChat()

Maintaining conversation history across turns

Passing system instructions to control model behavior

// src/chatCLI.js
import readline from 'readline';
import { model } from './geminiClient.js';

const chat = model.startChat({
  systemInstruction: {
    role: 'system',
    parts: [{
      text: 'You are a senior Node.js engineer. Keep answers concise, always include code examples, and avoid unnecessary caveats.'
    }]
  },
  history: [
    {
      role: 'user',
      parts: [{ text: 'What is the event loop?' }]
    },
    {
      role: 'model',
      parts: [{ text: "The event loop is Node.js's mechanism for handling async operations. It processes the call stack, then checks the callback queue, executing queued callbacks when the stack is empty." }]
    }
  ],
  generationConfig: {
    maxOutputTokens: 1024,
    temperature: 0.7,
  }
});

const rl = readline.createInterface({
  input: process.stdin,
  output: process.stdout
});

const askQuestion = (query) => new Promise(resolve => rl.question(query, resolve));

async function runChat() {
  console.log('Node.js Expert Chat (type "exit" to quit)');
  console.log('------------------------------------------');

  while (true) {
    const userInput = await askQuestion('You: ');

    if (userInput.toLowerCase() === 'exit') {
      console.log('Goodbye!');
      rl.close();
      break;
    }

    try {
      const result = await chat.sendMessage(userInput);
      const responseText = result.response.text();
      console.log(`\nAssistant: ${responseText}\n`);
    } catch (error) {
      console.error('Error:', error.message);
    }
  }
}

runChat();

Run it with node src/chatCLI.js. The history array pre-populates context so the model already knows the conversation style. The systemInstruction field is supported in Gemini 2.5 Flash and constrains the model's persona for every turn — you don't need to repeat it in each message.

Note: The chat object holds history in memory. If your Node.js process restarts, history is lost. For persistent conversations, serialize chat.getHistory() to a database (Redis, Postgres, Firestore) and pass it back into startChat({ history: [...] }) on the next session.
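One way to approach the serialization step is to reduce each history entry to its role and text parts before storing it. This is a sketch under two assumptions: entries have the { role, parts } shape shown in the startChat example, and dropping non-text parts (images, function calls) is acceptable for your use case. serializeHistory and deserializeHistory are hypothetical helper names.

```javascript
// Keep only role and text parts so the stored JSON stays small and predictable.
// Note: this intentionally drops non-text parts such as inline images.
function serializeHistory(history) {
  const slim = history.map((entry) => ({
    role: entry.role,
    parts: (entry.parts ?? [])
      .filter((part) => typeof part.text === 'string')
      .map((part) => ({ text: part.text })),
  }));
  return JSON.stringify(slim);
}

// Parse stored JSON and reject anything that is not a user/model turn list.
function deserializeHistory(json) {
  const history = JSON.parse(json);
  if (!Array.isArray(history)) {
    throw new Error('History must be an array.');
  }
  for (const entry of history) {
    if (entry.role !== 'user' && entry.role !== 'model') {
      throw new Error(`Unexpected role in history: ${entry.role}`);
    }
  }
  return history;
}
```

Since getHistory() returns a promise, saving would look like serializeHistory(await chat.getHistory()), and restoring like model.startChat({ history: deserializeHistory(storedJson) }).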


Step 5: Add Multimodal Input (Image + Text)

Gemini 2.5 Flash is natively multimodal, meaning you can send images, audio, and video alongside text in a single API call. The most common use case in Node.js backend apps is reading a local file, encoding it, and asking the model to analyze it.

Reading a local image file as base64 in Node.js

Constructing an inlineData part with mimeType

Combining image and text parts in one generateContent call

// src/multimodal.js
import fs from 'fs';
import path from 'path';
import { model } from './geminiClient.js';

async function analyzeImage(imagePath, textPrompt) {
  // Read the image file and encode as base64
  const absolutePath = path.resolve(imagePath);
  const imageBuffer = fs.readFileSync(absolutePath);
  const base64String = imageBuffer.toString('base64');

  // Determine MIME type from extension
  const ext = path.extname(imagePath).toLowerCase();
  const mimeTypeMap = {
    '.png': 'image/png',
    '.jpg': 'image/jpeg',
    '.jpeg': 'image/jpeg',
    '.gif': 'image/gif',
    '.webp': 'image/webp'
  };
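  const mimeType = mimeTypeMap[ext];
  if (!mimeType) {
    // Sending the wrong MIME type is worse than failing early
    throw new Error(`Unsupported image extension: ${ext || '(none)'}`);
  }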

  // Build the parts array — order matters: image first, then text
  const parts = [
    {
      inlineData: {
        mimeType: mimeType,
        data: base64String
      }
    },
    {
      text: textPrompt
    }
  ];

  try {
    const result = await model.generateContent(parts);
    const response = result.response;
    const text = response.text();
    console.log('Analysis result:', text);
    return text;
  } catch (error) {
    console.error('Multimodal error:', error.message);
    throw error;
  }
}

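// Example usage; the catch avoids an unhandled rejection when analyzeImage rethrows
analyzeImage('./screenshot.png', 'Describe what you see in this image and identify any UI issues.')
  .catch(() => { process.exitCode = 1; });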

For large images (over 1MB), prefer the File API (the GoogleAIFileManager class from @google/generative-ai/server and its uploadFile() method) instead of inline base64. Base64 inflates payload size by about 33% and counts against your request body limits. The File API handles files up to 2GB and stores them for 48 hours, which is more efficient for repeated analysis.
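The 33% figure follows from how base64 works: every 3 input bytes become 4 output characters. The sketch below computes the encoded size and applies the 1MB rule of thumb from the paragraph above. The function names and the INLINE_LIMIT_BYTES constant are hypothetical, and the threshold is this article's guideline rather than an API-enforced limit.

```javascript
// Base64 encodes every 3 input bytes as 4 output characters, padding the last group.
function base64Length(byteLength) {
  return 4 * Math.ceil(byteLength / 3);
}

// Rule-of-thumb threshold from this article, not a limit enforced by the API.
const INLINE_LIMIT_BYTES = 1024 * 1024;

// Decide whether to send the image inline or upload it through the File API first.
function chooseUploadStrategy(byteLength, limit = INLINE_LIMIT_BYTES) {
  return {
    strategy: byteLength > limit ? 'file-api' : 'inline',
    encodedBytes: base64Length(byteLength),
  };
}
```

In analyzeImage, you could call chooseUploadStrategy(imageBuffer.length) right after reading the file and branch on the strategy field before building the parts array.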

Note: Gemini 2.5 Flash supports up to 3,600 image frames per request, making it practical for video frame analysis too. The 1M token context window can hold roughly 1,500 medium-resolution images.


Common Issues and Fixes

These are the errors you'll actually hit during development, not hypothetical edge cases.

Error: 429 Resource Exhausted

Cause: You've exceeded one of your tier's quotas, either the per-minute limit (15 RPM on the free tier assumed in this guide) or a per-day limit.

Fix: Implement exponential backoff with jitter. Don't just retry immediately — that makes the problem worse.

async function generateWithRetry(prompt, maxRetries = 4) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      const result = await model.generateContent(prompt);
      return result.response.text();
    } catch (error) {
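      // Retry only on rate limits (429) and temporary outages (503)
      const retryable = error.status === 429 || error.status === 503;
      if (retryable && attempt < maxRetries - 1) {
        // Exponential backoff: with 4 attempts the waits are 1s, 2s, 4s, plus random jitter
        const delay = Math.pow(2, attempt) * 1000 + Math.random() * 500;
        console.warn(`Request failed with status ${error.status}. Retrying in ${Math.round(delay)}ms...`);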
        await new Promise(resolve => setTimeout(resolve, delay));
      } else {
        throw error;
      }
    }
  }
}
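Backoff reacts after a 429 has already happened. A client-side throttle can complement it by spacing requests so the limit is rarely hit in the first place. The following is a minimal sliding-window sketch for a single process; createRateLimiter and throttled are hypothetical names, and the limit you pass in is whatever your tier allows.

```javascript
// Hypothetical sliding-window limiter: allows at most `limit` calls per `windowMs`.
// `now` is injectable so the logic can be tested without real timers.
function createRateLimiter({ limit, windowMs, now = Date.now }) {
  const timestamps = [];

  return {
    // Returns 0 if a request may go out now, otherwise the milliseconds to wait.
    reserve() {
      const current = now();
      // Drop timestamps that have left the window.
      while (timestamps.length > 0 && current - timestamps[0] >= windowMs) {
        timestamps.shift();
      }
      if (timestamps.length < limit) {
        timestamps.push(current);
        return 0;
      }
      return windowMs - (current - timestamps[0]);
    },
  };
}

// Wait until the limiter grants a slot, then run the given async function.
async function throttled(limiter, fn) {
  let wait = limiter.reserve();
  while (wait > 0) {
    await new Promise((resolve) => setTimeout(resolve, wait));
    wait = limiter.reserve();
  }
  return fn();
}
```

For the 15 RPM figure used in this guide, that would be createRateLimiter({ limit: 15, windowMs: 60000 }) shared across all callers, with each request wrapped as throttled(limiter, () => generateWithRetry(prompt)). It does not coordinate across multiple processes or servers.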

Error: 404 NOT_FOUND (model not found)

Cause: The model string passed to getGenerativeModel is incorrect, so the API reports that the model is not found or not supported for the requested method. Common mistakes: 'gemini-2.5-flash-latest', 'gemini-2.5-flash-001', 'gemini-flash-2.5'.

Fix: Use the exact string 'gemini-2.5-flash'. For Gemini 2.0 Flash, use 'gemini-2.0-flash'. Check the official model list if you're unsure.

| Error Code | Cause | Fix |
|---|---|---|
| 429 RESOURCE_EXHAUSTED | Rate limit exceeded (free tier: 15 RPM) | Exponential backoff; upgrade to paid tier |
| 404 NOT_FOUND / 400 INVALID_ARGUMENT | Wrong model string or malformed request | Verify model ID: 'gemini-2.5-flash' |
| Blocked or empty .text() | Safety filter blocked the response | Check response.promptFeedback.blockReason |
| CORS error | SDK called from browser bundle | Move API calls server-side; never expose API key in frontend |
| 503 SERVICE_UNAVAILABLE | Temporary Google outage | Retry with backoff; check status.cloud.google.com |

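Error: response.text() throws or returns no text

Cause: The model's safety filters blocked the prompt or the response. candidates may be empty, or finishReason may be 'SAFETY'. In that case .text() throws a response error or yields an empty string instead of a usable completion.

Fix: Check promptFeedback and finishReason before calling .text():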

async function safeGenerate(prompt) {
  const result = await model.generateContent(prompt);
  const response = result.response;

  // Check for safety blocks
  if (response.promptFeedback?.blockReason) {
    console.error('Blocked:', response.promptFeedback.blockReason);
    return null;
  }

  const candidate = response.candidates?.[0];
  if (!candidate || candidate.finishReason === 'SAFETY') {
    console.error('Response filtered. Safety ratings:', candidate?.safetyRatings);
    return null;
  }

  return response.text();
}

Error: CORS errors in the browser

Cause: You've accidentally bundled your Node.js Gemini code into a frontend build (Webpack, Vite, etc.) and the browser is trying to call the Gemini API directly, exposing your API key.

Fix: The @google/generative-ai SDK is server-side only in this context. Your Express server makes the Gemini calls; your frontend calls your Express endpoints. Never pass GEMINI_API_KEY to a browser environment variable (VITE_, REACT_APP_, etc.).
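On the frontend side, the browser only needs to read the text stream your own server returns. The sketch below reads a fetch Response body incrementally using the standard ReadableStream and TextDecoder APIs. It assumes an endpoint like the /generate-stream route from Step 3 that responds with plain text; readTextStream is a hypothetical helper name.

```javascript
// Read a streamed text response chunk by chunk and pass each piece to onChunk.
async function readTextStream(response, onChunk) {
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let full = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // stream: true keeps multi-byte characters intact across chunk boundaries
    const text = decoder.decode(value, { stream: true });
    if (text) {
      full += text;
      onChunk(text);
    }
  }

  // Flush any bytes the decoder was still holding.
  const tail = decoder.decode();
  if (tail) {
    full += tail;
    onChunk(tail);
  }
  return full;
}
```

In the browser, you would pass it the result of fetch('/generate-stream', { method: 'POST', headers: { 'Content-Type': 'application/json' }, body: JSON.stringify({ prompt }) }) and append each chunk to the page. The API key never leaves the server.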


FAQ

Q: What is the context window size for Gemini 2.5 Flash?

Gemini 2.5 Flash has a 1 million token context window, which is one of the largest available from any production API. In practical terms, that's roughly 750,000 words of text, around 1,500 medium-resolution images, or about an hour of video. The output limit is much smaller than the input window (Google's model documentation lists a maximum of 65,536 output tokens for Gemini 2.5 Flash), and you can set a lower cap via generationConfig.maxOutputTokens. For most applications, you'll hit rate limits long before you hit context limits.

Q: How does Gemini 2.5 Flash differ from Gemini 2.5 Pro for API use?

Gemini 2.5 Flash is optimized for speed and cost — it's significantly cheaper per million tokens and returns responses faster, making it the right choice for high-throughput applications, chatbots, and real-time features. Gemini 2.5 Pro offers stronger reasoning, better performance on complex coding tasks, and higher accuracy on benchmarks, at a higher cost and with more latency. The rule of thumb: start with Flash, switch to Pro only if your evals show quality gaps on your specific use case. Both share the 1M token context window.

Q: Can I use the Gemini API for free in production Node.js apps?

Technically yes, but with significant constraints. The free tier gives you 15 RPM, 1 million TPM (tokens per minute), and 1,500 requests per day for Gemini 2.5 Flash as of mid-2025. For a low-traffic internal tool or a side project, that's workable. For any app with real users, you should enable billing through Google AI Studio — the pay-as-you-go pricing for Gemini 2.5 Flash is among the most competitive in the market. Check the Google AI Studio pricing page for current rates, as they change frequently.