How to Generate Accurate Numbers in AI Images Using Underdrawings with Gemini 3.0 Pro

Generating images with AI models has become increasingly powerful, but one persistent problem plagues most image generation pipelines: accurate text and number rendering. Whether you're building a game board generator, creating numbered diagrams, or designing sequential layouts, state-of-the-art image models like Gemini 3.0 Pro and ChatGPT-Images often fail to keep numbering sequences and text placement correct.

The solution? A hybrid technique called the "underdrawing method" that combines deterministic SVG/HTML rendering with generative AI for stunning, mathematically accurate results.

Why Standard Image Generation Fails at Numbers

Image generation models excel at creating photorealistic scenes and stylized visuals, but they're fundamentally unreliable with discrete, sequential data. When you ask Gemini 3.0 Pro or ChatGPT-Images to generate a spiral game board with numbered stepping stones (1-50), the model will often:

  • Skip numbers or repeat them
  • Place numbers in wrong positions
  • Reverse digit order (generating "51" instead of "15")
  • Fail to maintain spiral geometry

This isn't a limitation of the model's visual capabilities. It's an architectural weakness: image models learn statistical patterns from training data, not deterministic rules, and a numbered sequence has to be exactly right, with no room for a "close enough" sample.

The Underdrawing Method: Layer 1 (Deterministic)

The underdrawing approach separates concerns, giving each tool the job it does best:

Use deterministic tools for text and positioning. SVG, HTML, Python visualizations, or Mermaid diagrams excel at precise layout. They guarantee correct number sequencing, positioning, and orientation because they follow algorithmic rules, not statistical patterns.

Here's a concrete example for Layer 1:

from math import cos, pi, sin

# Generate SVG with precise number placement
def create_spiral_underdrawing(num_stones=50):
    """Return an SVG string with stones numbered 1..num_stones along an inward spiral."""
    parts = [
        '<svg width="1000" height="1000" xmlns="http://www.w3.org/2000/svg">',
        # Explicit white background so the rasterized PNG is not transparent
        '<rect width="100%" height="100%" fill="white"/>',
    ]

    for i in range(1, num_stones + 1):
        # Calculate spiral position: two full turns, radius shrinking toward the center
        angle = (i / num_stones) * 4 * pi
        radius = 400 * (1 - i / num_stones)
        x = 500 + radius * cos(angle)
        # SVG's y-axis points down, so subtract to turn counter-clockwise on screen
        y = 500 - radius * sin(angle)

        # Draw stone placeholder (deterministic shape)
        parts.append(
            f'<circle cx="{x:.1f}" cy="{y:.1f}" r="25" fill="none" stroke="black" stroke-width="2"/>'
        )

        # Add precise number, centered on the stone
        parts.append(
            f'<text x="{x:.1f}" y="{y:.1f}" text-anchor="middle" dominant-baseline="middle" '
            f'font-size="20" font-weight="bold">{i}</text>'
        )

    parts.append('</svg>')
    return '\n'.join(parts)

# Save the SVG, then export it as PNG
underdrawing_svg = create_spiral_underdrawing()
with open("underdrawing.svg", "w", encoding="utf-8") as f:
    f.write(underdrawing_svg)
# Use headless browser or ImageMagick to convert SVG → PNG

This generates a precise underdrawing image where every number is mathematically positioned and guaranteed correct. The visual quality is minimal—black lines and text on white background—but the data integrity is perfect.
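Because the underdrawing is plain SVG, its numbering can be checked mechanically before any image model sees it. The following is a minimal sketch of such a check, assuming the labels live in `<text>` elements as in the example above; the function name `find_numbering_problems` is hypothetical and not part of the original pipeline.

```python
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"

def find_numbering_problems(svg_text: str, expected_count: int) -> dict:
    """Compare the <text> labels in an SVG against the sequence 1..expected_count."""
    root = ET.fromstring(svg_text)
    labels = [(el.text or "").strip() for el in root.iter(f"{SVG_NS}text")]
    expected = [str(n) for n in range(1, expected_count + 1)]
    return {
        # Numbers that should appear but do not
        "missing": [n for n in expected if n not in labels],
        # Labels that appear more than once
        "duplicated": sorted({n for n in labels if labels.count(n) > 1}),
        # Labels outside the expected range
        "unexpected": [n for n in labels if n not in expected],
    }

# Small hand-written SVG with a repeated "2" and no "3"
demo = (
    '<svg xmlns="http://www.w3.org/2000/svg">'
    "<text>1</text><text>2</text><text>2</text><text>4</text>"
    "</svg>"
)
print(find_numbering_problems(demo, 4))
```

An underdrawing that passes this check has all three lists empty, which makes it a cheap guard to run before spending an image-generation call.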

The Underdrawing Method: Layer 2 (Generative)

Now pass your deterministic underdrawing to Gemini 3.0 Pro as an input image along with your stylistic prompt. The model "paints over" your precise outline:

Prompt for Gemini 3.0 Pro:

Transform this numbered spiral diagram into a photographed claymation diorama of assorted artisan chocolates and candies, each representing a stepping stone game board. Arrange them in a counter-clockwise spiral inward pattern from start (1) at the outside to finish (50) at the center. Viewed from a low-angle tilted perspective. Studio lighting, soft bokeh background, candy-bright colors.

Maintain the exact number positions and sequences from the input image.

Gemini 3.0 Pro will use your underdrawing as a constraint guide, respecting the number positions while applying photorealistic rendering, lighting, and style. In most cases the result keeps the numbers correct while gaining the visual quality of a generated image.

Step-by-Step Implementation Guide

| Step | Tool | Task | Output |
|------|------|------|--------|
| 1 | Python/SVG | Define layout algorithm (spiral, grid, tree, etc.) | Underdrawing PNG |
| 2 | Code generation (Claude Code/Codex) | Auto-generate SVG from specifications | SVG template |
| 3 | Headless browser/ImageMagick | Convert SVG to rasterized image | Precise PNG with numbers |
| 4 | Gemini 3.0 Pro API | Pass underdrawing + stylistic prompt | Final polished image |
| 5 | Optional: Iterate | Refine prompt if styling needs adjustment | Publication-ready image |
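Step 3 in the table could be sketched as a thin wrapper around the ImageMagick command line. This is an illustration under assumptions, not the article's own tooling: it assumes ImageMagick is installed with SVG support, and the helper names `imagemagick_command` and `rasterize_svg` are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path

def imagemagick_command(svg_path, png_path, executable="magick"):
    """Build the argument list for converting an SVG file to a PNG file."""
    # -background white fills transparent areas so the numbers keep high contrast
    return [executable, "-background", "white", str(svg_path), str(png_path)]

def rasterize_svg(svg_path, png_path):
    """Run ImageMagick to rasterize the underdrawing; returns the PNG path."""
    # ImageMagick 7 installs `magick`; ImageMagick 6 installs `convert`
    executable = shutil.which("magick") or shutil.which("convert")
    if executable is None:
        raise RuntimeError("ImageMagick was not found on PATH")
    subprocess.run(imagemagick_command(svg_path, png_path, executable), check=True)
    return Path(png_path)

print(imagemagick_command("underdrawing.svg", "underdrawing.png"))
```

Keeping the command construction separate from the subprocess call makes the conversion step easy to inspect and to swap for a headless browser if font rendering differs between tools.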

Working Code Example

Here's a minimal implementation combining both layers:

import base64
from pathlib import Path

import anthropic

def generate_accurate_game_board(underdrawing_png: str, style_prompt: str) -> str:
    """Send a rasterized underdrawing plus a style prompt to a multimodal model."""
    # Layer 1 output: the PNG rendered from the SVG underdrawing (see earlier example)
    # Encode image to base64
    image_data = base64.standard_b64encode(Path(underdrawing_png).read_bytes()).decode('utf-8')

    # Layer 2: pass the underdrawing and the style prompt to the model.
    # Claude is used here only to show the image+text request shape. It replies with
    # text, not an image, so adapt this call to an image-generation client
    # (such as Gemini 3.0 Pro) to get the final render.
    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    message = client.messages.create(
        model="claude-3-5-sonnet-20241022",  # Using Claude as example; adapt for Gemini
        max_tokens=1024,
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "image",
                        "source": {
                            "type": "base64",
                            "media_type": "image/png",
                            "data": image_data,
                        },
                    },
                    {
                        "type": "text",
                        "text": f"Transform this numbered diagram into: {style_prompt}. Maintain exact number positions and sequences."
                    }
                ],
            }
        ],
    )

    # With Claude this is a text reply; an image model would return image data instead
    return message.content[0].text

# Usage ("underdrawing.png" is a placeholder path for the rasterized underdrawing)
style = "A claymation diorama of artisan chocolates arranged as a spiral game board, studio-lit, candy-bright colors, soft bokeh"
result = generate_accurate_game_board("underdrawing.png", style)

When to Use Underdrawings: Developer Scenarios

Ideal use cases:

  • Game board generators with numbered tiles
  • Educational diagrams with sequential numbering
  • Process flow visualizations with step labels
  • Data visualizations requiring precise axis labels
  • UI mockups with specific text content
  • Numbered flowcharts and technical diagrams

Not ideal for:

  • Freeform creative images without precision requirements
  • Simple scenes where numbers aren't critical
  • Applications requiring real-time generation (multi-step process adds latency)

Important Limitations

The underdrawing method is powerful but imperfect:

  • Not 100% guaranteed: Complex stylistic prompts may still cause drift from the underdrawing
  • Multi-step latency: Requires two separate generation steps
  • Tool dependency: Requires a multimodal model that accepts image+text input and returns an image (Gemini 3.0 Pro, for example); models that accept images but reply only with text cannot do the styling step
  • Styling conflicts: Extremely detailed style prompts may override number placement

Troubleshooting Common Issues

Problem: Model still generates incorrect numbers despite underdrawing
Solution: Strengthen your second-layer prompt. Add explicit instruction: "Preserve every number from the input image exactly as shown."

Problem: Numbers are visible but obscured by styling
Solution: Adjust underdrawing contrast. Use high-contrast black text on white background. Add explicit spacing around numbers in the SVG.
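Spacing can be checked before rendering. The sketch below recomputes the stone centers using the same spiral formula as the Layer 1 example (50 stones, two turns, radius shrinking linearly to zero) and lists pairs whose circles would overlap. The helper names and the `gap` parameter are assumptions added for illustration.

```python
from math import cos, hypot, pi, sin

def spiral_centers(num_stones=50, max_radius=400.0, turns=2.0, center=500.0):
    """Stone centers for a spiral whose radius shrinks linearly to zero."""
    centers = []
    for i in range(1, num_stones + 1):
        t = i / num_stones
        angle = t * turns * 2 * pi
        radius = max_radius * (1 - t)
        centers.append((center + radius * cos(angle), center + radius * sin(angle)))
    return centers

def overlapping_pairs(centers, stone_radius=25.0, gap=0.0):
    """Return 1-based index pairs whose circles are closer than 2*radius + gap."""
    min_distance = 2 * stone_radius + gap
    pairs = []
    for a in range(len(centers)):
        for b in range(a + 1, len(centers)):
            dx = centers[a][0] - centers[b][0]
            dy = centers[a][1] - centers[b][1]
            if hypot(dx, dy) < min_distance:
                pairs.append((a + 1, b + 1))
    return pairs

collisions = overlapping_pairs(spiral_centers())
print(f"{len(collisions)} overlapping pairs, e.g. {collisions[-1]}")
```

With those parameters the innermost stones collide (stones 49 and 50 sit only 8 units apart), so a layout like this would likely need a non-zero inner radius, fewer stones per turn, or smaller circles before the numbers near the center stay legible.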

Problem: Performance is too slow
Solution: Reduce image resolution for underdrawing generation, or batch process multiple boards in parallel using async API calls.
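Parallel batching might look like the following sketch, which uses `asyncio` with a semaphore to cap concurrent requests. The `style_board` coroutine is a hypothetical stand-in for the real image-model request (here it only sleeps briefly), and the `max_concurrent` value is an arbitrary assumption to adjust against your provider's rate limits.

```python
import asyncio

async def style_board(board_id: int) -> str:
    """Hypothetical placeholder for one image-model request."""
    await asyncio.sleep(0.01)  # stands in for network latency
    return f"board-{board_id}.png"

async def style_boards(board_ids, max_concurrent=4):
    """Style several boards concurrently, at most max_concurrent at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def limited(board_id):
        async with semaphore:
            return await style_board(board_id)

    # gather returns results in the same order as the inputs
    return await asyncio.gather(*(limited(b) for b in board_ids))

if __name__ == "__main__":
    print(asyncio.run(style_boards(range(1, 6))))
```

The semaphore keeps the batch from firing every request at once, which matters more for rate limits than for local performance.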

Conclusion

The underdrawing technique addresses a fundamental limitation of current image generation models: reliable text and number rendering. By separating deterministic positioning from generative styling, developers can create publication-quality images whose numbering is far more reliable than prompting alone can deliver.

For developers building educational software, games, or data visualization tools, this hybrid approach should become part of your standard toolkit—especially as teams like Google and Anthropic continue optimizing their vision-language models.
