Apple’s Pico Banana 400k dataset advances text-guided image editing, the task of changing pixels with plain-language instructions. It provides nearly 400,000 text–image–edit examples for training AI models toward precise creative control. For researchers in multimodal AI and developers building editing tools, it is a substantial new resource.

Released as part of Apple’s open research, Pico Banana 400k builds on the Open Images library, with Gemini-generated prompts and Nano-Banana edits judged by Gemini-2.5-Pro. It collects successes, failures, and multi-turn editing conversations for fine-tuning and preference learning. Here’s what makes Pico Banana 400k essential for instruction-aware image manipulation.

What Is Pico Banana 400k?

Pico Banana 400k is a dataset for text-guided image editing—transforming photos with natural language. Examples include adjusting brightness in a landscape or replacing a lion with butterflies in a savanna. It includes ~400K high-res images (512–1024 pixels) from Open Images, covering humans, objects, and text.

The “Pico Banana” name is a playful nod to the Nano-Banana model that generates the edits. The dataset includes three types of samples:

  • Single-Turn SFT Samples: ~257K triplets for one-shot changes.
  • Preference Learning Samples: ~56K pairs of positive and negative edits.
  • Multi-Turn SFT Samples: ~72K chains for iterative editing.

It covers 35 operations in eight categories, from photometric shifts to stylistic changes.
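
To make the three splits concrete, here is a minimal, hypothetical sketch of how their records could be represented in Python; the class and field names are illustrative assumptions, not the dataset’s actual schema.

    from dataclasses import dataclass
    from typing import List

    # Hypothetical record layouts for the three splits described above.
    # Field names are illustrative assumptions, not the dataset's schema.

    @dataclass
    class SingleTurnSample:            # ~257K one-shot edits
        source_image: str              # original photo (path or URL)
        instruction: str               # natural-language edit request
        edited_image: str              # edited result

    @dataclass
    class PreferenceSample:            # ~56K positive/negative pairs
        source_image: str
        instruction: str
        preferred_edit: str            # edit judged successful
        rejected_edit: str             # edit judged a failure

    @dataclass
    class MultiTurnSample:             # ~72K iterative editing chains
        source_image: str
        instructions: List[str]        # one instruction per turn
        edited_images: List[str]       # result after each turn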

Key Features of Pico Banana 400k

Pico Banana 400k is produced by an automated generation-and-judging pipeline. Gemini-2.5-Flash writes the edit prompts, Nano-Banana performs the edits, and Gemini-2.5-Pro evaluates each result with a weighted scorecard: 40% instruction compliance, 25% realism, 20% preservation, and 15% technical quality.
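
As a rough illustration of how such a weighted scorecard rolls up into a single number, here is a minimal Python sketch; the 0-to-1 scale and the example per-criterion scores are assumptions for illustration, not the judge’s exact procedure.

    # Scorecard weights from the article: 40% compliance, 25% realism,
    # 20% preservation, 15% quality. A 0-1 score scale is assumed here.
    WEIGHTS = {
        "compliance": 0.40,
        "realism": 0.25,
        "preservation": 0.20,
        "quality": 0.15,
    }

    def overall_score(scores: dict) -> float:
        """Combine per-criterion scores into one weighted score."""
        return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)

    # Example: an edit that follows the instruction well but loses some detail.
    print(overall_score({"compliance": 0.9, "realism": 0.8,
                         "preservation": 0.6, "quality": 0.85}))  # ~0.81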

Edit categories:

Category              | Description                          | Share
----------------------|--------------------------------------|------
Object-Level Semantic | Add/remove/replace/relocate objects  | 35%
Scene Composition     | Lighting or environmental changes    | 20%
Human-Centric         | Outfit or pose adjustments           | 18%
Stylistic             | Artistic styles like oil painting    | 10%
Text & Symbol         | Edit billboards or graffiti          | 8%
Pixel & Photometric   | Contrast or color adjustments        | 5%
Scale & Perspective   | Zooms or viewpoint changes           | 2%
Spatial/Layout        | Expansions or rearrangements         | 2%

Prompts read like everyday requests, such as “Replace the red apple with a green one.” Failure cases are kept to improve model robustness, while high resolution, subject diversity, and a minimum quality score of roughly 0.7 keep the data reliable.

How Pico Banana 400k Is Built

The pipeline starts with Open Images’ 9M+ photos (CC BY 2.0). Gemini-2.5-Flash writes edit instructions, Qwen-2.5-Instruct-7B summarizes them into concise prompts, Nano-Banana applies the edits, and Gemini-2.5-Pro scores the results. Successful edits go to the SFT splits, failed edits become the negative side of preference pairs, and sequences of edits form the multi-turn split.
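
Below is a minimal sketch of that routing step, assuming pre-scored edit records and the roughly 0.7 quality floor mentioned earlier; the record fields and the exact threshold placement are illustrative assumptions, not the authors’ implementation.

    # Hypothetical routing of scored edits into splits. Record fields and the
    # exact threshold are assumptions; the article cites a ~0.7 quality floor.
    SUCCESS_THRESHOLD = 0.7

    def route(records):
        """records: iterable of dicts with 'source', 'prompt', 'edit', 'score'."""
        sft, negatives = [], []
        for r in records:
            if r["score"] >= SUCCESS_THRESHOLD:
                sft.append(r)          # successful edit -> single-turn SFT split
            else:
                negatives.append(r)    # failed edit -> negative side of a preference pair
        return sft, negatives

    sft, negatives = route([
        {"source": "img_001.jpg", "prompt": "Replace the red apple with a green one",
         "edit": "img_001_edit.jpg", "score": 0.82},
        {"source": "img_002.jpg", "prompt": "Turn the scene into an oil painting",
         "edit": "img_002_edit.jpg", "score": 0.55},
    ])
    print(len(sft), len(negatives))    # 1 1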

Each split ships as JSONL manifests that describe the edits and point to downloadable files. A companion Python script maps Open Images URLs to local file paths.
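
Here is a minimal sketch of reading one of those JSONL manifests in Python; the manifest file name in the commented usage is hypothetical, since the exact schema isn’t reproduced here.

    import json

    def load_manifest(path):
        """Read a JSONL file: one JSON record per line."""
        records = []
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if line:                      # skip blank lines
                    records.append(json.loads(line))
        return records

    # Hypothetical usage -- the manifest file name is an assumption:
    # records = load_manifest("pico_banana_single_turn.jsonl")
    # print(records[0].keys())              # inspect the available fields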

Applications of Pico Banana 400k

Pico Banana 400k supports:

  • Editing Models: Fine-tune diffusion models for better instruction-following accuracy.
  • Conversational Tools: Enable iterative, multi-turn edits.
  • Training: Use preference pairs for reward modeling (see the sketch after this list).
  • Benchmarking: Test instruction compliance and edit quality.
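
For the reward-modeling use case, here is a minimal sketch of a pairwise (Bradley–Terry style) loss over preferred and rejected edits, a common approach to preference learning rather than a method prescribed by the dataset itself.

    import math

    def pairwise_loss(score_preferred: float, score_rejected: float) -> float:
        """Penalize a reward model that rates the rejected edit too highly."""
        margin = score_preferred - score_rejected
        return -math.log(1.0 / (1.0 + math.exp(-margin)))

    # Preferred edit already scores higher -> small loss.
    print(pairwise_loss(0.8, 0.3))   # ~0.47
    # Rejected edit scores higher -> large loss.
    print(pairwise_loss(0.2, 0.9))   # ~1.10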

It aids photographers, marketers, and educators alike. Early results reportedly show fewer artifacts and closer adherence to user intent.

Getting Started with Pico Banana 400k

The dataset files are hosted on Apple’s CDN; visit the GitHub repo for the manifests and helper scripts.

  1. Download Files: Grab the JSONL manifests linked from the GitHub repo.
  2. Source Images:
  • Use the manifest URLs or the AWS CLI to fetch and extract the Open Images TARs (a sketch of the URL-to-local mapping follows this list):
    aws s3 --no-sign-request --endpoint-url https://s3.amazonaws.com cp s3://open-images-dataset/tar/train_0.tar.gz .
    aws s3 --no-sign-request --endpoint-url https://s3.amazonaws.com cp s3://open-images-dataset/tar/train_1.tar.gz .
    mkdir openimage_source_images
    tar -xvzf train_0.tar.gz -C openimage_source_images
    tar -xvzf train_1.tar.gz -C openimage_source_images
    wget https://storage.googleapis.com/openimages/2018_04/train/train-images-boxable-with-rotation.csv
    python map_openimage_url_to_local.py
  3. Load Data: Parse the JSONL manifests for training.
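
The repo’s map_openimage_url_to_local.py is not reproduced here; below is a minimal sketch of the same idea, joining image IDs from the rotation CSV to the files extracted in step 2. The column names ("ImageID", "OriginalURL") follow the Open Images metadata CSV and should be treated as assumptions.

    import csv
    import os

    def build_url_to_local(csv_path, image_dir):
        """Map Open Images URLs to locally extracted image paths."""
        mapping = {}
        with open(csv_path, newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                local = os.path.join(image_dir, row["ImageID"] + ".jpg")
                if os.path.exists(local):                 # keep only images we downloaded
                    mapping[row["OriginalURL"]] = local
        return mapping

    # Hypothetical usage with the paths from step 2:
    # mapping = build_url_to_local("train-images-boxable-with-rotation.csv",
    #                              "openimage_source_images")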

Licensing and Ethics

The dataset is released under CC BY-NC-ND 4.0 for non-commercial use, while the source images remain under CC BY 2.0. Prompts are written to avoid toxic content. See the arXiv paper for full details.
