    Pico Banana 400k: Apple’s Dataset for Text Guided Image Editing

    By geniotimesmd, November 1, 2025

    Apple’s Pico Banana 400k dataset targets text-guided image editing, where a model changes an image according to a written instruction. It provides nearly 400,000 text-image-edit triplets for training models to follow such instructions precisely. It is aimed at researchers in multimodal AI and developers building editing tools.

    Released as part of Apple’s open research, Pico Banana 400k combines photos from the Open Images library, Gemini-generated prompts, and Nano-Banana edits verified by AI. It serves as a benchmark and includes successes, failures, and multi-turn conversations, so it can be used for fine-tuning and preference learning. The sections below cover what it offers for instruction-aware image manipulation.

    What Is Pico Banana 400k?

    Pico Banana 400k is a dataset for text-guided image editing: transforming photos with natural-language instructions. Examples include adjusting brightness in a landscape or replacing a lion with butterflies in a savanna. It includes roughly 400K high-resolution images (512–1024 pixels) from Open Images, covering humans, objects, and text.

    The name “Pico Banana” references precise edits and creative twists. The dataset includes three types of samples:

    • Single-Turn SFT Samples: ~257K triplets for one-shot changes.
    • Preference Learning Samples: ~56K pairs of positive and negative edits.
    • Multi-Turn SFT Samples: ~72K chains for iterative editing.
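
As a quick sanity check on the figures above, the three subset sizes can be tallied to see how they relate to the "nearly 400,000" headline number. This is only an illustrative sketch: the counts are the rounded values quoted in this article, and the dictionary keys are made-up labels, not names taken from the dataset.

```python
# Approximate subset sizes as quoted in the article (rounded to thousands).
# The key names are hypothetical labels chosen for this example.
SUBSET_SIZES = {
    "single_turn_sft": 257_000,
    "preference": 56_000,
    "multi_turn_sft": 72_000,
}


def subset_shares(sizes):
    """Return (total, {name: fraction of total}) for a dict of subset sizes."""
    total = sum(sizes.values())
    return total, {name: count / total for name, count in sizes.items()}


total, shares = subset_shares(SUBSET_SIZES)
print(f"total: about {total:,}")
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

With these rounded inputs the total comes to about 385,000, with roughly two thirds of the samples in the single-turn subset.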

    It covers 35 operations in eight categories, from photometric shifts to stylistic changes.

    Key Features of Pico Banana 400k

    Each sample passes through a three-model pipeline: Gemini-2.5-Flash generates the prompts, Nano-Banana performs the edits, and Gemini-2.5-Pro evaluates each result on a weighted scorecard: 40% compliance, 25% realism, 20% preservation, 15% quality.
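
The scorecard weighting can be illustrated with a small weighted sum. This is a minimal sketch, not Apple's evaluation code: it assumes each criterion is scored on a 0 to 1 scale, and the function name and the example scores are hypothetical. Only the four weights come from the article.

```python
# Weights quoted in the article; they sum to 1.0.
WEIGHTS = {
    "compliance": 0.40,
    "realism": 0.25,
    "preservation": 0.20,
    "quality": 0.15,
}


def weighted_score(scores, weights=WEIGHTS):
    """Combine per-criterion scores (assumed to lie in [0, 1]) into one value."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing criteria: {sorted(missing)}")
    return sum(weights[name] * scores[name] for name in weights)


# Hypothetical per-criterion scores for a single edit.
example = {"compliance": 0.9, "realism": 0.8, "preservation": 0.7, "quality": 0.6}
print(round(weighted_score(example), 3))  # 0.79
```

Because compliance carries the largest weight, an edit that ignores the instruction scores poorly even if the output image looks clean.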

    Edit categories:

    Category               Description                            Share
    Object-Level Semantic  Add/remove/replace/relocate objects    35%
    Scene Composition      Lighting or environmental changes      20%
    Human-Centric          Outfit or pose adjustments             18%
    Stylistic              Artistic styles like oil painting      10%
    Text & Symbol          Edit billboards or graffiti            8%
    Pixel & Photometric    Contrast or color adjustments          5%
    Scale & Perspective    Zooms or viewpoint changes             2%
    Spatial/Layout         Expansions or rearrangements           2%

    Prompts are phrased as plain requests, like “Replace the red apple with a green one.” Failed edits are retained so models can learn from negative examples. High resolution, diverse content, and a minimum quality score of about 0.7 help keep the data reliable.

    How Pico Banana 400k Is Built

    The pipeline starts with Open Images’ 9M+ photos (CC BY 2.0). Gemini-2.5-Flash creates the prompts, Qwen-2.5-Instruct-7B summarizes them, Nano-Banana performs the edits, and Gemini-2.5-Pro scores the results. Successful edits go to the SFT subset, failures go to the preference subset, and chains of edits go to the multi-turn subset.
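
The routing step described above can be sketched as follows. This is an assumption-laden illustration rather than the actual pipeline: the `Attempt` class, its field names, the file paths, and the rule for pairing a failure with a success on the same image and instruction are all hypothetical. The 0.7 threshold is the approximate minimum score quoted in this article.

```python
from dataclasses import dataclass

THRESHOLD = 0.7  # approximate minimum score quoted in the article


@dataclass
class Attempt:
    """One scored edit attempt (hypothetical structure for this example)."""
    image_id: str
    instruction: str
    edited_path: str
    score: float


def route(attempts, threshold=THRESHOLD):
    """Split attempts into SFT samples and (chosen, rejected) preference pairs."""
    sft, failures = [], []
    for attempt in attempts:
        (sft if attempt.score >= threshold else failures).append(attempt)
    # Index successes by (image, instruction); a later success replaces an earlier one.
    successes = {(a.image_id, a.instruction): a for a in sft}
    # Assumed pairing rule: a failure becomes a preference pair only when a
    # success exists for the same image and instruction.
    pairs = [
        (successes[(f.image_id, f.instruction)], f)
        for f in failures
        if (f.image_id, f.instruction) in successes
    ]
    return sft, pairs


attempts = [
    Attempt("img1", "Replace the red apple with a green one.", "img1_a.png", 0.55),
    Attempt("img1", "Replace the red apple with a green one.", "img1_b.png", 0.82),
    Attempt("img2", "Increase the contrast.", "img2_a.png", 0.91),
]
sft, pairs = route(attempts)
print(len(sft), len(pairs))  # 2 1
```

Keeping the failed attempt next to its successful counterpart is what makes the pair usable later as a positive and negative example for the same instruction.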

    The data ships as JSONL files, with manifests for the downloads. A Python script handles the mapping to Open Images URLs.

    Applications of Pico Banana 400k

    Pico Banana 400k supports:

    • Editing Models: Fine-tune diffusion models for accuracy.
    • Conversational Tools: Enable iterative changes.
    • Training: Use pairs for reward modeling.
    • Benchmarking: Test compliance and quality.

    Tools built on the dataset could serve photographers, marketers, and educators. Early results reportedly show fewer artifacts and closer adherence to the user’s intent.
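
For the reward-modeling use of preference pairs, one common objective is a Bradley-Terry style pairwise loss, which pushes the reward of the preferred edit above the reward of the rejected one. The sketch below is a generic, pure-Python illustration of that loss. It is not taken from the dataset's paper or code, and the function name is hypothetical.

```python
import math


def pairwise_preference_loss(reward_chosen, reward_rejected):
    """Bradley-Terry style loss: -log(sigmoid(reward_chosen - reward_rejected))."""
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch to avoid overflow in exp.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# The loss is small when the chosen edit is rewarded more than the rejected one.
print(pairwise_preference_loss(2.0, 0.0) < pairwise_preference_loss(0.0, 2.0))  # True
```

In practice the two rewards would come from a learned model scoring the positive and negative edit of a pair, and the loss would be averaged over a batch.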

    Getting Started with Pico Banana 400k

    The files are hosted on Apple’s CDN and linked from the project’s GitHub repo.

    1. Download the files. Each subset has a manifest and a JSONL file:
    • Single-turn: Manifest | JSONL
    • Preferences: Manifest | JSONL
    • Multi-turn: Manifest | JSONL
    2. Load the data: parse the JSONL files for training.
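
Parsing a JSONL file needs only the standard library, since each line is an independent JSON object. The helper below is a minimal sketch. The function name is hypothetical, and it makes no assumption about which fields the records contain, so inspect the keys of the first record before building a training loop around them.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Yield one parsed JSON object per non-empty line of a JSONL file."""
    with Path(path).open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                yield json.loads(line)
            except json.JSONDecodeError as err:
                raise ValueError(f"{path}:{line_number}: invalid JSON") from err
```

Because the function is a generator, a large file can be streamed record by record instead of being loaded into memory at once.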

    Licensing and Ethics

    The dataset is released under CC BY-NC-ND 4.0, for non-commercial use. The source images from Open Images are licensed under CC BY 2.0. Prompts are designed to avoid toxicity. See the arXiv paper for details.
