Complete Guide to Google Vertex AI Image Generation and Analysis

Google’s Vertex AI provides tools both for generating images from text prompts and for analyzing existing images with multimodal models. This guide walks through both tasks using the Vertex AI Python SDK (the google-cloud-aiplatform package).

Prerequisites

Before getting started, ensure you have:

  • A Google Cloud Project with Vertex AI API enabled
  • Proper authentication set up (Service Account or Application Default Credentials)
  • The required Python package installed:
pip install google-cloud-aiplatform
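The examples below hard-code "your-project-id" and a region. As a minimal sketch, here is one way to read both from the environment instead. The helper name resolve_vertex_config and the variable names GOOGLE_CLOUD_PROJECT and GOOGLE_CLOUD_LOCATION are assumptions chosen for illustration, so adapt them to your own setup.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_vertex_config(
    env: Optional[Mapping[str, str]] = None,
    default_location: str = "us-central1",
) -> Tuple[str, str]:
    """Return (project_id, location) for vertexai.init().

    Hypothetical helper: reads GOOGLE_CLOUD_PROJECT (required) and
    GOOGLE_CLOUD_LOCATION (optional) from the given mapping or os.environ.
    """
    env = os.environ if env is None else env
    project_id = env.get("GOOGLE_CLOUD_PROJECT", "").strip()
    if not project_id:
        raise ValueError("Set GOOGLE_CLOUD_PROJECT to your Google Cloud project ID")
    # Fall back to a default region when none is configured
    location = env.get("GOOGLE_CLOUD_LOCATION", "").strip() or default_location
    return project_id, location
```

You would then call vertexai.init(project=project_id, location=location) with the returned values.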

Part 1: Image Generation with Imagen 3.0

Overview

The Imagen 3.0 model (imagen-3.0-generate-002) is Google’s state-of-the-art text-to-image generation model. It can create high-quality images from detailed text descriptions.
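Because Imagen works from detailed descriptions, it can help to assemble prompts from consistent parts such as subject, setting, details, and style. The following is a hypothetical helper (build_image_prompt is not part of the Vertex AI SDK). It only produces the string you pass as prompt.

```python
from typing import Iterable, Optional


def build_image_prompt(
    subject: str,
    setting: Optional[str] = None,
    style: Optional[str] = None,
    details: Iterable[str] = (),
) -> str:
    """Join prompt fragments into one descriptive prompt (hypothetical helper)."""
    if not subject.strip():
        raise ValueError("subject must not be empty")
    parts = [subject.strip()]
    if setting:
        parts.append(f"in {setting.strip()}")
    # Extra details are appended as a single comma-separated clause
    extra = [d.strip() for d in details if d.strip()]
    if extra:
        parts.append("with " + ", ".join(extra))
    prompt = " ".join(parts)
    if style:
        prompt += f", {style.strip()} style"
    return prompt
```

For example, build_image_prompt("A bouquet of 2 sunflowers and 3 roses", setting="a crystal vase", style="watercolor", details=["soft morning light"]) returns "A bouquet of 2 sunflowers and 3 roses in a crystal vase with soft morning light, watercolor style".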

Implementation

import vertexai
from vertexai.preview.vision_models import ImageGenerationModel

def generate_image(project_id, location, output_file, prompt):
    """
    Generate an image using Imagen 3.0 model.
    
    Args:
        project_id (str): Your Google Cloud project ID
        location (str): The region where Vertex AI is available
        output_file (str): Path where the generated image will be saved
        prompt (str): Text description of the image to generate
    
    Returns:
        ImageGenerationResponse containing the generated images
    """
    # Initialize Vertex AI with your project details
    vertexai.init(project=project_id, location=location)
    
    # Load the pre-trained Imagen model
    model = ImageGenerationModel.from_pretrained("imagen-3.0-generate-002")
    
    # Generate images based on the prompt
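    images = model.generate_images(
        prompt=prompt,
        number_of_images=1,  # Generate a single image
        seed=1,              # Fixed seed for reproducible results
        add_watermark=False, # Disable the invisible SynthID watermark (required when a seed is set)
    )
    
    # Safety filters can block every image, leaving the response empty
    if not images.images:
        raise RuntimeError("No image was returned; the prompt may have been filtered.")
    
    # Save the first generated image
    images[0].save(location=output_file)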
    print(f"Image saved to {output_file}")
    
    return images

# Example usage for generating a bouquet image
generate_image(
    project_id="your-project-id",
    location="us-central1",
    output_file="bouquet.png",
    prompt="Create an image containing a bouquet of 2 sunflowers and 3 roses"
)

Key Parameters Explained

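  • number_of_images: How many images to generate per request (1-4 for Imagen 3 models)
  • seed: Makes results reproducible for the same prompt and parameters; it can only be used when add_watermark is False
  • add_watermark: Controls the invisible SynthID digital watermark, which is enabled by default; setting it to False disables that watermark (no visible watermark is added either way)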
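These constraints can be checked before calling the API. The sketch below assumes Imagen 3's documented limit of 1 to 4 images per request and assumes that a seed cannot be combined with watermarking. The helper name imagen_generation_kwargs is hypothetical. The returned dict is meant to be unpacked into model.generate_images(prompt=..., **kwargs).

```python
from typing import Any, Dict, Optional


def imagen_generation_kwargs(
    number_of_images: int = 1,
    seed: Optional[int] = None,
    add_watermark: bool = True,
) -> Dict[str, Any]:
    """Validate Imagen 3 generation options and return them as keyword arguments."""
    # Imagen 3 models accept between 1 and 4 images per request
    if not 1 <= number_of_images <= 4:
        raise ValueError("number_of_images must be between 1 and 4")
    # A fixed seed cannot be combined with the SynthID watermark
    if seed is not None and add_watermark:
        raise ValueError("seed requires add_watermark=False")
    kwargs: Dict[str, Any] = {
        "number_of_images": number_of_images,
        "add_watermark": add_watermark,
    }
    if seed is not None:
        kwargs["seed"] = seed
    return kwargs
```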

Part 2: Image Analysis with Gemini 2.0 Flash

Overview

Gemini 2.0 Flash (gemini-2.0-flash-001) is a multimodal model that accepts images alongside text and generates text responses about them. This makes it a good fit for descriptions, analyses, or creative content based on visual input.

Implementation

import vertexai
from vertexai.generative_models import GenerativeModel, Part, Image

def analyze_bouquet_image(image_path):
    """
    Analyze an image and generate birthday wishes based on its content.
    
    Args:
        image_path (str): Path to the image file to analyze
    """
    # Initialize Vertex AI
    vertexai.init(
        project='your-project-id',
        location='europe-west4',  # Use appropriate region
    )
    
    # Load the Gemini model
    model = GenerativeModel("gemini-2.0-flash-001")
    
    # Load the image from file
    image_input = Part.from_image(Image.load_from_file(location=image_path))
    
    # Create the analysis prompt
    messages = [
        "Generate a birthday wish based on the following image",
        image_input
    ]
    
    # Start a chat session with the model
    chat = model.start_chat()
    
    # Send the message and get response
    response = chat.send_message(content=messages, stream=False)
    
    # Extract and display the result
    result_text = response.text
    print(result_text)
    
    # Save the analysis to a log file
    with open("analysis_log.txt", "w", encoding="utf-8") as log_file:
        log_file.write(result_text)
    
    print("✅ Log file created: analysis_log.txt")

# Run the analysis function
analyze_bouquet_image("bouquet.png")
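The function above overwrites analysis_log.txt on every run. To keep a history of results instead, you could append each response under a UTC timestamp, as in the sketch below. append_analysis_log is a hypothetical helper. UTF-8 is set explicitly because model output can contain emoji.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def append_analysis_log(
    text: str,
    path: str = "analysis_log.txt",
    now: Optional[datetime] = None,
) -> Path:
    """Append one timestamped entry to a UTF-8 log file and return its path."""
    now = now or datetime.now(timezone.utc)
    log_path = Path(path)
    # Mode "a" creates the file if needed and never truncates earlier entries
    with log_path.open("a", encoding="utf-8") as log_file:
        log_file.write(f"[{now.isoformat(timespec='seconds')}]\n{text}\n\n")
    return log_path
```

Inside analyze_bouquet_image, append_analysis_log(result_text) would replace the with open(...) block.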

Complete Workflow Example

Here’s how to combine both functions for a complete image generation and analysis workflow:

import vertexai
from vertexai.preview.vision_models import ImageGenerationModel
from vertexai.generative_models import GenerativeModel, Part, Image

def complete_bouquet_workflow():
    """Complete workflow: Generate image, then analyze it"""
    
    # Configuration
    project_id = "your-project-id"
    location = "us-central1"
    image_file = "bouquet.png"
    prompt = "Create an image containing a bouquet of 2 sunflowers and 3 roses"
    
    # Step 1: Generate the image
    print("🌻 Generating bouquet image...")
    vertexai.init(project=project_id, location=location)
    
    model = ImageGenerationModel.from_pretrained("imagen-3.0-generate-002")
    images = model.generate_images(
        prompt=prompt,
        number_of_images=1,
        seed=1,
        add_watermark=False,
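    )
    
    # Stop early if safety filters blocked the image
    if not images.images:
        print("❌ No image was generated; try rephrasing the prompt.")
        return
    
    images[0].save(location=image_file)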
    print(f"✅ Image saved to {image_file}")
    
    # Step 2: Analyze the generated image
    print("\n🎂 Analyzing image for birthday wishes...")
    
    # Re-initialize for different region if needed
    vertexai.init(project=project_id, location='europe-west4')
    
    model = GenerativeModel("gemini-2.0-flash-001")
    image_input = Part.from_image(Image.load_from_file(location=image_file))
    
    messages = [
        "Generate a heartfelt birthday wish based on this beautiful bouquet image. "
        "Consider the types of flowers, their colors, and create a warm, personalized message.",
        image_input
    ]
    
    chat = model.start_chat()
    response = chat.send_message(content=messages, stream=False)
    
    result_text = response.text
    print("\n🎉 Generated Birthday Wish:")
    print("-" * 50)
    print(result_text)
    
    # Save results
    with open("birthday_wish.txt", "w", encoding="utf-8") as f:
        f.write(result_text)
    
    print("\n✅ Birthday wish saved to birthday_wish.txt")

# Run the complete workflow
complete_bouquet_workflow()

Advanced Features and Tips

1. Enhanced Image Generation Parameters

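import vertexai
from vertexai.preview.vision_models import ImageGenerationModel

vertexai.init(project="your-project-id", location="us-central1")
model = ImageGenerationModel.from_pretrained("imagen-3.0-generate-002")

# More advanced image generation with additional parameters
images = model.generate_images(
    prompt="Create an image containing a bouquet of 2 sunflowers and 3 roses in a crystal vase",
    number_of_images=4,           # Generate multiple variations (Imagen 3 allows up to 4)
    seed=42,                      # Custom seed for reproducibility
    add_watermark=False,          # Must be False when a seed is set
    guidance_scale=15,            # Higher values follow the prompt more closely
    # negative_prompt="blurry, low quality",  # Earlier Imagen models only; not supported by imagen-3.0-generate-002
)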
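With number_of_images=4, each image needs its own file name. The sketch below assumes a hypothetical helper, numbered_paths, that derives names like bouquet_0.png from a base path while keeping the directory and extension.

```python
from pathlib import Path
from typing import List


def numbered_paths(base: str, count: int) -> List[str]:
    """Return count file names derived from base, e.g. bouquet.png -> bouquet_0.png."""
    if count < 1:
        raise ValueError("count must be at least 1")
    path = Path(base)
    # Keep the directory and extension, insert an index before the extension
    return [str(path.with_name(f"{path.stem}_{i}{path.suffix}")) for i in range(count)]
```

You could then save every variation with: for image, path in zip(images, numbered_paths("bouquet.png", 4)): image.save(location=path)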

2. Streaming Analysis for Real-time Results

import vertexai
from vertexai.generative_models import GenerativeModel, Part, Image

def analyze_with_streaming(image_path):
    """Analyze image with streaming response for real-time output"""
    vertexai.init(project='your-project-id', location='europe-west4')
    
    model = GenerativeModel("gemini-2.0-flash-001")
    image_input = Part.from_image(Image.load_from_file(location=image_path))
    
    messages = [
        "Analyze this bouquet and create detailed birthday wishes",
        image_input
    ]
    
    # Enable streaming for real-time response
    response_stream = model.generate_content(
        contents=messages,
        stream=True
    )
    
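    parts = []
    print("🎂 Generating birthday wishes...")
    print("-" * 50)
    
    for chunk in response_stream:
        try:
            text = chunk.text
        except ValueError:
            # Chunks without a text part (e.g. an empty final chunk) raise ValueError
            continue
        print(text, end="", flush=True)
        parts.append(text)
    
    print()  # Finish the streamed line
    return "".join(parts)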

3. Error Handling and Best Practices

import vertexai
from vertexai.preview.vision_models import ImageGenerationModel

def robust_image_generation(project_id, location, output_file, prompt):
    """Image generation with comprehensive error handling"""
    try:
        vertexai.init(project=project_id, location=location)
        model = ImageGenerationModel.from_pretrained("imagen-3.0-generate-002")
        
        images = model.generate_images(
            prompt=prompt,
            number_of_images=1,
            seed=1,
            add_watermark=False,
        )
        
        if images.images:  # Empty when safety filters block every image
            images[0].save(location=output_file)
            print(f"✅ Image successfully saved to {output_file}")
            return output_file
        else:
            print("❌ No images were generated")
            return None
            
    except Exception as e:
        print(f"❌ Error generating image: {str(e)}")
        return None
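Quota errors (HTTP 429, "resource exhausted") can occur when generating images in a loop. Retrying with exponential backoff is a common remedy. The sketch below is generic and is not an SDK feature: with_backoff is a hypothetical helper, and you decide which exceptions count as retryable. For example, google.api_core.exceptions.ResourceExhausted is the exception the Google client libraries raise for 429 responses. The sleep function is injectable so the helper can be tested without waiting.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_backoff(
    func: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying the given exceptions with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error
            # Delay doubles each attempt (1s, 2s, 4s, ...) plus up to 10% random jitter
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise RuntimeError("unreachable")  # the loop always returns or raises
```

Usage might look like: images = with_backoff(lambda: model.generate_images(prompt=prompt, number_of_images=1), retry_on=(ResourceExhausted,))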

Common Issues and Solutions

1. Authentication Problems

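Make sure the identity running the code (a service account, or your user account when using Application Default Credentials) has the necessary IAM roles:

  • Vertex AI User (roles/aiplatform.user)
  • Storage Object Admin (only if saving outputs to Cloud Storage; the examples in this guide save files locally)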

2. Region Availability

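Model availability differs by region and changes over time, so check the Vertex AI locations documentation for the current list. At the time of writing, regions that host each model include (not an exhaustive list):

  • Imagen 3.0: us-central1, us-east4, europe-west4
  • Gemini 2.0 Flash: us-central1, europe-west4, asia-southeast1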

3. File Format Considerations

  • Imagen outputs can be saved as PNG or JPEG
  • Gemini accepts common image formats such as PNG, JPEG, and WebP; check the model documentation for the current list of supported MIME types
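A file extension can be wrong, so it can help to confirm a file's actual format before sending it to Gemini. The sketch below checks the file's leading bytes against the standard PNG, JPEG, and WebP signatures and returns a MIME type. sniff_image_mime_type is a hypothetical helper. Its result could be passed to Part.from_data(data=..., mime_type=...) from vertexai.generative_models.

```python
from typing import Optional

# Leading "magic" bytes that identify each format
_PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
_JPEG_SIGNATURE = b"\xff\xd8\xff"


def sniff_image_mime_type(path: str) -> Optional[str]:
    """Return 'image/png', 'image/jpeg', or 'image/webp' based on file contents, else None."""
    with open(path, "rb") as image_file:
        header = image_file.read(12)  # enough bytes for all three signatures
    if header.startswith(_PNG_SIGNATURE):
        return "image/png"
    if header.startswith(_JPEG_SIGNATURE):
        return "image/jpeg"
    # WebP files are RIFF containers with "WEBP" at byte offset 8
    if header[:4] == b"RIFF" and header[8:12] == b"WEBP":
        return "image/webp"
    return None
```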

Conclusion

This guide demonstrates the complete workflow for generating and analyzing images using Google’s Vertex AI platform. The combination of Imagen 3.0 for generation and Gemini 2.0 Flash for analysis creates powerful possibilities for creative AI applications.

Key takeaways:

  • Use ImageGenerationModel for creating images from text
  • Use GenerativeModel with image inputs for multimodal analysis
  • Always handle errors gracefully in production code
  • Consider regional availability when choosing locations
  • Save outputs for later reference and analysis

With these tools, you can build applications that generate images, analyze them, and create new content from visual input.
