Google’s Vertex AI provides tools for both generating images from text prompts and analyzing existing images with multimodal models. This guide walks through both tasks using the Vertex AI SDK for Python: Imagen 3.0 for generation and Gemini 2.0 Flash for analysis.
Prerequisites
Before getting started, ensure you have:
- A Google Cloud Project with Vertex AI API enabled
- Proper authentication set up (Service Account or Application Default Credentials)
- The required Python package installed:
pip install google-cloud-aiplatform
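The examples in this guide hard-code the project ID and region. A minimal sketch of reading them from the environment instead is shown here. The helper `load_vertex_config` and the variable name `VERTEX_LOCATION` are hypothetical names chosen for this example, not something the SDK reads on its own.

```python
import os


def load_vertex_config(default_location="us-central1"):
    """Read project and region from environment variables.

    GOOGLE_CLOUD_PROJECT and VERTEX_LOCATION are assumed names for this
    sketch; adjust them to whatever your deployment uses.
    """
    project_id = os.environ.get("GOOGLE_CLOUD_PROJECT")
    if not project_id:
        raise RuntimeError("Set GOOGLE_CLOUD_PROJECT to your Google Cloud project ID")
    location = os.environ.get("VERTEX_LOCATION", default_location)
    return {"project": project_id, "location": location}
```

The returned dictionary can be unpacked into the initialization call, for example `vertexai.init(**load_vertex_config())`.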
Part 1: Image Generation with Imagen 3.0
Overview
The Imagen 3.0 model (imagen-3.0-generate-002) is Google’s state-of-the-art text-to-image generation model. It can create high-quality images from detailed text descriptions.
Implementation
import vertexai
from vertexai.preview.vision_models import ImageGenerationModel


def generate_image(project_id, location, output_file, prompt):
    """
    Generate an image using Imagen 3.0 model.

    Args:
        project_id (str): Your Google Cloud project ID
        location (str): The region where Vertex AI is available
        output_file (str): Path where the generated image will be saved
        prompt (str): Text description of the image to generate

    Returns:
        Generated images list
    """
    # Initialize Vertex AI with your project details
    vertexai.init(project=project_id, location=location)

    # Load the pre-trained Imagen model
    model = ImageGenerationModel.from_pretrained("imagen-3.0-generate-002")

    # Generate images based on the prompt
    images = model.generate_images(
        prompt=prompt,
        number_of_images=1,   # Generate single image
        seed=1,               # For reproducible results
        add_watermark=False,  # Remove watermark
    )

    # Save the first generated image
    images[0].save(location=output_file)
    print(f"Image saved to {output_file}")
    return images


# Example usage for generating a bouquet image
generate_image(
    project_id="your-project-id",
    location="us-central1",
    output_file="bouquet.png",
    prompt="Create an image containing a bouquet of 2 sunflowers and 3 roses"
)
Key Parameters Explained
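- number_of_images: Controls how many variations to generate per request. The upper limit depends on the model version: older Imagen models accept up to 8, while Imagen 3 models are documented with a lower limit, so check the reference for the model you call.
- seed: Makes results reproducible: the same prompt and seed should produce the same image. The Imagen API only accepts a seed when watermarking is disabled, which is why the example also sets add_watermark=False.
- add_watermark: Set to False to generate images without the digital watermark.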
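Because these parameters interact, a small pre-flight check can catch mistakes before a billable request is sent. The sketch below is a hypothetical helper, not part of the SDK. It assumes a limit of 4 images per request (`max_images`, which you should set to the documented limit of your model) and assumes, as the Imagen API reference describes, that a seed is rejected while watermarking is enabled.

```python
def validate_generation_params(number_of_images, seed=None, add_watermark=True,
                               max_images=4):
    """Return a list of problems found in the requested parameters.

    max_images is an assumption for this sketch; set it to the documented
    limit of the model you are calling.
    """
    problems = []
    if not 1 <= number_of_images <= max_images:
        problems.append(
            f"number_of_images must be between 1 and {max_images}, got {number_of_images}"
        )
    if seed is not None and add_watermark:
        problems.append("seed requires add_watermark=False")
    return problems
```

An empty list means the combination looks safe to send; otherwise the messages can be logged or raised before calling `generate_images`.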
Part 2: Image Analysis with Gemini 2.0 Flash
Overview
Gemini 2.0 Flash (gemini-2.0-flash-001) is a multimodal model that can analyze images and generate contextual text responses. It’s perfect for creating descriptions, analyses, or creative content based on visual input.
Implementation
import vertexai
from vertexai.generative_models import GenerativeModel, Part, Image


def analyze_bouquet_image(image_path):
    """
    Analyze an image and generate birthday wishes based on its content.

    Args:
        image_path (str): Path to the image file to analyze
    """
    # Initialize Vertex AI
    vertexai.init(
        project='your-project-id',
        location='europe-west4',  # Use appropriate region
    )

    # Load the Gemini model
    model = GenerativeModel("gemini-2.0-flash-001")

    # Load the image from file
    image_input = Part.from_image(Image.load_from_file(location=image_path))

    # Create the analysis prompt
    messages = [
        "Generate a birthday wish based on the following image",
        image_input
    ]

    # Start a chat session with the model
    chat = model.start_chat()

    # Send the message and get response
    response = chat.send_message(content=messages, stream=False)

    # Extract and display the result
    result_text = response.text
    print(result_text)

    # Save the analysis to a log file (UTF-8 so emoji in the reply are written safely)
    with open("analysis_log.txt", "w", encoding="utf-8") as log_file:
        log_file.write(result_text)
    print("✅ Log file created: analysis_log.txt")


# Run the analysis function
analyze_bouquet_image("bouquet.png")
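The function above overwrites analysis_log.txt on every run. If you want to keep a history of prompts and responses, one option is to append each result as a JSON Lines record. This is a minimal sketch using only the standard library; the helper name `append_analysis_record` and the record fields are illustrative choices, not part of the Vertex AI SDK.

```python
import json
from datetime import datetime, timezone


def append_analysis_record(log_path, image_path, prompt, response_text,
                           model_name="gemini-2.0-flash-001"):
    """Append one analysis result to a JSON Lines file and return the record."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model": model_name,
        "image": image_path,
        "prompt": prompt,
        "response": response_text,
    }
    # Append mode keeps earlier runs; UTF-8 keeps emoji in the reply intact
    with open(log_path, "a", encoding="utf-8") as log_file:
        log_file.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record
```

Each line of the resulting file is an independent JSON object, so the log can be read back line by line with `json.loads`.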
Complete Workflow Example
Here’s how to combine both functions for a complete image generation and analysis workflow:
import vertexai
from vertexai.preview.vision_models import ImageGenerationModel
from vertexai.generative_models import GenerativeModel, Part, Image


def complete_bouquet_workflow():
    """Complete workflow: Generate image, then analyze it"""
    # Configuration
    project_id = "your-project-id"
    location = "us-central1"
    image_file = "bouquet.png"
    prompt = "Create an image containing a bouquet of 2 sunflowers and 3 roses"

    # Step 1: Generate the image
    print("🌻 Generating bouquet image...")
    vertexai.init(project=project_id, location=location)
    model = ImageGenerationModel.from_pretrained("imagen-3.0-generate-002")
    images = model.generate_images(
        prompt=prompt,
        number_of_images=1,
        seed=1,
        add_watermark=False,
    )
    images[0].save(location=image_file)
    print(f"✅ Image saved to {image_file}")

    # Step 2: Analyze the generated image
    print("\n🎂 Analyzing image for birthday wishes...")
    # Re-initialize for different region if needed
    vertexai.init(project=project_id, location='europe-west4')
    model = GenerativeModel("gemini-2.0-flash-001")
    image_input = Part.from_image(Image.load_from_file(location=image_file))
    messages = [
        "Generate a heartfelt birthday wish based on this beautiful bouquet image. "
        "Consider the types of flowers, their colors, and create a warm, personalized message.",
        image_input
    ]
    chat = model.start_chat()
    response = chat.send_message(content=messages, stream=False)
    result_text = response.text
    print("\n🎉 Generated Birthday Wish:")
    print("-" * 50)
    print(result_text)

    # Save results (UTF-8 so emoji in the reply are written safely)
    with open("birthday_wish.txt", "w", encoding="utf-8") as f:
        f.write(result_text)
    print("\n✅ Birthday wish saved to birthday_wish.txt")


# Run the complete workflow
complete_bouquet_workflow()
Advanced Features and Tips
1. Enhanced Image Generation Parameters
# More advanced image generation with additional parameters
# (assumes `model` is the ImageGenerationModel loaded in Part 1)
images = model.generate_images(
    prompt="Create an image containing a bouquet of 2 sunflowers and 3 roses in a crystal vase",
    number_of_images=4,   # Generate multiple variations
    seed=42,              # Custom seed for reproducibility
    add_watermark=False,
    guidance_scale=15,    # Higher values follow prompt more closely
    # What to avoid. Some newer Imagen 3 models may not accept a negative
    # prompt; check the model's documentation and drop this argument if needed.
    negative_prompt="blurry, low quality",
)
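When a request returns several variations, each one needs its own file name. The following sketch saves every image in a response under a numbered name. It relies only on the `save(location=...)` method used elsewhere in this guide; the helper name `save_all_images` and the naming scheme are assumptions made for illustration.

```python
from pathlib import Path


def save_all_images(images, output_dir, stem="bouquet", extension="png"):
    """Save every generated image as <stem>_<n>.<extension> and return the paths.

    `images` is any iterable of objects exposing save(location=...), which is
    how generated images are saved elsewhere in this guide.
    """
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for index, image in enumerate(images, start=1):
        target = out_dir / f"{stem}_{index}.{extension}"
        image.save(location=str(target))
        saved.append(str(target))
    return saved
```

Calling `save_all_images(images, "outputs")` after the request above would write bouquet_1.png through bouquet_4.png into an outputs directory, assuming all four images were returned.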
2. Streaming Analysis for Real-time Results
import vertexai
from vertexai.generative_models import GenerativeModel, Part, Image


def analyze_with_streaming(image_path):
    """Analyze image with streaming response for real-time output"""
    vertexai.init(project='your-project-id', location='europe-west4')
    model = GenerativeModel("gemini-2.0-flash-001")
    image_input = Part.from_image(Image.load_from_file(location=image_path))
    messages = [
        "Analyze this bouquet and create detailed birthday wishes",
        image_input
    ]

    # Enable streaming for real-time response
    response_stream = model.generate_content(
        contents=messages,
        stream=True
    )

    full_response = ""
    print("🎂 Generating birthday wishes...")
    print("-" * 50)
    for chunk in response_stream:
        if chunk.text:
            print(chunk.text, end="", flush=True)
            full_response += chunk.text
    return full_response
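One caveat with the loop above: reading `chunk.text` can fail for a chunk that carries no text parts. The sketch below accumulates streamed text defensively. It assumes the SDK signals a text-less chunk by raising `ValueError` from the `text` property; `collect_stream_text` is a hypothetical helper, not an SDK function.

```python
def collect_stream_text(chunks, on_text=None):
    """Concatenate the text of streamed chunks, skipping chunks without text.

    Assumes each chunk exposes a `text` attribute that may raise ValueError
    (or be missing) when the chunk carries no text parts.
    """
    pieces = []
    for chunk in chunks:
        try:
            text = chunk.text
        except (ValueError, AttributeError):
            continue  # e.g. a chunk that only reports metadata
        if text:
            pieces.append(text)
            if on_text is not None:
                on_text(text)  # callback for live display of each piece
    return "".join(pieces)
```

To reproduce the live output of the original loop, pass a callback such as `lambda t: print(t, end="", flush=True)` as `on_text`.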
3. Error Handling and Best Practices
import vertexai
from vertexai.preview.vision_models import ImageGenerationModel


def robust_image_generation(project_id, location, output_file, prompt):
    """Image generation with basic error handling"""
    try:
        vertexai.init(project=project_id, location=location)
        model = ImageGenerationModel.from_pretrained("imagen-3.0-generate-002")
        response = model.generate_images(
            prompt=prompt,
            number_of_images=1,
            seed=1,
            add_watermark=False,
        )
        # The response wraps the generated images in its `images` list,
        # which may be empty (for example, if the output was filtered)
        generated = response.images
        if generated:
            generated[0].save(location=output_file)
            print(f"✅ Image successfully saved to {output_file}")
            return output_file
        else:
            print("❌ No images were generated")
            return None
    except Exception as e:
        print(f"❌ Error generating image: {str(e)}")
        return None
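Catching every exception and returning None treats transient failures (such as quota or timeout errors) the same as permanent ones. A common complement is to retry with exponential backoff. The sketch below is generic and standard-library only. Which exception types count as transient is an assumption you must supply through `retry_on`, for example quota errors such as `google.api_core.exceptions.ResourceExhausted`.

```python
import time


def call_with_retries(func, retry_on=(Exception,), max_attempts=3,
                      base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying with exponential backoff on the given exceptions."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # Wait base_delay, then 2x, 4x, ... before the next attempt
            sleep(base_delay * 2 ** (attempt - 1))
```

A call might look like `call_with_retries(lambda: model.generate_images(prompt=prompt, number_of_images=1), retry_on=(TimeoutError,))`. The `sleep` parameter exists so the delay can be replaced in tests.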
Common Issues and Solutions
1. Authentication Problems
Ensure your service account has the necessary permissions:
- Vertex AI User
- Storage Object Admin (if saving to Cloud Storage)
2. Region Availability
Different models may be available in different regions:
- Imagen 3.0: us-central1, us-east4, europe-west4
- Gemini 2.0 Flash: us-central1, europe-west4, asia-southeast1
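To avoid scattering region strings through the code, the lists above can be kept in one place and queried. This is a sketch only: the region data is copied from this guide as a snapshot and may be outdated, and `pick_region` and `shared_regions` are hypothetical helper names.

```python
# Region lists copied from this guide; they are a snapshot and may be outdated.
MODEL_REGIONS = {
    "imagen-3.0-generate-002": ["us-central1", "us-east4", "europe-west4"],
    "gemini-2.0-flash-001": ["us-central1", "europe-west4", "asia-southeast1"],
}


def pick_region(model_name, preferred=None):
    """Return `preferred` if the model is listed there, else the first listed region."""
    regions = MODEL_REGIONS.get(model_name)
    if not regions:
        raise KeyError(f"No region list recorded for {model_name!r}")
    return preferred if preferred in regions else regions[0]


def shared_regions(*model_names):
    """Regions in which every named model is listed, in the first model's order."""
    first, *rest = [MODEL_REGIONS[name] for name in model_names]
    return [region for region in first if all(region in other for other in rest)]
```

Choosing a region from `shared_regions(...)` for both models would let the workflow call `vertexai.init` once instead of re-initializing between the generation and analysis steps.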
3. File Format Considerations
- Imagen outputs can be saved as PNG or JPEG
- Gemini can analyze most common image formats (PNG, JPEG, GIF, WebP)
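If images reach your code as raw bytes rather than file paths, you will usually need to state their MIME type explicitly. The following sketch maps a file extension to a MIME type for the formats listed above. The mapping and the helper name `image_mime_type` are assumptions for illustration, and the set of accepted formats should be checked against the current model documentation.

```python
from pathlib import Path

# Formats named in this guide; treat the mapping as an assumption and extend as needed.
IMAGE_MIME_TYPES = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def image_mime_type(path):
    """Return the MIME type for a supported image extension, else raise ValueError."""
    suffix = Path(path).suffix.lower()
    if suffix not in IMAGE_MIME_TYPES:
        raise ValueError(f"Unsupported image extension for {path!r}: {suffix or '(none)'}")
    return IMAGE_MIME_TYPES[suffix]
```

The result can be passed as the `mime_type` argument of `Part.from_data(data=..., mime_type=...)` when building the image input from bytes.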
Conclusion
This guide demonstrates the complete workflow for generating and analyzing images using Google’s Vertex AI platform. The combination of Imagen 3.0 for generation and Gemini 2.0 Flash for analysis creates powerful possibilities for creative AI applications.
Key takeaways:
- Use ImageGenerationModel for creating images from text
- Use GenerativeModel with image inputs for multimodal analysis
- Always handle errors gracefully in production code
- Consider regional availability when choosing locations
- Save outputs for later reference and analysis
With these tools, you can build applications that generate images, analyze them, and produce text based on visual input.