Generative AI on Google Cloud - An Introduction

February 16, 2026

Generative AI has rapidly moved from a buzzword to a production reality. What started as experimental chatbots is now powering critical workloads, automating support, accelerating software development, and reshaping how products are built on Google Cloud.

To effectively navigate this space, we need to move past the hype and align on a shared technical vocabulary. This guide provides a high-level view of how Generative AI actually fits into the Google Cloud ecosystem.

Core Generative AI Concepts

Foundation Models: A foundation model is a large, general-purpose model trained on massive amounts of unlabeled data. Think of it as the “data lake” of GenAI: broad, deep, and resource-intensive to build.

For instance, a foundation model for vision might be trained on billions of images. Because training these models is extremely compute-intensive, time-consuming, and costly, most organizations choose to consume existing foundation models rather than building them from scratch.

Generative AI (GenAI): GenAI refers to systems capable of generating new content (text, images, code, or audio) that mirrors the patterns of their training data. As a subset of deep learning, it is typically built on top of foundation models. Instead of simply predicting a label or a value, GenAI produces complex, creative outputs.

Large Language Models (LLMs): An LLM is a specific type of GenAI focused on text. These models are trained on vast datasets of articles, books, and code, with the goal of generating coherent, human-like responses. We interact with LLMs (like Gemini) via prompts, which guide the model’s output.
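
As a quick illustration, here is a minimal sketch of sending a prompt to a Gemini model through the Vertex AI Python SDK. The project ID, region, and model name are placeholders; substitute whatever is available in your own environment.

```python
# Minimal prompt-to-Gemini sketch using the Vertex AI Python SDK.
# Assumes `pip install google-cloud-aiplatform` and application-default credentials.
import vertexai
from vertexai.generative_models import GenerativeModel

# Placeholder project and region -- replace with your own values.
vertexai.init(project="my-gcp-project", location="us-central1")

model = GenerativeModel("gemini-2.0-flash")  # example model name
response = model.generate_content("Explain what a foundation model is in two sentences.")
print(response.text)
```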

Prompts and Non-Determinism: A prompt is the natural-language request submitted to the model. This can include questions, instructions, or specific contextual examples.

Crucially, LLM outputs are non-deterministic by design. The same prompt can yield slightly different results each time because the model selects every token based on probability. For example, the same simple prompt, "Is this a dog?", could return "Yes, it is" on one run and "Yes, it sure is" on the next.
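
To make this concrete, the sketch below (reusing the hypothetical project setup from the previous example) sends the same prompt twice and prints both responses. Lowering the temperature reduces, but does not fully eliminate, this variability.

```python
# Illustrating non-determinism: the same prompt, sent twice, can differ.
# Assumes vertexai.init(...) has already been called as in the earlier sketch.
from vertexai.generative_models import GenerativeModel, GenerationConfig

model = GenerativeModel("gemini-2.0-flash")  # example model name
prompt = "Is this a dog? Answer in one short sentence."

for attempt in range(2):
    response = model.generate_content(
        prompt,
        generation_config=GenerationConfig(temperature=0.9),  # higher temperature -> more variation
    )
    print(f"Attempt {attempt + 1}: {response.text}")
```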

Tokens and the Bottom Line

In the Google Cloud ecosystem, we don't measure usage in words; we measure it in tokens. A token is the atomic unit of text (a word, a fragment, or even punctuation).

From a FinOps and performance perspective, there are three key factors to track:

  • Context Window: The maximum number of tokens a model can process in a single request.
  • Pricing Structure: Costs are typically driven by the volume of input tokens (the prompt) and output tokens (the response).
  • Optimization: More verbose prompts result in higher token counts and increased spend.

In short, efficient prompt design isn’t just about quality; it’s a direct cost-management strategy.
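
As one practical illustration, the Vertex AI SDK exposes a count_tokens call that lets you measure a prompt’s token footprint before you send it. The sketch below assumes the same placeholder setup as the earlier examples; the prompts themselves are made up for comparison.

```python
# Estimating prompt cost before sending: count input tokens up front.
# Assumes vertexai.init(...) has been called with your project and region.
from vertexai.generative_models import GenerativeModel

model = GenerativeModel("gemini-2.0-flash")  # example model name

verbose_prompt = (
    "Please could you kindly provide me with a detailed, thorough and "
    "comprehensive explanation of what a token is in the context of LLMs?"
)
concise_prompt = "Define 'token' in the context of LLMs."

for label, prompt in [("verbose", verbose_prompt), ("concise", concise_prompt)]:
    count = model.count_tokens(prompt)
    print(f"{label}: {count.total_tokens} input tokens")
```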

Vertex AI: Google Cloud’s Powerhouse

At the heart of GenAI on Google Cloud is Vertex AI, a fully managed, unified platform for building and deploying machine learning models. Vertex AI abstracts the underlying infrastructure, allowing teams to focus on production use cases rather than hardware management.

Google Models on Vertex AI:

  • Gemini: Google’s flagship multimodal model, built for complex reasoning. (Latest: Gemini 3.0).
  • Gemma: An open, lightweight model optimized for resource-constrained environments.
  • Embedding Models: Essential for converting data into vectors for semantic search and RAG (Retrieval-Augmented Generation); see the sketch after this list.
  • Imagen & Veo: Specialized models for high-fidelity image and video generation.
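
For the embedding bullet above, here is a minimal sketch of producing vectors with a Vertex AI text-embedding model. The model ID is an example; in practice the resulting vectors would be stored in a vector database to power semantic search or RAG.

```python
# Turning text into vectors for semantic search / RAG with a Vertex AI embedding model.
# Assumes vertexai.init(...) has been called as in the earlier sketches.
from vertexai.language_models import TextEmbeddingModel

model = TextEmbeddingModel.from_pretrained("text-embedding-004")  # example model ID

documents = [
    "Vertex AI is Google Cloud's managed ML platform.",
    "Tokens are the billing unit for LLM usage.",
]
embeddings = model.get_embeddings(documents)

for doc, emb in zip(documents, embeddings):
    # emb.values is the embedding vector; its length depends on the model.
    print(f"{doc[:40]!r} -> vector of {len(emb.values)} dimensions")
```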

Third-Party Flexibility: Beyond Google’s native models, Vertex AI supports over 200 third-party models. This allows teams to select the most cost-effective and performant model for their specific requirements without leaving the Google Cloud environment.

What’s Next?

Establishing this baseline is critical for what’s coming next. With the vocabulary settled, we’ll move from definitions to implementation, breaking down real-world architectures, cost-management frameworks, and best practices for running GenAI workloads at scale on Google Cloud.
