
2 posts tagged with "Embeddings"


A Step-by-Step Guide to Building Your First AI Search Engine

· 9 min read

This is a step-by-step guide to building your first AI search engine using Cloudflare's Vectorize and Workers AI. It covers everything from setting up the environment to querying the vector database, with clear explanations and runnable code examples.


You'll see both paths:


  1. Manual embeddings (Euclidean, 32‑dim vectors) for learning and quick demos.
  2. AI embeddings (BGE Base, 768‑dim, cosine) using Workers AI, for real-world semantic search.

By the end, you’ll be able to seed your own index, query it via API, and understand exactly what’s going on.



What is Cloudflare Vectorize?


Definition:


A globally distributed vector database for AI-powered apps, tightly integrated with Cloudflare Workers.


Use cases:


  1. Semantic search
  2. Recommendations
  3. Anomaly detection
  4. LLM context support

Key Features of Vectorize


  1. Globally distributed, no additional infrastructure needed
  2. Store embeddings via Workers AI or external models
  3. Connect search results back to content in R2, KV, D1 — all within Workers

Meet Cloudflare Vectorize (Fun Version)


Think of Cloudflare Vectorize as your app’s super-powered librarian — except this one lives everywhere in the world at once, never sleeps, and can find what you want faster than you can say “AI.”


Instead of just searching for the exact words you type, Vectorize understands meaning.
It can match your "cute dog" search to a picture of a fluffy golden retriever, or "relaxing music" to an audio clip that feels like a spa day.

Real-World Uses


  1. Shopping → “Find me shoes like these” (and it actually gets it right).
  2. Customer Service → Instantly suggest relevant help articles before you even finish typing your problem.
  3. Streaming → Recommend movies that actually match your vibe, not just “because you watched one rom-com in 2018.”

The Cool Part


All this runs on Cloudflare’s global network, so your search results pop up in milliseconds,
even if your user is sipping coffee in Paris while your data’s hanging out in Tokyo.


Why Developers Love It


  1. No extra servers
  2. No complicated setup
  3. Just plug it into Cloudflare Workers, toss in your AI-generated “embeddings”
    (fancy word for math-y fingerprints of text, images, or audio)
    and you’ve got instant, intelligent search.

In short:


It’s like giving your app a brain, without giving yourself a headache.


Getting Started: Overview of Steps


Steps overview (from “Get started” docs):

  1. Create Worker
  2. Create Vectorize Index
  3. Bind Worker to Index
  4. (Optional) Add metadata
  5. Insert & query vectors
  6. Deploy & test

Step 1 – Create a Cloudflare Vectorize Index


First, ensure you have Wrangler installed:


npm install -g wrangler
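
If this is your first time using Wrangler, authenticate with your Cloudflare account and confirm the install:

wrangler login      # opens a browser to authorize your Cloudflare account
wrangler --version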

Path A — Manual Embeddings (Euclidean)


Manual vectors are perfect for understanding how vector search works—no AI needed.

Create an index (Euclidean)


# 32 dimensions, Euclidean distance
wrangler vectorize create youtube-index --dimensions=32 --metric=euclidean
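
You can confirm the index was created with the settings you expect:

wrangler vectorize list
wrangler vectorize describe youtube-index   # should show 32 dimensions, euclidean metric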

Worker code: insert & query (32 dims)


// src/index.ts (Manual demo)
// Run this as a separate Worker or bind to a different index than your AI demo.

export interface Env {
  VECTORIZE: Vectorize; // bound to the 32-dim "youtube-index"
}

const sampleVectors: Array<VectorizeVector> = [
  {
    id: "1",
    values: [
      0.12, 0.45, 0.67, 0.89, 0.23, 0.56, 0.34, 0.78,
      0.12, 0.90, 0.24, 0.67, 0.89, 0.35, 0.48, 0.70,
      0.22, 0.58, 0.74, 0.33, 0.88, 0.66, 0.45, 0.27,
      0.81, 0.54, 0.39, 0.76, 0.41, 0.29, 0.83, 0.55
    ],
    metadata: { url: "/products/sku/13913913" },
  },
  {
    id: "2",
    values: [
      0.14, 0.23, 0.36, 0.51, 0.62, 0.47, 0.59, 0.74,
      0.33, 0.89, 0.41, 0.53, 0.68, 0.29, 0.77, 0.45,
      0.24, 0.66, 0.71, 0.34, 0.86, 0.57, 0.62, 0.48,
      0.78, 0.52, 0.37, 0.61, 0.69, 0.28, 0.80, 0.53
    ],
    metadata: { url: "/products/sku/10148191" },
  },
  {
    id: "3",
    values: [
      0.21, 0.33, 0.55, 0.67, 0.80, 0.22, 0.47, 0.63,
      0.31, 0.74, 0.35, 0.53, 0.68, 0.45, 0.55, 0.70,
      0.28, 0.64, 0.71, 0.30, 0.77, 0.60, 0.43, 0.39,
      0.85, 0.55, 0.31, 0.69, 0.52, 0.29, 0.72, 0.48
    ],
    metadata: { url: "/products/sku/97913813" },
  },
  {
    id: "4",
    values: [
      0.17, 0.29, 0.42, 0.57, 0.64, 0.38, 0.51, 0.72,
      0.22, 0.85, 0.39, 0.66, 0.74, 0.32, 0.53, 0.48,
      0.21, 0.69, 0.77, 0.34, 0.80, 0.55, 0.41, 0.29,
      0.70, 0.62, 0.35, 0.68, 0.53, 0.30, 0.79, 0.49
    ],
    metadata: { url: "/products/sku/418313" },
  },
  {
    id: "5",
    values: [
      0.11, 0.46, 0.68, 0.82, 0.27, 0.57, 0.39, 0.75,
      0.16, 0.92, 0.28, 0.61, 0.85, 0.40, 0.49, 0.67,
      0.19, 0.58, 0.76, 0.37, 0.83, 0.64, 0.53, 0.30,
      0.77, 0.54, 0.43, 0.71, 0.36, 0.26, 0.80, 0.53
    ],
    metadata: { url: "/products/sku/55519183" },
  },
];

const DIMENSIONS = sampleVectors[0].values.length; // 32

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const url = new URL(request.url);
    const path = url.pathname;

    if (path === "/insert") {
      const inserted = await env.VECTORIZE.insert(sampleVectors);
      return Response.json({ ok: true, inserted });
    }

    if (path === "/query") {
      // Demo vector that should be closest to id=4
      const query = [
        0.13, 0.25, 0.44, 0.53, 0.62, 0.41, 0.59, 0.68,
        0.29, 0.82, 0.37, 0.50, 0.74, 0.46, 0.57, 0.64,
        0.28, 0.61, 0.73, 0.35, 0.78, 0.58, 0.42, 0.32,
        0.77, 0.65, 0.49, 0.54, 0.31, 0.29, 0.71, 0.57
      ];

      if (query.length !== DIMENSIONS)
        return Response.json({ ok: false, error: `Vector must have ${DIMENSIONS} dimensions` }, { status: 400 });

      const matches = await env.VECTORIZE.query(query, {
        topK: 3,
        returnValues: true,
        returnMetadata: "all",
      });

      return Response.json({ ok: true, matches });
    }

    return Response.json({ ok: false, error: "Try /insert then /query" }, { status: 404 });
  },
} satisfies ExportedHandler<Env>;

Bindings for this manual Worker


// wrangler.jsonc (manual demo)
// Note: "vectorize" takes an array of bindings in Wrangler's config schema.
{
  "$schema": "node_modules/wrangler/config-schema.json",
  "name": "vectorize-manual-euclidean",
  "main": "src/index.ts",
  "compatibility_date": "2025-08-10",
  "vectorize": [{ "binding": "VECTORIZE", "index_name": "youtube-index" }]
}

Test with curl (Euclidean)


# Insert the sample 5 vectors
curl https://<your-manual-worker>.workers.dev/insert

# Query for top 3 nearest neighbors
curl https://<your-manual-worker>.workers.dev/query
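
If everything is wired up, /query returns the nearest neighbors, roughly in this shape (scores here are illustrative, and exact fields can vary by Vectorize version; with Euclidean, a lower score means a closer match):

{
  "ok": true,
  "matches": {
    "count": 3,
    "matches": [
      { "id": "4", "score": 0.46, "metadata": { "url": "/products/sku/418313" }, "values": [0.17, 0.29, "..."] },
      { "id": "2", "score": 0.52, "metadata": { "url": "/products/sku/10148191" }, "values": ["..."] },
      { "id": "3", "score": 0.58, "metadata": { "url": "/products/sku/97913813" }, "values": ["..."] }
    ]
  }
}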

Path B — AI Embeddings with Workers AI (Cosine)


This is the real-world path: you index human text, and search by meaning.


Create a preset index (Cosine)


wrangler vectorize create youtube-index-preset --preset=@cf/baai/bge-base-en-v1.5
# Preset sets: 768 dimensions, cosine distance

wrangler.jsonc bindings


// Note: "vectorize" takes an array of bindings in Wrangler's config schema.
{
  "$schema": "node_modules/wrangler/config-schema.json",
  "name": "vectorize-youtube",
  "main": "src/index.ts",
  "compatibility_date": "2025-08-10",
  "assets": { "directory": "./public" },
  "observability": { "enabled": true },

  "vectorize": [{ "binding": "VECTORIZE", "index_name": "youtube-index-preset" }],
  "ai": { "binding": "AI" }
}

Worker code: index, seed, search, embed debug


This version contains a robust embedText that handles multiple response shapes from Workers AI.


// src/index.ts (AI demo)

export interface Env {
  AI: any;        // Workers AI binding
  VECTORIZE: any; // Vectorize binding (youtube-index-preset)
}

interface InsertItem {
  text: string;
  id?: string;
  metadata?: Record<string, any>;
}

async function embedText(env: Env, text: string): Promise<number[]> {
  const model = "@cf/baai/bge-base-en-v1.5";
  // send as array for best compatibility
  const result: any = await env.AI.run(model, { text: [text] });

  // Normalize across possible shapes:
  if (Array.isArray(result?.data) && Array.isArray(result.data[0]) && result.shape) {
    return (result.data[0] as number[]).map(Number); // { shape:[1,768], data:[[...]] }
  }
  if (Array.isArray(result?.data) && Array.isArray(result.data[0]?.embedding)) {
    return (result.data[0].embedding as number[]).map(Number); // { data:[{ embedding:[...] }] }
  }
  if (Array.isArray(result?.embedding)) {
    return (result.embedding as number[]).map(Number); // { embedding:[...] }
  }
  if (Array.isArray(result?.data) && typeof result.data[0] === "number") {
    return (result.data as number[]).map(Number); // { data:[...] }
  }

  console.error("Unexpected embedding response shape:", JSON.stringify(result).slice(0, 500));
  throw new Error("Unexpected embedding response");
}

function cors(resp: Response) {
  resp.headers.set("Access-Control-Allow-Origin", "*");
  resp.headers.set("Access-Control-Allow-Methods", "GET,POST,OPTIONS");
  resp.headers.set("Access-Control-Allow-Headers", "Content-Type");
  return resp;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const url = new URL(request.url);
    if (request.method === "OPTIONS") return cors(new Response(null, { status: 204 }));

    if (url.pathname === "/health") return cors(Response.json({ ok: true }));

    // Debug: see dims/sample
    if (url.pathname === "/embed" && request.method === "GET") {
      const text = url.searchParams.get("text") || "";
      if (!text) return cors(Response.json({ ok: false, error: "Missing ?text" }, { status: 400 }));
      const vec = await embedText(env, text);
      return cors(Response.json({ ok: true, dims: vec.length, sample: vec.slice(0, 8) }));
    }

    // Index a single item
    if (url.pathname === "/index" && request.method === "POST") {
      const body = (await request.json()) as InsertItem;
      const text = (body?.text || "").trim();
      if (!text) return cors(Response.json({ ok: false, error: "text is required" }, { status: 400 }));

      const values = await embedText(env, text);
      const id = body.id || crypto.randomUUID();
      const inserted = await env.VECTORIZE.insert([{
        id,
        values,
        metadata: { text, ...(body.metadata || {}) }
      }]);

      return cors(Response.json({ ok: true, id, inserted }));
    }

    // Bulk seed
    if (url.pathname === "/seed" && request.method === "POST") {
      const items = (await request.json()) as InsertItem[];
      if (!Array.isArray(items) || items.length === 0)
        return cors(Response.json({ ok: false, error: "Provide a non-empty array" }, { status: 400 }));

      const vectors: Array<{ id: string; values: number[]; metadata?: Record<string, any> }> = [];
      for (const item of items) {
        const text = (item?.text || "").trim();
        if (!text) continue;
        const values = await embedText(env, text);
        vectors.push({
          id: item.id || crypto.randomUUID(),
          values,
          metadata: { text, ...(item.metadata || {}) }
        });
      }

      if (vectors.length === 0) return cors(Response.json({ ok: false, error: "No valid items" }, { status: 400 }));
      const inserted = await env.VECTORIZE.insert(vectors);
      return cors(Response.json({ ok: true, count: vectors.length, inserted }));
    }

    // Search by meaning
    if (url.pathname === "/search" && request.method === "GET") {
      const text = (url.searchParams.get("text") || "").trim();
      if (!text) return cors(Response.json({ ok: false, error: "Missing ?text" }, { status: 400 }));
      let topK = Number(url.searchParams.get("topK") || 3);
      if (!Number.isFinite(topK) || topK <= 0) topK = 3;

      const queryVec = await embedText(env, text);
      const matches = await env.VECTORIZE.query(queryVec, {
        topK,
        returnValues: false,
        returnMetadata: "all"
      });

      return cors(Response.json({ ok: true, query: text, matches }));
    }

    return cors(Response.json({ ok: false, error: "Not found" }, { status: 404 }));
  }
};

Test with curl (Cosine)


# Deploy first
wrangler deploy

# Quick embedding sanity check
curl "https://<your-worker>.workers.dev/embed?text=lightweight+waterproof+jacket"

# Seed 3 items
curl -X POST "https://<your-worker>.workers.dev/seed" \
-H "content-type: application/json" \
-d '[
{"text":"Red running shoes with breathable mesh and foam sole","metadata":{"url":"/products/sku/1001","category":"shoes"}},
{"text":"Lightweight waterproof hiking jacket with hood","metadata":{"url":"/products/sku/2002","category":"jackets"}},
{"text":"Noise-cancelling wireless headphones with 30h battery","metadata":{"url":"/products/sku/3003","category":"audio"}}
]'

# Search by meaning
curl "https://<your-worker>.workers.dev/search?text=rain+proof+jacket+for+hiking&topK=2"

Troubleshooting & Logs


  1. “Worker threw exception” HTML page → open live logs:
     wrangler tail
  2. AI binding errors → confirm the wrangler binding:
     "ai": { "binding": "AI" }
     and that Workers AI is enabled for your account.
  3. Dimension mismatch → your index must match your embeddings:
     • Manual demo: 32 dims (Euclidean)
     • AI demo: 768 dims (Cosine preset)
  4. Debug embedding shape → add a route like /embed or /debug-embed to log the raw result and inspect its shape.
  5. Check index config:
     wrangler vectorize describe youtube-index-preset
     wrangler vectorize describe youtube-index

FAQ

Q: Can I store images or PDFs?
Store their embeddings plus metadata/URLs. Fetch the original via the metadata you saved.

Q: Cosine vs Euclidean?

  1. Cosine is great for language/meaning.
  2. Euclidean is intuitive for numeric feature spaces.
    Use the model/preset’s recommended metric.
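
A quick way to feel the difference: cosine compares direction (meaning), Euclidean compares raw distance. A toy sketch, not tied to Vectorize:

// Toy 3-dim "embeddings": a and b point the same direction (same meaning,
// different magnitude); c points somewhere else.
const a = [1, 2, 3];
const b = [2, 4, 6];
const c = [3, -1, 0];

const dot = (x: number[], y: number[]) => x.reduce((s, v, i) => s + v * y[i], 0);
const norm = (x: number[]) => Math.sqrt(dot(x, x));
const cosine = (x: number[], y: number[]) => dot(x, y) / (norm(x) * norm(y));
const euclidean = (x: number[], y: number[]) =>
  Math.sqrt(x.reduce((s, v, i) => s + (v - y[i]) ** 2, 0));

console.log(cosine(a, b));    // 1.0   -> identical direction: "same meaning"
console.log(euclidean(a, b)); // ~3.74 -> yet far apart by raw distance
console.log(cosine(a, c));    // ~0.08 -> unrelated direction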

Q: Do I need Git?
No. wrangler deploy uploads directly from your machine.


How to Use Azure OpenAI Embeddings for Document Search — A Real-World Tutorial

· 10 min read

In this blog, we will explore the Azure OpenAI Service, how it compares to the OpenAI public API, and walk through a complete tutorial showing how to implement semantic search with embeddings using real legislative data.

If you have used ChatGPT and wondered, “Why should I care about Azure OpenAI?”, this blog will help you understand the key differences, the enterprise benefits, and how to get started. It is based on a real spoken walkthrough that demonstrates:

  • What embeddings are
  • How to set up Azure OpenAI
  • How to prepare and search data semantically

The walkthrough focuses on practical application using PowerShell and .NET DataTables, with references to the official Azure OpenAI documentation.

🚀 What is Azure OpenAI Service?

Azure OpenAI provides REST API and SDK access (Python, Java, Go, etc.) to powerful models such as:

  • GPT-4, GPT-4 Turbo (with Vision), GPT-4o, GPT-4o Mini
  • GPT-3.5-Turbo
  • Embeddings models (like text-embedding-ada-002)
  • Image & speech models: DALL·E (image generation) and Whisper (speech-to-text)

These models can power:

  • ✅ Natural language to code
  • ✅ Document summarization
  • ✅ Semantic search
  • ✅ Image understanding


🤖 How Does This Compare?

Feature          | OpenAI (Public)    | Azure OpenAI Service
-----------------|--------------------|----------------------------------
Access           | ✅ Open to public  | ⚠️ Limited access (registration)
Security         | ⚠️ Basic API key   | ✅ Azure-native security stack
Networking       | ⚠️ Internet-only   | ✅ Private VNet / Private Link
Compliance & SLA | ❌ None            | ✅ Enterprise-grade SLAs
Responsible AI   | ⚠️ Basic filters   | ✅ Microsoft filters + policy
Authentication   | ⚠️ OpenAI API key  | ✅ API key or Microsoft Entra ID

Here is the authentication difference in code.

⚠️ OpenAI API key

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("OPENAI_API_KEY")
)

✅ Azure OpenAI (API-key auth shown; Microsoft Entra ID also supported)

import os
from openai import AzureOpenAI

client = AzureOpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version="2024-07-01-preview",
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT")
)
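
For keyless Microsoft Entra ID authentication, the azure-identity package provides a token provider. A minimal sketch:

import os
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from openai import AzureOpenAI

# Fetches and refreshes a Microsoft Entra ID token -- no API key in code or env.
token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
)

client = AzureOpenAI(
    api_version="2024-07-01-preview",
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    azure_ad_token_provider=token_provider
)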

🧠 Why Embeddings?

Embeddings allow you to transform words, phrases, or documents into numerical vectors that represent semantic meaning. This enables search that understands meaning, not just keywords.

Think of it like organizing a library not by title, but by what books are about. Books about space go together — even if the words don't match exactly.

You can use this for:

  1. Vector search
  2. Question answering
  3. Document clustering

🔍 Tutorial

This tutorial explores how to set up and use Azure OpenAI Service to enable intelligent document search through embeddings. Rather than keyword matching, you'll leverage semantic understanding using vector representations.

You'll learn to:

  1. Set up Azure OpenAI and deploy the embedding model
  2. Preprocess and normalize textual data
  3. Generate vector embeddings using the text-embedding-ada-002 model
  4. Perform a cosine similarity-based search to retrieve relevant documents

🧱 What You Need Before You Start

Make sure you have:

  1. A valid Azure account with OpenAI resource access
  2. A deployed embedding model like text-embedding-ada-002 (v2) in a supported region
  3. Python 3.8 or above installed
  4. Required libraries: openai, pandas, tiktoken, scikit-learn, matplotlib, plotly, scipy, num2words
  5. Jupyter Notebooks for interactive development

⚙️ Initial Setup

Install the required libraries by running:

pip install openai pandas tiktoken scikit-learn matplotlib plotly scipy num2words

Download the sample dataset using:

curl "https://raw.githubusercontent.com/Azure-Samples/Azure-OpenAI-Docs-Samples/main/Samples/Tutorials/Embeddings/data/bill_sum_data.csv" --output bill_sum_data.csv

This dataset, BillSum, contains summaries of U.S. Congressional bills and is perfect for trying out semantic search.


🔐 Connect to Azure OpenAI

You will need to extract the endpoint and keys from your Azure portal's resource settings. Once noted, add them to your environment:

setx AZURE_OPENAI_API_KEY "<your-key>"
setx AZURE_OPENAI_ENDPOINT "<your-endpoint>"
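
setx persists the variables on Windows (open a new terminal afterwards). On macOS or Linux, use export instead:

export AZURE_OPENAI_API_KEY="<your-key>"
export AZURE_OPENAI_ENDPOINT="<your-endpoint>"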

(Screenshot: the Keys and Endpoint blade of the Azure OpenAI resource, where you copy both values.)

Note: We recommend storing secrets in Azure Key Vault to enhance security.
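
For example, a minimal sketch of reading the key from Key Vault with the azure-identity and azure-keyvault-secrets packages (the vault URL and secret name below are placeholders):

from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

# Hypothetical vault name -- replace with your own Key Vault.
vault_url = "https://<your-vault-name>.vault.azure.net"
client = SecretClient(vault_url=vault_url, credential=DefaultAzureCredential())

# Assumes you stored the API key under this secret name.
api_key = client.get_secret("AZURE-OPENAI-API-KEY").value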


📥 Load and Prepare the Data

import os
import pandas as pd
import re

df = pd.read_csv("bill_sum_data.csv")
df_bills = df[['text', 'summary', 'title']]

def normalize_text(text):
    text = re.sub(r'\s+', ' ', text).strip()
    text = re.sub(r"\. ,", "", text)
    return text.replace("..", ".").replace(". .", ".")

df_bills['text'] = df_bills['text'].apply(normalize_text)

✂️ Token Count Filtering

import tiktoken
tokenizer = tiktoken.get_encoding("cl100k_base")
df_bills['n_tokens'] = df_bills['text'].apply(lambda x: len(tokenizer.encode(x)))
df_bills = df_bills[df_bills.n_tokens < 8192]

This keeps each document within the model's input limit (text-embedding-ada-002 accepts at most 8,191 input tokens).


🧠 Embedding Creation

from openai import AzureOpenAI
import numpy as np

client = AzureOpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version="2024-02-01",
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT")
)

def generate_embeddings(text):
    return client.embeddings.create(input=[text], model="text-embedding-ada-002").data[0].embedding

df_bills['embedding'] = df_bills['text'].apply(generate_embeddings)
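
Embedding row by row can hit service rate limits on larger datasets. A common pattern (used in the OpenAI cookbook) is exponential backoff with the tenacity package; a sketch, assuming you pip install tenacity:

from tenacity import retry, stop_after_attempt, wait_random_exponential

@retry(wait=wait_random_exponential(min=1, max=20), stop=stop_after_attempt(6))
def generate_embeddings_with_retry(text):
    # Retries with exponential backoff if the service throttles the request.
    return client.embeddings.create(input=[text], model="text-embedding-ada-002").data[0].embedding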

🔍 Semantic Search in Action

Now that embeddings are ready, define similarity logic:

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def search_docs(df, query, top_n=3):
    query_embed = generate_embeddings(query)
    df['similarity'] = df['embedding'].apply(lambda x: cosine_similarity(x, query_embed))
    return df.sort_values('similarity', ascending=False).head(top_n)

results = search_docs(df_bills, "Tax on cable company revenue")
results[['title', 'summary']]

This finds the most contextually relevant bills.


✅ Real Output Example

print(results['summary'].iloc[0])

“Taxpayer's Right to View Act of 1993 - Prevents cable providers from charging extra for events held in venues built or maintained with tax dollars...”

✅ Complete Code:

import os
import re
import requests
import sys
from num2words import num2words
import pandas as pd
import numpy as np
import tiktoken
from openai import AzureOpenAI

df = pd.read_csv(os.path.join(os.getcwd(), 'bill_sum_data.csv'))
df

df_bills = df[['text', 'summary', 'title']]
df_bills

pd.options.mode.chained_assignment = None

# s is input text
def normalize_text(s, sep_token=" \n "):
    # collapse runs of whitespace into single spaces
    s = re.sub(r'\s+', ' ', s).strip()
    # remove stray ". ," sequences left over in the summaries
    s = re.sub(r"\. ,", "", s)
    s = s.replace("..", ".")
    s = s.replace(". .", ".")
    s = s.replace("\n", "")
    s = s.strip()

    return s

df_bills['text'] = df_bills["text"].apply(lambda x: normalize_text(x))

tokenizer = tiktoken.get_encoding("cl100k_base")
df_bills['n_tokens'] = df_bills["text"].apply(lambda x: len(tokenizer.encode(x)))
df_bills = df_bills[df_bills.n_tokens < 8192]
len(df_bills)

df_bills

sample_encode = tokenizer.encode(df_bills.text[0])
decode = tokenizer.decode_tokens_bytes(sample_encode)
decode

len(decode)

client = AzureOpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version="2024-02-01",
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT")
)

def generate_embeddings(text, model="text-embedding-ada-002"):  # model = "deployment_name"
    return client.embeddings.create(input=[text], model=model).data[0].embedding

df_bills['ada_v2'] = df_bills["text"].apply(lambda x: generate_embeddings(x, model='text-embedding-ada-002'))

df_bills

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def get_embedding(text, model="text-embedding-ada-002"):  # model = "deployment_name"
    return client.embeddings.create(input=[text], model=model).data[0].embedding

def search_docs(df, user_query, top_n=4, to_print=True):
    embedding = get_embedding(
        user_query,
        model="text-embedding-ada-002"
    )
    df["similarities"] = df.ada_v2.apply(lambda x: cosine_similarity(x, embedding))

    res = (
        df.sort_values("similarities", ascending=False)
        .head(top_n)
    )
    if to_print:
        display(res)  # display() is available inside Jupyter/IPython
    return res


res = search_docs(df_bills, "Can I get information on cable company tax revenue?", top_n=4)

res["summary"][9]

📈 Monitoring Usage and Performance

Once your Azure OpenAI model is deployed and you're actively using embeddings or completions, it's important to monitor both performance and cost.

You can access monitoring insights through the Azure Portal under your resource group:

📊 View Metrics

  • Go to your Azure resource group.
  • Open the OpenAI resource you've deployed.
  • In the Overview section, select Monitoring and then Metrics.
  • Here, you can review charts and data such as:
    • Total request counts (Azure OpenAI requests)
    • Time-to-first-byte and time-between-tokens (useful for latency analysis)
    • Token usage over time

(Screenshot: the Metrics view with request-count and latency charts.)

Note: You can choose different metrics from the dropdown and visualize performance and request throughput to understand model behavior.

🔔 Create Alerts

  • To proactively manage anomalies or over-usage, set up alert rules.
  • Click Create Alert Rule under Monitoring > Alerts.
  • You can define conditions like "Requests > 1000 in 1 hour" and choose your preferred notification method.
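
If you prefer the CLI, the same rule can be sketched with az monitor metrics alert create. Treat the metric name below as an assumption and verify it in the Metrics blade first:

# Hypothetical names; the request-count metric for Azure OpenAI resources
# may differ (check Monitoring > Metrics for the exact metric name).
az monitor metrics alert create \
  --name "openai-request-spike" \
  --resource-group "yt-research-group" \
  --scopes "<your-openai-resource-id>" \
  --condition "total TotalCalls > 1000" \
  --window-size 1h \
  --evaluation-frequency 15m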

🪵 Enable Diagnostic Logging

  • Navigate to Diagnostic settings.
  • Click Add diagnostic setting and provide a name.
  • Choose what to log: audit logs, request logs, latency metrics, etc.
  • Send logs to:
    • Azure Storage Account (for long-term archival)
    • Log Analytics Workspace (for Kusto queries)
    • Event Hub (for real-time streaming)
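
Once logs land in a Log Analytics workspace, you can query them with Kusto. A sketch (the exact table and columns depend on which log categories you enabled):

AzureDiagnostics
| where ResourceProvider == "MICROSOFT.COGNITIVESERVICES"
| summarize requests = count() by OperationName, bin(TimeGenerated, 1h)
| order by TimeGenerated desc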

🔍 Example Use Case

Let's say you want to investigate a drop in model accuracy. You could:

  • Check latency spikes in metrics.
  • View the number of requests hitting your embedding model.
  • Correlate this with recent changes in input data or prompt structure.

Azure Monitor provides all the tools needed to gain this visibility without external integrations.


🧹 Resource Cleanup

Once your testing or experimentation is done, it's important to clean up your Azure resources to avoid unnecessary charges — especially since deployed models can incur costs even when idle.

🔽 Step-by-step Cleanup

  1. Navigate to Azure AI Studio / Azure OpenAI in the Azure Portal
    Go to the resource you created earlier. You'll need to delete both the deployed model and the resource group itself.

  2. Delete the Deployed Model

    • In the Azure AI Foundry portal or your resource's Deployments tab, locate the deployed model (e.g., text-embedding-ada-002).
    • Click on the deployment entry, then choose Delete.
    • Confirm the deletion. This stops the model from incurring compute charges.
  3. Delete the Azure OpenAI Resource

    • After the model is removed, go back to your Resource Group in Azure (e.g., yt-research-group).
    • Click the Delete button.
    • Confirm your selection. This ensures you're not billed for any associated services.
  4. Stop Local Resources (Optional)
    If you ran a Jupyter Notebook or local development server (e.g., WSL, Ubuntu), you can safely terminate those now.

  5. Use Azure Monitor for Visibility (Optional but Recommended)

    • While in the portal, head to Monitoring > Metrics under your Azure OpenAI resource.
    • You can inspect logs for token usage, latency (e.g., time to first byte), and total requests.
    • Set up Alerts or enable Diagnostic Settings to forward logs to Log Analytics or Azure Storage.

💡 Deleting unused resources helps manage cost, prevents service sprawl, and ensures security hygiene.


Related reading:

  1. What is Azure OpenAI?
  2. Using Jupyter Notebooks
  3. Azure Vector Search

🔚 Call to Action

Choosing the right platform depends on your organization's needs. For more insights on cloud computing, practical tips, and the latest technology trends, subscribe to our newsletter or follow our video series on cloud comparisons.

Need help launching your app on AWS? Visit arinatechnologies.com for expert help in cloud architecture.

Interested in having your organization set up on the cloud? If so, please contact us and we'll be more than glad to help you embark on your cloud journey.

💬 Drop a comment below if you'd like to see part 2 (add maps, filters, and REST APIs!)