Migrate from the Azure AI Inference SDK to the OpenAI SDK

This article explains how to migrate your applications from the Azure AI Inference SDK to the OpenAI SDK. The OpenAI SDK offers broader compatibility, access to the latest OpenAI features, and simpler code with unified patterns across Azure OpenAI and Foundry Models.

Note

The OpenAI SDK refers to the client libraries (such as the Python openai package or the JavaScript openai npm package) that connect to OpenAI v1 API endpoints. These SDKs have their own versioning, independent of the API version. For example, the Go OpenAI SDK is at version 3, but it still connects to OpenAI v1 API endpoints with /openai/v1/ in the URL path.

Benefits of migrating

Migrating to the OpenAI SDK provides several benefits:

Broader model support: works with Azure OpenAI in Foundry Models and with other Foundry Models from providers such as DeepSeek and Grok.
Unified API: uses the same SDK libraries and clients for both OpenAI and Azure OpenAI endpoints.
Latest features: access to the newest OpenAI features without waiting for Azure-specific updates.
Simplified authentication: built-in support for both API key and Microsoft Entra ID authentication.
Implicit API versioning: the v1 API removes the need to update the api-version parameter frequently.

Key differences

The following table shows the main differences between the two SDKs:

Aspect	Azure AI Inference SDK	OpenAI SDK
Client class	`ChatCompletionsClient`	`OpenAI`
Endpoint format	`https://<resource>.services.ai.azure.com/models`	`https://<resource>.openai.azure.com/openai/v1/`
API version	Required in the URL or as a parameter	Not required (uses the v1 API)
Model parameter	Optional (selects the model on multi-model endpoints)	Required (deployment name)
Authentication	Azure credential types only (AzureKeyCredential or a token credential)	API key or Azure credentials (through a token provider)
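To derive the new base URL from an existing configuration value, the following minimal Python sketch rewrites an Azure AI Inference endpoint into the OpenAI v1 format shown in the table. It assumes that both URLs use the same <resource> name; the helper name to_openai_v1_base_url is hypothetical and isn't part of either SDK.

```python
from urllib.parse import urlparse

# Assumption: the resource name is identical in both host names.
INFERENCE_HOST_SUFFIX = ".services.ai.azure.com"


def to_openai_v1_base_url(inference_endpoint: str) -> str:
    """Rewrite https://<resource>.services.ai.azure.com/models as a v1 base URL."""
    parsed = urlparse(inference_endpoint)
    host = parsed.hostname or ""
    if parsed.scheme != "https" or not host.endswith(INFERENCE_HOST_SUFFIX):
        raise ValueError(f"Not an Azure AI Inference endpoint: {inference_endpoint!r}")
    if parsed.path.rstrip("/") != "/models":
        raise ValueError(f"Expected the /models path, got {parsed.path!r}")
    resource = host[: -len(INFERENCE_HOST_SUFFIX)]
    # The v1 API carries its version in the path, so no api-version query is added.
    return f"https://{resource}.openai.azure.com/openai/v1/"


if __name__ == "__main__":
    print(to_openai_v1_base_url("https://my-resource.services.ai.azure.com/models"))
    # → https://my-resource.openai.azure.com/openai/v1/
```

Failing loudly on an unexpected host or path is a deliberate choice: a silently wrong base URL tends to surface later as a confusing 404.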

Setup (Python)

Install the OpenAI SDK:

pip install openai

For Microsoft Entra ID authentication, also install:

pip install azure-identity

Client configuration

With API key authentication:

OpenAI SDK
Azure AI Inference SDK

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    base_url="https://<resource>.openai.azure.com/openai/v1/",
)

import os
from azure.ai.inference import ChatCompletionsClient
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(
    endpoint="https://<resource>.services.ai.azure.com/models",
    credential=AzureKeyCredential(os.environ["AZURE_INFERENCE_CREDENTIAL"]),
)

With Microsoft Entra ID authentication:

OpenAI SDK
Azure AI Inference SDK

from openai import OpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), 
    "https://ai.azure.com/.default"
)

client = OpenAI(
    base_url="https://<resource>.openai.azure.com/openai/v1/",
    api_key=token_provider,
)

from azure.ai.inference import ChatCompletionsClient
from azure.identity import DefaultAzureCredential

client = ChatCompletionsClient(
    endpoint="https://<resource>.services.ai.azure.com/models",
    credential=DefaultAzureCredential(),
    credential_scopes=["https://cognitiveservices.azure.com/.default"],
)

Chat completions

OpenAI SDK
Azure AI Inference SDK

response = client.chat.completions.create(
    model="DeepSeek-V3.1",  # Required: your deployment name
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "How many languages are in the world?"}
    ]
)

print(response.choices[0].message.content)

The output is similar to the following:

As of now, it's estimated that there are about 7,000 languages spoken around the world. However, this number can vary as some languages become extinct and new ones develop. It's also important to note that the number of speakers can greatly vary between languages, with some having millions of speakers and others only a few hundred.

from azure.ai.inference.models import SystemMessage, UserMessage

response = client.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="How many languages are in the world?"),
    ],
    model="DeepSeek-V3.1"  # Optional for single-model endpoints
)

print(response.choices[0].message.content)

The output is similar to the following:

<think>Okay, the user is asking how many languages exist in the world. I need to provide a clear and accurate answer...</think>As of now, it's estimated that there are about 7,000 languages spoken around the world. However, this number can vary as some languages become extinct and new ones develop. It's also important to note that the number of speakers can greatly vary between languages, with some having millions of speakers and others only a few hundred.
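When you migrate existing message lists, the Azure AI Inference message classes map onto the plain role/content dictionaries that the OpenAI SDK accepts. The following Python sketch is illustrative only: it uses stand-in dataclasses instead of importing azure.ai.inference, assumes each message exposes role and content attributes (role may be an enum that carries a value), and handles only text messages. The function to_openai_messages is hypothetical.

```python
from dataclasses import dataclass
from typing import Any


# Stand-ins for the azure.ai.inference.models message classes (illustrative only).
@dataclass
class SystemMessage:
    content: str
    role: str = "system"


@dataclass
class UserMessage:
    content: str
    role: str = "user"


@dataclass
class AssistantMessage:
    content: str
    role: str = "assistant"


SUPPORTED_ROLES = {"system", "developer", "user", "assistant"}


def to_openai_messages(messages: list[Any]) -> list[dict[str, str]]:
    """Convert objects with .role and .content into OpenAI SDK message dicts."""
    converted = []
    for message in messages:
        # Enum roles expose the wire value through .value; plain strings pass through.
        role = getattr(message.role, "value", message.role)
        if role not in SUPPORTED_ROLES:
            raise ValueError(f"Role not handled by this sketch: {role!r}")
        converted.append({"role": role, "content": message.content})
    return converted


if __name__ == "__main__":
    legacy = [
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="How many languages are in the world?"),
    ]
    messages = to_openai_messages(legacy)
    print(messages[0])  # → {'role': 'system', 'content': 'You are a helpful assistant.'}
    # Then: client.chat.completions.create(model="DeepSeek-V3.1", messages=messages)
```

Tool messages and multimodal content need extra fields, so the sketch rejects unknown roles rather than guessing.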

Streaming

OpenAI SDK
Azure AI Inference SDK

stream = client.chat.completions.create(
    model="DeepSeek-V3.1",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a poem about Azure."}
    ],
    stream=True
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

from azure.ai.inference.models import SystemMessage, UserMessage

response = client.complete(
    stream=True,
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="Write a poem about Azure."),
    ],
    model="DeepSeek-V3.1"
)

for update in response:
    if update.choices:
        print(update.choices[0].delta.content or "", end="")

Responses

The Responses API is OpenAI's stateful interface. It returns a structured output array that contains message, tool call, and reasoning items.

OpenAI SDK
Azure AI Inference SDK

response = client.responses.create(
    model="DeepSeek-V3.1",  # Required: your deployment name
    input="How many languages are in the world?",
    max_output_tokens=2000,
)

print(response.output_text)
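If your Chat Completions code already builds a list of messages, you can reuse it with the Responses API, which accepts a list of role/content items as input and takes system-style guidance through the instructions parameter. The following Python sketch performs that split; chat_to_responses_args is a hypothetical helper, and whether a given Foundry model treats instructions exactly like a system message is an assumption to verify.

```python
from typing import Any


def chat_to_responses_args(messages: list[dict[str, str]]) -> dict[str, Any]:
    """Split chat-style messages into Responses API keyword arguments.

    System messages are joined into `instructions`; the remaining messages are
    passed through as the `input` list of role/content items.
    """
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    conversation = [
        {"role": m["role"], "content": m["content"]}
        for m in messages
        if m["role"] != "system"
    ]
    args: dict[str, Any] = {"input": conversation}
    if system_parts:
        args["instructions"] = "\n\n".join(system_parts)
    return args


if __name__ == "__main__":
    chat_messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "How many languages are in the world?"},
    ]
    args = chat_to_responses_args(chat_messages)
    print(args["instructions"])  # → You are a helpful assistant.
    # response = client.responses.create(model="DeepSeek-V3.1", max_output_tokens=2000, **args)
```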

Reasoning

Note

This information about reasoning content doesn't apply to Azure OpenAI models. Azure OpenAI reasoning models use the reasoning summaries feature.

Some reasoning models, such as DeepSeek-R1, generate completions that include the reasoning behind them. The Responses API exposes this reasoning as a structured reasoning output item, whose summary[].text content holds the model's thinking, alongside the final answer.

OpenAI SDK
Azure AI Inference SDK

response = client.responses.create(
    model="DeepSeek-R1-0528",  # Required: your deployment name
    input="How many languages are in the world?",
    max_output_tokens=2000,
)

# Walk response.output for items of type "reasoning" and join summary[].text.
parts = []
for item in getattr(response, "output", None) or []:
    if getattr(item, "type", None) != "reasoning":
        continue
    for s in getattr(item, "summary", None) or []:
        text = getattr(s, "text", None)
        if text:
            parts.append(text)
reasoning_summary = "\n".join(parts).strip()

print("Thinking:", reasoning_summary)
print("Answer:", response.output_text)

The output is similar to the following:

Thinking: Okay, the user is asking how many languages exist in the world. I need to provide a clear and accurate answer...
Answer: There are approximately 7,000 languages spoken around the world today.

Note

Known issue: For Foundry Models that aren't Azure OpenAI models, such as DeepSeek-R1-0528, the reasoning summary text in each reasoning output item is populated reliably, but the reasoning token count in the response usage details (reasoning_tokens on the wire) currently reports 0, even when summary text is present. Don't rely on the reasoning token count for billing or quota accounting when you use Foundry Models. This caveat doesn't apply to Azure OpenAI in Foundry Models.

The Azure AI Inference SDK doesn't expose the Responses API. To get reasoning content, call the Chat Completions API instead. The reasoning is included in the message content, wrapped in <think> and </think> tags, and can be extracted with a regular expression.
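To detect this known issue at run time instead of trusting the reported count, you can inspect the response body. The following Python sketch works on the JSON form of a Responses payload (for example, the dictionary returned by response.model_dump() in the Python SDK) and flags responses that contain reasoning summary text but report zero reasoning tokens under usage.output_tokens_details.reasoning_tokens. The helper name and the sample payload are illustrative, not real service output.

```python
from typing import Any


def reasoning_token_report(response: dict[str, Any]) -> dict[str, Any]:
    """Compare reasoning summary text with the reported reasoning token count."""
    summary_texts = [
        part["text"]
        for item in response.get("output") or []
        if item.get("type") == "reasoning"
        for part in item.get("summary") or []
        if part.get("text")
    ]
    usage = response.get("usage") or {}
    details = usage.get("output_tokens_details") or {}
    reported = details.get("reasoning_tokens") or 0
    has_text = bool(summary_texts)
    return {
        "has_reasoning_text": has_text,
        "reported_reasoning_tokens": reported,
        # True when text is present but the count is 0, matching the known issue.
        "token_count_suspect": has_text and reported == 0,
    }


if __name__ == "__main__":
    # Illustrative payload shaped like a Responses API body.
    sample = {
        "output": [
            {"type": "reasoning", "summary": [{"type": "summary_text", "text": "Okay, the user..."}]},
            {"type": "message", "content": []},
        ],
        "usage": {"output_tokens": 150, "output_tokens_details": {"reasoning_tokens": 0}},
    }
    print(reasoning_token_report(sample)["token_count_suspect"])  # → True
    # With the Python SDK: reasoning_token_report(response.model_dump())
```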

import re
from azure.ai.inference.models import SystemMessage, UserMessage

response = client.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="How many languages are in the world?"),
    ],
    model="DeepSeek-R1-0528"  # Optional for single-model endpoints
)

content = response.choices[0].message.content
match = re.match(r"<think>(.*?)</think>(.*)", content, re.DOTALL)
if match:
    print("Thinking:", match.group(1).strip())
    print("Answer: ", match.group(2).strip())
else:
    print("Response:", content)

The output is similar to the following:

Thinking: Okay, the user is asking how many languages exist in the world. I need to provide a clear and accurate answer...
Answer:  There are approximately 7,000 languages spoken around the world today.

In multi-turn conversations, avoid sending the reasoning content back in the chat history, because reasoning tends to produce long explanations.
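One way to follow this advice is to split each assistant reply into its reasoning and its final answer, and append only the answer to the history. The following Python sketch assumes the reasoning arrives as a single leading <think>...</think> block, as in the Chat Completions example above, and also tolerates whitespace before the opening tag. The helper names split_reasoning and append_assistant_turn are hypothetical.

```python
import re

# Matches a leading <think>...</think> block plus any whitespace around it.
THINK_BLOCK = re.compile(r"^\s*<think>(.*?)</think>\s*", re.DOTALL)


def split_reasoning(content: str) -> tuple[str, str]:
    """Return (thinking, answer); thinking is empty when no <think> block is found."""
    match = THINK_BLOCK.match(content)
    if not match:
        return "", content.strip()
    return match.group(1).strip(), content[match.end():].strip()


def append_assistant_turn(history: list[dict[str, str]], content: str) -> None:
    """Append only the final answer to the chat history, dropping the reasoning."""
    _, answer = split_reasoning(content)
    history.append({"role": "assistant", "content": answer})


if __name__ == "__main__":
    history = [{"role": "user", "content": "How many languages are in the world?"}]
    raw = "<think>The user asks about languages...</think>\nAbout 7,000."
    append_assistant_turn(history, raw)
    print(history[-1])  # → {'role': 'assistant', 'content': 'About 7,000.'}
```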

Embeddings

OpenAI SDK
Azure AI Inference SDK

from openai import OpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

token_provider = get_bearer_token_provider(
    DefaultAzureCredential(),
    "https://ai.azure.com/.default"
)

client = OpenAI(
    base_url="https://<resource>.openai.azure.com/openai/v1/",
    api_key=token_provider,
)

response = client.embeddings.create(
    input="How do I use Python in VS Code?",
    model="text-embedding-3-large",  # Use the name of your deployment
)
print(response.data[0].embedding)

import os
from azure.ai.inference import EmbeddingsClient
from azure.core.credentials import AzureKeyCredential

client = EmbeddingsClient(
    endpoint="https://<resource>.services.ai.azure.com/models",
    credential=AzureKeyCredential(os.environ["AZURE_INFERENCE_CREDENTIAL"]),
)

response = client.embed(
    input=["Your text string goes here"],
    model="text-embedding-3-small"
)

embedding = response.data[0].embedding
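Both SDKs accept a list of strings as embedding input and return one data item per input, each carrying an index. The following Python sketch orders the vectors by that index before pairing them with the inputs, rather than relying on list position; the fake response items built with SimpleNamespace and the helper name ordered_embeddings are illustrative stand-ins.

```python
from types import SimpleNamespace
from typing import Any, Sequence


def ordered_embeddings(data: Sequence[Any]) -> list[list[float]]:
    """Return the vectors sorted by each item's index field."""
    return [list(item.embedding) for item in sorted(data, key=lambda item: item.index)]


if __name__ == "__main__":
    texts = ["How do I use Python in VS Code?", "Your text string goes here"]
    # Stand-in for response.data; a real call would be, for example,
    # client.embeddings.create(model="text-embedding-3-large", input=texts)
    fake_data = [
        SimpleNamespace(index=1, embedding=[0.3, 0.4]),
        SimpleNamespace(index=0, embedding=[0.1, 0.2]),
    ]
    for text, vector in zip(texts, ordered_embeddings(fake_data)):
        print(text, vector)
```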

Setup (C#)

Install the OpenAI SDK:

dotnet add package OpenAI

For Microsoft Entra ID authentication, also install:

dotnet add package Azure.Identity

Client configuration

With API key authentication:

OpenAI SDK
Azure AI Inference SDK

using OpenAI;
using OpenAI.Chat;
using System.ClientModel;

ChatClient client = new(
    model: "gpt-4o-mini", // Your deployment name
    credential: new ApiKeyCredential(Environment.GetEnvironmentVariable("AZURE_OPENAI_API_KEY")),
    options: new OpenAIClientOptions() { 
        Endpoint = new Uri("https://<resource>.openai.azure.com/openai/v1/")
    }
);

using Azure;
using Azure.AI.Inference;

ChatCompletionsClient client = new ChatCompletionsClient(
    new Uri("https://<resource>.services.ai.azure.com/models"),
    new AzureKeyCredential(Environment.GetEnvironmentVariable("AZURE_INFERENCE_CREDENTIAL"))
);

With Microsoft Entra ID authentication:

OpenAI SDK
Azure AI Inference SDK

using Azure.Identity;
using OpenAI;
using OpenAI.Chat;
using System.ClientModel.Primitives;

#pragma warning disable OPENAI001

BearerTokenPolicy tokenPolicy = new(
    new DefaultAzureCredential(),
    "https://ai.azure.com/.default"
);

ChatClient client = new(
    model: "gpt-4o-mini", // Your deployment name
    authenticationPolicy: tokenPolicy,
    options: new OpenAIClientOptions() {
        Endpoint = new Uri("https://<resource>.openai.azure.com/openai/v1/")
    }
);

using Azure;
using Azure.Identity;
using Azure.AI.Inference;

ChatCompletionsClient client = new ChatCompletionsClient(
    new Uri("https://<resource>.services.ai.azure.com/models"),
    new DefaultAzureCredential()
);

Chat completions

OpenAI SDK
Azure AI Inference SDK

using OpenAI.Chat;

ChatCompletion completion = client.CompleteChat(
    new SystemChatMessage("You are a helpful assistant."),
    new UserChatMessage("What is Azure AI?")
);

Console.WriteLine(completion.Content[0].Text);

using Azure.AI.Inference;

ChatCompletionsOptions requestOptions = new ChatCompletionsOptions()
{
    Messages = {
        new ChatRequestSystemMessage("You are a helpful assistant."),
        new ChatRequestUserMessage("How many languages are in the world?")
    },
    Model = "DeepSeek-V3.1", // Optional for single-model endpoints
};

Response<ChatCompletions> response = client.Complete(requestOptions);
Console.WriteLine(response.Value.Choices[0].Message.Content);

Streaming

OpenAI SDK
Azure AI Inference SDK

using System.ClientModel;
using OpenAI.Chat;

CollectionResult<StreamingChatCompletionUpdate> updates = client.CompleteChatStreaming(
    new SystemChatMessage("You are a helpful assistant."),
    new UserChatMessage("Write a poem about Azure.")
);

foreach (StreamingChatCompletionUpdate update in updates)
{
    foreach (ChatMessageContentPart part in update.ContentUpdate)
    {
        Console.Write(part.Text);
    }
}

using Azure.AI.Inference;

ChatCompletionsOptions requestOptions = new ChatCompletionsOptions()
{
    Messages = {
        new ChatRequestSystemMessage("You are a helpful assistant."),
        new ChatRequestUserMessage("Write a poem about Azure.")
    },
    Model = "gpt-4o-mini",
};

StreamingResponse<StreamingChatCompletionsUpdate> response = client.CompleteStreaming(requestOptions);

await foreach (StreamingChatCompletionsUpdate update in response)
{
    if (update.ContentUpdate != null)
    {
        Console.Write(update.ContentUpdate);
    }
}

Responses

The Responses API is OpenAI's stateful interface. It returns a structured output array that contains message, tool call, and reasoning items.

OpenAI SDK
Azure AI Inference SDK

using OpenAI.Responses;

var responseClient = client.GetResponsesClient("DeepSeek-V3.1");
var result = await responseClient.CreateResponseAsync(new CreateResponseOptions(
    [ResponseItem.CreateUserMessageItem("How many languages are in the world?")])
    { MaxOutputTokenCount = 2000 }
);

Console.WriteLine(result.Value.GetOutputText());

Reasoning

Note

This information about reasoning content doesn't apply to Azure OpenAI models. Azure OpenAI reasoning models use the reasoning summaries feature.

OpenAI SDK
Azure AI Inference SDK

using System.Text;
using OpenAI.Responses;

var responseClient = client.GetResponsesClient("DeepSeek-R1-0528");
var result = await responseClient.CreateResponseAsync(new CreateResponseOptions(
    [ResponseItem.CreateUserMessageItem("How many languages are in the world?")])
    { MaxOutputTokenCount = 2000 }
);

// Walk OutputItems for ReasoningResponseItem entries and join SummaryParts text.
var sb = new StringBuilder();
foreach (var item in result.Value.OutputItems)
{
    if (item is not ReasoningResponseItem reasoning) continue;
    foreach (var part in reasoning.SummaryParts)
    {
        if (part is ReasoningSummaryTextPart textPart && !string.IsNullOrEmpty(textPart.Text))
        {
            if (sb.Length > 0) sb.Append('\n');
            sb.Append(textPart.Text);
        }
    }
}

Console.WriteLine($"Thinking: {sb.ToString().Trim()}");
Console.WriteLine($"Answer:   {result.Value.GetOutputText()}");

The output is similar to the following:

Thinking: Okay, the user is asking how many languages exist in the world. I need to provide a clear and accurate answer...
Answer:   There are approximately 7,000 languages spoken around the world today.

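Note

Known issue: For Foundry Models that aren't Azure OpenAI models, such as DeepSeek-R1-0528, the reasoning summary text in each reasoning output item is populated reliably, but the reasoning token count in the response usage details (reasoning_tokens on the wire) currently reports 0, even when summary text is present. Don't rely on the reasoning token count for billing or quota accounting when you use Foundry Models. This caveat doesn't apply to Azure OpenAI in Foundry Models.

The Azure AI Inference SDK doesn't expose the Responses API. To get reasoning content, call the Chat Completions API instead. The reasoning is included in the message content, wrapped in <think> and </think> tags, and can be extracted with a regular expression.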

using Azure.AI.Inference;
using System.Text.RegularExpressions;

ChatCompletionsOptions requestOptions = new ChatCompletionsOptions()
{
    Messages = {
        new ChatRequestSystemMessage("You are a helpful assistant."),
        new ChatRequestUserMessage("How many languages are in the world?")
    },
    Model = "DeepSeek-R1-0528", // Optional for single-model endpoints
};

Response<ChatCompletions> response = client.Complete(requestOptions);
string content = response.Value.Choices[0].Message.Content;

Regex regex = new Regex(@"<think>(.*?)</think>(.*)", RegexOptions.Singleline);
Match match = regex.Match(content);

if (match.Success)
{
    Console.WriteLine($"Thinking: {match.Groups[1].Value.Trim()}");
    Console.WriteLine($"Answer:   {match.Groups[2].Value.Trim()}");
}
else
{
    Console.WriteLine($"Response: {content}");
}

The output is similar to the following:

Thinking: Okay, the user is asking how many languages exist in the world. I need to provide a clear and accurate answer...
Answer:   There are approximately 7,000 languages spoken around the world today.

In multi-turn conversations, avoid sending the reasoning content back in the chat history, because reasoning tends to produce long explanations.

Embeddings

OpenAI SDK
Azure AI Inference SDK

using OpenAI;
using OpenAI.Embeddings;
using System.ClientModel;

EmbeddingClient client = new(
    model: "text-embedding-3-small", // Your deployment name
    credential: new ApiKeyCredential(Environment.GetEnvironmentVariable("AZURE_OPENAI_API_KEY")),
    options: new OpenAIClientOptions()
    {
        Endpoint = new Uri("https://<resource>.openai.azure.com/openai/v1/")
    }
);

string input = "This is a test";

OpenAIEmbedding embedding = client.GenerateEmbedding(input);
ReadOnlyMemory<float> vector = embedding.ToFloats();
Console.WriteLine($"Embeddings: [{string.Join(", ", vector.ToArray())}]");

using Azure;
using Azure.AI.Inference;

EmbeddingsClient client = new EmbeddingsClient(
    new Uri("https://<resource>.services.ai.azure.com/models"),
    new AzureKeyCredential(Environment.GetEnvironmentVariable("AZURE_INFERENCE_CREDENTIAL"))
);

EmbeddingsOptions embeddingsOptions = new EmbeddingsOptions()
{
    Input = { "Your text string goes here" },
    Model = "text-embedding-3-small"
};

Response<EmbeddingsResult> response = client.Embed(embeddingsOptions);
ReadOnlyMemory<float> embedding = response.Value.Data[0].Embedding;

Setup (JavaScript)

Install the OpenAI SDK:

npm install openai

For Microsoft Entra ID authentication, also install:

npm install @azure/identity

Client configuration

With API key authentication:

OpenAI SDK
Azure AI Inference SDK

import { OpenAI } from "openai";

const client = new OpenAI({
    baseURL: "https://<resource>.openai.azure.com/openai/v1/",
    apiKey: process.env.AZURE_OPENAI_API_KEY
});

import ModelClient from "@azure-rest/ai-inference";
import { AzureKeyCredential } from "@azure/core-auth";

const client = ModelClient(
    "https://<resource>.services.ai.azure.com/models", 
    new AzureKeyCredential(process.env.AZURE_INFERENCE_CREDENTIAL)
);

With Microsoft Entra ID authentication:

OpenAI SDK
Azure AI Inference SDK

import { DefaultAzureCredential, getBearerTokenProvider } from "@azure/identity";
import { OpenAI } from "openai";

const tokenProvider = getBearerTokenProvider(
    new DefaultAzureCredential(),
    'https://ai.azure.com/.default'
);

const client = new OpenAI({
    baseURL: "https://<resource>.openai.azure.com/openai/v1/",
    apiKey: tokenProvider
});

import ModelClient from "@azure-rest/ai-inference";
import { DefaultAzureCredential } from "@azure/identity";

const clientOptions = { 
    credentials: { 
        scopes: ["https://cognitiveservices.azure.com/.default"] 
    } 
};

const client = ModelClient(
    "https://<resource>.services.ai.azure.com/models", 
    new DefaultAzureCredential(),
    clientOptions
);

Chat completions

OpenAI SDK
Azure AI Inference SDK

const completion = await client.chat.completions.create({
    model: "DeepSeek-V3.1", // Required: your deployment name
    messages: [
        { role: "system", content: "You are a helpful assistant." },
        { role: "user", content: "How many languages are in the world?" }
    ]
});

console.log(completion.choices[0].message.content);

const response = await client.path("/chat/completions").post({
    body: {
        messages: [
            { role: "system", content: "You are a helpful assistant." },
            { role: "user", content: "How many languages are in the world?" }
        ],
        model: "DeepSeek-V3.1" // Optional for single-model endpoints
    }
});

console.log(response.body.choices[0].message.content);

Streaming

OpenAI SDK
Azure AI Inference SDK

const stream = await client.chat.completions.create({
    model: "DeepSeek-V3.1",
    messages: [
        { role: "system", content: "You are a helpful assistant." },
        { role: "user", content: "Write a poem about Azure." }
    ],
    stream: true
});

for await (const chunk of stream) {
    if (chunk.choices[0]?.delta?.content) {
        process.stdout.write(chunk.choices[0].delta.content);
    }
}

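import { createSseStream } from "@azure/core-sse";

const response = await client.path("/chat/completions").post({
    body: {
        messages: [
            { role: "system", content: "You are a helpful assistant." },
            { role: "user", content: "Write a poem about Azure." }
        ],
        model: "DeepSeek-V3.1",
        stream: true
    }
}).asNodeStream();

const stream = response.body;
if (!stream || response.status !== "200") {
    stream?.destroy();
    throw new Error(`Streaming request failed with status ${response.status}`);
}

// The body is a raw server-sent events stream; parse each event's JSON payload.
for await (const event of createSseStream(stream)) {
    if (event.data === "[DONE]") break;
    for (const choice of JSON.parse(event.data).choices ?? []) {
        process.stdout.write(choice.delta?.content ?? "");
    }
}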
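With stream set to true, the Chat Completions REST API returns server-sent events: each event carries a data: line with a JSON chunk, and the stream ends with data: [DONE]. To show what parsing that stream involves, independent of the JavaScript client, the following Python sketch extracts the delta text from raw event lines. It's a simplification that assumes each event has a single data: line; iter_sse_deltas is a hypothetical helper, and the sample lines are illustrative, not captured service output.

```python
import json
from typing import Iterable, Iterator


def iter_sse_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield delta.content strings from raw chat completion SSE lines."""
    for raw_line in lines:
        line = raw_line.strip()
        # Only "data:" fields carry payloads; blank lines separate events.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # sentinel that ends the stream
            return
        chunk = json.loads(payload)
        for choice in chunk.get("choices") or []:
            content = (choice.get("delta") or {}).get("content")
            if content:
                yield content


if __name__ == "__main__":
    raw_events = [
        'data: {"choices":[{"delta":{"role":"assistant"}}]}',
        "",
        'data: {"choices":[{"delta":{"content":"Azure "}}]}',
        "",
        'data: {"choices":[{"delta":{"content":"skies"}}]}',
        "",
        "data: [DONE]",
    ]
    print("".join(iter_sse_deltas(raw_events)))  # → Azure skies
```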

Responses

The Responses API is OpenAI's stateful interface. It returns a structured output array that contains message, tool call, and reasoning items.

OpenAI SDK
Azure AI Inference SDK

const response = await client.responses.create({
    model: "DeepSeek-V3.1", // Required: your deployment name
    input: "How many languages are in the world?",
    max_output_tokens: 2000,
});

console.log(response.output_text);

Reasoning

Note

This information about reasoning content doesn't apply to Azure OpenAI models. Azure OpenAI reasoning models use the reasoning summaries feature.

OpenAI SDK
Azure AI Inference SDK

const response = await client.responses.create({
    model: "DeepSeek-R1-0528", // Required: your deployment name
    input: "How many languages are in the world?",
    max_output_tokens: 2000,
});

// Walk response.output for items of type "reasoning" and join summary[].text.
const parts = [];
for (const item of response?.output ?? []) {
    if (item?.type !== "reasoning") continue;
    for (const s of item?.summary ?? []) {
        if (s?.text) parts.push(s.text);
    }
}
const reasoningSummary = parts.join("\n").trim();

console.log("Thinking:", reasoningSummary);
console.log("Answer:  ", response.output_text);

The output is similar to the following:

Thinking: Okay, the user is asking how many languages exist in the world. I need to provide a clear and accurate answer...
Answer:   There are approximately 7,000 languages spoken around the world today.

Note

Known issue: For Foundry Models that aren't Azure OpenAI models, such as DeepSeek-R1-0528, the reasoning summary text in each reasoning output item is populated reliably, but the reasoning token count in the response usage details (reasoning_tokens on the wire) currently reports 0, even when summary text is present. Don't rely on the reasoning token count for billing or quota accounting when you use Foundry Models. This caveat doesn't apply to Azure OpenAI in Foundry Models.

The Azure AI Inference SDK doesn't expose the Responses API. To get reasoning content, use the Chat Completions API instead. The reasoning is included in the message content, wrapped in <think> and </think> tags, and can be extracted with a regular expression.

const response = await client.path("/chat/completions").post({
    body: {
        messages: [
            { role: "system", content: "You are a helpful assistant." },
            { role: "user", content: "How many languages are in the world?" }
        ],
        model: "DeepSeek-R1-0528" // Optional for single-model endpoints
    }
});

const content = response.body.choices[0].message.content;
const match = content.match(/<think>(.*?)<\/think>(.*)/s);

if (match) {
    console.log("Thinking:", match[1].trim());
    console.log("Answer:  ", match[2].trim());
} else {
    console.log("Response:", content);
}

The output is similar to the following:

Thinking: Okay, the user is asking how many languages exist in the world. I need to provide a clear and accurate answer...
Answer:   There are approximately 7,000 languages spoken around the world today.

In multi-turn conversations, avoid sending the reasoning content back in the chat history, because reasoning tends to produce long explanations.

Embeddings

OpenAI SDK
Azure AI Inference SDK

import OpenAI from "openai";
import { getBearerTokenProvider, DefaultAzureCredential } from "@azure/identity";

const tokenProvider = getBearerTokenProvider(
    new DefaultAzureCredential(),
    'https://ai.azure.com/.default');
const client = new OpenAI({
    baseURL: "https://<resource>.openai.azure.com/openai/v1/",
    apiKey: tokenProvider
});

const embedding = await client.embeddings.create({
  model: "text-embedding-3-large", // Required: your deployment name
  input: "The quick brown fox jumped over the lazy dog",
  encoding_format: "float",
});

console.log(embedding);

import ModelClient from "@azure-rest/ai-inference";
import { AzureKeyCredential } from "@azure/core-auth";

const client = ModelClient(
    "https://<resource>.services.ai.azure.com/models",
    new AzureKeyCredential(process.env.AZURE_INFERENCE_CREDENTIAL)
);

const response = await client.path("/embeddings").post({
    body: {
        input: ["Your text string goes here"],
        model: "text-embedding-3-small"
    }
});

const embedding = response.body.data[0].embedding;

Setup (Java)

Add the OpenAI SDK to your project. See the OpenAI Java GitHub repository for the latest installation instructions and version.

For Microsoft Entra ID authentication, also add:

<dependency>
    <groupId>com.azure</groupId>
    <artifactId>azure-identity</artifactId>
    <version>1.18.0</version>
</dependency>

Client configuration

With API key authentication:

OpenAI SDK
Azure AI Inference SDK

import com.openai.client.OpenAIClient;
import com.openai.client.okhttp.OpenAIOkHttpClient;

OpenAIClient client = OpenAIOkHttpClient.builder()
    .baseUrl("https://<resource>.openai.azure.com/openai/v1/")
    .apiKey(System.getenv("AZURE_OPENAI_API_KEY"))
    .build();

import com.azure.ai.inference.ChatCompletionsClient;
import com.azure.ai.inference.ChatCompletionsClientBuilder;
import com.azure.core.credential.AzureKeyCredential;

ChatCompletionsClient client = new ChatCompletionsClientBuilder()
    .credential(new AzureKeyCredential(System.getenv("AZURE_INFERENCE_CREDENTIAL")))
    .endpoint("https://<resource>.services.ai.azure.com/models")
    .buildClient();

With Microsoft Entra ID authentication:

OpenAI SDK
Azure AI Inference SDK

import com.openai.azure.AuthenticationUtil;
import com.openai.client.OpenAIClient;
import com.openai.client.okhttp.OpenAIOkHttpClient;
import com.openai.credential.BearerTokenCredential;
import com.azure.identity.DefaultAzureCredential;
import com.azure.identity.DefaultAzureCredentialBuilder;

DefaultAzureCredential tokenCredential = new DefaultAzureCredentialBuilder().build();

OpenAIClient client = OpenAIOkHttpClient.builder()
    .baseUrl("https://<resource>.openai.azure.com/openai/v1/")
    .credential(BearerTokenCredential.create(
        AuthenticationUtil.getBearerTokenSupplier(
            tokenCredential, 
            "https://ai.azure.com/.default"
        )
    ))
    .build();

import com.azure.ai.inference.ChatCompletionsClient;
import com.azure.ai.inference.ChatCompletionsClientBuilder;
import com.azure.identity.DefaultAzureCredential;
import com.azure.identity.DefaultAzureCredentialBuilder;
import com.azure.core.credential.TokenCredential;

TokenCredential credential = new DefaultAzureCredentialBuilder().build();
ChatCompletionsClient client = new ChatCompletionsClientBuilder()
    .credential(credential)
    .endpoint("https://<resource>.services.ai.azure.com/models")
    .buildClient();

Chat completions

OpenAI SDK
Azure AI Inference SDK

import com.openai.models.chat.completions.*;

ChatCompletionCreateParams params = ChatCompletionCreateParams.builder()
    .addSystemMessage("You are a helpful assistant.")
    .addUserMessage("How many languages are in the world?")
    .model("DeepSeek-V3.1") // Required: your deployment name
    .build();

ChatCompletion completion = client.chat().completions().create(params);
// message().content() returns an Optional<String>.
System.out.println(completion.choices().get(0).message().content().orElse(""));

import com.azure.ai.inference.models.*;
import java.util.List;

List<ChatRequestMessage> messages = List.of(
    new ChatRequestSystemMessage("You are a helpful assistant."),
    new ChatRequestUserMessage("How many languages are in the world?")
);

ChatCompletionsOptions options = new ChatCompletionsOptions(messages);
options.setModel("DeepSeek-V3.1"); // Optional for single-model endpoints

ChatCompletions response = client.complete(options);
System.out.println(response.getChoices().get(0).getMessage().getContent());

Streaming

OpenAI SDK
Azure AI Inference SDK

import com.openai.core.http.StreamResponse;
import com.openai.models.chat.completions.*;

ChatCompletionCreateParams params = ChatCompletionCreateParams.builder()
    .addSystemMessage("You are a helpful assistant.")
    .addUserMessage("Write a poem about Azure.")
    .model("DeepSeek-V3.1") // Required: your deployment name
    .build();

// createStreaming returns a StreamResponse, which must be closed after use.
try (StreamResponse<ChatCompletionChunk> stream = client.chat().completions().createStreaming(params)) {
    stream.stream().forEach(chunk -> {
        if (!chunk.choices().isEmpty()) {
            // delta().content() returns an Optional<String>.
            chunk.choices().get(0).delta().content().ifPresent(System.out::print);
        }
    });
}

import com.azure.ai.inference.models.*;
import com.azure.core.util.IterableStream;
import java.util.List;

List<ChatRequestMessage> messages = List.of(
    new ChatRequestSystemMessage("You are a helpful assistant."),
    new ChatRequestUserMessage("Write a poem about Azure.")
);

ChatCompletionsOptions options = new ChatCompletionsOptions(messages);
options.setModel("DeepSeek-V3.1");

IterableStream<StreamingChatCompletionsUpdate> response = client.completeStream(options);

response.forEach(update -> {
    if (update.getChoices() != null && !update.getChoices().isEmpty()) {
        String content = update.getChoices().get(0).getDelta().getContent();
        if (content != null) {
            System.out.print(content);
        }
    }
});

Responses

The Responses API is OpenAI's stateful interface. It returns a structured output array that contains message, tool call, and reasoning items.

OpenAI SDK
Azure AI Inference SDK

import com.openai.models.responses.Response;
import com.openai.models.responses.ResponseCreateParams;

Response response = client.responses().create(
    ResponseCreateParams.builder()
        .model("DeepSeek-V3.1") // Required: your deployment name
        .input("How many languages are in the world?")
        .maxOutputTokens(2000)
        .build()
);

System.out.println(response.outputText());

Reasoning

Nota

Esta información sobre el contenido de razonamiento no se aplica a los modelos de Azure OpenAI. Los modelos de razonamiento de Azure OpenAI usan la característica de resúmenes de razonamiento.

OpenAI SDK
Azure AI Inference SDK

import com.openai.models.responses.Response;
import com.openai.models.responses.ResponseCreateParams;

Response response = client.responses().create(
    ResponseCreateParams.builder()
        .model("DeepSeek-R1-0528") // Required: your deployment name
        .input("How many languages are in the world?")
        .maxOutputTokens(2000)
        .build()
);

// Walk response.output() for items of type "reasoning" and join summary[].text.
StringBuilder sb = new StringBuilder();
response.output().stream()
    .flatMap(item -> item.reasoning().stream())
    .flatMap(reasoning -> reasoning.summary().stream())
    .forEach(summary -> {
        String text = summary.text();
        if (text != null && !text.isEmpty()) {
            if (sb.length() > 0) sb.append("\n");
            sb.append(text);
        }
    });

System.out.println("Thinking: " + sb.toString().trim());

The output is similar to the following:

Thinking: Okay, the user is asking how many languages exist in the world. I need to provide a clear and accurate answer...

Note

Known issue: For Foundry models that aren't Azure OpenAI models, such as DeepSeek-R1-0528, the reasoning summary text in each reasoning output item is populated reliably. However, the reasoning token count in the response usage details (reasoning_tokens at the protocol level) currently reports 0 even when summary text is present. Don't rely on the reasoning token count for billing or quota accounting when you use Foundry models. This caveat doesn't apply to Azure OpenAI in Foundry Models.
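If your code records usage from Responses API calls to Foundry models, one way to act on this known issue is to detect the affected case before trusting the count. The following Go sketch uses only the standard library. It decodes a raw response body and flags responses that contain reasoning summary text while usage.output_tokens_details.reasoning_tokens is 0. It assumes the JSON field names of the OpenAI Responses API. The helper reasoningTokensUnreliable and the sample body are hypothetical and aren't part of either SDK.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// responseBody models only the Responses API fields this sketch reads (assumed field names).
type responseBody struct {
	Output []struct {
		Type    string `json:"type"`
		Summary []struct {
			Text string `json:"text"`
		} `json:"summary"`
	} `json:"output"`
	Usage struct {
		OutputTokensDetails struct {
			ReasoningTokens int `json:"reasoning_tokens"`
		} `json:"output_tokens_details"`
	} `json:"usage"`
}

// reasoningTokensUnreliable (hypothetical helper) reports whether a response has
// reasoning summary text but a reasoning_tokens count of 0.
func reasoningTokensUnreliable(body []byte) (bool, error) {
	var r responseBody
	if err := json.Unmarshal(body, &r); err != nil {
		return false, err
	}
	hasSummary := false
	for _, item := range r.Output {
		if item.Type != "reasoning" {
			continue
		}
		for _, s := range item.Summary {
			if s.Text != "" {
				hasSummary = true
			}
		}
	}
	return hasSummary && r.Usage.OutputTokensDetails.ReasoningTokens == 0, nil
}

func main() {
	// Hypothetical, trimmed response body used only for illustration.
	body := []byte(`{"output":[{"type":"reasoning","summary":[{"type":"summary_text","text":"Okay, the user is asking..."}]}],"usage":{"output_tokens_details":{"reasoning_tokens":0}}}`)
	suspect, err := reasoningTokensUnreliable(body)
	if err != nil {
		panic(err)
	}
	if suspect {
		fmt.Println("reasoning_tokens is 0 despite summary text; don't use it for billing or quota accounting")
	}
}
```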

The Azure AI Inference SDK doesn't expose the Responses API. To get reasoning content, use the chat completions API instead. The reasoning is included in the message content, wrapped in <think> and </think> tags, and you can extract it with a regular expression.

import com.azure.ai.inference.models.*;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

List<ChatRequestMessage> messages = List.of(
    new ChatRequestSystemMessage("You are a helpful assistant."),
    new ChatRequestUserMessage("How many languages are in the world?")
);

ChatCompletionsOptions options = new ChatCompletionsOptions(messages);
options.setModel("DeepSeek-R1-0528"); // Optional for single-model endpoints

ChatCompletions response = client.complete(options);
String content = response.getChoices().get(0).getMessage().getContent();

Pattern pattern = Pattern.compile("<think>(.*?)</think>(.*)", Pattern.DOTALL);
Matcher matcher = pattern.matcher(content);

if (matcher.find()) {
    System.out.println("Thinking: " + matcher.group(1).trim());
    System.out.println("Answer:   " + matcher.group(2).trim());
} else {
    System.out.println("Response: " + content);
}

The output is similar to the following:

Thinking: Okay, the user is asking how many languages exist in the world. I need to provide a clear and accurate answer...
Answer:   There are approximately 7,000 languages spoken around the world today.
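For comparison, the same <think> extraction can be written in Go with the standard regexp package. This is a minimal sketch and isn't part of either SDK. It assumes the model wraps its reasoning in a single <think>...</think> block, and the helper name splitReasoning is hypothetical. The (?s) flag plays the role of Pattern.DOTALL, so the reasoning can span several lines.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// thinkPattern mirrors the Java pattern above; (?s) lets "." match newlines.
var thinkPattern = regexp.MustCompile(`(?s)<think>(.*?)</think>(.*)`)

// splitReasoning (hypothetical helper) returns the reasoning and the answer.
// If no <think> block is found, the whole content is treated as the answer.
func splitReasoning(content string) (thinking, answer string) {
	m := thinkPattern.FindStringSubmatch(content)
	if m == nil {
		return "", strings.TrimSpace(content)
	}
	return strings.TrimSpace(m[1]), strings.TrimSpace(m[2])
}

func main() {
	// Hypothetical message content shaped like the output above.
	content := "<think>\nOkay, the user is asking how many languages exist.\n</think>\n\nThere are approximately 7,000 languages."
	thinking, answer := splitReasoning(content)
	fmt.Println("Thinking:", thinking)
	fmt.Println("Answer:  ", answer)
}
```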

In multi-turn conversations, avoid sending the reasoning content in the chat history, because reasoning tends to produce long explanations.

Embeddings

OpenAI SDK
Azure AI Inference SDK

package com.openai.example;

import com.openai.client.OpenAIClient;
import com.openai.client.okhttp.OpenAIOkHttpClient;
import com.openai.models.embeddings.EmbeddingCreateParams;
import com.openai.models.embeddings.EmbeddingModel;

public final class EmbeddingsExample {
    private EmbeddingsExample() {}

    public static void main(String[] args) {
        // Configures using one of:
        // - The `OPENAI_API_KEY` environment variable
        // - The `OPENAI_BASE_URL` and `AZURE_OPENAI_KEY` environment variables
        OpenAIClient client = OpenAIOkHttpClient.fromEnv();

        EmbeddingCreateParams createParams = EmbeddingCreateParams.builder()
                .input("The quick brown fox jumped over the lazy dog")
                .model(EmbeddingModel.TEXT_EMBEDDING_3_SMALL)
                .build();

        System.out.println(client.embeddings().create(createParams));
    }
}

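import com.azure.ai.inference.EmbeddingsClient;
import com.azure.ai.inference.EmbeddingsClientBuilder;
import com.azure.ai.inference.models.EmbeddingsOptions;
import com.azure.ai.inference.models.EmbeddingsResult;
import com.azure.core.credential.AzureKeyCredential;
import java.util.List;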

EmbeddingsClient client = new EmbeddingsClientBuilder()
    .credential(new AzureKeyCredential(System.getenv("AZURE_INFERENCE_CREDENTIAL")))
    .endpoint("https://<resource>.services.ai.azure.com/models")
    .buildClient();

EmbeddingsOptions embeddingsOptions = new EmbeddingsOptions(
    List.of("Your text string goes here")
);
embeddingsOptions.setModel("text-embedding-3-small");

EmbeddingsResult response = client.embed(embeddingsOptions);
// getEmbedding() returns BinaryData; getEmbeddingList() decodes it as a list of floats.
List<Float> embedding = response.getData().get(0).getEmbeddingList();

Setup

Install the OpenAI SDK:

go get github.com/openai/openai-go/v3

For Microsoft Entra ID authentication, also install:

go get -u github.com/Azure/azure-sdk-for-go/sdk/azidentity

Client configuration

With API key authentication:

OpenAI SDK
Azure AI Inference SDK

import (
    "os"

    "github.com/openai/openai-go/v3"
    "github.com/openai/openai-go/v3/option"
)

client := openai.NewClient(
    option.WithBaseURL("https://<resource>.openai.azure.com/openai/v1/"),
    option.WithAPIKey(os.Getenv("AZURE_OPENAI_API_KEY")),
)

With Microsoft Entra ID authentication:

OpenAI SDK
Azure AI Inference SDK

import (
    "github.com/Azure/azure-sdk-for-go/sdk/azidentity"
    "github.com/openai/openai-go/v3"
    "github.com/openai/openai-go/v3/azure"
    "github.com/openai/openai-go/v3/option"
)

tokenCredential, err := azidentity.NewDefaultAzureCredential(nil)
if err != nil {
    panic(err)
}

client := openai.NewClient(
    option.WithBaseURL("https://<resource>.openai.azure.com/openai/v1/"),
    azure.WithTokenCredential(tokenCredential),
)

Chat completions

OpenAI SDK
Azure AI Inference SDK

import (
    "context"
    "fmt"
    "github.com/openai/openai-go/v3"
)

chatCompletion, err := client.Chat.Completions.New(context.TODO(), openai.ChatCompletionNewParams{
    Messages: []openai.ChatCompletionMessageParamUnion{
        openai.SystemMessage("You are a helpful assistant."),
        openai.UserMessage("What is Azure AI?"),
    },
    Model: "DeepSeek-V3.1", // Required: your deployment name
})

if err != nil {
    panic(err.Error())
}

fmt.Println(chatCompletion.Choices[0].Message.Content)

Streaming

OpenAI SDK
Azure AI Inference SDK

import (
    "context"
    "fmt"
    "github.com/openai/openai-go/v3"
)

stream := client.Chat.Completions.NewStreaming(context.TODO(), openai.ChatCompletionNewParams{
    Messages: []openai.ChatCompletionMessageParamUnion{
        openai.SystemMessage("You are a helpful assistant."),
        openai.UserMessage("Write a poem about Azure."),
    },
    Model: "DeepSeek-V3.1", // Required: your deployment name
})

for stream.Next() {
    chunk := stream.Current()
    if len(chunk.Choices) > 0 && chunk.Choices[0].Delta.Content != "" {
        fmt.Print(chunk.Choices[0].Delta.Content)
    }
}

if err := stream.Err(); err != nil {
    panic(err.Error())
}

Responses

The Responses API is OpenAI's stateful interface. It returns a structured array named output that contains message, tool call, and reasoning items.
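Because output can mix item types, code that reads it typically branches on each item's type field. The following Go sketch decodes a raw Responses JSON body with encoding/json and groups message text, function-call names, and reasoning summaries. It assumes the public Responses API field names: message items carry content parts of type output_text, function_call items carry name, and reasoning items carry summary[].text. The helper summarizeOutput and the sample body are hypothetical. With openai-go you would normally read resp.OutputText() and resp.Output directly.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// outputItem models a subset of the fields of items in the Responses output array.
type outputItem struct {
	Type    string `json:"type"`
	Name    string `json:"name"` // set on "function_call" items
	Content []struct {
		Type string `json:"type"`
		Text string `json:"text"`
	} `json:"content"` // set on "message" items
	Summary []struct {
		Text string `json:"text"`
	} `json:"summary"` // set on "reasoning" items
}

// summarizeOutput (hypothetical helper) groups the output items by type.
func summarizeOutput(body []byte) (text string, toolCalls []string, reasoning string, err error) {
	var resp struct {
		Output []outputItem `json:"output"`
	}
	if err = json.Unmarshal(body, &resp); err != nil {
		return
	}
	var textParts, reasoningParts []string
	for _, item := range resp.Output {
		switch item.Type {
		case "message":
			for _, c := range item.Content {
				if c.Type == "output_text" {
					textParts = append(textParts, c.Text)
				}
			}
		case "function_call":
			toolCalls = append(toolCalls, item.Name)
		case "reasoning":
			for _, s := range item.Summary {
				reasoningParts = append(reasoningParts, s.Text)
			}
		}
	}
	return strings.Join(textParts, ""), toolCalls, strings.Join(reasoningParts, "\n"), nil
}

func main() {
	// Hypothetical, trimmed response body used only for illustration.
	body := []byte(`{"output":[{"type":"reasoning","summary":[{"type":"summary_text","text":"Counting languages..."}]},{"type":"message","content":[{"type":"output_text","text":"About 7,000."}]}]}`)
	text, calls, reasoning, err := summarizeOutput(body)
	if err != nil {
		panic(err)
	}
	fmt.Println("Thinking:", reasoning)
	fmt.Println("Answer:  ", text)
	fmt.Println("Tool calls:", len(calls))
}
```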

OpenAI SDK
Azure AI Inference SDK

import (
    "context"
    "fmt"

    "github.com/openai/openai-go/v3"
    "github.com/openai/openai-go/v3/responses"
)

resp, err := client.Responses.New(context.TODO(), responses.ResponseNewParams{
    Model: "DeepSeek-V3.1", // Required: your deployment name
    Input: responses.ResponseNewParamsInputUnion{
        OfString: openai.String("How many languages are in the world?"),
    },
    MaxOutputTokens: openai.Int(2000),
})
if err != nil {
    panic(err.Error())
}

fmt.Println(resp.OutputText())

Reasoning

Note

This information about reasoning content doesn't apply to Azure OpenAI models. Azure OpenAI reasoning models use the reasoning summaries feature instead.

OpenAI SDK
Azure AI Inference SDK

import (
    "context"
    "fmt"
    "strings"

    "github.com/openai/openai-go/v3"
    "github.com/openai/openai-go/v3/responses"
)

resp, err := client.Responses.New(context.TODO(), responses.ResponseNewParams{
    Model: "DeepSeek-R1-0528", // Required: your deployment name
    Input: responses.ResponseNewParamsInputUnion{
        OfString: openai.String("How many languages are in the world?"),
    },
    MaxOutputTokens: openai.Int(2000),
})
if err != nil {
    panic(err.Error())
}

// Walk resp.Output for items of type "reasoning" and join summary[].text.
var parts []string
for _, item := range resp.Output {
    if item.Type != "reasoning" {
        continue
    }
    for _, s := range item.Summary {
        if s.Text != "" {
            parts = append(parts, s.Text)
        }
    }
}
reasoningSummary := strings.TrimSpace(strings.Join(parts, "\n"))

fmt.Println("Thinking:", reasoningSummary)
fmt.Println("Answer:  ", resp.OutputText())

The output is similar to the following:

Thinking: Okay, the user is asking how many languages exist in the world. I need to provide a clear and accurate answer...
Answer:   There are approximately 7,000 languages spoken around the world today.

Note

In multi-turn conversations, avoid sending the reasoning content in the chat history, because reasoning tends to produce long explanations.
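A model may return its reasoning inline in chat completions content, as in the <think> pattern shown for the Azure AI Inference SDK. In that case, one way to follow this advice is to strip the reasoning block before you store the assistant turn. This Go sketch uses a hypothetical message type rather than SDK types, and it assumes the reasoning is delimited by <think> and </think>. With the Responses API, the analogous approach would be to carry forward only the answer text and not the reasoning items.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// message is a minimal, hypothetical chat message type (not an SDK type).
type message struct {
	Role    string
	Content string
}

// thinkBlock matches a <think>...</think> block, including newlines inside it.
var thinkBlock = regexp.MustCompile(`(?s)<think>.*?</think>`)

// appendAssistantTurn stores only the final answer, dropping any reasoning block.
func appendAssistantTurn(history []message, content string) []message {
	answer := strings.TrimSpace(thinkBlock.ReplaceAllString(content, ""))
	return append(history, message{Role: "assistant", Content: answer})
}

func main() {
	history := []message{
		{Role: "system", Content: "You are a helpful assistant."},
		{Role: "user", Content: "How many languages are in the world?"},
	}
	history = appendAssistantTurn(history,
		"<think>Okay, the user is asking...</think>\nThere are approximately 7,000 languages.")
	history = append(history, message{Role: "user", Content: "Which one has the most speakers?"})
	for _, m := range history {
		fmt.Printf("%s: %s\n", m.Role, m.Content)
	}
}
```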

Embeddings

OpenAI SDK
Azure AI Inference SDK

package main

import (
    "context"
    "fmt"
    "log"

    "github.com/Azure/azure-sdk-for-go/sdk/azidentity"
    "github.com/openai/openai-go/v3"
    "github.com/openai/openai-go/v3/azure"
    "github.com/openai/openai-go/v3/option"
)

func main() {
    tokenCredential, err := azidentity.NewDefaultAzureCredential(nil)
    if err != nil {
        log.Fatalf("Error creating credential: %s", err)
    }
    // Create a client with Azure OpenAI endpoint and Entra ID credentials
    client := openai.NewClient(
        option.WithBaseURL("https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/"),
        azure.WithTokenCredential(tokenCredential),
    )

    inputText := "The quick brown fox jumped over the lazy dog"

    // Make the embedding request synchronously
    resp, err := client.Embeddings.New(context.Background(), openai.EmbeddingNewParams{
        Model: openai.EmbeddingModel("text-embedding-3-large"), // Use your deployed model name on Azure
        Input: openai.EmbeddingNewParamsInputUnion{
            OfArrayOfStrings: []string{inputText},
        },
    })
    if err != nil {
        log.Fatalf("Failed to get embedding: %s", err)
    }

    if len(resp.Data) == 0 {
        log.Fatalf("No embedding data returned.")
    }

    // Print embedding information
    embedding := resp.Data[0].Embedding
    fmt.Printf("Embedding Length: %d\n", len(embedding))
    fmt.Println("Embedding Values:")
    for _, value := range embedding {
        fmt.Printf("%f, ", value)
    }
    fmt.Println()
}

Common migration patterns

Model parameter handling

Azure AI Inference SDK: the model parameter is optional for single-model endpoints but required for multi-model endpoints.
OpenAI SDK: the model parameter is always required and must be set to your deployment name.

Endpoint URL format

Azure AI Inference SDK: uses https://<resource>.services.ai.azure.com/models.
OpenAI SDK: uses https://<resource>.openai.azure.com/openai/v1/ (connects to the OpenAI v1 API).
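The host and path change can be expressed as a small rewrite. This Go sketch assumes that the same resource name serves both the .services.ai.azure.com and .openai.azure.com hosts, as the two formats suggest. Confirm the endpoint shown for your resource in the Foundry portal. The helper migrateEndpoint is hypothetical.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// migrateEndpoint (hypothetical helper) rewrites
// https://<resource>.services.ai.azure.com/models into
// https://<resource>.openai.azure.com/openai/v1/ for the same resource name.
func migrateEndpoint(old string) (string, error) {
	u, err := url.Parse(old)
	if err != nil {
		return "", err
	}
	const oldHostSuffix = ".services.ai.azure.com"
	host := u.Hostname()
	resource := strings.TrimSuffix(host, oldHostSuffix)
	// resource == host means the suffix wasn't present.
	if resource == "" || resource == host || strings.TrimSuffix(u.Path, "/") != "/models" {
		return "", fmt.Errorf("not an Azure AI Inference endpoint: %q", old)
	}
	return "https://" + resource + ".openai.azure.com/openai/v1/", nil
}

func main() {
	newURL, err := migrateEndpoint("https://my-resource.services.ai.azure.com/models")
	if err != nil {
		panic(err)
	}
	fmt.Println(newURL) // https://my-resource.openai.azure.com/openai/v1/
}
```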

Response structure

The response structure is similar. The main difference is the name of the returned object:

Azure AI Inference SDK: returns a ChatCompletions object with choices[].message.content.
OpenAI SDK: returns a ChatCompletion object with choices[].message.content.

Both SDKs provide similar access to response data, including:

Message content
Token usage
Model information
Finish reason
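To show where these fields live in an OpenAI-style chat completion payload, the following Go sketch decodes a raw JSON body into a minimal struct. The struct, the parseChatCompletion helper, and the sample body are hypothetical. SDK response types expose the same data as typed fields, for example chatCompletion.Choices[0].Message.Content in openai-go.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// chatCompletion models the subset of fields listed above: message content,
// token usage, model information, and finish reason.
type chatCompletion struct {
	Model   string `json:"model"`
	Choices []struct {
		Message struct {
			Content string `json:"content"`
		} `json:"message"`
		FinishReason string `json:"finish_reason"`
	} `json:"choices"`
	Usage struct {
		PromptTokens     int `json:"prompt_tokens"`
		CompletionTokens int `json:"completion_tokens"`
		TotalTokens      int `json:"total_tokens"`
	} `json:"usage"`
}

// parseChatCompletion (hypothetical helper) decodes a raw response body.
func parseChatCompletion(body []byte) (chatCompletion, error) {
	var cc chatCompletion
	err := json.Unmarshal(body, &cc)
	return cc, err
}

func main() {
	// Hypothetical, trimmed response body used only for illustration.
	body := []byte(`{"model":"DeepSeek-V3.1","choices":[{"message":{"role":"assistant","content":"Azure AI is..."},"finish_reason":"stop"}],"usage":{"prompt_tokens":20,"completion_tokens":5,"total_tokens":25}}`)
	cc, err := parseChatCompletion(body)
	if err != nil {
		panic(err)
	}
	fmt.Println("Model:", cc.Model)
	fmt.Println("Content:", cc.Choices[0].Message.Content)
	fmt.Println("Finish reason:", cc.Choices[0].FinishReason)
	fmt.Println("Total tokens:", cc.Usage.TotalTokens)
}
```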

Migration checklist

Use this checklist to help ensure a smooth migration:

Install the OpenAI SDK for your programming language.
Update the authentication code (API key or Microsoft Entra ID).
Change endpoint URLs from .services.ai.azure.com/models to .openai.azure.com/openai/v1/.
Change the credential scope from https://cognitiveservices.azure.com/.default to https://ai.azure.com/.default.
Update the client initialization code.
Always set the model parameter to your deployment name.
Update request method calls (complete → chat.completions.create).
Update streaming code, if applicable.
Update error handling to use OpenAI SDK exceptions.
Test all functionality thoroughly.
Update documentation and code comments.

Troubleshooting

Authentication errors

If you get authentication errors:

Verify that your API key is correct and hasn't expired.
For Microsoft Entra ID, make sure the application has the correct permissions.
Verify that the credential scope is set to https://ai.azure.com/.default.

Endpoint errors

If you get endpoint errors:

Verify that the endpoint URL ends with /openai/v1/.
Make sure the resource name is correct.
Verify that the model deployment exists and is active.
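Some of these checks can run before the first request. The following Go sketch is a hypothetical preflight function that validates the shape of the base URL and confirms that a deployment name is set. It can't confirm that the resource or deployment actually exists, which still requires calling the service or checking the portal. Accepting the path with or without the trailing slash is an assumption of this sketch.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// checkConfig (hypothetical helper) reports configuration problems that match
// the endpoint and model troubleshooting steps; it makes no network calls.
func checkConfig(baseURL, deployment string) error {
	var problems []string
	u, err := url.Parse(baseURL)
	if err != nil || u.Scheme != "https" || u.Host == "" {
		problems = append(problems, "base URL must be an absolute https URL")
	} else if !strings.HasSuffix(strings.TrimSuffix(u.Path, "/"), "/openai/v1") {
		problems = append(problems, "base URL path must end with /openai/v1/")
	}
	if strings.TrimSpace(deployment) == "" {
		problems = append(problems, "model must be set to your deployment name")
	}
	if len(problems) > 0 {
		return errors.New(strings.Join(problems, "; "))
	}
	return nil
}

func main() {
	for _, c := range []struct{ baseURL, deployment string }{
		{"https://my-resource.openai.azure.com/openai/v1/", "DeepSeek-V3.1"},
		{"https://my-resource.services.ai.azure.com/models", ""},
	} {
		if err := checkConfig(c.baseURL, c.deployment); err != nil {
			fmt.Println("config problem:", err)
		} else {
			fmt.Println("config looks OK:", c.baseURL)
		}
	}
}
```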

Model not found errors

If you get a "model not found" error:

Verify that you're using the deployment name, not the model name.
Verify that the deployment is active in your Microsoft Foundry resource.
Make sure the deployment name matches exactly (names are case-sensitive).


Last updated on 2026-06-11
