Azure OpenAI reasoning models

Azure OpenAI reasoning models are designed to tackle reasoning and problem-solving tasks with increased focus and capability. These models spend more time processing and understanding the user's request, which makes them exceptionally strong in areas like science, coding, and math compared to previous iterations.

Key capabilities of reasoning models:

- Complex code generation: capable of generating algorithms and handling advanced coding tasks to support developers.
- Advanced problem solving: ideal for comprehensive brainstorming sessions and for addressing multifaceted challenges.
- Complex document comparison: well suited to analyzing contracts, case files, or legal documents to identify subtle differences.
- Instruction following and workflow management: particularly effective for managing workflows that require shorter contexts.

Prerequisites

An OpenAI reasoning model deployed in Azure.
If you use the REST examples:
- Install the Azure CLI. For more information, see Install the Azure CLI.
- Sign in with az login, then generate a bearer token and store it in the AZURE_OPENAI_AUTH_TOKEN environment variable.
```
az account get-access-token --resource https://cognitiveservices.azure.com --query accessToken -o tsv
```
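If you make the REST calls from Python rather than curl, the token from the command above can be turned into request headers. This is a minimal sketch, assuming the token is stored in AZURE_OPENAI_AUTH_TOKEN as described in the prerequisites; the helper name build_auth_headers is hypothetical and not part of any SDK.

```python
import os


def build_auth_headers(token: str) -> dict[str, str]:
    """Return the headers the REST examples send with each request."""
    if not token:
        raise ValueError("bearer token is empty; run `az login` and fetch a token first")
    return {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {token}",
    }


if __name__ == "__main__":
    # Reads the variable named in the prerequisites; empty if it was never set.
    token = os.environ.get("AZURE_OPENAI_AUTH_TOKEN", "")
    if token:
        print(sorted(build_auth_headers(token)))
    else:
        print("AZURE_OPENAI_AUTH_TOKEN is not set")
```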

Usage

These models currently don't support the same set of parameters as other models that use the Chat Completions API.
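One way to cope with the narrower parameter set is to sanitize request arguments before sending them. The sketch below is illustrative only: the set of unsupported names and the max_tokens rename are assumptions (this section doesn't list them), so check both against the parameter reference for your model. The helper name sanitize_chat_params is hypothetical.

```python
# Hypothetical list: parameters assumed to be rejected by reasoning models.
ASSUMED_UNSUPPORTED = {
    "temperature", "top_p", "presence_penalty", "frequency_penalty",
    "logprobs", "top_logprobs", "logit_bias",
}


def sanitize_chat_params(params: dict) -> dict:
    """Return a copy of params with assumed-unsupported keys dropped."""
    cleaned = {k: v for k, v in params.items() if k not in ASSUMED_UNSUPPORTED}
    # Assumption: max_tokens is replaced by max_completion_tokens, the name
    # used by the examples in this article.
    if "max_tokens" in cleaned:
        cleaned.setdefault("max_completion_tokens", cleaned["max_tokens"])
        del cleaned["max_tokens"]
    return cleaned


request = sanitize_chat_params({
    "model": "YOUR-DEPLOYMENT-NAME",
    "messages": [{"role": "user", "content": "Hello"}],
    "temperature": 0.2,
    "max_tokens": 5000,
})
print(request)
```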

Chat Completions API

using Azure.Identity;
using OpenAI;
using OpenAI.Chat;
using System.ClientModel.Primitives;

#pragma warning disable OPENAI001 //currently required for token based authentication

BearerTokenPolicy tokenPolicy = new(
    new DefaultAzureCredential(),
    "https://ai.azure.com/.default");

ChatClient client = new(
    model: "o4-mini",
    authenticationPolicy: tokenPolicy,
    options: new OpenAIClientOptions()
    {

        Endpoint = new Uri("https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1")
    }
);

ChatCompletionOptions options = new ChatCompletionOptions
{
    MaxOutputTokenCount = 100000
};

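ChatCompletion completion = client.CompleteChat(
    new ChatMessage[]
    {
        new DeveloperChatMessage("You are a helpful assistant"),
        new UserChatMessage("Tell me about the bitter lesson")
    },
    options // pass the options defined above so MaxOutputTokenCount takes effect
);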

Console.WriteLine($"[ASSISTANT]: {completion.Content[0].Text}");

Microsoft Entra ID:

If you're not familiar with using Microsoft Entra ID for authentication, see How to configure Azure OpenAI in Microsoft Foundry Models with Microsoft Entra ID authentication.

from openai import OpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), "https://ai.azure.com/.default"
)

client = OpenAI(  
  base_url = "https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/",  
  api_key=token_provider,
)

response = client.chat.completions.create(
    model="YOUR-DEPLOYMENT-NAME",  # replace with your model deployment name
    messages=[
        {"role": "user", "content": "What steps should I think about when writing my first Python API?"},
    ],
    max_completion_tokens=5000,
)

print(response.model_dump_json(indent=2))

API key:

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    base_url="https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/",
)

response = client.chat.completions.create(
    model="YOUR-DEPLOYMENT-NAME",  # replace with your model deployment name
    messages=[
        {"role": "user", "content": "What steps should I think about when writing my first Python API?"},
    ],
    max_completion_tokens=5000,
)

print(response.model_dump_json(indent=2))

curl -X POST "https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $AZURE_OPENAI_AUTH_TOKEN" \
  -d '{
      "model": "gpt-5",
      "messages": [
          {"role": "system", "content": "You are a helpful assistant."},
          {"role": "user", "content": "What steps should I think about when writing my first Python API?"}
      ],
      "max_completion_tokens": 1000
  }'

Output of the Python Chat Completions API example:

{
  "id": "chatcmpl-AEj7pKFoiTqDPHuxOcirA9KIvf3yz",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "message": {
        "content": "Writing your first Python API is an exciting step in developing software that can communicate with other applications. An API (Application Programming Interface) allows different software systems to interact with each other, enabling data exchange and functionality sharing. Here are the steps you should consider when creating your first Python API...truncated for brevity.",
        "refusal": null,
        "role": "assistant",
        "function_call": null,
        "tool_calls": null
      },
      "content_filter_results": {
        "hate": {
          "filtered": false,
          "severity": "safe"
        },
        "protected_material_code": {
          "filtered": false,
          "detected": false
        },
        "protected_material_text": {
          "filtered": false,
          "detected": false
        },
        "self_harm": {
          "filtered": false,
          "severity": "safe"
        },
        "sexual": {
          "filtered": false,
          "severity": "safe"
        },
        "violence": {
          "filtered": false,
          "severity": "safe"
        }
      }
    }
  ],
  "created": 1728073417,
  "model": "o1-2024-12-17",
  "object": "chat.completion",
  "service_tier": null,
  "system_fingerprint": "fp_503a95a7d8",
  "usage": {
    "completion_tokens": 1843,
    "prompt_tokens": 20,
    "total_tokens": 1863,
    "completion_tokens_details": {
      "audio_tokens": null,
      "reasoning_tokens": 448
    },
    "prompt_tokens_details": {
      "audio_tokens": null,
      "cached_tokens": 0
    }
  },
  "prompt_filter_results": [
    {
      "prompt_index": 0,
      "content_filter_results": {
        "custom_blocklists": {
          "filtered": false
        },
        "hate": {
          "filtered": false,
          "severity": "safe"
        },
        "jailbreak": {
          "filtered": false,
          "detected": false
        },
        "self_harm": {
          "filtered": false,
          "severity": "safe"
        },
        "sexual": {
          "filtered": false,
          "severity": "safe"
        },
        "violence": {
          "filtered": false,
          "severity": "safe"
        }
      }
    }
  ]
}
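The content_filter_results objects in the output above can be scanned programmatically. This is a minimal sketch that works on the response after it has been converted to a plain dictionary (for example, with json.loads on the model_dump_json output). The helper name flagged_categories is hypothetical, and the sample dictionary is trimmed from the output shown above.

```python
def flagged_categories(filter_results: dict) -> list[str]:
    """Return the category names whose 'filtered' flag is true."""
    return sorted(
        name for name, result in filter_results.items()
        if result.get("filtered")
    )


# Trimmed copy of one choice's content_filter_results from the sample output.
sample = {
    "hate": {"filtered": False, "severity": "safe"},
    "protected_material_code": {"filtered": False, "detected": False},
    "self_harm": {"filtered": False, "severity": "safe"},
    "sexual": {"filtered": False, "severity": "safe"},
    "violence": {"filtered": False, "severity": "safe"},
}

print(flagged_categories(sample))  # prints []
```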

Reasoning effort

Note

Reasoning models have reasoning_tokens as part of completion_tokens_details in the model response. These are hidden tokens that aren't returned as part of the message response content, but the model uses them to help generate a final answer to the request. reasoning_effort can be set to low, medium, or high for all reasoning models except o1-mini. The higher the effort setting, the longer the model spends processing the request, which generally results in a larger number of reasoning_tokens.
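Because reasoning tokens are counted in the usage data but never appear in the message content, subtracting them from completion_tokens shows how many tokens went to the visible answer. The sketch below uses the usage values from the first sample output in this article; the helper name visible_completion_tokens is hypothetical.

```python
def visible_completion_tokens(usage: dict) -> int:
    """Completion tokens minus the hidden reasoning tokens."""
    details = usage.get("completion_tokens_details") or {}
    reasoning = details.get("reasoning_tokens") or 0
    return usage["completion_tokens"] - reasoning


# Usage block copied from the first sample output.
usage = {
    "completion_tokens": 1843,
    "prompt_tokens": 20,
    "total_tokens": 1863,
    "completion_tokens_details": {"audio_tokens": None, "reasoning_tokens": 448},
}

print(visible_completion_tokens(usage))  # prints 1395
```

Comparing this number across low, medium, and high effort settings for the same prompt is one way to see how much of the token budget each setting spends on reasoning.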

Developer messages

Developer messages ("role": "developer") are functionally the same as system messages.
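Since the two roles behave the same way, a message list written for other chat models can be adapted by renaming the role. This is a minimal sketch; the helper name to_developer_messages is hypothetical, and whether your deployment also accepts the system role unchanged is something to verify rather than assume.

```python
def to_developer_messages(messages: list[dict]) -> list[dict]:
    """Return a copy of messages with every 'system' role renamed to 'developer'."""
    return [
        {**m, "role": "developer"} if m.get("role") == "system" else dict(m)
        for m in messages
    ]


original = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What steps should I think about when writing my first Python API?"},
]
converted = to_developer_messages(original)
print([m["role"] for m in converted])  # prints ['developer', 'user']
```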

Adding a developer message to the previous code example would look as follows:


using Azure.Identity;
using OpenAI;
using OpenAI.Chat;
using System.ClientModel.Primitives;

#pragma warning disable OPENAI001 //currently required for token based authentication

BearerTokenPolicy tokenPolicy = new(
    new DefaultAzureCredential(),
    "https://ai.azure.com/.default");

ChatClient client = new(
    model: "o4-mini",
    authenticationPolicy: tokenPolicy,
    options: new OpenAIClientOptions()
    {

        Endpoint = new Uri("https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1")
    }
);

ChatCompletionOptions options = new ChatCompletionOptions
{
    ReasoningEffortLevel = ChatReasoningEffortLevel.Low,
    MaxOutputTokenCount = 100000
};

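ChatCompletion completion = client.CompleteChat(
    new ChatMessage[]
    {
        new DeveloperChatMessage("You are a helpful assistant"),
        new UserChatMessage("Tell me about the bitter lesson")
    },
    options // pass the options defined above so ReasoningEffortLevel takes effect
);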

Console.WriteLine($"[ASSISTANT]: {completion.Content[0].Text}");

Microsoft Entra ID:

If you're not familiar with using Microsoft Entra ID for authentication, see How to configure Azure OpenAI with Microsoft Entra ID authentication.

from openai import OpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

token_provider = get_bearer_token_provider(
  DefaultAzureCredential(), "https://ai.azure.com/.default"
)

client = OpenAI(
  base_url="https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/",
  api_key=token_provider,
)

response = client.chat.completions.create(
  model="YOUR-DEPLOYMENT-NAME",  # replace with your model deployment name
  messages=[
    {"role": "developer", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What steps should I think about when writing my first Python API?"},
  ],
  max_completion_tokens=5000,
  reasoning_effort="medium",  # low, medium, or high
)

print(response.model_dump_json(indent=2))

API key:

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    base_url="https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/",
)

response = client.chat.completions.create(
    model="gpt-5-mini", # replace with your model deployment name
    messages=[
        {"role": "developer", "content": "You are a helpful assistant."}, # optional; equivalent to a system message for reasoning models
        {"role": "user", "content": "What steps should I think about when writing my first Python API?"},
    ],
    max_completion_tokens = 5000,
    reasoning_effort = "medium" # low, medium, or high
)

print(response.model_dump_json(indent=2))

curl -X POST "https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $AZURE_OPENAI_AUTH_TOKEN" \
  -d '{
      "model": "gpt-5",
      "messages": [
          {"role": "developer", "content": "You are a helpful assistant."},
          {"role": "user", "content": "What steps should I think about when writing my first Python API?"}
      ],
      "max_completion_tokens": 1000,
      "reasoning_effort": "medium"
  }'

Output of the Python Chat Completions API example:

{
  "id": "chatcmpl-CaODNsQOHoRLcb9JVSKYY1e2Iss5s",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "message": {
        "content": "Here's a practical, beginner‑friendly checklist to guide you through writing your first Python API, from idea to production.\n\n1) Clarify goals and constraints\n- Who will use it (internal team, public), what problems it solves, expected traffic, latency requirements.\n- Resources you'll expose (users, orders, etc.) and core operations.\n- Non‑functional needs: security, compliance, uptime, scalability.\n\n2) Choose your API style\n- REST (most common for CRUD and simple integrations).\n- GraphQL (flexible queries, more complex to secure/monitor).\n- gRPC (high‑performance, strongly typed, good for service‑to‑service).\n- For a first API, REST + JSON is usually best.\n\n3) Design the contract first\n- Draft an OpenAPI/Swagger spec: endpoints, request/response schemas, status codes, error model.\n- Decide naming conventions, pagination, filtering, sorting.\n- Define consistent time/date format (ISO‑8601, UTC), ID format, and field casing.\n- Plan versioning strategy (e.g., /v1) and deprecation policy.\n\n4) Plan security and auth\n- Pick auth: API keys for simple internal use; OAuth2/JWT for user auth; mTLS for service‑to‑service.\n- CORS policy for browsers; HTTPS everywhere; security headers.\n- Validate all inputs; avoid leaking stack traces; define rate limits and quotas.\n\n5) Pick your Python stack\n- Frameworks: FastAPI (great typing, validation, auto docs), Flask (minimal), Django REST Framework (batteries included).\n- ASGI/WSGI server: Uvicorn or Gunicorn.\n- Data layer: PostgreSQL + SQLAlchemy/Django ORM; migrations with Alembic/Django migrations.\n- Caching: Redis (optional).\n- Background jobs: Celery/RQ (if needed).\n\n6) Set up the project\n- Create a virtual environment; choose dependency management (pip, Poetry).\n- Establish project structure (app, api, models, services, tests).\n- Add linting/formatting/type checks: black, isort, flake8, mypy; pre‑commit hooks.\n- Configuration via environment variables; secrets via a manager 
(not in code).\n\n7) Implement core functionality\n- Build endpoints that match your spec; keep business logic in a service layer, not in route handlers.\n- Schema validation (Pydantic with FastAPI, Marshmallow for Flask).\n- Consistent responses and errors; use clear status codes (201 create, 204 no content, 400/404/409/422, 500).\n- Pagination and filtering; idempotency for certain POST operations; ETags/conditional requests if useful.\n\n8) Error handling and an error model\n- Define a standard error body (code, message, details, correlation_id).\n- Log errors with context; don't expose internal details to clients.\n\n9) Testing strategy\n- Unit tests for services/validators.\n- Integration tests for endpoints (pytest + httpx/requests) with a test database.\n- Contract tests to assert the API matches the OpenAPI spec.\n- Mock external services; measure coverage and focus on critical paths.\n\n10) Documentation and developer experience\n- Auto‑generated docs (FastAPI provides Swagger/ReDoc).\n- Write examples for each endpoint; onboarding and usage notes.\n- Keep a changelog and release notes.\n\n11) Observability and reliability\n- Structured logging (JSON), include request IDs/correlation IDs.\n- Metrics (requests, latency, error rates), health/readiness endpoints.\n- Tracing (OpenTelemetry) if you have multiple services.\n- Error reporting (Sentry or similar).\n\n12) Deployment and operations\n- Containerize with Docker; follow 12‑factor app principles.\n- CI/CD pipeline: run tests, build image, deploy, run migrations.\n- Choose hosting (Render, Fly.io, Railway, Heroku, AWS/GCP/Azure).\n- Configure scaling, connection pools, and timeouts; use a reverse proxy if needed.\n\n13) Performance and data concerns\n- Index your database; avoid N+1 queries; use connection pooling.\n- Load test key endpoints; profile hotspots.\n- Caching strategies where appropriate; consider async I/O for high‑concurrency workloads.\n\n14) Versioning and lifecycle management\n- Keep 
backward compatibility for minor changes; add fields rather than changing semantics.\n- Communicate deprecations; sunset old versions with a timeline.\n\n15) Governance, compliance, and safety\n- Handle PII correctly; data retention and audit logs if required.\n- Least‑privilege DB access; rotate secrets; review third‑party dependencies.\n\nBeginner‑friendly defaults\n- FastAPI + Pydantic + Uvicorn\n- PostgreSQL + SQLAlchemy + Alembic\n- pytest + httpx + coverage\n- black, isort, flake8, mypy, pre‑commit\n- Docker + simple CI (GitHub Actions) + a managed host\n\nCommon pitfalls to avoid\n- Inconsistent status codes or error formats.\n- Weak input validation and missing authentication.\n- Business logic inside route handlers (hard to test/maintain).\n- No migrations or tests; no logging/metrics.\n- Ignoring pagination and timezones; returning unbounded lists.\n\nIf you share whether it's public vs internal, expected traffic, and preferred framework, I can tailor this to a concrete starter plan and recommended tools.",
        "refusal": null,
        "role": "assistant",
        "annotations": [],
        "audio": null,
        "function_call": null,
        "tool_calls": null
      },
      "content_filter_results": {
        "hate": {
          "filtered": false,
          "severity": "safe"
        },
        "protected_material_code": {
          "filtered": false,
          "detected": false
        },
        "protected_material_text": {
          "filtered": false,
          "detected": false
        },
        "self_harm": {
          "filtered": false,
          "severity": "safe"
        },
        "sexual": {
          "filtered": false,
          "severity": "safe"
        },
        "violence": {
          "filtered": false,
          "severity": "safe"
        }
      }
    }
  ],
  "created": 1762788925,
  "model": "gpt-5-2025-08-07",
  "object": "chat.completion",
  "service_tier": null,
  "system_fingerprint": null,
  "usage": {
    "completion_tokens": 2919,
    "prompt_tokens": 29,
    "total_tokens": 2948,
    "completion_tokens_details": {
      "accepted_prediction_tokens": 0,
      "audio_tokens": 0,
      "reasoning_tokens": 1792,
      "rejected_prediction_tokens": 0
    },
    "prompt_tokens_details": {
      "audio_tokens": 0,
      "cached_tokens": 0
    }
  },
  "prompt_filter_results": [
    {
      "prompt_index": 0,
      "content_filter_results": {
        "hate": {
          "filtered": false,
          "severity": "safe"
        },
        "jailbreak": {
          "filtered": false,
          "detected": false
        },
        "self_harm": {
          "filtered": false,
          "severity": "safe"
        },
        "sexual": {
          "filtered": false,
          "severity": "safe"
        },
        "violence": {
          "filtered": false,
          "severity": "safe"
        }
      }
    }
  ]
}

Reasoning summary

When you use the latest reasoning models with the Responses API, you can use the reasoning summary parameter to receive summaries of the model's chain-of-thought reasoning.

Important

Attempts to extract raw reasoning through methods other than the reasoning summary parameter aren't supported, might violate the Acceptable Use Policy, and can result in throttling or suspension when detected.

using OpenAI;
using OpenAI.Responses;
using System.ClientModel.Primitives;
using Azure.Identity;

#pragma warning disable OPENAI001 //currently required for token based authentication

BearerTokenPolicy tokenPolicy = new(
    new DefaultAzureCredential(),
    "https://ai.azure.com/.default");

OpenAIResponseClient client = new(
    model: "o4-mini",
    authenticationPolicy: tokenPolicy,
    options: new OpenAIClientOptions()
    {
        Endpoint = new Uri("https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1")
    }
);

OpenAIResponse response = await client.CreateResponseAsync(
    userInputText: "What's the optimal strategy to win at poker?",
    new ResponseCreationOptions()
    {
        ReasoningOptions = new ResponseReasoningOptions()
        {
            ReasoningEffortLevel = ResponseReasoningEffortLevel.High,
            ReasoningSummaryVerbosity = ResponseReasoningSummaryVerbosity.Auto,
        },
    });

// Get the reasoning summary from the first OutputItem (ReasoningResponseItem)
Console.WriteLine("=== Reasoning Summary ===");
foreach (var item in response.OutputItems)
{
    if (item is ReasoningResponseItem reasoningItem)
    {
        foreach (var summaryPart in reasoningItem.SummaryParts)
        {
            if (summaryPart is ReasoningSummaryTextPart textPart)
            {
                Console.WriteLine(textPart.Text);
            }
        }
    }
}

Console.WriteLine("\n=== Assistant Response ===");
// Get the assistant's output
Console.WriteLine(response.GetOutputText());

You'll need to upgrade your OpenAI client library to access the latest parameters.

pip install openai --upgrade

Microsoft Entra ID:

from openai import OpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), "https://ai.azure.com/.default"
)

client = OpenAI(  
  base_url = "https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/",  
  api_key=token_provider,
)

response = client.responses.create(
    input="Tell me about the curious case of neural text degeneration",
    model="gpt-5", # replace with model deployment name
    reasoning={
        "effort": "medium",
        "summary": "auto" # auto, concise, or detailed, gpt-5 series do not support concise 
    },
    text={
        "verbosity": "low" # New with GPT-5 models
    }
)

print(response.model_dump_json(indent=2))

API key:

import os
from openai import OpenAI

client = OpenAI(  
  base_url = "https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/",
  api_key=os.getenv("AZURE_OPENAI_API_KEY")  
)

response = client.responses.create(
    input="Tell me about the curious case of neural text degeneration",
    model="gpt-5", # replace with model deployment name
    reasoning={
        "effort": "medium",
        "summary": "auto" # auto, concise, or detailed, gpt-5 series do not support concise 
    },
    text={
        "verbosity": "low" # New with GPT-5 models
    }
)

print(response.model_dump_json(indent=2))

curl -X POST "https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/responses" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $AZURE_OPENAI_AUTH_TOKEN" \
 -d '{
     "model": "gpt-5",
     "input": "Tell me about the curious case of neural text degeneration",
     "reasoning": {"summary": "auto"},
     "text": {"verbosity": "low"}
    }'

{
  "id": "resp_689a0a3090808190b418acf12b5cc40e0fc1c31bc69d8719",
  "created_at": 1754925616.0,
  "error": null,
  "incomplete_details": null,
  "instructions": null,
  "metadata": {},
  "model": "gpt-5",
  "object": "response",
  "output": [
    {
      "id": "rs_689a0a329298819095d90c34dc9b80db0fc1c31bc69d8719",
      "summary": [],
      "type": "reasoning",
      "encrypted_content": null,
      "status": null
    },
    {
      "id": "msg_689a0a33009881909fe0fcf57cba30200fc1c31bc69d8719",
      "content": [
        {
          "annotations": [],
          "text": "Neural text degeneration refers to the ways language models produce low-quality, repetitive, or vacuous text, especially when generating long outputs. It's "curious" because models trained to imitate fluent text can still spiral into unnatural patterns. Key aspects:\n\n- Repetition and loops: The model repeats phrases or sentences ("I'm sorry, but..."), often due to high-confidence tokens reinforcing themselves.\n- Loss of specificity: Vague, generic, agreeable text that avoids concrete details.\n- Drift and contradiction: The output gradually departs from context or contradicts itself over long spans.\n- Exposure bias: During training, models see gold-standard prefixes; at inference, they must condition on their own imperfect outputs, compounding errors.\n- Likelihood vs. quality mismatch: Maximizing token-level likelihood doesn't align with human preferences for diversity, coherence, or factuality.\n- Token over-optimization: Frequent, safe tokens get overused; certain phrases become attractors.\n- Entropy collapse: With greedy or low-temperature decoding, the distribution narrows too much, causing repetitive, low-entropy text.\n- Length and beam search issues: Larger beams or long generations can favor bland, repetitive sequences (the "likelihood trap").\n\nCommon mitigations:\n\n- Decoding strategies:\n  - Top-k, nucleus (top-p), or temperature sampling to keep sufficient entropy.\n  - Typical sampling and locally typical sampling to avoid dull but high-probability tokens.\n  - Repetition penalties, presence/frequency penalties, no-repeat n-grams.\n  - Contrastive decoding (and variants like DoLa) to filter generic continuations.\n  - Min/max length, stop sequences, and beam search with diversity/penalties.\n\n- Training and alignment:\n  - RLHF/DPO to better match human preferences for non-repetitive, helpful text.\n  - Supervised fine-tuning on high-quality, diverse data; instruction tuning.\n  - Debiasing objectives (unlikelihood training) 
to penalize repetition and banned patterns.\n  - Mixture-of-denoisers or latent planning to improve long-range coherence.\n\n- Architectural and planning aids:\n  - Retrieval-augmented generation to ground outputs.\n  - Tool use and structured prompting to constrain drift.\n  - Memory and planning modules, hierarchical decoding, or sentence-level control.\n\n- Prompting tips:\n  - Ask for concise answers, set token limits, and specify structure.\n  - Provide concrete constraints or content to reduce generic filler.\n  - Use "say nothing if uncertain" style instructions to avoid vacuity.\n\nRepresentative papers/terms to search:\n- Holtzman et al., "The Curious Case of Neural Text Degeneration" (2020): nucleus sampling.\n- Welleck et al., "Neural Text Degeneration with Unlikelihood Training."\n- Li et al., "A Contrastive Framework for Decoding."\n- Su et al., "DoLa: Decoding by Contrasting Layers."\n- Meister et al., "Typical Decoding."\n- Ouyang et al., "Training language models to follow instructions with human feedback."\n\nIn short, degeneration arises from a mismatch between next-token likelihood and human preferences plus decoding choices; careful decoding, training objectives, and grounding help prevent it.",
          "type": "output_text",
          "logprobs": null
        }
      ],
      "role": "assistant",
      "status": "completed",
      "type": "message"
    }
  ],
  "parallel_tool_calls": true,
  "temperature": 1.0,
  "tool_choice": "auto",
  "tools": [],
  "top_p": 1.0,
  "background": false,
  "max_output_tokens": null,
  "max_tool_calls": null,
  "previous_response_id": null,
  "prompt": null,
  "prompt_cache_key": null,
  "reasoning": {
    "effort": "minimal",
    "generate_summary": null,
    "summary": "detailed"
  },
  "safety_identifier": null,
  "service_tier": "default",
  "status": "completed",
  "text": {
    "format": {
      "type": "text"
    }
  },
  "top_logprobs": null,
  "truncation": "disabled",
  "usage": {
    "input_tokens": 16,
    "input_tokens_details": {
      "cached_tokens": 0
    },
    "output_tokens": 657,
    "output_tokens_details": {
      "reasoning_tokens": 0
    },
    "total_tokens": 673
  },
  "user": null,
  "content_filters": null,
  "store": true
}

Note

Even when enabled, reasoning summaries aren't guaranteed to be generated for every step or request. This is expected behavior.
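Code that reads summaries should therefore tolerate an empty summary list, as in the sample output above. This is a minimal sketch operating on the response converted to a dictionary. The helper name reasoning_summaries is hypothetical, and the shape of a non-empty summary entry (an object with a text field) is an assumption, since the sample shows only an empty list.

```python
def reasoning_summaries(response: dict) -> list[str]:
    """Collect summary text from every reasoning item; may return []."""
    texts = []
    for item in response.get("output", []):
        if item.get("type") != "reasoning":
            continue
        for part in item.get("summary") or []:
            # Assumed shape: {"type": "summary_text", "text": "..."}
            text = part.get("text")
            if text:
                texts.append(text)
    return texts


# Trimmed from the sample output: a reasoning item with no summary.
sample = {
    "output": [
        {"summary": [], "type": "reasoning"},
        {"type": "message", "role": "assistant"},
    ]
}

summaries = reasoning_summaries(sample)
print(summaries if summaries else "No reasoning summary was returned")
```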

Python lark

GPT-5 series reasoning models can call a new custom_tool called lark_tool. This tool is based on Python lark and can be used for more flexible constraining of model output.
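To see what the grammar used in this section's requests accepts, its three terminals can be mirrored with Python's re module. This is only a stand-in for illustration: it doesn't use the lark package or call the model, and the function name matches_grammar is hypothetical.

```python
import re

# Same terminals as the Lark definition used in this section's requests:
#   QUESTION: /[^\n?]{1,200}\?/   NEWLINE: /\n/   ANSWER: /[^\n!]{1,200}!/
PATTERN = re.compile(r"[^\n?]{1,200}\?\n[^\n!]{1,200}!")


def matches_grammar(text: str) -> bool:
    """True if text is one question line followed by one answer line."""
    return PATTERN.fullmatch(text) is not None


print(matches_grammar("Which one is larger, 42 or 0?\n42 is larger!"))  # prints True
print(matches_grammar("42 is larger."))  # prints False
```

The grammar requires a question ending in "?" on the first line and an answer ending in "!" on the second, each at most 200 characters before the closing punctuation.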

Responses API

{
  "model": "gpt-5-2025-08-07",
  "input": "please calculate the area of a circle with radius equal to the number of 'r's in strawberry",
  "tools": [
    {
      "type": "custom",
      "name": "lark_tool",
      "format": {
        "type": "grammar",
        "syntax": "lark",
        "definition": "start: QUESTION NEWLINE ANSWER\nQUESTION: /[^\\n?]{1,200}\\?/\nNEWLINE: /\\n/\nANSWER: /[^\\n!]{1,200}!/"
      }
    }
  ],
  "tool_choice": "required"
}

Microsoft Entra ID:

from openai import OpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), "https://ai.azure.com/.default"
)

client = OpenAI(  
  base_url = "https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/",  
  api_key=token_provider,
)

response = client.responses.create(  
    model="gpt-5",  # replace with your model deployment name  
    tools=[  
        {  
            "type": "custom",
            "name": "lark_tool",
            "format": {
                "type": "grammar",
                "syntax": "lark",
                "definition": "start: QUESTION NEWLINE ANSWER\nQUESTION: /[^\\n?]{1,200}\\?/\nNEWLINE: /\\n/\nANSWER: /[^\\n!]{1,200}!/"
            }
        }  
    ],  
    input=[{"role": "user", "content": "Please calculate the area of a circle with radius equal to the number of 'r's in strawberry"}],  
)  

print(response.model_dump_json(indent=2))

API key:

import os
from openai import OpenAI

client = OpenAI(  
  base_url = "https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/",
  api_key=os.getenv("AZURE_OPENAI_API_KEY")  
)

response = client.responses.create(  
    model="gpt-5",  # replace with your model deployment name  
    tools=[  
        {  
            "type": "custom",
            "name": "lark_tool",
            "format": {
                "type": "grammar",
                "syntax": "lark",
                "definition": "start: QUESTION NEWLINE ANSWER\nQUESTION: /[^\\n?]{1,200}\\?/\nNEWLINE: /\\n/\nANSWER: /[^\\n!]{1,200}!/"
            }
        }  
    ],  
    input=[{"role": "user", "content": "Please calculate the area of a circle with radius equal to the number of 'r's in strawberry"}],  
)  

print(response.model_dump_json(indent=2))

Output:

{
  "id": "resp_689a0cf927408190b8875915747667ad01c936c6ffb9d0d3",
  "created_at": 1754926332.0,
  "error": null,
  "incomplete_details": null,
  "instructions": null,
  "metadata": {},
  "model": "gpt-5",
  "object": "response",
  "output": [
    {
      "id": "rs_689a0cfd1c888190a2a67057f471b5cc01c936c6ffb9d0d3",
      "summary": [],
      "type": "reasoning",
      "encrypted_content": null,
      "status": null
    },
    {
      "id": "msg_689a0d00e60c81908964e5e9b2d6eeb501c936c6ffb9d0d3",
      "content": [
        {
          "annotations": [],
          "text": "\"strawberry\" has 3 r's, so the radius is 3.\nArea = πr² = π × 3² = 9π ≈ 28.27 square units.",
          "type": "output_text",
          "logprobs": null
        }
      ],
      "role": "assistant",
      "status": "completed",
      "type": "message"
    }
  ],
  "parallel_tool_calls": true,
  "temperature": 1.0,
  "tool_choice": "auto",
  "tools": [
    {
      "name": "lark_tool",
      "parameters": null,
      "strict": null,
      "type": "custom",
      "description": null,
      "format": {
        "type": "grammar",
        "definition": "start: QUESTION NEWLINE ANSWER\nQUESTION: /[^\\n?]{1,200}\\?/\nNEWLINE: /\\n/\nANSWER: /[^\\n!]{1,200}!/",
        "syntax": "lark"
      }
    }
  ],
  "top_p": 1.0,
  "background": false,
  "max_output_tokens": null,
  "max_tool_calls": null,
  "previous_response_id": null,
  "prompt": null,
  "prompt_cache_key": null,
  "reasoning": {
    "effort": "medium",
    "generate_summary": null,
    "summary": null
  },
  "safety_identifier": null,
  "service_tier": "default",
  "status": "completed",
  "text": {
    "format": {
      "type": "text"
    }
  },
  "top_logprobs": null,
  "truncation": "disabled",
  "usage": {
    "input_tokens": 139,
    "input_tokens_details": {
      "cached_tokens": 0
    },
    "output_tokens": 240,
    "output_tokens_details": {
      "reasoning_tokens": 192
    },
    "total_tokens": 379
  },
  "user": null,
  "content_filters": null,
  "store": true
}

Chat Completions

{
  "messages": [
    {
      "role": "user",
      "content": "Which one is larger, 42 or 0?"
    }
  ],
  "tools": [
    {
      "type": "custom",
      "name": "custom_tool",
      "custom": {
        "name": "lark_tool",
        "format": {
          "type": "grammar",
          "grammar": {
            "syntax": "lark",
            "definition": "start: QUESTION NEWLINE ANSWER\nQUESTION: /[^\\n?]{1,200}\\?/\nNEWLINE: /\\n/\nANSWER: /[^\\n!]{1,200}!/"
          }
        }
      }
    }
  ],
  "tool_choice": "required",
  "model": "gpt-5-2025-08-07"
}

Availability

Region availability

Model	Region	Limited access
`gpt-chat-latest`	Global Standard: East US 2, Sweden Central, South Central US, Poland Central	No access request needed.
`gpt-5.5`	Model availability	No access request needed. A quota request is required depending on quota tier. Tier 5 and Tier 6 subscriptions have quota by default.
`gpt-5.4-mini`	Model availability	No access request needed.
`gpt-5.4-nano`	Model availability	No access request needed.
`gpt-5.4-pro`	Model availability	Access is no longer restricted for this model.
`gpt-5.4`	Model availability	Access is no longer restricted for this model.
`gpt-5.3-codex`	Model availability	Access is no longer restricted for this model.
`gpt-5.2-codex`	Model availability	Access is no longer restricted for this model.
`gpt-5.2`	Model availability	Access is no longer restricted for this model.
`gpt-5.1-codex-max`	Model availability	Access is no longer restricted for this model.
`gpt-5.1`	Model availability	Access is no longer restricted for this model.
`gpt-5.1-chat`	Model availability	No access request needed.
`gpt-5.1-codex`	Model availability	Access is no longer restricted for this model.
`gpt-5.1-codex-mini`	Model availability	No access request needed.
`gpt-5-pro`	Model availability	Access is no longer restricted for this model.
`gpt-5-codex`	Model availability	Access is no longer restricted for this model.
`gpt-5`	Model availability	Access is no longer restricted for this model.
`gpt-5-mini`	Model availability	No access request needed.
`gpt-5-nano`	Model availability	No access request needed.
`o3-pro`	Model availability	Access is no longer restricted for this model.
`codex-mini`	Model availability	No access request needed.
`o4-mini`	Model availability	Access is no longer restricted for this model.
`o3`	Model availability	Access is no longer restricted for this model.
`o3-mini`	Model availability	Access is no longer restricted for this model.
`o1`	Model availability	Access is no longer restricted for this model.

Feature	gpt-5.5, 2026-04-24	gpt-5.4-nano, 2026-03-17	gpt-5.4-mini, 2026-03-17	gpt-5.4-pro	gpt-5.4, 2026-03-05	gpt-5.3-codex, 2026-02-24	gpt-5.2-codex, 2026-01-14	gpt-5.2, 2025-12-11	gpt-5.1-codex-max, 2025-12-04	gpt-5.1, 2025-11-13	gpt-5.1-chat, 2025-11-13	gpt-5.1-codex, 2025-11-13	gpt-5.1-codex-mini, 2025-11-13	gpt-5-pro, 2025-10-06	gpt-5-codex, 2025-09-011	gpt-5, 2025-08-07	gpt-5-mini, 2025-08-07	gpt-5-nano, 2025-08-07
Developer messages	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅
Structured outputs	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅
Context window	1,050,000 Input: 922,000 Output: 128,000	400,000 Input: 272,000 Output: 128,000	400,000 Input: 272,000 Output: 128,000	1,050,000 Input: 922,000 Output: 128,000	1,050,000 Input: 922,000 Output: 128,000	400,000 Input: 272,000 Output: 128,000	400,000 Input: 272,000 Output: 128,000	400,000 Input: 272,000 Output: 128,000	400,000 Input: 272,000 Output: 128,000	400,000 Input: 272,000 Output: 128,000	128,000 Input: 111,616 Output: 16,384	400,000 Input: 272,000 Output: 128,000	400,000 Input: 272,000 Output: 128,000	400,000 Input: 272,000 Output: 128,000	400,000 Input: 272,000 Output: 128,000	400,000 Input: 272,000 Output: 128,000	400,000 Input: 272,000 Output: 128,000	400,000 Input: 272,000 Output: 128,000
Reasoning effort ⁷	✅	✅	✅	✅	✅	✅	✅	✅	✅ ⁶	✅ ⁴	✅	✅	✅	✅ ⁵	✅	✅	✅	✅
Image input	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅
Chat Completions API	✅	✅	✅	-	✅	-	-	✅	-	✅	✅	-	-	-	-	✅	✅	✅
Responses API	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅
Functions and tools	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅
Parallel tool calls ¹	✅	✅	✅	-	✅	✅	✅	✅	✅	✅	✅	✅	✅	-	✅	✅	✅	✅
`max_completion_tokens` ²	✅	✅	✅	-	✅	-	-	✅	-	✅	✅	-	-	-	-	✅	✅	✅
System messages ³	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅
Reasoning summary	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅
Streaming	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅	-	✅	✅	✅	✅

¹ Parallel tool calls aren't supported when reasoning_effort is set to minimal.

² Reasoning models only work with the max_completion_tokens parameter when you use the Chat Completions API. Use max_output_tokens with the Responses API.

³ The latest reasoning models support system messages to make migration easier. Don't use both a developer message and a system message in the same API request.

⁴ For gpt-5.1, reasoning_effort defaults to none. When you upgrade from earlier reasoning models to gpt-5.1, you might need to update your code to pass a reasoning_effort level explicitly if you want reasoning to occur.

⁵ gpt-5-pro only supports a reasoning_effort of high. This is the default value even when it isn't passed explicitly to the model.

⁶ gpt-5.1-codex-max adds support for a new reasoning_effort level, xhigh, which is the highest level that reasoning effort can be set to.

⁷ gpt-5.2, gpt-5.1, gpt-5.1-codex, gpt-5.1-codex-max, and gpt-5.1-codex-mini support none as a value for the reasoning_effort parameter. To use these models to generate responses without reasoning, set reasoning_effort to none. This setting can increase speed.
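
The first three notes above can be turned into a small pre-flight check. This is a minimal sketch under stated assumptions, not an SDK feature: the function names `token_limit_param` and `check_request` and the API labels `"chat_completions"` and `"responses"` are hypothetical and exist only for illustration.

```python
def token_limit_param(api: str) -> str:
    """Return the output-token limit parameter name for the given API."""
    if api == "chat_completions":
        return "max_completion_tokens"
    if api == "responses":
        return "max_output_tokens"
    raise ValueError(f"unknown API label: {api!r}")


def check_request(messages: list, reasoning_effort=None,
                  parallel_tool_calls: bool = False) -> list:
    """Return a list of conflicts with notes 1 and 3 (empty if none)."""
    problems = []
    roles = {m["role"] for m in messages}
    # Note 3: a developer message and a system message must not be combined.
    if {"system", "developer"} <= roles:
        problems.append("use a developer message or a system message, not both")
    # Note 1: parallel tool calls are not supported with minimal effort.
    if reasoning_effort == "minimal" and parallel_tool_calls:
        problems.append("parallel tool calls are not supported with minimal")
    return problems


messages = [
    {"role": "developer", "content": "Answer briefly."},
    {"role": "user", "content": "Which one is larger, 42 or 0?"},
]
print(token_limit_param("responses"))
print(check_request(messages, reasoning_effort="minimal", parallel_tool_calls=True))
```

Running checks like these before sending a request gives a clearer error than a rejected API call, although the service remains the source of truth.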

New GPT-5 reasoning features

Feature	Description
`reasoning_effort`	`xhigh` is only supported with `gpt-5.1-codex-max`. `minimal` is only supported with the original GPT-5 reasoning models and isn't supported with `gpt-5.1` or later.* Options: `none`, `minimal`, `low`, `medium`, `high`, `xhigh`
`verbosity`	A new parameter that provides more granular control over how concise the model's output is. Options: `low`, `medium`, `high`.
`preamble`	GPT-5 series reasoning models can spend extra time "thinking" before executing a function or tool call. When this planning occurs, the model can provide insight into its planning steps in the response through a new object called the `preamble` object. Preambles aren't guaranteed to be generated, although you can encourage the model by using the `instructions` parameter and passing content like "You MUST plan extensively before each function call. ALWAYS output your plan to the user before calling any function."
allowed tools	You can specify multiple tools under `tool_choice` instead of just one.
custom tool type	Enables raw text (non-JSON) outputs.
`lark_tool`	Lets you use some of the capabilities of Python lark for more flexible constraining of model responses.

* gpt-5-codex also doesn't support a reasoning_effort of minimal.
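
The reasoning_effort rules stated in this section can be sketched as a lookup. This is a hedged illustration, not an official validation routine: the function name `effort_problem` is hypothetical, and the set of models treated as "the original GPT-5 reasoning models" is an assumption. The gpt-5-pro rule comes from the feature table notes earlier on this page. The sketch doesn't try to model which models accept none.

```python
from typing import Optional

VALID_EFFORTS = ("none", "minimal", "low", "medium", "high", "xhigh")

# Assumption: the "original GPT-5 reasoning models" that accept minimal are
# taken to be these three. gpt-5-codex is excluded per the note above.
MINIMAL_MODELS = {"gpt-5", "gpt-5-mini", "gpt-5-nano"}


def effort_problem(model: str, effort: str) -> Optional[str]:
    """Return a description of the conflict, or None if no listed rule applies."""
    if effort not in VALID_EFFORTS:
        return f"unknown reasoning_effort: {effort!r}"
    if model == "gpt-5-pro" and effort != "high":
        return "gpt-5-pro only supports high"
    if effort == "xhigh" and model != "gpt-5.1-codex-max":
        return "xhigh is only supported with gpt-5.1-codex-max"
    if effort == "minimal" and model not in MINIMAL_MODELS:
        return "minimal is only supported with the original GPT-5 models"
    return None


print(effort_problem("gpt-5.1-codex-max", "xhigh"))
print(effort_problem("gpt-5.1", "minimal"))
```

The first call prints `None` because the combination is allowed; the second prints a message because minimal isn't supported with gpt-5.1 or later.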

For more information, we recommend reading OpenAI's GPT-5 prompting cookbook guide and its GPT-5 feature guide.

Feature	codex-mini, 2025-05-16	o3-pro, 2025-06-10	o4-mini, 2025-04-16	o3, 2025-04-16	o3-mini, 2025-01-31	o1, 2024-12-17
Developer messages	✅	✅	✅	✅	✅	✅
Structured outputs	✅	✅	✅	✅	✅	✅
Context window	Input: 200,000 Output: 100,000	Input: 200,000 Output: 100,000	Input: 200,000 Output: 100,000	Input: 200,000 Output: 100,000	Input: 200,000 Output: 100,000	Input: 200,000 Output: 100,000
Reasoning effort	✅	✅	✅	✅	✅	✅
Image input	✅	✅	✅	✅	-	✅
Chat Completions API	-	-	✅	✅	✅	✅
Responses API	✅	✅	✅	✅	✅	✅
Functions and tools	✅	✅	✅	✅	✅	✅
Parallel tool calls	-	-	-	-	-	-
`max_completion_tokens` ¹	✅	✅	✅	✅	✅	✅
System messages ²	✅	✅	✅	✅	✅	✅
Reasoning summary	✅	-	✅	✅	-	-
Streaming ³	✅	-	✅	✅	✅	-

¹ Reasoning models only work with the max_completion_tokens parameter when you use the Chat Completions API. Use max_output_tokens with the Responses API.

² The latest o* series models support system messages to make migration easier. When you use a system message with o4-mini, o3, o3-mini, and o1, it's treated as a developer message. Don't use both a developer message and a system message in the same API request.

³ Streaming for o3 is limited access only.

Note

To avoid timeouts, background mode is recommended for o3-pro.
o3-pro doesn't currently support image generation.

Not supported

The following parameters are currently unsupported with reasoning models:

temperature, top_p, presence_penalty, frequency_penalty, logprobs, top_logprobs, logit_bias, max_tokens
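
When you migrate existing request code, it can help to flag these parameters before a call is made. The sketch below is one possible approach, not an SDK feature; the helper name `split_unsupported` is hypothetical, and it only inspects a plain dictionary of request parameters.

```python
# Parameters listed above as unsupported with reasoning models.
UNSUPPORTED = (
    "temperature", "top_p", "presence_penalty", "frequency_penalty",
    "logprobs", "top_logprobs", "logit_bias", "max_tokens",
)


def split_unsupported(params: dict) -> tuple:
    """Return (kept, dropped): params without unsupported keys, and those keys."""
    kept = {k: v for k, v in params.items() if k not in UNSUPPORTED}
    dropped = [k for k in params if k in UNSUPPORTED]
    return kept, dropped


legacy_params = {
    "model": "o4-mini",
    "temperature": 0.2,
    "max_tokens": 500,
    "max_completion_tokens": 500,
}
kept, dropped = split_unsupported(legacy_params)
print(kept)
print(dropped)
```

Here `temperature` and `max_tokens` are reported as dropped, while `max_completion_tokens` is kept because it's the supported token limit for the Chat Completions API.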

Markdown output

By default, the o3-mini and o1 models don't attempt to produce output that includes markdown formatting. A common use case where this behavior is undesirable is when you want the model to output code contained within a markdown code block. When the model generates output without markdown formatting, you lose features like syntax highlighting and copyable code blocks in interactive playground experiences. To override this new default behavior and encourage markdown inclusion in model responses, add the string Formatting re-enabled to the beginning of your developer message.

Adding Formatting re-enabled to the beginning of your developer message doesn't guarantee that the model includes markdown formatting in its response; it only increases the likelihood. Internal testing shows that Formatting re-enabled is less effective by itself with the o1 model than with o3-mini.

To improve the performance of Formatting re-enabled, you can further augment the beginning of the developer message, which often results in the desired output. Rather than just adding Formatting re-enabled to the beginning of your developer message, you can experiment with a more descriptive initial instruction, like one of the following examples:

Formatting re-enabled - please enclose code blocks with appropriate markdown tags.
Formatting re-enabled - code output should be wrapped in markdown.

Depending on your expected output, you might need to customize your initial developer message further to target your specific use case.
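
As a minimal sketch of the approach described above, the helper below prepends the string to a developer message. The function name `with_markdown_hint` and the sample message contents are hypothetical; only the Formatting re-enabled string and the example suffix come from this article.

```python
PREFIX = "Formatting re-enabled"


def with_markdown_hint(developer_message: str, detail: str = "") -> str:
    """Put the formatting hint on the first line of the developer message."""
    first_line = PREFIX if not detail else f"{PREFIX} - {detail}"
    return f"{first_line}\n{developer_message}"


messages = [
    {
        "role": "developer",
        "content": with_markdown_hint(
            "You are a coding assistant.",
            "code output should be wrapped in markdown.",
        ),
    },
    {"role": "user", "content": "Write a function that reverses a string."},
]
print(messages[0]["content"])
```

The resulting list can be passed as the messages of a request. As noted above, the hint raises the likelihood of markdown output but doesn't guarantee it.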


Last updated on 2026-06-01
