
Refactor: get signer/config from sync client's base_client #147

Merged
YouNeedCryDear merged 4 commits into oracle:main from fede-kamel:feature/async-support on Mar 2, 2026

Conversation

@fede-kamel (Contributor) commented Feb 26, 2026

Summary

Addresses review feedback from PR #124, improving async support with proper thread-safety and cleaner architecture.

Why Async Support Matters

Without native async, LangChain falls back to asyncio.to_thread(), which wraps each sync call in a thread pool. This works, but it adds per-call thread overhead and doesn't scale well.

With native async:

  • Multiple LLM calls run concurrently instead of sequentially (10 requests that take ~30s one-by-one finish in ~3s; a toy sketch follows this list)
  • In FastAPI/aiohttp servers, await llm.ainvoke() doesn't block the event loop
  • Streaming with async for chunk in llm.astream() delivers tokens without thread overhead
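
As a toy illustration of that 30s-vs-3s claim (fake_llm_call below is a stand-in coroutine, not the real client):

```python
import asyncio
import time

async def fake_llm_call(i: int) -> str:
    """Stand-in for a native-async LLM request taking ~3s."""
    await asyncio.sleep(3)  # the event loop waits; no thread is blocked
    return f"response {i}"

async def main() -> None:
    t0 = time.perf_counter()
    for i in range(10):  # sequential: each call waits for the previous one
        await fake_llm_call(i)
    print(f"sequential: {time.perf_counter() - t0:.0f}s")  # ~30s

    t0 = time.perf_counter()
    await asyncio.gather(*[fake_llm_call(i) for i in range(10)])
    print(f"concurrent: {time.perf_counter() - t0:.0f}s")  # ~3s

asyncio.run(main())
```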

Changes

  • Added asyncio.Lock with double-checked locking to prevent race conditions when multiple coroutines initialize the async client
  • Access signer/config directly from self.client.base_client instead of storing redundant attributes
  • OCIAsyncClient now implements __aenter__/__aexit__ for proper resource cleanup
  • Added a helper to detect the API version from the OCI SDK before falling back to a known version (both of these are sketched after this list)
  • Using LangChain's UsageMetadata directly instead of custom class
  • Fixed async integration tests (env var consistency, updated default model)
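
A minimal sketch of the context-manager and version-detection bullets above. Only the __aenter__/__aexit__ contract and the helper's purpose come from this PR; the session handling, attribute names, and fallback value are assumptions:

```python
class OCIAsyncClient:
    """Sketch of the async wrapper; real internals live in async_support.py."""

    def __init__(self, service_endpoint: str, signer=None, config=None) -> None:
        self.service_endpoint = service_endpoint
        self.signer = signer
        self.config = config or {}
        self._session = None  # hypothetical lazily-created HTTP session

    async def __aenter__(self) -> "OCIAsyncClient":
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        await self.close()

    async def close(self) -> None:
        if self._session is not None:
            await self._session.close()
            self._session = None

def _get_oci_genai_api_version() -> str:
    """Read the API version from the installed OCI SDK when possible,
    otherwise fall back to a known version (fallback value assumed here)."""
    try:
        import oci.generative_ai_inference as genai
        return getattr(genai, "API_VERSION", "20231130")  # attribute name assumed
    except ImportError:
        return "20231130"
```

With the context manager in place, `async with OCIAsyncClient(...) as client:` guarantees the underlying session is closed even when a request raises.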

Tested Models

| Model | Tests | Time |
| --- | --- | --- |
| meta.llama-3.3-70b-instruct | 8/8 ✓ | ~12s |
| meta.llama-4-maverick-17b-128e-instruct | 8/8 ✓ | ~15s |
| cohere.command-r-plus-08-2024 | 8/8 ✓ | ~14s |
| google.gemini-2.5-flash | 8/8 ✓ | ~11s |
| xai.grok-2-1212 | 8/8 ✓ | ~13s |
| openai.gpt-4o | 8/8 ✓ | ~11s |
| openai.gpt-4o-mini | 8/8 ✓ | ~10s |
| openai.gpt-5 | 8/8 ✓ | ~125s |

SDK Migration

Once the OCI SDK adds native async support, replacing OCIAsyncClient is straightforward:

```python
# Current
self._async_client = OCIAsyncClient(
    service_endpoint=self.service_endpoint,
    signer=base_client.signer,
    config=getattr(base_client, "config", {}),
)

# Future (when SDK supports async)
self._async_client = GenerativeAiInferenceAsyncClient(
    config=self.client.base_client.config,
    signer=self.client.base_client.signer,
    service_endpoint=self.service_endpoint,
)
```

The mixin architecture keeps async logic isolated, so migration is a swap.
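
To make "a swap" concrete, here is a minimal sketch of that isolation. The mixin name is assumed; the cached_property shape matches the final implementation shown later in this thread:

```python
from functools import cached_property

class AsyncClientMixin:
    """Sketch: all async plumbing funnels through one construction point."""

    @cached_property
    def _async_client(self):
        # Single swap point: once the SDK ships an async client, replace
        # OCIAsyncClient (the wrapper sketched above) here and nothing
        # else in the mixin changes.
        base_client = self.client.base_client
        return OCIAsyncClient(
            service_endpoint=self.service_endpoint,
            signer=base_client.signer,
            config=getattr(base_client, "config", {}),
        )
```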

The oracle-contributor-agreement bot added the OCA Verified label (all contributors have signed the Oracle Contributor Agreement) on Feb 26, 2026
fede-kamel marked this pull request as ready for review on February 26, 2026 02:12
fede-kamel force-pushed the feature/async-support branch 2 times, most recently from 34a3b1b to 3b92588, on February 26, 2026 02:27
- Remove oci_signer and oci_config attributes from OCIGenAIBase;
  access signer/config directly from self.client.base_client
- Use LangChain's UsageMetadata directly instead of OCIUtils wrapper
- Add asyncio.Lock for thread-safe async client initialization
- Add __aenter__ and __aexit__ to OCIAsyncClient for context manager support
- Add _get_oci_genai_api_version() helper for SDK version detection
fede-kamel force-pushed the feature/async-support branch 2 times, most recently from a10e107 to 98460c7, on February 26, 2026 05:25
- Fix import block sorting in async_support.py (ruff I001)
- Fix UsageMetadata test to not use isinstance with TypedDict
fede-kamel force-pushed the feature/async-support branch from 98460c7 to 89a0a35 on February 26, 2026 05:29
@fede-kamel (Contributor, Author) commented Feb 26, 2026

Usage Example

```python
import asyncio

from langchain_oci import ChatOCIGenAI

llm = ChatOCIGenAI(
    model_id="meta.llama-3.3-70b-instruct",
    compartment_id="ocid1.compartment...",
    service_endpoint="https://inference.generativeai.us-chicago-1.oci.oraclecloud.com",
    auth_type="API_KEY",
)

async def main():
    # Single async call
    response = await llm.ainvoke("What is the capital of France?")
    print(response.content)

    # Async streaming
    async for chunk in llm.astream("Write a haiku about clouds"):
        print(chunk.content, end="", flush=True)

    # Concurrent requests (this is where async shines)
    questions = [
        "What is 2+2?",
        "What is the capital of Japan?",
        "Who wrote Hamlet?",
    ]
    responses = await asyncio.gather(*[llm.ainvoke(q) for q in questions])
    for q, r in zip(questions, responses):
        print(f"{q} -> {r.content}")

    # Clean up the async client
    await llm.aclose()

asyncio.run(main())
```

The concurrent example is the key win - those 3 requests run in parallel instead of waiting for each one sequentially.

YouNeedCryDear self-requested a review on February 26, 2026 17:41
@YouNeedCryDear (Member) commented on this diff:

```python
else:
    chat_request = self._provider.oci_chat_request(**chat_params)  # type: ignore[attr-defined]

return {
```

My suggestion is to use _prepare_request inside, then convert the return type using to_dict.

Address review feedback by calling _prepare_request internally
and converting OCI model objects to dicts using to_dict, instead
of duplicating the request preparation logic.
@fede-kamel (Contributor, Author) commented:

@YouNeedCryDear Addressed your feedback - now using _prepare_request internally and converting to dict with to_dict.
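
A rough sketch of the refactored shape (_prepare_request and the to_dict conversion come from the review; the wrapper's name and signature, and the use of oci.util.to_dict specifically, are assumptions):

```python
from oci.util import to_dict  # serializes OCI SDK model objects to plain dicts

def _prepare_async_request(self, messages, stop=None, **kwargs) -> dict:
    """Reuse the sync path's request builder, then convert the resulting
    OCI model objects into a JSON-ready dict for the async client."""
    request = self._prepare_request(messages, stop=stop, **kwargs)
    return to_dict(request)
```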

Tested with OCI GenAI:

| Model | Tests | Time |
| --- | --- | --- |
| meta.llama-3.3-70b-instruct | 8/8 ✓ | ~9s |
| google.gemini-2.5-flash | 8/8 ✓ | ~27s |

Ready for re-review when you have a chance. Thanks!

@YouNeedCryDear (Member) commented:

I think the async client getter and lock implementation is still not optimal, which might introduce a race condition. Could you please refer to what langchain-aws is doing? Either use threading.Lock or use Pydantic post-init? @fede-kamel

Replace asyncio.Lock-based lazy initialization with @cached_property,
following the pattern used by langchain-anthropic and langchain-aws.

The previous implementation had a potential race condition where the
lock creation itself could race (two coroutines seeing None and each
creating their own lock). While unlikely in practice, using
@cached_property is cleaner and avoids this class of issues entirely.

Changes:
- Replace _get_async_client_lock() and _get_async_client() with
  @cached_property _async_client
- Update aclose() to delete from __dict__ for proper cache invalidation
- Update tests to use property access instead of await
@fede-kamel (Contributor, Author) commented:

@YouNeedCryDear Good catch on the race condition - you're right that the lock creation itself could race:

```python
def _get_async_client_lock(self) -> asyncio.Lock:
    if self._async_client_lock is None:           # Coroutine A checks, sees None
        self._async_client_lock = asyncio.Lock()  # Coroutine B also sees None
    return self._async_client_lock                # They return different locks!
```

While this would be unlikely to occur in practice (requires two async calls hitting a brand-new instance simultaneously), you're absolutely right that it's better to eliminate this class of issues entirely.

I've switched to using @cached_property, following the pattern used by langchain-anthropic and langchain-aws:

```python
@cached_property
def _async_client(self) -> OCIAsyncClient:
    """Get the async client, creating it on first access."""
    base_client = self.client.base_client
    return OCIAsyncClient(
        service_endpoint=self.service_endpoint,
        signer=base_client.signer,
        config=getattr(base_client, "config", {}),
    )
```

This is cleaner, more idiomatic, and cached_property handles the thread-safety concerns. Tests updated and passing.
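
For completeness, the aclose() change mentioned in the commit message above might look like this (a sketch, not the actual diff; cached_property stores its value in the instance __dict__, so popping the key invalidates the cache):

```python
async def aclose(self) -> None:
    """Close the async client and drop the cached value so the next
    access through the cached_property recreates it."""
    client = self.__dict__.pop("_async_client", None)
    if client is not None:
        await client.close()  # close() assumed on OCIAsyncClient
```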

fede-kamel added a commit to fede-kamel/langchain-oracle that referenced this pull request Feb 27, 2026
Changes:
- Remove Async Support from CHANGELOG 0.2.0 (in separate PR oracle#147)
- Fix duplicate gemini-2.5-flash row, add gemini-2.5-flash-lite
- Add OpenAI column to feature matrix in MODELS.md
- Add note that feature matrix reflects tested capabilities
- Fix top_k range to note it's model-dependent (not 1-500 for all)
- Add ChatOCIOpenAI to provider table in Tutorial 09
- Fix OpenAI Vision capability (GPT-4o supports vision)
@YouNeedCryDear YouNeedCryDear merged commit 2908166 into oracle:main Mar 2, 2026
12 checks passed
fede-kamel added a commit to fede-kamel/langchain-oracle that referenced this pull request Mar 19, 2026
Remove changes that inadvertently reverted PR oracle#147 refactoring:
- Restore cached_property async client init in async_mixin.py
- Restore dynamic API version detection in async_support.py
- Restore __aenter__/__aexit__ on OCIAsyncClient
- Remove oci_signer/oci_config from OCIGenAIBase
- Restore test files to main state