Refactor: get signer/config from sync client's base_client (#147)
YouNeedCryDear merged 4 commits into oracle:main
Conversation
Force-pushed from 34a3b1b to 3b92588
- Remove oci_signer and oci_config attributes from OCIGenAIBase; access signer/config directly from self.client.base_client
- Use LangChain's UsageMetadata directly instead of OCIUtils wrapper
- Add asyncio.Lock for thread-safe async client initialization
- Add __aenter__ and __aexit__ to OCIAsyncClient for context manager support
- Add _get_oci_genai_api_version() helper for SDK version detection
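A minimal sketch of the access pattern this commit describes, reading the signer and config off the sync client's `base_client` instead of storing them as separate attributes. The class and property names here are illustrative, not the exact implementation:

```python
class SignerConfigAccess:
    """Illustrative only: read auth state from the sync client's base_client."""

    def __init__(self, client):
        # `client` is the sync OCI GenAI inference client the model already holds.
        self.client = client

    @property
    def _oci_signer(self):
        # The OCI SDK's BaseClient keeps the signer it was constructed with.
        return self.client.base_client.signer

    @property
    def _oci_config(self):
        # Some auth flows (e.g. instance principals) carry no config dict.
        return getattr(self.client.base_client, "config", {})
```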
Force-pushed from a10e107 to 98460c7
- Fix import block sorting in async_support.py (ruff I001)
- Fix UsageMetadata test to not use isinstance with TypedDict
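For context on the second fix: `UsageMetadata` is a TypedDict, and TypedDicts reject runtime `isinstance()` checks, so a test has to assert on shape instead. A hedged sketch of that style of check; the exact assertions in the PR may differ:

```python
from langchain_core.messages.ai import UsageMetadata

usage: UsageMetadata = {"input_tokens": 10, "output_tokens": 5, "total_tokens": 15}

# isinstance(usage, UsageMetadata) raises TypeError at runtime, because
# TypedDict subclasses don't support instance checks; assert on shape instead.
assert isinstance(usage, dict)
assert {"input_tokens", "output_tokens", "total_tokens"}.issubset(usage)
```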
Force-pushed from 98460c7 to 89a0a35
Usage Example

```python
import asyncio

from langchain_oci import ChatOCIGenAI

llm = ChatOCIGenAI(
    model_id="meta.llama-3.3-70b-instruct",
    compartment_id="ocid1.compartment...",
    service_endpoint="https://inference.generativeai.us-chicago-1.oci.oraclecloud.com",
    auth_type="API_KEY",
)

async def main():
    # Single async call
    response = await llm.ainvoke("What is the capital of France?")
    print(response.content)

    # Async streaming
    async for chunk in llm.astream("Write a haiku about clouds"):
        print(chunk.content, end="", flush=True)

    # Concurrent requests (this is where async shines)
    questions = [
        "What is 2+2?",
        "What is the capital of Japan?",
        "Who wrote Hamlet?",
    ]
    responses = await asyncio.gather(*[llm.ainvoke(q) for q in questions])
    for q, r in zip(questions, responses):
        print(f"{q} -> {r.content}")

    # Clean up the async client
    await llm.aclose()

asyncio.run(main())
```

The concurrent example is the key win: those three requests run in parallel instead of waiting for each one sequentially.
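A rough way to see that win, reusing `llm`, `questions`, and `asyncio` from the example above; the timings printed are illustrative, not measurements from the PR:

```python
import time

async def compare():
    # Sequential: each call waits for the previous one to finish.
    start = time.perf_counter()
    for q in questions:
        await llm.ainvoke(q)
    sequential = time.perf_counter() - start

    # Concurrent: all three requests are in flight at the same time.
    start = time.perf_counter()
    await asyncio.gather(*[llm.ainvoke(q) for q in questions])
    concurrent = time.perf_counter() - start

    print(f"sequential: {sequential:.1f}s  concurrent: {concurrent:.1f}s")
```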
The review comment below is anchored on this hunk of the diff:

```python
else:
    chat_request = self._provider.oci_chat_request(**chat_params)  # type: ignore[attr-defined]

return {
```
My suggestion is to use `_prepare_request` internally, then convert the return value using `to_dict`.
Address review feedback by calling _prepare_request internally and converting OCI model objects to dicts using to_dict, instead of duplicating the request preparation logic.
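A hedged sketch of what that reuse might look like. The method name `_construct_request_dict` and the `_prepare_request` parameters are assumptions based on the thread, not the PR's exact code; `oci.util.to_dict` is the OCI SDK helper for serializing model objects:

```python
import oci.util

def _construct_request_dict(self, messages, stop=None, stream=False, **kwargs):
    # Reuse the existing sync request builder rather than duplicating it,
    # then serialize the OCI model object to a plain dict for the async path.
    oci_request = self._prepare_request(messages, stop=stop, stream=stream, **kwargs)
    return oci.util.to_dict(oci_request)
```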
@YouNeedCryDear Addressed your feedback: the code now calls `_prepare_request` internally and converts the OCI model objects to dicts with `to_dict`, instead of duplicating the request preparation logic. Tested with OCI GenAI.

Ready for re-review when you have a chance. Thanks!
I think the async client getter and lock implementation is still not optimal and might introduce a race condition. Could you please refer to what langchain-aws is doing? Either use threading.Lock or use a pydantic post-init? @fede-kamel
Replace asyncio.Lock-based lazy initialization with @cached_property, following the pattern used by langchain-anthropic and langchain-aws.

The previous implementation had a potential race condition where the lock creation itself could race (two coroutines seeing None and each creating their own lock). While unlikely in practice, using @cached_property is cleaner and avoids this class of issues entirely.

Changes:
- Replace _get_async_client_lock() and _get_async_client() with @cached_property _async_client
- Update aclose() to delete from __dict__ for proper cache invalidation
- Update tests to use property access instead of await
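A sketch of the aclose() cache invalidation the commit describes, assuming the cached property is named `_async_client` as in the diff below; the `client.close()` call is an assumption about the async client's cleanup method:

```python
async def aclose(self) -> None:
    # functools.cached_property stores its value in the instance __dict__,
    # so popping that entry both invalidates the cache and hands us the
    # client (if one was ever created) to close.
    client = self.__dict__.pop("_async_client", None)
    if client is not None:
        await client.close()
```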
@YouNeedCryDear Good catch on the race condition - you're right that the lock creation itself could race:

```python
def _get_async_client_lock(self) -> asyncio.Lock:
    if self._async_client_lock is None:           # Coroutine A checks, sees None
        self._async_client_lock = asyncio.Lock()  # Coroutine B also sees None
    return self._async_client_lock                # They return different locks!
```

While this would be unlikely to occur in practice (it requires two async calls hitting a brand-new instance simultaneously), you're absolutely right that it's better to eliminate this class of issues entirely. I've switched to @cached_property:

```python
@cached_property
def _async_client(self) -> OCIAsyncClient:
    """Get the async client, creating it on first access."""
    base_client = self.client.base_client
    return OCIAsyncClient(
        service_endpoint=self.service_endpoint,
        signer=base_client.signer,
        config=getattr(base_client, "config", {}),
    )
```

This is cleaner, more idiomatic, and matches the pattern langchain-anthropic and langchain-aws use.
Changes:
- Remove Async Support from CHANGELOG 0.2.0 (in separate PR oracle#147)
- Fix duplicate gemini-2.5-flash row, add gemini-2.5-flash-lite
- Add OpenAI column to feature matrix in MODELS.md
- Add note that feature matrix reflects tested capabilities
- Fix top_k range to note it's model-dependent (not 1-500 for all)
- Add ChatOCIOpenAI to provider table in Tutorial 09
- Fix OpenAI Vision capability (GPT-4o supports vision)
Remove changes that inadvertently reverted PR oracle#147 refactoring:
- Restore cached_property async client init in async_mixin.py
- Restore dynamic API version detection in async_support.py
- Restore __aenter__/__aexit__ on OCIAsyncClient
- Remove oci_signer/oci_config from OCIGenAIBase
- Restore test files to main state
Summary
Addresses review feedback from PR #124, improving async support with proper thread-safety and cleaner architecture.
Why Async Support Matters
Without native async, LangChain falls back to `asyncio.to_thread()`, which wraps sync calls in a thread pool. This works but has overhead and doesn't scale well (see the sketch after this list). With native async:

- `await llm.ainvoke()` doesn't block the event loop
- `async for chunk in llm.astream()` delivers tokens without thread overhead
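For contrast, a rough sketch of the fallback path when a model has no native async method: the sync call is pushed onto a worker thread (shown here with `asyncio.to_thread` for simplicity; LangChain's exact mechanism may differ, and `blocking_invoke` is a stand-in, not real SDK code):

```python
import asyncio
import time

def blocking_invoke(prompt: str) -> str:
    # Stand-in for a synchronous SDK call that holds a thread during network I/O.
    time.sleep(1)
    return f"answer to {prompt!r}"

async def fallback_ainvoke(prompt: str) -> str:
    # Without a native async client, each call still occupies a worker thread;
    # concurrency is bounded by the thread pool, not the event loop.
    return await asyncio.to_thread(blocking_invoke, prompt)

async def main():
    print(await asyncio.gather(*[fallback_ainvoke(p) for p in ("a", "b", "c")]))

asyncio.run(main())
```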
Changes

- `asyncio.Lock` with double-checked locking to prevent race conditions when multiple coroutines initialize the async client
- Signer/config are read from `self.client.base_client` instead of storing redundant attributes
- `OCIAsyncClient` now implements `__aenter__`/`__aexit__` for proper resource cleanup (see the sketch after this list)
- LangChain's `UsageMetadata` is used directly instead of a custom class
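A minimal sketch of that context-manager surface; the class name and what `close()` does internally are assumptions, not the PR's code:

```python
class AsyncClientShape:
    """Illustrative async context manager mirroring the __aenter__/__aexit__ addition."""

    async def close(self) -> None:
        # Release the underlying HTTP session / connection pool.
        ...

    async def __aenter__(self) -> "AsyncClientShape":
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        await self.close()

# Usage:
#     async with AsyncClientShape() as client:
#         ...  # close() runs even if the body raises
```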
Tested Models

- meta.llama-3.3-70b-instruct
- meta.llama-4-maverick-17b-128e-instruct
- cohere.command-r-plus-08-2024
- google.gemini-2.5-flash
- xai.grok-2-1212
- openai.gpt-4o
- openai.gpt-4o-mini
- openai.gpt-5
SDK Migration

Once the OCI SDK adds native async support, replacing `OCIAsyncClient` is straightforward: the mixin architecture keeps async logic isolated, so migration is a swap.
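A hypothetical sketch of what that swap could look like. The import path is assumed from the PR's file names, the factory function is invented, and `GenerativeAiInferenceAsyncClient` does not exist in the OCI SDK today:

```python
from langchain_oci.chat_models.async_support import OCIAsyncClient  # module path assumed

def build_async_client(base_client, service_endpoint: str):
    # Today: the hand-rolled async client introduced in this PR.
    return OCIAsyncClient(
        service_endpoint=service_endpoint,
        signer=base_client.signer,
        config=getattr(base_client, "config", {}),
    )
    # Future (hypothetical name, once the OCI SDK ships native async support):
    # return oci.generative_ai_inference.GenerativeAiInferenceAsyncClient(
    #     config, signer=base_client.signer, service_endpoint=service_endpoint)
```

Because construction happens in one place, only this factory would change.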