-
Notifications
You must be signed in to change notification settings - Fork 76
feat: async search #181
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat: async search #181
Conversation
Summary of ChangesHello @ChenZiHong-Gavin, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the search capabilities by introducing asynchronous operations across the search infrastructure. It establishes an asynchronous contract for searchers, provides a dedicated HTTP client for efficient and rate-limited async requests, and integrates these changes into the InterPro search implementation and the core search service. This refactoring aims to improve performance and responsiveness by allowing concurrent execution of search queries. Highlights
Changelog
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces asynchronous search capabilities, enhancing performance by adding a new HTTPClient for managing asynchronous HTTP requests with rate limiting and retries, and refactoring InterProSearch to use async/await. No security vulnerabilities were found. However, there are a few suggestions for improvement: consider running network requests concurrently in InterProSearch for better performance, update HTTPClient to use a more modern asyncio API, and clean up dead code in SearchService.
| for result in data.get("results", []): | ||
| interpro_acc = result.get("metadata", {}).get("accession") | ||
| if interpro_acc: | ||
| entry_details = self.get_entry_details(interpro_acc) | ||
| entry_details = await self.get_entry_details(interpro_acc) | ||
| if entry_details: | ||
| result["entry_details"] = entry_details |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The await inside this for-loop causes the API calls to get_entry_details to be executed sequentially. This can be a significant performance bottleneck as each API call will wait for the previous one to complete. To take full advantage of asyncio for I/O-bound operations, these calls should be made concurrently.
You can refactor this part to use asyncio.gather to perform the requests in parallel. You'll also need to add import asyncio at the top of the file.
Example refactoring to replace lines 76-81:
results = data.get("results", [])
async def fetch_and_update(result):
interpro_acc = result.get("metadata", {}).get("accession")
if interpro_acc:
entry_details = await self.get_entry_details(interpro_acc)
if entry_details:
result["entry_details"] = entry_details
tasks = [fetch_and_update(r) for r in results]
if tasks:
await asyncio.gather(*tasks)| if self._executor is None: | ||
| self._executor = ThreadPoolExecutor(max_workers=self._max_workers) | ||
|
|
||
| loop = asyncio.get_event_loop() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
asyncio.get_event_loop() is deprecated since Python 3.10 and its usage is discouraged. It's recommended to use asyncio.get_running_loop() instead, which is the modern and safer way to get the current event loop in an async context.
| loop = asyncio.get_event_loop() | |
| loop = asyncio.get_running_loop() |
| valid_seeds.append(seed) | ||
| queries.append(query) | ||
|
|
||
| if not queries: | ||
| return [], {} | ||
|
|
||
| # Perform concurrent searches | ||
| results = run_concurrent( | ||
| partial( | ||
| self._perform_search, | ||
| searcher_obj=self.searcher, | ||
| data_source=self.data_source, | ||
| ), | ||
| seed_data, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This refactoring is a good improvement. As a result of this change, the _perform_search method is no longer used and can be removed to clean up the code. Additionally, the now-unused _perform_search method contains a bug where it calls the new async search method without await, so removing it would also prevent this latent bug from being reintroduced later.
No description provided.