Create enhanced AudioToTextEnhancedProvider that also reformats text from transcription #363
lukasdotcom wants to merge 6 commits into main
…g handler
Signed-off-by: Lukas Schaefer <lukas@lschaefer.xyz>
Pull request overview
This PR extends the app’s Task Processing providers by introducing a paragraph-reformatting text provider and an “enhanced” audio-to-text provider that chains transcription → paragraph reformatting (when the Nextcloud task type is available).
Changes:
- Add `ReformatParagraphsProvider` (`core:text2text:reformatparagraphs`) that inserts paragraph breaks based on LLM-returned “anchor” lines.
- Add `AudioToTextEnhancedProvider` that runs `AudioToTextProvider` and then invokes the reformat-paragraphs task as a follow-up step.
- Register the new providers conditionally based on task type availability, and add a unit test and a Psalm reference for the new task type class.
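The anchor mechanism can be illustrated with a short sketch. `insertParagraphBreaks` is a hypothetical helper, not code from this PR, and it assumes the model returns anchors copied verbatim from the input, one per line and in input order; anchors that cannot be found are skipped rather than guessed.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: split $text into paragraphs at each anchor position.
 * Each anchor is expected to be a verbatim opening of a paragraph in $text.
 */
function insertParagraphBreaks(string $text, array $anchors): string {
    $cuts = [];
    $searchFrom = 0;
    foreach ($anchors as $anchor) {
        $anchor = trim((string)$anchor);
        if ($anchor === '') {
            continue;
        }
        // Only search forward, so anchors must appear in input order.
        $pos = strpos($text, $anchor, $searchFrom);
        if ($pos === false) {
            continue; // not found verbatim: skip instead of guessing
        }
        if ($pos > 0) {
            $cuts[] = $pos; // an anchor at offset 0 opens the first paragraph
        }
        $searchFrom = $pos + strlen($anchor);
    }

    // Cut the text at each recorded offset and join with blank lines.
    $paragraphs = [];
    $start = 0;
    foreach ($cuts as $cut) {
        $paragraphs[] = trim(substr($text, $start, $cut - $start));
        $start = $cut;
    }
    $paragraphs[] = trim(substr($text, $start));

    return implode("\n\n", $paragraphs);
}
```

Searching forward from the end of the previous match keeps the breaks in input order and stops a repeated phrase from matching an earlier occurrence. In this sketch, if the model echoes the whole input as a single line, that anchor matches at offset 0 and the text comes back unchanged.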
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| `lib/TaskProcessing/ReformatParagraphsProvider.php` | New synchronous provider that requests anchor lines from the LLM and inserts paragraph breaks into the original input text. |
| `lib/TaskProcessing/AudioToTextEnhancedProvider.php` | New provider extending `AudioToTextProvider` and post-processing the transcription via the task processing manager. |
| `lib/TaskProcessing/AudioToTextProvider.php` | Makes dependencies protected to enable subclass access from the enhanced provider. |
| `lib/AppInfo/Application.php` | Registers the enhanced audio provider and the new reformat provider when the task type exists. |
| `tests/unit/Providers/OpenAiProviderTest.php` | Adds a unit test for `ReformatParagraphsProvider` and gates it by task type availability. |
| `psalm.xml` | Adds `TextToTextReformatParagraphs` as a referenced class for analysis. |
```php
if (isset($input['model']) && is_string($input['model'])) {
    $model = $input['model'];
} else {
    $model = $this->appConfig->getValueString(Application::APP_ID, 'default_completion_model_id', Application::DEFAULT_MODEL_ID, lazy: true) ?: Application::DEFAULT_MODEL_ID;
}
```
This could be changed in the other providers too
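If the same model fallback is reused in the other providers, it could be factored into a shared helper. A minimal sketch: `resolveModelId` is a hypothetical name, and the caller is assumed to pass in the value already read from app config (the `appConfig` lookup itself is left out).

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper mirroring the fallback quoted above.
 * $configured is the value read from app config; $default plays the role of
 * Application::DEFAULT_MODEL_ID.
 */
function resolveModelId(array $input, string $configured, string $default): string {
    // An explicit string model in the task input always wins.
    if (isset($input['model']) && is_string($input['model'])) {
        return $input['model'];
    }
    // `?:` falls back on any falsy string, i.e. '' and also '0'.
    return $configured ?: $default;
}
```

Passing the config value in keeps the helper free of Nextcloud dependencies, so it can be unit-tested in isolation.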
Signed-off-by: Lukas Schaefer <lukas@lschaefer.xyz>
Signed-off-by: Lukas Schaefer <lukas@lschaefer.xyz>
…from transcription
Signed-off-by: Lukas Schaefer <lukas@lschaefer.xyz>

# Conflicts:
# lib/TaskProcessing/ReformatParagraphsProvider.php
Force-pushed from df57549 to fb23749.
Signed-off-by: Lukas Schaefer <lukas@lschaefer.xyz>
Force-pushed from fb23749 to a85993e.
```php
}

public function getOptionalOutputShape(): array {
    return $this->audioToTextProvider->getOptionalOutputShape();
}
```
Can we hardcode these? Otherwise, changes in the audioToTextProvider will automatically change this provider as well, which feels like too much magic.
For some of them, like the output ones, hardcoding makes sense, but I would keep the input ones as they are, because they are passed directly into the audioToTextProvider anyway and should always be the same.
Alright, that makes sense, yes.
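The split discussed here, delegating the input shapes while pinning the output shapes, can be sketched with plain PHP classes. This is a hypothetical illustration, not the PR's code: the class names and shape keys are placeholders, and string arrays stand in for the real shape descriptors.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for the base transcription provider.
class BaseTranscriptionProvider {
    public function getOptionalInputShape(): array {
        return ['example_input' => 'Placeholder input option'];
    }

    public function getOptionalOutputShape(): array {
        return ['example_output' => 'Placeholder output option'];
    }
}

// Hypothetical enhanced provider wrapping the base one.
final class EnhancedTranscriptionProvider {
    public function __construct(private BaseTranscriptionProvider $inner) {
    }

    // Inputs are forwarded unchanged to the inner provider, so delegating keeps them in sync.
    public function getOptionalInputShape(): array {
        return $this->inner->getOptionalInputShape();
    }

    // Outputs come from post-processing, so they are pinned here
    // (empty purely for illustration) instead of following the inner provider.
    public function getOptionalOutputShape(): array {
        return [];
    }
}
```

Delegating the inputs is safe because they reach the inner provider as-is; pinning the outputs means a change in the inner provider cannot silently alter what the enhanced provider advertises.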
```php
    . 'A break is allowed only when the subject matter changes significantly. '
    . 'Output format: For each identified paragraph, return only the first 8 to 12 words verbatim from the input. '
    . 'Structure: Return exactly one paragraph per line. Do not include bullets, html tags, numbering, summaries, quotes, or any additional text. '
    . 'Single topic: If the text covers only one topic, return exactly one line.';
```
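The output format requested by this prompt (one paragraph opening per line, no extra text) maps onto a simple parser on the provider side. A sketch, assuming the response is valid UTF-8; `parseAnchors` is a hypothetical helper, not code from the PR:

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical parser for the prompt's output contract:
 * one anchor per line, blank lines and surrounding whitespace ignored.
 */
function parseAnchors(string $response): array {
    // \R matches any line break sequence (\n, \r\n, \r, ...).
    $lines = preg_split('/\R/u', $response);
    if ($lines === false) {
        return []; // e.g. invalid UTF-8 under the /u modifier
    }
    $anchors = [];
    foreach ($lines as $line) {
        $line = trim($line);
        if ($line !== '') {
            $anchors[] = $line;
        }
    }
    return $anchors;
}
```

Under the prompt's single-topic rule, a response that parses to exactly one line means no paragraph break should be inserted.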
Tested with output from stt_whisper2 and Mistral Small 24B from IONOS and it doesn't seem to work for me :/
I get the complete blob back from the task type, unchanged from the input.
Interesting, what prompt did you use? I used the one below, tested it on Mistral Small 24B, and it splits it into two paragraphs:
Dogs have been humanity's most devoted companions for thousands of years, evolving from wild wolves into loyal members of our families. Renowned for their incredible intelligence, adaptability, and emotional connection, they serve in countless roles ranging from herding flocks and searching for disaster victims to providing comfort to those with anxiety. Whether they are playful puppies chasing a ball, energetic adults running off-leash, or quiet guardians watching over the home, dogs bring a unique blend of energy, affection, and unconditional love into our lives. Their ability to read human body language and emotions has fostered a deep bond between species, making them not just pets, but true partners in our daily journeys. Cats are captivating creatures that have held a unique place in human history for millennia, revered both as loyal companions and mysterious symbols of grace. With their sleek, muscular bodies and agile movements, they are masters of the hunt, capable of pouncing with pinpoint accuracy even in the dimmest light. Beyond their physical prowess, cats possess an enigmatic personality that ranges from aloof independence to devoted affection, often choosing their owners with a quiet intuition. Whether lounging lazily in a sunbeam, watching the world with wide, golden eyes, or performing a silent, graceful leap, cats embody a blend of elegance and mystery that continues to enchant people everywhere.
I tried it with the transcript of the last company call, from May 5th.
Signed-off-by: Lukas Schaefer <lukas@lschaefer.xyz>
Built on top of #362; this adds an AudioToTextEnhancedProvider that can be used instead and also reformats the transcribed text (582825f is the important commit).