Interact with the instruction-tuned Qwen2.5-Omni checkpoint fine-tuned via the aud2seq pipeline. Attach an audio clip and describe the segmentation or caption you expect the model to produce.