Ticket #271 (closed bug: fixed)

Opened 5 years ago

Last modified 4 years ago

XRUNs with fglrx/nvidia/nouveau and FA-66: Instantanous samplerate more than 1% off nominal

Reported by: joh Assigned to:
Priority: major Milestone:
Component: devices/bebob Version: FFADO 2.0-rc2 (1.999.42)
Keywords: Cc: nils
The device the bug applies to:

Description

jackd 0.116.1 with ffado 1.999.43 on Ubuntu Karmic:

libffado1 2.0~rc2+svn1569-2ubuntu1
jackd 0.116.1-4ubuntu2

Sound card: Edirol FA-66
FireWire card: Texas Instruments TSB82AA2 IEEE-1394b Link Layer Controller (rev 01)

I start jackd with:

jackd -R -d firewire -r 44100 -n 4 -p 2048 -v3

I get lots of "Execute: negative step" errors, so this might be related to #270. I'm attaching the log stripped for these errors.

I get XRUNs with this setup even though jackd has realtime privileges in limits.conf. The XRUNs seem to occur when CPU and/or I/O usage is high, but I'm unable to reproduce it with the 'stress' tool.

Running 'stress --cpu 8 --io 4' does not provoke any XRUNs.

It might be related to conflicting IRQs, so I'm attaching /proc/interrupts as well. Let me know if you think this might be the issue.

How do I approach this problem? I'm stuck at this point...

Attachments

jackd-ffado-xruns.log (11.5 kB) - added by joh on 04/11/10 10:39:10.
interrupts.txt (1.6 kB) - added by joh on 04/11/10 10:39:40.
/proc/interrupts
ffado-diag.txt (6.2 kB) - added by joh on 04/14/10 09:00:09.
Output of ffado-diag
jackd-ffado-nouveau-xruns.log (93.1 kB) - added by joh on 09/24/10 03:15:35.
jackd.log (49.4 kB) - added by joh on 06/08/11 09:28:33.

Change History

04/11/10 10:39:10 changed by joh

  • attachment jackd-ffado-xruns.log added.

04/11/10 10:39:40 changed by joh

  • attachment interrupts.txt added.

/proc/interrupts

04/11/10 10:42:11 changed by joh

Forgot to mention the warnings I get from ffado during the XRUNs:

109927533199: Warning (StreamProcessor.cpp)[ 391] putPacket: Instantanous samplerate more than 1% off nominal. [Nom fs: 44100.000000, Instantanous fs: 7350.106546, diff: 36749.893454 ( 0.833331)]
109927576741: Warning (devicemanager.cpp)[ 981] waitForPeriod: XRUN detected
109927657994: Error (CycleTimerHelper.cpp)[ 508] Execute: negative step: -34914035024621.218750! (correcting to nominal)
109927836814: Warning (StreamProcessor.cpp)[ 709] getPacket: Instantanous samplerate more than 1% off nominal. [Nom fs: 44100.000000, Instantanous fs: 30.603749, diff: 44069.396251 ( 0.999306)]
109927852812: Warning (StreamProcessor.cpp)[ 709] getPacket: Instantanous samplerate more than 1% off nominal. [Nom fs: 44100.000000, Instantanous fs: 22051.144011, diff: 22048.855989 ( 0.499974)]
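The numbers in these warnings are consistent with a simple relative-deviation check: "diff" is the absolute difference between the nominal and the instantaneous rate, and the value in parentheses is diff divided by the nominal rate. The sketch below reproduces that arithmetic. The function names and the exact "> 1%" comparison are assumptions inferred from the message text, not FFADO's actual code.

```python
def samplerate_deviation(nominal_fs: float, instant_fs: float) -> tuple[float, float]:
    """Return (absolute diff, diff relative to the nominal rate)."""
    diff = abs(nominal_fs - instant_fs)
    return diff, diff / nominal_fs


def is_off_nominal(nominal_fs: float, instant_fs: float, tolerance: float = 0.01) -> bool:
    # Assumed rule: warn when the relative deviation exceeds 1%.
    return samplerate_deviation(nominal_fs, instant_fs)[1] > tolerance


if __name__ == "__main__":
    # Instantaneous rates taken from the warnings above.
    for inst in (7350.106546, 30.603749, 22051.144011):
        diff, rel = samplerate_deviation(44100.0, inst)
        print(f"inst={inst:.6f} diff={diff:.6f} ({rel:.6f}) warn={is_off_nominal(44100.0, inst)}")
```

For the 7350.106546 Hz reading this gives diff 36749.893454 and ratio 0.833331, matching the first warning.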

04/13/10 15:37:06 changed by joh

The XRUNs get very frequent when playing video, which leads me to believe they might be related to the video card / driver somehow.

ATI Radeon HD 5770, catalyst 10.3 driver

Please let me know if there's any other information you need.

For what it's worth, the FA-66 works on Windows 7 without any issues...

04/14/10 08:59:27 changed by joh

After some testing I've discovered that using the x11 video driver does not seem to cause any XRUNs (mplayer -vo x11), however it kills my CPU as it has to do software scaling.

With the Xv video driver I get loads of XRUNs (mplayer -vo xv).

04/14/10 09:00:09 changed by joh

  • attachment ffado-diag.txt added.

Output of ffado-diag

04/14/10 09:55:01 changed by joh

Just tested with 2.6.32-21-generic from lucid with the new firewire stack using libffado 2.0.0+svn1806-2 and libraw1394 2.0.5-1 (packages from debian unstable).

I also tested with different period sizes: 256, 512 and 1024.
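For reference, the period sizes tested translate into the following durations at 44.1 kHz. This is a minimal sketch assuming jackd's `-p` is frames per period and `-n` the number of periods (as in the `-n 4 -p 2048` command line above); any additional buffering inside the FireWire backend is not modelled.

```python
def period_ms(frames: int, rate_hz: int) -> float:
    """Duration of one JACK period in milliseconds."""
    return 1000.0 * frames / rate_hz


def buffer_ms(frames: int, periods: int, rate_hz: int) -> float:
    """Total duration of `periods` periods of `frames` frames each."""
    return periods * period_ms(frames, rate_hz)


if __name__ == "__main__":
    for frames in (256, 512, 1024, 2048):
        print(f"-p {frames}: {period_ms(frames, 44100):.2f} ms per period, "
              f"{buffer_ms(frames, 4, 44100):.2f} ms with -n 4")
```

At `-p 256` a period lasts only about 5.8 ms, versus about 46.4 ms at `-p 2048`.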

Unfortunately I get the same behavior as before:

00737746441: Warning (StreamProcessor.cpp)[ 707] getPacket: Instantanous samplerate more than 1% off nominal. [Nom fs: 44100.000000, Instantanous fs: 22051.144011, diff: 22048.855989 ( 0.499974)]
00738617790: Warning (StreamProcessor.cpp)[ 389] putPacket: Instantanous samplerate more than 1% off nominal. [Nom fs: 44100.000000, Instantanous fs: 3392.425157, diff: 40707.574843 ( 0.923074)]
00738622848: Warning (devicemanager.cpp)[ 994] waitForPeriod: XRUN detected
00738858483: Warning (StreamProcessor.cpp)[ 707] getPacket: Instantanous samplerate more than 1% off nominal. [Nom fs: 44100.000000, Instantanous fs: 33.536128, diff: 44066.463872 ( 0.999240)]
00738882369: Warning (StreamProcessor.cpp)[ 707] getPacket: Instantanous samplerate more than 1% off nominal. [Nom fs: 44100.000000, Instantanous fs: 22058.566139, diff: 22041.433861 ( 0.499806)]

04/14/10 14:00:30 changed by arnonym

Both the disk and the third USB port share an interrupt with the VIA FireWire port, and the TI FireWire port shares its interrupt with many USB ports. That is a bad sign. (Judging from the interrupt counts, you seem to be using the TI card.)

Take a look at the output of lsusb to determine which devices are connected to which port and then re-plug your usb-devices to make sure the ports sharing the interrupt with the firewire controller are not connected to anything. That can give you some headroom.
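Finding which devices share a line with the FireWire controller can be automated. The sketch below assumes the usual `/proc/interrupts` layout (IRQ number, per-CPU counts, chip/type, then comma-separated device names without spaces) and assumes the controller shows up as `ohci1394` (old stack) or `firewire_ohci` (new stack). The sample text is hypothetical, not the attached interrupts.txt.

```python
import re

FIREWIRE_DRIVERS = ("ohci1394", "firewire_ohci")  # assumed driver names


def parse_irq_devices(line: str):
    """Return (irq, [device names]) for a numeric IRQ line, else None."""
    m = re.match(r"\s*(\d+):", line)
    if not m:
        return None  # CPU header, NMI/LOC/ERR rows, etc.
    head, *rest = line.rstrip().split(",")
    # The first device name is the last whitespace-separated token before the first comma.
    devices = [head.split()[-1]] + [d.strip() for d in rest if d.strip()]
    return int(m.group(1)), devices


def shared_with_firewire(text: str) -> dict[int, list[str]]:
    """Map each IRQ carrying a FireWire controller to the other devices on it."""
    result = {}
    for line in text.splitlines():
        parsed = parse_irq_devices(line)
        if parsed is None:
            continue
        irq, devices = parsed
        if any(d in FIREWIRE_DRIVERS for d in devices):
            result[irq] = [d for d in devices if d not in FIREWIRE_DRIVERS]
    return result


# Hypothetical example input for illustration only.
SAMPLE = """           CPU0       CPU1
 18:     123456      7890   IO-APIC-fasteoi   ohci1394, uhci_hcd:usb3, uhci_hcd:usb8
 19:       4321       100   IO-APIC-fasteoi   ata_piix, firewire_ohci
 20:         10         0   IO-APIC-fasteoi   ehci_hcd:usb1
"""

if __name__ == "__main__":
    print(shared_with_firewire(SAMPLE))
```

On a live system the same function can be fed the contents of `/proc/interrupts`, and the USB entries it reports can then be matched against `lsusb` output.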

Running a CPU stress test should not result in more xruns, as the processing threads of jackd and ffado run at higher than normal priority. This makes sure they get the CPU time they need regardless of how many normal- and lower-priority apps and stress tests are running.

04/14/10 23:19:02 changed by joh

I disconnected all USB devices that were generating interrupts on IRQ18. A 'watch -n 0.5 cat /proc/interrupts' revealed that there were no interrupts going on when I started jackd. That's enough to indicate that the TI FireWire port has exclusive use of IRQ18, right?
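The `watch` check can also be done by diffing two snapshots. Note that `/proc/interrupts` counts are per IRQ line, not per device: a flat count while the FireWire device is idle suggests nothing else on that line is raising interrupts, while the device list at the end of the line still shows which drivers are registered on it. A minimal sketch, assuming the standard layout; the helper names are hypothetical.

```python
import re


def read_interrupts(path: str = "/proc/interrupts") -> str:
    """Return the current contents of /proc/interrupts (Linux only)."""
    with open(path) as f:
        return f.read()


def irq_total(text: str, irq: int) -> int:
    """Sum the per-CPU counts on one numeric IRQ line."""
    for line in text.splitlines():
        m = re.match(r"\s*(\d+):\s*(.*)", line)
        if m and int(m.group(1)) == irq:
            total = 0
            for field in m.group(2).split():
                if not field.isdigit():
                    break  # per-CPU counters end where the chip name starts
                total += int(field)
            return total
    raise ValueError(f"IRQ {irq} not found")


def irq_delta(before: str, after: str, irq: int) -> int:
    """Number of interrupts on `irq` between two snapshots."""
    return irq_total(after, irq) - irq_total(before, irq)
```

On a live system: take `before = read_interrupts()`, wait a few seconds with jackd stopped, take `after = read_interrupts()`, and call `irq_delta(before, after, 18)`.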

Anyway, this did not seem to help any - lots of XRUNs still when playing a movie with mplayer -vo xv.

It seems the fglrx driver locks some resource for too long and this causes the XRUNs on the firewire bus. Might be that the PCI bus is locked for too long?

Not sure what else to try now but wait for updates to fglrx / open source driver.

(follow-up: ↓ 8 ) 04/15/10 00:04:15 changed by cladisch

The fglrx and nvidia drivers are known to lock the CPU for too long. I'd strongly suggest trying the newest open-source driver, if possible.

Shouldn't it be possible to make the streaming more robust by adjusting some parameters like ieee1394.isomanager.min_interrupts_per_period or .max_nb_buffers_recv/xmit?

(in reply to: ↑ 7 ; follow-up: ↓ 9 ) 08/25/10 11:24:55 changed by joh

Replying to cladisch:

The fglrx and nvidia drivers are known to lock the CPU for too long. I'd strongly suggest to try the newest open-source driver, if possible. Shouldn't it be possible to make the streaming more robust by adjusting some parameters like ieee1394.isomanager.min_interrupts_per_period or .max_nb_buffers_recv/xmit?

For what it's worth, I've switched to an NVIDIA card (GeForce GTS 250) and experience the same problems with XRUNs during graphic activity (switching workspace, activating screen saver etc.)

How can those parameters be adjusted?

(in reply to: ↑ 8 ; follow-up: ↓ 10 ) 08/26/10 03:45:00 changed by nils

  • cc set to nils.

I strongly concur with cladisch that people experiencing this should try out their latest open source drivers -- IMO tweaking firewire parameters will hardly make proprietary video drivers behave. I think it boils down to whether you would rather have better video or better audio/firewire performance.

(in reply to: ↑ 9 ; follow-up: ↓ 11 ) 09/24/10 00:46:11 changed by joh

Replying to nils:

I strongly concur with cladisch that people experiencing this should try out their latest open source drivers -- IMO tweaking firewire parameters will hardly make proprietary video drivers behave. I think it boils down to whether you rather want higher video or audio/firewire performance.

Having to make a choice between video and audio is unacceptable. The open source drivers are not an alternative either, as they lack many features such as 3D acceleration.

Is this another argument for having ffado in kernel space?

(in reply to: ↑ 10 ; follow-up: ↓ 12 ) 09/24/10 02:35:37 changed by arnonym

Replying to joh:

Replying to nils:

I strongly concur with cladisch that people experiencing this should try out their latest open source drivers -- IMO tweaking firewire parameters will hardly make proprietary video drivers behave. I think it boils down to whether you rather want higher video or audio/firewire performance.

Having to make a choice between video and audio is unacceptable. The open source drivers are not an alternative either, as they lack many features such as 3D acceleration.

It's not a choice between audio and video, it's a choice between things we can control and things we can't.

Open source drivers are something we can control. We can see the code and what it does in kernel- and userspace. Closed-source drivers in kernel space (whether for video or anything else) have complete access to the kernel and can do whatever they like, and we don't know what they actually do because we can't see the code. They say it's a video driver, but we don't know which parts also affect other parts of the kernel; it could do all sorts of nasty things that slow down other devices (like FireWire I/O) in order to optimize your video. This makes it basically impossible to debug anything.

And because of these uncertainties, most (all?) kernel and driver developers refuse to debug anything if a proprietary driver is loaded (apart from any philosophical issues).

Is this another argument for having ffado in kernel space?

No, it's an argument for open source drivers.

We are not asking you to completely drop your nvidia driver forever, we just ask you to check whether the same problems happen with the free drivers and then help debugging it. And if it only happens with the closed-source drivers, well...

(in reply to: ↑ 11 ) 09/24/10 03:14:21 changed by joh

Replying to arnonym:

We are not asking you to completely drop your nvidia driver forever, we just ask you to check whether the same problems happen with the free drivers and then help debugging it. And if it only happens with the closed-source drivers, well...

OK, I tested with the nouveau driver and get plenty of XRUNs here as well, especially while a video is playing or when moving a window around. Attaching jackd output.

Nouveau version info:
ii libdrm-nouveau1 2.4.18-1ubuntu3
ii nouveau-firmware 20091212-0ubuntu1
ii xserver-xorg-video-nouveau 1:0.0.15+git20100219+9b4118d-0ubuntu5

Let me know if I should try with a more recent version of the driver.

09/24/10 03:15:35 changed by joh

  • attachment jackd-ffado-nouveau-xruns.log added.

06/08/11 08:04:59 changed by joh

FWIW I'm still having this issue on Ubuntu Natty, running ffado 2.0.99+svn1949-1 and jackd 1.9.6~dfsg.1-5ubuntu1. I get loads of warnings like these now:

45434774127: Warning (IsoHandlerManager.cpp)[1620] getPacket: reconstructed CTR counter discrepancy
45434774129: Warning (IsoHandlerManager.cpp)[1626] getPacket: ingredients: E11, A4E11000, A2E11500, A2E142FC, A2E13500, 82, 81, 81, 2001719552
45434774136: Warning (IsoHandlerManager.cpp)[1627] getPacket: diffcy = -2

Not sure if they are somehow related to the XRUNs. Attaching new log. Please let me know if there's anything else I can try.
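The hexadecimal "ingredients" in these warnings look like raw IEEE 1394 cycle timer register (CTR) values. The CTR packs a 7-bit seconds count, a 13-bit cycle count (0-7999, at 8000 cycles per second) and a 12-bit cycle offset (0-3071 ticks of a 24.576 MHz clock). The sketch below only decodes those fields. It is an assumption that FFADO logs plain CTR values here, and what FFADO expects the fields to be is not modelled.

```python
CYCLES_PER_SECOND = 8000
TICKS_PER_CYCLE = 3072  # 24.576 MHz / 8000 Hz


def decode_ctr(ctr: int) -> tuple[int, int, int]:
    """Split a 32-bit cycle timer value into (seconds, cycles, offset)."""
    seconds = (ctr >> 25) & 0x7F
    cycles = (ctr >> 12) & 0x1FFF
    offset = ctr & 0xFFF
    return seconds, cycles, offset


def ctr_to_ticks(ctr: int) -> int:
    """Convert a CTR value to 24.576 MHz ticks (wraps every 128 seconds)."""
    seconds, cycles, offset = decode_ctr(ctr)
    return (seconds * CYCLES_PER_SECOND + cycles) * TICKS_PER_CYCLE + offset


if __name__ == "__main__":
    for raw in ("A4E11000", "A2E11500", "A2E142FC", "A2E13500"):
        print(raw, decode_ctr(int(raw, 16)))
```

Decoded this way, A4E11000 gives (82, 3601, 0) and A2E11500 gives (81, 3601, 1280). Both have cycle 0xE11, the first ingredient, but their seconds fields differ by one. The seconds values 82 and 81 also appear among the ingredients.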

06/08/11 09:27:27 changed by joh

Just tested with the nouveau drivers (1:0.0.16+git20110107+b795ca6e-0ubuntu7) and experience the same problem here. Whenever there's a lot of activity on the screen, there are XRUNs.

06/08/11 09:28:33 changed by joh

  • attachment jackd.log added.

06/08/11 11:54:53 changed by stefanr

  • summary changed from XRUNs on FA-66: Instantanous samplerate more than 1% off nominal to XRUNs with fglrx/nvidia/nouveau and FA-66: Instantanous samplerate more than 1% off nominal.

For what it's worth, AMD's radeon driver (the GPL one, not the proprietary fglrx driver) lost any high-latency behavior, at least on my HD 3200 based system, *many* kernel releases ago.

If you require graphics hardware from a vendor who provides neither specifications nor a mainlined open source driver, but you are OK with reverse-engineered drivers from volunteer developers, then you can help those developers by contacting them about the high-latency problems. Whether they will be able to do anything about it in the near term remains to be seen, though, as the lack of manpower and specs does not make things any easier. Because of the reverse-engineering effort, these developers may have a specific roadmap into which some end-user requests may not fit right away.

Anyway. Have a look at http://nouveau.freedesktop.org/wiki/ on how to get in touch with the nouveau developers. For anything to start happening at all, they need to get to know about the issue.

The alternative is to use the mainline drivers for ATI/AMD graphics, which have been trouble-free in my experience, or to consider one of the Intel graphics chips that have open source driver support from Intel.

Another, more theoretical alternative would be to report the nvidia latency problems to NVIDIA in the hope that they look into it.

06/09/11 09:05:15 changed by joh

So, I talked to one of the developers in #nouveau, who pointed me to the 'latencytop' tool. The tests involved running jackd and provoking XRUNs with on-screen activity, without any audio playing. He was rather puzzled by the results:

Page fault 72.3 ms
[nouveau_channel_get] 46.5 ms
[nouveau_fence_wait] 20.5 ms

The nouveau calls stem from gnome-shell and Xorg, while the page fault is exclusive to gnome-shell. He said the latency in gnome-shell was expected, but shouldn't be able to affect jack noticeably.

Deeper profiling of the high latency page faults would be a next step, but he was unsure how to pursue this.

06/09/11 16:53:32 changed by joh

  • status changed from new to closed.
  • resolution set to fixed.

Good news, everyone!

I found a working setup which doesn't exhibit this problem, even with the proprietary nvidia driver.

First of all, a preemptive kernel was needed. I had success with 2.6.39.1.

Secondly, I had to use a sample rate of 96000. Not sure exactly why, but neither 44100 nor 48000 worked; both resulted in the same problems described here.

Cheers :)