Ticket #345 (closed bug: fixed)

Opened 9 years ago

Last modified 8 years ago

Starting ffado-dbus-server kills jackd

Reported by: froh
Assigned to:
Priority: major
Milestone:
Component: devices/dice
Version: FFADO SVN (trunk)
Keywords: ffado-mixer
Cc: nils
The device the bug applies to: Focusrite Saffire Pro 24

Description

If I start ffado-mixer and ffado-dbus-server is not already running, jackd dies. See attached log files. My configuration is as follows:

  • HP Compaq nx7400 laptop
  • AV-Linux 5.0.3
  • Saffire Pro 24
  • Builtin firewire: Texas Instruments PCIxx12 OHCI Compliant IEEE 1394 Host Controller
  • libffado Version: 2.0.99+svn1995-3
  • jackd Version: 1:0.121.3avlinux-1
  • linux-kernel: 2.6.33.7.2-rt30-avlinux-realtime (old fw stack)

Attachments

jackd.log (6.0 kB) - added by froh on 03/06/12 05:45:32.
Log of jackd crashing
ffado-dbus-server start (2.1 kB) - added by froh on 03/06/12 05:46:45.
ffado-dbus-server output

Change History

03/06/12 05:45:32 changed by froh

  • attachment jackd.log added.

Log of jackd crashing

03/06/12 05:46:45 changed by froh

  • attachment ffado-dbus-server start added.

ffado-dbus-server output

03/28/12 02:09:11 changed by jwoithe

Hmm, it's a bit hard to work out what might be happening here. Could you post the output from ffado-diag - that may give us some clues. In particular, the version of libraw1394 would be interesting.

I've just tried the following myself after having jackd running:

  • manually start ffado-dbus-server, then run ffado-mixer
  • run ffado-mixer and have it start ffado-dbus-server

In both cases everything worked as expected and jackd did not crash. This was with an RME Fireface-800 device on a 3.2.4 kernel with the new stack. Whatever the problem, it's not reproducible on every system.

Given that you're running an RT kernel it seems unlikely that any load created on your machine could be causing jackd to get hung up. A few other things to think about:

  • if you're running NTP to synchronise your computer time to the network, temporarily disable this and try again
  • if you are running a binary video driver, try disabling that (if possible) and try again. We've seen the binary video drivers stuff things up in the past, even on RT kernels.
  • You are currently running with 64-sample periods and 3 buffers (-p64 -n3). It might be worth bumping up the "-p" number a bit and see if that improves things. The number must be a power of 2, so 128 and 256 would be worth trying.
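
As a rough illustration of what changing "-p" does, the sketch below computes the nominal buffering latency for the period sizes mentioned above. The 48 kHz sample rate is an assumption (the ticket does not say which rate was in use), and the function names are made up for this example; they are not part of jackd or FFADO.

```cpp
#include <cassert>
#include <cstdint>

// Returns true when n is a power of two, which is what the period size
// passed with -p is expected to be (see the comment above).
bool isPowerOfTwo(std::uint32_t n) {
    return n != 0 && (n & (n - 1)) == 0;
}

// Nominal buffering latency in milliseconds for a given period size (-p),
// number of periods (-n) and sample rate.
// ASSUMPTION: the caller supplies the sample rate; the ticket does not
// state which rate the reporter used.
double bufferLatencyMs(std::uint32_t framesPerPeriod,
                       std::uint32_t periods,
                       std::uint32_t sampleRateHz) {
    assert(sampleRateHz > 0 && "sample rate must be non-zero");
    return 1000.0 * framesPerPeriod * periods / sampleRateHz;
}
```

Under the assumed 48 kHz rate, bufferLatencyMs(64, 3, 48000) gives 4 ms for the reporter's "-p64 -n3" settings, while 128 and 256 frames per period give 8 ms and 16 ms respectively.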

03/28/12 04:30:03 changed by jwoithe

Ticket #314 is a duplicate of this one.

04/01/12 07:58:03 changed by jwoithe

Another thing to try is an upgrade to FFADO trunk. This ticket is against the FFADO 2.0 branch, and trunk contains a substantial collection of work related to the DICE platform which as far as I know isn't in the 2.0 branch.

04/01/12 12:26:44 changed by stefanr

2.0.99+svn1995-3 is already from trunk; the 2.0 branch does actually not contain any DICE support at all. I have no idea though if r1995 was a good one for this particular setup.

04/01/12 12:27:32 changed by stefanr

  • version changed from FFADO SVN (2.0 branch) to FFADO SVN (trunk).

04/18/12 09:57:47 changed by nils

  • cc set to nils.

I can confirm this with current SVN trunk and a Saffire Pro40 (which would correlate well with the reporter's Pro24). I don't run an RT kernel however, and haven't done so for quite a while as this hasn't been necessary to get decent performance.

04/18/12 16:37:10 changed by jwoithe

Thanks for the confirmation Nils. As far as I can tell this must be some strange interaction between the DICE driver and dbus, unless I've misunderstood the steps required to recreate the crash. Could you review the steps I did as reported in comment 1 and let me know if this is consistent with what you're doing? If you could then post the respective command lines you're using for jackd and ffado-mixer I can then try precisely the same thing on my setup. At least theoretically this would eliminate most things except the kernel, the DICE driver and how it reacts to dbus activity on your system. Clearly though it's only some dbus activity, since I assume the jackd crash only occurs during ffado-mixer startup: if ffado-mixer is already running when jackd starts I assume the use of ffado-mixer doesn't cause jackd to crash.

For the record, I'm not using an RT kernel either. However, I do have CONFIG_PREEMPT turned on (aka "low latency desktop"). If you're not already running a kernel with such an option it may be worth trying that to see if it is implicated in any way.

Actually, what kernel version are you running Nils? The original reporter was on 2.6.33 which is quite old now (although since they were on the old stack the chances of the kernel's age being an issue are low). I'm presently on 3.2.4 (as per comment 1).

04/24/12 04:42:42 changed by stefanr

Replying to nils:

I can confirm this with current SVN trunk and a Saffire Pro40 (which would correlate well with the reporter's Pro24).

Nils, could you attach ffado-diag output?

04/24/12 08:12:09 changed by nils

Jonathan, Stefan: As soon as I can spend a little time with the machine in question... I hope I'll manage it this week, as I'll be on 10 days of vacation from Friday on. I can't test ffado-dbus-server started via bus activation however, as this will give me the old version.

04/24/12 15:58:38 changed by stefanr

Nils, take your time. I was most interested in the lspci part of ffado-diag, wondering whether you incidentally had the same controller as the reporter (Texas Instruments PCIxx12).

However, I managed to build ffado-mixer now and could consistently reproduce the issue with ffado svn 2125, jack 0.121.3, kernel 3.3 CONFIG_PREEMPT, Saffire PRO 24, and these controllers:

  • Agere Systems FW643 PCI Express 1394b Controller (PHY/Link) [11c1:5901] (rev 07)
  • NEC Corporation IEEE 1394 [OrangeLink] Host Controller [1033:00cd] (rev 03)
  • Texas Instruments XIO2213A/B/XIO2221 PCI Express to PCI Bridge [Cheetah Express] [104c:823e] (rev 01)

This rules the FireWire controller out as the problem source.

04/24/12 15:59:42 changed by stefanr

Replying to stefanr:

* Texas Instruments XIO2213A/B/XIO2221 PCI Express to PCI Bridge [Cheetah Express] [104c:823e] (rev 01)

Cut and paste error, should be Texas Instruments XIO2213A/B/XIO2221 IEEE-1394b OHCI Controller [Cheetah Express] [104c:823f] (rev 01).

04/24/12 19:38:16 changed by jwoithe

Stefan: it's good that you can reproduce the issue across multiple controllers. I've been trying (as per comment 1) and have so far been unable to do so. To me it seems like the issue is only present with some devices (either that or there happens to be something about my setup which circumvents the problem). Since you can reproduce the problem, is there any chance you can dig into this a little more to see if we can get a handle on what's going on? I'm a little hamstrung because I can't produce the fault on my system.

For reference the firewire controller I'm using is a Texas Instruments TSB43AB23.

04/24/12 23:54:14 changed by stefanr

Affected:

  • Saffire PRO 24 (froh, stefanr)
  • Saffire PRO 40 (nils)

Not affected:

  • RME Fireface 800 (jwoithe)
  • Terratec Phase X24FW, a BeBoB based device (stefanr)

As time permits, I will look further into what particular part of ffado-dbus-server's startup is triggering this event.

04/25/12 00:52:56 changed by jwoithe

Just so we can be sure that we're all testing in a comparable way, could you (Stefan) post the steps you're using to test your devices (or confirm that it's the same as in comment 1)?

It's interesting that you have both an affected and unaffected device - that pretty much rules out system differences as being the cause.

As time allows over the next few days I'll also test my MOTU to see if that's affected.

04/25/12 15:51:30 changed by nils

Hmm. Right now I can't reproduce the issue here anymore, neither with a bus-activated nor a manually started ffado-dbus-server. The only difference I can think of between last time jackd crashed on me due to this is that I added a second Saffire Pro40 in the meantime, but can that have an influence?

06/01/12 03:29:31 changed by jwoithe

Further discussion has taken place on ffado-devel. It now seems likely that this issue only applies to specific devices - namely, those based on the DICE platform. The theory at this point is that there's something done during device discovery/initialisation (which happens when either jackd or ffado-mixer is started) which causes the device to stop (or interrupt) streaming if it was previously streaming.

06/01/12 03:55:42 changed by adi

It does. discover() in dice_avdevice.cpp calls initIoFunctions, and there's the following code:

    // FIXME: after a crash, the device might still be streaming. We
    // simply force a stop now (unless in snoopMode) to return to a
    // clean state.
    bool snoopMode = false;
    if(!getOption("snoopMode", snoopMode)) {
        //debugWarning("Could not retrieve snoopMode parameter, defauling to false\n");
    }

    if (!snoopMode) {
        disableIsoStreaming();
    }

I don't have time to hack on that right now, but maybe somebody else wants to give it a whirl and remove the code in question to see whether devices can still be successfully restarted after a crash.

I've contributed DICE restart code a while ago, so maybe the code above is obsolete and can be removed. Feedback appreciated.
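
To make the suspected interaction concrete, here is a minimal sketch that models it with a mock device. Everything in it (MockDiceDevice, its methods, streamSurvivesDiscovery) is hypothetical and written only for illustration; it is not FFADO's real class hierarchy, and it does not claim to show the change that was eventually committed. It only contrasts discovery with the forced stop quoted above against discovery that leaves the stream state alone.

```cpp
#include <cassert>

// Hypothetical stand-in for a DICE device; NOT FFADO's real device class.
class MockDiceDevice {
public:
    // Models a streaming client (e.g. jackd) starting isochronous streaming.
    void startStreaming() { m_streaming = true; }
    bool isStreaming() const { return m_streaming; }

    // Models discovery as quoted above: unless snoopMode is set,
    // streaming is stopped unconditionally.
    void discoverWithForcedStop(bool snoopMode) {
        if (!snoopMode) {
            m_streaming = false;  // stands in for disableIsoStreaming()
        }
    }

    // Models the suggested experiment: discovery does not touch streaming.
    void discoverWithoutForcedStop() {}

private:
    bool m_streaming = false;
};

// Returns whether an already running stream is still running after a
// second process (e.g. ffado-dbus-server) performs discovery.
bool streamSurvivesDiscovery(bool forcedStop, bool snoopMode) {
    MockDiceDevice dev;
    dev.startStreaming();
    if (forcedStop) {
        dev.discoverWithForcedStop(snoopMode);
    } else {
        dev.discoverWithoutForcedStop();
    }
    return dev.isStreaming();
}

// Walks through the three scenarios discussed in this ticket.
void checkScenarios() {
    assert(!streamSurvivesDiscovery(true, false));  // forced stop kills the stream
    assert(streamSurvivesDiscovery(true, true));    // snoopMode skips the stop
    assert(streamSurvivesDiscovery(false, false));  // no forced stop, stream survives
}
```

In this model the stream only dies when discovery forces a stop and snoopMode is off, which matches the observation that starting ffado-dbus-server while jackd is streaming takes jackd down. Whether a real device can still be restarted cleanly after a crash without the forced stop is exactly the open question raised above, and the mock does not answer it.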

06/02/12 03:18:29 changed by adi

  • status changed from new to closed.
  • resolution set to fixed.

Fixed in r2160, but you want to update to at least r2161. The latter is required to make jackd recover after a crash without explicitly disabling the ISO streams.

Tested on Alesis io14.