Ticket #104 (closed bug: fixed)

Opened 6 years ago

Last modified 6 years ago

Jack crashes after a few minutes

Reported by: grr Assigned to: ppalmers
Priority: minor Milestone: FFADO 2.0
Component: Version: FFADO 2.0-beta4 (1.999.26)
Keywords: Cc:
The device the bug applies to: Focusrite Saffire PRO 26

Description

After a few minutes of successful operation, the server crashes with this (see failure.log for more output lines)

firewire ERR: wait status < 0! (= -1)
DRIVER NT: could not run driver cycle
jack main caught signal 12
no message buffer overruns

Subsequent attempts to start jack fail (see reconnect.log), and jackd eats 100% CPU. After the saffire has been physically shut down, jack can be restarted.

Context:

  • UbuntuStudio 8.04 (generic kernel)
  • libffado 2.0 beta4
  • Jack from svn trunk (rev 1174), launched like this by qjackctl:
    /usr/local/bin/jackd -dfirewire -r44100 -p1024 -n3 -i2 -o2
    

Attachments

failure.log (0.6 kB) - added by grr on 05/07/08 11:29:45.
First failure
reconnect.log (0.7 kB) - added by grr on 05/07/08 11:30:51.
Error at reconnection attempt
ffado-diag.log (4.9 kB) - added by grr on 05/08/08 06:30:47.
Diagnosis tool output
ffado-jack.log.3 (121.9 kB) - added by grr on 05/08/08 06:31:48.
streaming64.log (43.0 kB) - added by grr on 05/09/08 03:21:27.
streaming128.log (43.8 kB) - added by grr on 05/09/08 03:22:04.
teststreaming3 -p 128
streaming64.log.2.bz2 (61.1 kB) - added by grr on 05/22/08 00:43:07.
tests/streaming/teststreaming3 -r 44100 -P 70 -p64 -n3 -v6 2> streaming64.log.2
streaming128.log.bz2 (44.8 kB) - added by grr on 05/22/08 14:22:41.
Log at 128 without most of the streaming_wait output

Change History

05/07/08 11:29:45 changed by grr

  • attachment failure.log added.

First failure

05/07/08 11:30:51 changed by grr

  • attachment reconnect.log added.

Error at reconnection attempt

05/07/08 11:32:40 changed by grr

FFADO 2.0-beta4 is missing from trac's version selector

05/07/08 12:51:02 changed by ppalmers

  • version changed from FFADO 2.0-beta2 (1.999.21) to FFADO 2.0-beta4 (1.999.26).

please prepare a log as indicated in the README file.

Running without realtime priority will most likely cause issues on the long run in any case.

05/08/08 06:30:47 changed by grr

  • attachment ffado-diag.log added.

Diagnosis tool output

05/08/08 06:31:48 changed by grr

  • attachment ffado-jack.log.3 added.

05/08/08 06:42:13 changed by grr

Indeed there's an obvious improvement with the rt kernel, and things are even better with priority raised to 70. Still it crashed after about 15mn. Here's the command.

jackd -R -P 70 -d firewire -r 44100 -v6 2> ffado-jack.log.3

the files I'm currently testing with are 44.1k but I could retry on 48k if you'd find it relevant.

05/08/08 08:25:50 changed by ppalmers

  • status changed from new to assigned.
  • owner set to ppalmers.

This might be a problem with jack rather than with FFADO. Can you check whether the following keeps running fine for a longer time?

tests/streaming/teststreaming3 -r 48000 -P 70 -p256 -n3 -v5

It might be interesting to pipe this to a log file too:

tests/streaming/teststreaming3 -r 48000 -P 70 -p256 -n3 -v5 2> log

Note that you might want to disconnect speakers attached.

05/08/08 08:27:16 changed by ppalmers

Please test with 44100 instead of 48000 since that was the rate used in your original report.

05/09/08 03:19:28 changed by grr

tests/streaming/teststreaming3 -r 44100 -P 70 -p256 -n3 -v5 2> log

ran for one hour without problems.

tests/streaming/teststreaming3 -r 44100 -P 70 -p64 -n3 -v5 2> streaming64.log

crashed after 5 mins

tests/streaming/teststreaming3 -r 44100 -P 70 -p128 -n3 -v5 2> streaming128.log

lasted about 20mins.

05/09/08 03:21:27 changed by grr

  • attachment streaming64.log added.

05/09/08 03:22:04 changed by grr

  • attachment streaming128.log added.

teststreaming3 -p 128

05/09/08 08:00:53 changed by ppalmers

can you try the streaming tests again with latest SVN, but with verbosity level 6 (-v6)

05/19/08 12:16:37 changed by ppalmers

  • status changed from assigned to closed.
  • resolution set to worksforme.

I have tried several longterm runs, and they all seem to work fine. The main requirement for this is running jack or the streaming test with RT privileges. I am therefore closing the ticket, but please reopen it if there are still issues. Run the test with verbosity level 6 (-v6) such that we can figure out what goes wrong.

05/22/08 00:43:07 changed by grr

  • attachment streaming64.log.2.bz2 added.

tests/streaming/teststreaming3 -r 44100 -P 70 -p64 -n3 -v6 2> streaming64.log.2

05/22/08 00:49:42 changed by grr

Hi Pieter,

seems I missed an interesting comment here! Here are the tests results:

with -p128 it ran peacefully about an hour, but didn't survive a whole night (got 3MB of log after compression if you're interested) with -p64 it crashed almost immediately (log attached)

So that looks better. On the other hand, I still get random deconnections with jack (typically after half an hour) but have to investigate a bit more.

Status of this ticket is, of course, up to you.

05/22/08 02:01:56 changed by ppalmers

The log with -p64 indicates that your system cannot handle this low period size. The shutdown is simply due to the system not being able to keep up.

I would like to see the p128 run, but you can cut out the "normal operation" status output.

The jack errors are most likely not a FFADO problem, there are some severe issues with current jack SVN.

05/22/08 14:22:41 changed by grr

  • attachment streaming128.log.bz2 added.

Log at 128 without most of the streaming_wait output

05/23/08 01:44:29 changed by grr

Here you are

05/23/08 02:01:52 changed by ppalmers

  • priority changed from major to minor.
  • status changed from closed to reopened.
  • resolution deleted.

05/23/08 02:07:17 changed by ppalmers

The log file you provide indicates two things:

  1. your system for one reason or the other can't provide the necessary timing guarantees
  2. when handling the xruns, at a certain point in time either the system or the saffire gets confused. The timestamps are really off, and that points me in the direction of a bogus initialization upon xrun handling. Once too many packets are dropped, the DLL get's a too large smack and can't recover in time.

Point 1 is not something we can do a lot about. Point 2 is something that has to be resolved, but since it's not in the nominal operation i've lowered the priority.

05/28/08 22:49:44 changed by porl

i had this exact same problem before. i have bought a new firewire interface card and updated the saffire's firmware, and now it is rock solid (no xruns running for over 24 hours) at -p256. i can't seem to get it to work under that rate though.

porl

ps. i was running the latest svn build at the time (post beta4) so that may have also had something to do with it.

06/21/08 03:54:16 changed by ppalmers

Can you verify this is still a problem with beta6?

07/04/08 01:27:16 changed by ppalmers

  • status changed from reopened to closed.
  • resolution set to fixed.

r1268 addresses point 2 of my comment. point 1 is not something we can solve.

closing the ticket