Ticket #379 (new bug)

Opened 6 years ago

Last modified 5 years ago

jack doesn't always start with RME FF800

Reported by: juanramon Assigned to: jwoithe
Priority: major Milestone: FFADO 2.x
Component: devices/rme Version: FFADO SVN (trunk)
Keywords: Cc:
The device the bug applies to:

Description

I'm using Debian (testing) with jackdbus of jack2. When I try to start Jack server with my rme ff800 just few times I reach. But often it just says:

DBus exception: org.jackaudio.Error.Generic: Failed to start server

Using Qjackctl I get:

qjackoutput.txt

When it doesn't work I kill jackdbus, reset bus (ffado-test) and I try again and one or two times over 20 I get jack working. I cannot say if is a problem of jack or ffado.

A strange behavour of RME device is that led of clock source would be "host" when is jack works with internal clock but some times all clock leds turn off. Some times when jack works, some times when it doesn't.

Is for that I thing meybe is a porblem with samplerate set... If you need more outputs say me what you need to reproduce this issue. Thanks

Attachments

qjackoutput.txt (1.2 kB) - added by juanramon on 05/08/14 05:28:35.
ffado-ubuntu.log (1.1 kB) - added by juanramon on 05/09/14 04:23:55.
diag-ubuntu.txt (8.6 kB) - added by juanramon on 05/09/14 04:24:06.
ffado.log (255.1 kB) - added by juanramon on 05/09/14 04:56:45.
diag.txt (7.4 kB) - added by juanramon on 05/09/14 04:57:00.
ffado-svn-2470-debian.log (255.4 kB) - added by juanramon on 05/09/14 05:32:04.
diag-svn-2470-debian.log (7.4 kB) - added by juanramon on 05/09/14 05:32:15.
isohandler-test.diff (0.8 kB) - added by jwoithe on 05/09/14 06:35:53.
A quick patch to test for a potential cause of this problem
isohandler-test-2.diff (2.0 kB) - added by jwoithe on 05/09/14 17:40:09.
A better test for the source of backwards moving cycle numbers in transmit handlers
ffado-r2529-debian.log (256.6 kB) - added by juanramon on 05/13/14 23:13:16.
cycle_test-1.diff (1.3 kB) - added by jwoithe on 05/17/14 06:06:06.
Patch to investigate cycles being requested
ffado-2530-cyclepatch1.log (0.7 MB) - added by juanramon on 05/18/14 02:57:23.
ffado-2529-cyclepatch1.log (0.7 MB) - added by juanramon on 05/18/14 02:59:45.
cycle_test-2.diff (1.6 kB) - added by jwoithe on 05/18/14 03:23:01.
Cycle timer debug patch number 2.
ffado-2529-cyclepatch2.log (0.7 MB) - added by juanramon on 05/18/14 07:23:47.
libraw1394-test-patch-1.diff (1.3 kB) - added by jwoithe on 05/23/14 03:51:47.
Test patch 1 for libraw1394
libraw1394-configure.log (4.8 kB) - added by juanramon on 05/24/14 02:15:18.
libraw1394-make.log (6.9 kB) - added by juanramon on 05/24/14 02:16:06.
ffado-libraw-patched.log (0.9 MB) - added by juanramon on 05/27/14 13:51:09.
libraw1394-test-patch-2.diff (1.4 kB) - added by jwoithe on 05/27/14 18:32:47.
Test patch 2 for libraw1394
ffado-libraw-patched2.log (0.8 MB) - added by juanramon on 05/28/14 01:38:43.
libraw1394-test-patch-3.diff (1.8 kB) - added by jwoithe on 05/29/14 05:40:18.
Test patch 3 for libraw1394
ffado-libraw-patched3.log (0.8 MB) - added by juanramon on 05/30/14 11:09:12.
ffado-2539-cyclepatch3.diff (2.1 kB) - added by jwoithe on 05/31/14 05:13:00.
Cycle timer debug patch number 3
libraw1394-test-patch-4.diff (2.3 kB) - added by jwoithe on 05/31/14 05:13:38.
Test patch 4 for libraw1394
ffado2539-p3-libraw-p4.log (1.9 MB) - added by juanramon on 05/31/14 10:31:30.
ffado-2539-cyclepatch4.diff (2.1 kB) - added by jwoithe on 05/31/14 17:49:45.
Cycle timer debug patch number 4
ffado2539-p4-libraw-p4.log (0.7 MB) - added by juanramon on 06/01/14 04:29:38.
libraw1394-backward_cycle_fix-rc1.diff (0.8 kB) - added by jwoithe on 06/01/14 19:00:40.
Potential fix for libraw1394 problem causing backward cycle number movement
ffado2539-p4-libraw-fix1-O0.log (163.7 kB) - added by juanramon on 06/02/14 10:09:44.
log file in rev2539 + patch4 and libraw1394-O0option + patch4 + fix-rc1
ffado2539-nop-libraw-O0-fix1.log (161.6 kB) - added by juanramon on 06/03/14 04:52:01.
Log with -O0 option. RME doesn't works
ffado2539-nop-libraw-O2-fix1.log (2.0 MB) - added by juanramon on 06/03/14 04:53:19.
Log with -O2 option . RME works Ok
ffado2542-libraw-O0-ndbg-fix1.log (164.2 kB) - added by juanramon on 06/04/14 02:24:22.

Change History

05/08/14 05:28:35 changed by juanramon

  • attachment qjackoutput.txt added.

05/08/14 05:42:28 changed by jwoithe

From the output provided all that can be concluded with certainty is that the FFADO backend did not start. It appears to be a problem with FFADO. However, to be certain I would need to see detailed debug output. Could you please run the following in a terminal:

jackd -d firewire -p512 -n3 -v6 >& /tmp/ffado.log

and post the contents of the ffado.log file? The "-v6" should enable a bunch of debugging output (since it appears that you've compiled this from the svn source) which may make the reason for the failure clear.

Could you also post the output from the "ffado-diag" command? This will show whether there are any obvious issues with the machine's configuration.

One question that's worth checking from the outset: are you running a "low latency" kernel (aka one configured with CONFIG_PREEMPT enabled)? FFADO in general (and the RME driver in particular) require a low latency kernel for reliable operation.

05/08/14 05:53:24 changed by jwoithe

Sorry, I forgot to respond to the last part of the original report. In relation to the "Host" LED behaviour, this red indicator lights when FFADO has not yet taken control of the interface. When this takes place (as a result of either jackd/ffado or ffado-dbus-server/ffado-mixer being started) this LED will go out. If the interface is configured to use its internal clock none of the clock source LEDs will be lit since there isn't an LED for the internal clock.

The state of the "Host" LED can sometimes get confused following a FFADO crash. While a bus reset will reset the FF800, the crash can leave some resources hanging around on the computer which make subsequent runs of jackd believe that the interface does not need to be initialised. The most significant of these is a shared memory segment which can usually be cleaned up using ipcs/ipcrm or removal of the relevant file under /dev/shm. The presence of the stale shared memory segment will generally cause the "Host" LED to remain lit when a FFADO component is next started. My experience is that often things will work fine but I can't guarantee this.

To summarise, the behaviour of your "Host" LED is almost certainly collateral damage from the FFADO crashes. I doubt it has anything to do with the setting of sample rate or clock source.

05/09/14 04:06:18 changed by juanramon

Thanks about explanations. Firts, my kernel: 3.2.0-4-rt-amd64

I attach jackd verbose level 6 in ffado.log file and ffado-diag in diag.txt

05/09/14 04:23:38 changed by juanramon

I've installed ubuntu 12.04 (3.11.0-11-lowlatency) in the same PC and I've tried there. Everything works fine. Both are using same revison of ffado. So I think it is something wrong in Debian. (or maybe not) I'll attach the same log outputs but from ubuntu maybe it is usefull.

ffado-ubuntu.log and diag-ubuntu.txt

05/09/14 04:23:55 changed by juanramon

  • attachment ffado-ubuntu.log added.

05/09/14 04:24:06 changed by juanramon

  • attachment diag-ubuntu.txt added.

05/09/14 04:39:10 changed by jwoithe

Thanks for providing these logs. It's encouraging that this is working under Ubuntu 12.04 but it would be good to get to the bottom of what is tripping the Debian install up.

The ffado.log file from the run under Debian is a little surprising in that it doesn't contain any debug information from FFADO: just error and warning messages. That is easily explained if DEBUG was not enabled during compilation, which is common for binary packages provided by distributions. That this is from a binary distribution of FFADO is further confirmed by the FFADO version number reported: "2.1.9999-": that is, it's missing the revision number as is normally the case when FFADO is packaged by distributions. I note that the FFADO compile date reported by jackd in this case is November 1, 2013.

What puzzles me somewhat is that in your Debian qjackctl output, the FFADO version is reported as 2.1.9999-2520. That is, when run through qjackctrl FFADO seems to know about its svn revision number, and what is more, this revision (r2520) is very recent (dating from 6 May 2014). Contrast this to the 1 Nov 2013 compile date reported for FFADO when jackd is run by itself.

Based on this it seems possible that there are two different versions of FFADO installed on your Debian system: one provided by your distribution (in /usr/lib/) and one that you compiled (probably in /usr/local/lib/). When run via qjackctl jack picks up the /usr/lib/ version, but when run from the command line jack find the /usr/local/lib/ version instead. You could confirm this by looking at the contents of these two lib directories; if libffado.so* is present in both then this is what has happened.

Does this sound feasible?

Assuming so, what you'll want to do is make sure the libffado you compile ends up overwriting the version in /usr/lib/ provided by your distribution. To do this, firstly arrange for it to be removed from /usr/local/. In the libffado source directory, as root (or via sudo - whatever is needed for your distribution) run

scons -c

This does a complete "clean" which includes removing the build targets from their installed locations. Then recompile FFADO but request that its target directory be /usr/ (rather than the default of /usr/local/):

scons PREFIX=/usr

Note that scons will remember the PREFIX setting on subsequent runs. Once that's complete, do the "scons install" thing. This should overwrite the libffado.so* files in /usr/lib/, but the package manager will remain happy since you haven't formally removed anything.

Once this is done, please re-run the jackd command I mentioned in my last message. You'll know if things have worked because the compile date for libffado will be somewhat more recent than 1 Nov 2013. You will also get a lot more output. :-)

Note that I don't think your troubles are caused by having the two versions of FFADO installed. There is clearly a problem; we just need to capture the verbose debug messages to determine what exactly is going on.

Considering the other details provided, I notice that your kernel is an RT kernel, so you should be good on that front. The kernel is a little dated and I think kernels of that vintage did have some firewire-related issues, but I don't think your symptoms match those. The firewire card is a Ricoh but I don't think this is the cause of your problem either, especially since it works under Ubuntu 12.04. I have a hunch about a possible cause and will be very interested to see the verbose debug messages out of the Debian system once we get that happening.

05/09/14 04:56:45 changed by juanramon

  • attachment ffado.log added.

05/09/14 04:57:00 changed by juanramon

  • attachment diag.txt added.

05/09/14 05:02:02 changed by juanramon

It was a mistake now both are running the same revision of ffado. The only important difference I found in diagnostics is Lowlatency vs RealTime? in its kernels configuration

05/09/14 05:09:08 changed by juanramon

As I wrote before now both revisions are 2470.

Usually I compile ffado with no debug flag but and always in /usr as prefix. If you want I can recompile it with degub option enabled. Just say me and I'll do.

05/09/14 05:32:04 changed by juanramon

  • attachment ffado-svn-2470-debian.log added.

05/09/14 05:32:15 changed by juanramon

  • attachment diag-svn-2470-debian.log added.

05/09/14 05:35:36 changed by jwoithe

Ordinarily yes - compile with the debug option enabled. However, this might not be necessary. On the ffado-devel mailing list Andreas posted that he had spotted what I was looking for (but missed) in your logs:

00167573228: Warning (IsoHandlerManager.cpp)[1704] getPacket: (0xde27c0) dropped < 1 (-27),
  cycle: 3879, last_cycle: 3905, dropped: 0, skipped: 0

My hunch as proved correct. :-) I was wondering whether this was indeed the problem.

This is a little cryptic, but what it means is that FFADO noticed that the cycle number reported in a packet - which should always increase by one compared to the previous value - instead went backwards. This is a problem which Andreas has seen on his system. The problem is that I have not been able to reproduce it on mine, even when switching to the same gcc version as that which was on Andreas's system. Currently it has us puzzled as it is not clear where the problem lies: in FFADO, libraw1394 or the kernel. For details you could look up the "Cycle timer issues on transmit" thread in the ffado-devel mailing list archive.

Are you up for debugging this over the next few days/weeks, possibly with Andreas looking on via the mailing list?

05/09/14 06:02:14 changed by juanramon

Yes sure.

But, say me what must I do... and how. What list may I register?

ffado-announce@lists.sourceforge.net ffado-user@lists.sourceforge.net ffado-devel@lists.sourceforge.net ffado-svn@lists.sourceforge.net

or what else? Thanks

05/09/14 06:12:09 changed by jwoithe

Great - thanks for being willing to do this for us. Give me a day or so to review where we are at, follow up on one vague idea I've had, and work out what might need to be done next.

The appropriate mailing list to discuss this on would be ffado-devel. However, due to the number of posts that might eventuate and the potentially large size of log files which need posting, sending some things directly to me might be needed. Once subscribed to ffado-devel you will be able to see my email address from any of my posts there. Let's see how we go.

05/09/14 06:34:17 changed by jwoithe

A last post for the night which addresses something I've been thinking about since the last round of tests from Andreas. I'm attaching a patch (isohandler-test.diff) to this ticket. Could you download it to your machine and apply it to your ffado tree using:

cd src/libieee1394
patch -p0 < isohandler-test.diff
cd ../..

If necessary provide a full path to the isohandler-test.diff file. Then recompile FFADO ("scons" should do the trick) and install. I would be interested in the output that results, and in particular whether the warning about the cycle timer still occurs. For this test you can leave DEBUG off - it will make it easier to spot the critical warning message.

05/09/14 06:35:53 changed by jwoithe

  • attachment isohandler-test.diff added.

A quick patch to test for a potential cause of this problem

05/09/14 06:50:54 changed by jwoithe

You will want to click the link under "Download in original format" after selecting the isohandler-test.diff attachment in the list at the top of this ticket.

Thinking this through a bit more I'm not sure this patch will help. With it in place we'll probably get errors about dropped packets instead. I'll have to think this through a bit more. Still, it might be worth trying it since this will tell us something.

For those following on via the mailing list, my theory is that if the FFADO transmit handers decide to return RAW1394_ISO_DEFER or RAW1394_ISO_AGAIN in response to a request for a packet of a given cycle (done if the generatePacketHeader() returns eCRV_Again), the kernel will re-request packets for that cycle again at a later date. If m_last_cycle has been set to the cycle number of such a packet, the backwards movement of the cycle counter will be seen when the kernel makes the repeated request later on.

The test patch tries to get around this by only setting m_last_cycle when no retries are needed for the cycle. However, since the kernel will in general request a bunch of packets for other cycles before going back to the deferred cycles, FFADO will believe that cycles have been missed during these requests because m_last_cycle will not be one less than the current cycle for these subsequent cycles.

To get around this, in addition to the general idea of the isohandler-test.diff patch, IsoHandlerManager::IsoHandler? will probably have to keep track of the last deferred cycle number and use this in conjunction with m_last_cycle when testing for dropped packets.

05/09/14 17:39:23 changed by jwoithe

Good morning. :-)

I mulled over this overnight and have come up with a revised patch: isohandler-test-2.diff (attached). When convenient it would be great if Andreas or Juanramon could apply it to the latest svn snapshot and let me know whether it changes the behaviour with regard to the backwards cycle timer movement on their system. To apply this patch, run

  patch -p0 < isohandler-test-2.diff

when in the top-level FFADO source directory.

05/09/14 17:40:09 changed by jwoithe

  • attachment isohandler-test-2.diff added.

A better test for the source of backwards moving cycle numbers in transmit handlers

05/11/14 01:56:25 changed by juanramon

Ok I've rebuild ffado r2520 applying isohandler-test-2.diff patch I attach the jackd -d firewire -p512 -n3 -v6 output

05/11/14 04:12:44 changed by jwoithe

juanramon: thanks for testing. It take it that the problem persisted even with the isohandler-test-2.diff patch. That's a pity - I thought we might be onto something there. At this stage this is just conjecture though since the output does not appear to have made its way into the ticket. I shall await the output before reading too much into things.

05/13/14 05:40:48 changed by jwoithe

In the absence of feedback I have tidied up isohandler-test-2.diff and committed it as r2528. I have thought things through and I am fairly certain that this patch is needed to handle the deferred cases. However, whether it completely resolves the "backwards cycle timer" seen by Andreas and Juanramon remains to be seen. I would still love to hear about the results when using FFADO with this patch applied (svn r2528 or later) on one of the affected machines. In the event that things still don't work the "-v6" output would be good to see.

If r2528 completely solves the problem on these two machines it is still not entirely clear to me why the issue was only critical on a very small number of systems, and why seemingly only when using the RME devices. The lack of a plausible explanation for this is the main reason why I suspect that there may be something else.

05/13/14 23:03:49 changed by juanramon

It doesn't still resolve the problem. But you will be... ;-) I've attached the next file: ffado-r2529-debian.log I hope it will help you to understand what happens inside our machines. Thanks and regards.

05/13/14 23:12:47 changed by juanramon

Now i am seeing that last attached file, patch applied one (ffado-patch-debian.log), has not been uploaded. Why? I don't know. Anyway, I've rebuild revision 2529 and I'll upload output file as I said in last post. I'm sorry about this misunderstanding. I'm following your work.

05/13/14 23:13:16 changed by juanramon

  • attachment ffado-r2529-debian.log added.

05/13/14 23:15:24 changed by juanramon

This time the file has been uploaded properly.

05/16/14 07:32:22 changed by jwoithe

Thanks for the file. I will ponder our next move, which will probably involve adding some additional output in the event of this particular error in order to determine whether the problem lies within FFADO itself. It certainly would have been good if r2528 fixed the issue since I'm fairly sure it addresses a real problem. Clearly there is another which is yet to be identified.

05/17/14 06:06:06 changed by jwoithe

  • attachment cycle_test-1.diff added.

Patch to investigate cycles being requested

05/17/14 06:13:08 changed by jwoithe

juanramon: I have attached another patch which I would like to to apply against r2529 (or whatever is the latest revision at the time) and retest using "-v6". It is attached as cycle_test-1.diff which you can apply using

  patch -p0 < cycle_test-1.diff

in the usual way. This patch simply stores the last 256 cycles requested of the handler and will dump these out when the backwards cycle movement is detected. Assuming the patch doesn't change the behaviour(*) this should give us a definitive answer as to where the backward cycle counter is coming from. The patch has been compile-tested only - since the fault condition isn't triggered on my system I can't do runtime testing. :-)

(*) On Andreas's system we observed that the backward cycle counter behaviour could sometimes be made to go away when debugging code was added at various points. This indicates that the issue is time sensitive. I am hoping that this added code won't do that and instead will tell us something useful.

05/17/14 07:40:08 changed by juanramon

Ok I've compiled r2530 with cycle_test-1.diif aplied. I attach the file: ffado-2530-cyclepatch1.log

05/17/14 18:42:09 changed by jwoithe

juanramon: it appears that you may have forgotten to install the newly compiled libffado:

  ffado_streaming_init: libffado 2.1.9999-2529 built May 14 2014 07:44:13

This is a slightly earlier version (r2529) from a few days ago (May 14). Could you please check to see if something has prevented the latest patched libffado being used? In particular, the latest log file doesn't contain output from the things I added in the cycle_test-1.diff patch.

05/18/14 02:56:47 changed by juanramon

Sorry. I don't konw what happens about my compilations... !?

I'm going to overwrite the last attached log file: ffado-2530-cyclepatch1.log

I think this time it will be the googd one. ;-)

05/18/14 02:57:23 changed by juanramon

  • attachment ffado-2530-cyclepatch1.log added.

05/18/14 02:59:13 changed by juanramon

Sorry again. The last file is conmpiled with 2529 revision.

05/18/14 02:59:45 changed by juanramon

  • attachment ffado-2529-cyclepatch1.log added.

05/18/14 03:23:01 changed by jwoithe

  • attachment cycle_test-2.diff added.

Cycle timer debug patch number 2.

05/18/14 03:35:29 changed by jwoithe

juanramon: thanks for the updated log. There's no problem with it being against r2529; r2530 only added a new configuration file entry for an unrelated audio interface. The revised log contains the output from the ffado-2530-cyclepatch1.log which clearly shows that getPacket() is being called for cycle 3231 after a previous call has been made for cycle 3235. Since m_deferred_cycles is zero at the time of the cycle 3231 call we can conclude that cycle 3235 was not deferred. The puzzle now is to work out why libraw1394 made a callback for cycle 3231 when cycles up to 3235 had seemingly been handled already.

The next step is to see what exactly getPacket() is returning for the earlier cycles. I have attached cycle_test-2.diff for this purpose. It should apply to r2529 or later (take your pick). If using your existing source tree you'll have to revert cycle_test-1.diff and then apply cycle_test-2.diff. You could use the "svn revert -R ." command to do the reversal, or request that patch do it:

  patch -p0 -R < cycle_test-1.diff

Then apply cycle_test-2.diff, recompile and reinstall in the usual way. This time the output will include information about the value returned for the early cycles which may give us a clue as to why libraw1394 is re-requesting cycles seen previously. As before, this patch has only been compile-tested by me.

05/18/14 07:23:08 changed by juanramon

Each time less words... :-D ffado-2529-cyclepatch2.log

05/18/14 07:23:47 changed by juanramon

  • attachment ffado-2529-cyclepatch2.log added.

05/18/14 17:58:28 changed by jwoithe

Thank you for the new log output, which indicates that the return value from getPacket() was 0 (RAW1394_ISO_OK). This isn't all that surprising since I would generally expect this to be the case. That is, that all cycles are handled without errors or other conditions.

The IsoHandlerManager::IsoHandler::getPacket() method is, I believe, set up as the libraw1394 iso transmit callback function. Unless I'm missing something, the contents of ffado-2529-cyclepatch2.log seem to suggest that libraw1394 is sending the backward cycle number to the callback. In other words, if this output is to be believed the problem must lie in a layer of code below FFADO (libraw1394 or the kernel).

To delve deeper into this we will need to apply some debug patches to libraw1394 so see what it sees. Would you be willing to obtain the source for libraw1394 and pursue this? The process will be similar to what we've been doing with libffado.

05/18/14 22:49:36 changed by juanramon

I've tried to rebuild with a libraw1394 downgraded (2.1 to 2.0.9) and its behaivor was the same. Do you want I try some older libs of raw1394 ? Maybe this I know it works under Ubuntu 12.04? Greatings

05/19/14 00:10:34 changed by jwoithe

At this stage we don't need libraw1394 downgraded: instead what I need to do is supply you with a patch against libraw1394 2.1.0 which will hopefully provide more information about what's going on with the cycle numbers being requested from FFADO. What this will involve is getting the libraw1394 source, patching it with a diff I supply (in much the same way we've been doing with FFADO), compiling it and copying the resulting libraw1394.so.* file into the system area (after making a backup copy of the original to make it easy to go back later).

Andreas and I tried a great number of variations on the theme, including different libraw1394 versions and compiling under different gcc versions. The conclusion we ultimately reached at the time was that the fault was not following either of these variations exclusively even though early indications suggested otherwise. For this reason we need to get more information about what libraw1394 is requesting and hopefully the reasons for those requests. We'll stick with libraw1394 2.1.0 since that's what ships on your system and we know it repeatedly triggers the error condition.

(follow-up: ↓ 32 ) 05/19/14 22:54:43 changed by juanramon

Could the origin of this problem be the same that generate random latency reported in ticket #363 ?

Or would it be a different problem's root?

(in reply to: ↑ 31 ) 05/19/14 22:58:04 changed by jwoithe

Replying to juanramon:

Could the origin of this problem be the same that generate random latency reported in ticket #363 ?

It is almost certainly a different problem which stems from the way the RME driver interacts with the timing infrastructure of FFADO.

I hope to work up a patch for libraw1394 in the next day or so. I assume you'd be happy to test it for me?

05/20/14 06:39:22 changed by juanramon

sure, I'm on your side. When it will ready, tell me.

05/23/14 03:51:47 changed by jwoithe

  • attachment libraw1394-test-patch-1.diff added.

Test patch 1 for libraw1394

05/23/14 04:00:43 changed by jwoithe

Ok, I have attached an initial patch against libraw1394 1.2.0 (libraw1394-test-patch-1.diff). Before trying it, obtain the libraw1394 1.2.0 source and compile it without any changes:

  ./configure
  make

libraw1394.so.11.1.0 will almost certainly be in /usr/lib/ on your system. Make a backup copy of this file. Manually replace it with src/.libs/libraw1394.so.11.1.0 from the libraw1394 source that you compiled (just use "cp"). Re-run the test and confirm that the backwards moving cycle time is still seen. This eliminates the possibility that any compilation flags used by your distribution are influencing this bug.

Assuming the fault is still showing, apply the libraw1394-test-patch-1.diff patch. When in the top-level libraw1394 directory:

  patch -p1 < libraw1394-test-patch-1.diff

Recompile and copy the result to /usr/lib/ as before. Then rerun the test again and see what happens. Assuming it crashes out with the backwards cycle timer please make the debug output available as before.

05/24/14 01:26:53 changed by juanramon

This is the first build , before appling the patch. But I get

libraw1394.so.8.1.0

After

$ ./configure -V

I've got

libraw1394 configure 1.2.0
generated by GNU Autoconf 2.59

So I thing the source is the good one. The code is version 1.2.0 downloaded directly from sourceforge.

What can it be happening?

05/24/14 02:14:01 changed by juanramon

I attach configure and make logs

05/24/14 02:15:18 changed by juanramon

  • attachment libraw1394-configure.log added.

05/24/14 02:16:06 changed by juanramon

  • attachment libraw1394-make.log added.

05/24/14 04:00:52 changed by jwoithe

The libraw1394 on sourceforge is not the most recent version. That probably explains the earlier version number you're seeing after compilation. Furthermore, the version being reported by "configure" (which appears to be the same as the library version) is 1.2.0 - not the current release which is 2.1.0. Sourceforge does have 2.0.5 for download but clearly that's also not the current version.

The canonical source for libraw1394 now is now on kernel.org. If you go to https://www.kernel.org/pub/linux/libs/ieee1394/ and scroll to the bottom you'll find the source tarball for version 2.1.0. Grab that and you should be fine.

Apologies for not pointing this out earlier.

05/24/14 07:22:35 changed by juanramon

Sorry about my ineptitude.

I've compiled libraw1394 from code without applying any change, as you said , just to see if your patch would work ok. And surprisingly the result was that ffado works fine with rme, only replacing libraw1394.

I'll attach log file, but there is not any output like in previous logs (getPacket).

Could it be a libraw1394 distro package fault.

Do you want to try it anyway applying your patch to libraw1394?

I can confirm after serveral attempts, that it works.

05/24/14 08:16:14 changed by juanramon

Ok I can't upload log file because it is too big.

05/24/14 21:53:52 changed by jwoithe

Thanks for reporting the outcome of the first step of the test. This is the sort of difficulty we've been having while chasing this bug down: certain changes appear to fix the problem even though there's no technical reason why they might. In your case you are running the same version of libraw1394 that was supplied by your distribution and yet it's now working fine. Andreas saw the same sort of thing.

What we now need to do is work out whether debian testing applies any patches to libraw1394 2.1.0. We also need to find out whether they use any explicit options to "configure" (in your present case we're just using the defaults).

Taking a quick look at https://packages.debian.org/jessie/libraw1394-11 (since I don't have a lot of time right now), there is a libraw1394_2.1.0-1.debian.tar.gz tarball which contains a small number of patches (2). However, neither patch impacts on the streaming functionality that we are focusing on so I don't think they are implicated. What I can't work out straight away is which configure options are used when compiling for Debian testing: the aforementioned Debian tarball does not seem to include this information, so it must be stored elsewhere. We need to work out where we can find this and see whether they use anything which differs from the result.

The idea in the first instance is to work out how to compile libraw1394 so it exactly matches what Debian does. Logically this then should make your compiled version behave in the same way as the one supplied by Debian. When we get to that point we can consider patching libraw1394 to see if it's possible to do so without "fixing" the errant behaviour.

Don't worry about the latest output being too big - since jackd/ffado worked in this instance it is unlikely to tell us anything useful.

05/25/14 04:06:44 changed by jwoithe

Further to comment 40, I've done some more digging. Looking at debian/rules in the libraw1394 distribution it appears that no configure options are used which could affect the code itself. The only configure options used set the prefix, man and info directories. However, CFLAGS is set to one of two values depending on the setting of the DEB_BUILD_OPTIONS variable. I am not sure where this is set and how the "noopt" token works in the context, but my initial guess is that by default the Debian libraw1394 is compiled with CFLAGS set to "-O2 -Wall -g". The alternative is "-O0 -Wall -g"; the difference being a less aggressive optimisation setting. "-O0" is the default optimisation if no "-O" parameter is given.

To check out this theory, please re-configure and recompile libraw1394 using the following sequence:

  CFLAGS="-O2 -Wall -g" ./configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info
  make

This should recompile the entire library. Copy the resulting libraw1394.so.11.1.0 as you did previously and then re-test. If this mirrors how the debian package was compiled then the backwards cycle count should reappear since the code should be identical.

If for some reason it doesn't, for completeness you might like to repeat the above steps, but replace "-O2" with "-O0".

I am hoping the bug resurfaces with one of these tests. We can then go on to patching libraw1394 and seeing what that results in.

By the way, please retain the final patched libffado for these tests too. That output will still be useful when interpreting the debug output from the patched libraw1394.

05/27/14 13:50:33 changed by juanramon

I'm here again.

I've compiled libraw1394 with -O2 flag and it doesn't work. Great! I've reconfigure and recompiled it aplying the patch.

But there is none of de error messages added from the patch in the log file I'll attach ffado-libraw-patched.log

05/27/14 13:51:09 changed by juanramon

  • attachment ffado-libraw-patched.log added.

05/27/14 18:31:58 changed by jwoithe

ffado-libraw-patched.log is interesting. As you say, none of the libraw1394 debug messages fired which at face value means that from libraw1394's point of view nothing is amiss. I assume you recopied the recompiled libraw1394 after applying the patches.

I shall review the libraw1394 patch to make sure it does what I think it should do. If that checks out ok that means that the cycle number being seen by getPacket() is being altered somewhere between the libraw1394 client call and FFADO's getPacket().

One minor issue with the libraw1394 patch I've noticed just now is that if the quantity "fwhandle->iso.packet_count-pkt_count" happens to be negative then expected_cycle will go backwards. I've added a test for this condition and will attach the resulting libraw1394-test-patch-2.diff patch. Please revert the first, apply this and re-test. If this doesn't fire (and I don't find further holes in the logic) then I think we can conclude the problem lies higher up in the call chain.

05/27/14 18:32:47 changed by jwoithe

  • attachment libraw1394-test-patch-2.diff added.

Test patch 2 for libraw1394

05/28/14 01:38:43 changed by juanramon

  • attachment ffado-libraw-patched2.log added.

05/28/14 01:41:36 changed by juanramon

Same compiling conditions for: ffado-libraw-patched2.log

05/28/14 01:45:03 changed by juanramon

There is neither any message from libraw patched compilation.

05/28/14 03:37:57 changed by jwoithe

juanramon, thanks for retesting. I can't view the log file right now , but I can't say that I am surprised that this revised patch didn't change much. Hmm. Assuming I haven't missed something, this seems to say that the errant cycle number is somehow being introduced between this part of libraw1394 and the section of FFADO which detects the fault. Interesting.

I’m busy tonight, but in the next couple of days I will try to do up a patch to more completely instrument these additional stages. Hopefully that will shed further light on this issue.

05/29/14 05:40:18 changed by jwoithe

  • attachment libraw1394-test-patch-3.diff added.

Test patch 3 for libraw1394

05/29/14 05:42:04 changed by jwoithe

Another day, another libraw1394 patch. This adds only one more test, but it's an interesting one. Please test libraw1394-test-patch-3.diff in the usual way and post the outcome.

05/30/14 11:09:12 changed by juanramon

  • attachment ffado-libraw-patched3.log added.

05/30/14 11:11:36 changed by juanramon

Ok I've uploaded a log file. ffado-libraw-patched3.log

I hope it reveals something interesting.

05/31/14 05:13:00 changed by jwoithe

  • attachment ffado-2539-cyclepatch3.diff added.

Cycle timer debug patch number 3

05/31/14 05:13:38 changed by jwoithe

  • attachment libraw1394-test-patch-4.diff added.

Test patch 4 for libraw1394

05/31/14 05:16:39 changed by jwoithe

Thanks for the latest log output. Interestingly enough the traces I've inserted are still not being triggered in the fault condition. This is getting really interesting. We're getting there I think, but slowly.

I have two fresh patches: ffado-2539-cyclepatch3.diff (against FFADO r2539) and libraw1394-test-patch-4.diff (against libraw1394 2.1.0). Could you please revert previous patches, apply these, compile/install the resulting libffado and libraw1394 libraries and post the resulting output?

05/31/14 10:30:52 changed by juanramon

Here is comming the last log. ffado2539-p3-libraw-p4.log

05/31/14 10:31:30 changed by juanramon

  • attachment ffado2539-p3-libraw-p4.log added.

05/31/14 17:49:45 changed by jwoithe

  • attachment ffado-2539-cyclepatch4.diff added.

Cycle timer debug patch number 4

05/31/14 17:53:22 changed by jwoithe

Thanks again. That last log was instructive. However, there was a bug in the ffado-2539-cyclepatch3.diff patch. Could you please retrieve ffado-2539-cyclepatch4.diff, apply it and rerun the test?

Can you also double-check that the patched version of libraw1394 has been copied to the relevant place? The last log did not include some output that I would definitely have expected to come out of libraw1394-test-patch-4.diff.

06/01/14 03:10:42 changed by juanramon

I've noticed a strange thing referent to libraw1394

$ ls -la /usr/lib/x86_64-linux-gnu/libraw1394*
[...]
lrwxrwxrwx 1 root root  /usr/lib/x86_64-linux-gnu/libraw1394.so -> libraw1394.so.11.1.0
lrwxrwxrwx 1 root root  /usr/lib/x86_64-linux-gnu/libraw1394.so.11 -> libraw1394.so.11.1.0.backup
[...]

The soft link of libraw1394.so.11 points to original (backup copy). But I'd never create it.

Could it be the reason of no output messages? Must I re-point it to the new (re-compiled) libraw1393.so.11.1.0 ?

06/01/14 03:35:23 changed by jwoithe

Ah. Yes, that symlink (soft link) could well be the reason why libraw1394's debug output is not being seen. I'm glad we've got a reason - I was really starting to become puzzled there. :-)

The symlink would be created by ldconfig at boot, and due to the sort order it uses to determine "the most recent version", the libraw1394.so.11.1.0.backup file will be identified as such in preference to libraw1394.so.11.1.0. Probably the easiest way to avoid this problem is to move the backup file out of the library directory, or call it something which doesn't start with "lib" (eg: backup-libraw1394.so.11.1.0). Then either manually fix the symlink or run "ldconfig /usr/lib/x86_64-linux-gnu/" which should do that for you.

Once this is confirmed fixed you should not have to revisit the libraw1394.so.11 symlink: it should remain correct even across reboots.

06/01/14 04:29:13 changed by juanramon

Well. Two things First: When I compile libraw witk -O2 option ffado works Ok.

Second: When I've compiled with -O0 option it doesn't work, but finally! the debug messages apear in the log file.

ffado2539-p4-libraw-p4.log

06/01/14 04:29:38 changed by juanramon

  • attachment ffado2539-p4-libraw-p4.log added.

06/01/14 04:33:12 changed by juanramon

So I think the default option in Debian package is with -O0 option when it was compiled.

06/01/14 18:58:28 changed by jwoithe

I have posted a description on ffado-devel of what I think might be happening (see thread "Revisiting backward cycle number sequence (ticket 379)"). While awaiting comments from Stefan and others on this, you might like to try out a proposed fix. From what I can tell the problem lies in libraw1394 and I've done up a proof-of-concept patch which addresses it (attached here as libraw1394-backward_cycle_fix-rc1.diff). Could you please apply this to libraw1394 and re-test? In the first instance you might like to leave the most recent libraw1394 debug patch applied for this - that way if I'm wrong there will at least still be some debugging output to go on. If the result seems to work, then try applying the patch to an otherwise unpatched libraw1394. You could test both "O0" and "-O2" compilations to be doubly sure that the problem is resolved.

Let me know how you get on.

06/01/14 19:00:40 changed by jwoithe

  • attachment libraw1394-backward_cycle_fix-rc1.diff added.

Potential fix for libraw1394 problem causing backward cycle number movement

06/02/14 10:09:44 changed by juanramon

  • attachment ffado2539-p4-libraw-fix1-O0.log added.

log file in rev2539 + patch4 and libraw1394-O0option + patch4 + fix-rc1

06/03/14 03:31:05 changed by jwoithe

Thanks once again for your continued testing. Evidence suggests that we are getting somewhere at last. Your most recent test did not fail due to the backward cycle number problem which indicates that the proposed libraw1394 fix might just do the trick. Of course we might have inadvertently masked the problem with the changed code, but given how consistently your system could trigger the bug I am fairly confident we have identified the underlying cause.

The error you got this time was this:

Warning (TimestampedBuffer.cpp)[ 511] writeFrames: ringbuffer full, 784, 15
Debug (StreamProcessor.cpp)[ 587] putPacket: processPacketData xrun

This is an indication that the higher levels of the JACK system couldn’t clear the incoming audio data fast enough. Given the amount of debug output we are producing this doesn’t surprise me or concern me at this stage.

At this point I think it is worthwhile to revert to unpatched FFADO, use libraw1394 with just the proposed fix patch, and retest. Try with libraw1394 compiled with “-O0” first perhaps, and then try with ”-O2” and see what happens. ”-O0” may still give trouble but the optimised one should be fine. In any case, what we are wanting to do in the first instance is prove that the backward cycle number problem is cured with the fix I have proposed. Any other remaining issues which show up can then be dealt with as necessary.

06/03/14 04:52:01 changed by juanramon

  • attachment ffado2539-nop-libraw-O0-fix1.log added.

Log with -O0 option. RME doesn't works

06/03/14 04:53:19 changed by juanramon

  • attachment ffado2539-nop-libraw-O2-fix1.log added.

Log with -O2 option . RME works Ok

06/03/14 04:54:55 changed by juanramon

I've uploaded both log files, with -O0 and -O2 option in libraw compilation and ffado without any patch applied.

06/03/14 16:34:22 changed by jwoithe

Thank you for running these tests. The "-O2" test appears to have run perfectly which is great news. The "-O0" test died with a full ringbuffer in writeFrames() again, but it appears that this test was run with a libraw1394 which still had at least some debug output enabled (in particular, the "queue packet fail" lines are still being printed). Because of how frequently they occur it is quite likely that they are causing the ringbuffer problem.

It would be good if you could rerun the test with libraw1394 compiled with "-O0" without the extra debug output that we added - that is, with just the fix patch applied, as you did for this last "-O2" test. Since "-O0" was more likely to trigger the backward cycle number on your system, this is probably the final test we need in order to conclude beyond doubt that the libraw1394 fix is the correct way to address this ticket.

Having said that, the crash of the latest "-O0" log wasn't due to a backward cycle number so the chances are good that we have the correct fix. Retesting without the libraw1394 debug output should put this beyond doubt.

06/04/14 02:24:22 changed by juanramon

  • attachment ffado2542-libraw-O0-ndbg-fix1.log added.

06/04/14 02:29:45 changed by juanramon

Uploaded libraw log with no debug patch, only fix patch. I'm sorry.

06/04/14 06:24:20 changed by jwoithe

Thanks for the updated log - no need to apologise. While this ("-O0" libraw1394) did not allow jackd/ffado to start, it got past the point where the backward cycle numbers would have happened in the past and failed for an unrelated reason (xrun in the receive handler due to a full buffer). I shall review where we are at tomorrow evening (I've been busy tonight), but the evidence of these last tests suggest that the cause of the backward cycle numbers has been correctly identified and that the proposed libraw1394 patch fixes it. This would result in this ticket being closed as "invalid" since the bug is in an external library (libraw1394).

I will also give some thought to the receive-side xruns you're getting. It could simply be a performance issue with your system, especially since an "-O2" compiled libraw1394 is clearly happier. Expect more thoughts about this tomorrow.

06/10/14 03:20:25 changed by jwoithe

juanramon:

Well, that didn't exactly go to plan and I'm now about a week late. Sorry about that.

Regarding the original topic of this ticket (which was the backward cycle numbers) I am fairly confident that closing this as "invalid" (due to it not being our bug) is the right approach. The fix I posted for libraw1394 appears to be the correct approach to me, although I am still awaiting feedback from those with more direct experience with libraw1394.

There really isn't a neat way for FFADO to work around this libraw1394 issue. While we could adopt the view that any backward movement of the cycle timer is indicative of the libraw1394 bug, I am reluctant to do this since it may mask other future problems. In addition, it's not clear how an ignored cycle might affect the previously queued buffer within libraw1394 and the kernel. All things being considered I think this thought is best avoided.

I would be interested to hear your thoughts on the above process before closing this ticket.

As to the receive-side xruns that you are now seeing, I am almost certain that this is unrelated to the root cause of this ticket. That is, it is a separate issue. At this stage I suspect that something about your system's setup might be causing these although what this might be I cannot say at present. Since this is unlikely to be a bug it might make most sense for you to subscribe to ffado-devel and initiate a discussion there. A post of the "ffado-devel" output from your system to ffado-devel as part of your initial post would be helpful.

I have a suspicion that whatever is causing the receive-side xruns might also have been the reason why your system was able to trigger the libraw1394 bug which caused the backward cycle numbers. For instance, perhaps the system is getting held up for an extended period of time for some as yet unidentified reason. The usual suspects for this sort of thing include various wifi adapters, some card readers and binary proprietary graphics drivers. It might be instructive to start removing kernel modules for various pieces of hardware you don't require for audio work and see if the removal of a particular module improves things.

06/10/14 04:35:40 changed by juanramon

I don't know what to say about that.

Actually my system is a collage of pieces. So maybe something is wrong in its configuration. So, about xruns it don't worry me to much.

The most importatnt think was that jack (ffado) didn't start with rme and it was strange. If this ticket is not a ffado fault I think it is better to close it. And if you think it could bring future problems , maybe I can help libraw developers to fix this bug.

06/10/14 05:01:36 changed by jwoithe

Thanks for the feedback. About libraw1394 I am reasonably certain that libraw1394-backward_cycle_fix-rc1.diff is the fix that's needed; if that's the case there's no further work that needs to be done. I'll ping Stefan about this and see what happens. If he believes that the fix is the correct approach for this issue then I'll close this ticket.

I really do appreciate the help you've given to allow us to (probably) identify the cause of this strange problem and fix it. As I mentioned in the past I couldn't easily look into it directly myself because the backward cycle number never showed up on my system.

I'm more than happy to help work through the separate xrun problem with you. You can either post to ffado-devel or contact me privately (my address can be found in any of my posts to ffado-user or ffado-devel).

06/10/14 06:25:20 changed by stefanr

A mental note about this issue:

juanramon's OHCI-1394 controller is:

09:01.0 FireWire (IEEE 1394) [0c00]: Ricoh Co Ltd R5C832 IEEE 1394 Controller [1180:0832] (rev 05) (prog-if 10 [OHCI])

Subsystem: ASUSTeK Computer Inc. Device [1043:1827]

Andreas Hehn's controller is according to his post on 22 August 2013:

Via VT6306/7/8 [Fire II(M)]

06/12/14 00:37:36 changed by juanramon

I have got 2 firewire ports in my laptop. The first one is a builtin in port

description: FireWire (IEEE 1394)
product: R5C832 IEEE 1394 Controller
vendor: Ricoh Co Ltd

And the next one thru PCI express card.

description: FireWire (IEEE 1394)
product: XIO2200A IEEE-1394a-2000 Controller (PHY/Link)
vendor: Texas Instruments

I ususally connect to the Ricoh builtin port my producction interface (TCelectronic K48) for convenience and practicality.

Since I cannot use RME for recording/monitoring becuase its "extra" delay (latency), I ussually connect it only for testing purposes (new revisions in ffado) to the PCIexpress card port (TexasInstrument?).

04/28/15 15:25:45 changed by stefanr

libraw1394-backward_cycle_fix-rc1.diff has been released in libraw1394 v2.1.1.