Ticket #299 (closed bug: fixed)

Opened 10 years ago

Last modified 10 years ago

FFADO does not read the correct VendorId under the Juju Stack

Reported by: euan Assigned to:
Priority: major Milestone:
Component: generic/dbus Version: FFADO 2.0.1
Keywords: Juju VendorId Cc: euan@dekock.net
The device the bug applies to: Audiofire

Description

Running Ubuntu 10.04 I have problems with the Juju firewire stack and my AudiofirePre?8 device. The FFADO system appears to get random nodeVendorId details from DBUS. Each time I power cycle the device it reports a different VendorId? and the mixer will not start. If I update the ffado configuration to reflect the new vendorId the Mixer App works. Jack works in all cases. I have attached detailed lists showing the old and new firewire stack on the same machine.

Attachments

old_firewire_state.txt (12.0 kB) - added by euan on 09/01/10 08:25:25.
SYstem State running old firewire stack
old_firewire_state.2.txt (12.0 kB) - added by euan on 09/01/10 08:27:09.
SYstem State running old firewire stack
new_firewire_state.txt (19.1 kB) - added by euan on 09/01/10 08:28:24.
System State with the Juju Stack
new_firewire_state1.txt (25.1 kB) - added by euan on 09/01/10 15:57:29.
firewire_debug.txt (8.6 kB) - added by euan on 09/02/10 07:20:31.
Debug output from ffado_test and /sys.../config_rom
firewire_fixed.txt (25.9 kB) - added by euan on 09/03/10 09:34:44.
Debug list showing how it is now fixed.
kern1.log (98.8 kB) - added by euan on 09/04/10 07:37:08.
Debug Log Output jackd
configrom.cpp.diff (2.0 kB) - added by euan on 09/06/10 08:02:39.
configrom.cpp showing my hack to force quad reads
kern_fixed_299.txt (11.2 kB) - added by euan on 09/07/10 08:27:56.
Debug output after csr1212.c patch
csr1212.c.diff (0.9 kB) - added by euan on 09/07/10 08:52:35.
Patched csr1212.c to remove Block Reads in ConfigROM

Change History

09/01/10 08:25:25 changed by euan

  • attachment old_firewire_state.txt added.

SYstem State running old firewire stack

09/01/10 08:27:09 changed by euan

  • attachment old_firewire_state.2.txt added.

SYstem State running old firewire stack

09/01/10 08:28:24 changed by euan

  • attachment new_firewire_state.txt added.

System State with the Juju Stack

09/01/10 14:16:38 changed by stefanr

Strange failure. I have no idea yet where it comes from.

But before this is investigated further, please note: You have libraw1394 v2.0.4 installed. This is not going to work for FFADO with firewire-core underneath. You need libraw1394 v2.0.5 for that. (The changes from v2.0.4 to v2.0.5 affect isochronous transmission only, which is needed for the outgoing audio stream. They are unrelated to asynchronous I/O as used for the reading of the device IDs, among else.)

Also keep in mind that kernel 2.6.32 was the very first one that was able to support FFADO on top of firewire-core and firewire-ohci. Some fixes to these drivers that are also relevant to FFADO have been released in the subsequent kernel versions. I have doubts whether it is worth to proceed looking into this issue on kernel 2.6.32. Your log mentions a 2.6.35 initrd; can you boot that kernel instead?

09/01/10 15:57:29 changed by euan

  • attachment new_firewire_state1.txt added.

09/01/10 15:59:45 changed by euan

With the 2.6.35 kernel and the libraw1394 2.0.5 libraries this still seems to manifest.

Attached is a new ffado-diag and ffado-list stream shwoing this with the new versions.

Please advise if you need anything further.

Thanks,

Euan.

09/02/10 07:19:44 changed by euan

This definitely appears to be in the call to csr1212_parse_csr( m_csr ). This is located in linux/drivers/ieee1394/csr1212.c which is getting a bit deep for me at the moment.

The attached file shows the /sys...config_rom hex dumps and the data in m_csr for all three devices connected to my machine. All three correctly have the VendorID at QUAD[3], and the Model at QUAD[8] as per the spec. However the configrom values we are getting from csr1212_parse_csr are wrong for the Audiofire device. (In fact the 0x1486 value is at QUAD[6] in our data). Do you think I should be passing this up to the kernel people ?

09/02/10 07:20:31 changed by euan

  • attachment firewire_debug.txt added.

Debug output from ffado_test and /sys.../config_rom

09/03/10 09:19:55 changed by euan

After further research, I seem to be closer to the problem.

There is a function static int busRead( struct csr1212_csr* csr,

u_int64_t addr, u_int16_t length, void* buffer, void* private_data )

This is called to read configROM data during the setup process.

Initially this gets called to read 5 quad words making up the bus_info_data. The system then calls this again for each device to read a block of 1024 bytes from further in the configROM. On the Audiofire device this fails after 200 bytes, but the csr1212_parse_csr(...) calls returns success without setting up the bus_info_data values.

I changed the busRead a bit to always read in quadlet blocks in a loop. This at least allows the first 200 to get through, and the system then processes the device correctly.

It does appear though that according to 5.5.6 of the 1394 Spec that the device should allow reads across the whole 1024 range. Again I have attached a doc showing the ffado-test being run with even more debug info to illustrate the problem.

It may be an idea to implement the method I have implemented as it is a bit more robust, and the spec does seem to mention that devices are allowed to reject block requests.

" By default, the Open HCI shall respond to quadlet read requests within the 1K configuration ROM, and send ack_type_error to any block read requests. "

If you want the code let me know and I'll tidy it up a bit and attach it.

Regards,

Euan.

09/03/10 09:34:44 changed by euan

  • attachment firewire_fixed.txt added.

Debug list showing how it is now fixed.

09/03/10 11:25:08 changed by stefanr

Good catch!

A related but different problem was found recently with an Alesis IO|14: http://sourceforge.net/mailarchive/message.php?msg_name=201008231501.20773.luispa8%40aim.com

FFADO startup failed with "Error (configrom.cpp)[ 150] initialize: Could not parse config rom of node 0 on port 0" (seen using the old kernel drivers).

The solution was to hardwire csr->maxrom = 4; in csr1212_parse_csr().

The (ugly, bloated) csr1212.[ch] code is an IEEE 1212 helper library whose code is also used in the old kernel drivers. A while ago, we already did the csr->maxrom = 4 change in the kernel's csr1212.c: http://git.kernel.org/linus/0bed1819687b50a7

This is necessary because of buggy device firmwares.

Background: The Configuration ROM register is a read-only register with up to (but usually considerably less than) 1024 bytes contents. All devices must support 4-bytes sized, 4-bytes aligned read requests in the Configuration ROM. Some devices support larger and differently aligned requests also and indicate this in a field within the 3rd quadlet of the Config ROM, i.e. in Bus Options.

Some firmwares simply have junk in that quadlet. Neither the kernel nor application software must trust what's in there. (Actually, application software should not have to deal with matters like these, which is why firewire-core offers userpsace programs to use the kernel's Config ROM cache of each device instead of having to fetch the Config ROM itself everytime. At present, libraw1394 based programs like FFADO don't profit from this because the raw1394 does not have such a feature.)

Your Audiofire says in its Bus Options (0xe064a212):

irmc 1, cmc 1, isc 1, bmc 0, pmc 0, cyc_clk_acc 100,

max_rec 10 (2048), max_rom 2, gen 1, spd 2 (S400)

max_rom 2 means: "Quadlet read requests and block read requests aligned on quadlet addresses with a data length less than or equal to 1024 bytes are supported." crs1212.c attempts to use that in order to read the Config ROM with as few requests as possible.

One more thing would be interesting: Why was this not a problem with the old kernel drivers underneath? Could you please do one more check:

  • Plug the device in/ switch it on.
  • As root, switch on debug logging of asynchronous requests and responses: echo 1 > /sys/module/firewire_ohci/parameters/debug
  • Start jackd (with original FFADO with unconstrained csr->maxrom).
  • You now got a lot of debug spew in /var/log/messages. Attach the AT/AR debug messages here.
  • As root, get rid of debug logging again before it fills your system disk. :-) echo 0 > /sys/module/firewire_ohci/parameters/debug

Thanks in advance.

PS: You can get a nice representation of Configuration ROM contents by means of the command # crpp < /sys/bus/firewire/devices/fw1/config_rom The tool crpp is a small Python script and available at http://user.in-berlin.de/~s5r6/linux1394/utils/

09/04/10 07:37:08 changed by euan

  • attachment kern1.log added.

Debug Log Output jackd

(follow-up: ↓ 9 ) 09/04/10 07:50:14 changed by euan

Hi Stefan, Thanks for following up. Attached is the kernel log output - its all starting to make sense to me now. The BR (Block Read) do show as failing in the log.

From my reading of the 1212 spec, I don't think the device should respond to a request to read 1024 bytes if it doesn't have that much data.

What I understand it to mean is that while we are parsing the directory tree, we can fulfill any request with a block read for the amount we need at that point. For instance a block request to get the descriptor leaf data (after reading the header) would need a block read of 0x0A quadlets (in my configrom), which should be a valid request and should be honoured.

Unfortunately the spec doesn't specify a single length value for the whole ROM, so we cannot/should not pre-fetch it.

(follow-up: ↓ 8 ) 09/04/10 08:06:06 changed by euan

Just looking at crpp - it references a tool, firecontrol that can send single reads. This would be very useful in my next step - getting 96K samplerate working. Is this a tool you've written, or do you have a link to it. Thanks,

Euan.

(in reply to: ↑ 7 ) 09/04/10 09:19:58 changed by stefanr

Replying to euan:

Just looking at crpp - it references a tool, firecontrol that can send single reads. This would be very useful in my next step - getting 96K samplerate working. Is this a tool you've written, or do you have a link to it.

firecontrol, formerly called 1394commander, was written by Manfred Weihs. A handy tool its is indeed. The project site is http://firecontrol.sourceforge.net/.

(in reply to: ↑ 6 ) 09/04/10 14:05:47 changed by stefanr

Replying to euan:

Attached is the kernel log output - its all starting to make sense to me now. The BR (Block Read) do show as failing in the log.

The debug log only shows ack_complete and data_length = 0 (which is not what tcsr1212.c desired) alas not the rcode (Response Code). But since csr1212.c retries the block read, the rcode must have indicated failure of the transaction.

The fact that csr1212.c corrupts its own data in these cases (from the looks of it) if the new drivers run underneath, but not of the old drivers run underneath, could be because of a weakness in libraw1394. Its firewire-core backend src/fw.c contains this code:

	case FW_CDEV_EVENT_RESPONSE:
		rc = u64_to_ptr(u->response.closure);

		if (rc->data != NULL)
			memcpy(rc->data, u->response.data, rc->length);

		errcode = fw_to_raw1394_errcode(u->response.rcode);
		tag = rc->tag;
		free(rc);

		return fwhandle->tag_handler(handle, tag, errcode);

The rc->length in the memcopy is the requested length, not the actually received length. Correct would be to copy only if the response had RCODE_COMPLETE and only as much as the actual response length. However, the libraw1394 client API does not seem to have a good way to notify the application if less bytes were received than requested. I will send a post to linux1394-devel.

From my reading of the 1212 spec, I don't think the device should respond to a request to read 1024 bytes if it doesn't have that much data.

"Some of the locations within the first 1024 bytes of configuration ROM may be unused in the sense that they are not part of the navigable hierarchy of configuration ROM; these locations shall return response data of values of zero." This Audiofire clearly violates the spec. But csr1212.c on the other hand is clearly not robust enough. As I mentioned, its kernel counterpart has been changed to use only quadlet reads. Everything else is simply too fragile if confronted with real firmwares.

Anyway. If you have it fixed for your device, please attach the diff here so that the FFADO developers can commit it.

What I understand it to mean is that while we are parsing the directory tree, we can fulfill any request with a block read for the amount we need at that point. For instance a block request to get the descriptor leaf data (after reading the header) would need a block read of 0x0A quadlets (in my configrom), which should be a valid request and should be honoured.

Correct. But as I said, this may still go wrong with buggy firmware.

Unfortunately the spec doesn't specify a single length value for the whole ROM, so we cannot/should not pre-fetch it.

If max_rom = 2, a block read of 1024 bytes starting at the beginning of the Config ROM is supposed to succeed. (Provided that the connection speed is S200 or more.) However, some firmwares simply have garbage in max_rom, or in lspd for that matter, or in other parts of the Config ROM. Can happen, nobody is perfect. :-)

09/04/10 17:13:01 changed by stefanr

I proposed a patch for libraw1394: http://thread.gmane.org/gmane.linux.kernel.firewire.devel/14415

As far as I understand, this should fix your issue too without a need to change libffado's csr1212.c. However, due to the other problem with a different device (comment:5), csr1212.c needs to be reduced to quadlet reads regardless.

09/06/10 08:02:39 changed by euan

  • attachment configrom.cpp.diff added.

configrom.cpp showing my hack to force quad reads

09/06/10 08:05:55 changed by euan

This is the code I implemented to get the Audiofire Pre8 to work. At least the first 200 Bytes get read, so the system can correctly parse the configrom dir structure. Feel free to modify as needed.

Regards,

Euan.

09/07/10 01:30:52 changed by stefanr

Your patch looks correct. But simpler and more to the point is to change csr1212.c to ignore max_ROM of device firmwares, as in http://git.kernel.org/linus/0bed1819687b50a7. Of that diff, only the last csr1212.c hunk is required:

--- csr1212.c
+++ csr1212.c
@@ -1443,15 +1428,13 @@ int csr1212_parse_csr(struct csr1212_csr *csr)
 	if (ret != CSR1212_SUCCESS)
 		return ret;
 
-	if (!csr->ops->get_max_rom) {
-		csr->max_rom = mr_map[0];	/* default value */
-	} else {
-		int i = csr->ops->get_max_rom(csr->bus_info_data,
-					      csr->private);
-		if (i & ~0x3)
-			return -EINVAL;
-		csr->max_rom = mr_map[i];
-	}
+	/*
+	 * There has been a buggy firmware with bus_info_block.max_rom > 0
+	 * spotted which actually only supported quadlet read requests to the
+	 * config ROM.  Therefore read everything quadlet by quadlet regardless
+	 * of what the bus info block says.
+	 */
+	csr->max_rom = 4;
 
 	csr->cache_head->layout_head = csr->root_kv;
 	csr->cache_head->layout_tail = csr->root_kv;

The rest of that kernel patch is only optional cleanup. It omits the size argument from csr->ops->bus_read(), removes the csr->ops->get_max_rom() interface and implementation, and removes a block read -> quadlet read degradation in a csr1212.c error handler, since none of this is used anymore after the csr->max_rom = 4 change.

09/07/10 08:27:56 changed by euan

  • attachment kern_fixed_299.txt added.

Debug output after csr1212.c patch

09/07/10 08:51:43 changed by euan

OK, agreed. I have patched the csr1212.c, and the system is now only sending QR requests as expected, and the Audiofire is correctly detected. ffado-mixer also works (after manually fixing the QToolButton bug in audiofire.py #298)

Jack still doesn't start, but thats an exercise for another day. Been compiling libs etc so I need to rationalise the whole stack first.

Euan.

09/07/10 08:52:35 changed by euan

  • attachment csr1212.c.diff added.

Patched csr1212.c to remove Block Reads in ConfigROM

09/11/10 08:21:15 changed by stefanr

An Echo Audiofire 2 owner confirmed on IRC that the bug and the fix (the one in the previous comment) apply to that device as well.

10/09/10 03:31:23 changed by jwoithe

  • status changed from new to closed.
  • resolution set to fixed.

It seems that integrating the max_rom fix is sensible and straight-forward. This has been done in trunk in r1908.