Ticket #181 (closed bug: invalid)

Opened 6 years ago

Last modified 5 years ago

ffado has issue with nvidia8200 chipset (2.6.27 kernel)

Reported by: dx9s Assigned to:
Priority: minor Milestone: FFADO 2.0
Component: Version: FFADO SVN (2.0 branch)
Keywords: b1484 Cc:
The device the bug applies to:

Description

I am still writing up step-by-step instructions on HOW I got this... but to summarize:

Two variables:

1) boot-time option, toggling the kernel between uni/smp:
   a) "nosmp"
   b) (default)
2) jack driver, switching the "setup" between jackd-dummy / ffado:
   a) jackd -d dummy
   b) jackd -d firewire

under nosmp: dummy and ffado both work fine

under smp: dummy works fine, but ffado exhibits xruns, usually at one particular point in the run.

The xruns occur regardless of the jackd settings, whether 3x128@48kHz or 3x512@48kHz (note: dummy only works in 2x mode).
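The jackd settings in this ticket are written as periods x frames-per-period @ sample rate. To put them in time units, here is a minimal sketch; the helper names are hypothetical (not part of jackd) and it only does the arithmetic of buffered audio and period length.

```c
#include <assert.h>
#include <stdint.h>

/* Total audio held by a jackd buffer of nperiods x frames_per_period
 * at sample_rate (e.g. "3x128@48kHz"), in microseconds (truncated). */
static uint64_t buffer_latency_us(unsigned nperiods,
                                  unsigned frames_per_period,
                                  unsigned sample_rate)
{
    assert(sample_rate > 0);
    return (uint64_t)nperiods * frames_per_period * 1000000u / sample_rate;
}

/* Length of one period: the interval between successive jackd wakeups. */
static uint64_t period_us(unsigned frames_per_period, unsigned sample_rate)
{
    return buffer_latency_us(1, frames_per_period, sample_rate);
}
```

For example, 3x128@48kHz holds 8000 us (8 ms) of audio with a period of about 2.67 ms, while 3x512@48kHz holds 32 ms and 3x1024@48kHz holds 64 ms.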

Testing procedure: reboot and set the boot-time option (if needed); start terminate and swapoff -a, then htop; then start qjackctl with the settings for dummy or ffado and start it (note: my qjackctl startup also disables cron, and I have no "System" update service configured, so hopefully there are no surprise scheduled system events). Then I start jackd and ardour... From there I record 8 tracks of 8 channels each (64 channels) to produce a lot of disk I/O.

The xruns USUALLY occur at (or just before) the point when all unused memory (which is used for disk caching) has been filled and the disk cache goes into FIFO (first in, first out) mode, reusing its oldest entries to cache newer disk I/O data.

The xruns ALWAYS come as 2 right next to each other... and since there are three buffers and three notable jackd threads ... I suspect ONE thread is stuck busy (perhaps the BKL w/ SMP) and the other two threads cannot stream as normal until that stuck thread is woken on the next pass...

It appears that ONCE the two xruns occur, it's stable past that point. I've spent a bit of time on this, and I can run more tests / enable options during compile / check out svn changes...

dummy does not cause this issue in SMP or UNI ... I think this IS a threading issue, but I'm not sure whether it's in the core streaming API / interface or down in the driver part (hence ticket 180). I will close ticket 179 in favour of this one, as I think I've narrowed it down to something SMP/threading related.

Attachments

ffado-jack.gz (137.4 kB) - added by dx9s on 11/29/08 15:32:20.
list of xruns under jack -v ... -v6 >ffado-jack 2>ffado-jack
dx9s-ohci-debug.tgz (120.9 kB) - added by dx9s on 12/01/08 20:12:33.
copy of my logs from boot to post xruns (duplicate lines removed to make file smaller)

Change History

11/28/08 13:20:34 changed by dx9s

taken from ##kernel:

<dx9s_home> j_engelh, just trying to determine if a concurrency issue with ffado is gone or just well hidden on a single-cpu non-smp kernel ... it still shows up on SMP kernel with "nosmp" at boot time.
<dfego> j_engelh: It's spread out over a few files, how is the best way to do that?
<j_engelh> dfego: two offending files probably suffice
<j_engelh> dx9s_home: concurrency issues can also be triggered with UP kernels, if you were asking that.
<dx9s_home> j_engelh, I figured as much
<dx9s_home> j_engelh, hence the "well hidden on single-cpu UP kernel"
<j_engelh> I would say that nosmp behavior is equal to an UP
<dx9s_home> j_engelh, but not technically identical....
<j_engelh> technically, the UP has spinlocks removed even.
<dx9s_home> j_engelh, do spinlocks make use of the Big Kernel Lock ??? ( http://kerneltrap.org/Linux/Removing_the_Big_Kernel_Lock )
* CcSsNET (n=ccssnet@pool-72-76-28-68.nwrknj.fios.verizon.net) has joined ##kernel
<j_engelh> they do not. the bkl is a spinlock, though.
<dx9s_home> j_engelh, (sorry for asking wierd questions) .. not knowledgable on such stuff in any detail (only superficially)
* mx-tvt has quit (Read error: 104 (Connection reset by peer))
<dx9s_home> j_engelh, so a solution is to roll back to 2.7.25 and see/test ... or better go after ffado and see what the concurency issue is that is exposed under such spinlocks
<dx9s_home> (I mean 2.6.25)

and to summarize what this means to me:

j_engelh on ##kernel confirmed that spinlocks are removed entirely from a UP (non-SMP) kernel .. which could "hide" a concurrency issue in ffado

--Doug

11/28/08 13:22:19 changed by dx9s

FUDGE: that didn't format readably!!!

<dx9s_home> j_engelh, just trying to determine if a concurrency issue with ffado is gone or just well hidden on a single-cpu non-smp kernel ... it still shows up on SMP kernel with "nosmp" at boot time.
<dfego> j_engelh: It's spread out over a few files, how is the best way to do that?
<j_engelh> dfego: two offending files probably suffice
<j_engelh> dx9s_home: concurrency issues can also be triggered with UP kernels, if you were asking that.
<dx9s_home> j_engelh, I figured as much
<dx9s_home> j_engelh, hence the "well hidden on single-cpu UP kernel"
<j_engelh> I would say that nosmp behavior is equal to an UP
<dx9s_home> j_engelh,  but not technically identical....
<j_engelh> technically, the UP has spinlocks removed even.
<dx9s_home> j_engelh, do spinlocks make use of the Big Kernel Lock ??? ( http://kerneltrap.org/Linux/Removing_the_Big_Kernel_Lock )
* CcSsNET (n=ccssnet@pool-72-76-28-68.nwrknj.fios.verizon.net) has joined ##kernel
<j_engelh> they do not. the bkl is a spinlock, though.
<dx9s_home> j_engelh, (sorry for asking wierd questions) .. not knowledgable on such stuff in any detail (only superficially)
* mx-tvt has quit (Read error: 104 (Connection reset by peer))
<dx9s_home> j_engelh, so a solution is to roll back to 2.7.25 and see/test ... or better go after ffado and see what the concurency issue is that is exposed under such spinlocks
<dx9s_home> (I mean 2.6.25)

11/28/08 13:26:17 changed by dx9s

  • summary changed from ffado is not SMP safe to ffado has concurrency issue.

FWIW... I can still get xruns on the dual-core machine with the SMP kernel in "nosmp" mode.. it's less likely.. but it still happens... On the OTHER machine I have (single core, UP kernel) ... it doesn't show up .. or at least I've not seen it, assuming I was watching correctly (not ideal testing conditions).

so I'd change this from 'ffado is not SMP safe' to 'ffado has concurrency issue*' *=that is exposed under certain constraints ....

--Doug

11/28/08 19:28:40 changed by dx9s

FWIW... it doesn't matter what period size I use... 3x128 or 3x1024 ... eventually I'll get 2 xruns within a fraction of a second of each other, and it happens within the first 30 minutes of a test run...

Two recent back-to-back tests:

One test: 3x1024@48kHz ran for 1.5 hrs recording 12 tracks .. 2 xruns at ~29 minutes and then fine for an hour after that; then I stopped in frustration.

Another test: 2x32@48kHz (jackd -d dummy) ran for 2 hours 32 minutes 16 seconds (and 106GB later), recorded 64 tracks and had ZERO xruns... so it's not the kernel or ardour or jackd or disk I/O speed; it has to be ffado!
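To put the dummy run's disk load in numbers, here is a minimal sketch of the arithmetic. It assumes each channel is written as 32-bit float samples (4 bytes per sample); that sample format is an assumption, not something stated in the ticket, and the function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes per second written when recording `channels` mono streams. */
static uint64_t record_bytes_per_sec(unsigned channels, unsigned rate,
                                     unsigned bytes_per_sample)
{
    assert(bytes_per_sample > 0);
    return (uint64_t)channels * rate * bytes_per_sample;
}

/* Total bytes written over a session lasting h:m:s. */
static uint64_t record_total_bytes(unsigned channels, unsigned rate,
                                   unsigned bytes_per_sample,
                                   unsigned h, unsigned m, unsigned s)
{
    uint64_t secs = (uint64_t)h * 3600 + (uint64_t)m * 60 + s;
    return record_bytes_per_sec(channels, rate, bytes_per_sample) * secs;
}
```

With 64 channels at 48 kHz this gives 12,288,000 bytes/s (about 12.3 MB/s) and, over 2:32:16, 112,263,168,000 bytes (about 104.6 GiB). That is in the same range as the ~106GB reported, so the dummy run did sustain heavy disk I/O.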

--Doug

11/28/08 23:32:17 changed by holin

I think you're banging your head against the wall here. The xruns you get are probably caused by a few places in the kernel with long held locks (which will get fixed, given time), or even more likely by artifacts in scheduling and/or cpu cache behaviour. They will be random in nature, and unless you're a seasoned kernel hacker you will find a million scenarios that first seem to reproduce the problem, but after enough tries you find out that 'it wasn't *that* after all'. Anyhow, imho the only solution to this is moving the ffado streaming code to kernel space, which will happen when someone does the work, but surely not for ffado-2.0. If ffado had an actual concurrency issue, it would more likely be manifested as a lockup or a crash.

11/29/08 01:40:05 changed by dx9s

IF it were the kernel holding a long lock, that would affect jackd regardless of dummy/alsa/firewire ... so I am not sure you understand my testing. As long as the jackd configuration on dummy/alsa is sane (not trying to do 2x8), there are zero xruns on the SAME kernel, running for hours in many cases...

There is something with ffado... jackd w/ dummy (or alsa) is fine on the same hardware with really small buffer sizes (as low as 32) ... but ffado can't handle even 3x1024?? And YES.. I've captured 64 tracks continuously for over 2.5 hours (not using ffado) with 2x32 and NO XRUNS.

It's ffado ... if it's not concurrency, then it's something else in ffado. But it's ffado nevertheless!

And it always has TWO xruns back to back (a fraction of a second apart), then goes on without any issue for as long as I can tolerate (at least an hour) .. I'm glad you didn't say to increase the settings... because I've tried a mix of different ones .. like 4x512 or 5x .. and I always get two xruns within 30 minutes of recording (and just two xruns) ...

I noted that there is a spinlock that is removed when the kernel is configured for UP (uni-processor), and the other machine I've been testing ffado on runs a UP kernel and behaves much better (aka zero xruns for at least 1.5 hours) .. however, that machine has a slower and smaller disk and can only handle MAYBE 16 tracks; 12 is pushing it (@48kHz).

I think that THAT spinlock is affecting ffado (but not jackd w/ alsa or dummy). I just copied the config file over and will compile/install a UP kernel that is nearly identical to the one on the other machine, to verify that a UP kernel actually fixes (aka hides) the ffado problem seen with SMP kernels.

I dislike pointing the finger at the kernel when jackd works fine with dummy/alsa. What this tells me is that the SMP kernel exposes a problem in ffado.

--Doug

11/29/08 15:32:20 changed by dx9s

  • attachment ffado-jack.gz added.

list of xruns under jack -v ... -v6 >ffado-jack 2>ffado-jack

11/30/08 03:31:53 changed by ppalmers

Does your DMESG show anything special when these xruns occur?

Can you compile the OHCI modules with debugging enabled?

11/30/08 16:44:39 changed by dx9s

I can and I will.. I was already thinking about that... I am leaning towards funky hardware for which the kernel/ohci driver isn't 100% "stable" ... it's a new system, and kernels before 2.6.26 don't support the hardware (from what I understand)... DARN .. why does cutting-edge hardware always lag in "perfect" kernel support?

I've got an older ShuttleX PC I'm getting up and running for something I need to do soon... so my efforts on the new hardware will be more or less delayed ... but I have the config file and will tweak it (or look at which modprobe options enable OHCI debugging) and report back soon.

--Doug

11/30/08 16:53:34 changed by dx9s

I am using older firewire stack... what's the word on juju (libraw1394 version 2 ) ??

12/01/08 07:27:47 changed by dx9s

ppalmers -- thinking it's a kernel thing... I THOUGHT ALSA was stable (but it's not): it has similar 2-xrun events like ffado .. but they're rarer. SO far jackd -d dummy is the only thing that is 100% schedule (error) free with zero xruns... but what good is no sound hardware?

DAMN. This means ffado is probably just subject to a flaky kernel. I feel like Doc in 'Back to the Future' ... DAMN DAMN!

I don't know what course of action to take to hunt down the problem. I will attempt to capture dmesg and OHCI debug messages (and, if possible, ALSA debug messages)... But I suspect OHCI will point the finger at something inside the kernel that we've not looked at or thought of.

Need to change this status to something else: ffado has issue with nvidia8200 chipset, but mark it as minor (as ffado cannot control the kernel folks).

12/01/08 07:28:28 changed by dx9s

  • priority changed from critical to minor.
  • summary changed from ffado has concurrency issue to ffado has issue with nvidia8200 chipset (2.7.27 kernel).

12/01/08 07:28:49 changed by dx9s

  • summary changed from ffado has issue with nvidia8200 chipset (2.7.27 kernel) to ffado has issue with nvidia8200 chipset (2.6.27 kernel).

12/01/08 08:03:20 changed by dx9s

was thinking it might be a filesystem issue ??

http://en.wikipedia.org/wiki/Ext3

As I am usually "testing" with 64 WAVs (effectively tracks), I cheat and do 8 tracks with 8 channels in each track, and that eats up a lot of space fast.. the xruns might be something weird with the filesystem... but if that were the case, wouldn't it show up with jackd -d dummy too? The filesystem is a 2TB ext3 filesystem... something the other machines don't even come close to.

Things to figure out / test .. and I'll have to re-install to change the filesystem: what should I use that is known good for such large filesystems? Or perhaps set up a much smaller ext3 and see if the problem is effectively gone (though still present, aka hidden). What gets me is that jackd -d dummy is now the only reference that doesn't display an issue (but perhaps it is just even rarer there... when it takes a long time to test, you have to pick and choose your tests).

--Doug

12/01/08 08:13:03 changed by holin

The XFS filesystem was designed with streaming media in mind. You can even define real-time subvolumes inside XFS filesystems, although using them would usually require support from the application as well. YMMV, but I've used XFS for years with little to complain about. Beware though that XFS doesn't tolerate sudden power-offs (eg. laptop battery running out) as well as ext3.

12/01/08 08:21:23 changed by holin

If you want to dig deeper into your xrun issues, I suggest you compile your own PREEMPT_RT kernel and enable the latency tracing features. That way you might find out whether, and where, the kernel fails to meet the timing requirements (ie. latencies bigger than the minimum irq period for whatever audio period size and number of periods you have). There's probably a HOWTO somewhere about how to use the latency tracing stuff. Or maybe it was under the kernel's Documentation/. Too lazy to check..
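The kernel latency tracer is the real tool here, and this does not replace it. As a rough userspace cross-check, tools such as cyclictest (from the rt-tests suite) repeatedly sleep until an absolute deadline and record how late each wakeup is. A minimal sketch of that idea, assuming a Linux/POSIX system; `max_wakeup_latency_ns` is a hypothetical helper, and for meaningful numbers it would need to run under a real-time policy such as SCHED_FIFO, which this sketch does not set.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

static int64_t ts_to_ns(const struct timespec *t)
{
    return (int64_t)t->tv_sec * 1000000000LL + t->tv_nsec;
}

/* Sleep `loops` times on an absolute period of `period_ns` and return
 * the worst observed wakeup lateness in nanoseconds. */
static int64_t max_wakeup_latency_ns(long period_ns, int loops)
{
    struct timespec next, now;
    int64_t worst = 0;

    assert(period_ns > 0 && period_ns < 1000000000L);
    clock_gettime(CLOCK_MONOTONIC, &next);
    for (int i = 0; i < loops; i++) {
        /* advance the absolute deadline by one period */
        next.tv_nsec += period_ns;
        if (next.tv_nsec >= 1000000000L) {
            next.tv_nsec -= 1000000000L;
            next.tv_sec++;
        }
        clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
        clock_gettime(CLOCK_MONOTONIC, &now);
        int64_t late = ts_to_ns(&now) - ts_to_ns(&next);
        if (late > worst)
            worst = late;
    }
    return worst;
}
```

A worst case approaching the audio period (about 2.67 ms for 128 frames at 48 kHz) would suggest looking at scheduling; a consistently small one would suggest looking elsewhere.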

12/01/08 19:59:57 changed by dx9s

put ieee1394/ohci1394 into debug mode (boy does the syslog get big fast) ..

attached a tar.bz2 (7.3MB) of both messages/syslog (however, syslog seems to have only a few lines more, so messages can be ignored)... and also a README.TXT (the file shows the two lines that stuck out for me, which are around the same time the two xruns occurred). Included everything from boot to shutting down jackd/ffado ... note: the log shows a normal startup, a short run and shutdown (no XRUNS), and then a longer run of OHCI debug messages (99% of them identical).

Thanks ppalmers ....

(here are the two lines:)

messages:Dec 1 19:32:42 xpc kernel: [ 1637.835150] ohci1394: fw-host0: skipped 58 cycles without packet loss
messages:Dec 1 19:32:43 xpc kernel: [ 1638.851968] ohci1394: fw-host0: skipped 71 cycles without packet loss
syslog:Dec 1 19:32:42 xpc kernel: [ 1637.835150] ohci1394: fw-host0: skipped 58 cycles without packet loss
syslog:Dec 1 19:32:43 xpc kernel: [ 1638.851968] ohci1394: fw-host0: skipped 71 cycles without packet loss
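The two ohci1394 events are about one second apart (1637.835150 s and 1638.851968 s since boot), which fits the "two xruns within a second" pattern. When searching a large debug syslog for such events, a small parser helps. The sketch below assumes exactly the line format shown above; `parse_skip_line` and `skip_ms` are hypothetical helpers, and the 125 us figure is the standard IEEE 1394 isochronous cycle length.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Extract the kernel timestamp (seconds since boot) and the skip count
 * from an ohci1394 debug line such as
 *   "... kernel: [ 1637.835150] ohci1394: fw-host0: skipped 58 cycles ..."
 * Returns 1 on success, 0 if the line does not match. */
static int parse_skip_line(const char *line, double *stamp, int *skipped)
{
    const char *br = strchr(line, '[');
    const char *msg = strstr(line, "skipped ");

    if (!br || !msg)
        return 0;
    if (sscanf(br, "[ %lf]", stamp) != 1)
        return 0;
    return sscanf(msg, "skipped %d cycles", skipped) == 1;
}

/* Each 1394 isochronous cycle lasts 125 microseconds. */
static double skip_ms(int skipped)
{
    return skipped * 0.125;
}
```

By this arithmetic, the 58-cycle skip spans 7.25 ms and the 71-cycle skip spans 8.875 ms.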

Holin, I know how to make custom kernels and will look at that, but I want to follow up on the debug information that ppalmers wanted first. I will bug you when I need some more help... ext3 on other machines doesn't seem to have any issues, so I suspect it's not ext3 at this point (just want to rule it out)... The latency tracing is interesting, but I don't think it's a scheduling thing ... I don't know why ohci would simply skip cycles "without packet loss" .. I do know that with -v6, ffado showed a clock of slightly less than 48kHz .. I wonder if it's a drift issue?

--Doug

12/01/08 20:12:33 changed by dx9s

  • attachment dx9s-ohci-debug.tgz added.

copy of my logs from boot to post xruns (duplicate lines removed to make file smaller)

12/01/08 20:33:45 changed by dx9s

found the function in ohci1394.c that generates the message in the syslog:

static void ohci_iso_xmit_task(unsigned long data)
{
	struct hpsb_iso *iso = (struct hpsb_iso*) data;
	struct ohci_iso_xmit *xmit = iso->hostdata;
	struct ti_ohci *ohci = xmit->ohci;
	int wake = 0;
	int count;

	/* check the whole buffer if necessary, starting at pkt_dma */
	for (count = 0; count < iso->buf_packets; count++) {
		int cycle;

		/* DMA descriptor */
		struct iso_xmit_cmd *cmd = dma_region_i(&xmit->prog, struct iso_xmit_cmd, iso->pkt_dma);

		/* check for new writes to xferStatus */
		u16 xferstatus = le32_to_cpu(cmd->output_last.status) >> 16;
		u8  event = xferstatus & 0x1F;

		if (!event) {
			/* packet hasn't been sent yet; we are done for now */
			break;
		}

		if (event != 0x11)
			PRINT(KERN_ERR,
			      "IT DMA error - OHCI error code 0x%02x\n", event);

		/* at least one packet went out, so wake up the writer */
		wake = 1;

		/* parse cycle */
		cycle = le32_to_cpu(cmd->output_last.status) & 0x1FFF;

		if (xmit->last_cycle > -1) {
			int cycle_diff = cycle - xmit->last_cycle;
			int skip;

			/* unwrap */
			if (cycle_diff < 0) {
				cycle_diff += 8000;
				if (cycle_diff < 0)
					PRINT(KERN_ERR, "bogus cycle diff %d\n",
					      cycle_diff);
			}

			skip = cycle_diff - 1;
			if (skip > 0) {
				DBGMSG("skipped %d cycles without packet loss", skip);
				atomic_add(skip, &iso->skips);
			}
		}
		xmit->last_cycle = cycle;

		/* tell the subsystem the packet has gone out */
		hpsb_iso_packet_sent(iso, cycle, event != 0x11);

		/* reset the DMA descriptor for next time */
		cmd->output_last.status = 0;
	}

	if (wake)
		hpsb_iso_wake(iso);
}
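The cycle bookkeeping in `ohci_iso_xmit_task()` is easy to misread because the 13-bit cycle count (masked with 0x1FFF) runs from 0 to 7999 and then wraps. The standalone sketch below mirrors only that arithmetic so it can be checked in isolation; it is not driver code, and `cycles_skipped` is a hypothetical name.

```c
#include <assert.h>

/* The cycle count wraps after 7999: 8000 cycles per second. */
#define CYCLES_PER_SECOND 8000

/* Mirror of the "unwrap" and "skip = cycle_diff - 1" logic above:
 * how many cycles passed between two consecutively sent packets. */
static int cycles_skipped(int last_cycle, int cycle)
{
    assert(last_cycle >= 0 && last_cycle < CYCLES_PER_SECOND);
    assert(cycle >= 0 && cycle < CYCLES_PER_SECOND);

    int diff = cycle - last_cycle;
    if (diff < 0)
        diff += CYCLES_PER_SECOND;  /* counter wrapped past 7999 */
    return diff - 1;                /* 0 means back-to-back cycles */
}
```

Unlike the driver, which only reports `skip > 0`, this returns -1 when both cycle values are equal.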

12/01/08 22:59:29 changed by holin

Latency tracing probably won't help here. It looks like your firewire controller is broken (or the linux drivers; I'm not entirely convinced about their correctness either). Could you post the output of 'lspci -v'?

12/02/08 08:12:39 changed by dx9s

holin,

it's a new TI firewire host (and those generally have excellent support), some TSBxxx (I'm at work now and don't have it)... The syslog should/could give more information (it contains everything from boot), but an lspci -v will come.

I'm fairly sure it's either a broken driver or an unexpected condition (skipping iso packets). This is really weird. It could also be (and this is not a remote possibility) that OHCI1394, like other drivers, is affected by something else. I've got the kernel configured such that I get similar xruns in ALSA at around the same kind of intervals (when they show up)... However, I cannot 100% prove that. I need to start jackd w/ alsa and let it sit there as well. And if that is also true.. I might need to put ALSA into some debug mode...

(my thoughts so far) New hardware, new drivers, new bugs... (that hopefully get found)

--Doug

12/02/08 08:22:39 changed by dx9s

found this also in the ohci1394.c src:

====

  • Things implemented, but still in test phase:
  • . Iso Transmit
  • . Async Stream Packets Transmit (Receive done via Iso interface)

====

and the routine/function that caused the xrun: static void ohci_iso_xmit_task(unsigned long data)

so I'd put this in the unstable / testing stage... do I need to put a tag on my guinea pig ear?

12/02/08 12:12:18 changed by holin

I wouldn't say it's the new hardware that's to blame. OHCI is a pretty old standard already and hardware either is compliant or isn't. Here's Pieter's lkml posting about the problem for reference: http://lkml.org/lkml/2008/4/18/234 I'm left slightly suspicious about the linux driver, because host controllers that basically don't work at all on Linux (with ffado) work fine on OS X with no clicks or pops or anything. Of course, the OS X driver probably hides cycle skipping if it happens anyway, but still the end result is better. On Linux, the number of users doing isoch transfers with firewire is obviously too small to have much leverage.

12/02/08 13:21:14 changed by ppalmers

OK, time for me to step in.

First of all, I'm the one that wrote the code that is responsible for these messages. So I know why they are happening and what they mean. So if the code is correct, the story is as follows:

OHCI transmission is done through a linked list of descriptors in DMA accessible memory. These descriptors contain (amongst things like the 1394 header and a pointer to the payload data memory) two pointers to other descriptors: one to the descriptor that should be processed after this one when transmission is successful, and one to the descriptor for when transmission is not successful (for whatever reason, more on that later).

What the current linux driver does (implemented by me for post 2.6.24) is that the descriptor pointer for failure points to the descriptor itself. This means that if a packet could not be transmitted, it is retried in the next cycle. This is where the "skipped 71 cycles without packet loss" message comes in: it reports that the packet was sent 71 cycles after it was intended to be sent. Doing things this way means that you can handle sporadic skips of one or two cycles, which is what I saw on my O2 micro controller. The code fixes things for that controller. However, if you skip 71 cycles, there is definitely an xrun since that corresponds to 568 frames of audio, so not much chance that you'll survive that.
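The retry-on-failure arrangement can be illustrated with a toy model. This is a sketch under loose assumptions: `struct toy_desc`, `toy_run` and the stall model are hypothetical and do not reproduce the real OHCI descriptor format or its branch fields. They only show why a self-pointing failure branch turns a stall into "skipped N cycles without packet loss" rather than a lost packet.

```c
#include <assert.h>
#include <stddef.h>

/* Toy descriptor: where to go after a successful send, and where to go
 * after a failed one. Pointing on_fail at itself means "retry the same
 * packet in the next cycle". */
struct toy_desc {
    int packet_id;
    struct toy_desc *on_ok;
    struct toy_desc *on_fail;
};

/* Walk the list for up to ncycles cycles; can_send(cycle) models whether
 * the controller got the payload in time. The cycle in which each packet
 * actually left is stored in sent_at[packet_id]. */
static void toy_run(struct toy_desc *d, int ncycles,
                    int (*can_send)(int cycle), int *sent_at)
{
    for (int cycle = 0; cycle < ncycles && d != NULL; cycle++) {
        if (can_send(cycle)) {
            sent_at[d->packet_id] = cycle;
            d = d->on_ok;
        } else {
            d = d->on_fail;  /* self-loop: same packet, next cycle */
        }
    }
}

/* Example fault model: the controller misses cycles 2..72 (71 cycles). */
static int stall_71_from_2(int cycle)
{
    return cycle < 2 || cycle >= 2 + 71;
}
```

With three chained descriptors and this stall, packets 0 and 1 leave in cycles 0 and 1, and packet 2 leaves in cycle 73 instead of cycle 2. The gap of 72 cycles from the previous packet is what the driver's `skip = cycle_diff - 1` reports as 71 skipped cycles.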

The main question is: why does this happen. The short answer is: I have no idea. The most likely reason is that the DMA memory controller was unable to provide the payload data on time. Most host controllers have only a limited amount of buffer memory available, usually only for one packet. This means that the DMA controller should be able to service a request every 125us, which might not be possible under certain circumstances.
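The numbers behind the 71-cycle case can be checked directly. An IEEE 1394 isochronous cycle lasts 125 us (8000 cycles per second), which is also the per-packet servicing deadline mentioned here. The 568-frame figure equals 71 x 8, i.e. one 8-frame packet per skipped cycle (8 frames per packet is the usual size at 48 kHz in IEC 61883-6 blocking mode). Measured as elapsed time instead, 71 cycles is 8.875 ms, or 426 frames at 48 kHz. A minimal sketch with hypothetical helper names:

```c
#include <assert.h>

#define ISO_CYCLE_US 125.0  /* one IEEE 1394 isochronous cycle */

/* Wall-clock duration of n cycles, in microseconds. */
static double skipped_us(int n)
{
    return n * ISO_CYCLE_US;
}

/* Audio frames that elapse during n cycles at sample_rate. */
static double frames_elapsed(int n, int sample_rate)
{
    return skipped_us(n) * sample_rate / 1e6;
}

/* Frames carried by n packets of frames_per_packet each. */
static int frames_in_packets(int n, int frames_per_packet)
{
    return n * frames_per_packet;
}
```

Either way of counting, the stall exceeds a single 128-frame period more than three times over.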

When you say that the TI controllers are amongst the best ones available, I tend to agree. However this applies only to the dedicated controllers they built in the past. The one you report now is part of a fairly recent integrated flashmedia chipset which does not necessarily follow the previous designs (e.g. smaller buffers to reduce silicon cost).

What might help (counter-intuitively) is to reduce the amount of kernel space buffering (by using the config file). Apparently the memory span that the OHCI or DMA controller has to cover influences its reliability. If you lower this span, fewer of these issues occur.

I have been looking at this issue for quite some time now and couldn't figure out where exactly the problem lies. I've always suspected the DMA controller or the PCI bus for this. Your reports only increase this suspicion.

You say that issues occur at a fairly constant moment in time when you are recording tracks. I suspect that at a certain point the DMA controller is instructed to flush the memory buffers to disk, which might take quite a long time if they are large. If the DMA controller cannot service any other requests in that time frame these kind of effects can occur.

Something similar can happen for the PCI bus itself: maybe the PCI latency timers of the devices in your system are not set correctly. You could try and set all of them to 0 except the OHCI controller: set that one to 256 or something.
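To see what a device's latency timer is currently set to: the Latency Timer is the 8-bit register at offset 0x0D of the standard PCI configuration header, counted in PCI bus clocks. Being 8 bits wide, its largest value is 255. The lspci -v output posted later in this ticket shows "latency 64" for the TI controller at 01:08.0. Below is a minimal read-only sketch, assuming the Linux sysfs layout /sys/bus/pci/devices/<domain:bus:dev.fn>/config; the function names are hypothetical. Changing the value is usually done with setpci from pciutils (which knows the register as latency_timer) rather than by hand.

```c
#include <assert.h>
#include <stdio.h>

/* Offset of the 8-bit Latency Timer in the PCI configuration header. */
#define PCI_LATENCY_TIMER_OFFSET 0x0D

/* Latency timer (in PCI bus clocks) from a raw config header buffer.
 * Returns -1 if the buffer is too short. */
static int pci_latency_timer(const unsigned char *cfg, size_t len)
{
    if (len <= PCI_LATENCY_TIMER_OFFSET)
        return -1;
    return cfg[PCI_LATENCY_TIMER_OFFSET];
}

/* Read the header from sysfs, e.g.
 * "/sys/bus/pci/devices/0000:01:08.0/config". Returns -1 on error. */
static int pci_latency_timer_from_sysfs(const char *path)
{
    unsigned char cfg[64];
    FILE *f = fopen(path, "rb");

    if (!f)
        return -1;
    size_t n = fread(cfg, 1, sizeof cfg, f);
    fclose(f);
    return pci_latency_timer(cfg, n);
}
```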

There is definitely more work required to investigate this, but unfortunately I don't have time for it. Still I hope this information helps.

12/02/08 14:10:05 changed by dx9s


You say that issues occur at a fairly constant moment in time when you are recording tracks. I suspect that at a certain point the DMA controller is instructed to flush the memory buffers to disk, which might take quite a long time if they are large. If the DMA controller cannot service any other requests in that time frame these kind of effects can occur.


This is somewhat true... I can start jackd and sit there idle (no disk I/O); however, with ohci in debug mode.. there is SOME disk activity.. nothing like when writing 64 tracks to disk, though.

In fact the xruns I've been creating recently are just that... jackd idle -- sometimes I plug the inputs into the outputs (capture/playback) so there is streaming -- but nothing to/from disk.

I can increase the PCI latency of that hardware.. I think it defaults to 32 or 64 ... and I think it maxes out at 240...

And I did tinker with the ffado configuration and reduced the defaults (one was 256 and one was 128) .. I made both 128, and in short testing.. it seemed the xruns still happened, but now more consistently after a certain amount of time.

Another thing that is NOT "exact" but that I've noticed (and this is NOT anywhere near consistent): xruns happening near/at "on the hour" and "30" minutes past... however, the xruns occur at other times too, and I've looked to see if anything happens at those times... perhaps the kernel issues a VFS sanity check or something that isn't obvious from user space.

Just hope this helps...

gotta go to meeting.

12/02/08 16:00:13 changed by dx9s

for those concerned... /etc/init.d/cron stop ... (done shortly after booting) ... so it's not cron -- plus jackd -d dummy has no problem (even at really small settings like 2x32 -- but dummy is just a basic timer)

12/02/08 16:35:14 changed by dx9s

also note/understand I am not talking about a slew of xruns... nearly always, just two within moments (literally less than one second) and then smooth.

By at/near "on the hour" I mean... sometimes up to 5 minutes before.. sometimes up to 5 minutes after ... but never within one minute of a particular time.. I've got two minutes to test "30" but it never happens exactly then.. Taking +/- 5 minutes (=10) twice an hour (20/60 = 33% of the time), it seems odd for a majority of the xruns to fall within this window. Never exactly at that time. Early on I added "artsshell -q terminate ; gksudo /etc/init.d/cron stop" to qjackctl; I've thought about a few other things.. but that is my normal qjackctl startup.

My question is: are there any known kernel-side "scheduled" events (I did say non-user space, which excludes things like cron)? However, if it were some user-space event.. I should be able to watch it; I need to look into tracking all process startups/shutdowns **some-how**.

If anybody knows how to track processes (perhaps a tool for process accounting)?? That would be helpful in ruling out a user-space scheduled event.
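One low-tech way to catch scheduled user-space jobs is to poll /proc, which has one numeric directory per live process, and log any PID not seen before. The sketch below assumes Linux procfs; the helper names and the bitmap-indexed-by-PID bookkeeping are illustrative choices, and polling will miss processes that start and exit between two scans. For complete coverage, the kernel's process accounting (the acct(2) system call, usually driven by the accton and lastcomm tools) writes a record for every process that exits.

```c
#include <assert.h>
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>

/* A /proc entry names a process if it consists only of digits. */
static int is_pid_name(const char *name)
{
    if (!*name)
        return 0;
    for (; *name; name++)
        if (!isdigit((unsigned char)*name))
            return 0;
    return 1;
}

/* Print PIDs in /proc not yet marked in `known` (a bitmap indexed by
 * PID, max_pid entries) and mark them. Returns the number of new PIDs,
 * or -1 if /proc cannot be opened. */
static int report_new_pids(unsigned char *known, long max_pid)
{
    DIR *d = opendir("/proc");
    struct dirent *e;
    int fresh = 0;

    if (!d)
        return -1;
    while ((e = readdir(d)) != NULL) {
        if (!is_pid_name(e->d_name))
            continue;
        long pid = strtol(e->d_name, NULL, 10);
        if (pid < max_pid && !known[pid]) {
            known[pid] = 1;
            printf("new pid %ld\n", pid);
            fresh++;
        }
    }
    closedir(d);
    return fresh;
}
```

Calling report_new_pids() once at startup to fill the bitmap, then every second or so with a timestamp added to each line, would show whether some process starts shortly before the xruns.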

I was curious whether there are any known kernel-space scheduled events (such as some kernel-initiated VFS *thing*).

--Doug

12/02/08 16:41:24 changed by dx9s

lspci -v: (currently no changes yet)

00:00.0 RAM memory: nVidia Corporation MCP78S [GeForce 8200] Memory Controller (rev a2)
	Subsystem: nVidia Corporation Device cb84
	Flags: bus master, 66MHz, fast devsel, latency 0
	Capabilities: <access denied>

00:01.0 ISA bridge: nVidia Corporation MCP78S [GeForce 8200] LPC Bridge (rev a2)
	Subsystem: Holco Enterprise Co, Ltd/Shuttle Computer Device 3139
	Flags: bus master, 66MHz, fast devsel, latency 0

00:01.1 SMBus: nVidia Corporation MCP78S [GeForce 8200] SMBus (rev a1)
	Subsystem: Holco Enterprise Co, Ltd/Shuttle Computer Device 3139
	Flags: 66MHz, fast devsel, IRQ 10
	I/O ports at fc00 [size=64]
	I/O ports at 1c00 [size=64]
	I/O ports at 1c40 [size=64]
	Capabilities: <access denied>

00:01.2 RAM memory: nVidia Corporation MCP78S [GeForce 8200] Memory Controller (rev a1)
	Subsystem: nVidia Corporation Device cb84
	Flags: 66MHz, fast devsel

00:01.3 Co-processor: nVidia Corporation MCP78S [GeForce 8200] Co-Processor (rev a2)
	Subsystem: Holco Enterprise Co, Ltd/Shuttle Computer Device 3139
	Flags: bus master, 66MHz, fast devsel, latency 0, IRQ 10
	Memory at fdf80000 (32-bit, non-prefetchable) [size=512K]

00:01.4 RAM memory: nVidia Corporation MCP78S [GeForce 8200] Memory Controller (rev a1)
	Flags: 66MHz, fast devsel

00:02.0 USB Controller: nVidia Corporation MCP78S [GeForce 8200] OHCI USB 1.1 Controller (rev a1) (prog-if 10)
	Subsystem: Holco Enterprise Co, Ltd/Shuttle Computer Device 3139
	Flags: bus master, 66MHz, fast devsel, latency 0, IRQ 21
	Memory at fe02f000 (32-bit, non-prefetchable) [size=4K]
	Capabilities: <access denied>
	Kernel driver in use: ohci_hcd
	Kernel modules: ohci-hcd

00:02.1 USB Controller: nVidia Corporation MCP78S [GeForce 8200] EHCI USB 2.0 Controller (rev a1) (prog-if 20)
	Subsystem: Holco Enterprise Co, Ltd/Shuttle Computer Device 3139
	Flags: bus master, 66MHz, fast devsel, latency 0, IRQ 23
	Memory at fe02e000 (32-bit, non-prefetchable) [size=256]
	Capabilities: <access denied>
	Kernel driver in use: ehci_hcd
	Kernel modules: ehci-hcd

00:04.0 USB Controller: nVidia Corporation MCP78S [GeForce 8200] OHCI USB 1.1 Controller (rev a1) (prog-if 10)
	Subsystem: Holco Enterprise Co, Ltd/Shuttle Computer Device 3139
	Flags: bus master, 66MHz, fast devsel, latency 0, IRQ 20
	Memory at fe02d000 (32-bit, non-prefetchable) [size=4K]
	Capabilities: <access denied>
	Kernel driver in use: ohci_hcd
	Kernel modules: ohci-hcd

00:04.1 USB Controller: nVidia Corporation MCP78S [GeForce 8200] EHCI USB 2.0 Controller (rev a1) (prog-if 20)
	Subsystem: Holco Enterprise Co, Ltd/Shuttle Computer Device 3139
	Flags: bus master, 66MHz, fast devsel, latency 0, IRQ 22
	Memory at fe02c000 (32-bit, non-prefetchable) [size=256]
	Capabilities: <access denied>
	Kernel driver in use: ehci_hcd
	Kernel modules: ehci-hcd

00:06.0 IDE interface: nVidia Corporation MCP78S [GeForce 8200] IDE (rev a1) (prog-if 8a [Master SecP PriP])
	Subsystem: Device f297:3139
	Flags: bus master, 66MHz, fast devsel, latency 0
	[virtual] Memory at 000001f0 (32-bit, non-prefetchable) [disabled] [size=8]
	[virtual] Memory at 000003f0 (type 3, non-prefetchable) [disabled] [size=1]
	[virtual] Memory at 00000170 (32-bit, non-prefetchable) [disabled] [size=8]
	[virtual] Memory at 00000370 (type 3, non-prefetchable) [disabled] [size=1]
	I/O ports at f000 [size=16]
	Capabilities: <access denied>
	Kernel driver in use: pata_amd
	Kernel modules: pata_amd

00:07.0 Audio device: nVidia Corporation Realtek ALC1200 8-Channel High Definition Audio Codec (rev a1)
	Subsystem: Holco Enterprise Co, Ltd/Shuttle Computer Device 3139
	Flags: bus master, 66MHz, fast devsel, latency 0, IRQ 21
	Memory at fe020000 (32-bit, non-prefetchable) [size=16K]
	Capabilities: <access denied>
	Kernel driver in use: HDA Intel
	Kernel modules: snd-hda-intel

00:08.0 PCI bridge: nVidia Corporation MCP78S [GeForce 8200] PCI Bridge (rev a1) (prog-if 01)
	Flags: bus master, 66MHz, fast devsel, latency 0
	Bus: primary=00, secondary=01, subordinate=01, sec-latency=32
	I/O behind bridge: 0000c000-0000cfff
	Memory behind bridge: fde00000-fdefffff
	Prefetchable memory behind bridge: fdd00000-fddfffff
	Capabilities: <access denied>

00:09.0 IDE interface: nVidia Corporation Device 0ad0 (rev a2) (prog-if 85 [Master SecO PriO])
	Subsystem: Holco Enterprise Co, Ltd/Shuttle Computer Device 3139
	Flags: bus master, 66MHz, fast devsel, latency 0, IRQ 221
	I/O ports at 09f0 [size=8]
	I/O ports at 0bf0 [size=4]
	I/O ports at 0970 [size=8]
	I/O ports at 0b70 [size=4]
	I/O ports at dc00 [size=16]
	Memory at fe026000 (32-bit, non-prefetchable) [size=8K]
	Capabilities: <access denied>
	Kernel driver in use: ahci
	Kernel modules: ahci

00:0a.0 Ethernet controller: nVidia Corporation MCP78S [GeForce 8200] Ethernet (rev a2)
	Subsystem: Holco Enterprise Co, Ltd/Shuttle Computer Device 3139
	Flags: bus master, 66MHz, fast devsel, latency 0, IRQ 220
	Memory at fe02b000 (32-bit, non-prefetchable) [size=4K]
	I/O ports at d800 [size=8]
	Memory at fe02a000 (32-bit, non-prefetchable) [size=256]
	Memory at fe029000 (32-bit, non-prefetchable) [size=16]
	Capabilities: <access denied>
	Kernel driver in use: forcedeth
	Kernel modules: forcedeth

00:0b.0 PCI bridge: nVidia Corporation MCP78S [GeForce 8200] PCI Express Bridge (rev a1)
	Flags: bus master, fast devsel, latency 0
	Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
	I/O behind bridge: 0000b000-0000bfff
	Memory behind bridge: fb000000-fcffffff
	Prefetchable memory behind bridge: 00000000d8000000-00000000e7ffffff
	Capabilities: <access denied>
	Kernel modules: shpchp

00:10.0 PCI bridge: nVidia Corporation MCP78S [GeForce 8200] PCI Express Bridge (rev a1)
	Flags: bus master, fast devsel, latency 0
	Bus: primary=00, secondary=03, subordinate=03, sec-latency=0
	I/O behind bridge: 0000a000-0000afff
	Memory behind bridge: fdc00000-fdcfffff
	Prefetchable memory behind bridge: 00000000fdb00000-00000000fdbfffff
	Capabilities: <access denied>
	Kernel driver in use: pcieport-driver
	Kernel modules: shpchp

00:12.0 PCI bridge: nVidia Corporation MCP78S [GeForce 8200] PCI Express Bridge (rev a1)
	Flags: bus master, fast devsel, latency 0
	Bus: primary=00, secondary=04, subordinate=04, sec-latency=0
	I/O behind bridge: 00009000-00009fff
	Memory behind bridge: fda00000-fdafffff
	Prefetchable memory behind bridge: 00000000fd900000-00000000fd9fffff
	Capabilities: <access denied>
	Kernel driver in use: pcieport-driver
	Kernel modules: shpchp

00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
	Flags: fast devsel
	Capabilities: <access denied>

00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
	Flags: fast devsel

00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
	Flags: fast devsel

00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
	Flags: fast devsel
	Capabilities: <access denied>
	Kernel driver in use: k8temp
	Kernel modules: k8temp

01:08.0 FireWire (IEEE 1394): Texas Instruments TSB43AB22/A IEEE-1394a-2000 Controller (PHY/Link) (prog-if 10)
	Subsystem: Holco Enterprise Co, Ltd/Shuttle Computer Device 3139
	Flags: bus master, medium devsel, latency 64, IRQ 17
	Memory at fdeff000 (32-bit, non-prefetchable) [size=2K]
	Memory at fdef8000 (32-bit, non-prefetchable) [size=16K]
	Capabilities: <access denied>
	Kernel driver in use: ohci1394
	Kernel modules: ohci1394

02:00.0 VGA compatible controller: nVidia Corporation GeForce 8200 (rev a2)
	Subsystem: nVidia Corporation Device cb84
	Flags: bus master, fast devsel, latency 0, IRQ 20
	Memory at fb000000 (32-bit, non-prefetchable) [size=16M]
	Memory at d8000000 (64-bit, prefetchable) [size=128M]
	Memory at e6000000 (64-bit, prefetchable) [size=32M]
	I/O ports at bc00 [size=128]
	[virtual] Expansion ROM at e0000000 [disabled] [size=128K]
	Capabilities: <access denied>
	Kernel driver in use: nvidia
	Kernel modules: nvidiafb, nvidia

12/02/08 16:55:19 changed by dx9s

I have a question:

----------------------------------------------------
Dumping StreamProcessorManager information...
Period count:  11550
Data type: float
 Receive processors...
 StreamProcessor 0x99a1650, Receive:
  Port, Channel    : 0, 0
  Packets, Dropped, Skipped : 249153, 0, 0
  Now                   : 01585765035 (064s 4199c 1707t)
  Xrun?                 : False
  State                 : ePS_Running
  Buffer                : 0x99a1708
  Framerate             : Nominal: 48000, Sync: 47996.824475, Buffer 47996.824475
  Sync delay             : 30720.000000 ticks (59.996029 frames, 10.000000 cy)
  TimestampedBuffer (0x99a1708): 0048 frames, 0048 events
   Timestamps           : head: 1585728788.567, Tail: 1585753366.192, Next tail: 1585757462.463
    Head - Tail         :     -24577.625
   DLL Rate             : 4096.287275 (512.035909)
 Transmit processors...
 StreamProcessor 0x99a1298, Transmit:
  Port, Channel    : 0, 1
  Packets, Dropped, Skipped : 249291, 0, 0
  Now                   : 01585765919 (064s 4199c 2591t)
  Xrun?                 : False
  State                 : ePS_Running
  Buffer                : 0x99a13b8
  Framerate             : Nominal: 48000, Sync: 47996.824475, Buffer 47996.710049
  Sync delay             : 0.000000 ticks (0.000000 frames, 0.000000 cy)
  TimestampedBuffer (0x99a13b8): 0004 frames, 0004 events
   Timestamps           : head: 1585954067.748, Tail: 1585956115.888, Next tail: 1586021656.381
    Head - Tail         :      -2048.140
   DLL Rate             : 65539.378995 (512.026398)
----------------------------------------------------

Is it normal for the nominal rate (48000 Hz) and the Sync/Buffer rates (~47996.824475/47996.710049) to be this far off? I'd expect them to be closer... unless the FireWire audio device gets its clock indirectly from the 1394 bus?
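A worked conversion may help frame the question. The sketch below assumes the standard IEEE 1394 cycle timer (24.576 MHz ticks, 3072 ticks per 125 µs isochronous cycle, 8000 cycles per second). The function names are illustrative and are not part of the FFADO API. It expresses the reported Sync rate as a deviation from nominal and converts the receive side's 30720-tick sync delay into frames and cycles. As a check on the constants, the same arithmetic reproduces the `064s 4199c 1707t` breakdown the dump prints for the receive processor's `Now` value.

```python
# Assumed IEEE 1394 cycle-timer constants (not taken from FFADO source).
TICKS_PER_SECOND = 24_576_000  # 24.576 MHz tick clock
TICKS_PER_CYCLE = 3072         # ticks in one 125 us isochronous cycle

NOMINAL_RATE = 48000.0
SYNC_RATE = 47996.824475       # "Sync" rate from the dump above


def split_ticks(ticks: int) -> tuple[int, int, int]:
    """Split an absolute tick count into (seconds, cycles, ticks)."""
    seconds, rest = divmod(ticks, TICKS_PER_SECOND)
    cycles, offset = divmod(rest, TICKS_PER_CYCLE)
    return seconds, cycles, offset


def ppm_offset(measured_hz: float, nominal_hz: float) -> float:
    """Deviation of a measured rate from nominal, in parts per million."""
    return (measured_hz - nominal_hz) / nominal_hz * 1e6


def ticks_to_frames(ticks: float, rate_hz: float) -> float:
    """Audio frames that elapse during `ticks` at the given sample rate."""
    return ticks * rate_hz / TICKS_PER_SECOND


def ticks_to_cycles(ticks: float) -> float:
    """Isochronous cycles spanned by `ticks`."""
    return ticks / TICKS_PER_CYCLE


if __name__ == "__main__":
    sec, cyc, tick = split_ticks(1585765035)  # receive "Now" value
    print(f"Now = {sec}s {cyc}c {tick}t")
    print(f"Sync offset: {ppm_offset(SYNC_RATE, NOMINAL_RATE):+.1f} ppm")
    print(f"Sync delay: {ticks_to_frames(30720, SYNC_RATE):.6f} frames, "
          f"{ticks_to_cycles(30720):.1f} cycles")
```

With these numbers the Sync rate is roughly 66 ppm below nominal. The 30720-tick delay is exactly 10 cycles and about 59.996 frames at the measured rate, close to the 59.996029 frames in the dump. At the nominal rate it would be exactly 60 frames, which suggests the dump's frame figure is derived from the measured rate.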

Is it possible that the FireWire interface gets so far ahead that there are no transmit/receive frames to process, which would explain the 'skipped 71 cycles without packet loss' message? I seriously doubt it... and think it's something kernel related.
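To put the "skipped 71 cycles" message in perspective, the sketch below converts a cycle count into milliseconds and frames and compares it with the total JACK buffer for the 3x128 and 3x512 settings from the ticket description. It assumes each skipped cycle is one 125 µs isochronous cycle (8000 per second) and a 48 kHz sample rate. The helper names are hypothetical.

```python
# Assumption: one skipped cycle = one 125 us IEEE 1394 isochronous cycle.
CYCLES_PER_SECOND = 8000


def cycles_to_ms(cycles: int) -> float:
    """Wall-clock duration of a run of isochronous cycles, in milliseconds."""
    return cycles * 1000 / CYCLES_PER_SECOND


def cycles_to_frames(cycles: int, rate_hz: int = 48000) -> float:
    """Audio frames that elapse during those cycles at rate_hz."""
    return cycles * rate_hz / CYCLES_PER_SECOND


def jack_buffer_ms(periods: int, frames_per_period: int, rate_hz: int = 48000) -> float:
    """Total audio held by a JACK buffer of periods x frames_per_period."""
    return periods * frames_per_period * 1000 / rate_hz


if __name__ == "__main__":
    gap = 71  # from the "skipped 71 cycles without packet loss" message
    print(f"gap: {cycles_to_ms(gap)} ms = {cycles_to_frames(gap)} frames")
    for periods, frames in [(3, 128), (3, 512)]:
        print(f"{periods}x{frames}: {jack_buffer_ms(periods, frames)} ms")
```

Under these assumptions, a 71-cycle gap (about 8.9 ms, or 426 frames) is slightly longer than the whole 3x128 buffer (8 ms) but well inside the 3x512 buffer (32 ms). A gap of that size alone would therefore not obviously account for the xruns seen at 3x512.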

12/03/08 10:40:52 changed by dx9s

I've come to the conclusion that the elapsed time of (up to) 30 minutes before the xruns, combined with when I *must* have been starting the test, is what led the majority of xruns to happen roughly within +/-5 minutes of those times (see above)... I've now made sure to start the test at different times, and the xruns are now happening outside the usual window. (So it's not a scheduled event, in either the kernel or userspace.)

I guess it was due to watching the news and clicking the button to start the test during a commercial or something... At least that rules out one thing.

And now I think it's video related. BLEWY! I can't use nv (it doesn't know my PCI IDs; I'd need to hack the source and try again)... I *CAN* use vesa if the app runs at 800x600 or below... I can use text mode... I can use the nvidia binary driver... and so far, the xruns only happen in one of these setups. I did say new hardware is a pain in the "arse" (slang for rear end). I've got a slew of tests to run, but I am pretty sure this is it! (Quoting "Doc" from 'Back to the Future': "DAMN DAMN!")

I will also look into installing nouveau; I just want stable 2D support on multiple displays (nv doesn't offer that).

--Doug

05/17/09 04:28:08 changed by ppalmers

  • status changed from new to closed.
  • resolution set to invalid.

This seems to be a configuration / nvidia binary driver issue.

You could try increasing the PCI latency timer of the FireWire controller and decreasing that of the video card.
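For reference, the latency timer is a single byte at offset 0x0D of a device's PCI configuration header, counted in PCI bus clocks. On Linux it can be changed with the pciutils `setpci` tool (for example `setpci -s 01:08.0 latency_timer=80`, where the value is hex) or through the device's sysfs `config` file. The listing above reports `latency 64` for the TI FireWire controller (01:08.0) and `latency 0` for the GeForce 8200 (02:00.0). The Python sketch below reads and writes that byte through sysfs. It assumes root privileges for writing and a 33 MHz PCI clock (about 30 ns per clock) for the time conversion. The function names are illustrative.

```python
import os

LATENCY_TIMER_OFFSET = 0x0D  # Latency Timer byte in the standard PCI config header
NS_PER_PCI_CLOCK = 30        # assumption: conventional 33 MHz PCI bus


def sysfs_config(slot: str) -> str:
    """sysfs config-space file for a device, e.g. slot '0000:01:08.0'."""
    return os.path.join("/sys/bus/pci/devices", slot, "config")


def read_latency_timer(path: str) -> int:
    # Unprivileged reads of the first 64 config bytes normally work on Linux.
    with open(path, "rb") as f:
        f.seek(LATENCY_TIMER_OFFSET)
        return f.read(1)[0]


def write_latency_timer(path: str, clocks: int) -> None:
    # Writing real config space requires root; unbuffered so the single byte
    # is written straight to the file at the seek position.
    if not 0 <= clocks <= 0xFF:
        raise ValueError("latency timer is a single byte (0-255)")
    with open(path, "r+b", buffering=0) as f:
        f.seek(LATENCY_TIMER_OFFSET)
        f.write(bytes([clocks]))


def latency_ns(clocks: int) -> int:
    """Approximate bus-hold time allowed by a given latency timer value."""
    return clocks * NS_PER_PCI_CLOCK


if __name__ == "__main__":
    fw = sysfs_config("0000:01:08.0")  # TI FireWire controller in the listing above
    if os.path.exists(fw):
        value = read_latency_timer(fw)
        print(f"FireWire latency timer: {value} clocks (~{latency_ns(value)} ns)")
    else:
        print(f"{fw} not found on this machine")
```

Note that `latency 0` is what PCI Express devices such as the GeForce normally report, so in practice the adjustable knob here is likely to be the conventional-PCI FireWire controller behind the 00:08.0 bridge.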