
Re: solved ?] i386/79080: acpi thermal changes freezes HP nx6110



Hi Frederic, Nate, list members:

I recently tried 7.0-RC1 on an nx6110.  The thermal freeze
problems are definitely still there, and appear worse.  I
tried all of the workarounds below and nothing helped -
I suspect this issue is not interrupt storm related any more,
but rather, a mutex race condition of some sort...
please see below...

Hello,

I found a workaround to avoid the freeze when the _ACx state changes on
nx6110. In the kernel config, use
options         SCHED_ULE
device          apic
options         AUTO_EOI_1
options         AUTO_EOI_2

I tried this.  With the exact above options, my root boot device
became unfindable, and no amount of tweaking at the loader prompt
would get it to boot.  When I removed AUTO_EOI_2 and tried again,
the root filesystem booted but the freeze problems remained.

I also tried the out-of-the-box GENERIC kernel, of course;
freeze problems occur.

ULE and apic limit the freeze to only a few seconds (without them,
I never waited more than 10 minutes, but I suppose it can be long...).
AUTO_EOI_1 and AUTO_EOI_2 have no impact without ULE and apic.
Separately they have no noticeable effect.

In my case, the mutex problem causes the freeze to last forever,
regardless of the scheduler used.

(From a previous email by Frederic):
Pavel Rydvan stated in the PR that if the temperature doesn't change
there is no problem. In fact, that is not completely true: the problem
arises when ACx _increases_. When it decreases, if there is a freeze, it
is unnoticeable.

I agree with this observation.  I only get the freezes if the temperature
INCREASES.

If I manually set hw.acpi.thermal.tz0.active then there is no more
problem (apart from the thermal function of ACPI becoming useless).

This I tried and it didn't work for me.  The "active" number remained
at 1 regardless of the arguments I passed it - I tried -1, 0, 1, 2, 3, 4,
5, and 6.  I don't know how you get this number to change but sysctl
kept it at 1.
(ex: #root# sysctl hw.acpi.thermal.tz0.active=-1
    hw.acpi.thermal.tz0.active: 1 -> 1 )
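
For anyone repeating this test across several values, here is a minimal
Python sketch that checks whether a sysctl write actually took effect by
parsing the "name: old -> new" line that sysctl prints. The function name
is hypothetical and the regular expression only assumes the output shape
shown above; it does not call sysctl itself.

```python
import re

# Assumed output shape, as in the example above:
#   hw.acpi.thermal.tz0.active: 1 -> 1
SYSCTL_SET = re.compile(
    r"^\s*(?P<name>[\w.%]+):\s*(?P<old>\S+)\s*->\s*(?P<new>\S+)\s*$"
)


def sysctl_write_took_effect(output_line, requested):
    """Return True if the 'new' value sysctl reports equals the requested one.

    Hypothetical helper: output_line is one line of sysctl output captured
    after a write, requested is the value that was passed on the command line.
    """
    m = SYSCTL_SET.match(output_line)
    if m is None:
        raise ValueError("not a sysctl 'old -> new' line: %r" % output_line)
    return m.group("new") == str(requested)


# The case reported above: -1 was requested, but the value stayed at 1.
print(sysctl_write_took_effect("hw.acpi.thermal.tz0.active: 1 -> 1", -1))
```

On the nx6110 described here, every requested value other than 1 would
come back as not applied.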

Pavel Rydvan said that it is due to an IRQ storm; I can't dig deeper into
this because I don't know how.

It seems mutex-related to me.  I placed as much of the debug info as I could
into the PR. I'll also include it below. Thanks to anyone for reading this.

--BEGIN PR 79080 INFO--

The problem is still found in the most recent 7.0 RC code as well.
It appears to involve a mutex lock/unlock problem when the thermal
zone change occurs - it doesn't appear to be an interrupt storm any
longer.

It is assuredly ACPI-related, because disabling ACPI makes the freezes
go away.  However, this laptop does not function well without ACPI so
it's not a good workaround.  USB devices do not work w/o ACPI, as well
as other hardware.

There are several suggested workarounds I tried, none of which resolved
the issue.  These included building the kernel with apic, disabling apic,
manually changing the hw.acpi.thermal.tz0.active number (my nx6110
seems to want to keep it at 1 no matter what), and using the ULE
scheduler rather than the 4BSD.  Again, none of the above workarounds,
in any combination, solved the issue.

INFORMATION
-----------
Turning on debugging, the following appears right before the lock,
as soon as temperature rises enough to trigger a change in the zone:

acpi_tz0: _AC3: temperature 68.0 >= setpoint 45.0
acpi_tz0: _AC2: temperature 68.0 >= setpoint 55.0
acpi_tz0: _AC3: temperature 67.0 >= setpoint 45.0
acpi_tz0: _AC2: temperature 67.0 >= setpoint 55.0
...etc...
and then:
ACPI Exception (utmutex-0376): AE_TIME, Thread 28 could not acquire Mutex [0] [20070320]
ACPI Error (exutils-0180): Could not acquire AML Interpreter mutex [20070320]
ACPI Error (utmutex-0421): Mutex [0] is not acquired, cannot release [20070320]
ACPI Error (exutils-0250): Could not release AML Interpreter mutex [20070320]
ACPI Exception (utmutex-0376): AE_TIME, Thread 28 could not acquire Mutex [0] [20070320]
ACPI Error (exutils-0180): Could not acquire AML Interpreter mutex [20070320]
ACPI Error (psparse-0626): Method parse/execution failed [\_TZ_.C242] (Node 0xc321c220), AE_TIME
ACPI Error (psparse-0626): Method parse/execution failed [\_TZ_.TZ1_._TMP] (Node 0xc321b9c0), AE_TIME
ACPI Error (utmutex-0421): Mutex [0] is not acquired, cannot release [20070320]
ACPI Error (exutils-0250): Could not release AML Interpreter mutex [20070320]
ACPI Error (psparse-0626): Method parse/execution failed [\_TZ_.C242] (Node 0xc321c220), AE_TIME
ACPI Error (psparse-0626): Method parse/execution failed [\_TZ_.TZ2_._TMP] (Node 0xc321b8c0), AE_TIME
ACPI Error (utmutex-0421): Mutex [0] is not acquired, cannot release [20070320]
ACPI Error (exutils-0250): Could not release AML Interpreter mutex [20070320]

(the errors continue to repeat ad infinitum, and each TZ reports problems)
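
Since the messages repeat without end, a summary is easier to read than
the raw console output. The following is a minimal Python sketch, not part
of any FreeBSD tool, that tallies the ACPI error lines by source location
and lists the AML methods that failed. The function names are hypothetical,
and the patterns assume only the message format shown above.

```python
import re
from collections import Counter

# Matches the "(module-line)" tag of messages such as:
#   ACPI Error (utmutex-0421): Mutex [0] is not acquired, cannot release
ACPI_TAG = re.compile(r"ACPI (?:Error|Exception) \(([a-z]+)-(\d+)\)")

# Matches the AML path in "Method parse/execution failed [\_TZ_.C242]".
FAILED_METHOD = re.compile(r"Method parse/execution failed \[([^\]]+)\]")


def tally_acpi_errors(lines):
    """Count ACPI error/exception messages by (module, source line)."""
    counts = Counter()
    for text in lines:
        # finditer copes with several messages run together on one line.
        for m in ACPI_TAG.finditer(text):
            counts[(m.group(1), int(m.group(2)))] += 1
    return counts


def failed_methods(lines):
    """Return the set of AML method paths reported as failed."""
    return {m.group(1) for text in lines for m in FAILED_METHOD.finditer(text)}


# A few lines copied from the output above, used as sample input.
sample = [
    r"ACPI Exception (utmutex-0376): AE_TIME, Thread 28 could not acquire Mutex [0] [20070320]",
    r"ACPI Error (exutils-0180): Could not acquire AML Interpreter mutex [20070320]",
    r"ACPI Error (psparse-0626): Method parse/execution failed [\_TZ_.TZ1_._TMP] (Node 0xc321b9c0), AE_TIME",
    r"ACPI Error (utmutex-0421): Mutex [0] is not acquired, cannot release [20070320]",
]

for (module, line), count in sorted(tally_acpi_errors(sample).items()):
    print(module, line, count)
print(sorted(failed_methods(sample)))
```

Feeding it a saved console log (one string per line) should show whether
the failures are confined to the _TMP methods of the thermal zones.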

As a result, you will eventually see:

acpi_tz0: error fetching current temperature -- AE_TIME
acpi_tz1: error fetching current temperature -- AE_TIME
(..etc...)

The interesting thing is that THIS PROBLEM DOES NOT APPEAR in FreeBSD
6.2-RELEASE nor in any of the 6.3-RC variants.  It's unique to FreeBSD
7, and it involves some of the new ACPI mutex code.

This is definitely a regression for this particular laptop since it worked
well in 6.x - so as such, maybe it would be worthwhile to investigate this bug.
It seems general enough that it could affect other laptop ASLs as well.

The ASL dump AND a sysctl dump can be found at:
http://www.far-far-away.com/~yousif/freebsd/

Please let me know if more information is needed.

--Yousif
