
Re: seems I finally found what upset kqemu on amd64 SMP... shared gdt! (please test patch :)



On Sat, 10 May 2008, Juergen Lock wrote:

On Thu, May 08, 2008 at 09:59:57PM +1000, Bruce Evans wrote:

The message in npx.c is actually about violation of an even more
fundamental invariant -- the invariant that owning the FPU includes
having the TS flag clear so that DNA traps cannot occur.  The bug in
kqemu seems to be mismanagement of the TS flag related to this.  I
forget if it is the host or the target TS flag that seems to be mismanaged.
For the target, it would take a bug in the virtualization of the TS flag
to break this invariant (assuming no related bugs in the target kernel).
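
The invariant described above can be illustrated with a small userland model. This is only a sketch under stated assumptions: `struct model_cpu`, `model_dna()`, `model_switch_to()` and `fpu_invariant_holds()` are hypothetical names invented for the illustration, not FreeBSD or kqemu code, and the "CPU" is just a struct.

```c
#include <assert.h>
#include <stdbool.h>

#define NO_OWNER (-1)

/* Toy userland model of one CPU; not FreeBSD kernel code. */
struct model_cpu {
    bool ts;        /* models the CR0.TS flag */
    int  fpu_owner; /* id of the thread whose state is in the FPU, or NO_OWNER */
    int  curthread; /* id of the running thread */
};

/* The invariant: if the running thread owns the FPU, TS must be clear. */
bool fpu_invariant_holds(const struct model_cpu *c)
{
    return !(c->fpu_owner == c->curthread && c->ts);
}

/* Models the DNA handler: clear TS and give the FPU to the running thread. */
void model_dna(struct model_cpu *c)
{
    assert(c->fpu_owner != c->curthread); /* must not own the FPU already */
    c->ts = false;
    c->fpu_owner = c->curthread;
}

/* Models switching away: drop ownership and set TS so the next FPU use traps. */
void model_switch_to(struct model_cpu *c, int next)
{
    c->fpu_owner = NO_OWNER;
    c->ts = true;
    c->curthread = next;
}
```

In this model, setting `ts` while `fpu_owner == curthread` is the kind of mismanagement that breaks the invariant: the owner would take a DNA trap it should never see.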

Well the `fpcurthread == curthread' bug has been fixed quite a while
ago already, or do you mean another one?

I didn't know what was already fixed.

The message in amd64/machdep.c is about violation of the invariant
that the kernel cannot cause DNA traps.  Spurious DNA traps in the
...

Okay, I _think_ I know a little more about this now...  kqemu itself
doesn't use the FPU, but the guest code it runs can, and in that case the
DNA trap is just used for (host) lazy FPU context switching, as if the
code were running in userland normally.  And I just tested the following
patch that should get rid of the message by calling fpudna/npxdna directly
(files/patch-fpucontext is the interesting part):

This seems reasonable.  Is the following summary of my understanding of
kqemu's implementation of this and your change correct?:
- kqemu runs in kernel mode on the host and needs to have exactly the
  same effect as a DNA exception on the target.
- having exactly the same effect requires calling the host DNA exception
  handler.
- currently it uses a software int $7 (dna) to implement the above, but this
  is not permitted in kernel mode (the software int could be permitted, but
  it is hard to distinguish from a hardware exception caused by
  unintentional FPU use).
- your change makes it call the DNA trap handler directly.  This gives the
  same effect as a permitted software int $7.  It is also faster.

It would be better to use an official API for this, but none exists.
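
The difference between the two paths in the summary can be sketched as a userland model. Everything here is hypothetical and invented for illustration (`struct model_cpu`, `model_int7()`, `model_load_fpu_context()`, the `complaints` counter). In particular, the model assumes that the trap path still runs the handler after objecting to a kernel-mode DNA trap; the thread does not confirm that detail.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy userland model; names are illustrative, not kernel APIs. */
struct model_cpu {
    bool ts;         /* models CR0.TS */
    bool owns_fpu;   /* running thread's state is loaded in the FPU */
    int  complaints; /* times the model kernel objected to a DNA trap */
};

/* Stand-in for the host handler (fpudna()/npxdna() in the patch). */
void model_dna_handler(struct model_cpu *c)
{
    assert(!c->owns_fpu); /* the FPU must not be owned already */
    c->ts = false;
    c->owns_fpu = true;
}

/*
 * Old path: a software int $7 goes through the trap entry, which objects
 * when the trap comes from kernel mode.  Assumption: the handler still runs.
 */
void model_int7(struct model_cpu *c, bool from_kernel)
{
    if (from_kernel)
        c->complaints++;
    model_dna_handler(c);
}

/* New path: call the handler directly, bypassing the trap entry. */
void model_load_fpu_context(struct model_cpu *c)
{
    model_dna_handler(c);
}
```

Both paths leave the model FPU in the same state; only the trap path records a complaint, which corresponds to the message the patch is meant to get rid of.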

...
+Index: kqemu-freebsd.c
+@@ -33,6 +33,11 @@
+
+ #include <machine/vmparam.h>
+ #include <machine/stdarg.h>
++#ifdef __x86_64__
++#include <machine/fpu.h>
++#else
++#include <machine/npx.h>
++#endif
+
+ #include "kqemu-kernel.h"
+
+@@ -172,6 +177,15 @@
+ {
+ }
+
++void CDECL kqemu_loadfpucontext(unsigned long cpl)
++{
++#ifdef __x86_64__
++    fpudna();
++#else
++    npxdna();
++#endif
++}

Just be sure that the system state is not too different from that of
trap() (directly below a syscall or trap from userland) when this is
called.  Better not have any interrupts disabled or locks held, though
I think npxdna() doesn't care.  The FPU must not be owned already at
this point.
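
The preconditions above could be written down as explicit checks. The following is a userland sketch only: `struct model_state`, `safe_to_load_fpu()` and `model_load_fpu_checked()` are made-up names, and the model is deliberately stricter than necessary, since it also rejects disabled interrupts and held locks even though npxdna() may not care about them.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the state at the call site; not kernel data structures. */
struct model_state {
    bool intr_disabled; /* interrupts are disabled */
    int  locks_held;    /* number of locks currently held */
    bool fpu_owned;     /* running thread already owns the FPU */
    bool ts;            /* models CR0.TS */
};

/* True when the state resembles trap() entered directly from userland. */
bool safe_to_load_fpu(const struct model_state *s)
{
    return !s->intr_disabled && s->locks_held == 0 && !s->fpu_owned;
}

/* Loads the model FPU context only if the preconditions hold. */
bool model_load_fpu_checked(struct model_state *s)
{
    if (!safe_to_load_fpu(s))
        return false; /* leave the state untouched */
    s->ts = false;
    s->fpu_owned = true;
    return true;
}
```

A second call on the same state is refused in this model because the FPU is then already owned, which is the one condition stated above as mandatory.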

++
+ #if __FreeBSD_version < 500000
+ static int
+ curpriority_cmp(struct proc *p)

I guess kqemu duplicates this old mistake instead of calling it because it
is static.  npxdna() is already public so it can be abused easily :-).

Bruce

