Saturday, November 5, 2016

sun4v emulation update

Just pushed v1. No new features, just cleanups. As part of the cleanup I improved the memory flushes, so v1 should be a bit faster than v0. The new version is available here:

https://github.com/artyom-tarasenko/qemu/tree/sun4v-v1

Another visible change is that the machine name is now spelled in lowercase, for consistency with the other SPARC machines emulated by QEMU.

The new launch line:

sparc64-softmmu/qemu-system-sparc64 -M niagara -L /path/to/S10image/ -nographic -m 256 -drive if=pflash,readonly=on,file=/path/to/S10image/disk.s10hw2

Saturday, October 1, 2016

QEMU sun4v/Niagara target went public

I’m publishing my work on the sun4v emulation on the GitHub site:

https://github.com/artyom-tarasenko/qemu/tree/sun4v-v0

Yes, I hope it’ll make it upstream soon, but those who would like to boot Solaris 10/SPARC under QEMU can do it straight away.

It uses the firmware (hypervisor, machine definition and OpenBoot) from the OpenSPARC T1 project. So, in order to use it, download the archive and extract the S10image directory:

http://download.oracle.com/technetwork/systems/opensparc/OpenSPARCT1_Arch.1.5.tar.bz2

$ tar xfj OpenSPARCT1_Arch.1.5.tar.bz2 ./S10image
$ cd path/to/qemu-sun4v

$ sparc64-softmmu/qemu-system-sparc64 -M Niagara -L /path/to/S10image/ -nographic -m 256 -drive if=pflash,readonly=on,file=/path/to/S10image/disk.s10hw2

Sun Fire T2000, No Keyboard
Copyright 2005 Sun Microsystems, Inc.  All rights reserved.
OpenBoot 4.20.0, 256 MB memory available, Serial #1122867.
[mo23723 obp4.20.0 #0]
Ethernet address 0:80:3:de:ad:3, Host ID: 80112233.


ok boot -v
<…>
login: root

Enjoy!
In case you wonder why the path to the drive image is not hard-coded like all the paths to the firmware components: it’s possible to specify a non-Solaris image, like HelenOS or NetBSD/sun4v (once it gets released).

Feel free to let me know if you get more OSes working. :-)

2016.11.04 Update: while the v0 version uses the name "Niagara", v1 and all subsequent ones will use the lowercase name "niagara".
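
For anyone scripting the launch, the naming difference between the two branches can be captured in a small helper. This is only a sketch: the function name launch_command and its parameters are made up for illustration, and the flags are simply the ones from the launch lines in these posts.

```python
import shlex


def launch_command(image_dir, branch="sun4v-v1", mem_mb=256):
    """Build the QEMU argument list for a given branch (hypothetical helper)."""
    image_dir = image_dir.rstrip("/")
    # The v0 branch expects "Niagara"; v1 and later expect "niagara".
    machine = "Niagara" if branch == "sun4v-v0" else "niagara"
    return [
        "sparc64-softmmu/qemu-system-sparc64",
        "-M", machine,
        "-L", image_dir + "/",
        "-nographic",
        "-m", str(mem_mb),
        "-drive", "if=pflash,readonly=on,file=" + image_dir + "/disk.s10hw2",
    ]


if __name__ == "__main__":
    # Print a copy-and-paste friendly command line for each branch.
    for branch in ("sun4v-v0", "sun4v-v1"):
        print(shlex.join(launch_command("/path/to/S10image", branch)))
```

The list form can be passed directly to subprocess.run, which avoids shell quoting problems if the image path contains spaces.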

Saturday, August 6, 2016

Solaris 10 and year 2038 problem

Now I've got a spare moment to write about why the Solaris 10 boot was failing under the new sun4v (sparc64) emulation target for QEMU.

It turned out that the now solved SMF issues I mentioned before were caused by a single character typo.

Stepping through the SQLite code, I noticed that there are two schemas: a persistent one, which to my surprise was opened with no problems, and a temporary one, which failed because it could not create a file under /etc/svc/volatile, which resides in RAM.

Why? For a very funny reason. Old Solaris versions used to check whether the real-time clock (sometimes they call it “rtc”, sometimes “tod”) returned a sane value, and ignored the value if it did not.

Solaris 10 issues a warning, but goes on and uses the given time. Then, during init, the system call creating a file on UFS considers any time after 0x7fffffff invalid, which sends SMF into a busy error loop.

The fatal typo was writing “qemu_clock_get_ns” instead of “qemu_clock_get_ms”, so I hit the error which the rest of mankind using Solaris 10 for OpenSPARC T1 will hit 22 years later.
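
To see why one letter matters, here is a minimal Python sketch of the arithmetic. It assumes, purely for illustration, that the clock reading gets divided by 1000 to obtain seconds; this is a guess and not the actual QEMU code, and the names tod_seconds and TIME_MAX_32 are made up.

```python
from datetime import datetime, timezone

TIME_MAX_32 = 0x7FFFFFFF  # largest value of a signed 32-bit time in seconds


def tod_seconds(clock_value, units_per_second=1000):
    """Convert a raw clock reading to whole seconds (hypothetical helper).

    Assumption: the caller believes the reading is in milliseconds.
    """
    return clock_value // units_per_second


# Midnight UTC on the date of this post, in seconds since 1970.
now_s = int(datetime(2016, 8, 6, tzinfo=timezone.utc).timestamp())

good = tod_seconds(now_s * 1000)           # reading really is in milliseconds
bad = tod_seconds(now_s * 1_000_000_000)   # reading is in nanoseconds by mistake

print(good <= TIME_MAX_32)  # True: still fits in 31 bits
print(bad <= TIME_MAX_32)   # False: a million times too large
# The last second that fits: 2038-01-19 03:14:07+00:00
print(datetime.fromtimestamp(TIME_MAX_32, tz=timezone.utc))
```

Under that assumption the guest sees a time of day a million times larger than intended, far beyond 0x7fffffff, which is the same situation a correctly running system reaches in January 2038.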

So let’s wait and see how many people will find my blog entries about SMF in February 2038.


Saturday, June 11, 2016

The second OS for the fresh sun4v emulation under QEMU

... is HelenOS. Although I was not able to boot the official 0.4 and 0.6.0 releases due to known problems with SILO (or OBP/Hypervisor), the current version works just fine:

HelenOS 0.6.0 revision 2521 under QEMU/sun4v
Note the nice reddish prompt. No other OS bootable under sun4v QEMU sparc64 emulation has something similar out of the box!

Saturday, April 16, 2016

FreeBSD-10.3/sparc64 under QEMU

I made a wrong statement on the debian-sparc mailing list, saying that the upstream qemu-system-sparc64 can already boot FreeBSD. As it turned out, I had spent too little time with the upstream QEMU. This made me feel obliged to fix it. This is how it's going to look in QEMU 2.6.0, if my patches get accepted:

$ qemu-system-sparc64 -nographic -m 1024 -boot d -cdrom FreeBSD-10.3-RELEASE-sparc64-bootonly.iso
<...>
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 10.3-RELEASE #0 r297264: Fri Mar 25 06:26:08 UTC 2016
    root@releng1.nyi.freebsd.org:/usr/obj/sparc64.sparc64/usr/src/sys/GENERIC sparc64
gcc version 4.2.1 20070831 patched [FreeBSD]

Console type [vt100]: xterm


When finished, type 'exit' to return to the installer.
# uname -a
FreeBSD  10.3-RELEASE FreeBSD 10.3-RELEASE #0 r297264: Fri Mar 25 06:26:08 UTC 2016     root@releng1.nyi.freebsd.org:/usr/obj/sparc64.sparc64/usr/src/sys/GENERIC  sparc64
# ls
.cshrc          HARDWARE.HTM    bin             libexec         sbin
.profile        HARDWARE.TXT    boot            media           sys
.rr_moved       README.HTM      dev             mnt             tmp
COPYRIGHT       README.TXT      docbook.css     proc            usr
ERRATA.HTM      RELNOTES.HTM    etc             rescue          var
ERRATA.TXT      RELNOTES.TXT    lib             root
#
So, after all, my statement should turn out to be correct. :-)
A pity the sun4v port of NetBSD is discontinued. So it's only for sun4u for now.

Tuesday, March 1, 2016

Hello, Solaris 10 under QEMU/sun4v!

SunOS Release 5.10 Version Generic_118822-23 64-bit
Copyright 1983-2005 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
Ethernet address = 0:80:3:de:ad:3
mem = 1048576K (0x40000000)
avail mem = 1027579904
root nexus = Sun Fire T2000
pseudo0 at root
pseudo0 is /pseudo
scsi_vhci0 at root
scsi_vhci0 is /scsi_vhci
virtual-device: hsimd0
hsimd0 is /virtual-devices@100/disk@0

root on /virtual-devices@100/disk@0:a fstype ufs
pseudo-device: dld0
dld0 is /pseudo/dld@0
cpu0: UltraSPARC-T1 (cpuid 0 clock 5 MHz)
iscsi0 at root
iscsi0 is /iscsi

INIT: Executing svc.startd
svc.startd: Unknown SMF option "=debug".
Booting to milestone "milestone/single-user:default".
Hostname: unknown
Requesting System Maintenance Mode
SINGLE USER MODE

Root password for system maintenance (control-d to bypass):
single-user privilege assigned to /dev/console.

Entering System Maintenance Mode

Mar  1 14:09:35 su: 'su root' succeeded for root on /dev/console
Sun Microsystems Inc.   SunOS 5.10      Generic January 2005
#

Well actually the local time is 23:09:35, but I'm cool with it.

Sunday, February 28, 2016

What do SQL and SPARCv9 assembly language have in common?

Well, here we go: I’m debugging SQL execution switching between the kmdb kernel debugger and gdb.

Breakpoint 70, 0x000000000003e528 in sqliteInitOne ()
0x000000000003ec9c in sqlite_exec ()
(gdb) x $i1
0xadea8:        "SELECT type, name, rootpage, sql, 0 FROM \"main\".sqlite_master"

SMF uses SQLite, so the boot process involves some SQL.
Who would have thought that 20 years ago?

But it’s fun indeed. Booting Solaris/sparc under sun4v involves not just plain repetition of the old exercises, but requires some totally new ones as well.

Saturday, February 27, 2016

Dial 1-555-MY-SMF

The boot process of Solaris 2.5 through Solaris 9 is quite robust. If init fails for some reason, there is always a chance to add the “-b” boot option and try to debug it manually.

I think the old generation of Sun engineers implemented it just to make debugging on real-world hardware easier. I really appreciated this option 6 years ago, when I was making Solaris/sparc under QEMU possible.

Nowadays they probably do most of the early-stage debugging in simulators.

This would explain why boot process debugging became much harder after introducing SMF in Solaris 10.

In particular, I’m hitting the following crash, which happens multiple times per second in an endless loop:

cpu0: UltraSPARC-T1 (cpuid 0 clock 5 MHz)
iscsi0 at root
iscsi0 is /iscsi

INIT: Executing svc.startd

svc.configd: smf(5) database integrity check of:

    /etc/svc/repository.db

  failed. The database might be damaged or a media error might have
  prevented it from being verified.  Additional information useful to
  your service provider is in:

    /etc/svc/volatile/db_errors

  The system will not be able to boot until you have restored a working
  database.  svc.startd(1M) will provide a sulogin(1M) prompt for recovery
  purposes.  The command:

    /lib/svc/bin/restore_repository

  can be run to restore a backup version of your repository.  See
  http://sun.com/msg/SMF-8000-MY for more information.

Requesting System Maintenance Mode
(See /lib/svc/share/README for more information.)
svc.configd exited with status 102 (database initialization failure)



On the other hand, now I can use the OpenSolaris source and step through it in gdb. Different epoch, different debugging methods.

Saturday, February 20, 2016

Bad, bad cafe! (0xbaddcafe)

While debugging the Solaris 10 boot, I saw something interesting in an exception trace:

143368: Unaligned Memory Access (v=0034)
pc: 00000000f02421f8  npc: 00000000f02421fc
%g0-3: 0000000000000000 0000000000000001 0000000000000000 00000000edd00620
%g4-7: baddcafebaddcafe 0000000000002e7f 0000000000000000 00000000f0243de8 
%o0-3: 00000000018d46e0 0000000000000001 00000000ede8e7e1 0000000001213010

And indeed, this is not a random pattern. It's a helping hand from the great, wise Solaris engineers, who cared to help their successors find problems with hardware and kernel modules:

opensolaris/usr/src/uts/common/sys/kmem_impl.h:
#define  KMEM_UNINITIALIZED_PATTERN      0xbaddcafebaddcafeULL

Looking at the OpenSolaris sources and Solaris documentation, there are more such helping patterns:

Uninitialized Data: 0xbaddcafe
Redzone: 0xfeedface
Freed Buffer Checking: 0xdeadbeef

They are described in the "Detecting Memory Corruption" chapter of the Solaris Modular Debugger Guide, but they actually appeared long before mdb.
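
Spotting such values by eye in a long trace gets tiring, so here is a small Python sketch that scans a register line of the trace format shown above. It is only an illustration: the helper names are made up, and it assumes a 64-bit register holds the 32-bit pattern repeated twice, as in the KMEM_UNINITIALIZED_PATTERN definition.

```python
import re

# 32-bit debug patterns listed above.
KMEM_PATTERNS = {
    0xBADDCAFE: "uninitialized data",
    0xFEEDFACE: "redzone",
    0xDEADBEEF: "freed buffer",
}


def classify_word(value):
    """Return the pattern name if both 32-bit halves hold the same known pattern."""
    high, low = value >> 32, value & 0xFFFFFFFF
    if high == low:
        return KMEM_PATTERNS.get(low)
    return None


def scan_trace_line(line):
    """Find suspicious registers in a line such as '%g4-7: <hex> <hex> <hex> <hex>'."""
    match = re.match(r"%([goli])(\d)-(\d):\s*(.*)", line.strip())
    if not match:
        return []
    bank, first = match.group(1), int(match.group(2))
    hits = []
    for offset, word in enumerate(match.group(4).split()):
        name = classify_word(int(word, 16))
        if name:
            hits.append(("%" + bank + str(first + offset), name))
    return hits


if __name__ == "__main__":
    line = "%g4-7: baddcafebaddcafe 0000000000002e7f 0000000000000000 00000000f0243de8"
    print(scan_trace_line(line))  # [('%g4', 'uninitialized data')]
```

Requiring both halves to match keeps the check strict; a looser variant could also flag a pattern found in only the low 32 bits.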

Saturday, February 6, 2016

Yo dawg, I heard you like debugging



Here is the story: my sun4v can boot OBP, but booting Solaris 10 hangs with no error messages. OK, been there, done that. Let’s start the Solaris kernel with a debugger. I really liked kadb for debugging early boot stuff, but the Solaris 10 image supplied with the OpenSPARC project has only its successor, kmdb. Well, kmdb is indeed more advanced, but it’s also quite a bit bigger than its predecessor, which might (or might not) be the reason for it failing to boot:

Sun Fire T2000, No Keyboard
Copyright 2005 Sun Microsystems, Inc.  All rights reserved.
OpenBoot 4.20.0, 256 MB memory available, Serial #1122867.
[mo23723 obp4.20.0 #0]
Ethernet address 0:80:3:de:ad:3, Host ID: 80112233.
ok boot -kdv
Boot device: /virtual-devices/disk@0  File and args: -kdv
Loading ufs-file-system package 1.4 04 Aug 1995 13:02:54.
FCode UFS Reader 1.12 00/07/17 15:48:16.
Loading: /platform/SUNW,Sun-Fire-T2000/ufsboot
Loading: /platform/sun4v/ufsboot
The boot filesystem is logging.
The ufs log is empty and will not be used.
Size: 0x76e40+0x1c872+0x3123a Bytes
module /platform/sun4v/kernel/sparcv9/unix: text at [0x1000000, 0x1076e3f] data at 0x1800000
module misc/sparcv9/krtld: text at [0x1076e40, 0x108f737] data at 0x184dab0
module /platform/sun4v/kernel/sparcv9/genunix: text at [0x108f738, 0x11dd437] data at 0x18531c0
module /platform/sun4v/kernel/misc/sparcv9/platmod: text at [0x11dd438, 0x11dd43f] data at 0x18a4be0
module /platform/sun4v/kernel/cpu/sparcv9/SUNW,UltraSPARC-T1: text at [0x11dd440, 0x11e06ff] data at 0x18a5300
Loading kmdb...
module /platform/sun4v/kernel/misc/sparcv9/kmdbmod: text at [0x11e0700, 0x124b2bf] data at 0x18b4da0
module /kernel/misc/sparcv9/ctf: text at [0x124b2c0, 0x1252d97] data at 0x18d6ed0
module /kernel/misc/sparcv9/zmod: text at [0x1252d98, 0x1257a67] data at 0x18d7af8
failed to decompress CTF data for unix: File data structure corruption detected
failed to decompress CTF data for genunix: String name offset is corrupt
failed to decompress CTF data for ctf: File data structure corruption detected
failed to decompress CTF data for zmod: File data structure corruption detected


What is the solution? Connect another debugger (gdb) to QEMU and debug the Solaris debugger (kmdb). Sounds reasonable, right? In the next step I found a place where the memory is already corrupted. This was easy: as you see, the Solaris engineers put some sanity checks in the CTF code. Well done, Sun guys!
Finding the place where it gets corrupted is a bit harder: gdb has no watchpoints on physical memory, supporting only virtual memory watchpoints. The solution is to start the QEMU process itself in a debugger. At this point it gets slightly insane:

I put a debugger (kmdb) in a debugger (gdb x86-64) and connected it to a debugger (gdb sparc-v9) so I can debug while I’m debugging a debugger.

Saturday, January 30, 2016

sun4v in QEMU



Back in 2012 I played with sun4v emulation in QEMU, using it mostly instead of pain killers to get some distraction from a broken leg. The project was considered to be a toy, since I hadn’t expected to get it far enough to be useful for anything. I got it up to the OBP ok prompt, so it’s been sort of already useful at least for playing with post-sun4u OpenBoot and Forth.

Now I’m considering tidying up the code and submitting it upstream. Tell you what: cleaning up old code is a pain. The usual problem with quick and dirty code that you write once, intending to throw it away immediately, is that for whatever reason it is not thrown away in 99% of cases. Instead it finds its way into production systems, where it lives for years and years.

So, a note to myself and the two other guys reading this blog: use a version control system (preferably git :-) ) for any project lasting more than 8 hours. Do it regardless of whether you think you are ever going to need it. I used to think a week was a good threshold, but even one week is way too much (and if you worked something like 16 hours a day that week, sorting out the mess you created would take some weeks).

Anyway, I’m back to my sun4v experiments. How many weekends will it take to get it into good shape? Let’s see.

Stay tuned.