Saturday, July 30, 2011

Of Course, It Runs NetBSD!™

Booting NetBSD was almost a piece of cake. It tries to detect more things than Solaris and Linux do, so I had to implement a couple more device registers. At first glance, using more registers contradicts the declared portability. On the other hand, it would work on some modified or weird chipsets with non-standard interrupt controllers. I don't know whether such chipsets were ever produced, though.

NetBSD 4.0.1 (INSTALL) #0: Wed Oct  8 01:13:04 PDT 2008
        builds@wb32:/home/builds/ab/netbsd-4-0-1-RELEASE/sparc64/200810080053Z-obj/home/builds/ab/netbsd-4-0-1-RELEASE/src/sys/arch/sparc64/compile/INSTALL
total memory = 256 MB
avail memory = 234 MB
timecounter: Timecounters tick every 10.000 msec
mainbus0 (root): QEMU,Ultra-3/2: hostid 80000000
cpu0 at mainbus0: SUNW,UltraSPARC @ 100.681 MHz, UPA id 0
cpu0: 32K instruction (32 b/l), 16K data (32 b/l), 512K external (64 b/l)
...
# ping 10.0.2.2
PING 10.0.2.2 (10.0.2.2): 56 data bytes
64 bytes from 10.0.2.2: icmp_seq=0 ttl=255 time=1.575 ms
64 bytes from 10.0.2.2: icmp_seq=1 ttl=255 time=1.150 ms
^C
----10.0.2.2 PING Statistics----
2 packets transmitted, 2 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 1.150/1.363/1.575/0.301 ms
#

I haven't found any CPU bugs so far, only a problem with interrupt processing in the serial port (not relevant to [Open]Solaris). Surprisingly, the sparc32 and sparc64 serial drivers diverge quite a lot.

Next stop - FreeBSD/sparc64.

Sunday, July 17, 2011

Hiding the boot device from an OS

Suspecting a bug in the SCSI subsystem, I was wondering how to prevent an OS from detecting the device it's booting from. At first it sounds weird, but it's possible for most OSes/distributions in the SPARC (and probably PPC) world, because they
  • boot init scripts from a RAM disk
  • use the IEEE 1275 device tree to find out which devices are available *
*) with the exception of Linux. At least the Linux uniform IDE driver tries to detect hardware by probing.

For FreeBSD, NetBSD, OpenBSD and Solaris it's possible to hide the boot device from the OS install CD with the following commands:

ok cd scsi " unknown" name device-end
ok boot /unknown/sd@6,0:f

Use the Forth, Luke!

Sunday, July 10, 2011

User mode emulation for Linux/SPARC64

As you know, qemu has a user-mode emulation. This means binaries for one CPU (for example SPARC) can be executed on another CPU (for example i686) under the same OS. The system calls are executed directly on the host (which means they run as fast as for native binaries), and the executable code itself is translated with TCG.
After I fixed ELF loading for SPARC64 binaries, qemu can load not only static Linux/sparc64 binaries but dynamically linked ones too. To achieve that, qemu has to be statically linked (it may sound confusing: to launch statically linked binaries qemu doesn't have to be built statically, but to launch dynamically linked ones it does) and chrooted into the guest OS file system image:
 qemu$ ./configure --target-list=sparc-linux-user,sparc64-linux-user,sparc32plus-linux-user --static && make
 ...
 qemu$  mv -i sparc32plus-linux-user/qemu-sparc32plus ../debian-6-sparc64-initrd/
 qemu$  su
 Password:
 #  /usr/sbin/chroot ../debian-6-sparc64-initrd/ /qemu-sparc32plus -L . /bin/busybox
BusyBox v1.17.1 (Debian 1:1.17.1-8) multi-call binary.
Copyright (C) 1998-2009 Erik Andersen, Rob Landley, Denys Vlasenko
and others. Licensed under GPLv2.
See source distribution for full notice.

Usage: busybox [function] [arguments]...
   or: function [arguments]...

        BusyBox is a multi-call binary that combines many common Unix
        utilities into a single executable.  Most people will create a
        link to busybox for each function they wish to use and BusyBox
        will act like whatever it was invoked as.

Currently defined functions:
        [, [[, ar, ash, basename, blockdev, cat, chmod, chown, chroot, cp, cut, dd, df, dirname, dmesg, dnsdomainname, echo, egrep, env, expr, false,
        find, free, freeramdisk, grep, gunzip, halt, head, hostname, id, init, ip, kill, klogd, ln, logger, ls, md5sum, mkdir, mknod, mkswap, modinfo,
        more, mount, mv, nc, pidof, pivot_root, poweroff, printf, ps, pwd, readlink, realpath, reboot, rm, rmdir, route, sed, sh, sleep, sort,
        swapoff, swapon, sync, syslogd, tail, tar, test, tftp, touch, tr, true, tty, udhcpc, umount, uname, uniq, wc, wget, zcat


This is a great tool for finding CPU bugs! One can use existing binaries, for example, to check that the emulated CPU produces the same md5 or sha512 sum for a certain file as the host does, to pack and unpack with gzip and bzip, or just to observe weird unames:

 /usr/sbin/chroot ../debian-6-sparc64-initrd/ /qemu-sparc32plus -L . /bin/uname -a
Linux localhost 2.6.34.9-69.fc13.x86_64 #1 SMP Tue May 3 09:23:03 UTC 2011 sun4 GNU/Linux
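
The checksum comparison mentioned above can be automated with a few lines of Python. This is only a sketch under my own assumptions: the helper names (host_md5, guest_md5, digests_match) are made up for illustration, and the guest command is assumed to print an md5sum-style line whose first token is the hex digest.

```python
import hashlib
import subprocess

def host_md5(path):
    """MD5 of a file, computed natively on the host."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def guest_md5(cmd):
    """Run cmd (e.g. the guest's md5sum under the emulator) and return
    the first whitespace-separated token of its output.
    Assumption: the tool prints '<hex digest>  <file name>'."""
    out = subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
    return out.split()[0]

def digests_match(host_path, guest_cmd):
    """True if the emulated binary computed the same digest as the host."""
    return host_md5(host_path) == guest_md5(guest_cmd)
```

With the layout used above, a hypothetical call (as root) could be digests_match("../debian-6-sparc64-initrd/bin/busybox", ["/usr/sbin/chroot", "../debian-6-sparc64-initrd/", "/qemu-sparc32plus", "-L", ".", "/bin/busybox", "md5sum", "/bin/busybox"]). A False result points at either the emulated CPU or the emulated system calls.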

Saturday, July 9, 2011

NetBSD vfs and MMU emulation

Since Solaris works perfectly, I turned to NetBSD again for further test cases.

root on md0a dumps on md0b
root file system type: ffs
WARNING: clock gained 1005 days
WARNING: CHECK AND RESET THE DATE!
exec /sbin/init: error 8
init: trying /sbin/oinit
exec /sbin/oinit: error 2
init: trying /sbin/init.bak
exec /sbin/init.bak: error 2
init: not found panic: no init
cpu0: kdb breakpoint at 1362e60
Stopped in pid 1.1 (init) at 0x1362e64: nop
db>
Hmm, what could be the reason for such behavior? Sounds familiar, right? I hear you saying: a CPU math bug? Wrong. :)
The previous time I saw this message it really was a math bug. This time it's an MMU problem. Unlike SPARC v8, v9 has an NFO mode for page mappings, which stands for No Fault Only. The 'only' part is crucial: such pages may be read by no-fault loads only, and all other loads should fault.

Solaris knows which pages have been marked NFO (after all, only the OS can mark them) and uses only the appropriate loads.

NetBSD's vfs implementation, on the other hand, explicitly provokes such faults to load RAM pages from a file. I had to read the NetBSD documentation to understand how its vfs works; the sources are not that well documented.
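
To make the NFO rule concrete, here is a minimal sketch of the check in Python. It is not the qemu code: load_faults_on_nfo is a hypothetical helper, and the ASI numbers are the no-fault ASIs as I remember them from the SPARC v9 ASI map, so double-check them against the manual.

```python
# No-fault load ASIs (assumed values from the SPARC v9 ASI map).
ASI_PRIMARY = 0x80                    # ordinary primary-context access
ASI_PRIMARY_NO_FAULT = 0x82
ASI_SECONDARY_NO_FAULT = 0x83
ASI_PRIMARY_NO_FAULT_LITTLE = 0x8A
ASI_SECONDARY_NO_FAULT_LITTLE = 0x8B

NO_FAULT_ASIS = {
    ASI_PRIMARY_NO_FAULT,
    ASI_SECONDARY_NO_FAULT,
    ASI_PRIMARY_NO_FAULT_LITTLE,
    ASI_SECONDARY_NO_FAULT_LITTLE,
}

def load_faults_on_nfo(page_is_nfo, asi):
    """Hypothetical helper: True if a load must trap because the page
    is mapped No Fault Only and the load does not use a no-fault ASI."""
    return page_is_nfo and asi not in NO_FAULT_ASIS
```

An ordinary load from an NFO page, load_faults_on_nfo(True, ASI_PRIMARY), has to trap, and that trap is exactly what NetBSD relies on. The same load with ASI_PRIMARY_NO_FAULT goes through.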

After fixing the bug (a sort of a counterpart to the one Tsuneo Saito mentioned on the mailing list), NetBSD gets a bit further:

Kernelized RAIDframe activated md0:
internal 5120 KB image area
root on md0a dumps on md0b
root file system type: ffs
WARNING: clock gained 1005 days
WARNING: CHECK AND RESET THE DATE!
erase ^?, werase ^W, kill ^U, intr ^C

The only part missing here is the '#' prompt after the message. :) Working on it.

Saturday, July 2, 2011

More math bugs

Why would one file system work and another one not?

Because the fs driver is buggy? Of course not! Those who read this blog on a regular basis know that the inability to read a file has to do with a bug in math emulation.
Thanks to Jakub's test case, I fixed two bugs in carry flag handling (yes, again). Now HelenOS/sparc64 boot looks like this:


My impressions of HelenOS: it's neat, and the sources are well documented and easy to read. Also, I needed just a few minutes to set up cross-compiling under Linux/x86_64. So, if you need a small, micro-kernel(!), multi-arch (amd64, arm32, ia32, ia64, mips32, ppc32, sparc64) OS to play with, HelenOS is definitely worth looking at.
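
As for why carry flag handling is so easy to get wrong on sparc64: every flag-setting add or subtract has to produce two carry bits, one for the low 32 bits (icc) and one for the full 64 bits (xcc). The sketch below only models that rule in Python; it is not the actual fix, and the function names are made up for illustration.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1

def add_with_carry(a, b, carry_in=0):
    """Return (result, icc_c, xcc_c) for a 64-bit add.
    icc_c is the carry out of bit 31, xcc_c the carry out of bit 63."""
    a &= MASK64
    b &= MASK64
    full = a + b + carry_in
    icc_c = ((a & MASK32) + (b & MASK32) + carry_in) >> 32
    xcc_c = full >> 64
    return full & MASK64, icc_c, xcc_c

def sub_with_borrow(a, b, borrow_in=0):
    """Return (result, icc_c, xcc_c) for a 64-bit subtract.
    On SPARC the carry flag means 'borrow' after a subtraction."""
    a &= MASK64
    b &= MASK64
    result = (a - b - borrow_in) & MASK64
    icc_c = int((a & MASK32) < (b & MASK32) + borrow_in)
    xcc_c = int(a < b + borrow_in)
    return result, icc_c, xcc_c
```

For example, adding 1 to 0xFFFFFFFF sets the 32-bit carry but not the 64-bit one, so an emulator that derives both from a single comparison gets one of them wrong.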