Running OpenShift Container Platform in bhyve containers on OpenIndiana#

Overview#

The scenario is as follows: we have two physical machines:
  • sin (runs various stuff like NFS)
  • spin (only runs bhyve VMs for OCP)

Both run OpenIndiana. On "sin" we run multiple zones for LDAP, DNS, DHCP, and so on; I won't go into full detail here.

The host "spin" is actually empty. It's an old Sun X4270 M2 machine with 2 Sockets, 6 Cores X5675 @ 3.07GHz and 144GB RAM. The chassis has twelve disks and I also added two (consumer) NVMe's on PCIe adapters.

Two disks form a ZFS boot mirror (rpool); the other disks form a striped raidz pool (localstripe) with two hot spares, one SLOG device (NVMe), and one L2ARC device (also NVMe):

root@spin:~# zpool status
  pool: localstripe
 state: ONLINE
  scan: none requested
config:

	NAME                     STATE     READ WRITE CKSUM
	localstripe              ONLINE       0     0     0
	  raidz1-0               ONLINE       0     0     0
	    c8t2d0               ONLINE       0     0     0
	    c8t3d0               ONLINE       0     0     0
	    c8t4d0               ONLINE       0     0     0
	    c8t5d0               ONLINE       0     0     0
	  raidz1-1               ONLINE       0     0     0
	    c8t6d0               ONLINE       0     0     0
	    c8t7d0               ONLINE       0     0     0
	    c8t8d0               ONLINE       0     0     0
	    c8t9d0               ONLINE       0     0     0
	logs	
	  c5t0026B7682D581035d0  ONLINE       0     0     0
	cache
	  c6t0026B7682D1B8DA5d0  ONLINE       0     0     0
	spares
	  c8t10d0                AVAIL   
	  c8t11d0                AVAIL   

errors: No known data errors

  pool: rpool
 state: ONLINE
  scan: resilvered 19.2G in 0 days 00:03:25 with 0 errors on Mon Feb 14 21:29:12 2022
config:

	NAME        STATE     READ WRITE CKSUM
	rpool       ONLINE       0     0     0
	  mirror-0  ONLINE       0     0     0
	    c8t0d0  ONLINE       0     0     0
	    c8t1d0  ONLINE       0     0     0

errors: No known data errors

The disks are standard 600GB SAS II drives.
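
For reference, a pool with this layout could have been created roughly like this (a sketch using the device names from the status output above; your device names will differ):
root@spin:~# zpool create localstripe \
	raidz c8t2d0 c8t3d0 c8t4d0 c8t5d0 \
	raidz c8t6d0 c8t7d0 c8t8d0 c8t9d0 \
	log c5t0026B7682D581035d0 \
	cache c6t0026B7682D1B8DA5d0 \
	spare c8t10d0 c8t11d0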

The machine has a 10G ixgbe NIC, on which I configured the VNICs:
root@spin:~# dladm show-vnic
LINK         OVER         SPEED  MACADDRESS        MACADDRTYPE         VID
spinpub0     ixgbe0       10000  2:8:20:d2:1f:63   random              100
bootstrapint0 ixgbe0      10000  2:8:20:3b:26:2    random              100
master02int0 ixgbe0       10000  2:8:20:22:c1:10   random              100
master00int0 ixgbe0       10000  2:8:20:61:3b:13   random              100
master01int0 ixgbe0       10000  2:8:20:28:b3:a8   random              100
worker02int0 ixgbe0       10000  2:8:20:7c:12:3a   random              100
worker03int0 ixgbe0       10000  2:8:20:62:e:e0    random              100
worker00int0 ixgbe0       10000  2:8:20:ad:2b:6c   random              100
bastionint0  ixgbe0       10000  2:8:20:f9:2e:58   random              100
worker01int0 ixgbe0       10000  2:8:20:12:51:4d   random              100
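
The VNICs themselves are created with dladm; a sketch for one of them (VLAN ID 100 on ixgbe0, matching the output above) would be:
root@spin:~# dladm create-vnic -l ixgbe0 -v 100 master00int0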

Setup Zones#

General#

I used an Ansible playbook to create the bhyve zones, but to keep this more generic I'll show you the zone config here. All zones have the same config except for the bastion, which has less RAM and CPU.

For OpenShift we need to have:

  • 1 bastion node (if you don't have any other Linux machine)
  • 1 bootstrap node
  • 1 LoadBalancer zone
  • 3 master servers
  • 2 workers

Bastion#

The bastion will be used to run the OpenShift installer, which is only available for Linux/x64 and macOS/x64.

The zone config for the bastion looks like this:

root@spin:~# zonecfg -z bastion export
create -b
set zonepath=/localstripe/zones/bastion
set brand=bhyve
set autoboot=true
set ip-type=exclusive
add fs
set dir="/localstripe/install/rhel-8.4-x86_64-dvd.iso"
set special="/localstripe/install/rhel-8.4-x86_64-dvd.iso"
set type="lofs"
add options ro
add options nodevices
end
add net
set physical="bastionint0"
end
add device
set match="/dev/zvol/rdsk/localstripe/vm/bastiond0"
end
add attr
set name="bootdisk"
set type="string"
set value="localstripe/vm/bastiond0"
end
add attr
set name="vnc"
set type="string"
set value="on"
end
add attr
set name="vcpus"
set type="string"
set value="2"
end
add attr
set name="ram"
set type="string"
set value="6G"
end
add attr
set name="cdrom"
set type="string"
set value="/localstripe/install/rhel-8.4-x86_64-dvd.iso"
end

I used a Red Hat Enterprise Linux 8.4 DVD, but CentOS or Fedora would also do the job. This zone gets access to a ZVOL:

root@spin:~# zfs get volsize localstripe/vm/bastiond0
NAME                      PROPERTY  VALUE    SOURCE
localstripe/vm/bastiond0  volsize   20G      local
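
If the ZVOL does not exist yet, creating it is a one-liner (a sketch, sized like the volume above):
root@spin:~# zfs create -V 20G localstripe/vm/bastiond0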

No other special attributes are set on it, so that's it. Boot the zone (install and boot commands are sketched below), attach socat to the VNC socket, and install Linux the usual way:

root@spin:~# socat TCP-LISTEN:5905,reuseaddr,fork UNIX-CONNECT:/localstripe/zones/bastion/root/tmp/vm.vnc

(1227) x230:/export/home/olbohlen$ vncviewer spin::5905
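
If the zone has not been installed and booted yet, that part is just the usual zoneadm dance (a sketch, matching the zone config above):
root@spin:~# zoneadm -z bastion install
root@spin:~# zoneadm -z bastion boot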

LoadBalancer#

OpenShift runs on multiple nodes (bhyve VMs), and we need an external LoadBalancer to access the Kubernetes API and the OpenShift Router. In a production environment you would want to make that LoadBalancer highly available with VRRP, but in this scenario we keep it simple.

First set up a standard ipkg OI zone (I did that on the other machine, "sin"):

root@sin:~# zonecfg -z api export
create -b
set zonepath=/localstripe/zones/api
set brand=ipkg
set autoboot=true
set ip-type=exclusive
add net
set physical="api0"
end
root@sin:~# zoneadm -z api install
[...]

We don't need anything fancy.

Once the api zone is installed, we install the OpenIndiana integrated LoadBalancer (ilb) and start it:
root@api:~# pkg install service/network/load-balancer/ilb
[...]
root@api:~# svcadm enable ilb
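
A quick check that the service actually came online:
root@api:~# svcs ilb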

The configuration I use looks like the following:

root@api:~# ilbadm export-cf
create-servergroup masters
add-server -s server=172.18.3.10 masters
add-server -s server=172.18.3.20 masters
add-server -s server=172.18.3.30 masters
create-servergroup workers
add-server -s server=172.18.3.50 workers
add-server -s server=172.18.3.60 workers
create-healthcheck -n -h hc-test=tcp,hc-timeout=3,hc-count=3,hc-interval=60 hc-masters
create-healthcheck -n -h hc-test=tcp,hc-timeout=3,hc-count=3,hc-interval=60 hc-workers
create-rule -e -p -i vip=172.18.3.100,port=6443,protocol=tcp -m lbalg=roundrobin,type=NAT,proxy-src=172.18.3.100-172.18.3.100,pmask=/32 -h hc-name=hc-masters,hc-port=6443 -t conn-drain=70,nat-timeout=70,persist-timeout=70 -o servergroup=masters mastersrule
create-rule -e -p -i vip=172.18.3.100,port=22623,protocol=tcp -m lbalg=roundrobin,type=NAT,proxy-src=172.18.3.100-172.18.3.100,pmask=/32 -h hc-name=hc-masters,hc-port=22623 -t conn-drain=70,nat-timeout=70,persist-timeout=70 -o servergroup=masters mcsrule
create-rule -e -p -i vip=172.18.3.100,port=80,protocol=tcp -m lbalg=roundrobin,type=NAT,proxy-src=172.18.3.100-172.18.3.100,pmask=/32 -h hc-name=hc-workers,hc-port=80 -t conn-drain=70,nat-timeout=70,persist-timeout=70 -o servergroup=workers httprule
create-rule -e -p -i vip=172.18.3.100,port=443,protocol=tcp -m lbalg=roundrobin,type=NAT,proxy-src=172.18.3.100-172.18.3.100,pmask=/32 -h hc-name=hc-workers,hc-port=443 -t conn-drain=70,nat-timeout=70,persist-timeout=70 -o servergroup=workers httpsrule

Save this output to a file and import it with "ilbadm import-cf -p filename". I use full NAT for load balancing here, as the API LoadBalancer and the nodes are in the same IP range.
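
After the import you can verify that everything is in place, for example with:
root@api:~# ilbadm show-rule
root@api:~# ilbadm show-servergroup
root@api:~# ilbadm show-healthcheck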

bootstrap, master and worker zones#

These zones look identical; just replace the host name:
root@spin:~# zonecfg -z master00 export
create -b
set zonepath=/localstripe/zones/master00
set brand=bhyve
set autoboot=true
set ip-type=exclusive
add net
set physical="master00int0"
end
add device
set match="/dev/zvol/rdsk/localstripe/vm/master00d0"
end
add attr
set name="bootdisk"
set type="string"
set value="localstripe/vm/master00d0"
end
add attr
set name="vnc"
set type="string"
set value="on"
end
add attr
set name="ram"
set type="string"
set value="16G"
end
add attr
set name="vcpus"
set type="string"
set value="4"
end

They all have access to their own ZVOL:
root@spin:~# zfs get volsize localstripe/vm/master00d0
NAME                       PROPERTY  VALUE    SOURCE
localstripe/vm/master00d0  volsize   250G     local
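
If you create the ZVOLs by hand instead of via a playbook, a sketch for all cluster nodes could look like this (adjust the names and the number of workers to your setup):
root@spin:~# for vm in bootstrap master00 master01 master02 worker00 worker01; do zfs create -V 250G localstripe/vm/${vm}d0; done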

Thankfully, bhyve will try a PXE boot if the bootdisk ZVOL is empty, so we don't have to set up anything additional here.

Install all the zones with "zoneadm -z zonename install", which should be pretty fast for bhyve zones. Just don't boot them up yet.
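
This can be scripted as well; a sketch, assuming the zone names used in this article:
root@spin:~# for z in bootstrap master00 master01 master02 worker00 worker01; do zoneadm -z $z install; done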

We will follow the OpenShift installation instructions for UPI (user-provisioned infrastructure) installations; see the appropriate docs for your OpenShift version on https://docs.openshift.com.

Download required material#

You need to log in to https://cloud.redhat.com, select "OpenShift" and "Create Cluster", then scroll down to "Platform agnostic". This takes you to a page where you have to download:
  • The OpenShift installer (openshift-install)
  • The OpenShift client (oc)
  • The pull secret (access tokens for Red Hat registries)

Save these files on the bastion node that we created earlier.
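
On the bastion, unpack both tools and put them into the PATH; a sketch, assuming the usual tarball names from the download page:
[localadm@bastion ~]$ tar xzf openshift-install-linux.tar.gz
[localadm@bastion ~]$ tar xzf openshift-client-linux.tar.gz
[localadm@bastion ~]$ sudo mv openshift-install oc kubectl /usr/local/bin/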

Set up install-config.yaml#

Create an empty directory as a non-root user somewhere; inside this directory, create a file called "install-config.yaml":
[localadm@bastion ~]$ cat install-config.yaml 
apiVersion: v1
baseDomain: home.eenfach.de
compute:
- hyperthreading: Enabled
  name: worker
  replicas: 2
controlPlane:
  hyperthreading: Enabled
  name: master
  replicas: 3
metadata:
  name: ocp4
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
platform:
  none: {}
fips: false
pullSecret: |
  {paste pull secret in here}
sshKey: |
  {paste a ssh public key in here}

Copy this file into a backup location somewhere, as the openshift-install command will consume the file and delete it. If you need to restart, you can use the backup.
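
A simple copy is enough, for example:
[localadm@bastion ~]$ cp install-config.yaml ~/install-config.yaml.bak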