random complexity

with a side of crazy

Scripted ESX installations

Recently I decided to script the build of my ESX hosts at home so I could rebuild them more easily if the need arose. A side effect is that you get identical configurations without resorting to host profiles. After doing this I realised it would be useful at work, where I'm about to build 9 nearly identical clusters. This post is more of a brain dump of the whole process for my own reference and possible use by others. I'll focus on the enhancements for use at work.

The high level overview is this: PXE boot the ESX installer with a parameter pointing to the kickstart file. That parameter points to a web server script which produces a host-specific kickstart file. The enhancement for use at work is that the first kickstart file is simple, but it obtains the host's Dell Service Tag and passes that to the web server to produce the rest of the kickstart file.

PXE boot the installer. I installed a minimal installation of Fedora 20 in a VM and added dnsmasq, syslinux-tftpboot, nginx and php-fpm. This VM has 2 networks: one (ens192) is connected to the local LAN, and the second (ens224) is the network builds will occur on. Syslinux is there because it drops the pxelinux binaries and builds out a /tftpboot folder. Nginx is a web server, and php-fpm is installed because I knocked up a quick script for templating the kickstart files in PHP (don't hate me). Dnsmasq is a simple DHCP server which can do DNS and TFTP too, so it's a no-brainer for this deployment. I used a very simple configuration for dnsmasq which I put in the fragment directory. This configuration could probably be simpler, but I based it off a working one from my OpenWrt box (which already had PXE working for Linux installers). After creating the config file, enable and start the service.

File: /etc/dnsmasq.d/buildbox.conf

domain-needed
bogus-priv
expand-hosts
domain=local
local=/local/
interface=ens224
dhcp-range=192.168.0.50,192.168.0.150,12h
enable-tftp
tftp-root=/tftpboot/
dhcp-boot=pxelinux.0

The pxelinux configuration is very similar to syslinux, which is great because ESX uses syslinux for installing and booting installed machines. An easy mistake to make when setting this up: "pxelinux.cfg" is a directory, NOT a file. Again I set up a basic configuration based off one I already use, so I know it works. The file below sets up a simple interactive menu showing the available options: local HDD, ESX installer, ESX kickstart. Note: this shows ESXi 5.1 (for work) but it works fine with 5.5 (home). Also see the tree structure of the basic TFTP boot area: pxelinux.cfg is a folder, the other files (supplied by syslinux) are in the TFTP root, and ESX is in a sub-folder.

TFTP directory tree

File: /tftpboot/pxelinux.cfg/default

default menu.c32
prompt 0
timeout 3000
ontimeout local

menu title BUILDBOX PXE MENU

label local
    menu label Local HDD
    kernel chain.c32
    append hd0 0

label esx51
    menu label ESXi-5.1 Installer
    kernel esx51/mboot.c32
    append -c esx51/boot.cfg

label esx51-ks
    menu label ESXi-5.1 Kickstart
    kernel esx51/mboot.c32
    append -c esx51/boot.cfg ks=http://192.168.0.1/ks.txt

The ESX files need to be loaded onto the tftp server too. I keep them in a subdirectory off the tftproot to make it easy to add/change later. Simply copy all files off your esx iso into a location like /tftpboot/esx51 as I used. Then edit the /tftpboot/esx51/boot.cfg file to cater for the changed root dir. The lines that need editing are "kernel=/esx51/tboot.b00" and "modules=". Every file reference needs the path included so add /esx51/ to each file.
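The boot.cfg edit described above can be done by hand, but a small sed wrapper makes it repeatable. This is only a sketch: the function name prefix_bootcfg is made up for illustration, and it assumes every file reference on the kernel= and modules= lines starts with a single leading "/" (for example "/tboot.b00"), so check your own boot.cfg before trusting it.

```shell
#!/bin/sh
# Sketch: prefix every file reference on the kernel= and modules= lines of an
# ESXi boot.cfg with a sub-directory of the TFTP root.
# Assumption: each reference starts with one leading "/" (e.g. "/tboot.b00").
# Do not run it twice on the same file, or the prefix will be added twice.
prefix_bootcfg() {
    # $1 = path to boot.cfg, $2 = sub-directory name (e.g. esx51)
    # Only lines starting with kernel= or modules= are touched;
    # kernelopt= and everything else passes through unchanged.
    sed -e "/^kernel=/s#/#/$2/#g" \
        -e "/^modules=/s#/#/$2/#g" "$1"
}
```

The function writes the result to stdout rather than editing in place, so something like `prefix_bootcfg boot.cfg.orig esx51 > boot.cfg` keeps the untouched original around for comparison.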

That will get you a PXE booting ESX installer, but to make it more useful for kickstarting let's do the rest.

Nginx setup. Set up nginx to use PHP via php-fpm. This is a very simple config file with any security options removed. On an isolated network it's probably fine, but I wouldn't leave this running on any real web server. Also note the web root is /www, which is where we'll be putting files. Set up PHP by setting the timezone (this avoids an error):

sed -i -e "s/;date.timezone =/date.timezone = Etc\/UTC/" /etc/php.ini

Now enable and start the php-fpm service. Once the nginx file is saved, enable and start nginx.

File: /etc/nginx/conf.d/default.conf

server {
        listen       80 default_server;
        server_name buildbox.local;

        root /www/;
        index index.html index.htm index.php;

        location / { }

        location ~ \.php$ {
                include /etc/nginx/fastcgi_params;
                fastcgi_pass  127.0.0.1:9000;
                fastcgi_index index.php;
                fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        }
}

I'd recommend testing the web server from another host to make sure PHP works. Drop a file like this on there to test. If the script produces the phpinfo page then it's working; if not, hit the logs and see why not.

File: /www/test.php

<?php
phpinfo();

Now for the actual kickstart part. I found plenty of good resources for this online, so it wasn't hard to get a working config going quickly, which let me focus on my specific requirements. For home I use a templated kickstart file: a number passed to the PHP script selects which host's configuration I get. This means my pxelinux menu has an entry for each host, as the URL is slightly different for each. For work I wanted to be more efficient than this; with a much larger number of hosts I didn't want heaps of menu options. Fortunately I was able to get the Dell Service Tag (essentially a short serial number) off the server prior to ESX installation. We track assets using this number, so it's helpful to know service tag 1234XYZ belongs to company ABC and is destined for location JKL or whatever.
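To show why one menu entry per host gets unwieldy, here is a rough sketch of generating those entries with a loop. The function name menu_entries, the script name in the usage line and the "host" query parameter are all hypothetical, since the home URL scheme isn't shown here; the kernel and boot.cfg paths simply reuse the esx51 layout from the menu earlier.

```shell
#!/bin/sh
# Sketch: print one pxelinux menu stanza per host number.
# Assumption: the kickstart script selects a host via a "host" query
# parameter. That parameter name is made up for this example.
menu_entries() {
    # $1 = number of hosts, $2 = base URL of the kickstart script
    i=1
    while [ "$i" -le "$1" ]; do
        printf 'label esx-host%s\n' "$i"
        printf '    menu label ESXi Kickstart host %s\n' "$i"
        printf '    kernel esx51/mboot.c32\n'
        printf '    append -c esx51/boot.cfg ks=%s?host=%s\n\n' "$2" "$i"
        i=$((i + 1))
    done
}
```

Something like `menu_entries 4 http://192.168.0.1/ks.php >> /tftpboot/pxelinux.cfg/default` would append four stanzas. With dozens of hosts the menu becomes a long list to scroll through, which is what the service tag approach avoids.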

The work flow is this:

  • PXELinux menu includes a URL to a simple file /ks.txt. That file is the kickstart file.
  • ESX installer boots (over tftp) and downloads the ks.txt file (over http).
  • The kickstart file (ks.txt) includes a pre-install script to determine the service tag and pull down the rest of the configuration over http.
  • The web server returns a service tag specific kickstart file for a supplied service tag.
  • ESX installer uses the now complete kickstart file to complete the installation

The idea was to use "esxcli hardware platform get" to get the service tag and supply that to the php script. In the outputs below the Dell Service Tag is the serial number line.

# esxcli hardware platform get
Platform Information
   UUID: 0x4c 0x4c 0x45 0x44 0x0 0x4b 0x31 0x10 0x80 0x44 0xb1 0xc0 0x4f 0x43 0x32 0x53
   Product Name: PowerEdge R710
   Vendor Name: Dell Inc.
   Serial Number: 1K1DC2S
   IPMI Supported: true
# esxcli hardware platform get | grep Serial | cut -f2 -d:
 2M28C2S

Base kickstart file (rev 0):

vmaccepteula
rootpw TempESXPassword!!

clearpart --firstdisk --overwritevmfs
install --firstdisk --overwritevmfs

#DHCP for installation
network --bootproto=dhcp --addvmportgroup=false --device=vmnic20
#vmnic 20 is pci1-1 (on dell R820)
reboot

%include /tmp/fullks.txt

%pre  --interpreter=busybox
#grab the per host config
ST=$(esxcli hardware platform get | grep Serial | cut -f2 -d:)
wget -O /tmp/fullks.txt "http://192.168.0.1/ks-conf.php?st=${ST}"

I won't go into details about the ks-conf.php script. Basically it takes the service tag in and pulls details out of a CSV to produce this host's complete configuration (all settings, IPs and vswitches). As usually happens to me, this was too easy and was bound to have issues. Once I'd eliminated the obvious ones I checked the nginx access log and found the service tag was coming through blank. Fortunately you can still get a shell during the ESX installer, where I quickly learned that esxcli doesn't work and dmidecode isn't present, BUT the older tools still work, so I had to adjust to use esxcfg-info instead. After a bit of hunting I found the info I needed, and using the few tools available in the installer environment ended up with this:

Base kickstart file (rev 1, changed line only):

ST=$(esxcfg-info | grep "Serial Number" | head -1 | tail -c 8)
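Since a blank tag was the original failure, it may be worth checking the value before it goes into the URL. The sketch below wraps the same pipeline in a function and adds a sanity check. The function names are made up, and the check assumes a tag is exactly 7 letters or digits, which is what the "tail -c 8" above implies (7 characters plus the trailing newline).

```shell
#!/bin/sh
# Sketch: read esxcfg-info style text on stdin and print the service tag.
# Same idea as the rev 1 line: take the first "Serial Number" line and keep
# its last 8 bytes (7 tag characters plus the newline).
extract_tag() {
    grep "Serial Number" | head -n 1 | tail -c 8
}

# Succeed only if $1 is exactly 7 letters/digits.
# Assumption: all tags in the fleet are 7 characters long.
is_valid_tag() {
    case "$1" in
        *[!A-Za-z0-9]*) return 1 ;;   # contains a space or other stray character
    esac
    [ "${#1}" -eq 7 ]
}

# Possible use inside %pre (not run here):
#   ST=$(esxcfg-info | extract_tag)
#   is_valid_tag "$ST" || echo "unexpected service tag: '$ST'" >&2
```

A blank result, or one with a stray leading space like the cut-based rev 0 output, fails the check instead of silently producing an empty st= parameter.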

That worked, and now I was cooking with gas. Other things to note: the base kickstart file's network line is, in my case, for installation only (DHCP on the 192.168.0.x network). This network is still present for the %post script, so I was able to download additional packages from the web server for installation later. In the %firstboot section I set up vmk0's target IP for the destination network, and all the other necessary settings. Below is a sample of the resulting templated script out of ks-conf.php. I've replaced all possibly sensitive details and reduced the config to show only the basic configuration (all other vswitches are based off the same template as vSwitch2, only with different vmnics and VLANs). In the interest of readability I've left all my comments in.

Sample output from ks-conf.php?st=xxxx123

%post  --interpreter=busybox
#Dell openmanage vib
wget -P /vmfs/volumes/datastore1/ http://192.168.0.1/OM-SrvAdmin-Dell-Web-7.4.0-1070.VIB-ESX51i.zip

%firstboot --interpreter=busybox
# rename local datastore to something more meaningful
vim-cmd hostsvc/datastore/rename datastore1 "auxxxesx1_local"

# network settings
esxcli network ip interface ipv4 set -i vmk0 -t static -I 10.17.43.232 -N 255.255.255.0
esxcli network ip route ipv4 add -n 'default' -g 10.17.43.254

# Set DNS and hostname
esxcli system hostname set --fqdn=auxxxesx1.internal
esxcli network ip dns search add --domain=internal
esxcli network ip dns server add --server=10.17.43.218
esxcli network ip dns server add --server=10.10.1.3
esxcli network ip dns search remove --domain=local #from dhcp
esxcli network ip dns server remove --server=192.168.0.1 #from dhcp

# Enable SSH and the ESXi Shell
vim-cmd hostsvc/enable_ssh
vim-cmd hostsvc/start_ssh
vim-cmd hostsvc/enable_esx_shell
vim-cmd hostsvc/start_esx_shell

# ESXi Shell availability timeout, the interactive idle time logout, and suppress the shell enabled warnings
esxcli system settings advanced set -o /UserVars/ESXiShellTimeOut -i 3600 #timeout also disables ssh after the timeout
esxcli system settings advanced set -o /UserVars/ESXiShellInteractiveTimeOut -i 3600
esxcli system settings advanced set -o /UserVars/SuppressShellWarning -i 1

# NTP
cat > /etc/ntp.conf << __NTP_CONFIG__
restrict default kod nomodify notrap noquery nopeer
restrict 127.0.0.1
server ntp.internal
__NTP_CONFIG__
/sbin/chkconfig ntpd on

# Logging
esxcli system syslog config set --logdir /vmfs/volumes/auxxxesx1_local/logs --logdir-unique=true

#disable ipv6
esxcli system module parameters set -m tcpip3 -p ipv6=0
#the module is renamed to tcpip4 in esx5.5
#esxcli system module parameters set -m tcpip4 -p ipv6=0

#mgmt network switch vSwitch0
esxcli network vswitch standard uplink add -v vSwitch0 -u vmnic4
esxcli network vswitch standard uplink add -v vSwitch0 -u vmnic20
esxcli network vswitch standard policy failover set -v vSwitch0 -a vmnic4,vmnic20
esxcli network vswitch standard policy failover set -v vSwitch0 --failback yes --failure-detection link --load-balancing portid --notify-switches yes
esxcli network vswitch standard policy security set -v vSwitch0 --allow-forged-transmits yes --allow-mac-change yes --allow-promiscuous no
esxcli network vswitch standard set --cdp-status both --vswitch-name vSwitch0 
#vmk0 is automatically on vSwitch0 

#NFS network switch vSwitch1
esxcli network vswitch standard add -v vSwitch1
esxcli network vswitch standard uplink add -v vSwitch1 -u vmnic0
esxcli network vswitch standard uplink add -v vSwitch1 -u vmnic2
esxcli network vswitch standard policy failover set -v vSwitch1 -a vmnic0,vmnic2
esxcli network vswitch standard policy failover set -v vSwitch1 --failback yes --failure-detection link --load-balancing portid --notify-switches yes
esxcli network vswitch standard policy security set -v vSwitch1 --allow-forged-transmits yes --allow-mac-change yes --allow-promiscuous no
esxcli network vswitch standard set --cdp-status both --vswitch-name vSwitch1 
esxcli network vswitch standard set -v vSwitch1 --mtu 9000

esxcli network vswitch standard portgroup add -v vSwitch1 -p NFS
esxcli network ip interface add -p NFS -i vmk1
esxcli network ip interface ipv4 set -i vmk1 -t static -I 172.16.2.6 -N 255.255.255.0
#enable vmotion on vmk1
vim-cmd hostsvc/vmotion/vnic_set vmk1

#VMNET1 vSwitch2
esxcli network vswitch standard add -v vSwitch2
esxcli network vswitch standard uplink add -v vSwitch2 -u vmnic13
esxcli network vswitch standard uplink add -v vSwitch2 -u vmnic19
esxcli network vswitch standard policy failover set -v vSwitch2 -a vmnic13,vmnic19
esxcli network vswitch standard policy failover set -v vSwitch2 --failback yes --failure-detection link --load-balancing portid --notify-switches yes
esxcli network vswitch standard policy security set -v vSwitch2 --allow-forged-transmits yes --allow-mac-change yes --allow-promiscuous no
esxcli network vswitch standard set --cdp-status both --vswitch-name vSwitch2
#portgroup
esxcli network vswitch standard portgroup add -v vSwitch2 -p VMNET1
#vlan
esxcli network vswitch standard portgroup set -p VMNET1 -v 100

#SNMP
esxcli system snmp set --communities=esxcommunity --syscontact="xxxxxx" --syslocation="xxxxxx"
esxcli system snmp set --targets=xxxxxx.internal@162/esxcommunity
#esxcli system snmp set --enable true
#allow all hosts
esxcli network firewall ruleset set --ruleset-id snmp --allowed-all true
/etc/init.d/snmpd restart

#nfs datastores
esxcli storage nfs add --host 172.16.2.3 --share /xxx_vol1 --volume-name xxx_VOL1
esxcli storage nfs add --host 172.16.2.4 --share /xxx_vol2 --volume-name xxx_VOL2

#Dell Openmanage vib
esxcli software vib install --depot=/vmfs/volumes/auxxxesx1_local/OM-SrvAdmin-Dell-Web-7.4.0-1070.VIB-ESX51i.zip

#backup and go into maintenance mode
/sbin/auto-backup.sh
esxcli system maintenanceMode set -e true
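The repeated vswitch stanzas above lend themselves to a small generator. The actual templating is done in PHP by ks-conf.php and isn't shown, so the following is only a shell sketch of the idea with a made-up function name. It emits the core lines of the vSwitch2 stanza and leaves out the failover, security and CDP policy lines for brevity.

```shell
#!/bin/sh
# Sketch: emit the esxcli lines for one VM-network vSwitch, modelled on the
# vSwitch2 stanza in the sample output. Policy lines are omitted for brevity.
vswitch_stanza() {
    # $1 = vSwitch name, $2 = first uplink, $3 = second uplink,
    # $4 = portgroup name, $5 = VLAN id
    cat <<EOF
esxcli network vswitch standard add -v $1
esxcli network vswitch standard uplink add -v $1 -u $2
esxcli network vswitch standard uplink add -v $1 -u $3
esxcli network vswitch standard policy failover set -v $1 -a $2,$3
esxcli network vswitch standard portgroup add -v $1 -p $4
esxcli network vswitch standard portgroup set -p $4 -v $5
EOF
}
```

Calling `vswitch_stanza vSwitch2 vmnic13 vmnic19 VMNET1 100` reproduces the add, uplink, failover and portgroup lines shown for vSwitch2. The function only prints text, so the output can be reviewed before it is pasted into a %firstboot section.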

Things to note:

  • The DNS search domain and server from DHCP are removed in %firstboot
  • I'm pxebooting on vmnic20 which is to be a management interface.
  • NFS and vmotion are on the same vswitch, in my case this is because that vswitch is 10Gbit.
  • My hosts all have additional vswitches for different networks (physical LAN separation); I've only shown one of them, as VMNET1, above.
  • SNMP hasn't been enabled, as my hosts are being installed and then shipped, not installed in place.
  • For the installer to add the NFS datastores, they have to be available at the time of installation.
  • I haven't assigned licenses at this stage. This could be done easily however I prefer to add them when joining the host to vSphere.
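For anyone wanting to reproduce the per-host lookup without PHP, the core of it (service tag in, host settings out) can be approximated with awk. This is a sketch under assumptions, not the real ks-conf.php: the function name and the CSV column order (tag, hostname, IP, netmask, gateway, with no header row) are hypothetical.

```shell
#!/bin/sh
# Sketch: look up a service tag in a CSV and print a few %firstboot lines.
# Hypothetical CSV columns: tag,hostname,ip,netmask,gateway (no header row).
host_config() {
    # $1 = service tag, $2 = path to the CSV file
    awk -F, -v tag="$1" '
        # Appending "" forces a string comparison even if a tag is all digits.
        ($1 "") == (tag "") {
            printf "esxcli system hostname set --fqdn=%s\n", $2
            printf "esxcli network ip interface ipv4 set -i vmk0 -t static -I %s -N %s\n", $3, $4
            printf "esxcli network ip route ipv4 add -n default -g %s\n", $5
            found = 1
        }
        # Non-zero exit status when the tag is not in the file.
        END { exit (found ? 0 : 1) }
    ' "$2"
}
```

The non-zero exit for an unknown tag matters: a web front end can turn it into an error response rather than handing the installer an empty kickstart fragment.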

Oh and as per usual, the buildbox VM was also built with kickstart which preconfigures everything as above, and dumps the scripts and templates down - just in case I need to rebuild that too.

That'll do for now. I've got another vmware related post coming soon.

Virtually upgraded

I've been doing a fair bit of VMware stuff at work lately and thought it was about time I upgraded my lab to ESXi 5.5. My reason for holding off was that the 8168 driver was removed from the released ISO, and those are the only NICs I have in my machines. Fortunately adding the driver turned out to be very easy.

First up I had to upgrade vCenter. I use the appliance for simplicity, however I'm considering moving to the Windows version running on a 2012R2 machine to reduce overheads (yes, the appliance uses too much RAM). All I did for this was deploy a new appliance on a new IP and connect all my hosts to it. I had to recreate the datacenter/cluster level stuff, but there's not much configured there so that's fairly easy. Also, as this is only a management level connection, the running VMs all keep running without issue.

Then to update ESXi I simply followed instructions found online and slightly modified to use the latest release available.

Using VMware PowerCLI create a custom image with the extra driver package which fortunately is still in the online repo. Note I said PowerCLI, not PowerShell. If you run PowerShell instead you'll need to "Add-PSSnapin VMware.ImageBuilder" first.

    Add-EsxSoftwareDepot https://hostupdate.vmware.com/software/VUM/PRODUCTION/main/vmw-depot-index.xml 
    Get-EsxImageProfile -Name "*5.5*" | Format-Table -Wrap -AutoSize
    #from the output of that select the newest profile name and use that below, in this case ESXi-5.5.0-20131204001-standard
    New-EsxImageProfile -CloneProfile "ESXi-5.5.0-20131204001-standard" -name "ESXi-5.5.0-20131204001-DS61" -Vendor "tardfree.net"
    Add-EsxSoftwarePackage -ImageProfile "ESXi-5.5.0-20131204001-DS61" -SoftwarePackage "net-r8168"
    Export-ESXImageProfile -ImageProfile "ESXi-5.5.0-20131204001-DS61" -ExportToISO -filepath C:\Temp\ESXi-5.5.0-20131204001-DS61.iso

Then I realised that's fine for manual upgrades done from the console but I wanted to try updating remotely (sadly I don't have out of band access to all my boxes). Fortunately that's easy too.

Before attempting an upgrade though, it's always wise to back up your configuration. I recently used this to rebuild a host when its boot media died. Again, from PowerCLI:

    Connect-VIServer -Server vcenter -User root
    #obviously use your vcenter hostname if it's not vcenter
    Get-VMHostFirmware -VMHost vs1.tardfree.net -BackupConfiguration -DestinationPath c:\temp

Then to perform the in-place upgrade, run the following from the ESX shell (ssh into the host). This opens the firewall to allow outbound HTTP requests, then downloads and applies the update. Note the profile name is the same standard profile the ISO above was cloned from. Also note this uses the update command, not the install command. Update keeps VIBs that aren't present in the image; install does not. So in my case install would throw out the network driver I need, as it's not in the new image profile.

    esxcli network firewall ruleset set -e true -r httpClient
    esxcli software profile update -d https://hostupdate.vmware.com/software/VUM/PRODUCTION/main/vmw-depot-index.xml -p ESXi-5.5.0-20131204001-standard
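Because the whole point of using update is to keep the NIC driver, it may be worth confirming the VIB is still listed before rebooting. The helper below is a sketch: the name has_vib is made up, and it assumes the VIB name is the first column of "esxcli software vib list" output and that awk is available in the shell where it runs.

```shell
#!/bin/sh
# Sketch: succeed if a VIB with the given name appears in the first column of
# "esxcli software vib list" style output read from stdin.
has_vib() {
    # $1 = VIB name to look for (exact match on the first column)
    awk -v name="$1" '$1 == name { found = 1 } END { exit (found ? 0 : 1) }'
}

# Possible use on the host (not run here):
#   esxcli software vib list | has_vib net-r8168 || echo "net-r8168 missing" >&2
```

The exact match on the first column avoids false positives from similarly named drivers that a plain grep could pick up.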

After running that it tells you a reboot is needed, so reboot it and hope for the best. So far all of my physical hosts have come up fine (one left to reboot) and 4 out of 5 virtual esx hosts did. The one that failed pink screened on me and then recovered to the previous version partition. I'm yet to figure out quite why, so I might just reinstall it (all 5 of these hosts were built from the same template and have identical configuration).

Anyway that's enough for now.

Copyright © 2001-2016 Robert Harrison.