
Thursday, May 27, 2021

Converting CentOS 8 Stream to AlmaLinux 8.4


There's no better time to migrate CentOS 8 Stream to AlmaLinux 8.4. Because Red Hat Enterprise Linux 8.4 has only just been released, the differences between Stream and 8.4 are minor.

First off, make sure your box is up to date:

dnf distro-sync

Then back up or snapshot the system. The conversion may revert some packages to older versions, which can break things.

Then download the AlmaLinux conversion script:

curl -O https://raw.githubusercontent.com/AlmaLinux/almalinux-deploy/master/almalinux-deploy.sh

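Edit almalinux-deploy.sh as follows.

Set the minimal supported version:

MINIMAL_SUPPORTED_VERSION='8'

In REMOVE_PKGS, add "centos-stream-release" "centos-stream-repos"

Amend the grep command in get_os_version so that it matches both CentOS Linux and CentOS Stream: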

        if ! os_version="$(grep -oP 'CentOS\s+(Linux|Stream)\s+release\s+\K(\d+(\.\d+)?)' \
                                    "${REDHAT_RELEASE_PATH}" 2>/dev/null)"; then

    
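To check what the amended pattern pulls out of /etc/redhat-release, here is a rough Python equivalent. Python's re module has no \K, so a capture group stands in for it. The release strings in the example are assumed formats for illustration, not copied from a real box:

```python
import re

# Python equivalent of the amended grep -oP pattern in get_os_version.
# re has no \K, so the version is returned from a capture group instead.
RELEASE_RE = re.compile(r"CentOS\s+(?:Linux|Stream)\s+release\s+(\d+(?:\.\d+)?)")


def os_version(release_text):
    """Return the major(.minor) version from redhat-release text, or None."""
    match = RELEASE_RE.search(release_text)
    return match.group(1) if match else None


for sample in ("CentOS Stream release 8",
               "CentOS Linux release 8.3.2011",
               "AlmaLinux release 8.4 (Electric Cheetah)"):
    print(repr(sample), "->", os_version(sample))
```

Note that the optional minor part only takes one dot-group, so a CentOS Linux release string yields "8.3" rather than the full "8.3.2011".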

Alternatively, download my modified script from my repo:

curl -O https://raw.githubusercontent.com/philrandal/almalinux-deploy/master/almalinux-deploy.sh

Then you're set to go. The script works with CentOS 8.3 and CentOS 8 Stream.

bash almalinux-deploy.sh 

If it succeeds, reboot:

reboot

After the reboot, check for orphaned packages. This is what I got after migrating a full CentOS 8 Stream install to AlmaLinux 8.4:

dnf list extras
Extra Packages
llvm-compat-libs.x86_64 10.0.1-1.module_el8.4.0+533+50191577

It can safely be removed with dnf remove.

When you reboot, you'll see that the original CentOS rescue kernel is still there.

Fix that with:

rm -f /boot/vmlinuz-0-rescue-* /boot/initramfs-0-rescue-*.img
/usr/lib/kernel/install.d/51-dracut-rescue.install add $(uname -r) "" /lib/modules/$(uname -r)/vmlinuz

To remove old CentOS kernels, first list the installed ones:

rpm -qa | grep kernel-core 

Then use yum remove to get rid of any superfluous ones.

Test, test, and test again.

Remove any snapshots once you're happy.

Postscript:

So you just want to convert back to CentOS 8.x from CentOS Stream?

No worries (except for the caveat that downgrades may break things).

curl -O http://mirror.centos.org/centos-8/8.3.2011/BaseOS/x86_64/os/Packages/centos-linux-release-8.3-1.2011.el8.noarch.rpm
curl -O http://mirror.centos.org/centos-8/8.3.2011/BaseOS/x86_64/os/Packages/centos-linux-repos-8-2.el8.noarch.rpm
rpm -e --nodeps centos-stream-repos centos-stream-release
rpm -ivh centos-linux-release-8.3-1.2011.el8.noarch.rpm centos-linux-repos-8-2.el8.noarch.rpm
dnf distro-sync
reboot  


Posted by Phil at 5:28 PM
Edited on: Wednesday, June 09, 2021 12:55 PM
Categories: IT, Software

Monday, September 16, 2019

Lenovo IdeaPad S340-15API


I've just purchased a Lenovo IdeaPad S340-15API from Argos to replace our aging and very slow Acer Aspire E5-511 laptop.

The spec is good. For £550 you get a 15.6 inch Full HD (1920 x 1080) display, AMD Ryzen 7 3700U quad-core 2.3GHz processor (with Simultaneous Multi-Threading giving 8 virtual cores) with Radeon Vega Mobile graphics, 8GB of RAM, of which 2GB is taken as display RAM, and 512GB SSD (WDC PC SN520 SDAPMUW-512G-1101).

Windows 10 Home 1809 comes pre-installed in S mode, but it is simple to switch to full Windows 10 Home.

It didn't offer me Windows 10 1903, so I went and fetched it using the Microsoft Update Assistant.

It took a while to install that and the subsequent updates.

Lenovo Vantage offered a few driver updates, but I went to the Lenovo Support website and manually downloaded the latest BIOS and assorted drivers.

Power settings are never right by default, as I discovered the hard way. I could boot once, close the lid to put it into sleep mode, open it again to wake up, but the second time I closed the lid it would not behave and effectively locked up.

Powering off and on brought it back to life.

Eventually, after running powercfg /a, I discovered a workaround:

C:\Windows\System32>powercfg /a
The following sleep states are available on this system:
Standby (S3)
Hibernate
Hybrid Sleep
Fast Startup
The following sleep states are not available on this system:
Standby (S1)
The system firmware does not support this standby state.
Standby (S2)
The system firmware does not support this standby state.
Standby (S0 Low Power Idle)
The system firmware does not support this standby state.

In Control Panel, open Power Options, click Change plan settings, then Change advanced power settings.

In the Sleep section of the advanced settings, turn Allow hybrid sleep on for both On battery and Plugged in.

Then it all works as expected.

Postscript, September 28, 2019

The cause of the Sleep problem was the Realtek Audio Driver, and the fix was found on the Lenovo Support Forum here.

If you're not averse to extracting .cab files and manually updating drivers, give it a go.



Posted by Phil at 5:25 PM
Edited on: Saturday, September 28, 2019 8:32 PM
Categories: IT

Saturday, August 17, 2019

About to update to RedHat Enterprise Linux or CentOS 7.7? Beware!


There are a couple of caveats before upgrading to RHEL / CentOS 7.7.

1: Upgrading to the latest RHEL 7 PCP package, pcp-4.3.2-2.el7, causes several issues.

During installation/upgrade of pcp-selinux:

    Updating / installing...
    1:pcp-selinux-4.3.2-2.el7 ################################# [100%]
Failed to resolve allow statement at /etc/selinux/targeted/tmp/modules/400/pcpupstream/cil:83
semodule: Failed!

This is followed by lots of logged SELinux errors.

The workaround is to install 7.7's selinux-policy first:

    yum update selinux-policy

or, if you've already updated,

    yum reinstall pcp-selinux

This is documented in Red Hat Bugzilla entry 1714101.

2: bind is updated from version 9.9 to 9.11. This breaks configurations in which the same zone file is defined in several views.

The solution is to use bind 9.11's "in-view xxx" in subsequent definitions of the same zone in different views.

The fix is described in detail here.



Posted by Phil at 7:32 PM
Edited on: Saturday, September 14, 2019 3:47 PM
Categories: IT, Software

Sunday, July 14, 2019

Using the Pimoroni Fan Shim with LibreElec


Trying out the latest LibreElec alpha on a Raspberry Pi 4 on the hottest day of the year so far, I found the CPU temperature reaching 70ºC. That's a bit too warm for my liking, so I've added a Pimoroni Fan Shim.

There's no way you can install the Pimoroni python library on LibreElec, but there is an alternative.

First, you need to install the Raspberry Pi Tools addon in LibreElec.

Addons / install from repository / libreelec add-ons / program add-ons / Raspberry Pi Tools.

I found a script on https://forum-raspberrypi.de/forum/thread/43568-fan-shim-steuern/, and edited it slightly. Change the min and max temperatures to suit yourself.

SSH into your LibreElec box as root and create the script:

nano /storage/fanshim.py

#!/usr/bin/env python
# https://forum-raspberrypi.de/forum/thread/43568-fan-shim-steuern/
# Place the command below in /storage/.config/autostart.sh:
#   nohup /storage/fanshim.py &
import signal
import subprocess
import sys
import time

# RPi.GPIO is provided by the Raspberry Pi Tools addon
sys.path.append('/storage/.kodi/addons/virtual.rpi-tools/lib')
import RPi.GPIO as GPIO

Pause = 30          # seconds between temperature checks
CoreTempMax = 57    # switch the fan on at or above this temperature (deg C)
CoreTempMin = 46    # switch the fan off at or below this temperature (deg C)
GPIO_Pin = 18       # BCM pin that switches the Fan Shim
Run_Fan_function = False

def init():
    GPIO.setwarnings(False)
    GPIO.setmode(GPIO.BCM)
    GPIO.setup(GPIO_Pin, GPIO.OUT)

def Set_Fan_ON():
    GPIO.output(GPIO_Pin, True)

def Set_Fan_OFF():
    GPIO.output(GPIO_Pin, False)

def get_CPU_Temp():
    # vcgencmd prints e.g. "temp=48.3'C"; strip "temp=" and the trailing "'C"
    output = subprocess.check_output(['vcgencmd', 'measure_temp']).decode()
    return float(output.strip()[5:-2])

def Watch_Temp():
    # Hysteresis: the fan only changes state when a threshold is crossed
    global Run_Fan_function
    CPU_Temp = get_CPU_Temp()
    if not Run_Fan_function and CPU_Temp >= CoreTempMax:
        Run_Fan_function = True
        Set_Fan_ON()
    elif Run_Fan_function and CPU_Temp <= CoreTempMin:
        Run_Fan_function = False
        Set_Fan_OFF()

def shutdown(signum, frame):
    # Release the GPIO pin when the script is killed, e.g. at shutdown
    GPIO.cleanup()
    sys.exit(0)

signal.signal(signal.SIGTERM, shutdown)

try:
    init()
    while True:
        Watch_Temp()
        time.sleep(Pause)
except KeyboardInterrupt:
    GPIO.cleanup()
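Watch_Temp() switches the fan with hysteresis: it turns on at CoreTempMax and only turns off again once the temperature has fallen to CoreTempMin. To try different thresholds on any machine, without GPIO or vcgencmd, here is a pure-Python sketch of the same decision. The helpers next_fan_state and simulate are hypothetical and not part of the script above:

```python
def next_fan_state(running, temp, temp_max=57.0, temp_min=46.0):
    """Return True if the fan should be on after reading `temp` (deg C).

    Same rule as Watch_Temp(): on at or above temp_max, off at or below
    temp_min, otherwise keep the current state.
    """
    if not running and temp >= temp_max:
        return True
    if running and temp <= temp_min:
        return False
    return running


def simulate(temps, temp_max=57.0, temp_min=46.0):
    """Feed a sequence of readings through next_fan_state; return fan states."""
    running = False
    states = []
    for temp in temps:
        running = next_fan_state(running, temp, temp_max, temp_min)
        states.append(running)
    return states


print(simulate([50, 58, 52, 47, 45, 50]))
```

With the default thresholds, a reading of 52ºC keeps the fan running once it has started, so the fan doesn't chatter on and off around a single threshold.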

Save the edited file, and then:

chmod +x /storage/fanshim.py

Then edit LibreElec's autostart.sh:

nano /storage/.config/autostart.sh

so that it contains the line:

nohup /storage/fanshim.py &

Reboot your Pi 4 for it to take effect.

If you're using a TV Hat, you'll need an extra tall stacking header to give a good gap between the fan and the TV Hat.

Enjoy.

Postscript, October 26, 2019

Someone has done a bit of work on this and turned it into a proper installable Kodi addon:

https://github.com/jane-t/rpi-fanshim

Thanks!



Posted by Phil at 10:06 PM
Edited on: Saturday, October 26, 2019 4:27 PM
Categories: IT, Raspberry Pi

Saturday, April 27, 2019

A Raspberry Pi Stratum 1 NTP Server


Overview

The following instructions show how to make a cheap Pulse Per Second (PPS) disciplined Stratum 1 NTP time server using one of the Raspberry Pi U-blox M8Q based GPS boards sold by Uputronics.

Our basic requirement is for an NTP server which will work standalone without connectivity to other NTP servers, so if a datacentre loses connectivity its servers will still have a stable time source.

As we’re only interested in time, not height, motion, or location, all we need are the NMEA xxZDA and xxRMC sentences and the highly accurate PPS signal from the GPS module, both of which we feed straight into the ntp daemon. The M8Q doesn’t output xxZDA sentences (which give us full 4-digit years) by default, so we have to enable them at boot time.
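For illustration, here is a minimal Python sketch of what an xxZDA sentence carries and how its NMEA checksum (an XOR of the characters between "$" and "*") is verified. ntpd's driver 20 does this parsing itself; the helper names are hypothetical, and the GN talker ID in the example sentence is an assumption:

```python
from datetime import datetime, timezone
from functools import reduce


def nmea_checksum(body):
    """XOR of every character between the leading '$' and the '*'."""
    return reduce(lambda acc, ch: acc ^ ord(ch), body, 0)


def make_sentence(body):
    """Wrap a sentence body as $body*CS (handy for testing the parser)."""
    return "$%s*%02X" % (body, nmea_checksum(body))


def parse_zda(sentence):
    """Return the UTC time in an xxZDA sentence, or None if it is invalid.

    Layout: $xxZDA,hhmmss.ss,dd,mm,yyyy,zh,zm*CS  (yyyy is a full 4-digit year)
    """
    sentence = sentence.strip()
    if not sentence.startswith("$") or "*" not in sentence:
        return None
    body, _, cs = sentence[1:].partition("*")
    try:
        if int(cs, 16) != nmea_checksum(body):
            return None  # corrupted on the serial line
        fields = body.split(",")
        if len(fields) < 5 or not fields[0].endswith("ZDA"):
            return None
        hhmmss, day, month, year = fields[1:5]
        whole, _, frac = hhmmss.partition(".")
        micro = int((frac + "000000")[:6])
        return datetime(int(year), int(month), int(day),
                        int(whole[0:2]), int(whole[2:4]), int(whole[4:6]),
                        micro, tzinfo=timezone.utc)
    except ValueError:
        return None  # empty fields (no fix yet) or garbage


line = make_sentence("GNZDA,153127.00,18,04,2019,00,00")
print(line, "->", parse_zda(line))
```

Unlike RMC, which only carries a 2-digit year, ZDA gives the full year directly, which is why the guide enables it.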

Many guides on the net use gpsd to feed ntp via shared memory. That’s an additional overhead and complexity, especially if we want to enable Galileo reception (gpsd initialises the U-blox the way it sees fit). I prefer to let the NTP daemon read the NMEA stream from the GPS receiver itself, avoiding the middle man.

In this guide we’ve chosen to use NMEA output from the GPS module via the ntp Generic GPS Receiver driver 20 ( https://www.eecis.udel.edu/~mills/ntp/html/drivers/driver20.html ) and the PPS signal via driver 22 ( https://www.eecis.udel.edu/~mills/ntp/html/drivers/driver22.html ).

You will need a Raspberry Pi 4B, the Uputronics Raspberry Pi+ GPS Expansion Board and a suitable GPS antenna.

This guide assumes that you’re using Raspbian Buster Lite or later. It should also work on Bullseye, as there's no longer a dependency on wiringPi. Download it and write it to an SD card.

See https://www.raspberrypi.org/documentation/installation/installing-images/windows.md

Create a file named ssh in the boot folder after burning the MicroSD card.

Attach the Uputronics Raspberry Pi+ GPS Expansion Board to the Pi, insert the SD card, connect the antenna and network cable and boot the Pi up. Either connect locally or via SSH to the Pi. If you can’t SSH in and don’t have a monitor, see:

https://www.raspberrypi.org/documentation/configuration/wireless/headless.md

Follow the instructions carefully: if you miss steps, things won’t work.

The Uputronics board has u-blox firmware 3.01 on it, dated 2016.

The week number rollover is set to 1867 (October 2015). All transmitted week numbers are mapped to the 1024-week (~19.6 year) period between week 1867 and week 2890 (mid-2035).
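GPS satellites broadcast the week number modulo 1024, so the receiver has to choose which 1024-week window to place it in. A minimal sketch of that mapping, assuming the window simply runs from the rollover week 1867 to 1867 + 1023 (this illustrates the idea and is not u-blox firmware code; resolve_week and week_start are hypothetical names):

```python
from datetime import datetime, timedelta, timezone

GPS_EPOCH = datetime(1980, 1, 6, tzinfo=timezone.utc)  # start of GPS week 0
ROLLOVER_WEEK = 1867  # first week of the window used by firmware 3.01


def resolve_week(broadcast_week, base_week=ROLLOVER_WEEK):
    """Map a 10-bit broadcast week (0-1023) onto base_week .. base_week + 1023."""
    if not 0 <= broadcast_week < 1024:
        raise ValueError("broadcast week must be in 0..1023")
    return base_week + (broadcast_week - base_week) % 1024


def week_start(full_week):
    """Date on which a full GPS week number begins (leap seconds ignored)."""
    return GPS_EPOCH + timedelta(weeks=full_week)


for broadcast in (1867 % 1024, 2890 % 1024):
    full = resolve_week(broadcast)
    print(broadcast, "->", full, week_start(full).date())
```

The second line printed is the last week of the window. One week later the broadcast value wraps back to 843, which this mapping reads as week 1867 again: that is the rollover the firmware setting pushes out to 2035.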

A Note About Accuracy

In theory, GPS-based time receivers can give a very high accuracy, with the PPS (Pulse per Second) signal being accurate to within 10ns.

However, fix data from gpsctl shows:


    
pi@ntp2:~ $ /usr/local/bin/gpsctl -Q fix
Time (UTC): 2019-04-18 15:31:27 (yyyy-mm-dd hh:mm:ss)
Latitude: 52.05994980 N
Longitude: 2.72698960 W
Altitude: 198.789 feet
Motion: 0.338 mph at 53.114 degrees heading
Satellites: 5 used for computing this fix
Accuracy: time (39 ns), height (+/-18.199 feet), position (+/-103.911 feet), heading(+/-8.230 degrees), speed(+/-0.132 mph)

Note the 39ns time accuracy from that fix. The more satellites, the better.

Further errors come from, among other things, the signal delay in the cable from the aerial to the GPS receiver (around 1.5ns per foot for typical coax).
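To put a number on the cable delay, here is a small sketch. It assumes a velocity factor of 0.66, which is typical of solid-polyethylene coax such as RG-58; check your own cable's datasheet. A figure like this is what goes into the Antenna cable delay setting in gpsctl.conf:

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s in vacuum
METRES_PER_FOOT = 0.3048


def cable_delay_ns(length_m, velocity_factor):
    """Propagation delay through a cable of the given length, in nanoseconds."""
    if not 0 < velocity_factor <= 1:
        raise ValueError("velocity factor must be in (0, 1]")
    return length_m / (SPEED_OF_LIGHT * velocity_factor) * 1e9


per_foot = cable_delay_ns(METRES_PER_FOOT, 0.66)
print("VF 0.66 coax: %.2f ns per foot" % per_foot)
print("10 m of it: %.1f ns" % cable_delay_ns(10, 0.66))
```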

So we’d be lucky to get 100ns accuracy from the PPS pulse.

Add to that the overhead of handling the PPS interrupt, processor clock jitter in the Raspberry Pi, and the overhead of transferring and decoding the NMEA sentences. The jitter on the PPS signal as seen by the Pi is less than 5 microseconds.

The output from the ntp NMEA driver without PPS correction can have a jitter of a few milliseconds, and an offset from real time which has to be tweaked manually to get within 5ms of the correct time.

NTP sources on the internet can show offsets and jitter of over 5ms, and sources on the local LAN 50µs or more.

On the Pi 4B we can expect an average jitter of less than 1 microsecond.

Required Components

I sourced my components from the Pi Hut unless otherwise noted:

1 Raspberry Pi Model 4B 1GB RAM

1 8GB or larger micro SD card

(a Transcend High Endurance 32GB micro SD card would be a better choice than a generic one for longevity)

1 Uputronics GPS Hat

1 Pi Hut Pi 4 GPS case

1 Raspberry Pi 4 power supply

1 GPS SMA Antenna

(optional) 1 SMA-male to TNC-female (or BNC, as needed) adaptor to connect to existing GPS aerial (from Amazon or eBay)

Prerequisite Settings

Log in as pi / raspberry, and immediately change the default password:

passwd

sudo raspi-config

Advanced Options

Expand filesystem

Interfacing Options

SSH -> Would you like the SSH server to be enabled – Yes (Recommended)

I2C -> Would you like the ARM I2C interface to be enabled? - Yes

Serial -> Login Shell (no) Hardware (yes) (Optional)

Quit but no need to reboot at this point.

Disable and stop the services that aren’t needed:

for i in systemd-timesyncd avahi-daemon alsa-state bluetooth triggerhappy hciuart rng-tools autologin getty; do
    sudo systemctl disable $i.service; sudo systemctl stop $i.service
done

sudo systemctl disable serial-getty@ttyAMA0.service

sudo systemctl mask serial-getty@ttyAMA0.service

sudo apt update

sudo apt full-upgrade

sudo apt install pps-tools ntp cpufrequtils i2c-tools

If no Wi-Fi is required, also remove:

sudo apt purge bluez bluez-firmware wpasupplicant

sudo nano /boot/config.txt and add at the bottom:

dtoverlay=disable-bt
dtoverlay=disable-wifi
dtoverlay=pps-gpio,gpiopin=18

Make sure that enable_uart=1 is set in /boot/config.txt

sudo nano /boot/cmdline.txt and add the following to the end of the existing line (cmdline.txt must stay on a single line):

nohz=off

sudo nano /etc/modules and add at the bottom:

pps-gpio
  

Save and quit nano.

Create a udev rule (needed for refclock 20 in ntp):

sudo nano /etc/udev/rules.d/80-gps-to-ntp.rules

# Change MODE of ttyAMA0 so it is readable by NTP and provide
# a symlink to /dev/gps0
KERNEL=="ttyAMA0", SYMLINK+="gps0", MODE="0666"
# Symlink /dev/pps0 to /dev/gpspps0
KERNEL=="pps0", SYMLINK+="gpspps0", MODE="0666"

Then reboot:

sudo reboot


Enabling Galileo Satellites and Setting Stationary Mode

Stationary mode gives a faster GPS fix.

See Tom Dilatush’s gpsctl:

http://www.jamulblog.com/2017/11/paradise-ponders-gpsctl-functionally.html

I’ve customised it further to enable the use of a config file and put it in my GitHub repo:

https://github.com/philrandal/gpsctl

Download and build gpsctl.

cd ~
wget https://github.com/philrandal/gpsctl/archive/master.zip
unzip master.zip
mv gpsctl-master gpsctl
cd gpsctl
./build.sh
sudo cp /home/pi/gpsctl/gpsctl /usr/local/bin
sudo cp /home/pi/gpsctl/etc/gpsctl.conf /etc/gpsctl.conf

Individual settings can be tweaked in /etc/gpsctl.conf.

To set port speed to 115200 baud, enable Galileo satellites, set stationary mode, tweak antenna delays, PPS timing, etc:

/usr/local/bin/gpsctl -a -B 115200 --configure_for_timing -vv

To reset the device to its defaults:

/usr/local/bin/gpsctl -a --reset -vv

To view info:

/usr/local/bin/gpsctl -a -Q satellites

/usr/local/bin/gpsctl -a -Q config

/usr/local/bin/gpsctl -a -Q fix

Note that these commands can only be run when gpsd or ntp is not using /dev/ttyAMA0.

Example /etc/gpsctl.conf which configures the U-blox M8Q for this environment:

    
# example gpsctl.conf which enables Galileo, as with the --galileo parameter
#
[gpsctl]
port = /dev/serial0
# sync method: ASCII = 1, NMEA = 2, UBX = 3
sync method = 3
verbosity = 0
[NMEA]
enabled = true
version = 41
GGA = off
GLL = off
GSA = off
GSV = off
RMC = on
VTG = off
GRS = off
GST = off
ZDA = on
[GPS]
enabled = yes
minimum channels=8
maximum channels=16
[SBAS]
enabled = no
minimum channels=1
maximum channels=3
[Galileo]
enabled = yes
minimum channels=4
maximum channels=8
[Beidou]
enabled = no
minimum channels=8
maximum channels=16
[IMES]
enabled = no
minimum channels=0
maximum channels=8
[QZSS]
enabled = no
minimum channels=0
maximum channels=3
[GLONASS]
enabled = yes
minimum channels=8
maximum channels=14
[Navigation Engine]
# Dynamic model: Portable = 0, Stationary = 2, Pedestrian = 3, Automotive = 4,
# Sea = 5, Air1G = 6, Air2G = 7, Air4G = 8, Watch = 9
Dynamic model = 2
# Fix mode: 2D only = 1, 3D only = 2, auto 2D/3D = 3
Fix mode = 3
Fixed altitude (2D) = 0.00 meters
Fixed altitude variance (2D) = 1.0000 meters^2
Minimum elevation = 5 degrees
Position DoP mask = 10.0
Time DoP mask = 10.0
Position accuracy mask = 100 meters
Time accuracy mask = 300 meters
Static hold threshold = 0 cm/s
Dynamic GNSS timeout = 60 seconds
Threshold above C/No = 0 satellites
C/No threshold = 0 dBHz
Static hold max distance = 0 meters
# UTC Standard: AutoUTC = 0, USNO_UTC = 3, GLONASS_UTC = 6, BeiDou_UTC = 7
UTC standard = 3
[Time Pulse]
# the nanoseconds / microseconds after the numbers are just reminders,
# they don't mean anything to the config parser
Antenna cable delay = 56 nanoseconds
RF group delay = 20 ns
Unlocked pulse period = 1000000 microseconds
Unlocked pulse length = 0
Locked pulse period = 1000000 microseconds
Locked pulse length = 500000 microseconds
User configurable delay = 0
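Since gpsctl.conf looks INI-like, a few of its settings can be sanity-checked with Python's configparser before you push them to the receiver. This is a sketch under that assumption (check_gpsctl_conf is a hypothetical helper, not part of gpsctl). It checks the BeiDou/GLONASS restriction described in the July 2019 postscript, that RMC and ZDA output is on for ntp, and that the stationary dynamic model is selected:

```python
import configparser


def check_gpsctl_conf(text):
    """Return a list of problems in gpsctl.conf-style text (empty if none)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)

    def flag(section, option):
        # getboolean accepts yes/no, on/off, true/false and 1/0;
        # a missing section or option counts as off
        return parser.getboolean(section, option, fallback=False)

    problems = []
    if flag("Beidou", "enabled") and flag("GLONASS", "enabled"):
        problems.append("BeiDou and GLONASS cannot both be enabled")
    for sentence in ("RMC", "ZDA"):
        if not flag("NMEA", sentence):
            problems.append("NMEA %s output must be on for ntp" % sentence)
    if parser.getint("Navigation Engine", "Dynamic model", fallback=None) != 2:
        problems.append("Dynamic model is not 2 (Stationary)")
    return problems


SAMPLE = """
[NMEA]
RMC = on
ZDA = on
[GLONASS]
enabled = yes
[Beidou]
enabled = yes
[Navigation Engine]
Dynamic model = 2
"""
print(check_gpsctl_conf(SAMPLE))
```

To check the real file, pass it the contents of /etc/gpsctl.conf.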

To run gpsctl at system startup, either copy the supplied unit file:

sudo cp /home/pi/gpsctl/etc/systemd/system/ublox-init.service /etc/systemd/system/ublox-init.service

or create it yourself:

sudo nano /etc/systemd/system/ublox-init.service

and add the following contents:

[Unit]
Description=u-blox initialisation
Before=gpsd.service
Before=ntp.service
[Service]
Type=oneshot
Environment="PARAMS=-q -a -B 115200 --configure_for_timing"
EnvironmentFile=-/etc/default/gpsctl
ExecStart=/usr/local/bin/gpsctl $PARAMS
[Install]
WantedBy=multi-user.target

    

This will enable Galileo satellites, set stationary mode, configure comms at 115200 baud, and restrict NMEA output to RMC and ZDA records.

To use the default baud rate of 9600 instead, create /etc/default/gpsctl with the contents

PARAMS=-q --configure_for_timing

and use mode 25 rather than mode 89 for the NMEA driver in ntp.conf.

Then

sudo systemctl daemon-reload

sudo systemctl enable ublox-init.service


    

Verifying that PPS Is Working

Ensure the GPS has a lock and the green PPS LED on the Uputronics Pi+ GPS Expansion Board is blinking once a second.

lsmod | grep pps

Output should be similar to :

pps_gpio                3089  1
pps_core                8606  4 pps_gpio

dmesg | grep pps

Output should be similar to :

[    2.735586] pps_core: LinuxPPS API ver. 1 registered
[ 2.738121] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it>
[ 2.763842] pps pps0: new PPS source pps@12.-1
[ 2.766361] pps pps0: Registered IRQ 169 as PPS source

    

This indicates that the PPS Module is loaded.

sudo ppstest /dev/pps0

Output should be similar to:

trying PPS source "/dev/pps0"
found PPS source "/dev/pps0"
ok, found 1 source(s), now start fetching data...
source 0 - assert 1418933982.998042450, sequence: 970 - clear  0.000000000, sequence: 0
source 0 - assert 1418933983.998045441, sequence: 971 - clear  0.000000000, sequence: 0

    

(Press CTRL+C to quit). This indicates that the PPS Module is working.
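The assert timestamps from ppstest can also tell you how far apart consecutive pulses look to the Pi's (not yet disciplined) system clock. A minimal sketch, assuming the output format shown above; pps_intervals_ns is a hypothetical helper. It keeps integer nanoseconds because a float can't hold a ~1.4e9-second timestamp to nanosecond precision:

```python
import re

ASSERT_RE = re.compile(r"assert (\d+)\.(\d{9}), sequence: (\d+)")


def pps_intervals_ns(lines):
    """Nanoseconds between consecutive 'assert' timestamps in ppstest output."""
    stamps = []
    for line in lines:
        match = ASSERT_RE.search(line)
        if match:
            seconds, nanos, _sequence = (int(g) for g in match.groups())
            # Integer nanoseconds avoid float rounding at this magnitude
            stamps.append(seconds * 1_000_000_000 + nanos)
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


SAMPLE = [
    "source 0 - assert 1418933982.998042450, sequence: 970 - clear  0.000000000, sequence: 0",
    "source 0 - assert 1418933983.998045441, sequence: 971 - clear  0.000000000, sequence: 0",
]
deviations = [i - 1_000_000_000 for i in pps_intervals_ns(SAMPLE)]
print(deviations)  # [2991]
```

Here the interval measured by the system clock was 2991ns longer than one second. Collected over a few minutes, the spread of these deviations gives a rough feel for PPS jitter plus system clock error before ntpd takes over.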

Enabling PPS/ATOM Support in NTPD

The supplied version of NTPD on the Raspberry Pi in Raspbian Stretch 2018-11-13 and later supports PPS so there is no need to roll your own NTP.

You need to pick a few local NTP servers to use. The easiest way to do this is to pick your region:

https://support.ntp.org/bin/view/Servers/NTPPoolServers

Select your region to get a list of its country servers. For example, for the UK it’s uk.pool.ntp.org.

Type:

dig uk.pool.ntp.org

You will get four IP addresses back:

;; <<>> DiG 9.10.3-P4-Raspbian <<>> +answer uk.pool.ntp.org
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 51647
;; flags: qr rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
;; EDNS: version: 0, flags:; udp: 4000
;; QUESTION SECTION:
;uk.pool.ntp.org. IN A
;; ANSWER SECTION:
uk.pool.ntp.org. 24 IN A 193.150.34.2
uk.pool.ntp.org. 24 IN A 176.58.109.199
uk.pool.ntp.org. 24 IN A 195.195.221.100
uk.pool.ntp.org. 24 IN A 185.53.93.157


sudo nano /etc/ntp.conf

Add

# PPS Driver
server 127.127.22.0 minpoll 4 maxpoll 4
fudge 127.127.22.0 time1 +0.000000 flag3 0 refid PPS
#flag3 Controls the kernel PPS discipline: 0 for disable (default), 1 for enable.
#time1 PPS time offset
# use local clock at stratum 10 if no others available
tos orphan 10
tos mindist 0.002
# NMEA driver (/dev/gps0 and /dev/gpspps0)
# 115200 baud, RMC and ZDA messages
server 127.127.20.0 mode 89 minpoll 4 maxpoll 4 iburst prefer
fudge 127.127.20.0 flag1 0 flag2 0 flag3 0 time2 0.050 refid GPS stratum 2
#flag1 Disable PPS signal processing if 0 (default); enable PPS signal processing if 1.
#flag2 If PPS signal processing is enabled, capture the pulse on the rising edge if 0 (default); capture on the falling edge if 1.
#flag3 If PPS signal processing is enabled, use the ntpd clock discipline if 0 (default); use the kernel discipline if 1.
#time1 PPS time offset
#time2 NMEA time offset
#mode
# bit 0 - process $GPRMC (value = 1)
# bit 1 - process $GPGGA (value = 2)
# bit 2 - process $GPGLL (value = 4)
# bit 3 - process $GPZDA or $GPZDG (value = 8)
# bits 4/5/6 - select serial bitrate (0 for 4800 - the default, 16 for 9600, 32 for 19200, 48 for 38400, 64 for 57600, 80 for 115200)
# mode 25 = process only xxRMC and xxZDA NMEA records at 9600 baud
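The mode value is just the sum of the sentence bits and the baud-rate code listed in the comments above. A small sketch that computes it, so you can check the value for a different sentence set or baud rate (nmea_mode is a hypothetical helper; the numbers are the ones from those comments):

```python
# Sentence bits and baud-rate codes from the ntp.conf comments above
SENTENCE_BITS = {"RMC": 1, "GGA": 2, "GLL": 4, "ZDA": 8}
BAUD_CODES = {4800: 0, 9600: 16, 19200: 32, 38400: 48, 57600: 64, 115200: 80}


def nmea_mode(sentences, baud):
    """Compute the 'mode' value for ntp's NMEA refclock (driver 20)."""
    try:
        mode = BAUD_CODES[baud]
    except KeyError:
        raise ValueError("unsupported baud rate: %r" % baud) from None
    for name in set(sentences):
        mode |= SENTENCE_BITS[name]
    return mode


print(nmea_mode({"RMC", "ZDA"}, 115200))  # 89
print(nmea_mode({"RMC", "ZDA"}, 9600))    # 25
```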
  

(Note that I used “fudge 127.127.20.0 time2 0.050 …” to adjust the GPS time to agree with the other NTP servers. You can try higher or lower values to make the offset of your NMEA driver, relative to those other NTP servers, very small. We’ll cover that later.)

We want to make sure that the second reported by the NMEA driver is the second that the last PPS pulse referred to, so the NMEA time needs to be within a few hundred milliseconds of the correct time for NTP to do the right thing. This is why it’s desirable to mark another NTP server as “prefer” to help the NTP daemon get it right.

Note that we use NTP PPS discipline, not kernel PPS (which in my testing results in warnings from ntp that kernel PPS discipline isn’t supported).

Comment out all the pool lines.

Add the servers from the dig command (or servers of your choice), with prefer on the first one. These are examples only; don't all use these IP addresses:

server 176.58.109.199 iburst prefer
server 195.195.221.100 iburst
server 185.53.93.157 iburst
# By default, exchange time with everybody, but don't allow configuration.
restrict default kod nomodify notrap nopeer noquery
restrict -6 default kod nomodify notrap nopeer noquery
# Local users may interrogate the ntp server more closely.
restrict 127.0.0.1
restrict -6 ::1
restrict 10.0.0.0 mask 255.0.0.0
restrict 172.16.0.0 mask 255.240.0.0
restrict 192.168.0.0 mask 255.255.0.0
# Drift file etc.
driftfile /var/lib/ntp/ntp.drift

    

Note: you MUST add a preferred server, or PPS won’t work.

Save and close nano.

sudo systemctl restart ntp.service

After a few minutes run

ntpq -p

If the PPS line is flagged with o (oPPS(0) in ntpq -p output, o127.127.22.0 with ntpq -pn), the PPS source has been selected and everything is working.

pi@ntp2:~ $ ntpq -pn
remote refid st t when poll reach delay offset jitter
==============================================================================
o127.127.22.0 .PPS. 0 l 6 16 377 0.000 -0.002 0.001
*127.127.20.0 .GPS. 1 l 5 16 377 0.000 -0.323 1.052
+195.195.221.100 .GPS. 1 u 49 64 377 18.307 0.190 0.127
+193.150.34.2 85.199.214.101 2 u 56 64 377 7.953 0.067 0.114
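If you want to check this from a script rather than by eye, the first character of each ntpq line is the tally code. A minimal sketch, assuming the column layout shown above; tally_codes is a hypothetical helper, and the meanings are paraphrased from the ntpq documentation:

```python
# Tally codes from the first column of ntpq -p / -pn output
TALLY = {
    "*": "system peer",
    "o": "PPS peer",
    "+": "candidate",
    "#": "selected",
    "-": "outlier",
    ".": "excess",
    "x": "falseticker",
    " ": "rejected",
}


def tally_codes(ntpq_output):
    """Map each remote in ntpq -pn output to the meaning of its tally code."""
    result = {}
    for line in ntpq_output.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith(("remote", "=")):
            continue  # header or separator line
        fields = line[1:].split()
        if fields:
            result[fields[0]] = TALLY.get(line[0], "unknown")
    return result


SAMPLE = """remote refid st t when poll reach delay offset jitter
==============================================================================
o127.127.22.0 .PPS. 0 l 6 16 377 0.000 -0.002 0.001
*127.127.20.0 .GPS. 1 l 5 16 377 0.000 -0.323 1.052
+195.195.221.100 .GPS. 1 u 49 64 377 18.307 0.190 0.127
+193.150.34.2 85.199.214.101 2 u 56 64 377 7.953 0.067 0.114"""

for remote, meaning in tally_codes(SAMPLE).items():
    print(remote, meaning)
```

Fed the stdout of subprocess.run(["ntpq", "-pn"], capture_output=True, text=True), a check like this could raise an alert when nothing is flagged as the PPS peer.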
  

And then

pi@ntp2:~ $ ntpq -csysinfo
associd=0 status=0115 leap_none, sync_pps, 1 event, clock_sync,
system peer: PPS(0)
system peer mode: client
leap indicator: 00
stratum: 1
log2 precision: -21
root delay: 0.000
root dispersion: 2.015
reference ID: PPS
reference time: e0601d81.248b4b37 Tue, Apr 16 2019 10:23:13.142
system jitter: 0.000477
clock jitter: 0.003
clock wander: 0.001
broadcast delay: -50.000
symm. auth. delay: 0.000

If you aren’t seeing your settings in use, it’s possible that ntpd is picking up NTP server information via DHCP, which overrides your settings above. Fix this with:

sudo rm /etc/dhcp/dhclient-exit-hooks.d/ntp

sudo rm /var/lib/ntp/ntp.conf.dhcp

At this point you have an NTP server which uses an external time source and your local PPS signal to discipline it.

Note: Once you've configured ntp to use your GPS source, you'll need to stop the ntp service before running gpsctl, otherwise they'll both be trying to access ttyAMA0 at the same time.

GPS Offset Tuning

Your PPS time is going to be more accurate than NTP pool servers. Unless you have specialised equipment and local LAN GPS / DCF77 / MSF PPS sources to calibrate against, it’s not really possible to determine the appropriate PPS offset. It’s likely to be on the order of a few microseconds at most ( http://lists.ntp.org/pipermail/questions/2011-September/030338.html ).

The PPS is a precise signal with around 10 ns of jitter. On the other hand, the offset of the GPS serial data output (the NMEA driver) will generally vary much more than the PPS, because of the variables involved in sending data over an asynchronous serial port. However, this offset can be reduced somewhat by adjusting the time2 fudge parameter of the NMEA reference clock in ntp.conf.

The initial value used in our ntp.conf file is 43ms (0.043s). This means that the GPS ZDA packet arrives in the NTP daemon approximately 43ms after the PPS pulse corresponding to it. On our test device, we see a jitter of under 3ms on the GPS_NMEA clock.

We may need to adjust the time2 (offset) value for the GPS source in ntp.conf so that the NMEA time offset is small enough not to confuse NTP about which second the PPS pulse refers to.

The time2 option compensates for a constant error. The specified offset (in seconds) is applied to all samples produced by the reference clock. The default is 0.000s.

Start with these ntp.conf settings; they should get you close enough to get everything working properly:

sudo nano /etc/ntp.conf

#PPS
server 127.127.22.0 minpoll 4 maxpoll 4
fudge 127.127.22.0 time1 +0.000 flag3 0 refid PPS
tos mindist 0.002
#GPS (NMEA)
server 127.127.20.0 mode 89 minpoll 4 maxpoll 4 iburst prefer
fudge 127.127.20.0 flag1 0 flag3 0 time2 0.043 refid GPS stratum 2

Add these lines at the top of ntp.conf:

statsdir /var/log/ntpstats/
statistics peerstats
filegen peerstats file peerstats type day enable

    

This enables logging of the peer server statistics.

To calculate the GPS offset, we must stop ntpd from selecting the GPS source by adding noselect to the GPS server line in ntp.conf. We'll run the time server for a few hours and then compare the ntpq -p GPS offset with the average offset of the public time servers. For accurate tuning, use a bunch of known-good Stratum 1 servers in ntp.conf.

sudo nano /etc/ntp.conf

#PPS
server 127.127.22.0 minpoll 4 maxpoll 4
fudge 127.127.22.0 time1 +0.000 flag3 0 refid PPS
tos mindist 0.002
#GPS (NMEA)
server 127.127.20.0 mode 89 minpoll 4 maxpoll 4 iburst prefer noselect
fudge 127.127.20.0 flag1 0 flag3 0 time2 0.043 refid GPS stratum 2

sudo systemctl stop ntp.service

sudo rm -f /var/log/ntpstats/*

sudo systemctl start ntp.service

Let ntpd run for at least four hours. Periodically check progress with "ntpq -p" and wait until the offset has settled.

Look for the row with GPS_NMEA(0) and refid GPS. The offset values will probably differ from one query to the next. Note that the ntpq offsets are in milliseconds, but the peerstats file offsets and NTP’s time2 parameter are in seconds.

Calculate the average GPS offset (in seconds) using this script:

nano ~/nmeaoffset

#!/bin/sh
#
# Generate an estimate of your GPS's offset from a peerstats file
#
awk '
/127\.127\.20\.0/ { sum -= $5 ; cnt++; }
END { if (cnt > 0) printf("%.6f\n", sum / cnt); else print "no NMEA samples found"; }
' </var/log/ntpstats/peerstats



Then chmod +x ~/nmeaoffset

Run by typing

pi@ntp2:~ $ ./nmeaoffset

-0.001212
  

That’s within a few milliseconds: close enough, as PPS is going to do its magic to give us accuracy within a few microseconds.

Adjust the "time2" value for the GPS source of your ntp.conf by adding the average offset from above.
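The same calculation can be sketched in Python, which makes it easy to also apply the correction to time2 and test against the ±5ms target below. It assumes the standard peerstats layout (MJD, seconds, peer address, status, offset, delay, dispersion, jitter). The helper names are hypothetical, and unlike the awk script it matches the address field exactly rather than anywhere in the line:

```python
import statistics

NMEA_CLOCK = "127.127.20.0"


def peer_offsets(peerstats_lines, address=NMEA_CLOCK):
    """Offsets in seconds (field 5) logged for one peer address."""
    offsets = []
    for line in peerstats_lines:
        fields = line.split()
        # peerstats fields: MJD, seconds, address, status, offset, delay, ...
        if len(fields) >= 5 and fields[2] == address:
            offsets.append(float(fields[4]))
    return offsets


def suggest_time2(current_time2, offsets):
    """New time2 as the nmeaoffset script implies: add the negated mean offset."""
    if not offsets:
        raise ValueError("no samples for the NMEA clock")
    return round(current_time2 - statistics.mean(offsets), 6)


def settled(offsets, limit_s=0.005):
    """True once the mean offset is within +/- limit_s (5 ms by default)."""
    return abs(statistics.mean(offsets)) < limit_s


# Hypothetical log lines, not real measurements
SAMPLE = [
    "58591 100.000 127.127.20.0 9014 0.001000 0.000000 0.000100 0.001000",
    "58591 116.000 127.127.20.0 9014 0.001400 0.000000 0.000100 0.001000",
    "58591 116.500 127.127.22.0 9714 -0.000002 0.000000 0.000100 0.000001",
]
offsets = peer_offsets(SAMPLE)
print(suggest_time2(0.043, offsets), settled(offsets))
```

To use it on the live log, read /var/log/ntpstats/peerstats with readlines() and pass the lines to peer_offsets.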

sudo systemctl stop ntp.service

sudo rm -f /var/log/ntpstats/*

sudo systemctl start ntp.service

Repeat the procedure above until -5ms < offset < +5ms (or under 10ms, if that’s OK with you). When you’re done, remove the noselect from the server 127.127.20.0 line in ntp.conf.

If you decide to recalculate the average offset using the above procedures, wait at least another day or two.

Avoid unnecessarily changing the time2 value. A typical value for the Adafruit GPS board is +0.534 s when using its default 4800 baud interface, and for the HAB Supplies/Uputronics board it’s +0.043 s when using 115200 baud and the gpsctl --configure_for_timing tweaks.

To save wear and tear on the SD card, comment out the statistics line in /etc/ntp.conf when done.

Automatically updating GPS leap seconds

Leap second announcements are semi-annual: they are made no later than 1 June and 1 December of each year, to indicate what action (if any) is to be taken on 30th June and 31st December, respectively.

In Buster, the default ntp.conf points the leapfile directive at the tzdata-provided leap-seconds.list. That's fine, as long as you apply system updates regularly, either automatically via cron or manually.
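A rough way to check that the leap second file ntpd uses is still current: the sketch below parses the leap-seconds.list format (data lines of NTP-epoch seconds and TAI-UTC, plus a "#@" line holding the expiry time) and converts NTP timestamps to UTC. parse_leap_file is a hypothetical helper, and the sample values are for illustration only. On Debian-based systems the file is normally /usr/share/zoneinfo/leap-seconds.list, but check the leapfile line in your ntp.conf:

```python
from datetime import datetime, timezone

NTP_UNIX_OFFSET = 2_208_988_800  # seconds from 1900-01-01 to 1970-01-01


def ntp_to_utc(ntp_seconds):
    """Convert NTP-epoch seconds to an aware UTC datetime."""
    return datetime.fromtimestamp(ntp_seconds - NTP_UNIX_OFFSET, tz=timezone.utc)


def parse_leap_file(text):
    """Return (expiry, latest TAI-UTC offset, date that offset took effect)."""
    expiry = None
    latest = None
    for line in text.splitlines():
        if line.startswith("#@"):
            expiry = ntp_to_utc(int(line[2:].split()[0]))
        elif line.strip() and not line.startswith("#"):
            fields = line.split()
            latest = (int(fields[1]), ntp_to_utc(int(fields[0])))
    if expiry is None or latest is None:
        raise ValueError("not a leap-seconds.list file")
    return expiry, latest[0], latest[1]


SAMPLE = """#@\t3960057600
2272060800\t10\t# 1 Jan 1972
3692217600\t37\t# 1 Jan 2017
"""
expiry, tai_utc, since = parse_leap_file(SAMPLE)
print("expires", expiry.date(), "- TAI-UTC", tai_utc, "s since", since.date())
```

If the expiry date is in the past on your server, the tzdata package hasn't been updated recently and ntpd will be working from a stale file.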

Automatically updating Raspbian

To automatically update Raspbian on a schedule, follow these steps:

sudo apt install unattended-upgrades mailutils

sudo nano /etc/apt/apt.conf.d/50unattended-upgrades

Edit the Unattended-Upgrade::Origins-Pattern section to include only these sources (remove all others):

"origin=Raspbian,codename=${distro_codename},label=Raspbian";
"origin=Raspberry Pi Foundation,codename=${distro_codename},label=Raspberry Pi Foundation";

and add

Unattended-Upgrade::Automatic-Reboot "true";

Then

sudo nano /etc/apt/apt.conf.d/20auto-upgrades

so that it contains:

APT::Periodic::Update-Package-Lists "1";
APT::Periodic::Download-Upgradeable-Packages "1";
APT::Periodic::Unattended-Upgrade "1";
APT::Periodic::Verbose "1";
APT::Periodic::AutocleanInterval "7";


sudo dpkg-reconfigure --priority=low unattended-upgrades

Set the apt-daily-upgrade systemd timer to fire at 2AM daily:

sudo systemctl edit apt-daily-upgrade.timer

and add:

[Timer]
OnCalendar=
# OnCalendar=DayOfWeek Year-Month-Day Hour:Minute:Second
# see https://wiki.archlinux.org/index.php/Systemd/Timers
OnCalendar=02:00
RandomizedDelaySec=0

To test,

sudo unattended-upgrade -d -v --dry-run

Static IP and Hostname

To fix your LAN IP, amend /etc/dhcpcd.conf by adding the following lines (adjust to suit your environment):

# It is possible to fall back to a static IP if DHCP fails:
# define static profile
profile static_eth0
static ip_address=192.168.1.7/24
static routers=192.168.1.254
static domain_name_servers=8.8.8.8 8.8.4.4
# fallback to static profile on eth0
interface eth0
fallback static_eth0

    

This way DHCP will work if the box is plugged into a client switch port, but it falls back to the static IP when no DHCP server is available.

Great for testing the box on one’s desktop before going live.

Amend your hostname by editing /etc/hostname, then update /etc/hosts to match.

For example, if you call your machine ‘ntp’, change the 127.0.1.1 line in /etc/hosts to:

127.0.1.1 ntp
  

Further Reading

The original text this document was based on can be found at https://ava.upuaut.net/?p=951

Updated to include information from comments at https://blog.webernetz.net/ntp-server-via-gps-on-a-raspberry-pi/

David Taylor’s website https://satsignal.eu/ntp/Raspberry-Pi-NTP.html goes into much more detail about the process above, and covers graphing and remote monitoring.

Rob Robinette has a good write-up on Pi-based NTP servers

https://robrobinette.com/pi_GPS_PPS_Time_Server.htm

Whitham D. Reeve’s GpsNtp-Pi Installation and Operation Guide

http://www.reeve.com/RadioScience/Raspberry%20Pi/GpsNtp-Pi.htm

Rich Laager’s Raspberry Pi 3 Stratum 1 NTP Server has the best stuff on offsets

https://coderich.net/2016/11/21/raspberry-pi-3-stratum-1-ntp-server/

Tech Solvency’s similar setup

https://www.techsolvency.com/ntp/systems/tackleberry-uputronics/

Gary Miller’s GPSD Time Service HOWTO

http://www.catb.org/gpsd/gpsd-time-service-howto.html

Jack Zimmerman’s Raspberry Pi NTP Server LCD Display

https://github.com/jacken/Raspberry-Pi-ntp-server-LCD-display

Jack Zimmerman’s Pi U-Blox Stationary Mode

https://github.com/jacken/Raspberry-Pi-U-Blox-Stationary-Mode

Tom Dilatush’s gpsctl

http://www.jamulblog.com/2017/11/paradise-ponders-gpsctl-functionally.html

My improved gpsctl, with added .conf file support (GitHub repo)

https://github.com/philrandal/gpsctl

u-blox M8 Receiver Description (the bible)

https://www.u-blox.com/sites/default/files/products/documents/u-blox8-M8_ReceiverDescrProtSpec_%28UBX-13003221%29_Public.pdf

GNSS Firmware 3.01 for u-blox M8

https://www.u-blox.com/sites/default/files/GNSS-FW3.01_ReleaseNotes_%28UBX-16000319%29_Public.pdf

Postscript, July 15th, 2019

You can enable either BeiDou or GLONASS satellites with gpsctl, but not both; you'll get an error if you try. Section 4.2 of the u-blox M8 Receiver Description manual hints at why.

From version 1.5 of gpsctl, enabling either BeiDou or Galileo automatically enables NMEA 4.1.

Postscript, January 30th, 2020

Updated for the Raspberry Pi 4, used cpufrequtils to set processor speed, added nohz=off, fixed numerous typos, added automatic updating howto, and added a note about making sure that ntpd is not running when you run gpsctl.

Postscript, June 21st, 2020

Released gpsctl v1.9. Tried to fix an issue with gpsctl not initialising the u-blox module via systemd at startup. Still not convinced that it is right.

Added configure-rv3028.sh script to properly configure the rv3028 RTC chip on the new (June 2020) Uputronics boards.

Without this step I got lots of

    kernel: [  156.227408] rtc-rv3028 1-0052: Voltage low, data is invalid.

messages in /var/log/messages on boot.

To configure Raspberry Pi OS to use the rv3028,

sudo nano /boot/config.txt

Add the line

dtoverlay=i2c-rtc,rv3028

at the end.

Remove the fake hardware clock:

sudo apt-get -y remove fake-hwclock
sudo update-rc.d -f fake-hwclock remove
sudo systemctl disable fake-hwclock

Run sudo nano /lib/udev/hwclock-set and comment out these three lines:

#if [ -e /run/systemd/system ] ; then
# exit 0
#fi

Also comment out the two lines

/sbin/hwclock --rtc=$dev --systz --badyear

and

/sbin/hwclock --rtc=$dev --systz
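If you'd rather script those edits than make them by hand in nano, here's a minimal Python sketch. It assumes /lib/udev/hwclock-set has the usual Debian layout (an `if [ -e /run/systemd/system ] ; then ... fi` early-exit block plus the two `--systz` calls); `comment_out_hwclock_set` is a hypothetical helper, not part of gpsctl. It only comments out the `fi` that closes the systemd block, so the script's other if/fi pairs stay intact.

```python
def comment_out_hwclock_set(text: str) -> str:
    """Comment out the systemd early-exit block and the two --systz calls.

    Assumption: the file follows the stock Debian /lib/udev/hwclock-set layout.
    Lines that are already commented out are left alone, so it is safe to rerun.
    """
    targets = {
        "/sbin/hwclock --rtc=$dev --systz --badyear",
        "/sbin/hwclock --rtc=$dev --systz",
    }
    out = []
    in_systemd_block = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("if [ -e /run/systemd/system ]"):
            in_systemd_block = True
        if in_systemd_block or stripped in targets:
            out.append("#" + line)
        else:
            out.append(line)
        # The first "fi" after the systemd test closes that block only.
        if in_systemd_block and stripped == "fi":
            in_systemd_block = False
    return "\n".join(out) + "\n"
```

To use it, read the file, pass its contents through the function, and write the result back as root, keeping a copy of the original first.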

Configure the rv3028 by running

sudo configure-rv3028.sh

from the gpsctl archive.

Then,

sudo hwclock -w -v

to set the clock for the first time.

To debug, run sudo systemctl disable ntp, then sudo reboot.

After the reboot, sudo hwclock -r -v should show the retained time.

Then sudo systemctl enable ntp --now to enable and start ntp.

Postscript, October 17th, 2020

Due to the repeated unreliability of gpsctl's automatic baud rate synchronisation, I've reverted to using 9600 baud and not messing with the baud rate at all. The latest version in the GitHub master branch allows you to specify the gpsctl parameters for ublox-init.service in /etc/default/gpsctl, e.g.:

PARAMS=-q --configure-for-timing

If /etc/default/gpsctl doesn't exist, the behaviour will be as before.

Any edit to ublox-init.service or /etc/default/gpsctl requires you to run sudo systemctl daemon-reload.



Posted by Phil at 7:12 PM
Edited on: Wednesday, October 13, 2021 7:21 PM
Categories: IT, Raspberry Pi, Software

Monday, April 29, 2013

Migrating Nagios Configuration from Nagmin to Check_MK's WATO


When I first set up a Nagios server, many years ago, in the days of Nagios 1.x, the best configuration tool I could find was Fred Reimers' Nagmin. That has since turned into abandonware, but there is a fork, NagminV, under development.

I'd patched Nagmin to support Nagios 2.x and 3.x, and added a few fields to its database, but it was still buggy and quirky.

So, after I'd installed Mathias Kettner's Check_MK for its livestatus broker module for use with PNP4Nagios, I started investigating Check_MK's broader features.

What soon caught my eye was Check_MK's use of rulesets based on host tags. The temptation to edit text files (Python scripts in disguise) was too great for me, so I started converting my Nagios service checks into Check_MK format.

These are very rough notes; proceed with caution. Back up everything first!!!

The first thing to do was to get Check_MK's agent onto all our hosts.

Then I listed all of them in /etc/check_mk/main.mk:

# don't generate host config yet
# comment this out when Nagmin is decommissioned
generate_hostconf = False
# host names are Python strings, so they must be quoted
all_hosts = [
'host1',
'host2',
]

Then I added the obvious tags (win, linux, etc.) and created config files for legacy checks in /etc/check_mk/conf.d, until eventually all service checks were defined in Check_MK rather than in Nagmin.

After each check was migrated to Check_MK, I'd run

check_mk -U
nagios -v /etc/nagios/nagios.cfg

The first command generates a new Nagios config in /etc/nagios/check_mk.d/check_mk_objects.cfg

The second validates the resulting config. Here I'd find duplicate definitions, reminding me to delete them from Nagmin.

So, after a while plodding away at this - in my case this meant over a year of coexistence - all that was left in Nagmin was hosts, host groups, contacts, contact groups, and timeperiods.

Time to abandon Nagmin and get serious.

Do not, under any circumstances, try to generate a Nagios config using Nagmin again. It will always fail!

Comment out the Command.cfg include in nagios.cfg:

#cfg_file=/etc/nagios/Command.cfg

Try validating the Nagios config again with nagios -v. If it fails, you've forgotten to define some commands as legacy checks.

Rinse and repeat until you've got it right.

Check the contents of /etc/nagios/Services.cfg - if it contains any service definitions, you've forgotten something.
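A quick way to tell whether a Nagios object file still defines anything is to count its define blocks. A minimal Python sketch, assuming the usual `define <type> {` syntax; `count_definitions` is a hypothetical helper of mine, and lines commented out with # are ignored:

```python
import re

# Matches "define service {" (or "define service{") at the start of a line,
# so "#define ..." and "# define ..." comment lines are not counted.
DEFINE_RE = re.compile(r"^\s*define\s+(\w+)\s*\{", re.MULTILINE)


def count_definitions(cfg_text: str, object_type: str = "service") -> int:
    """Count the 'define <object_type> {' blocks in a Nagios .cfg file's text."""
    return sum(1 for m in DEFINE_RE.finditer(cfg_text) if m.group(1) == object_type)


sample = """
# define service { this one is commented out }
define service {
    host_name           host1
    service_description PING
}
define host{
    host_name host1
}
"""
print(count_definitions(sample))          # prints 1
print(count_definitions(sample, "host"))  # prints 1
```

Run against the text of /etc/nagios/Services.cfg, anything other than 0 means some service definitions still need migrating; the same check works for the other .cfg files with a different object_type.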

Do the same with all the service-related .cfg files; if things fail, change your config to use Check_MK's default service templates.

Get your custom time periods into WATO (Check_MK has a default, hidden time period 24X7; Nagmin's is 24x7, and case matters), then comment out TimePeriods.cfg in nagios.cfg and revalidate.

And so on till you're left only with hosts, host groups, contacts, and contact groups from the original Nagmin.

So now was the time to take the leap and use Check_MK's WATO to configure Nagios.

I created contacts and contact groups manually in WATO. There's no conflict with your existing Nagios config until you associate a contact group with hosts and/or services in WATO. That was the last thing I did.

Now comes the fun bit.

We need to import our hosts into WATO.

Run the attached hosts.py, which generates a list of hosts in the format hostname;alias;parents;ipaddress:

python hosts.py >wato.csv

Our import script requires a file with lines of the form

wato folder;hostname;alias;parents;ipaddress;tags

where tags is a list of check_mk tags separated by |

So we'll have to edit it manually. I didn't use WATO folders, so I just preceded each line with a ;
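Adding the empty folder field and the tags is easy to script. Here's a minimal Python sketch, assuming each line from hosts.py has exactly the four fields shown above and that you supply each host's tags yourself; `to_import_line` is a hypothetical helper, not part of my_wato_import.py, and the tag names in the example are only illustrations.

```python
def to_import_line(hosts_line: str, tags: list, folder: str = "") -> str:
    """Convert 'hostname;alias;parents;ipaddress' (hosts.py output) into
    'wato folder;hostname;alias;parents;ipaddress;tags', tags joined by '|'.

    An empty folder gives the leading ';' used when WATO folders aren't used.
    """
    fields = hosts_line.rstrip("\n").split(";")
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {hosts_line!r}")
    return ";".join([folder, *fields, "|".join(tags)])


print(to_import_line("host1;Host One;;192.168.1.10\n", ["cmk-agent", "prod", "lan", "linux"]))
# prints ;host1;Host One;;192.168.1.10;cmk-agent|prod|lan|linux
```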

Tags are the fun bits. By now you may already be using them in your Check_MK config. WATO comes with some predefined ones, which we need to list in our import file (wato.csv).

You'll need one each of the 'agent', 'criticality', and 'networking' tags, as well as your custom tags, which should also be entered into WATO.

Download my_wato_import.py, edit it to add your tag definitions to tagz, and run:

python my_wato_import.py wato.csv

and watch the output scroll by. If there are any errors reported, then you've forgotten to add a tag definition to tagz, or misspelt a tag in wato.csv. Rinse and repeat until all's well.
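The kind of sanity check involved can be sketched in a few lines of Python. This is not my_wato_import.py's actual code, just an illustration under assumptions: TAG_GROUPS lists what I believe are WATO's stock tag IDs for those three groups (check them against your own WATO host tag settings), and CUSTOM_TAGS stands in for your own tags. It reports unknown (e.g. misspelt) tags, and hosts without exactly one tag from each required group.

```python
# Assumed stock WATO tag groups: verify against your own host tag configuration.
TAG_GROUPS = {
    "agent": {"cmk-agent", "snmp-only", "snmp-v1", "snmp-tcp", "ping"},
    "criticality": {"prod", "critical", "test", "offline"},
    "networking": {"lan", "wan", "dmz"},
}
CUSTOM_TAGS = {"linux", "win"}  # hypothetical custom tags


def check_line(line: str) -> list:
    """Return a list of problems found in one wato.csv line."""
    fields = line.rstrip("\n").split(";")
    if len(fields) != 6:
        return [f"expected 6 fields, got {len(fields)}"]
    hostname = fields[1]
    tags = [t for t in fields[5].split("|") if t]
    known = set(CUSTOM_TAGS).union(*TAG_GROUPS.values())
    problems = [f"{hostname}: unknown tag {t!r}" for t in tags if t not in known]
    for group, members in TAG_GROUPS.items():
        n = sum(1 for t in tags if t in members)
        if n != 1:
            problems.append(f"{hostname}: needs exactly one {group} tag, found {n}")
    return problems


print(check_line(";host2;;;10.0.0.2;cmk-agent|prod|linx"))
```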

Set testing = False near the top of the script, and run again.

Comment out generate_hostconf = False in your main.mk, and run check_mk -U.

Comment out Host.cfg in /etc/nagios/nagios.cfg

Validate your nagios configuration. It should be OK.

A look in WATO will show all your hosts, with appropriate tags.

Hidden away in /usr/share/doc/check_mk/treasures is a script, wato_host_svc_groups.py.

I ran this against my original Nagmin Hosts.cfg file, which produced output that was easily massaged into the form needed for WATO. I could have amended the script to produce output of the following form, but a few regular-expression search-and-replaces got me there quickly enough.

host_groups = [
( 'group1', [], ['server1', 'server2']),
( 'group2', [], ['server3', 'server4']),
]

Place that code into /etc/check_mk/conf.d/wato/rules.mk and create the host groups (with descriptions) in WATO. Before applying changes in WATO, run check_mk -U
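For the record, here's roughly what those search-and-replaces amounted to, written as a small Python sketch rather than a modified wato_host_svc_groups.py. It assumes plain Nagios `define hostgroup { ... }` blocks with a hostgroup_name and a comma-separated members line; anything else (hostgroup_members, templates) is ignored, and `parse_hostgroups` and `render_host_groups` are hypothetical helpers.

```python
import re

HOSTGROUP_RE = re.compile(r"define\s+hostgroup\s*\{(.*?)\}", re.DOTALL)


def parse_hostgroups(cfg_text):
    """Return {hostgroup_name: [members]} from Nagios 'define hostgroup' blocks."""
    # Drop whole-line '#' comments first so commented-out blocks aren't matched.
    cleaned = "\n".join(
        line for line in cfg_text.splitlines() if not line.lstrip().startswith("#")
    )
    groups = {}
    for body in HOSTGROUP_RE.findall(cleaned):
        attrs = {}
        for line in body.splitlines():
            line = line.split(";", 1)[0].strip()  # drop inline ';' comments
            parts = line.split(None, 1)
            if len(parts) == 2:
                attrs[parts[0]] = parts[1].strip()
        name = attrs.get("hostgroup_name")
        if name:
            members = [m.strip() for m in attrs.get("members", "").split(",") if m.strip()]
            groups[name] = members
    return groups


def render_host_groups(groups):
    """Render the groups as a Check_MK host_groups list of (group, [], [hosts])."""
    lines = ["host_groups = ["]
    for name, members in sorted(groups.items()):
        lines.append(f"( {name!r}, [], {members!r}),")
    lines.append("]")
    return "\n".join(lines)


sample_cfg = """define hostgroup {
    hostgroup_name  group1
    members         server1, server2
}
"""
print(render_host_groups(parse_hostgroups(sample_cfg)))
```

The rendered text is itself valid Python, so it can be pasted straight into rules.mk as above.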

Comment out the HostGroup.cfg include in nagios.cfg, and validate the config once more.

#cfg_file=/etc/nagios/HostGroup.cfg

Now, if you've done everything properly, the Nagios config validation will succeed, and on a restart of Nagios your host groups will be there as before.

That just leaves Contacts, Contact Groups and notifications.

I'll leave that as an exercise for the reader. Hint: don't try any of the above until you've figured out how to apply contact groups to hosts and services.

And you'll also need to adjust host and service check intervals and retries in WATO too, otherwise everything gets polled every minute, which probably isn't what you want.



Posted by Phil at 9:59 PM
Edited on: Wednesday, April 01, 2015 9:24 PM
Categories: IT, Software

Friday, May 18, 2012

Disabling Forefront for Exchange 2010 when Installing Exchange Service Packs and Hotfix Rollups


Belatedly installing Microsoft Exchange 2010 Service Pack 2 Hotfix Rollup 2 this week, I once again was niggled by the need to manually disable Forefront for Exchange first.

Unknown to me, the Microsoft Exchange designers included the right hooks into the product to make this easy.

A quick web search led to an Exchange Team blog post from back in June 2010, entitled "Sample script to disable and enable Forefront service during patching".

Unfortunately, their sample script leaves a lot to be desired, and isn't general enough to be useful everywhere.

So I've tweaked it into a more sensible form, which you can download here.

It should be placed in <Exchange installation folder>\Scripts\Customization

On my Exchange 2010 systems, that's C:\Program Files\Microsoft\Exchange Server\V14\Scripts\Customization

Create the Customization directory if it does not already exist.

Fixes:

1: If Forefront for Exchange isn't installed, do nothing.

2: If one of the listed services is not present on the box the script is run on, treat it as successfully started/stopped and continue.

3: Wait until the service is successfully stopped/started, or for 3 minutes, whichever happens first (easily modified to suit your environment and experiences). Based on a code snippet by andreister on StackOverflow.
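The script itself is PowerShell, but the waiting logic in fixes 2 and 3 is simple enough to sketch in Python. This is an illustration, not a port of the script: `get_status` is a hypothetical callable returning the service's current state, or None if the service isn't installed on that box (treated as success, as in fix 2), and the 180-second default mirrors the 3-minute limit.

```python
import time


def wait_for_status(get_status, wanted, timeout=180.0, interval=1.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll get_status() until it reports `wanted` or `timeout` seconds elapse.

    Returns True on success (including a missing service, reported as None),
    False if the timeout expires first. `clock` and `sleep` can be replaced
    so the loop can be tested without actually waiting.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status is None or status == wanted:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```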

The end result is that the same copy of the script can be deployed to all Exchange 2010 servers in your organisation, and it just does the right thing.

Enjoy.

Postscript, May 30th

There's a bug in the original which rendered the script ineffective. The path to the FSCController executable, retrieved in line 62 of the script, is enclosed in double quotes. These need to be stripped off for the script to do the right thing.

Adding

    $imagePath = $imagePath -replace '"(.*)"', '$1'

after line 65 strips the offending " characters. Fixed, and properly tested on an Exchange 2010 SP2 RU3 install.
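For anyone unfamiliar with PowerShell's -replace, here is the same substitution in Python: the greedy (.*) captures everything between the first and last double quote, and the replacement keeps just that capture. The path in the example is made up, not the real FSCController location.

```python
import re


def strip_quotes(image_path: str) -> str:
    """Python equivalent of: $imagePath -replace '"(.*)"', '$1'"""
    return re.sub(r'"(.*)"', r"\1", image_path)


print(strip_quotes(r'"C:\Program Files\Example\Controller.exe"'))
# prints C:\Program Files\Example\Controller.exe
```

An unquoted path passes through unchanged, so the line is safe to run whether or not the registry value is quoted.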



Posted by Phil at 11:20 PM
Edited on: Wednesday, April 01, 2015 10:52 PM
Categories: IT

Monday, May 14, 2012

[Updated] Compiling an RPMForge-compatible Nagios 3.5.0 RPM


At some point I had the brilliant idea of replacing our hand-compiled build of Nagios with RPMForge's RPM version.

All was well with that, but RPMForge is still stuck at version 3.2.3, and Nagios 3.4.1 has just been released.

There's one patch from Icinga which we really want, by Icinga core developer Michael Friedrich (@dnsmichi), which fixes perfdata issues (a regression in Nagios 3.3.1 which is still not fixed in 3.4.1):

re-allow perfdata with empty results being put on perfdata channel, disable via opt-in cfg option

Updated, Sept 5th, 2012:

Nagios users have reported memory leaks if embedded Perl is compiled in, even when not used, so I've removed it from the configuration in the nagios.spec file.

In this build, I also re-implement execv (bug 346) using my fixed regexp. Double quotes and escape characters are no longer an issue. This is based on the changes to checks.c in Icinga to implement execvp.

I've also fixed a problem with pagination in pages generated by a hostname search in the Nagios 3.4.1 status.cgi.

Fixed too is a sorting issue on paginated pages (bug 381).

Also included are cvelasco's patches for a problem with scheduled downtimes (bug 338) and some memory leaks in 3.4.1 (bug 339).

So, to build a Nagios 3.4.1 RPMForge-layout compatible RPM on CentOS 5.x, I did the following:

1: Download nagios-3.2.3-3.rf.src.rpm and install (you'll need the --nomd5 switch in rpm).

2: Download nagios-3.4.1.tar.gz into /usr/src/redhat/SOURCES

3: Download my perfdata.patch into /usr/src/redhat/SOURCES

4: Download my execv-v2.patch into /usr/src/redhat/SOURCES

5: Download my status.patch into /usr/src/redhat/SOURCES (this will be in Nagios 3.4.2)

6: Download my status-paginate.patch into /usr/src/redhat/SOURCES

7: Download the downfix-6.patch into /usr/src/redhat/SOURCES

8: Download the leaks1-2.patch into /usr/src/redhat/SOURCES

9: Download my nagios.spec into /usr/src/redhat/SPECS

10: rpmbuild -bb /usr/src/redhat/SPECS/nagios.spec

The built RPMs will be left in /usr/src/redhat/RPMS/i386.

They'll happily install over the old RPMForge Nagios 3.2.3 RPMs.

Enjoy.

Postscript, July 18th

The perfdata patch has been checked in to Nagios by the developers. Let's hope the execv one follows.

Postscript, November 13th

The forthcoming Nagios 3.4.3 includes all but the execv patch, so a slight mod to the spec file to change the version to 3.4.3 and include only that patch (or not, as you choose) is all that's needed to get 3.4.3 built in RPMForge-compatible style.

Postscript, May 8th, 2013

Now updated for Nagios 3.5.0. I've left the above instructions intact for historical reasons.

Apart from my execv patch, there are two additional patches, from the Open Monitoring Distribution team. Were I to build a Nagios server today, I'd use OMD.

The build instructions now are:

1: Download nagios-3.2.3-3.rf.src.rpm and install (you'll need the --nomd5 switch in rpm).

2: Download nagios-3.5.0.tar.gz into /usr/src/redhat/SOURCES

3: Download my execv-v2.patch into /usr/src/redhat/SOURCES

4: Download 0006-fix_f5_reload_bug.dif into /usr/src/redhat/SOURCES

This fixes an annoying screen refresh bug in the Nagios web interface.

5: Download 0007-fix_downtime_struct.dif into /usr/src/redhat/SOURCES

This reverts a Nagios API change which was incompatible with check_mk, which manifested itself as crashes at midnight during Nagios log rotation, and maybe at other times too.

6: Download my nagios.spec into /usr/src/redhat/SPECS

7: rpmbuild -bb /usr/src/redhat/SPECS/nagios.spec

The built RPMs will be left in /usr/src/redhat/RPMS/i386.

Enjoy.



Posted by Phil at 5:24 PM
Edited on: Wednesday, April 01, 2015 10:52 PM
Categories: IT

Saturday, March 10, 2012

One vCheck Plugin to Rule Them All


Alan Renouf has recently updated his fabulous vCheck PowerShell script to support a plugin architecture, with one (sometimes more) check per plugin. You can disable the plugins by renaming them manually, but that quickly becomes a hassle.

Here's my solution, Select-Plugins.ps1, a GUI picklist from which enabling/disabling plugins is no longer such a chore.

It requires vCheck 6.10 or later.

It can be copied into your vCheck directory and invoked from there, or copied into your Plugins directory and renamed to be the last plugin to run. vCheck has already loaded its list of plugins before any are run, so using it as the first plugin would not have the results you expect.

To go with it, there's a "Report on Plugins" Plugin too, which I've sent to Alan. It's not of much interest unless you're disabling plugins with Select-Plugins.ps1, or want a list of the plugins used in each run of vCheck.

Enjoy!

Postscript:

Updated to use a conditional expression to create 'new' filename. Much more elegant.

Postscript 2, March 19, 2012:

Updated to detect the situation where both pluginname.ps1 and pluginname.ps1.disabled exist, and to warn the user without deleting anything.


# Select-Plugins.ps1

# selectively enable / disable vCheck Plugins

# presents a list of plugins whose names match *.ps1 or *.ps1.disabled
# 
# disabled plugins will be renamed as appropriate to <pluginname>.ps1.disabled
# enabled plugins will be renamed as appropriate to <pluginname>.ps1

# To use, run from the vCheck directory
#     or, if you wish to be perverse, copy to the plugins directory and rename to 
#         "ZZ Select Plugins for Next Run.ps1" and run vCheck as normal.

# Great for testing plugins.  When done, untick it...

# If run as a plugin, it will affect the next vCheck run, not the current one,
#   as vCheck has already collected its list of plugins when it is invoked
#   so make it the very last plugin executed to avoid counter-intuitive behaviour

# based on code from Select-GraphicalFilteredObject.ps1 in
#  "Windows Powershell Cookbook" by Lee Holmes.
#  Copyright 2007 Lee Holmes.
#  Published by O'Reilly ISBN 978-0-596-528492
# and used under the 'free use' provisions specified on Preface page xxv

$Title = "Plugin Selection Plugin"
$Author = "Phil Randal"
$PluginVersion = 2.0
$Header =  "Plugin Selection"
$Comments = "Plugin Selection"
$Display = "None"
# Start of Settings
# End of Settings

$PluginPath = (Split-Path ((Get-Variable MyInvocation).Value).MyCommand.Path)
If ($PluginPath -notmatch 'plugins$') {
    $PluginPath += "\Plugins"
}
$plugins=get-childitem -Path $PluginPath | where {$_.name -match '.*\.ps1(?:\.disabled|)$'} |
    Sort Name |
    Select Name,
        @{Label="Plugin";expression={$_.Name -replace '(.*)\.ps1(?:\.disabled|)$', '$1'}},
        @{Label="Enabled";expression={$_.Name -notmatch '.*\.disabled$'}}

## Load the Windows Forms assembly
[void] [Reflection.Assembly]::LoadWithPartialName("System.Windows.Forms")

## Create the main form
$form = New-Object Windows.Forms.Form
$form.Size = New-Object Drawing.Size @(600,600)

## Create the listbox to hold the items from the pipeline
$listbox = New-Object Windows.Forms.CheckedListBox
$listbox.CheckOnClick = $true
$listbox.Dock = "Fill"
$form.Text = "Select the plugins you wish to enable"

# create list box items from plugin list, tick as enabled where appropriate
ForEach ($plugin in $Plugins) {
    $i=$listBox.Items.Add($plugin.Plugin)
    $listbox.SetItemChecked($i, $Plugin.Enabled)
}

## Create the button panel to hold the OK and Cancel buttons
$buttonPanel = New-Object Windows.Forms.Panel
$buttonPanel.Size = New-Object Drawing.Size @(600,30)
$buttonPanel.Dock = "Bottom"

## Create the Cancel button, which will anchor to the bottom right
$cancelButton = New-Object Windows.Forms.Button
$cancelButton.Text = "Cancel"
$cancelButton.DialogResult = "Cancel"
$cancelButton.Top = $buttonPanel.Height - $cancelButton.Height - 5
$cancelButton.Left = $buttonPanel.Width - $cancelButton.Width - 10
$cancelButton.Anchor = "Right"

## Create the OK button, which will anchor to the left of Cancel
$okButton = New-Object Windows.Forms.Button
$okButton.Text = "Ok"
$okButton.DialogResult = "Ok"
$okButton.Top = $cancelButton.Top
$okButton.Left = $cancelButton.Left - $okButton.Width - 5
$okButton.Anchor = "Right"

## Add the buttons to the button panel
$buttonPanel.Controls.Add($okButton)
$buttonPanel.Controls.Add($cancelButton)

## Add the button panel and list box to the form, and also set
## the actions for the buttons
$form.Controls.Add($listBox)
$form.Controls.Add($buttonPanel)
$form.AcceptButton = $okButton
$form.CancelButton = $cancelButton
$form.Add_Shown( { $form.Activate() } )

## Show the form, and wait for the response
$result = $form.ShowDialog()

## If they pressed OK (or Enter,)
## enumerate list of plugins and rename those whose status has changed
if($result -eq "OK") {
    $i = 0
    ForEach ($plugin in $plugins) {
        $oldname = $plugin.Name
        $newname = $plugin.Plugin + $(If ($listbox.GetItemChecked($i)) {'.ps1'} else {'.ps1.disabled'})
        If ($newname -ne $oldname) {
            If (Test-Path ($PluginPath + "\" + $newname)) {
                Write-Host "Attempting to rename ""$oldname"" to ""$newname"", which already exists - please delete or rename the superfluous file and try again"
            } Else {
                Rename-Item ($PluginPath + "\" + $oldname) $newname
            }
        }
        $i++
    }
}
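Stripped of the GUI, the renaming rule in the script is small. Here it is as a Python sketch, for illustration only: `plan_renames` is a hypothetical function that works on a list of file names rather than the real Plugins directory, and, like the script's -match and -ne, it compares names case-insensitively. Plugins missing from the `enabled` mapping keep their current state.

```python
import re

# "<plugin>.ps1" or "<plugin>.ps1.disabled", case-insensitive like PowerShell's -match
PLUGIN_RE = re.compile(r"^(.*)\.ps1(\.disabled)?$", re.IGNORECASE)


def plan_renames(file_names, enabled):
    """Work out which plugin files to rename.

    file_names: names found in the Plugins directory.
    enabled: {plugin base name: True/False} as ticked in the picklist.
    Returns (renames, conflicts), each a list of (old_name, new_name) pairs.
    A conflict means the target name already exists, so nothing is renamed.
    """
    existing = {n.lower() for n in file_names}
    renames, conflicts = [], []
    for old in sorted(file_names):
        m = PLUGIN_RE.match(old)
        if not m:
            continue  # not a plugin file
        plugin, currently_enabled = m.group(1), m.group(2) is None
        want = enabled.get(plugin, currently_enabled)
        new = plugin + (".ps1" if want else ".ps1.disabled")
        if new.lower() == old.lower():
            continue  # state unchanged
        (conflicts if new.lower() in existing else renames).append((old, new))
    return renames, conflicts
```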



Posted by Phil at 4:32 PM
Edited on: Wednesday, April 01, 2015 10:51 PM
Categories: IT

Monday, April 25, 2011

Enhanced check_esxi_hardware.py for Nagios and pnp4nagios


Having spent a bit of time implementing Trond Hasle Amundsen's wonderful check_openmanage plugin for Nagios to monitor the Dell Windows and Linux servers at work, I came to wonder whether the same was possible for our VMware ESXi boxes. I was monitoring them with the check_esxi_hardware.py plugin, maintained by Claudio Kuenzler. That, unfortunately, didn't collect performance data and lacked the clever HTML links to Dell documentation found in check_openmanage.

So, I got to work, emulating some of check_openmanage's features.

The features I collect performance data for are those found on our ESXi boxes, Dell M600, R815, and R905 models.

M600

Power consumption

System board ambient temperature

R815

Power consumption

System board fan speeds

System board ambient temperature

System Internal Expansion Board 1 IO1 Planar Temp

System Internal Expansion Board 1 IO2 Planar Temp

Power supply voltages and currents

R905

Power consumption

System board fan speeds

System board ambient temperature

Power supply voltages and currents

I've also created a check_esxi_hardware.php template for pnp4nagios.

They're here in human-readable form:

check_esxi_hardware.py.html

check_esxi_hardware.php.html

Or download check_esxi_hardware.zip

check_esxi_hardware.py (not formatted as html)

check_esxi_hardware.php (not formatted as html)

Update, April 28th:

Now includes:

Indentation of the verbose output

Support for the HP Proliant BL460c, and, drum roll....

Proper parameter handling, which gracefully fails back to the original commandline format:

  usage: check_esxi_hardware.py https://hostname user password system [verbose]
  example: check_esxi_hardware.py https://my-shiny-new-vmware-server root fakepassword dell
or, using new style options:
  usage: check_esxi_hardware.py -H hostname -U username -P password [-V system -v -p -I XX]
  example: check_esxi_hardware.py -H my-shiny-new-vmware-server -U root -P fakepassword -V auto -I uk
or, verbosely:
  usage: check_esxi_hardware.py --host=hostname --user=username --pass=password [--vendor=system --verbose --perfdata --html=XX]

The hardware vendor string defaults to unknown, which is treated the same as ibm. intel has a slight quirk with BIOS identification. dell is similar to the previous cases, but also allows html links to product documentation and warranty information. hp have their own CIM return values to handle, so they are a special case. But the best of all is auto, which determines the vendor (if it can), from the Manufacturer information from CIM.
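The fallback from new-style options to the original positional command line can be sketched with Python's argparse. This is my illustration, not the plugin's actual option handling: the option names follow the usage text above, while treating a first argument starting with https:// as an old-style command line (with an optional literal verbose as the fifth argument) is an assumption drawn from that usage text.

```python
import argparse


def build_parser():
    p = argparse.ArgumentParser(prog="check_esxi_hardware.py")
    p.add_argument("-H", "--host", required=True)
    p.add_argument("-U", "--user", required=True)
    p.add_argument("-P", "--pass", dest="password", required=True)
    p.add_argument("-V", "--vendor", default="unknown")  # unknown is handled like ibm
    p.add_argument("-v", "--verbose", action="store_true")
    p.add_argument("-p", "--perfdata", action="store_true")
    p.add_argument("-I", "--html", default=None)
    return p


def parse_args(argv):
    """Accept the old positional form as well as the new options."""
    # Old style (assumed): https://hostname user password system [verbose]
    if argv and argv[0].startswith("https://"):
        if len(argv) not in (4, 5):
            raise SystemExit("usage: check_esxi_hardware.py https://hostname user password system [verbose]")
        new_argv = ["-H", argv[0][len("https://"):], "-U", argv[1],
                    "-P", argv[2], "-V", argv[3]]
        if len(argv) == 5 and argv[4] == "verbose":
            new_argv.append("-v")
        argv = new_argv
    return build_parser().parse_args(argv)
```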

That's it for now, I consider it stable enough for production.

One improvement would be better handling of CIM numeric sensor names we haven't encountered yet. That should be possible with a bit of thoughtful regular expression wizardry, but I'm going to pass on that for the foreseeable future.

Update, April 29th:

Rewritten perfdata code should now do something sensible on any vendor's hardware.

By peeking at the CIM UnitType attribute, I now correctly handle HP's Virtual Fan (or anyone else's) speed as a percentage, and can distinguish between power consumption (Watts) and current (Amps) automatically.

Mopping up of any quirky sensor name formatting can be done in check_esxi_hardware.php

Update, May 3rd:

Minor bug fixes, code reorganisation, and sorted performance data.

Performance data is now sorted by sensor name within sensor categories in the following order: Power, Voltage, Current, Temperature, Fan Speed, and (Virtual) Fan percentage.

A major side effect of these changes is that the sensor data previously created by check_esxi_hardware.py in /usr/local/pnp4nagios/var/perfdata is not compatible with my new code, and will have to be erased.

Update, May 4th:

More fixes:

Minor code changes and documentation improvements

Remove redundant mismatched ' character in performance data output

Output non-integral values for all sensors to fix problem seen with system board voltage sensors on an IBM server (thanks to Attilio Drei for the sample output)

Update, May 5th:

Added --no-power, --no-volts, --no-current, --no-temp, and --no-fan options to suppress performance data output by category

A few minor optimisations
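The ordering and suppression described in these updates can be sketched in Python. The category keys, sensor names and readings here are hypothetical, and the code isn't lifted from check_esxi_hardware.py: output uses the standard Nagios 'label'=value perfdata form, with labels quoted because they contain spaces, values always printed with two decimal places, and (in this sketch only) the fan percentage as the only reading given a unit (%).

```python
# Category order from the May 3rd update: Power, Voltage, Current,
# Temperature, Fan Speed, (Virtual) Fan percentage.
CATEGORY_ORDER = ["power", "voltage", "current", "temperature", "fan", "fanpercent"]


def format_perfdata(readings, suppress=()):
    """readings: iterable of (category, sensor_name, value) tuples.
    suppress: categories to leave out (cf. --no-power, --no-temp, ...).
    Returns a perfdata string sorted by category order, then sensor name."""
    rank = {cat: i for i, cat in enumerate(CATEGORY_ORDER)}
    kept = sorted(
        (r for r in readings if r[0] not in suppress),
        key=lambda r: (rank[r[0]], r[1]),
    )
    parts = []
    for category, name, value in kept:
        uom = "%" if category == "fanpercent" else ""
        label = name.replace("'", "''")  # a quote inside a label is doubled
        parts.append(f"'{label}'={float(value):.2f}{uom}")
    return " ".join(parts)


readings = [
    ("temperature", "System Board Ambient Temp", 24),
    ("power", "System Board Pwr Consumption", 210),
    ("fanpercent", "Virtual Fan", 35.5),
    ("voltage", "PS1 Voltage", 240),
]
print(format_perfdata(readings, suppress=("temperature",)))
```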

Update, May 6th:

Added -t / --timeout parameter, ensuring it doesn't run on Windows (it works in Cygwin, though)

Made the new file:passwordfile option work for old-style command lines too

Update, May 7th:

On error, include the numeric sensor value in output

Example from this morning, when the aircon failed in one of our datacentres:

 

Things got rather hot, and the system fans all went into overdrive:

 

Power consumption on the few boxes and blade chassis I looked at rose 20 to 25 percent above normal.

Update, April 2nd, 2012

I've updated check_esxi_hardware.py to fix Dell warranty links (when you click on the displayed Tag No) to point to the new Dell Support site.



Posted by Phil at 10:51 PM
Edited on: Wednesday, April 01, 2015 10:49 PM
Categories: IT, Waffle