Always-on PPPoE but 30 seconds disconnection

So I have an FTTH internet connection and my ISP uses PPPoE – a rather common configuration. But the ISP resets my connection every 12 hours, possibly for accounting purposes.

Whenever the connection reset took place, my internet would vanish for 30 seconds – a very bad situation to be in when you’re in an online meeting or making a payment. I couldn’t quite figure out why this happened only on connection resets (terminations initiated by the ISP): when I terminated the connection from my end and reconnected, it came back without any lag.

I don’t use a router available off the shelf – it’s a Celeron mini PC with four Gigabit NICs running Ubuntu and I have hand configured all the network related services (firewall, dns, dhcp) on it – including pppd for my PPPoE connection.

Since this is an unlimited connection it doesn’t matter how many hours I am connected or how much data I’ve used, I pay a fixed fee to my ISP. So my pppd was configured with the persist and maxfail 0 options (basically keep trying to connect endlessly if there was any failure). The pppd manual page also lists SIGHUP as a signal which causes it to terminate the current session and restart it if it is configured with the persist option.

Since my issue was only with connection resets, I tried to simulate one using SIGHUP, and there I found the 30 second delay in reconnecting. After reading the pppd manpage once again I found an option called holdoff, which tells pppd how many seconds to wait before trying to reconnect. The default value of holdoff isn’t documented in the example configuration files or in the pppd manpage as of Ubuntu 18.04, which runs on my router. It is apparently 30 seconds, as mentioned in this TLDP guide.

I repeated the same test after setting holdoff 0 and it works fine now – without the 30 second delay.
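For reference, the three reconnect-related options mentioned above can be collected in one place. This is only a sketch: ppp_reconnect_options is a hypothetical helper name, and it simply prints the option lines so they can be reviewed before being added to whichever options or peers file your pppd setup reads.

```shell
# Hypothetical helper: print the pppd options discussed above.
# $1 = holdoff in seconds (defaults to 0, i.e. reconnect immediately).
ppp_reconnect_options() {
    holdoff="${1:-0}"
    printf '%s\n' \
        'persist' \
        'maxfail 0' \
        "holdoff ${holdoff}"
}

ppp_reconnect_options 0
```

To simulate an ISP-side reset afterwards, send SIGHUP to the running pppd, for example with kill -HUP "$(cat /var/run/ppp0.pid)". That assumes the link is ppp0 and that pppd writes its PID file to that path on your system.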

Linux rename network interface using udev rules

Sometimes it is convenient to have user assigned names to network interfaces – particularly when the MAC address of the interface remains constant but the bus on which it is attached may change because it’s a virtual machine.

All that is required is to create the file /etc/udev/rules.d/persistent-net.rules. This file used to exist earlier but has been phased out by most mainstream Linux distributions in favor of the consistent naming scheme or other methods like systemd-networkd.

The udev rules required in that file:

SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="aa:bb:cc:dd:ee:ff", NAME="wan"
SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="bb:cc:dd:ee:ff:aa", NAME="lan"
SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="cc:dd:ee:ff:aa:bb", NAME="lan2"

Save the file and reboot; the network interfaces should then be assigned the new names as per the rules in this file.
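If you have several interfaces, the rule lines can be generated instead of typed by hand. The sketch below assumes nothing beyond the rule format shown above; udev_net_rule is a hypothetical helper name, and it only prints a rule, it does not write the file.

```shell
# Hypothetical helper: print one udev rule that renames the interface
# with the given MAC address. $1 = MAC address, $2 = desired name.
udev_net_rule() {
    # sysfs reports MAC addresses in lowercase, so normalise the input.
    mac=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    printf 'SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="%s", NAME="%s"\n' "$mac" "$2"
}

udev_net_rule AA:BB:CC:DD:EE:FF wan
```

The MAC address of an interface can be read from /sys/class/net/<interface>/address or from the output of ip link.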

Hibernate support on Ubuntu 20.04 with encrypted swap and encrypted root filesystem

So I installed Ubuntu 20.04 on my laptop with an encrypted root filesystem and bcache support, since it has an NVMe SSD along with the usual hard disk. While setting up an encrypted root filesystem the installer will not allow you to have unencrypted swap, and the default encrypted swap setup uses a random encryption key that changes at every boot, which makes hibernation impossible. After a few shenanigans with the rescue system and systemd I managed to get hibernation working with swap encrypted using a fixed key.

The first thing is to set up fixed key encryption for the swap. When using the installer to create the encrypted root filesystem, do not create a partition for swap; the installer will automatically create a swap file at /swapfile after installation. But when partitioning your disks, leave unallocated space on your SSD equal to the size of your RAM, which we will use to create the fixed key encrypted swap after installation.

After installation, disable existing swap using sudo swapoff -a. Delete the entry of /swapfile in /etc/fstab. Using your favorite partitioning tool – fdisk, gdisk, parted, or gparted, create a partition in your disk to hold the encrypted swap partition. In my case, the partition is /dev/nvme0n1p3. Be careful in this step as you risk screwing up the installation itself.

Now we create a random key for encrypting the swap partition using dd(1) and then use cryptsetup(8) to create a luks header on the disk partition with the key:

dd if=/dev/urandom of=/.swap-key bs=1 count=512
cryptsetup luksFormat -d /.swap-key /dev/nvme0n1p3

It is a good idea to add a passphrase to the swap partition in case it has to be entered on console at any time – this is not exactly required but this helped me when I was playing around with the system – how would I enter a 512 byte random string on the console? 😀

cryptsetup luksAddKey -d /.swap-key /dev/nvme0n1p3

Open the encrypted partition and format it:

cryptsetup open -d /.swap-key /dev/nvme0n1p3 cryptswap
mkswap /dev/mapper/cryptswap

If you note above the mapped name for the encrypted swap is named as cryptswap – this name will be used later – you can change the name if you wish to. Add the /etc/fstab entry so that the swap gets mounted at boot, also enable the swap at present:

echo /dev/mapper/cryptswap swap swap defaults 0 0 >> /etc/fstab
swapon /dev/mapper/cryptswap

Next we have to populate /etc/crypttab so that the initramfs can set up the swap partition and resume from hibernate. But before that, find the name of the device that is mounted as / (root filesystem) from /etc/fstab. In my case that is /dev/mapper/bcache0p1_crypt. Now we can create the crypttab entry:

echo cryptswap UUID=$(blkid -s UUID -o value /dev/nvme0n1p3) /dev/disk/by-uuid/$(blkid -s UUID -o value /dev/mapper/bcache0p1_crypt):/.swap-key luks,discard,keyscript=/lib/cryptsetup/scripts/passdev,noauto >> /etc/crypttab

The first field in the crypttab entry is mapped name of the device, second field is the UUID of the device which has this encrypted filesystem, the third field is path to the key file – here I have specified the path /dev/disk/by-uuid/<UUID of ROOT FILESYSTEM>:/.swap-key. This syntax is documented in the crypttab(5) man page for providing unlock keys – if you used a different path for the key then you have to change the path here as well. Important point to be noted here – we have put the encryption key of the swap inside encrypted root filesystem – so without unlocking the root filesystem it is not possible to access the key to the swap. Fourth field specifies the options for the encrypted device – you can omit discard in it if your swap partition is on a hard disk drive.
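To make the four fields easier to see, here is a small sketch that assembles the same crypttab line from its parts. crypttab_swap_line is a hypothetical helper name and the UUIDs in the example call are placeholders; the real values come from blkid as in the command above.

```shell
# Hypothetical helper: assemble the crypttab line from its four fields.
# $1 = mapped name, $2 = UUID of the swap partition,
# $3 = UUID of the unlocked root device, $4 = key path on that device.
crypttab_swap_line() {
    printf '%s UUID=%s /dev/disk/by-uuid/%s:%s %s\n' \
        "$1" "$2" "$3" "$4" \
        'luks,discard,keyscript=/lib/cryptsetup/scripts/passdev,noauto'
}

# Placeholder UUIDs for illustration only; take the real ones from blkid.
crypttab_swap_line cryptswap 1111-aaaa 2222-bbbb /.swap-key
```

Printing the line first and checking it by eye before appending it to /etc/crypttab is a cheap way to catch an empty UUID, which is what you get if blkid is pointed at the wrong device.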

The keyscript=/lib/cryptsetup/scripts/passdev script is documented in the Debian Cryptsetup docs and in /usr/share/doc/cryptsetup-initramfs/README.initramfs.gz. It basically lets us use a key file stored on some device. The noauto option is required to prevent systemd from trying to unlock the partition itself, as systemd does not seem to support keyscript / fixed keys. Without it, a normal boot ends up without a swap partition because the systemd target fails. With noauto, the swap still gets mounted because the initramfs unlocks the swap partition and the devices to be mounted are then read from fstab.

Edit /etc/default/grub to disable the systemd crypto generator since we do not need it:

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash luks=no"

Now update grub and initramfs, then test with a reboot that the swap is getting mounted during normal boot process:

update-grub
update-initramfs -u -k all
reboot

If successful, you can try hibernate – open some text editor window for example and type the following command – as there is no way to get the hibernate button in GNOME menu yet:

systemctl hibernate

Once your machine turns off, start it again and you should have the text editor back on the screen in the same state you left it in. Hibernate working! You can bind this command to a keyboard shortcut in GNOME control center so you can hibernate with a key sequence.

Cloning a UEFI/GPT Windows 10 installation from HDD to SSD using SystemRescueCD

So, I had an interesting problem at hand – to transfer a completely working Windows 10 installation from a 1 TB HDD to a 1 TB NVMe SSD (Samsung 970 EVO Plus). As someone who has done this sort of thing many times, but with Linux based OSes, my first thought was to do it with SystemRescueCD running from a USB pen drive. I wasn’t willing to use any Windows based backup solution since I had no idea how they work, or whether they work at all. For those who don’t know, SystemRescueCD is a kind of Swiss Army knife Linux distribution you can use for all kinds of disk and OS recovery tasks. You usually don’t need to install any extra packages, etc.

If you are following this post, read it completely before attempting anything. This is not a step-by-step guide. Also you may need to adjust the commands to suit your system to select the correct drive and/or partition when copying partitions.

So the first try was using a block to block copy, using ddrescue. But copying 1 TB from a HDD which reads at max 100 MB/s to a SSD which can write at almost 1500 MB/s is surely an extremely slow job. That too when the total data in the HDD is around 200 GB combined – OS (C drive) and Data (D drive). ddrescue was showing estimated time as 17 hours!!!
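A quick back-of-the-envelope calculation shows why a block copy is the wrong tool here. The sketch below is only rough arithmetic in decimal units (1 GB = 1000 MB); copy_minutes is a hypothetical helper name, and 100 MB/s is the HDD’s best-case read speed mentioned above, not a measured average.

```shell
# Rough copy-time estimate in whole minutes, using decimal units
# (1 GB = 1000 MB). $1 = data size in GB, $2 = read speed in MB/s.
copy_minutes() {
    echo $(( $1 * 1000 / $2 / 60 ))
}

copy_minutes 1000 100   # whole 1 TB disk at the HDD's best-case speed
copy_minutes 200 100    # only the ~200 GB of actual data
```

This prints 166 and 33: even at best-case speed the full block copy needs close to three hours, while copying only the used data would take about half an hour. The 17 hour estimate from ddrescue means the real average throughput was far below 100 MB/s, which makes copying only the data even more worthwhile.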

Then I started looking for NTFS cloning tools – turns out ntfs-3g has a utility called ntfsclone which does exactly that – it copies just the data, not complete blocks. That’s exactly what I needed. But before running ntfsclone it is necessary to have a duplicate of the HDD’s partition table on the SSD. Since this was a GPT/UEFI based installation, I used sgdisk to do this:

sgdisk -R /dev/nvme0n1 /dev/sda

This will copy the partition table from /dev/sda to /dev/nvme0n1 without changing the GPT UUID. It is possible to generate a new GPT UUID using sgdisk but that might break something in the boot process when booting from the SSD, so I decided not to do that. This also means that the HDD must be disabled in the BIOS/UEFI settings before booting from the SSD, otherwise the operating system (Windows) and the BIOS itself will see two disks with exactly the same GPT UUID, possibly leading to errors.

Now that the partition table has been cloned, it’s time to copy the data using ntfsclone and ddrescue. There were 5 partitions on the HDD, as seen in fdisk -l /dev/sda:

  • /dev/sda1 – Recovery
  • /dev/sda2 – EFI System Partition
  • /dev/sda3 – Some other Recovery
  • /dev/sda4 – C drive (NTFS)
  • /dev/sda5 – D drive (NTFS)

I used ddrescue to copy the partitions /dev/sda1 => /dev/nvme0n1p1, /dev/sda2 => /dev/nvme0n1p2, /dev/sda3 => /dev/nvme0n1p3.

ddrescue -v -f /dev/sda1 /dev/nvme0n1p1
ddrescue -v -f /dev/sda2 /dev/nvme0n1p2
ddrescue -v -f /dev/sda3 /dev/nvme0n1p3

Then to copy the NTFS partitions:

ntfsclone -O /dev/nvme0n1p4 /dev/sda4
ntfsclone -O /dev/nvme0n1p5 /dev/sda5

Note: it may be necessary to turn off the fast startup option in Windows 10 power options before rebooting into SystemRescueCD, because ntfs-3g often complains about it. I turned it off before starting, so I have no idea if this process would have been successful with that option enabled.

Control Panel Power Options

Now reboot into the UEFI/BIOS settings, disable the HDD, and reboot again. There are two possible outcomes: either you will be able to boot into your Windows 10 installation from the SSD, or you will get a BSOD with error code 0xc000000e. If you get the error code, you need a Windows 10 recovery disk or the installation media to fix it. I did not know this beforehand but I had my Windows machine, which I used to prepare a Windows recovery disk.

Boot using the Windows 10 recovery disk or installation media and head to the command prompt. In the command prompt:

DISKPART
LIST DISK

Note the disk which is your SSD – usually it is disk number 0. Select it first, since LIST PARTITION only works on a selected disk:

SELECT DISK 0
LIST PARTITION
SELECT PARTITION 2
ASSIGN LETTER=M
EXIT

Note – in the above commands, you have to assign the drive letter M to the EFI system partition. In my case it was the second partition, so I did SELECT PARTITION 2. Then format the EFI system partition as FAT32:

FORMAT M: /FS:FAT32

Now run BCDBOOT:

BCDBOOT C:\WINDOWS /s M: /f UEFI

That’s it, now reboot the system and you should be able to boot into Windows from the SSD. Optionally, if you are going to retain the HDD (because why not, extra storage is always good), enable the HDD, boot into SystemRescueCD once again and clear its partition tables using sgdisk:

sgdisk -Z /dev/sda

Reboot into Windows 10 then you can partition the HDD using Disk Management to your heart’s content.

How I repaired my dishwasher using fundamental principles for troubleshooting

I have a Samsung DW-FN320T dishwasher, which is a little over 5 years old. It was not used very often because we had a maid at home. But during the COVID lockdown the dishwasher started getting used heavily – almost twice a day – because the authorities banned maids to contain the spread of the disease.

One day though, it stopped working. The symptom: it showed a heater error, which as per the manual means that either the temperature sensor, the heating coil or some low sensing device is dead. So I called my usual home repair technician; he came and found a plastic part of a mixer-grinder blade in the drain basin of the dishwasher. Meanwhile we also discovered that the drain pipe was broken, so he replaced that, but it still didn’t solve the heater error.

He was not very positive about this because it seems Samsung has phased out this model in India (and they don’t sell dishwashers anymore) and availability of spare parts is a problem. He left saying he’ll try to procure the heater sensor.

But I wasn’t convinced that there is a heater related problem, because the dishwasher was completing cycles partially – for a cycle of say 2h 30m it would complete about 20-30 minutes then stop with heater error and I also observed it wasn’t pumping water to the spray arms in some cycles occasionally. The problem of pump not working was completely random and there was no pattern irrespective of wash program selected.

In addition to this, before the machine stopped working completely there were days when dishes wouldn’t come clean or the soap won’t dissolve but dishwasher used to show END (wash cycle complete).

So I opened up the dishwasher (a hell of a job that is) and then checked all the wiring for any loose connection etc. The problem here was – motor not running sometimes, so I was suspecting some kind of loose connection or faulty component such as a relay (which drives the motor) on the controller board.

To check what was happening I used a simple multimeter – put it in AC mode and inserted the probes in the connector of the motor. Then I started the dishwasher on the shortest cycle (pre-wash) and discovered that the motor was getting power but wasn’t spinning. This was weird, because if it isn’t spinning that means the motor is dead, but a dead motor does not spin occasionally!

The shaft of the motor was accessible from outside, so to check if the shaft was jammed I tried to turn it manually using a screwdriver (with the dishwasher off) – it moved freely without any resistance. Then I turned the dishwasher on again and started pre-wash – the motor started.

This made me suspect the motor’s capacitor, because in single phase motors a capacitor is used to phase-shift the current in the second winding, which gives the starting torque. Also, whenever the motor was turned on by the controller and didn’t spin, there was a sound coming from it. In one such instance, while power was being applied to the motor (confirmed using the multimeter), I turned the shaft manually and it did start spinning.

The dead capacitor

This confirmed my suspicion about the capacitor. When a capacitor loses its capacitance due to age or other factors, it cannot supply enough phase-shifted current to the second winding to produce the starting torque. Once the motor is spinning, though, it keeps running, which is why a manual push on the shaft was enough to get it going.

New capacitor

The next challenge was finding a compatible capacitor (3 uF) to replace this one; the original has a bolt thread and was fixed to the motor itself with a nut. Considering the urgency, I had planned to wire up two ceiling fan capacitors (typically 2.25 uF each) in parallel in case I couldn’t get a new one. Luckily a friend of mine helped in procuring a 3 uF capacitor, but it didn’t have the bolt thread.

Mounting

So in order to fit the new capacitor I did a simple jugaad / hack – put a simple nut and bolt in the mounting hole and tied the capacitor to the bolt using cable ties.

After fitting the capacitor and setting the machine upright, I ran some test cycles, both short and long, to check if the problem was fixed – it was. The motor started spinning immediately without getting stuck / jammed at any time.

The lesson from this exercise is that one should always think from the basics when approaching a problem. This applies to any field – whether it is electronics, electrical or computer science.

Also often people say – what is the use of subjects taught in school in real life. AC motor working was a part of 11th grade when I was in school. Yes, the subjects taught in school do have some use in real life. That’s how I was able to fix my dishwasher and saved a lot of money in that process in spite of not being an electrical or electronics engineer.

Fixed my Soundbar’s faulty remote

I have a Philips soundbar, which must be about 6-7 years old. Today while watching TV, I couldn’t increase the volume using the remote, and unfortunately the remote control is the only way to change the soundbar’s settings.

Update 16 Feb 2021: It stopped working again but this time it was a dry soldered IR LED. Reheated (or in simple speak, resoldered) the contacts and got it working again.

Then I took a few drops of isopropyl alcohol on my finger and smeared it on the PCB as well as on the rear side (contact side) of the rubber sheet with buttons. I let it dry for some time, then tested it, and it worked!

E-waste and money both saved, or I’d probably have to buy a new remote which might have been difficult to procure given the age of the Soundbar.

Speeding up PHP CLI applications using OpCache

I run a few PHP based applications which require background jobs, and I use the usual crond via crontab to run the jobs. PHP has an excellent feature – OpCache which can cache the compiled code in memory to speed up the web applications, where typically the PHP-FPM process is a long running process so the compiled code can be fetched on next request. But with CLI applications there is no shared state or memory to store the compiled code.

The Solution

Use the file cache feature of PHP, configurable in php.ini using the opcache.file_cache configuration option. Note that OpCache on the CLI must be enabled using opcache.enable_cli=1; by default it stays off.

With multiple users running background jobs

The opcache.file_cache path must point to a directory where the user running the PHP script must have write access so that PHP can create the .bin files. Now here’s the catch, what happens if we have multiple users running different PHP scripts? They cannot share a common file cache directory for security reasons or it can cause cache poisoning. PHP seems to generate some hash string based directory structure but since all my PHP applications live in a common directory but are run by different users, it was not working when using a single directory. Consider /home as the common path and /home/user1, /home/user2 as different users.

Solution for multiple users

Specify opcache.file_cache path in the crontab! Here’s how:

*/5 * * * * /usr/bin/php7.3 -d opcache.file_cache=/var/cache/php/user1 -f /home/user1/app/cron.php

Create the /var/cache/php directory with permissions similar to /tmp (sticky bit) and the individual directories as well (they don’t seem to get created automatically):

mkdir /var/cache/php
chmod 1777 /var/cache/php

mkdir /var/cache/php/user1
chown user1:group1 /var/cache/php/user1
chmod 700 /var/cache/php/user1

It is also a good idea to enable the file consistency check option opcache.file_cache_consistency_checks=1, it will prevent errors in case cache gets corrupted for any reason.
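With more than a couple of users, the per-user steps above are easy to wrap in small helpers. This is only a sketch of the same steps: make_opcache_dir and opcache_cron_line are hypothetical names, and the paths in the usage comments are the ones from this post.

```shell
# Hypothetical helper: create a private OpCache file-cache directory.
# $1 = base directory, $2 = user who runs the cron job.
make_opcache_dir() {
    dir="$1/$2"
    mkdir -p "$dir" || return 1
    chmod 700 "$dir"
    # Changing ownership needs root; skip it when run unprivileged.
    if [ "$(id -u)" -eq 0 ]; then
        chown "$2" "$dir"
    fi
    printf '%s\n' "$dir"
}

# Hypothetical helper: print the matching crontab entry.
# $1 = cache directory, $2 = script path.
opcache_cron_line() {
    printf '*/5 * * * * /usr/bin/php7.3 -d opcache.file_cache=%s -f %s\n' "$1" "$2"
}

# Usage (as root):
#   make_opcache_dir /var/cache/php user1
#   opcache_cron_line /var/cache/php/user1 /home/user1/app/cron.php
```

The second helper only prints the crontab line; it still has to be added to the right user’s crontab by hand.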

Enjoy fast background jobs!

KeepassXC SSH Agent in WSL and OpenSSH for Windows

I use KeepassXC for managing passwords – it is a fantastic FOSS tool for that. If you don’t use a password manager or use a paid one, do give it a try. I was a paid LastPass user until a few years ago, when I discovered KeepassXC.

It also has support for storing SSH keys inside the database file and exporting it via SSH-Agent when the database file is unlocked and removing it when it is locked. An excellent feature which eliminates the need to store the ssh keys un-encrypted.

I never got around to setting up the SSH agent thing using KeepassXC, because I just used to copy my ~/.ssh folder around – primarily between two devices, a Windows desktop and a Linux laptop. In my .zshrc I use keychain so that the keys are added to my agent whenever I open the terminal – in WSL as well as in Linux.

Now the catch here is that KeepassXC cannot export SSH keys to the SSH Agent running inside WSL, this means either I have to ditch the idea of storing ssh keys inside Keepass or use native Windows OpenSSH. While searching I came across this article A Better Windows 10+WSL SSH Experience in which the author has done agent sharing setup between native Windows OpenSSH and WSL using a named pipe <=> socket proxy (since inside WSL the applications speak Unix stuff and Windows OpenSSH agent listens on named pipe instead of Unix socket, obviously).

In the process I also updated the Windows OpenSSH version because the one installed by default was giving me a warning when SSHing to one of the servers and ssh was switching back to password based authentication:

warning: agent returned different signature type ssh-rsa (expected rsa-sha2-512)

The solution for this problem is basically to update OpenSSH to a newer version. The Windows OpenSSH version in my system was 7.7, updated to 8.1.

Now with the current setup my keys are safely stored inside the KeepassXC database file, and agent forwarding works in both native Windows OpenSSH and WSL! 😀

Solving Windows Store Error 0x80240007

So yet again, I faced an annoying update related issue with Windows, but this time it was the Windows Store. It was throwing error code 0x80240007 for any app update. My previous encounter with Windows Update and its solution was here.

It’s absolutely disgusting how Microsoft makes it extremely difficult to find out errors in their operating system. If someone from Microsoft is reading this, please give more helpful error messages! Such error codes and their numerous solutions online make it seem like for any error that occurs in Windows the solution is to reset and reinstall, WTF!

I wasn’t able to install any app from the store nor was it updating, and it was throwing the errors. I searched a bit about this, but didn’t get any useful solution. In event logs I had three error codes 0x80240007 0x80070002 0x80073CF0. The screenshots of Windows Event Log:

I had completely given up on this after numerous searches and trying out the solutions mentioned on those pages, to the extent I uninstalled all the store apps – the most significant ones for me being Skype, Whatsapp, Telegram; and actually I discovered this issue when I knew Telegram was updated but I was wondering why it wasn’t updating on my desktop.

Then today, when updating my password manager KeepassXC, the MSI installer gave me error code 2203. Initially I thought the two were unrelated issues (store and MSI error), but as you can see, the event log complained about some msixvc. Then I discovered that MSIs are installed using a command called msiexec, so I checked its available options and found a /log option which logs the installation messages / errors etc.

msiexec /log keepassxc.log /i KeepassXC.msi

In the log generated I found an interesting error:

Database: C:\Windows\Installer\inprogressinstallinfo.ipi. Cannot open database file. System error

A little bit of searching and some site mentioned it could be because the temporary files directory is not writable by the installer. But I had solved the temporary folder issue without which I wasn’t able to update at all, this got me thinking. In the environment variables I had set the system temp folder as %windir%\Temp. Just as a fluke I changed it to C:\Windows\Temp and after a reboot tried installing KeepassXC again. It worked! And all the store apps updated as well. Apparently %windir% doesn’t expand to C:\Windows when MSIs are installed?

So the lesson is, don’t tamper with the temp folder on Windows. This is not Linux where you can just do mount -t tmpfs tmpfs /tmp for performance and to save the SSD unwanted write cycles.

Windows Defender Update Error 0x80070643

So my Windows 10 installation had been throwing this update error 0x80070643 for the last few days:

Went through many solutions such as resetting the windows update service, removing downloaded files, etc. but nothing helped to solve the problem. The thing is, Windows 10 seems to hide the actual error codes behind some generic error codes so you actually do not know what is causing the error 😐
Finally I found this helpful blog post A broken Windows Defender update which had some steps how to get detailed log of why it’s failing.

I ran Get-WindowsUpdateLog in PowerShell and went through the update log file, where I found the error code for why it was actually failing – 0x80092003. Let’s see what this error code means – all the Windows error codes seem to have been documented on COM Error Codes (Security and Setup).
So the error code 0x80092003 means CRYPT_E_FILE_ERROR which is basically “An error occurred while reading or writing to a file.”. It doesn’t specify what is the error though!

I made a wild guess, that probably it is not able to write to a temporary file or something. On my system temporary folders were set to be on RAMDISK using IMDisk Toolkit. So I went to the environment variables page and changed the system TMP and TEMP variables (only) to point to the original %windir%\Temp folder and rebooted. Then I tried to run the update again, and it worked! My personal temporary folder continues to be on RAMDISK.