Today I connected to a CentOS 6 server via SSH and quickly noticed that its file system was in read-only mode. After checking a few other Linux servers on the same XenServer host, it became apparent that there had been a network issue between the storage and compute layers, which caused the Linux file systems to remount themselves read-only to protect against further corruption.
After not being able to do anything useful within the operating system, such as remounting the file system read/write, I decided it was time to reboot and force a file system check to pick up and fix any problems. However, once the server had shut down, it did not power back on as part of the restart task, nor did it power on when I attempted to start it manually. This only happened to one VM; all of the others powered back on fine and worked as expected.
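The steps I attempted inside the guest can be sketched as below; this is a minimal example for a CentOS 6 / SysV init system, and the root mount point is illustrative:

```shell
# Attempt to remount the root file system read/write
# (this failed in my case because of the underlying storage problem)
mount -o remount,rw /

# Force a full file system check on the next boot
# (on CentOS 6, init checks for this flag file at startup)
touch /forcefsck
reboot
```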
Below is the error that came up through XenCenter for the reboot task:
Rebooting VM 'server1' on 'host1' Internal error: xenopsd internal error: VM = 2bd7670d-ed30-931a-c4da-2cb9917e26a0; domid = 50; Bootloader.Bad_error Traceback (most recent call last): File "/usr/bin/pygrub", line 903, in ? fs = fsimage.open(file, part_offs, bootfsoptions) IOError: [Errno 95] Operation not supported
I also tried to start the VM from the command line with ‘xe vm-start vm=server1’ to confirm the problem wasn’t specific to XenCenter, however that resulted in the same error:
The server failed to handle your request, due to an internal error. The given message may give details useful for debugging the problem. message: xenopsd internal error: VM = 2bd7670d-ed30-931a-c4da-2cb9917e26a0; domid = 24; Bootloader.Bad_error Traceback (most recent call last): File "/usr/bin/pygrub", line 903, in ? fs = fsimage.open(file, part_offs, bootfsoptions) IOError: [Errno 95] Operation not supported
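For reference, checking the VM's state and attempting the start from the XenServer host's CLI looks like this; the name label ‘server1’ is taken from the error message above:

```shell
# Look up the VM's UUID and current power state by its name label
xe vm-list name-label=server1 params=uuid,power-state

# Attempt to start it; in this case it returned the same pygrub error
xe vm-start vm=server1
```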
Having never encountered this error before, I initially thought there was an issue with the VM within Xen itself. The errors indicated a problem reading the disk at boot, so I wanted to run a file system check against it. To do this I detached the disk from the VM, attached it to another Linux VM, and ran ‘fdisk -l’, which confirmed the disk was visible as /dev/xvdb. I then ran ‘e2fsck -y /dev/xvdb1’, which found and fixed a large number of problems.
After this completed, I shut down the VM that I had attached the disk to for the file system check, detached the disk, attached it back to the original VM, and successfully powered it on. I was not able to detach the disk while the helper VM was running, which is why I powered it off first.
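The same detach, check, and reattach sequence can be sketched with the xe CLI; the helper VM name ‘helper1’ and the UUID placeholders below are illustrative, and I performed the equivalent steps through XenCenter:

```shell
# On the XenServer host: find the UUID of the broken VM's virtual disk
xe vm-disk-list vm=server1

# Attach that disk to a healthy helper VM (replace the placeholder UUIDs)
VBD=$(xe vbd-create vm-uuid=<helper-vm-uuid> vdi-uuid=<vdi-uuid> \
      device=1 mode=rw type=Disk)
xe vbd-plug uuid=$VBD

# Inside the helper VM: confirm the disk is visible, then check it
fdisk -l              # the disk appeared as /dev/xvdb in my case
e2fsck -y /dev/xvdb1

# Back on the host: shut down the helper VM, then detach the disk
# before reattaching it to the original VM
xe vm-shutdown vm=helper1
xe vbd-destroy uuid=$VBD
```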
Although the VM did boot, many services such as Apache failed to start correctly, primarily because various directories had been removed from /var. I assume this was related to whatever corruption took place. The missing directories were named in the service error logs, so I was able to easily recreate the small handful of them and get the services back up and running.
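Recreating the directories was as simple as the sketch below; the specific paths, ownership, and permissions here are hypothetical examples, and the real ones should be taken from each service's error log or init script:

```shell
# Example only: recreate directories a service reports as missing from /var
mkdir -p /var/log/httpd /var/run/httpd

# Match the permissions the package normally ships with
chmod 700 /var/log/httpd

# Then try starting the service again (CentOS 6 / SysV init)
service httpd start
```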