Linux Server Maintenance Checklist

Server maintenance needs to be performed regularly in order to ensure that your server will continue to run with minimal problems, while a lot of maintenance tasks are automated within the Linux operating system now there are still things that need to be checked and monitored regularly to ensure that Linux is running optimally. Below are steps that should be taken in order to maintain your servers.

Updates

New package updates have been installed within the last month.
Keeping your server up to date is one of the most important maintenance tasks that needs to be done. Before applying updates to your server, confirm that you have a recent backup or snapshot if working with a virtual machine so that you have the option of reverting back if the updates cause you unexpected problems. If possible you should aim to test updates on a test server first if you are applying them to a production server, this allows you to first confirm that the updates will not break your server and will be compatible with any other packages or software that you may be running.

You can update all packages currently installed on your server by running ‘yum update’ or ‘apt-get upgrade’, depending on your distribution (throughout the rest of this post commands will be aimed towards Red Hat based operating systems). Ideally this should be done at least once per month so that you have the latest security patches, bug fixes, and improved functionality and performance. You can automate the update by making use of crontab to check for and apply updates whenever you like rather than having to do it manually.

Other applications have been updated in the last month.
Other web applications such as WordPress/Drupal/Joomla need to be frequently updated, as these sort of applications act as a gateway to your server by usually being more accessible than direct server access and by allowing public access in from the Internet. Lots of web applications also may have third party plugins installed which can be coded by anyone which potentially have many security vulnerabilities throughout their unaudited code, so it is critical to update these sorts of applications installed on your server very frequently. These content management systems are not managed by ‘yum’ so will not be updated with a ‘yum update’ like the other packages installed. The updates are usually provided directly through the application itself, if you’re unsure contact the application provider for further assistance.

Reboot the server if a kernel update was installed.
If you ran a ‘yum update’ as previously discussed check to see if the kernel was listed as an update. Alternatively you can explicitly update your kernel with ‘yum update kernel’. The Linux kernel is the core of the Linux operating system and is updated regularly to include security patches, bug fixes and added functionality. Once the kernel has been installed you must reboot your server to complete the process. Before you reboot, run the command ‘uname –r’ which will print the current kernel version that you are booted into. After you reboot and the server is running run the ‘uname –r’ command again and confirm that the newer version that was installed with yum was displayed. If the version number does not change you may need to investigate the kernel that is booted in /boot/grub/grub.conf, yum will update this file by default to boot the updated kernel so you shouldn’t have to change anything normally.

It is possible to avoid rebooting your server by using third party tools such as Ksplice from Oracle or KernelCare from CloudLinux, however by default on a standard operating system the reboot will be required to make use of the newer kernel.

Security

Server access reviewed within the last 6 months.
In order to increase security you should review who has access to your server, in an organization you may have staff who have left but still have accounts with access, these should be removed or disabled. There may be accounts that have sudo access that should not, this should also be reviewed often to avoid a possible security breach as granting root access is very powerful, you can check the /etc/sudoers file to see who has root access and if you need to make changes do so with the ‘visudo’ command. You can view recent logins with the ‘last’ command to see who has been logging into the server.

Firewall rules reviewed in the last 6-12 months.
Firewall rules should also be reviewed from time to time to ensure that you are only allowing required inbound and outbound traffic. Requirements for a server change and as packages are installed and removed the ports that it is listening on may change potentially introducing vulnerabilities so it is important to restrict this traffic correctly, this is typically done in Linux with iptables or perhaps a hardware firewall that sits in front of the server. You can test for ports that are open by using nmap from another server, and view the current rules on the server by running ‘iptables –L –v’.

Confirm that users must change password.
User accounts should be configured to expire after a period of time, common periods are anywhere between 30-90 days. This is important so that the user password is only valid for a set amount of time before the user is forced to change it. This increases security because if an account is compromised it will not always be able to be used as the password will change to something different – access by an attacker will not be maintained through that account.

If your accounts are using an LDAP directory like Active Directory this can be set for the accounts centrally there. Otherwise in Linux you can set this on a per account basis, however this is not as scalable as using a directory because you need to implement the changes on all of your servers individually which will take time. This can be done using the chage command, ‘chage –l username’ will display the current settings on the account, for example:

[root@demo  ~]# chage -l root
Last password change                                    : Apr 07, 2014
Password expires                                        : never
Password inactive                                       : never
Account expires                                         : never
Minimum number of days between password change          : 0
Maximum number of days between password change          : 99999
Number of days of warning before password expires       : 7

All of these parameters can be set for every user on the system.

Monitoring

Monitoring has been checked and confirmed to work correctly.
If your server is used in production you most likely have it monitored for various services, it is important to check and confirm that this monitoring is working as intended and reporting correctly so that you know you will be correctly alerted if there are any issues. It is possible that incorrect firewall rules may disrupt monitoring, or your server may be performing different roles now since the monitoring was initially configured and may now need to be monitored for additional services.

Resource usage has been checked in the last month.
Resource usage is typically checked as a monitoring activity, however it is good practice to observe long term monitoring data in order to get an idea of any resource increases or trends which may indicate that you need to upgrade a component of your server so that it is capable of working under the increased load. This will depend on your monitoring solution, however you should be able to monitor CPU usage, free disk space, free physical memory and other variables for certain thresholds and if these start to trigger more often you will know to investigate further. Typically in Linux you’ll be monitoring with SNMP/NRPE based tools, such as Nagios or Cacti.

Hardware errors have been checked in the last week.
Critical hardware problems will likely show up on your monitoring and be obvious as the server may stop working correctly. You can potentially avoid this scenario by monitoring your system for hardware errors which may give you a heads up that a piece of hardware is having problems and should be replaced in advance before it fails.

You can use mcelog which processes machine checks, namely memory and CPU errors on 64-bit Linux systems, it can be installed with ‘yum install mcelog’ and then started with ‘/etc/init.d/mcelogd start’. By default mcelog will check hourly using crontab and report any problems into /var/log/mcelog so you will want to monitor this file regularly each week or so.

Backups

Backups and restores have been tested and confirmed working.
It is important to backup your servers in case of data loss, it is equally important to actually test that your backups work and that you can successfully complete a restore. Check that your backups are working on a daily or weekly basis, most backup software should be able to notify you if a backup task fails which should be investigated.

It is a good idea to perform a test restore every few months or so to ensure that your backups are working as intended. This may sound time consuming but it’s well worth it, there are countless stories of backups appearing to work until when all the data is lost, only then do people realize that they are not actually able to restore the data from backup.

You can backup locally to the same server which is not recommended, or you can backup to an external location either on your network or out on the Internet, this could be your own server or a cloud storage solution like Amazon’s S3 storage. An external backup is recommended, keep in mind that if you are going to be storing sensitive data at a third party location that you will probably need to investigate encrypting the data so that it is stored safely.

Other general tasks

Unused packages have been removed.
You can save both disk space and reduce your attack surface by removing old and unused packages from your server. Having less packages on your server is a good way to harden and secure it as there is less code available for an attacker to make use of. The command ‘yum list installed’ should display all packages currently installed on your server. ‘yum remove package-name’ will remove the package from your server, just be sure you know what the package is and that you actually want to remove it. Be careful when removing packages with yum, if you remove a package that another package depends on, the dependent package will also be removed which can potentially remove a lot of things at once, after running the command it will confirm the list of packages that will be removed so carefully double check it before proceeding.

File system check performed in the last 180 days.
By default after 180 days or 20 mounts (whichever comes first) your servers will be file system checked with e2fsck on next boot, this should be run occasionally to ensure disk integrity and repair any problems. You can force a disk check by running ‘touch /forcefsck’ and then rebooting the server – the file will be removed on next boot, or with the ‘shutdown –rF now’ command to force a disk check on next boot and perform the reboot now. Aternatively you can use -f instead of –F to skip the disk check, this is known as a fast boot and can also be done with ‘touch /fastboot’. This can be useful for example if you have just performed a kernel update and need to reboot and you want the server back up as soon as possible rather than waiting for the check to complete.

The mount count can be modified using the tune2fs command, the defaults are pretty good however ‘tune2fs –c 50 /dev/sda1’ will increase the mount count to 50 so a file system check will happen after it has been mounted 50 times. On the other hand ‘tune2fs –i 210’ will change the disk so that it is only checked after 210 days rather than 180.

Logs and statistics are being monitored daily or weekly.
If you look through /var/log you will notice that there are a lot of different log files on the server which are continually written to with different information, sometimes useful information but most of the time it is not relevant leading to a large amount of information to go through. Logwatch can be used to monitor your servers logs and email the administrator a summary on a daily or weekly basis – you can control it via crontab. Logwatch can also be used to send a summary of other useful server information such as the disk space in use on all partitions on the server, so it’s a good way to get up to date notifications from your servers. You can install the package with ‘yum install logwatch’.

Regular scans are being run on a weekly/monthly basis.
In order to stay secure it is important to scan your server for malicious content. ClamAV is an open source antivirus engine which detects trojans, malware and viruses and works well with Linux. You can set the cron job to run a weekly scan at 3am for instance and then email you a report outlining the results. Depending on how much content you have the scan may take a while, it’s recommended that you set an intensive scan to run once per week at a low resource usage time such as on the weekend or over night. Check the crontab and /var/log/cron log file to ensure that the scans are running as intended, you can also configure an email summary to be sent to you so also confirm you are receiving these alerts.

  1. Patrick Prucha

    Hey there. Good List. Needed to review to see if i had missed anything in my scripts that take care of maintenance.

    Cheers

    Patrick

Leave a Comment

NOTE - You can use these HTML tags and attributes:
<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>