New Server Upgrade – A Scalable Solution

For some time now I have wanted to move this website over to a newer server in order to improve overall website load times around the world. First I’ll cover the old server, what I was looking for and how I decided on a provider.

The old server

The old server hosted this website from the beginning, for around 3 years. It was the first Linux server that I ever built, and even though it ran cPanel I learned a lot through that process. Below are the specs of that server.

  • Location: Sydney, Australia
  • OS: CentOS 5.11
  • Web Server: Apache 2.4.12
  • PHP: 5.4.40
  • Database: MySQL 5.6.23
  • CPU Cores: 2 Intel(R) Xeon(R) CPU E5540 @ 2.53GHz
  • Memory: 4GB
  • Disk space: 40GB

I ran a ServerBear benchmark on the server, so you can see CPU/IO/Network speed information here:

Overall it’s not too bad, and it’s definitely more than capable of running the websites that I am hosting on it for the foreseeable future and has plenty of resources available.

So why did I want to change? A few reasons listed below.

  • Location: This is probably the biggest factor. The server was running in Sydney, Australia, while the majority of views to the websites that I host on it come from the US. To help serve a global audience I use Cloudflare, which reduces overall loading times considerably. The problem is that anything that isn’t cached must still go to the server in Sydney. Additionally, the web host recently implemented DDoS protection which sends all international traffic (any traffic that originates outside of Australia) through the US for filtering. This added considerable latency for international users. For example, if someone in Japan were to view this page they would have gone through the Cloudflare POP in Tokyo, over to the US, then over to Australia. The extra filtering is not particularly useful to me anyway, as Cloudflare already provides DDoS filtering.
  • Outdated OS: CentOS 5 was released in April 2007 with full updates stopping in Q1 2014, while maintenance updates would be available until March 2017.
  • Low Bandwidth: Bandwidth in Australia is quite expensive compared to other areas of the world, so we tend to get lower data allowances and lower overall speeds. As you can see in the previous benchmark link, most of the network tests are sitting around the 1MB/s mark, which isn’t great. Cloudflare does help make up for this as it reduces overall bandwidth usage on this website by around 75%, however the remaining 25% still needs to come back to Australia.
  • Older Hardware: This is perhaps not such a big issue as it did not negatively impact performance, however the server has been running on the same hardware for almost 4 years with no changes during that time. This becomes clear when looking at the CPU in /proc/cpuinfo, which states that it is an E5540, released Q1 2009.
  • Inefficient Stack: Ever since my web server benchmark in 2012 I’ve known that Apache is quite resource intensive and does not scale as well as other options. When I upgraded from 2.2 to 2.4 there were some decent improvements, however I wanted to explore better options.

The new server

With the problems that I wanted to address defined above, I was able to come up with a solution to each, as listed below.

  • Closer Location: The server would be better placed physically near the majority of my audience, so I decided that it would be located on the US East coast. The East coast is well connected internationally and is much better for European visitors compared to getting to Australia. I also wanted to be located close to Cloudflare POPs, and there are a few locations on the US East coast that would work well.
  • Newer OS: I wanted to take this opportunity to upgrade to CentOS 7 which was released July 2014, will receive full updates until Q4 2020 and maintenance updates until June 2024. I was considering using Debian 8 which was recently released, however the Debian life cycle seems to move a lot faster so upgrading would be more frequent. I decided on staying with CentOS primarily as I’m more familiar with it.
  • Better Bandwidth: Anything I was looking at getting in the US would at least have 100Mbit/s, a massive improvement on what I was previously running. As long as I was receiving 10MB/s+ to most places around the world then that would be more than enough.
  • Newer Hardware: Anything from the last couple of years would be fine. This wasn’t too important, however I thought it would be quite easy to get something significantly newer than what I was running on, and it was. Anything around the same level of resources should be plenty, so I decided on 2 CPU cores, 4GB of memory and 40GB of disk space.
  • Higher Performing Stack: I decided to ditch Apache 2.4 and go with Nginx 1.8.0, the latest stable version released April 2015. I also upgraded to the latest version of PHP 5.6 which is supported through to August 2017. MySQL 5.6.23 which performs well and is supported through to February 2018 was replaced with MariaDB 10.0.19, supported until March 2019.

Finding a provider

Now that I knew what I wanted I just needed to find a reliable hosting provider. I was primarily deciding between Vultr and Amazon Web Services (AWS). I have previously used AWS and knew that it would be more than capable, however I had been hearing some great things about Vultr so wanted to test the two against each other. I’m certain there are plenty of other great providers available, however I didn’t want to sign up for accounts to test and compare with too many as this would take a long time.

I created a test instance at Vultr which came with 2 vCPU, 2GB memory, 40GB SSD disk space and 3TB of included bandwidth. I could have got one with 4GB of memory, however it was more than double the price so I figured it wasn’t worth it. As I’ve never seen my current server use more than 2GB, it wasn’t a big deal. The price of this was $15 USD per month, or $0.022 per hour. This was provisioned in the New Jersey location.

I also created a T2.Medium test instance at AWS which came with 2 vCPU, 4GB memory and a network speed of “Low to moderate”. With AWS bandwidth usage is charged at a per GB rate each month as well as SSD disk space based on how much I use. The price of this was $0.052 per hour (approximately $39 USD per month), however if you pay upfront for 1 year it becomes $0.0345 per hour (approximately $25 USD per month), or for a 3 year term it’s $0.0231 per hour (approximately $17 USD per month). This was provisioned in the US East (Northern Virginia) Region.
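
The hourly and monthly AWS prices above can be cross-checked with a little arithmetic. The sketch below assumes a 730 hour average month (365 × 24 ÷ 12), which is an assumption made for illustration rather than a figure taken from AWS. A 31 day month (744 hours) gives the slightly higher on-demand figure. Bandwidth and disk charges, which AWS bills separately, are ignored.

```python
# Assumption for illustration: an "average" month of 365 * 24 / 12 hours.
HOURS_PER_MONTH = 730


def monthly_cost(hourly_rate: float, hours: float = HOURS_PER_MONTH) -> float:
    """Convert an hourly instance price in USD to an approximate monthly price."""
    return round(hourly_rate * hours, 2)


# Hourly rates as quoted in the post for the T2.Medium instance.
aws_rates = {
    "on demand": 0.052,
    "1 year upfront": 0.0345,
    "3 year term": 0.0231,
}

for term, rate in aws_rates.items():
    print(f"AWS {term}: about ${monthly_cost(rate):.2f} per month")

# The same on-demand rate over a 31 day month (744 hours).
print(f"AWS on demand, 31 days: about ${monthly_cost(0.052, 744):.2f}")
```

The Vultr instance is left out because the post quotes it at a flat $15 USD per month, so there is nothing to convert.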

Below are the ServerBear benchmarks of both:
AWS Benchmark:
Vultr Benchmark:

As you can see, the AWS one seemed to perform better with the exception of disk IO. Disk IO was not deemed to be too important as I will be performing plenty of caching, which would reduce database queries and file loading significantly. I should also note that during the test the AWS instance would have been bursting CPU, so the CPU benchmark there won’t be consistent. As I am only running a web server the CPU usage is not very constant, so the 24 minutes per hour of burstable time that comes with the T2.Medium instance was deemed acceptable.
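
To get a feel for what "24 minutes per hour of burstable time" means for a bursty web workload, here is a minimal simulation of a credit balance. It is only a sketch under stated assumptions: the instance earns 24 credits per hour, one credit pays for one vCPU running at 100% for one minute, and the balance is capped at 576 credits. The cap and the function name `simulate_credits` are assumptions for illustration, not values taken from the benchmarks in this post.

```python
def simulate_credits(utilisation_per_hour, start=0.0, earn_per_hour=24.0,
                     vcpus=2, max_balance=576.0):
    """Return the credit balance at the end of each hour.

    utilisation_per_hour: average CPU utilisation per vCPU for each hour (0.0 to 1.0).
    Assumed model: one credit = one vCPU at 100% for one minute.
    """
    balance = start
    history = []
    for util in utilisation_per_hour:
        spent = util * vcpus * 60          # vCPU-minutes consumed in this hour
        balance = balance + earn_per_hour - spent
        balance = max(0.0, min(max_balance, balance))  # clamp to [0, cap]
        history.append(balance)
    return history


# A quiet hour, a moderately busy hour, then an hour at full load on both vCPUs.
print(simulate_credits([0.0, 0.25, 1.0], start=10))
```

Under these assumptions the break-even point is 24 ÷ (2 × 60) = 20% average utilisation per vCPU: a mostly idle web server stays below that and banks credits, while sustained full load drains the balance quickly.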

I copied the website over to both servers and performed some basic tests of my own using weighttp. In each instance I modified my hosts file to point to the correct server so that the test hit the intended location. The first test is 1,000 requests with 50 being concurrent (at the same time), the second test is 10,000 requests with 100 being concurrent, followed by 10,000 requests with 200 being concurrent.

Vultr Tests:

# weighttp -n 1000 -c 50 -t 2 -k ""

finished in 9 sec, 781 millisec and 992 microsec, 102 req/s, 3239 kbyte/s
requests: 1000 total, 1000 started, 1000 done, 1000 succeeded, 0 failed, 0 errored
status codes: 1000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 32452000 bytes total, 317000 bytes http, 32135000 bytes data

# weighttp -n 10000 -c 100 -t 2 -k ""

finished in 51 sec, 684 millisec and 603 microsec, 193 req/s, 6130 kbyte/s
requests: 10000 total, 10000 started, 10000 done, 10000 succeeded, 0 failed, 0 errored
status codes: 10000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 324480843 bytes total, 3169730 bytes http, 321311113 bytes data

# weighttp -n 10000 -c 200 -t 2 -k ""

finished in 46 sec, 25 millisec and 9 microsec, 217 req/s, 6884 kbyte/s
requests: 10000 total, 10000 started, 10000 done, 10000 succeeded, 0 failed, 0 errored
status codes: 10000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 324475164 bytes total, 3170000 bytes http, 321305164 bytes data

AWS Tests:

# weighttp -n 1000 -c 50 -t 2 -k ""

finished in 8 sec, 481 millisec and 592 microsec, 117 req/s, 3736 kbyte/s
requests: 1000 total, 1000 started, 1000 done, 1000 succeeded, 0 failed, 0 errored
status codes: 1000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 32454626 bytes total, 317000 bytes http, 32137626 bytes data

# weighttp -n 10000 -c 100 -t 2 -k ""

finished in 43 sec, 4 millisec and 118 microsec, 232 req/s, 7368 kbyte/s
requests: 10000 total, 10000 started, 10000 done, 10000 succeeded, 0 failed, 0 errored
status codes: 10000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 324483528 bytes total, 3169720 bytes http, 321313808 bytes data

# weighttp -n 10000 -c 200 -t 2 -k ""

finished in 38 sec, 861 millisec and 234 microsec, 257 req/s, 8119 kbyte/s
requests: 10000 total, 10000 started, 10000 done, 9957 succeeded, 43 failed, 0 errored
status codes: 9957 2xx, 0 3xx, 0 4xx, 43 5xx
traffic: 323107773 bytes total, 3187114 bytes http, 319920659 bytes data

As shown above the AWS instance had the edge over the Vultr one on throughput, although 43 of its 10,000 requests returned 5XX errors in the final test. Both servers were running CentOS 7, Nginx 1.8.0, PHP 5.6.9, and MariaDB 10.0.19 with default configuration. Performance can easily be improved with some simple modifications (more on that below), however these are the test results as done on stock settings. The tests were also performed over the Internet from my location in Australia rather than locally on the server. Both servers had also started maxing out CPU on php-fpm in the final test, so we were starting to hit a bottleneck.
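
For anyone wanting to compare runs like these without reading the numbers by eye, the summary line can be parsed with a few lines of Python. This is a sketch that assumes the output format is exactly as pasted above; `parse_summary` is a hypothetical helper, not part of weighttp.

```python
import re

# Matches the "finished in ..." summary line exactly as it appears in the output above.
SUMMARY = re.compile(
    r"finished in (\d+) sec, (\d+) millisec and (\d+) microsec, "
    r"(\d+) req/s, (\d+) kbyte/s"
)


def parse_summary(line: str) -> dict:
    """Extract elapsed seconds, requests per second and kbyte/s from a summary line."""
    match = SUMMARY.search(line)
    if match is None:
        raise ValueError("not a weighttp summary line")
    sec, millisec, microsec, req_s, kbyte_s = (int(g) for g in match.groups())
    return {
        "seconds": sec + millisec / 1_000 + microsec / 1_000_000,
        "req_per_s": req_s,
        "kbyte_per_s": kbyte_s,
    }


# The -n 10000 -c 100 runs from the Vultr and AWS tests above.
vultr = parse_summary("finished in 51 sec, 684 millisec and 603 microsec, 193 req/s, 6130 kbyte/s")
aws = parse_summary("finished in 43 sec, 4 millisec and 118 microsec, 232 req/s, 7368 kbyte/s")

ratio = aws["req_per_s"] / vultr["req_per_s"]
print(f"AWS served {ratio:.2f}x the requests per second of Vultr in this run")
```

Dividing the request count by the parsed elapsed time reproduces the reported req/s figure, which is a quick sanity check that the line was parsed correctly.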

For comparison I ran the same test on the old server as shown below.

Original Server Test:

# weighttp -n 1000 -c 50 -t 2 -k ""

finished in 7 sec, 150 millisec and 773 microsec, 139 req/s, 4431 kbyte/s
requests: 1000 total, 1000 started, 1000 done, 1000 succeeded, 0 failed, 0 errored
status codes: 1000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 32448000 bytes total, 318000 bytes http, 32130000 bytes data

Interesting, the old server beat the Vultr and AWS ones at serving 1,000 requests with 50 being concurrent. I then changed -n to 10000 total requests and the concurrent requests -c to 100, as in the Vultr/AWS tests, and the old server started to fail. Apache started going crazy with memory usage and it took much longer for the test to complete.

Original Server Test:

# weighttp -n 10000 -c 100 -t 2 -k ""

finished in 344 sec, 794 millisec and 589 microsec, 29 req/s, 910 kbyte/s

As shown, while the older server performed well at the start, it is clearly not scalable.

The result

While both the AWS and Vultr instances performed well and significantly better than the original server, in the end I decided to go with AWS, as it is the larger provider with more features, which should make future scaling simple. I would be paying slightly more, however for the performance and features I think it’s worth it. Although the Vultr disk IO was a lot higher, I found this did not help increase my performance.

Time to improve performance

On the old server I made use of the WordPress plugin WP Super Cache, which creates static .html files and serves these out, reducing PHP execution, database queries and potentially calls to disk, as the OS caches these frequently accessed static files in memory. The plugin creates both static .html files and .gz gzipped versions of them, so I modified the Nginx configuration to serve these out first if they exist. Currently I’ve got the cache set to automatically regenerate once per week, or whenever a new post is published or an existing post is edited.
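
The lookup order described above (gzipped copy first, then the plain static file, then PHP) can be illustrated in Python. This is not the Nginx configuration used on this server, only a sketch of the decision it makes. The directory layout of `<cache root>/<host>/<path>/index.html` and the name `pick_response` are assumptions for illustration.

```python
import tempfile
from pathlib import Path
from typing import Optional


def pick_response(cache_root: Path, host: str, uri: str,
                  accepts_gzip: bool) -> Optional[Path]:
    """Return the cached file to serve, or None to fall back to PHP."""
    page_dir = cache_root / host / uri.strip("/")
    gzipped = page_dir / "index.html.gz"
    plain = page_dir / "index.html"
    if accepts_gzip and gzipped.is_file():
        return gzipped      # smallest response, no PHP needed
    if plain.is_file():
        return plain        # static file, still no PHP needed
    return None             # not cached yet: hand the request to php-fpm


# Small demonstration using a temporary directory as the cache root.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    page = root / "example.com" / "some-post"
    page.mkdir(parents=True)
    (page / "index.html").write_text("<html>cached</html>")

    print(pick_response(root, "example.com", "/some-post/", accepts_gzip=True))
    print(pick_response(root, "example.com", "/not-cached/", accepts_gzip=True))
```

A real configuration would also need to skip the cache for logged-in users and requests with query strings, and to reject paths that escape the cache directory. Those checks are left out here to keep the lookup order visible.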

After some additional small tweaks to the Nginx configuration I performed my weighttp tests again. This time I found that the server was using next to no resources and the test reported that the speed was around 11,000 kbyte/s, essentially the 100Mbit/s limit of my home connection. To remove my home connection as the bottleneck I asked one of my friends to perform similar tests from his Las Vegas based server that has a 1Gbit connection. The test below sent 100,000 requests with 1,000 being concurrent and completed with no problem.

# weighttp -n 100000 -c 1000 -t 2 -k ""

finished in 45 sec, 636 millisec and 89 microsec, 2191 req/s, 69287 kbyte/s
requests: 100000 total, 100000 started, 100000 done, 100000 succeeded, 0 failed, 0 errored
status codes: 100000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 3237897375 bytes total, 23397375 bytes http, 3214500000 bytes data

Despite the traffic hitting around 570Mbit/s, Nginx was only using around 13% CPU, memory usage was fine, and iotop showed 0 bytes being read from disk. The static files were being served by Nginx from memory so, as expected, there was no disk IO bottleneck, confirming the higher IO of the Vultr instance was not required. There were also no php-fpm processes to be seen taking up resources in this test, whereas previously php-fpm was the highest CPU user. This is because no PHP needs to execute to serve the pages out. Once the pages are cached I can completely stop the php-fpm process and the pages are still served out correctly with no errors, which is great should php-fpm ever crash!
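
The network figures can be derived directly from the test output. The sketch below converts the byte count and elapsed time from the run above into Mbit/s, and does the same for the 11,000 kbyte/s seen from the home connection. It assumes a kbyte in the reported figures is 1024 bytes, which is consistent with the byte totals and times shown but is not stated anywhere in the output.

```python
def mbit_per_s(total_bytes: int, seconds: float) -> float:
    """Average throughput in megabits per second (1 Mbit = 1,000,000 bits)."""
    return total_bytes * 8 / seconds / 1_000_000


def kbyte_to_mbit(kbyte_per_s: float) -> float:
    """Convert kbyte/s to Mbit/s, assuming 1 kbyte = 1024 bytes."""
    return kbyte_per_s * 1024 * 8 / 1_000_000


# Figures from the 100,000 request test above.
elapsed = 45 + 636 / 1_000 + 89 / 1_000_000
throughput = mbit_per_s(3237897375, elapsed)
print(f"Las Vegas test: about {throughput:.0f} Mbit/s")

# The earlier run from the home connection.
print(f"Home connection test: about {kbyte_to_mbit(11_000):.0f} Mbit/s")
```

Roughly 90 Mbit/s of payload on a 100Mbit/s link is about what you would expect once protocol overhead is taken into account, which supports the conclusion that the home connection was the limit in the earlier run.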

Conclusion and the future

Based on my tests I believe I made the right decision picking AWS over Vultr. I’m sure there are plenty of other providers I could have tested, however these two were at the top of my list and they both performed well. If I needed high IO I likely would have gone with Vultr, however as shown that was not too important for me due to caching. The new server should be able to provide a better experience to a larger audience for many years to come, as the OS and installed packages are much more up to date, which helps improve performance and reliability.

As server resource utilization is extremely low despite a high number of requests being served, the bottleneck now appears to be the network. Rather than increasing server resources in the future or splitting the database out to a dedicated server, the next move would likely be to a load balanced architecture, potentially with two or more servers located around the world. This is another reason I selected MariaDB, as MariaDB Galera Cluster offers multi-master database replication. Granted, I have very low database usage, but it’s a nice option for future expansion.

This would likely involve creating additional AWS instances in different locations and using an Elastic Load Balancer (ELB) to send the traffic to different servers. If required I could set up multiple ELBs in different regions and have Cloudflare DNS point to the IP of each ELB, as Cloudflare will by default perform round robin on the requests if I have multiple A records. The problem here, however, is that Cloudflare currently does not have a method of health checking an IP address to confirm it’s good to accept traffic. From my understanding it’s a feature request for the future.
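
To make the health check problem concrete, here is a small sketch of round robin selection that skips addresses known to be down. It is purely illustrative: the addresses come from the reserved documentation ranges, and `next_healthy` and the `is_healthy` callback are hypothetical names, not part of any Cloudflare or AWS API.

```python
from typing import Callable, Optional, Sequence, Tuple


def next_healthy(addresses: Sequence[str], start: int,
                 is_healthy: Callable[[str], bool]) -> Tuple[Optional[str], int]:
    """Pick the next healthy address in round robin order.

    Returns (address, next_start). If no address is healthy, returns (None, start).
    """
    count = len(addresses)
    for offset in range(count):
        index = (start + offset) % count
        if is_healthy(addresses[index]):
            return addresses[index], (index + 1) % count
    return None, start


# Hypothetical A records, one per region (documentation-only IP ranges).
records = ["192.0.2.10", "198.51.100.20", "203.0.113.30"]
down = {"198.51.100.20"}            # pretend this region has failed


def check(ip: str) -> bool:
    return ip not in down


position = 0
for _ in range(4):
    chosen, position = next_healthy(records, position, check)
    print(chosen)
```

Plain DNS round robin has no equivalent of the `is_healthy` check, so with three A records roughly a third of visitors would keep being sent to the failed address until the record is removed by hand.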

I decided to keep using Cloudflare as it’s been working well. I get a high cache hit rate, page load speeds are improved globally and it blocks a lot of known malicious traffic. The last of these is probably the most important part: in April 2015, for example, over 27,000 requests to this website were flat out denied as Cloudflare knew the sources were bad. I’m currently on the free plan but have considered upgrading for improved speed and the web application firewall (WAF), however that’s a story for another time.
