Create a Load Balanced Web Service on Rackspace Cloud

Richard Benson, 30 April 2010, IT Pros, Web
When we first wanted to load balance our web servers, we followed Rackspace Cloud's articles on the subject, which recommended using Apache's mod_proxy (with mod_proxy_balancer). This took a while to set up, and even after countless config changes, requests would occasionally get lost and you'd have to refresh your browser to reconnect. That was not a problem in a development environment, but it was unacceptable once we wanted to go live with the service. So we looked for a different solution and found HAProxy, which was not only easier to set up than mod_proxy_balancer but also tonnes more reliable and quicker too.

To get started, you'll need one 256MB Ubuntu 9.10 server for your load balancer and two or more other servers to act as the web servers behind it. The purpose of this guide is to show you how to load balance your web servers, not how to set up a web server itself, so I'll leave that to you. They can run anything you like; our balancer will simply spread HTTP requests evenly between them. Just make sure that you can access each server individually and that each one returns web pages normally.
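If you'd like a quick way to confirm that each web server answers on its own before putting the balancer in front of it, a small Python script like the one below will do. It's a minimal sketch: it treats an HTTP 200 response as "returning web pages normally", and the server URLs are whatever you pass on the command line (nothing here is specific to Rackspace or HAProxy).

```python
import sys
import urllib.error
import urllib.request


def check_server(url: str, timeout: float = 5.0) -> bool:
    """Return True if a plain GET to url comes back with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Refused connections, timeouts and HTTP 4xx/5xx errors all count as failures.
        return False


if __name__ == "__main__":
    # Usage: python check_servers.py http://WEB1_IP/ http://WEB2_IP/
    for url in sys.argv[1:]:
        print(url, "OK" if check_server(url) else "FAILED")
```

Run it from a machine that can reach the servers' private IPs (for example, the balancer itself once it's built).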

 
Once you've built your load balancer, log in via SSH (use PuTTY if you are on Windows) and update all packages to the latest versions:
apt-get update
apt-get upgrade

Now install HAProxy:

apt-get install haproxy

Now you only need to edit two files to make this work, and your balancer is all done. First, we'll make a backup of the original config for safekeeping and reference, then open it in nano:

cp /etc/haproxy/haproxy.cfg /etc/haproxy/haproxy.cfg_orig
nano /etc/haproxy/haproxy.cfg

Nano is a good editor for people used to Windows: its commands are all listed at the bottom of the screen, and I'll walk you through the ones you need to set this up.

 
It's perfectly fine to leave the "global" and "defaults" sections alone; their settings are suitable for most situations. Go ahead and remove all the "listen" sections. Ctrl+K cuts the line the cursor is on, which is probably the quickest way to clear out all of the examples.
 
Once that's clear, add a new "listen" section as shown below and I'll explain what it all means afterwards:
listen webcluster BALANCER_PUBLIC_IP:80
        mode http
        stats enable
        stats auth admin:PASSWORD
        balance roundrobin
        cookie BALANCEID prefix
        option httpclose
        option forwardfor
        #option httpchk HEAD /check.txt HTTP/1.0
        #http-check disable-on-404
        server WEB1_NAME WEB1_PRIVATE_IP:80 cookie A check
        server WEB2_NAME WEB2_PRIVATE_IP:80 cookie B check

You need to replace the placeholder values in the config (the balancer's IP, the stats password, and each web server's name and private IP) with appropriate values for your cloud servers; each server has both a public IP and an internal private IP. What we are doing here is sending all connections to your balancer's public IP and then forwarding them to the web servers on their private IPs, so that you do not incur double bandwidth costs.

 
stats enable & stats auth: These turn on some very useful statistics, available at http://BALANCER_PUBLIC_IP/haproxy?stats and protected by the username:password combination specified on the auth line.
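If you want to check the stats page from a script rather than a browser, the sketch below builds the same HTTP Basic auth request a browser sends after you type the username and password. It assumes the default stats URI (/haproxy?stats) used above; the balancer IP, user and password are your own values passed on the command line.

```python
import base64
import sys
import urllib.request


def stats_request(balancer_ip: str, user: str, password: str) -> urllib.request.Request:
    """Build a GET for HAProxy's stats page carrying an HTTP Basic auth header."""
    url = f"http://{balancer_ip}/haproxy?stats"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


if __name__ == "__main__" and len(sys.argv) == 4:
    # Usage: python stats.py BALANCER_PUBLIC_IP admin PASSWORD
    req = stats_request(*sys.argv[1:4])
    with urllib.request.urlopen(req, timeout=5) as resp:
        print(resp.status, resp.headers.get("Content-Type"))
```

A 401 response here usually means the credentials don't match the "stats auth" line.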
 
cookie: If your site uses any session information, a load balancer will break session consistency as it switches requests between servers. By specifying a cookie name here (BALANCEID), you can force a client's connections to a specific node once a session has been established on that server. The format of the cookie will be BALANCEID=webcluster.servername. Because the config uses "prefix" mode, HAProxy does not create this cookie itself: your application has to set it, and HAProxy then prefixes its value with the server's cookie ID (A or B in the config) so that later requests are routed back to the same server. If you don't use sessions throughout your whole site, it's best to set this cookie only when needed. You can do this in code or through your web server's config. One thing worth noting is that if you try to insert the period in the cookie through Classic ASP, it will be URL encoded and therefore won't work, so if you are using Classic ASP, setting the cookie through IIS is your only option.
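To make the round-robin and cookie behaviour concrete, here is a toy Python model of what "balance roundrobin" plus "cookie BALANCEID prefix" do with each request. It's a simplified sketch based on HAProxy's documented prefix mode, where the server's cookie ID and a "~" are prepended to the application's cookie value on the way out and stripped again on the way in. The class and method names are hypothetical, and the model ignores health checks, weights and server failures.

```python
from __future__ import annotations

from itertools import cycle


class Balancer:
    """Toy model of 'balance roundrobin' with 'cookie BALANCEID prefix'.

    server_ids mirror the 'cookie A' / 'cookie B' values on the server lines.
    """

    COOKIE = "BALANCEID"

    def __init__(self, server_ids: list[str]) -> None:
        self.server_ids = list(server_ids)
        self._rr = cycle(self.server_ids)  # round-robin order: A, B, A, B, ...

    def route(self, cookies: dict[str, str]) -> tuple[str, dict[str, str]]:
        """Pick a server for a request; return it with the cookies forwarded to it."""
        value = cookies.get(self.COOKIE)
        if value and "~" in value:
            prefix, original = value.split("~", 1)
            if prefix in self.server_ids:
                # Sticky: go back to the same server and strip the prefix,
                # so the application sees the cookie value it originally set.
                return prefix, dict(cookies, **{self.COOKIE: original})
        # No (recognised) cookie: take the next server in round-robin order.
        return next(self._rr), dict(cookies)

    def rewrite_response_cookie(self, server_id: str, value: str) -> str:
        """Prefix the application's BALANCEID value with the server's cookie ID."""
        return f"{server_id}~{value}"


if __name__ == "__main__":
    lb = Balancer(["A", "B"])
    print([lb.route({})[0] for _ in range(4)])
    sticky = lb.rewrite_response_cookie("B", "session123")
    print(lb.route({"BALANCEID": sticky}))
```

Four cookie-less requests go A, B, A, B, while a request carrying BALANCEID=B~session123 always lands on B, which sees the cookie as plain session123. This is why the application, not HAProxy, decides when stickiness starts.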
 
option httpchk & http-check: These two lines are commented out in the example, but they provide an optional way of determining whether a server is up or down by checking for the existence of a file (/check.txt here). If you use this method, it is worth excluding this file from your server logs, as it will be requested about every 2 seconds. "disable-on-404" lets you gracefully remove a web server from the cluster: it stops any new connections going to that server but allows existing connections to continue, only marking the server as "Down" once all of them have completed.
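The check logic can be modelled in a few lines of Python. The sketch below sends a HEAD request for /check.txt (note that http.client speaks HTTP/1.1 rather than the HTTP/1.0 on the config line) and maps the result to a state following the HAProxy manual's description of httpchk with disable-on-404: 2xx and 3xx mean up, a 404 only drains a server that was already up, and anything else marks it down. The function names and state labels are hypothetical, and HAProxy's rise/fall counters are ignored.

```python
from __future__ import annotations

import http.client


def probe(host: str, port: int = 80, path: str = "/check.txt", timeout: float = 2.0) -> int | None:
    """Send a HEAD request for the check file; return the status code, or None if unreachable."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", path)
        return conn.getresponse().status
    except (OSError, http.client.HTTPException):
        return None
    finally:
        conn.close()


def next_state(current: str, status: int | None) -> str:
    """Simplified model of 'option httpchk' plus 'http-check disable-on-404'."""
    if status is not None and 200 <= status < 400:
        return "UP"  # 2xx/3xx: healthy
    if status == 404 and current in ("UP", "DRAINING"):
        return "DRAINING"  # no new sessions; existing and persistent ones continue
    return "DOWN"  # errors, timeouts, or a 404 on a server that wasn't already up
```

To take a server out gracefully under this scheme, you would rename or delete its check.txt, wait for its connections to finish, and then restore the file once maintenance is over.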
 
Once you've completed your config, press Ctrl+X which will prompt you to save, press "y" and then enter to save and exit nano.
 
Now type the following command:
nano /etc/default/haproxy

In this file, change "ENABLED=0" to "ENABLED=1", and you are all set.
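If you're building several balancers and would rather script this change than make it in nano, a small Python helper can flip the flag. It's a sketch that assumes the line currently reads exactly ENABLED=0; every other line is left untouched, and writing the file needs root.

```python
import re
import sys

DEFAULTS_FILE = "/etc/default/haproxy"


def enable_haproxy(text: str) -> str:
    """Return text with an 'ENABLED=0' line switched to 'ENABLED=1'."""
    # [ \t]* (not \s*) so the match can never swallow the following newline.
    return re.sub(r"^ENABLED=0[ \t]*$", "ENABLED=1", text, flags=re.MULTILINE)


if __name__ == "__main__" and "--write" in sys.argv:
    # Usage (as root): python enable_haproxy.py --write
    with open(DEFAULTS_FILE) as f:
        original = f.read()
    with open(DEFAULTS_FILE, "w") as f:
        f.write(enable_haproxy(original))
```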
To make sure you've got everything right and that there are no mistakes in your config, stop and then start the haproxy service with the following command:
 
/etc/init.d/haproxy restart

If there are any errors in your config you will see them at this point.

 
This is essentially all you need to get your load balancer working, and you should be able to see the results by visiting your balancer's public IP in your browser. Don't worry about rebooting your box: the HAProxy install sets itself up to start whenever the server reboots.
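To watch the round robin at work, you can request the balancer's address several times and tally which server answered. The sketch below assumes each web server returns something that identifies it (for example a hypothetical /whoami.txt containing the server's name). Because urllib doesn't keep cookies between calls, every request is balanced afresh rather than stuck to one server.

```python
import collections
import sys
import urllib.request


def tally_backends(url: str, n: int = 10, timeout: float = 5.0) -> collections.Counter:
    """Fetch url n times (no cookies are kept) and count each distinct response body."""
    counts: collections.Counter = collections.Counter()
    for _ in range(n):
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            counts[resp.read().decode("utf-8", "replace").strip()] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python tally.py http://BALANCER_PUBLIC_IP/whoami.txt
    for body, count in tally_backends(sys.argv[1]).most_common():
        print(f"{count:3d}  {body}")
```

With two healthy servers and no cookies, the counts should come out roughly even.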
 
Whilst you have now set up the load balancer, you shouldn't really stop there: you need to secure your server against intrusion. Rackspace provides very good articles on securing a Linux server, so have a look at these and follow them through, making sure that you change the root password, set up another user so you are not logging in as root, and configure iptables (the firewall). Configuring iptables for this setup is covered in another article.