Create a Load Balanced Web Service on Rackspace Cloud
When we first wanted to load balance web servers, we followed Rackspace Cloud's articles on the subject, which recommended using mod_proxy with Apache. This took a little while to set up, and even after countless config changes, every now and then a request would get lost and you'd have to refresh your browser to get connected again. That was not a problem in a development environment, but it was unacceptable once we wanted to go live with the service. So we looked for a different solution and found HAProxy, which was not only easier to set up than mod_proxy_balancer, but is tonnes more reliable and quicker too.
To get started you are going to need one 256MB Ubuntu 9.10 server for your load balancer, and two or more other servers to be your web servers behind the load balancer. The purpose of this guide is to show you how to load balance your web servers, not to teach you how to set up a web server itself. I will leave that to you. They can be anything you like, as our balancer is just going to spread HTTP requests evenly between them. Just ensure that you can access each server individually and that each one is returning web pages normally.
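If you want a quick way to confirm that each web server is answering before you put the balancer in front of it, something like the following should do. This is only a sketch: it assumes curl is installed on the machine you run it from, the function names are my own, and the addresses in the usage comment are made-up examples that you need to swap for your own servers.

```shell
#!/bin/sh
# Print only the HTTP status code returned by a URL.
# -s hides progress output, -o /dev/null discards the page body,
# --max-time 5 gives up after five seconds, -w prints the status code.
http_status() {
    curl -s -o /dev/null --max-time 5 -w '%{http_code}' "$1"
}

# Report OK for a 200 response and FAIL for anything else.
check_backend() {
    status=$(http_status "$1") || true
    if [ "$status" = "200" ]; then
        echo "OK   $1"
    else
        echo "FAIL $1 (status: $status)"
        return 1
    fi
}

# Usage (hypothetical addresses, replace with your own web servers):
# check_backend http://203.0.113.11/
# check_backend http://203.0.113.12/
```

A server that cannot be reached at all is reported with whatever status curl prints for a failed connection, so it shows up as a FAIL line rather than being skipped.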
Once you've got your load balancer built, log in via ssh (use Putty if you are on Windows) and update all packages to the latest versions:
apt-get update
apt-get upgrade
Now install HAProxy:
apt-get install haproxy
You only need to edit two files to make this work, and then your balancer is all done. First make a backup of the original config for safekeeping and reference, then open the config in nano:
cp /etc/haproxy/haproxy.cfg /etc/haproxy/haproxy.cfg_orig
nano /etc/haproxy/haproxy.cfg
Nano is a good editor for people used to Windows. The commands you are going to need are all listed at the bottom of the screen, and I'll walk you through the ones you'll need to set this up.
It's perfectly fine to leave the "global" and "defaults" sections alone, as the settings in these sections are suitable for most situations. Go ahead and remove all the "listen" sections. Ctrl+K cuts the line you are on, which is probably the quickest way to clear out all of the examples.
Once that's clear, add a new "listen" section as shown below and I'll explain what it all means afterwards:
listen webcluster BALANCER_PUBLIC_IP:80
    mode http
    stats enable
    stats auth admin:PASSWORD
    balance roundrobin
    cookie BALANCEID prefix
    option httpclose
    option forwardfor
    #option httpchk HEAD /check.txt HTTP/1.0
    #http-check disable-on-404
    server WEB1_NAME WEB1_PRIVATE_IP:80 cookie A check
    server WEB2_NAME WEB2_PRIVATE_IP:80 cookie B check
You need to replace each upper-case placeholder (BALANCER_PUBLIC_IP, PASSWORD, WEB1_NAME, WEB1_PRIVATE_IP and so on) with an appropriate value for your cloud servers. Each server will have a public IP and an internal private IP. What we are doing here is sending all connections into your balancer's public IP, then forwarding them to the web servers on their private IPs so that you do not incur double bandwidth costs.
stats enable & stats auth: These turn on some very useful stats, available from http://BALANCER_PUBLIC_IP/haproxy?stats (where BALANCER_PUBLIC_IP is your balancer's public IP) using the username:password combo specified on the auth line.
cookie: If your site uses any session information, then a load balancer will break the session consistency as it switches requests between the servers. By specifying a cookie name here (BALANCEID) you can force connections to a specific node once a session has been established on the server. The format of the cookie will be BALANCEID=webcluster.servername. If you don't use sessions throughout your whole site, it's best to set this only when needed. You can do this in code, or through your web server's config. One thing worth noting is that if you try to insert the period in a cookie name through Classic ASP, it will be URL encoded and therefore will not work. If you are using Classic ASP, then setting the cookie through IIS is your only choice.
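To see which cookies actually come back through the balancer, you can dump the response headers with curl. This is a rough sketch rather than anything HAProxy-specific: the function name is my own, and BALANCER_PUBLIC_IP in the usage comment stands in for your balancer's public IP.

```shell
#!/bin/sh
# Print only the Set-Cookie headers from a response.
# -D - writes the response headers to stdout, -o /dev/null drops the body.
show_cookies() {
    curl -s -D - -o /dev/null "$1" | grep -i '^set-cookie:'
}

# Usage (replace the placeholder with your balancer's public IP):
# show_cookies http://BALANCER_PUBLIC_IP/
```

If nothing is printed, no cookie was set on that response, which is what you would expect on pages where your site does not start a session.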
option httpchk & http-check: These two lines are commented out in the example, but they are an optional method of determining the up or down status of a server by checking for the existence of a file. If you use this method, it is worth excluding this file from your server logs, as it will be checked about every 2 seconds. The "disable-on-404" option allows you to gracefully remove a web server from the cluster by stopping any new connections while allowing existing connections to continue, and only marking the server as "Down" once all connections are completed.
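If you do uncomment those two lines, each web server needs a check.txt in its document root. Here is a small sketch of how you might create and remove it. The function names are my own, and /var/www in the usage comment is only an assumed document root, so use whatever path your web server actually serves from.

```shell
#!/bin/sh
# Create the file that the health check asks for.
# $1 is the web server's document root (an assumption, adjust per server).
create_check_file() {
    printf 'OK\n' > "$1/check.txt"
}

# Remove the file so the check starts returning 404. With disable-on-404
# enabled, this is how you take a server out of the cluster gracefully.
remove_check_file() {
    rm -f "$1/check.txt"
}

# Usage on a web server (hypothetical path):
# create_check_file /var/www
# remove_check_file /var/www
```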
Once you've completed your config, press Ctrl+X, which will prompt you to save. Press "y" and then Enter to save and exit nano.
Now type the following command:
nano /etc/default/haproxy
and in this file change "ENABLED=0" to "ENABLED=1" and you are all set.
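If you would rather not open an editor for a one-line change, the same edit can be made with sed. Treat this as a sketch: the function name is my own, and it assumes the file contains a line beginning with ENABLED=, which is what I'd expect from the packaged default.

```shell
#!/bin/sh
# Set ENABLED=1 in the given defaults file, keeping a .bak copy first.
enable_haproxy() {
    file=$1
    cp "$file" "$file.bak" &&
    sed 's/^ENABLED=.*/ENABLED=1/' "$file.bak" > "$file"
}

# Usage (as root):
# enable_haproxy /etc/default/haproxy
```

Comment lines that merely mention ENABLED are left alone, because the pattern only matches lines that start with ENABLED=.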
For the sake of ensuring you've got it all right and that there are no mistakes in your config, stop and then start the haproxy service with the following command:
/etc/init.d/haproxy restart
If there are any errors in your config you will see them at this point.
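You can also ask HAProxy to check the config without touching the running service, which is handy once the balancer is live. As far as I know the -c flag (check mode) together with -f (config file) does this, but check it against your installed version. The function name below is my own.

```shell
#!/bin/sh
# Parse the config in check mode and report success or failure through
# the exit status. Defaults to the standard config path used above.
check_config() {
    haproxy -c -f "${1:-/etc/haproxy/haproxy.cfg}"
}

# Usage: only restart if the config parses cleanly.
# check_config && /etc/init.d/haproxy restart
```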
This is essentially all you need to get your load balancer working, and you should be able to see the results by visiting your balancer's public IP in your browser. Don't worry about rebooting your box, as the install for HAProxy sets itself to start up whenever you reboot.
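If you want more than a browser refresh to convince you that requests are being spread around, you can hit the balancer a few times from the command line and count the distinct responses. This sketch only tells you anything if each web server returns a slightly different page (for example, one that includes its hostname), which is an assumption on my part. The function name is my own, and BALANCER_PUBLIC_IP stands in for your balancer's public IP.

```shell
#!/bin/sh
# Request the same URL several times and count how often each distinct
# response body came back. Bodies are compared by their checksum.
tally_responses() {
    url=$1
    count=${2:-10}
    i=0
    while [ "$i" -lt "$count" ]; do
        curl -s "$url" | cksum
        i=$((i + 1))
    done | sort | uniq -c
}

# Usage (replace the placeholder with your balancer's public IP):
# tally_responses http://BALANCER_PUBLIC_IP/ 10
```

Each output line is a count followed by a checksum and size. With two servers and round robin you would hope to see two lines with roughly equal counts. curl does not send cookies back unless you ask it to, so the BALANCEID stickiness should not skew the result.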