
What is HAProxy

HAProxy (High Availability Proxy) is a TCP/HTTP load balancer and proxy server that spreads incoming requests across multiple endpoints. This is useful when too many concurrent connections would saturate a single server.

Instead of connecting to a single server that processes every request, the client connects to an HAProxy instance. HAProxy acts as a reverse proxy and forwards each request to one of the available endpoints, chosen by a load-balancing algorithm.

Its most common use is to improve the performance and reliability of a server environment by distributing the workload across multiple servers (e.g. web, application, database). It is used in many high-profile environments, including GitHub, Imgur, Instagram, and Twitter.

HAProxy is one of the most popular self-hosted load balancers: a powerful, flexible, high-performance tool in wide use. It also offers detailed logging and a built-in statistics page.


Set up and configure HAProxy

First, install HAProxy. On a Debian- or Ubuntu-based system, enter the following on the command line.

$ sudo apt-get -y install haproxy

Installation should take a couple of minutes. Next, edit the HAProxy configuration file, located at /etc/haproxy/haproxy.cfg.

$ sudo nano /etc/haproxy/haproxy.cfg

The file opens with a default configuration similar to the following.

global
        log /dev/log    local0
        log /dev/log    local1 notice
        chroot /var/lib/haproxy
        stats socket /run/haproxy/admin.sock mode 660 level admin
        stats timeout 30s
        user haproxy
        group haproxy
        daemon
defaults
        log     global
        mode    http
        option  httplog
        option  dontlognull
        timeout connect 5000
        timeout client  50000
        timeout server  50000
        errorfile 400 /etc/haproxy/errors/400.http
        errorfile 403 /etc/haproxy/errors/403.http
        errorfile 408 /etc/haproxy/errors/408.http
        errorfile 500 /etc/haproxy/errors/500.http
        errorfile 502 /etc/haproxy/errors/502.http
        errorfile 503 /etc/haproxy/errors/503.http
        errorfile 504 /etc/haproxy/errors/504.http

At the bottom of the file, let’s insert our load balancing configurations.

frontend incoming_requests
        bind *:80
        option forwardfor
        default_backend webservers

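The snippet above tells HAProxy to listen on port 80 on all IP addresses and to forward incoming requests to a backend called webservers, which we define below. The option forwardfor line adds an X-Forwarded-For header carrying the client's IP address, so the web servers can still see who made the request.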

backend webservers
        balance roundrobin
        server webserver1 Your-Webserver1-IP:80 check
        server webserver2 Your-Webserver2-IP:80 check
        option httpchk

Replace Your-Webserver1-IP and Your-Webserver2-IP with the IP addresses of your web servers. The load-balancing algorithm chosen here is round robin, which selects servers in turn; it is also HAProxy's default. Other algorithms are described in the HAProxy documentation.
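To make the idea of selecting servers in turn concrete, here is a minimal Python sketch. It is an illustration only, not HAProxy's implementation: HAProxy's roundrobin also takes server weights into account and skips servers that fail their health checks, both of which this sketch ignores. The server names simply mirror the backend above.

```python
from itertools import cycle

# Names mirror the backend section above; they are placeholders.
SERVERS = ["webserver1", "webserver2"]


def round_robin(servers):
    """Return an endless iterator that yields the servers in turn."""
    return cycle(servers)


picker = round_robin(SERVERS)

# Assign five incoming requests; the list wraps around after the last server.
assignments = [next(picker) for _ in range(5)]
print(assignments)
# → ['webserver1', 'webserver2', 'webserver1', 'webserver2', 'webserver1']
```

With two equally weighted servers, each one ends up handling every other request.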

Now let’s save the config file and verify that it is valid by running the following command.

$ haproxy -f /etc/haproxy/haproxy.cfg -c

Alright! If it passes the verification, you can go ahead and apply the configuration by restarting the haproxy service.

$ sudo service haproxy restart

Now, if you curl the IP address of your HAProxy host from your local machine, you should see the responses alternate between webserver1 and webserver2. There you have it!
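If you would rather count the responses than eyeball them, the check can be scripted. The sketch below assumes that each web server returns a body that identifies it (for example a page containing its hostname); the function names and the placeholder address are made up for this illustration.

```python
from collections import Counter
from urllib.request import urlopen


def fetch_body(url):
    """Return the body of a GET request as stripped text."""
    with urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8", errors="replace").strip()


def tally_responses(fetch, url, attempts=6):
    """Call fetch(url) `attempts` times and count how often each body appears."""
    return Counter(fetch(url) for _ in range(attempts))


# Example usage (replace the placeholder with the IP of your HAProxy host):
# counts = tally_responses(fetch_body, "http://Your-HAProxy-IP/")
# for body, n in counts.items():
#     print(n, body[:60])
```

With round robin and two healthy servers, the two counts should come out roughly equal.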


Geolocation Routing

Amazon AWS example

Geolocation routing lets you choose the resources that serve your traffic based on the geographic location of your users, meaning the location that DNS queries originate from. For example, you might want all queries from Europe to be routed to an ELB load balancer in the Frankfurt region.

When you use geolocation routing, you can localize your content and present some or all of your website in the language of your users. You can also use geolocation routing to restrict distribution of content to only the locations in which you have distribution rights. Another possible use is for balancing load across endpoints in a predictable, easy-to-manage way, so that each user location is consistently routed to the same endpoint.

You can specify geographic locations by continent, by country, or by state in the United States. If you create separate records for overlapping geographic regions—for example, one record for North America and one for Canada—priority goes to the smallest geographic region. This allows you to route some queries for a continent to one resource and to route queries for selected countries on that continent to a different resource.

Geolocation works by mapping IP addresses to locations. However, some IP addresses aren't mapped to geographic locations, so even if you create geolocation records that cover all seven continents, Amazon Route 53 will receive some DNS queries from locations that it can't identify. You can create a default record that handles both queries from IP addresses that aren't mapped to any location and queries that come from locations that you haven't created geolocation records for. If you don't create a default record, Route 53 returns a "no answer" response for queries from those locations.
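The two rules described above (the smallest matching region wins, and a default record catches everything else) can be sketched in a few lines of Python. This is a toy model and not the Route 53 API: the record table, the endpoint names, and the `resolve` function are all hypothetical.

```python
# Hypothetical record table: keys are (level, code) pairs, values are endpoints.
RECORDS = {
    ("continent", "NA"): "us-east-elb.example.com",
    ("country", "CA"): "ca-central-elb.example.com",
    ("continent", "EU"): "frankfurt-elb.example.com",
    ("default", "*"): "fallback-elb.example.com",
}


def resolve(location, records):
    """Return the endpoint for the smallest region that matches `location`.

    `location` is a dict with optional "subdivision", "country" and
    "continent" keys; it is empty when the IP address could not be mapped.
    Returns None, modelling a "no answer" response, when nothing matches
    and the table has no default record.
    """
    # Check from the smallest region to the largest.
    for level in ("subdivision", "country", "continent"):
        code = location.get(level)
        if code is not None and (level, code) in records:
            return records[(level, code)]
    return records.get(("default", "*"))


# A query from Canada matches both NA and CA; the country record wins.
print(resolve({"continent": "NA", "country": "CA"}, RECORDS))
# A query from an unmapped IP address falls through to the default record.
print(resolve({}, RECORDS))
```

Removing the `("default", "*")` entry makes `resolve` return `None` for unmapped or uncovered locations, which corresponds to the "no answer" case.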

For more information, see How Amazon Route 53 Uses EDNS0 to Estimate the Location of a User.


Other Solutions

HAProxy isn’t the only kid in town. If you feel like HAProxy might be too complex for your needs, the following solutions may be a better fit:

  • Linux Virtual Server (LVS): a simple, fast layer 4 load balancer included in many Linux distributions.

  • Nginx: a fast and reliable web server that can also be used for proxying and load balancing. Nginx is often used in conjunction with HAProxy for its caching and compression capabilities.


Further reading