 
 







 

Archive for the ‘Clustering’ Category


 

For the past few days I’ve been working on the cluster project again, after a month of suspension. Today I was finally able to configure an (almost) working load balancer with 3 web servers (1 is the load balancer itself, plus 2 dedicated web servers, web01 and web02). I’m in the process of writing down the scripts that I will follow later on for the screencast. At this stage, what I have is half of the whole picture of a web cluster:

  • A failover load balancer server pair, running Heartbeat and ldirectord
  • A small cluster of 3 load-balanced web servers, using weighted round robin
  • A pair of name servers to provide name-lookup service for the entire cluster, no more “ping 10.10.10.10.101.101.10.12”!
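
To make the weighted round robin idea concrete, here is a minimal Python sketch of how such a scheduler can hand out requests. It only illustrates the technique and is not the actual LVS code. The name lb01 and all of the weights are made up for the example; only web01 and web02 come from my setup.

```python
from functools import reduce
from itertools import islice
from math import gcd


def weighted_round_robin(servers):
    """Yield server names forever, in proportion to their weights.

    `servers` is a list of (name, weight) pairs. Weights must be
    positive integers (an assumption of this sketch).
    """
    weights = [weight for _, weight in servers]
    step = reduce(gcd, weights)   # how far the threshold drops per pass
    max_weight = max(weights)
    index = -1
    threshold = 0
    while True:
        index = (index + 1) % len(servers)
        if index == 0:
            # Start of a new pass: lower the threshold, wrapping to the top.
            threshold -= step
            if threshold <= 0:
                threshold = max_weight
        name, weight = servers[index]
        if weight >= threshold:
            yield name


# Hypothetical weights: the load balancer also serves pages, so it gets less.
farm = [("lb01", 1), ("web01", 2), ("web02", 2)]
print(list(islice(weighted_round_robin(farm), 5)))
```

Over every cycle of five requests, web01 and web02 each get two and lb01 gets one, which is the whole point of giving the dedicated boxes a higher weight.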

What is missing is the load-balanced MySQL cluster. I have been able to get a cluster of 3 MySQL boxes working, but now I’d like to add ipvsadm and ldirectord to provide load balancing so that the cluster can be scaled up more easily.

I am pretty proud of myself for learning all of these amazing technologies within a relatively short period of time. With DNS especially, I have learned so much about the networking side of a network of computers. The two books, “DNS & BIND” and “BIND Cookbook”, both by O’Reilly, are amazing reference sources. I particularly like the “DNS & BIND” book for its in-depth coverage of how name servers work. Without that fundamental knowledge, it’s hard to build a working cluster that can keep up with growing demands.

Here is the revised virtual Linux cluster diagram. Slowly but surely my cluster project is reaching its destination: a complete virtualized web cluster running on VMWare. Once this is done, I think I will begin learning how to develop cluster-ready applications.

[Diagram: Revised HA Cluster]

 

I’ve been doing quite a bit of reading about High-Availability (HA) web clustering techniques for the past 2 weeks. Thanks to VMWare, it is now possible to create a virtual web server farm with multiple Linux instances running concurrently to simulate a cluster setting. The initial result is very heart-warming: I’ve successfully installed a MySQL database, configured fail-over (and thus highly available) web servers using Heartbeat, and set up name servers running BIND9 to do name lookup. The last piece of the puzzle is now the Linux Virtual Server (LVS) Director for load balancing. That will be my next experiment.

Here is my initial diagram of my web server farm and the IP/server name assignments:
[Diagram: Web Farm]

With this design, I tried to eliminate the single point of failure by implementing redundant, ready to fail-over servers for critical sections of the network, for example, the DNS, the load balancers for incoming traffic, and the load balancers for the database farm.

Basically, incoming traffic will pass through A, the load balancer. Load balancer A will spread the load across the web servers C in the farm using the LVS-DR (direct routing) method. There are also 2 kinds of web servers: one is the beefy, powerful server with a fast CPU to run the web applications, and the other is the media server, which doesn’t need a good CPU but requires fast HDDs (SCSI) and lots of RAM. The application servers will do the number crunching, churning out pages as fast as they can, while the media servers will serve all the images, CSS, and JavaScript files. Of course, since I am using VMWare, it costs me virtually $0.00 to add a new SCSI drive to the VM. Great!
The name servers B running BIND9 are located centrally to help with name resolution. Of course DNS is critical, so we need a certain level of HA. Heartbeat will make sure the DNS is always up and ready.
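
The fail-over idea behind Heartbeat can be sketched in a few lines of Python. This is only a toy model of the decision the standby node makes, not how Heartbeat itself is implemented. The class and method names are made up; `deadtime` borrows the name of the Heartbeat setting for how long the peer may stay silent, and failing back as soon as the primary returns is an assumption of this sketch.

```python
class StandbyMonitor:
    """Tracks heartbeats from the primary and decides who holds the shared IP."""

    def __init__(self, deadtime):
        self.deadtime = deadtime   # seconds of silence before the primary is declared dead
        self.last_beat = None      # timestamp of the most recent heartbeat, if any

    def heartbeat(self, now):
        """Record a heartbeat received from the primary at time `now`."""
        self.last_beat = now

    def holder(self, now):
        """Return which node should hold the shared IP at time `now`."""
        heard_recently = (
            self.last_beat is not None
            and now - self.last_beat <= self.deadtime
        )
        # No recent heartbeat (or none at all): the standby takes over.
        return "primary" if heard_recently else "standby"


monitor = StandbyMonitor(deadtime=10)
monitor.heartbeat(now=0)
print(monitor.holder(now=5))    # primary is still within its deadtime
print(monitor.holder(now=11))   # silence exceeded deadtime, standby takes over
```

Timestamps are passed in explicitly so the behaviour is easy to follow; a real monitor would read the clock and listen on the network instead.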

Meanwhile, for the database farm E, which is on a separate network (supposed to be high-speed and low-latency, with very expensive switches), a pair of load balancers is needed to spread out the “read” (SELECT) load. I’m not quite sure how to implement the “write” (UPDATE, DELETE, ALTER TABLE, etc.) DB servers yet, but I’m sure we can improvise along the way. Again, Heartbeat will be used to keep the database load balancers up and happy. Our database farm will consist of 2 network storage nodes to store data and 2 “API” nodes to do the database heavy lifting. A fifth server serves as the management node to manage (add, delete, or update) the database servers.
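
One possible way to split the “read” and “write” traffic is to look at the first keyword of each statement. The Python sketch below shows that idea only; it is not something I have built yet. The node names (sql01, sql02, sql-write) and the list of read keywords are assumptions for the example, and routing by leading keyword is deliberately naive.

```python
import itertools

# Statements treated as reads in this sketch; everything else is a write.
READ_KEYWORDS = {"SELECT", "SHOW", "DESCRIBE", "EXPLAIN"}


def is_read(sql):
    """Return True if the statement's first keyword marks it as a read."""
    parts = sql.lstrip().split(None, 1)
    return bool(parts) and parts[0].upper() in READ_KEYWORDS


class QueryRouter:
    """Send reads round-robin to the read nodes and writes to one write node."""

    def __init__(self, read_nodes, write_node):
        self._reads = itertools.cycle(read_nodes)
        self._write = write_node

    def route(self, sql):
        return next(self._reads) if is_read(sql) else self._write


# Hypothetical node names, not taken from the diagram.
router = QueryRouter(["sql01", "sql02"], "sql-write")
print(router.route("SELECT * FROM users"))
print(router.route("UPDATE users SET name = 'x' WHERE id = 1"))
```

Sending every write to a single node keeps the sketch simple, but it also means that node is a single point of failure unless it gets its own fail-over pair.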

Finally (and not shown in the above diagram, as I just realized that I am missing something), a monitoring server running Nagios will do health monitoring and network management. With the current design, all parts of the network can be scaled independently: if more web servers are needed, we add new boxes to section C. If we need more database storage nodes, we can quickly add a new NDB node to the MySQL database cluster F. The bottleneck will now be our gateway, the load balancers in A. However, since it’s been confirmed (see the linux-ha.org site) that a decent load balancer can easily handle enough traffic to saturate a 100Mbps connection, I would say that for a small/medium business setting, this is more than enough.

If you are asking why I am writing all of this down, it is because I will begin constructing this web farm using VMWare with CentOS 4.4. This post and the diagram will serve as a guideline for this particular project. Moreover, I intend to do a screencast of the entire process of setting up this web farm. Yes, I’d like to commoditize the knowledge of building a Linux cluster using off-the-shelf tools. It’s a noble goal, I know, but I’m doing it for myself first so you don’t have to thank me now.
Now off to work I go. Keep checking back at alexle.net for more information about web clustering. “This is Alex Le doing it so you don’t have to.” (Yeah, I copied Ze Frank’s line, so excuse the plagiarism. :)
