Our customers’ Abiquo clouds are growing every month with more concurrent users, VMs deployed, and so on. So the Abiquo team must ensure that the platform will not limit our customers’ business.
Scalability is a key requirement and the foundation of scalability is load balancing across multiple APIs.
This feature enables customers to add a load balancing component in front of multiple servers running API.war, distributing the load among multiple API nodes and thus supporting more concurrent requests and providing failover.
See Load balancing client and API using Apache v2.4 and v2.6, or Load balancing client and API with HAProxy v2.4 and v2.6, for configuration instructions.
To configure multiple API nodes, first set up a 'datanode'. The datanode will contain all the basic infrastructure needed for Abiquo APIs and remote services to communicate and store the data required for normal Abiquo API functionality. On the datanode, you must install:
These products may be installed with their default configuration because all the necessary settings are defined in the Abiquo API or Abiquo Remote Services. Note that it is not necessary to install all of the datanode services on the same machine; for example, you could have a separate database server. In addition, you can configure fault tolerance as required; for example, you could configure MySQL with primary-secondary replication.
User-generated API requests must be distributed by installing a load balancer (e.g. Apache or HAProxy) in front of this configuration (see our Load balancing client and API using Apache v2.4 and v2.6 or Load balancing client and API with HAProxy v2.4 and v2.6 guides).
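As a minimal sketch only (the linked guides are the authoritative reference), an HAProxy backend that balances requests across two API nodes could look like the following. The hostnames, IP addresses, and ports here are assumptions for illustration, not values from an actual Abiquo deployment:

```
frontend abiquo_api
    bind *:80
    default_backend abiquo_api_nodes

backend abiquo_api_nodes
    balance roundrobin
    # Cookie-based stickiness keeps a client session on one Tomcat node
    cookie JSESSIONID prefix nocache
    server api1 10.60.1.223:8080 check cookie api1
    server api2 10.60.1.224:8080 check cookie api2
```

The `check` keyword enables health checks, so a failed API node is taken out of rotation automatically, which provides the failover behavior described above.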
When you have the datanode ready, deploy as many API nodes and/or Remote Services nodes as desired to improve performance and fault tolerance. Abiquo will internally distribute all event processing through the API nodes.
The following diagram shows how the load-balanced APIs will require access to the same abiquo.properties configuration and the same shared 3rd-party services.
To provide fault tolerance we have defined a Leader Election recipe. The recipe ensures that at all times one of the API nodes is the leader, which deals with the events sent from any module in the platform. User requests are balanced and distributed to any of the other API nodes (or even the leader itself). We use ZooKeeper to keep track of all live API nodes and to select one, and only one, leader, ensuring that all events are processed.
All API nodes must share a common abiquo.properties file to ensure that all of them access the same information. These are the common properties that all the API nodes must be configured to use:
All remote services servers in datacenters with API load balancing must be configured to use the same RabbitMQ instance.
The Remote Services Servers only communicate with the datacenter notifications queue in the datanode RabbitMQ instance. DO NOT change the Redis properties for the datacenters.
A single Abiquo installation can handle multiple datacenters, as shown in the following diagram. Divisions between datacenters are a question of partitioning and geographic proximity. For example, you might add a new datacenter to offer a service that is geographically closer to the users, or to limit the number of users that would be affected by an outage if a datacenter service fails.
To scale up the cloud service, add another API node; the load will then be distributed across one more node. The following simplified diagram shows the same multiple-datacenter environment with multiple API nodes.
The asynchronous tasks between API nodes and remote services instances are coordinated with RabbitMQ. All APIs are able to process requests from clients and queue asynchronous tasks for the remote services and the API itself. The leader API node is the only one that consumes from the scheduler queue (because the requests must be processed one by one) and the remote services response queues (because all of the messages must be consumed and processed in order).
To guarantee that there will always be one leader we use the Curator framework, a well-known and widely adopted solution that guarantees there will be only one leader, or none if no API is up and running.
In the worst-case scenario, when the leader fails while processing a message, another leader will be elected and will continue with the job. Asynchronous jobs in the leader API will take care of any messages left behind.
When a node takes the leadership, it prints a message in api.log
All the API participants are registered under the znode path /api/leader-election
The current leader is the registered node with the lowest lock value.
0000000008 < 0000000009 --> ''apuig/10.60.1.223''
The znode content is the node hostname, so it's important to configure the hostname properly to avoid a useless localhost/127.0.0.1 value.
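The selection rule above can be sketched in a few lines of Python. This is only an illustration of the "lowest lock value wins" logic, not Abiquo or Curator code; the znode names and hostnames are made up for the example:

```python
# Each API node registers an ephemeral sequential znode under
# /api/leader-election. The node whose znode has the lowest
# sequence suffix is the current leader.

def elect_leader(znodes):
    """znodes maps a sequential znode name (e.g. 'lock-0000000008')
    to its content (the node hostname). Returns the hostname of the
    znode with the lowest sequence number."""
    def sequence(name):
        # 'lock-0000000008' -> 8
        return int(name.rsplit("-", 1)[-1])
    leader = min(znodes, key=sequence)
    return znodes[leader]

# Hypothetical participants, matching the 0000000008 < 0000000009 example:
participants = {
    "lock-0000000009": "api2/10.60.1.224",
    "lock-0000000008": "apuig/10.60.1.223",
}
print(elect_leader(participants))  # -> apuig/10.60.1.223
```

If the leader's ephemeral znode disappears (the node dies), the participant with the next-lowest sequence number becomes the leader, which is what makes the failover automatic.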
For example, for the asynchronous deployment task of a virtual machine, there are three jobs:
In a multiple-API environment, any API can queue the scheduling task, but only the leader will process it. The leader then queues the deploy task (configure and power-on jobs) in the virtual factory. When the virtual factory completes each job, it puts the result in the datacenter notifications queue.
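The flow above can be sketched as a toy simulation, with in-memory queues standing in for RabbitMQ. The queue and job names here are illustrative, not Abiquo's actual identifiers:

```python
from collections import deque

# In-memory stand-ins for the RabbitMQ queues described above.
scheduler_queue = deque()        # consumed by the leader API only
virtualfactory_queue = deque()   # consumed by the virtual factory
notifications_queue = deque()    # job results, consumed by the leader

def api_queue_deploy(vm):
    """Any API node can queue the scheduling task for a deploy."""
    scheduler_queue.append(("schedule", vm))

def leader_consume_scheduler():
    """Only the leader consumes scheduling tasks, one by one, and
    queues the deploy jobs for the virtual factory."""
    _, vm = scheduler_queue.popleft()
    virtualfactory_queue.append(("configure", vm))
    virtualfactory_queue.append(("power_on", vm))

def virtual_factory_work():
    """The virtual factory runs each job in order and reports each
    result to the datacenter notifications queue."""
    while virtualfactory_queue:
        job, vm = virtualfactory_queue.popleft()
        notifications_queue.append((job, vm, "DONE"))

api_queue_deploy("vm-1")      # queued by any API node
leader_consume_scheduler()    # processed by the leader only
virtual_factory_work()
print(list(notifications_queue))
# -> [('configure', 'vm-1', 'DONE'), ('power_on', 'vm-1', 'DONE')]
```

Because only the leader drains the scheduler and notifications queues, scheduling decisions stay serialized and job results are processed in order, even though any API node may have accepted the original client request.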
No specific configuration is required to add a node to a running cluster. Just replicate the properties on the new node and configure the load balancer used in the environment.
Multiple Node Installation: since the technologies used in our datanode (MariaDB, Redis, RabbitMQ, ZooKeeper) are widely known and well documented on their own project homepages, all issues related to balancing, sharding, replication, and so on are the responsibility of system administrators. We do not currently provide support for installing or configuring these systems' replication or balancing features.