Introduction:
Elasticsearch provides cloud offerings for the following cloud providers: AWS, Azure, and GCP. In this article, we will benchmark Elasticsearch on each of these providers and compare their performance. For the benchmarking we will use EsRally, a tool that is widely recommended and used by the Elasticsearch community.
Computing Resources:
The performance of an Elasticsearch cluster depends on the computing resources allocated to it. Memory, storage, compute, and network are the primary resources involved. To maintain uniformity, we will use a 3-node cluster with the following configuration for all the supported cloud providers.
Note: ARM processors are not available with Microsoft Azure hosting.
Amazon Web Services:
Architecture:
The following are the configurations utilised for the Elasticsearch cluster deployed with AWS.
Note: More information on the hardware profile can be found here
Google Cloud Platform:
Architecture:
The following are the configurations utilised for the Elasticsearch cluster deployed with GCP.
Note: More information on the hardware profile can be found here
Microsoft Azure:
Architecture:
The following are the configurations utilised for the Elasticsearch cluster deployed with Azure.
Note: More information on the hardware profile can be found here
Benchmarking:
Before we begin the benchmarking, we need to create VM instances (for running the load tests) in the corresponding region of each cloud provider. For example, an Elasticsearch cluster set up in AWS US-East will have a VM instance created in the AWS US-East region; similarly, we create one each in GCP and Azure. We will run EsRally from these VMs to ensure the tests don't suffer from additional network latency.
EsRally is a benchmarking framework from Elastic for performing a complete end-to-end performance analysis of an Elasticsearch cluster.
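Before kicking off Rally, it is worth confirming that the load-generation VM really does sit close to the cluster. The following is a minimal sketch, assuming the 8.x Elasticsearch Python client; the endpoint URL and credentials are placeholders for your own deployment, not values from our setup.

```python
# Sanity check run from the load-generation VM before benchmarking.
# Requires: pip install elasticsearch (8.x client assumed).
import time

from elasticsearch import Elasticsearch

ES_ENDPOINT = "https://my-deployment.es.us-east-1.aws.found.io:443"  # placeholder
ES_AUTH = ("elastic", "changeme")                                    # placeholder

es = Elasticsearch(ES_ENDPOINT, basic_auth=ES_AUTH)

# Round-trip a lightweight info call a few times; within the same region
# this should stay in the low single-digit to low tens of milliseconds.
for _ in range(5):
    start = time.perf_counter()
    es.info()
    print(f"round trip: {(time.perf_counter() - start) * 1000:.1f} ms")
```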
The benchmarking will be split into two sections: indexing and search requests. We will make use of the existing metricbeat and geonames tracks, together with the following configuration (the invocation sketch after this list shows how these settings map to Rally track parameters):
Bulk size: 5000 (the track default)
Clients: 8
Primary shards: 1
Replica shards: 1
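The sketch below shows how we drive Rally against a remote cluster with this configuration. It is a sketch only: it assumes Rally 2.x syntax (the `race` subcommand), the target hosts, credentials, and report path are placeholders, and the exact track-parameter names can differ between rally-tracks versions.

```python
# Launch a Rally race from the load-generation VM against the remote cluster.
import subprocess

TRACK = "geonames"  # or "metricbeat"

cmd = [
    "esrally", "race",
    f"--track={TRACK}",
    "--pipeline=benchmark-only",  # benchmark an existing (remote) cluster
    "--target-hosts=my-deployment.es.us-east-1.aws.found.io:443",  # placeholder
    "--client-options=use_ssl:true,verify_certs:true,"
    "basic_auth_user:'elastic',basic_auth_password:'changeme'",    # placeholder
    # Our configuration: bulk size 5000, 8 clients, 1 primary, 1 replica.
    "--track-params=bulk_size:5000,bulk_indexing_clients:8,"
    "number_of_shards:1,number_of_replicas:1",
    "--report-format=csv",
    f"--report-file=/tmp/{TRACK}-report.csv",
]
subprocess.run(cmd, check=True)
```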
The benchmarking was run multiple times and the results below are based on the average from every successful run.
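The averaging itself is straightforward once each run has written a report. The sketch below assumes Rally's CSV summary report with Metric/Task/Value/Unit columns and a "99th percentile latency" row for the bulk-indexing task; the metric labels, the task name (e.g. "index-append"), and the directory layout are assumptions that depend on the Rally and track versions used.

```python
# Average the 99th percentile latency across several Rally report files.
import csv
import glob

METRIC = "99th percentile latency"
TASK = "index-append"  # assumed bulk-indexing task name for the geonames track

values = []
for path in glob.glob("/tmp/run-*/geonames-report.csv"):  # placeholder layout
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            if row["Metric"] == METRIC and row["Task"] == TASK:
                values.append(float(row["Value"]))

if values:
    avg = sum(values) / len(values)
    print(f"{METRIC} ({TASK}): {avg:.2f} ms over {len(values)} runs")
```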
Indexing Benchmark:
Metricbeat Index:
The data set used for this benchmark is Metricbeat data with the following specifications:
Total documents: 1.1 Million
Data volume: 1.1GB
We also want to capture what Rally reports as latency: the time between submission of a request and receipt of the complete response, including the wait time, i.e. the time the request spends waiting until it is ready to be serviced by Elasticsearch.
The average of the 99th percentile latency over the course of multiple runs was:
Amazon Web Services: 10808.44 ms
Google Cloud Platform: 7741.14 ms
Microsoft Azure: 7682.26 ms
Conclusion:
The maximum indexing throughput is as follows:
With 3 nodes in AWS, we got 12900 events per second
With 3 nodes in GCP, we got 17256 events per second
With 3 nodes in Azure, we got 22990 events per second
Geoname Index:
The data set used for this benchmark is geonames data with the following specifications:
Total documents: 11.4 Million
Data volume: 1.1GB
As with the Metricbeat benchmark, we capture the latency, i.e. the time between submission of a request and receipt of the complete response, including the wait time.
The average of the 99th percentile latency over the course of multiple runs was:
Amazon Web Services: 7080.53 ms
Google Cloud Platform: 8298.98 ms
Microsoft Azure: 5253.86 ms
Conclusion:
The maximum indexing throughput is as follows:
With 3 nodes in AWS, we got 35256 events per second
With 3 nodes in GCP, we got 29170 events per second
With 3 nodes in Azure, we got 41417 events per second
Search Benchmark:
For the benchmarking of the search queries, we will look at the 90th percentile of the service time for the following sets of queries; sketches of the kind of request two of these operations issue follow the respective lists.
Metricbeat Index:
auto-date-histogram
auto-date-histogram-with-tz
date-histogram
date-histogram-with-tz
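As an illustration of what these operations exercise, the sketch below issues a date_histogram aggregation in the spirit of the track's date-histogram operation. It is not the exact request body shipped with the track; the index pattern, timestamp field, and interval are assumptions, and the 8.x Python client is assumed.

```python
# Illustrative date_histogram aggregation against Metricbeat-style data.
from elasticsearch import Elasticsearch

es = Elasticsearch("https://my-deployment.es.us-east-1.aws.found.io:443",
                   basic_auth=("elastic", "changeme"))  # placeholders

resp = es.search(
    index="metricbeat-*",   # assumed index pattern
    size=0,
    aggs={
        "events_over_time": {
            "date_histogram": {
                "field": "@timestamp",
                "calendar_interval": "hour",
            }
        }
    },
)
print(resp["aggregations"]["events_over_time"]["buckets"][:3])
```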
Geoname Index:
default
term
phrase
desc_sort_population
asc_sort_population
asc_sort_geonameid
desc_sort_geonameid
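Similarly, the sketch below is roughly the shape of the geonames desc_sort_population operation: a match_all query sorted by population in descending order. The "geonames" index name and the "population" and "name" fields follow the track's defaults and are assumptions here.

```python
# Roughly the shape of the "desc_sort_population" query on the geonames index.
from elasticsearch import Elasticsearch

es = Elasticsearch("https://my-deployment.es.us-east-1.aws.found.io:443",
                   basic_auth=("elastic", "changeme"))  # placeholders

resp = es.search(
    index="geonames",
    query={"match_all": {}},
    sort=[{"population": "desc"}],
    size=10,
)
for hit in resp["hits"]["hits"]:
    print(hit["_source"].get("name"), hit["_source"].get("population"))
```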
From both of the search benchmarks, we can observe that queries against the Azure cluster have almost 6–7 times higher service times compared to AWS and GCP.
Conclusions:
To conclude our experimentation,
For the indexing benchmark, the number of documents indexed per second on Azure is on the higher side compared with AWS and GCP for both the metricbeat and geonames benchmarks. However, given our earlier findings on latency, the indexing rates might well have been similar across all three cloud providers had the latencies been comparable.
For the search benchmarks, it is evident that the Azure Elasticsearch cluster lacks the NVMe storage firepower that AWS and GCP currently possess. This is the primary reason why the search query service times are 6–7 times higher.
After the benchmark exercise, we were able to conclude that the current infrastructure that Elastic Cloud provides for Azure is not able to match the search performance of AWS and GCP. In addition to the indexing and search performance, the pricing for AWS appears comparatively lower than that of GCP and Azure.
Elastic continuously upgrades its infrastructure behind the scenes, and it should only be a matter of time before performance becomes comparable across all the supported cloud providers.