
Automate Your OpenSearch/Elasticsearch Backups with S3 and Lambda: A Complete Guide

· 10 min read

In the world of data management and cloud computing, ensuring data security through regular backups is crucial. OpenSearch and Elasticsearch provide robust mechanisms to back up data using snapshots, offering several approaches to cater to different operational needs. This blog post will walk you through setting up and managing snapshots using AWS, with detailed steps for both beginners and advanced users.



Introduction to Snapshots in OpenSearch and Elasticsearch


Snapshots are point-in-time backups of your OpenSearch or Elasticsearch data. By taking snapshots at regular intervals, you can ensure your data is always backed up, which is especially important in production environments. Snapshots can be scheduled to run automatically, whether hourly, daily, or at another preferred frequency, making it easy to maintain a stable backup routine.


Setting Up an OpenSearch Cluster on AWS


Before diving into snapshot creation, it's essential to set up an OpenSearch cluster. Here is how:

  1. AWS Console Access: Begin by logging into your AWS Console and navigating to OpenSearch.
  2. Cluster Creation: Create a new OpenSearch domain (essentially your cluster) using the "Easy Create" option. This option simplifies the setup process, especially for demonstration or learning purposes.
  3. Instance Selection: For this setup, select a smaller instance size if you are only exploring OpenSearch features and don't require high memory or compute power. For this demo, an m5.large instance with minimal nodes is sufficient.

Configuring the Cluster


When configuring the cluster, adjust the settings according to your requirements (a scripted equivalent of this setup is sketched after the list):


Memory and Storage


  1. Memory and Storage: Set minimal storage (e.g., 10 GB) to avoid unnecessary costs.
  2. Node Count: Choose a single-node setup if you are only testing the system.
  3. Access Control: For simplicity, keep public access open, though in production, you should configure a VPC and control access strictly.
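If you prefer to script the domain creation rather than use the console, a minimal boto3 sketch along these lines should work. The domain name and engine version below are illustrative placeholders, not values from this walkthrough; the instance size, node count, and storage mirror the console choices above.

import boto3

client = boto3.client('opensearch')

# Create a small single-node domain for testing: one m5.large node with 10 GB of EBS storage
response = client.create_domain(
    DomainName='snapshot-demo',              # placeholder name
    EngineVersion='OpenSearch_2.11',         # pick a version available in your region
    ClusterConfig={
        'InstanceType': 'm5.large.search',   # managed OpenSearch instance types end in .search
        'InstanceCount': 1
    },
    EBSOptions={
        'EBSEnabled': True,
        'VolumeType': 'gp3',
        'VolumeSize': 10                     # GB
    }
)
print(response['DomainStatus']['DomainName'])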

Snapshot Architecture: AWS Lambda and S3 Buckets


[Image: Snapshot architecture]


AWS provides a serverless approach to managing snapshots via Lambda and S3 buckets. Here is the basic setup:

  1. Create an S3 Bucket: This bucket will store your OpenSearch snapshots.

[Image: S3 bucket]


  2. Lambda Function for Snapshot Automation: Use AWS Lambda to automate the snapshot process. Configure the Lambda function to run daily or at a frequency of your choice so backups stay consistent and reliable. A scripted way to set up this schedule is sketched below.
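One common way to trigger the function on a schedule is an Amazon EventBridge rule. The rule name, function name, and ARN below are illustrative placeholders; a minimal boto3 sketch might look like this:

import boto3

events = boto3.client('events')
lambda_client = boto3.client('lambda')

# Placeholder ARN of the snapshot Lambda function
function_arn = 'arn:aws:lambda:eu-west-2:<ACCOUNT-NUMBER>:function:opensearch-snapshot'

# Create (or update) a rule that fires once a day
rule = events.put_rule(
    Name='daily-opensearch-snapshot',
    ScheduleExpression='rate(1 day)'
)

# Point the rule at the snapshot Lambda function
events.put_targets(
    Rule='daily-opensearch-snapshot',
    Targets=[{'Id': 'snapshot-lambda', 'Arn': function_arn}]
)

# Allow EventBridge to invoke the function
lambda_client.add_permission(
    FunctionName='opensearch-snapshot',
    StatementId='allow-eventbridge-daily-snapshot',
    Action='lambda:InvokeFunction',
    Principal='events.amazonaws.com',
    SourceArn=rule['RuleArn']
)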

[Image: Lambda function]


Writing the Lambda Code


For the Lambda function, Python is a convenient choice, but you can choose other languages as well. The Lambda function will connect to OpenSearch, initiate a snapshot, and store it in the S3 bucket. Here is a simple breakdown of the code structure:

import boto3, os, time
import requests
from requests_aws4auth import AWS4Auth
from datetime import datetime

# Global configuration pulled from Lambda environment variables.
# host must include https:// and a trailing /
host = str(os.getenv('host'))
region = str(os.getenv('region', 'eu'))
s3Bucket = str(os.getenv('s3Bucket'))
s3_base_path = str(os.getenv('s3_base_path', 'daily'))
s3RepoName = str(os.getenv('s3RepoName'))
roleArn = str(os.getenv('roleArn'))

service = 'es'
credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key, region, service, session_token=credentials.token)

def lambda_handler(event, context):
    # Snapshot names must be lowercase and cannot contain colons, so use hyphens
    datestamp = datetime.now().strftime('%Y-%m-%dt%H-%M-%S')
    snapshotName = 'snapshot-' + datestamp

    # 1. Register (or re-register) the snapshot repository
    url = host + '_snapshot/' + s3RepoName

    # Repository settings. The endpoint below targets us-east-1; remove or adjust it for other regions.
    payload = {
        "type": "s3",
        "settings": {
            "bucket": s3Bucket,
            "base_path": s3_base_path,
            "endpoint": "s3.amazonaws.com",
            "role_arn": roleArn
        }
    }
    headers = {"Content-Type": "application/json"}

    r = requests.put(url, auth=awsauth, json=payload, headers=headers)
    print(r.status_code)
    print(r.text)

    # 2. Write a small marker object to the bucket so each run is easy to trace
    s3_path = s3_base_path + '/' + snapshotName + '.txt'
    s3 = boto3.resource("s3")
    s3.Bucket(s3Bucket).put_object(Key=s3_path, Body=snapshotName)
    print(f"Created {s3_path}")

    # 3. Take the snapshot. The URL looks similar to the repository registration above,
    # but appending the snapshot name is what triggers snapshot creation. The datestamp
    # in the name means each run produces a separate snapshot.
    url = host + '_snapshot/' + s3RepoName + '/' + snapshotName

    while True:
        response = requests.put(url, auth=awsauth)
        status_code = response.status_code
        print("status_code == " + str(status_code))
        if status_code >= 500:
            # The cluster may still be busy; wait and retry
            print("5xx thrown. Sleeping for 200 seconds.. zzzz...")
            time.sleep(200)
        else:
            print(f"Snapshot {snapshotName} successfully taken")
            break

    print(response.text)

  1. Snapshot API Call: The code uses the OpenSearch snapshot API to trigger snapshot creation; you can customize how often snapshots are taken.
  2. Error Handling: For scenarios where snapshots take long or the cluster is busy, the retry loop handles 5xx responses from the API.
  3. Permissions Setup: Grant your Lambda function the necessary permissions to access OpenSearch and the S3 bucket. This includes setting up roles and policies in AWS Identity and Access Management (IAM).
  4. Invocation Permissions: The Lambda function's execution role needs access to the OpenSearch domain, and it must be allowed to upload the marker objects to the S3 bucket:

{
  "Effect": "Allow",
  "Action": [
    "s3:PutObject"
  ],
  "Resource": [
    "arn:aws:s3:::<BUCKET-NAME>/*"
  ]
}

Creating an AWS Lambda Layer for the Requests Library

To create a custom AWS Lambda layer for the requests library, follow these steps. Packaging requests as a layer lets you reuse it across multiple Lambda functions.


Step 1: Prepare the Requests Dependency


Since Lambda layers require dependencies to be packaged separately, we need to install the requests library in a specific structure.


Set Up a Local Directory

Create a folder structure for installing the dependency.


mkdir requests-layer
cd requests-layer
mkdir python

Install the Requests Library

Use pip to install requests into the python folder:


pip install requests -t python/

Verify Installation

Check that the python directory contains the installed requests package:


ls python

You should see a folder named requests, confirming that the package was installed successfully.


Step 2: Create a Zip Archive of the Layer

After installing the dependencies, zip the python directory:


zip -r requests-layer.zip python

This creates a requests-layer.zip file that you will upload as a Lambda layer.


Step 3: Upload the Layer to AWS Lambda


  1. Open the AWS Lambda Console.

[Image: New Layer]

  2. Select Layers from the left-hand navigation.

[Image: Layers]

  3. Click Create layer and configure the layer:
  4. Name: Provide a name like requests-layer.
  5. Description: Optionally, describe the purpose of the layer.
  6. Upload the .zip file: Choose the requests-layer.zip file you created.
  7. Compatible runtimes: Choose the runtime(s) that match your Lambda function, such as Python 3.8, Python 3.9, or Python 3.10.
  8. Create the layer: Click Create to upload the layer.

Step 4: Add the Layer to Your Lambda Function


  1. Open your Lambda function: In the Lambda Console, open the Lambda function where you want to use requests.
  2. Add the layer: In the Layers section, click Add a layer.
  3. Select Custom layers and choose the requests-layer.
  4. Select the specific version (if there are multiple versions).
  5. Click Add.
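If you prefer to script these console steps, a minimal boto3 sketch along the following lines should work. The function name (opensearch-snapshot) is an illustrative placeholder, and note that update_function_configuration replaces the function's entire layer list.

import boto3

lambda_client = boto3.client('lambda')

# Publish the zip created earlier as a new layer version
with open('requests-layer.zip', 'rb') as f:
    layer = lambda_client.publish_layer_version(
        LayerName='requests-layer',
        Content={'ZipFile': f.read()},
        CompatibleRuntimes=['python3.9', 'python3.10']
    )

# Attach the layer to the snapshot function. This call overwrites the
# function's existing layer list, so include any other layers it needs.
lambda_client.update_function_configuration(
    FunctionName='opensearch-snapshot',   # placeholder function name
    Layers=[layer['LayerVersionArn']]
)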


OpenSearch Dashboard Configuration


[Image: OpenSearch Dashboard]


The OpenSearch Dashboard (formerly Kibana) is your go-to for managing and monitoring OpenSearch. Here is how to set up your snapshot role in the dashboard:


  1. Access the Dashboard: Navigate to the OpenSearch Dashboard using the provided domain link.
  2. Role Setup: Go to the security settings and create a new role for managing snapshots, and grant it permissions to access the necessary indices and the S3 bucket. The IAM role that OpenSearch assumes for snapshots needs the following trust and permissions policies:

Trust Policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": [
          "opensearch.amazonaws.com",
          "es.amazonaws.com"
        ]
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Role Policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": [
        "arn:aws:s3:::BUCKET-NAME"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:*Object"
      ],
      "Resource": [
        "arn:aws:s3:::BUCKET-NAME/*"
      ]
    },
    {
      "Sid": "ESaccess",
      "Effect": "Allow",
      "Action": [
        "es:*"
      ],
      "Resource": [
        "arn:aws:es:eu-west-2:<ACCOUNT-NUMBER>:domain/*"
      ]
    }
  ]
}

  3. Mapping the Role: Map the new role to your Lambda function's IAM role to ensure seamless access, as shown in the sketch below.

[Image: Mapping the role]
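With fine-grained access control enabled, one way to perform this mapping outside the UI is the security plugin's roles-mapping API. The following is a minimal sketch, assuming the built-in manage_snapshots role and reusing the signed-request setup from the Lambda code; the domain endpoint and role ARN are placeholders.

import boto3, requests
from requests_aws4auth import AWS4Auth

host = 'https://<DOMAIN-ENDPOINT>/'   # include https:// and trailing /
lambda_role_arn = 'arn:aws:iam::<ACCOUNT-NUMBER>:role/<LAMBDA-ROLE>'   # placeholder

credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key, 'eu-west-2', 'es',
                   session_token=credentials.token)

# Map the Lambda execution role to the built-in manage_snapshots role
url = host + '_plugins/_security/api/rolesmapping/manage_snapshots'
payload = {"backend_roles": [lambda_role_arn]}

r = requests.put(url, auth=awsauth, json=payload, headers={"Content-Type": "application/json"})
print(r.status_code, r.text)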


Setting Up Snapshot Policies in the Dashboard


The OpenSearch Dashboard allows you to create policies for managing snapshots, making it easy to define backup schedules and retention periods. Here is how (an equivalent API call is sketched after the list):


  1. Policy Configuration: Define your backup frequency (daily, weekly, etc.) and the retention period for each snapshot.
  2. Retention Period: Set the maximum number of snapshots to keep, ensuring that old snapshots are automatically deleted to save space.

[Image: Notification channel]


  3. Notification Channel: You can set up notifications (e.g., via Amazon SNS) to alert you if a snapshot operation fails.
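The same schedule and retention settings can also be created through OpenSearch's Snapshot Management API rather than the dashboard. The sketch below assumes a recent OpenSearch version that exposes _plugins/_sm/policies and reuses the signed-request setup from the Lambda code; the schedule, retention values, and repository name are illustrative, so verify the field names against your domain's documentation before relying on it.

import boto3, requests
from requests_aws4auth import AWS4Auth

host = 'https://<DOMAIN-ENDPOINT>/'   # include https:// and trailing /
credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key, 'eu-west-2', 'es',
                   session_token=credentials.token)

# Daily snapshots at 01:00 UTC, keeping at most 30 and deleting anything older than 14 days
policy = {
    "description": "Daily snapshots with a 14-day retention",
    "creation": {
        "schedule": {"cron": {"expression": "0 1 * * *", "timezone": "UTC"}}
    },
    "deletion": {
        "schedule": {"cron": {"expression": "0 2 * * *", "timezone": "UTC"}},
        "condition": {"max_age": "14d", "max_count": 30, "min_count": 7}
    },
    "snapshot_config": {"repository": "<REPO-NAME>"}   # placeholder repository
}

url = host + '_plugins/_sm/policies/daily-snapshot-policy'
r = requests.post(url, auth=awsauth, json=policy, headers={"Content-Type": "application/json"})
print(r.status_code, r.text)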

Testing and Troubleshooting Your Snapshot Setup


Once your setup is complete, it is time to test it (a quick status check you can run is sketched after the list):

  1. Run a Test Snapshot: Trigger your Lambda function manually and check your S3 bucket for the snapshot data.
  2. Verify Permissions: If you encounter errors, check your IAM roles and permissions. Snapshot failures often occur due to insufficient permissions, so make sure both the OpenSearch and S3 roles are configured correctly.
  3. Monitor Logs: Use CloudWatch logs to review the execution of your Lambda function, which will help in troubleshooting any issues that arise.
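To confirm that a snapshot actually completed, you can query the snapshot API directly. Here is a minimal sketch that reuses the signed-request setup from the Lambda code; the domain endpoint and repository name are placeholders.

import boto3, requests
from requests_aws4auth import AWS4Auth

host = 'https://<DOMAIN-ENDPOINT>/'   # include https:// and trailing /
repo = '<REPO-NAME>'                  # placeholder repository name

credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key, 'eu-west-2', 'es',
                   session_token=credentials.token)

# List all snapshots in the repository with their state (SUCCESS, IN_PROGRESS, FAILED, ...)
r = requests.get(host + '_snapshot/' + repo + '/_all', auth=awsauth)
for snap in r.json().get('snapshots', []):
    print(snap['snapshot'], snap['state'])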

Disaster Recovery and Restoring Snapshots


In the unfortunate event of data loss or a disaster, restoring your data from snapshots is straightforward. Here is a simple guide (a restore call is sketched after the list):

  1. New Cluster Setup: If your original cluster is lost, create a new OpenSearch domain.
  2. Restore Snapshot: Use the OpenSearch API to restore the snapshot from your S3 bucket.
  3. Cluster Health Check: Once restored, check the health of your cluster and validate that your data is fully recovered.
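A restore is a single call to the _restore endpoint. The sketch below reuses the signed-request setup from the Lambda code; the endpoint, repository, and snapshot names are placeholders, and excluding dashboard/system indices as shown here is often necessary on managed domains because indices such as .kibana already exist on the new cluster.

import boto3, requests
from requests_aws4auth import AWS4Auth

host = 'https://<NEW-DOMAIN-ENDPOINT>/'    # include https:// and trailing /
repo = '<REPO-NAME>'
snapshot = 'snapshot-2024-01-01t00-00-00'  # placeholder snapshot name

credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key, 'eu-west-2', 'es',
                   session_token=credentials.token)

# Restore everything except dashboard/system indices, which usually already exist
payload = {
    "indices": "-.kibana*,-.opendistro*",
    "include_global_state": False
}

url = host + '_snapshot/' + repo + '/' + snapshot + '/_restore'
r = requests.post(url, auth=awsauth, json=payload, headers={"Content-Type": "application/json"})
print(r.status_code, r.text)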

Conclusion


Using AWS Lambda and S3 for snapshot management in OpenSearch provides a scalable and cost-effective solution for data backup and recovery. By setting up automated snapshots, you can ensure that your data is consistently backed up without manual intervention. With the additional security and monitoring tools provided by AWS, maintaining the integrity and availability of your OpenSearch data becomes a manageable task.


Explore the various options within AWS and OpenSearch to find the configuration that best fits your environment. And as always, remember to test your setup thoroughly to prevent unexpected issues down the line.


For more tips on OpenSearch, AWS, and other cloud solutions, subscribe to our newsletter and stay up-to-date with the latest in cloud technology! Ready to take your cloud infrastructure to the next level? Please reach out to us.

Mastering Data Transfer Times for Cloud Migration

· 7 min read

First, let's understand what cloud data transfer is and its significance. In today's digital age, many applications are transitioning to the cloud, often resulting in hybrid models where components may reside on-premises or in cloud environments. This shift necessitates robust data transfer capabilities to ensure seamless communication between on-premises and cloud components.

Businesses are moving towards cloud services not because they enjoy managing data centers, but because they aim to run their operations more efficiently. Cloud providers specialize in managing data center operations, allowing businesses to focus on their core activities. This fundamental shift underlines the need for ongoing data transfer from on-premises infrastructure to cloud environments.

To give you a clearer picture, we present an indicative reference architecture focusing on Azure (though similar principles apply to AWS and Google Cloud). This architecture includes various components such as virtual networks, subnets, load balancers, applications, databases, and peripheral services like Azure Monitor and API Management. This setup exemplifies a typical scenario for a hybrid application requiring data transfer between cloud and on-premises environments.

[Image: Indicative reference architecture]

Calculating Data Transfer Times

A key aspect of cloud migration is understanding how to efficiently transfer application data. We highlight useful tools and calculators that have aided numerous cloud migrations. For example, the decision between using AWS Snowball, Azure Data Box, or internet transfer is a common dilemma. These tools help estimate the time required to transfer data volumes across different bandwidths, offering insights into the most cost-effective and efficient strategies. The following calculators can be used to estimate data transfer times and costs.

Ref: https://cloud.google.com/architecture/migration-to-google-cloud-transferring-your-large-datasets#time

Ref: https://learn.microsoft.com/en-us/azure/storage/common/storage-choose-data-transfer-solution

The following image from the Google documentation provides a good chart of data size with respect to network bandwidth:

[Image: Data size vs. network bandwidth chart]
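As a quick back-of-the-envelope check before reaching for the calculators above, transfer time is roughly data size divided by effective throughput. Here is a minimal sketch; the 80% efficiency factor is an illustrative assumption for protocol overhead and link sharing.

# Rough transfer-time estimate: size / (bandwidth * efficiency)
def transfer_time_days(data_gb: float, bandwidth_mbps: float, efficiency: float = 0.8) -> float:
    bits = data_gb * 8 * 10**9                   # data size in bits (decimal GB)
    usable_bps = bandwidth_mbps * 10**6 * efficiency
    return bits / usable_bps / 86400             # seconds -> days

# Example: 100 TB over a 1 Gbps link at 80% efficiency
print(round(transfer_time_days(100_000, 1000), 1), "days")   # roughly 11.6 days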

Cost-Effective Data Transfer Strategies

Simplification is the name of the game when it comes to data transfer. Utilizing simple commands and tools like Azure's azcopy, AWS S3 sync, and Google's equivalent services can significantly streamline the process. Moreover, working closely with the networking team to schedule transfers during off-peak hours and chunking data to manage bandwidth utilization are strategies that can minimize disruption and maximize efficiency; a chunked-upload sketch follows the checklist below.

[x] Leverage SDKs and APIs where applicable
[x] Work with the organization's network team
[x] Try to split data transfers and leverage resumable transfers
[x] Compress and optimize the data
[x] Use Content Delivery Networks (CDNs), caching, and regions closer to the data
[x] Leverage cloud provider products to their strengths and do your own analysis
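As one concrete example of chunking and throttling from code, boto3's managed S3 transfers accept a TransferConfig. The chunk size, concurrency, and bandwidth cap below are illustrative values, and the bucket and file names are placeholders.

import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client('s3')

# Split the upload into 64 MB parts, limit parallelism, and cap bandwidth
# so the transfer does not saturate the link during business hours.
config = TransferConfig(
    multipart_threshold=64 * 1024 * 1024,   # switch to multipart above 64 MB
    multipart_chunksize=64 * 1024 * 1024,   # 64 MB parts
    max_concurrency=4,
    max_bandwidth=50 * 1024 * 1024          # ~50 MB/s cap
)

s3.upload_file('backup-2024.tar.gz', '<BUCKET-NAME>', 'migration/backup-2024.tar.gz', Config=config)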

Deep Dive Comparison

We compare data transfer services across AWS, Azure, and Google Cloud, covering direct connectivity options, transfer acceleration mechanisms, physical data transfer appliances, and services tailored for large data movements. Each cloud provider offers unique solutions, from AWS's Direct Connect and Snowball to Azure's ExpressRoute and Data Box, and Google Cloud's Interconnect and Transfer Appliance.

| AWS | Azure | GCP |
| --- | --- | --- |
| AWS Direct Connect: provides a dedicated network connection from on-premises to AWS. | Azure ExpressRoute: offers private connections between Azure data centers and infrastructure. | Cloud Interconnect: provides direct physical connections to Google Cloud. |
| Amazon S3 Transfer Acceleration: speeds up the transfer of files to S3 using optimized network protocols. | Azure Blob Storage transfer: accelerates data transfer to Blob storage using Azure's global network. | Google Transfer Appliance: a rackable high-capacity storage server for large data transfers. |
| AWS Snowball/Snowmobile: physical devices for transporting large volumes of data into and out of AWS. | Azure Data Box: devices to transfer large amounts of data into Azure Storage. | Google Transfer Appliance: a high-capacity storage device that can transfer and securely ship data to a Google upload facility; available in two configurations of 100 TB or 480 TB of raw storage capacity (up to 200 TB or 1 PB compressed). |
| AWS Storage Gateway: connects on-premises software applications with cloud-based storage. | Azure Import/Export: service for importing/exporting large amounts of data using hard drives and SSDs. | Google Cloud Storage Transfer Service: provides similar, though not identical, services, such as DataPrep. |
| AWS DataSync: automates data transfer between on-premises storage and AWS services. | Azure File Sync: synchronizes files across Azure File shares and on-premises servers. | Google Cloud Storage Transfer Service: automates data synchronization to and from GCP Storage and external sources. |
| CloudEndure: works with both Linux and Windows VMs hosted on hypervisors, including VMware, Hyper-V, and KVM, and also supports workloads on physical servers as well as cloud-based workloads running in AWS, Azure, Google Cloud Platform, and other environments. | Azure Site Recovery: helps your business keep doing business, even during major IT outages, offering ease of deployment, cost effectiveness, and dependability. | Migrate for Compute Engine: lifts and shifts on-premises apps to GCP. |

Conclusion

As we wrap up our exploration of data transfer speeds and the corresponding services provided by AWS, Azure, and GCP, it should be clear which options to consider for a given data size, and that each platform offers a wealth of options designed to meet the diverse needs of businesses moving and managing big data. Whether you require direct network connectivity, physical data transport devices, or services that synchronize your files across cloud environments, there is a solution tailored to your specific requirements.

Choosing the right service hinges on various factors such as data volume, transfer frequency, security needs, and the level of integration required with your existing infrastructure. AWS shines with its comprehensive services like Direct Connect and Snowball for massive data migration tasks. Azure's strength lies in its enterprise-focused offerings like ExpressRoute and Data Box, which ensure seamless integration with existing systems. Meanwhile, GCP stands out with its Interconnect and Transfer Appliance services, catering to those deeply invested in analytics and cloud-native applications.

Each cloud provider has clearly put significant thought into how to alleviate the complexities of big data transfers. By understanding the subtleties of each service, organizations can make informed decisions that align with their strategic goals, ensuring a smooth and efficient transition to the cloud.

As the cloud ecosystem continues to evolve, the tools and services for data transfer are bound to expand and innovate further. Businesses should stay informed of these developments to continue leveraging the best that cloud technology has to offer. In conclusion, the journey of selecting the right data transfer service is as critical as the data itself, paving the way for a future where cloud-driven solutions are the cornerstones of business operations.

Call to Action

Choosing the right platform depends on your organization's needs. For more insights on cloud computing, tips, and the latest trends in technology, subscribe to our newsletter or follow our video series on cloud comparisons.

Interested in having your organization set up on the cloud? If so, please contact us and we'll be more than glad to help you embark on your cloud journey.