AWS DataSync Service

AWS DataSync: Documentation, How it Works, Price, and Tutorial

What is DataSync?

AWS DataSync is a managed AWS service that simplifies, automates, and accelerates copying large amounts of data to and from AWS Cloud storage services. DataSync is used to transfer data between on-premises storage and the AWS Cloud, and between AWS Cloud storage services. As a managed service, DataSync reduces the need to modify applications, develop scripts, or manage infrastructure to copy data between storage systems and services. This article covers AWS DataSync pricing, how the service works, and how to monitor it.

Note: DataSync supports data copy or synchronization tasks from a supported source storage system or service to a supported destination storage system or service. DataSync tasks support data transfers in only one direction.

DataSync can copy data to and from:

  • Network File System (NFS) file servers
  • Server Message Block (SMB) file servers
  • Hadoop Distributed File System (HDFS)
  • Object storage systems
  • Amazon Simple Storage Service (Amazon S3) buckets
  • Amazon EFS file systems
  • Amazon FSx for Windows File Server file systems
  • Amazon FSx for Lustre file systems
  • Amazon FSx for OpenZFS file systems
  • Amazon FSx for NetApp ONTAP file systems
  • AWS Snowcone devices

AWS DataSync Documentation

DataSync Features:

Using DataSync features, you can implement secure data transfers from on-premises storage to the AWS Cloud, and between several data storage services inside the AWS Cloud. DataSync resolves and streamlines the data transfer difficulties that many organizations face. Some key features are given below:

[Figure: DataSync documentation]

AWS DataSync Monitoring

Amazon CloudWatch – You can review the records of completed transfers and track ongoing transfers using Amazon CloudWatch. The logs record which files were transferred and when, along with the results of data integrity verification. When a transfer finishes, CloudWatch events are generated, which you can use to trigger automations that depend on the transfer being complete.
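As a rough illustration of reacting to a finished transfer, the sketch below builds an event pattern that could be used in an Amazon EventBridge (CloudWatch Events) rule. The source name "aws.datasync", the detail type "DataSync Task Execution State Change", and the "State" field reflect our understanding of the event format and should be checked against the current DataSync documentation. The matches helper is a simplified, hypothetical stand-in for the service's own matching logic, included only so the pattern can be tried locally.

```python
def build_completion_pattern(states=("SUCCESS",)):
    """Return an event pattern for DataSync task execution state changes.

    Assumption: field names follow the EventBridge format for DataSync events.
    """
    return {
        "source": ["aws.datasync"],
        "detail-type": ["DataSync Task Execution State Change"],
        "detail": {"State": list(states)},
    }


def matches(pattern, event):
    """Simplified local matcher: every pattern key must match the event.

    Lists mean "any of these values"; nested dicts are matched recursively.
    This is NOT the full EventBridge matching algorithm.
    """
    for key, expected in pattern.items():
        actual = event.get(key)
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


# Hypothetical sample event for a completed transfer.
sample_event = {
    "source": "aws.datasync",
    "detail-type": "DataSync Task Execution State Change",
    "detail": {"State": "SUCCESS"},
}
pattern = build_completion_pattern()
```

A rule created with this pattern could then invoke a Lambda function or send a notification once the transfer completes.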

DataSync Terminology and Components:

Agents – An agent is a virtual machine (VM) used to read from or write to a location. It handles on-premises reads and writes against NFS or SMB file shares, object storage, or HDFS clusters. For in-cloud storage, the agent’s use and placement may differ depending on the AWS storage service.

Note: No agent for DataSync is required to transfer between AWS storage services in the AWS Cloud.

Working with AWS DataSync agents – AWS DataSync (amazon.com)

Agent deployment options – You can choose how many agents to deploy based on how many storage systems or services you need to reach. The local DataSync agent opens multi-threaded connections to the cloud services, parallelizing transfers to improve performance over wide area networks (WAN). DataSync also scales cloud resources automatically to support higher-volume transfers, and the service streamlines adding agents to your on-premises environment. The ways you can deploy DataSync agents are:

[Figure: DataSync components]

Agent status – The status ONLINE indicates that the agent is functional. An agent changes to OFFLINE status if it is unable to connect to AWS.
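A minimal sketch of checking agent status programmatically follows. It assumes a response shaped like the one returned by the DataSync ListAgents API (an "Agents" list whose entries carry "AgentArn", "Name", and "Status"); the ARNs and names in the sample are made up for illustration.

```python
def offline_agents(list_agents_response):
    """Return the ARNs of agents whose status is OFFLINE."""
    return [
        agent["AgentArn"]
        for agent in list_agents_response.get("Agents", [])
        if agent.get("Status") == "OFFLINE"
    ]


# Hypothetical response; in practice this could come from
# boto3.client("datasync").list_agents().
sample_response = {
    "Agents": [
        {"AgentArn": "arn:aws:datasync:us-east-2:111122223333:agent/agent-a",
         "Name": "datacenter-1", "Status": "ONLINE"},
        {"AgentArn": "arn:aws:datasync:us-east-2:111122223333:agent/agent-b",
         "Name": "datacenter-2", "Status": "OFFLINE"},
    ]
}
unreachable = offline_agents(sample_response)
```

An empty result means every listed agent can currently connect to AWS.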

Locations – A location is an endpoint of a task: any source or destination storage system or service used in the data transfer. Whether it is on-premises or in the AWS Cloud, the same location can serve as the source in one DataSync task and the destination in another. Each task has exactly one source location and one destination location. DataSync supports the following location types:

  • Network File System (NFS)
  • Server Message Block (SMB)
  • Hadoop Distributed File System (HDFS)
  • Object storage systems
  • Amazon Elastic File System (Amazon EFS)
  • Amazon FSx for Windows File Server
  • Amazon FSx for Lustre
  • Amazon FSx for OpenZFS
  • Amazon FSx for NetApp ONTAP
  • Amazon S3

Note: DataSync supports asynchronous single-direction synchronization. DataSync has no collision-detection or version-reconciliation mechanisms. Bidirectional synchronization tasks are not supported.

Working with AWS DataSync locations – AWS DataSync (amazon.com)

Tasks and task configuration settings – A task is an individual data transfer. Configuration settings include the source and destination locations, options, filtering, scheduling and frequency, tags, and logging.

Working with AWS DataSync tasks – AWS DataSync (amazon.com)

Managing modified and deleted files – You can choose how DataSync handles files that were modified and deleted on the source storage when running your data transfer task.
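To make the modified and deleted file choices concrete, here is a small sketch that turns three yes/no decisions into a task options dictionary. The key names and values (PreserveDeletedFiles, OverwriteMode, TransferMode) follow the Options structure of the DataSync CreateTask API as we understand it; the function name and its parameters are our own and purely illustrative.

```python
def build_task_options(keep_deleted=True, overwrite=True, only_changed=True):
    """Map plain yes/no choices onto DataSync task option values.

    keep_deleted: keep destination files that no longer exist on the source.
    overwrite:    overwrite destination files that changed on the source.
    only_changed: copy only data that differs, instead of everything.
    """
    return {
        "PreserveDeletedFiles": "PRESERVE" if keep_deleted else "REMOVE",
        "OverwriteMode": "ALWAYS" if overwrite else "NEVER",
        "TransferMode": "CHANGED" if only_changed else "ALL",
    }


# Mirror the source exactly: remove files deleted at the source.
mirror_options = build_task_options(keep_deleted=False)
```

A dictionary like this could be passed as the Options argument when creating or starting a task.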

Task execution – A task execution is a single run of a task. It reports details such as start and end times, the number of files transferred, and status. A task execution moves through five transitional phases (QUEUED, LAUNCHING, PREPARING, TRANSFERRING, and VERIFYING) and ends in one of two terminal states. To increase throughput and speed up the transfer, several of these activities run in parallel.

Data integrity verification – DataSync locally calculates checksums for every file in the source and destination file systems. The checksums are compared to verify data integrity. Data integrity verification occurs regardless of your chosen VerifyMode selection.
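The idea of comparing checksums can be illustrated with a short local sketch. It is not DataSync's implementation: the choice of SHA-256 and the function names are assumptions made for the example, since the algorithm DataSync uses is not stated here.

```python
import hashlib


def file_checksum(path, chunk_size=1024 * 1024):
    """Compute a SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copy(source_path, destination_path):
    """Return True when both files have identical checksums."""
    return file_checksum(source_path) == file_checksum(destination_path)
```

If a file changes while it is being copied, the two digests differ and the check fails, which is the same kind of mismatch that a verification phase is meant to catch.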

Open and locked file handling – In general, DataSync transfers open files without any limitations. If a file is open and being written during the transfer, DataSync detects data inconsistency in the verifying phase and marks the file as failed.

If a file is locked and the source system prevents the file from being opened, DataSync skips transferring the file. DataSync logs the error during the transferring phase and sends a verification error.

Termination status – A transfer either succeeds or fails:

  • Successful: if the data transfer succeeds, a SUCCESS value is returned.
  • Failed: if the data transfer fails, an ERROR value is returned.
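Waiting for one of the two termination values can be sketched as a small polling loop. The describe argument is expected to behave like the describe_task_execution call of a boto3 DataSync client (keyword argument TaskExecutionArn, response containing "Status"); the function name, the polling interval, and the injectable sleep are our own choices for illustration.

```python
import time

TERMINAL_STATES = ("SUCCESS", "ERROR")


def wait_for_execution(describe, execution_arn, poll_seconds=30, sleep=time.sleep):
    """Poll a task execution until it reaches SUCCESS or ERROR.

    describe: callable such as boto3.client("datasync").describe_task_execution
    sleep:    injectable so the loop can be exercised without real delays
    """
    while True:
        status = describe(TaskExecutionArn=execution_arn)["Status"]
        if status in TERMINAL_STATES:
            return status
        # Any other status is treated as "still in progress".
        sleep(poll_seconds)


# Example usage (requires boto3 and AWS credentials):
#   import boto3
#   client = boto3.client("datasync")
#   result = wait_for_execution(client.describe_task_execution, execution_arn)
```

The function returns the final status string, so the caller decides how to handle an ERROR result.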

How DataSync Works

The DataSync working model is straightforward. In DataSync, you can transfer data in three ways:

  • Transfer data between on-premises and AWS
  • Transfer data between AWS storage services
  • Transfer data between AWS and other locations

Steps to transfer data between AWS storage services with the help of DataSync:

1. Log in to the AWS Management Console.

2. Open the DataSync service in the console.

3. The DataSync home page opens.

4. Click the ‘Create Task’ button. Choose ‘Create a new location’ and set the location type to ‘S3’. Because this is the source location, choose the Region where your data currently resides.

5. Choose the bucket and the folder that hold the data you want to transfer to another Region. For the IAM role, click ‘Autogenerate’ or supply a role you created yourself. Click ‘Next’.

6. For the destination location, choose ‘Create a new location’ and set the location type to ‘S3’. Then choose the Region you want to transfer your data to.

7. After choosing the Region, choose the bucket and folder where you want to save your data. For the IAM role, click ‘Autogenerate’ or supply a role you created yourself. Click the ‘Next’ button.

8. Enter a ‘Task Name’ and leave the other settings at their defaults, or adjust them to suit your requirements. Click the ‘Next’ button.

9. Finally, review everything and click ‘Create Task’.

10. To start transferring data, click the ‘Start’ button and keep the default option.

11. Once the task starts, its status shows as ‘Running’.

12. When the task completes, you can see the full details under the ‘History’ option. A status of success means your data was transferred successfully.
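The console steps above can also be approximated in code. The following is a minimal sketch, assuming a boto3 DataSync client and an existing IAM role that DataSync can use to access both buckets (the console's ‘Autogenerate’ option has no direct equivalent here). The function name, its parameters, and the default task name are hypothetical. For simplicity it uses a single client and therefore a single Region; a cross-Region transfer would need additional handling that is not shown.

```python
def run_s3_to_s3_transfer(client, source_bucket_arn, dest_bucket_arn, role_arn,
                          source_prefix="/", dest_prefix="/",
                          task_name="s3-to-s3-copy"):
    """Create source and destination S3 locations, a task, and start it.

    client: an object with the boto3 DataSync client methods used below.
    Returns the ARN of the started task execution.
    """
    # Steps 4-5: source location (bucket, folder, IAM role).
    source = client.create_location_s3(
        S3BucketArn=source_bucket_arn,
        Subdirectory=source_prefix,
        S3Config={"BucketAccessRoleArn": role_arn},
    )
    # Steps 6-7: destination location.
    destination = client.create_location_s3(
        S3BucketArn=dest_bucket_arn,
        Subdirectory=dest_prefix,
        S3Config={"BucketAccessRoleArn": role_arn},
    )
    # Steps 8-9: create the task with default settings.
    task = client.create_task(
        SourceLocationArn=source["LocationArn"],
        DestinationLocationArn=destination["LocationArn"],
        Name=task_name,
    )
    # Step 10: start the transfer.
    execution = client.start_task_execution(TaskArn=task["TaskArn"])
    return execution["TaskExecutionArn"]


# Example usage (requires boto3 and AWS credentials; ARNs are placeholders):
#   import boto3
#   client = boto3.client("datasync", region_name="us-east-2")
#   arn = run_s3_to_s3_transfer(client, "arn:aws:s3:::my-source-bucket",
#                               "arn:aws:s3:::my-destination-bucket",
#                               "arn:aws:iam::111122223333:role/my-datasync-role")
```

The returned execution ARN can be used to look up the run's status, which corresponds to checking ‘History’ in the console.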

For further details, you can check this AWS DataSync tutorial:

Tutorial: Transferring data from Amazon S3 to Amazon S3 in a different AWS account – AWS DataSync

AWS DataSync Use Cases

You can use DataSync whenever you need to copy data between on-premises file shares or compatible object storage and supported AWS storage services, or between AWS storage services in the cloud. Some use cases are:

Data Migration: Migrating data is a primary DataSync use case. DataSync is designed to help you migrate data from on-premises to the AWS Cloud and also migrate between AWS storage services.

Data Archiving: Using DataSync, you can automatically move cold data from expensive on-premises storage systems to secure, long-term storage on the AWS Cloud. To keep the data easy to access, you can transfer your archived data into the appropriate Amazon S3 storage class. For cost-effective, long-term archival storage, you can copy directly into an Amazon S3 Glacier storage class.

Data Protection and Recovery: With DataSync, you can create many different copy scenarios, including the following:

  • Copy your data directly into any of the Amazon S3 storage classes that meet your data protection needs.
  • Copy your data into Amazon EFS or Amazon FSx to create a standby file system copy.
  • Copy across accounts using Amazon S3 cross-account transfers.
  • Further, protect your data with additional copies located in other AWS Regions or different AWS storage services.

Data transfer for immediate in-cloud processing: A popular use case is to copy data to the AWS Cloud to perform in-cloud processing on the data. A few popular examples include the following:

  • Compute-intensive application processing
  • Data manipulation
  • Data analytics processing
  • Machine learning modeling or batch inference processing
  • Genomics sequencing

AWS DataSync Price

DataSync offers predictable, usage-based pricing: you pay only for the data you actually copy. A flat per-gigabyte fee covers the network acceleration technology, managed cloud infrastructure, data validation, and automation that DataSync provides. There are no upfront charges, no minimum fees, and no resources for you to manage. If you use other AWS services alongside DataSync, additional charges apply.

At the time of writing, the AWS DataSync price for data copied in the US East (Ohio) Region is $0.0125 per gigabyte (GB).
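As a quick worked example of the flat per-gigabyte fee, the sketch below multiplies a data volume by the $0.0125 rate quoted above. It covers only the DataSync copy fee, not the additional AWS service charges, and the rate is passed as a parameter because prices differ by Region and change over time. The function name is our own.

```python
from decimal import Decimal

# Rate quoted in this article for US East (Ohio); check current pricing.
PRICE_PER_GB = Decimal("0.0125")


def datasync_copy_cost(gigabytes, price_per_gb=PRICE_PER_GB):
    """Return the DataSync copy fee in US dollars, rounded to cents."""
    return (Decimal(str(gigabytes)) * price_per_gb).quantize(Decimal("0.01"))


print(datasync_copy_cost(1000))   # 1,000 GB  -> 12.50
print(datasync_copy_cost(10240))  # 10,240 GB -> 128.00
```

At this rate, copying 1,000 GB costs $12.50 and copying 10,240 GB costs $128.00 in DataSync fees.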

Additional AWS Service fees

Additional charges include the following:

  • Standard request, storage, and data transfer rates to read from and write to AWS services, such as Amazon S3, Amazon EFS, Amazon FSx for Windows File Server, Amazon FSx for Lustre, and Amazon FSx for OpenZFS.
  • Copying data from an AWS storage service to an on-premises storage system. You pay for AWS data transfer at your standard rate.
  • Copying data from one AWS Region to another AWS Region. You pay for AWS data transfer at your standard rate.
  • Standard rates for other AWS services such as CloudWatch and CloudTrail.

Examples: AWS DataSync Price

AWS DataSync prices vary across the different data transfer scenarios. To obtain the current DataSync pricing, see AWS DataSync pricing.

Conclusion

AWS DataSync can migrate data from on-premises storage to AWS services and between AWS storage services. It transfers data quickly and securely, with in-transit encryption and end-to-end data validation. The service is also easy to use: you set everything up once, and DataSync then handles the transfers automatically. Furthermore, AWS DataSync pricing is predictable and easy on the budget, which makes it an attractive package overall. We offer AWS DataSync as a managed service and also provide consulting on it. We have solid experience with DataSync as a migration and transfer service, and our experienced, certified staff can carry out these tasks securely and quickly.