Simple Storage Service (S3) is used to store data in the cloud. S3 can scale to enormous size. You can store an unlimited number of objects and access them from anywhere. You access S3 over HTTP or HTTPS using a REST API.
S3 provides 99.999999999% (that’s 11 nines) durability by storing data multiple times across multiple availability zones within a region. A single object can be anywhere from 1 byte to 5 terabytes and you can store an unlimited number of objects.
Unlike Elastic Block Store, you cannot attach S3 storage to an instance. All access is through the REST API. In this chapter, I will show you how to create and manage buckets, which are used to store data. I will also show you how to upload and download objects and manage storage options.
Next, we will discuss versioning and object life cycle. I will go on to show you how to save money by using Glacier cold storage. Finally, we will talk about security, including encryption at rest and enabling public access to objects in your bucket.
This chapter has two exercises. The first will show you how to host a static web site in S3. We will deploy and configure a web site using PowerShell. The second will discuss how to create pre-signed URLs that allow a user to access data for a specific period of time. At the end of that period, the URL expires, and the user can no longer access the content. Let’s get started.
Managing Buckets
S3 stores objects in buckets. It may help to think of a bucket as a drive in a file system. Like a drive, a bucket contains files and the files can be organized into a hierarchy of folders. But that is where the analogy ends. Unlike a drive, a bucket is infinitely large and can store an unlimited number of objects. Buckets are also accessible anywhere in the world using HTTP or HTTPS.
Each account can have up to 100 buckets, and each bucket must have a name that is unique across all accounts and regions. To create a bucket, use the New-S3Bucket command. For example, to create a bucket named pwsh-book-exercises, I call New-S3Bucket and supply the name.
New-S3Bucket -BucketName pwsh-book-exercises
Each bucket is created in a region, and data is replicated across multiple availability zones in that region. If you want to specify a region other than the default region you specified in Chapter , you can add the Region parameter.
New-S3Bucket -BucketName pwsh-book-exercises-02 -Region us-west-2
As you might expect, there is a Get-S3Bucket command that can be used to list the buckets in your account. When called without any parameters, it lists all the buckets in your account.
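For example, a bare call returns every bucket in the account:
Get-S3Bucket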
If you want to get information about a specific bucket, you can call Get-S3Bucket with the BucketName parameter.
Get-S3Bucket -BucketName pwsh-book-exercises
If you just want to verify that a bucket exists, there is a separate command, Test-S3Bucket, that returns true if the bucket exists and false if it does not. Of course, you can always call Get-S3Bucket and compare the result to null, but Test-S3Bucket is more convenient.
Test-S3Bucket -BucketName pwsh-book-exercises
The Get-S3Bucket command returns very little information: only the name and creation date of each bucket. If you want to know where a bucket is located, use the Get-S3BucketLocation command.
Get-S3BucketLocation -BucketName pwsh-book-exercises
Finally, if you want to delete a bucket, you can use the Remove-S3Bucket command. The bucket must be empty before you can delete it, or you can add the -DeleteObjects parameter to delete the contents of the bucket first. Of course, you also need to include the Force option to avoid being prompted for confirmation.
Remove-S3Bucket -BucketName pwsh-book-exercises -Force
Enough about buckets. Let’s put some data in there already. In the next section, we learn how to read and write objects.
Managing Objects
Now that we have a bucket created, we can start to upload files using the Write-S3Object command. For example, the following command uploads the local file HelloWorld.txt to the pwsh-book-exercises bucket and saves it as HelloWorld.txt.
Write-S3Object -BucketName pwsh-book-exercises -Key 'HelloWorld.txt' -File 'HelloWorld.txt'
You can also use the Content parameter to upload data without storing it on the local file system first. For example:
Write-S3Object -BucketName pwsh-book-exercises -Key 'HelloWorld.txt' -Content "Hello World!!!"
If you want to list the objects in a bucket, you use the Get-S3Object command. Get-S3Object does not return the objects themselves, but rather lists the objects along with a few attributes. This is the equivalent of a dir in Windows or an ls on Linux.
Get-S3Object -BucketName pwsh-book-exercises
You can also use Get-S3Object to discover information about a specific object. For example, the following command will list information about the HelloWorld.txt file we uploaded earlier.
Get-S3Object -BucketName pwsh-book-exercises -Key 'HelloWorld.txt'
When you are ready to download a file, you use the Read-S3Object command. Unlike Write-S3Object, Read-S3Object does not support a Content parameter and must write to a file on the local file system. For example, the following command downloads the HelloWorld.txt file and overwrites the original copy.
Read-S3Object -BucketName pwsh-book-exercises -Key 'HelloWorld.txt' -File 'HelloWorld.txt'
Obviously, we can create a copy of an object by downloading it and uploading it with a different name. But remember that we pay for the bandwidth used. Therefore, it is more efficient to use Copy-S3Object to create a copy on the server without transferring the data. For example:
Copy-S3Object -BucketName pwsh-book-exercises -Key 'HelloWorld.txt' -DestinationKey 'HelloWorldCopy.txt'
We can also use Copy-S3Object to copy an object from one bucket to another. These buckets can even be in different regions, allowing you to move data directly from one region to another without making a local copy.
Copy-S3Object -BucketName pwsh-book-exercises -Key 'HelloWorld.txt' -DestinationBucket 'pwsh-book-exercises-02' -DestinationKey 'HelloWorldCopy.txt'
When you no longer need an object, you can delete it using the Remove-S3Object command. Remember to use the Force option to avoid the confirmation prompt.
Remove-S3Object -BucketName pwsh-book-exercises -Key 'HelloWorldCopy.txt' -Force
Now that we know how to create and use objects, let’s look at how we can use folders to organize them.
Managing Folders
In the previous examples, we copied objects into the root of the bucket. As you add more objects, you will end up with a confusing mess. However, we can use folders to organize objects. For example, we could have uploaded the HelloWorld.txt file into a folder called MyFolder by modifying the Key.
Write-S3Object -BucketName pwsh-book-exercises -Key 'MyFolder/HelloWorld.txt' -File 'HelloWorld.txt'
If you want to list the files in a folder, use the KeyPrefix parameter with Get-S3Object.
Get-S3Object -BucketName pwsh-book-exercises -KeyPrefix 'MyFolder'
You may find that on occasion you want to make an empty folder appear in the AWS Management Console. To create an empty folder, just create a dummy object that has a key that ends with a slash.
Write-S3Object -BucketName pwsh-book-exercises -Key 'EmptyFolder/' -Content "Dummy Content"
The KeyPrefix (or folder) can be really useful. One great feature is the ability to upload an entire directory of files with a single command. For example, the following command will upload all the files in the C:\aws folder and prefix all the keys with "utils/."
Write-S3Object -BucketName pwsh-book-exercises -KeyPrefix 'utils' -Folder 'C:\aws'
The previous command will ignore subfolders, but there is also an option to recursively upload all files in all of the subfolders.
Write-S3Object -BucketName pwsh-book-exercises -KeyPrefix 'utils' -Folder 'C:\aws' -Recurse
When you read files, you can use the KeyPrefix parameter to download all files that begin with a certain string. Rather than using the File parameter as we did in a previous command, you use the Folder parameter. The Folder parameter specifies where to put the files on the local file system. Note that Read-S3Object is always recursive.
Read-S3Object -BucketName pwsh-book-exercises -KeyPrefix 'utils' -Folder 'C:\aws'
On occasion you may find that you want to upload files that match a certain pattern. For example, you can upload all the executables in the C:\aws folder by using the SearchPattern parameter.
Write-S3Object -BucketName pwsh-book-exercises -KeyPrefix 'utils' -Folder 'C:\aws' -SearchPattern '*.exe'
Unfortunately, there is no SearchPattern parameter on Read-S3Object, but we can use a combination of Get-S3Object and Read-S3Object to produce a little PowerShell magic. For example:
Get-S3Object -BucketName pwsh-book-exercises -KeyPrefix 'utils' |
Where-Object {$_.Key -like '*.exe'} | ForEach-Object {
    Read-S3Object -BucketName $_.BucketName -Key $_.Key -File ('C:\' + $_.Key.Replace('/','\'))
}
As you can see, folders are a really powerful way to act on multiple objects at once. Next, we will look at how to deal with large numbers of files.
Managing Public Access
Many buckets require public or anonymous access. For example, we might be using S3 to store images for a web site or the installer for our latest application. In both cases, we want the objects to be available to the general public. To make an object public, you can add the PublicReadOnly parameter to the Write-S3Object cmdlet. For example:
Write-S3Object -BucketName pwsh-book-exercises -Key 'HelloWorld.txt' -Content "Hello World!!!" -PublicReadOnly
In addition, you can make the bucket public read-only. That does not make every object in the bucket public; it means anyone can list the contents of the bucket. You still have to mark the individual objects as public when you upload them.
New-S3Bucket -BucketName pwsh-book-exercises -PublicReadOnly
You can also configure a bucket to allow anonymous users to write to it. For example, you might allow customers to upload log files to your server so you can help debug an issue they are having. In general, it is dangerous to allow unauthenticated users to upload files. Not only could the individual files be dangerous, but you are also charged for the files they upload. If you allow anonymous uploads, there is nothing stopping a nefarious user from uploading large amounts of data, costing you thousands of dollars. If you still want to create a bucket with anonymous read/write access, you can use the PublicReadWrite parameter with New-S3Bucket. For example:
New-S3Bucket -BucketName pwsh-book-exercises -PublicReadWrite
We will discuss Identity and Access Management in detail in the next chapter.
Managing Versions
Often you want to store multiple versions of a document as you make changes. You may have regulatory requirements that demand it, or you may just want the option to roll back. S3 supports this through bucket versioning.
When you enable versioning, S3 stores every version of every document in the bucket. If you overwrite an object, AWS keeps the original. If you delete a document, AWS simply marks the document as deleted, but keeps all the prior versions. When you read a document, AWS returns the latest version, but you can always request a specific version.
Before we enable versioning, let’s overwrite the HelloWorld document we created earlier so we have a clean starting point. When you do, the old copy is replaced by this new copy.
Write-S3Object -BucketName pwsh-book-exercises -Key 'HelloWorld.txt' -Content "Hello World Version 1!!!"
Now, let's enable versioning. Versioning is always enabled at the bucket level; you cannot enable versioning within a specific folder. To enable versioning, use the Write-S3BucketVersioning command.
Write-S3BucketVersioning -BucketName pwsh-book-exercises -VersioningConfig_Status 'Enabled'
Now that versioning is enabled, let’s overwrite the HelloWorld document. You do not have to do anything special to create a version. Just write the new object and S3 will create a new version and retain the original.
Write-S3Object -BucketName pwsh-book-exercises -Key 'HelloWorld.txt' -Content "Hello Version 2!!!"
If you were to call Get-S3Object, you would not see any difference. In fact, all of the commands we have used so far are unaffected by versioning. The following command returns the latest version, which you can verify by checking the date:
Get-S3Object -BucketName pwsh-book-exercises -Key 'HelloWorld.txt'
To list the versions of all the objects in a bucket, use the Get-S3Version command. Note that Get-S3Version returns a complicated structure. You can ignore most of it and use the Versions property to list the versions. For example:
(Get-S3Version -BucketName pwsh-book-exercises).Versions
Unfortunately, there is no way to specify a specific object, only a prefix. Often this is enough. For example, you could get the versions of our HelloWorld.txt document like this:
(Get-S3Version -BucketName pwsh-book-exercises -Prefix 'HelloWorld.txt').Versions
But there are times when the prefix is not unique. For example, if we had both HelloWorld.doc and HelloWorld.docx in a folder, it would be impossible to list the versions of HelloWorld.doc without also getting HelloWorld.docx. Therefore, it is best to check the versions you get back by piping them to Where-Object.
(Get-S3Version -BucketName pwsh-book-exercises -Prefix 'HelloWorld.doc').Versions | Where-Object {$_.Key -eq 'HelloWorld.doc'}
If you want to download a specific version of a document, Read-S3Object accepts a Version parameter. First, get the version ID using Get-S3Version. Note that Get-S3Version returns an array sorted in reverse chronological order, so the latest version is at position 0. Once you find the version you want, pass its ID to Read-S3Object. For example:
$Versions = (Get-S3Version -BucketName pwsh-book-exercises -Prefix 'HelloWorld.txt').Versions
Read-S3Object -BucketName pwsh-book-exercises -Key 'HelloWorld.txt' -Version $Versions[1].VersionId -File 'versiontest.txt'
If you check the versiontest.txt file, you can verify that it contains the content from version 1, "Hello World Version 1!!!" You can delete a version the same way:
Remove-S3Object -BucketName pwsh-book-exercises -Key 'HelloWorld.txt' -VersionId $Versions[1].VersionId
When you delete a specific version, it is physically removed from the bucket. But when you call Remove-S3Object without specifying a VersionId, S3 simply marks the object as deleted. If you delete an object and then call Get-S3Object, it appears that the object is gone.
Remove-S3Object -BucketName pwsh-book-exercises -Key 'HelloWorld.txt' -Force
Get-S3Object -BucketName pwsh-book-exercises
However, if you list the versions, you will see that there is a new version called a delete marker.
(Get-S3Version -BucketName pwsh-book-exercises -Prefix 'HelloWorld.txt').Versions
Note that the delete marker has the attribute IsDeleteMarker=True and a size of 0. You can still access the old versions by specifying a version ID. For example:
$Versions = (Get-S3Version -BucketName pwsh-book-exercises -Prefix 'HelloWorld.txt').Versions
Read-S3Object -BucketName pwsh-book-exercises -Key 'HelloWorld.txt' -Version $Versions[1].VersionId -File 'deletetest.txt'
You can also undelete an object by removing the delete marker. Just find the version with IsDeleteMarker=True and use Remove-S3Object to remove it.
$Marker = (Get-S3Version -BucketName pwsh-book-exercises -Prefix 'HelloWorld.txt').Versions | Where-Object {$_.IsDeleteMarker -eq $true}
Remove-S3Object -BucketName pwsh-book-exercises -Key 'HelloWorld.txt' -VersionId $Marker.VersionId -Force
Once you have versioning enabled, you cannot disable it, but you can choose to suspend it. When versioning is suspended, the existing versions are maintained but new versions are not created. To suspend versioning, call Write-S3BucketVersioning and set the status to 'Suspended'.
Write-S3BucketVersioning -BucketName pwsh-book-exercises -VersioningConfig_Status 'Suspended'
As you can imagine, versioning, combined with 99.999999999% durability, will ensure that you almost never lose an object again. Of course, storing objects forever can get expensive. In the next section, we will explore life-cycle policies to manage aging objects.
Using Life-Cycle Management and Glacier
Over time you will accumulate a vast collection of objects. Sometimes you want to save these forever, but usually you do not. You may need to keep certain documents for a specified period of time. For example, the Sarbanes-Oxley Act, enacted after the Enron collapse, recommends that you keep ledgers for 7 years and invoices for 3.
Obviously you have the tools to create a PowerShell script to delete objects older than a certain date. But, S3 also has a built-in life-cycle policy that can manage retention for you. In addition, life-cycle management can be used to copy objects to a cold storage solution called Glacier.
Glacier provides the same high durability as S3 for about 25% of the price. The trade-off is that objects stored in Glacier are not immediately available. You have to request that objects be restored, which can take up to 12 hours.
We describe the policy using a series of .Net objects. Let’s assume our bucket holds log files from a web server running on EC2. The development team often refers to the logs to diagnose errors, but this almost always happens within a few hours of the error occurring. In addition, the security team requires that we maintain logs for 1 year. Therefore, we decide to keep the logs online, in S3, for 1 week. After 1 week, the logs are moved to cold storage, in Glacier, for 1 year. After 1 year the logs can be deleted.
First, we define a life-cycle transition. The transition defines how long the logs are maintained in S3 and where to move them afterward. The policy is always defined in days. The transition also defines the storage class to move the objects to. In the following example, I am moving the objects to Glacier. You can also move an object to Infrequent Access (S3-IA) storage, but I am not going to cover that here.
$Transition = New-Object Amazon.S3.Model.LifecycleTransition
$Transition.Days = 7
$Transition.StorageClass = "Glacier"
Next, we define the expiration policy. The expiration policy defines how long to keep the object before it is deleted. In this case, I am keeping the object for 365 days. Note that the expiration is defined from the day the object was first uploaded to S3, not the day it was transitioned to Glacier.
$Expiration = New-Object Amazon.S3.Model.LifecycleRuleExpiration
$Expiration.Days = 365
Now that we have both the transition and expiration defined, we can combine them into a single rule and apply it to the bucket. Note that you do not need to define both the transition and expiration. Some rules only define a transition, and the object is maintained in Glacier until you manually delete it. Other rules only define an expiration and the document is deleted from S3 without being transitioned.
$Rule = New-Object Amazon.S3.Model.LifecycleRule
$Rule.Transition = $Transition
$Rule.Expiration = $Expiration
$Rule.Prefix = ''
$Rule.Status = 'Enabled'
Write-S3LifecycleConfiguration -BucketName pwsh-book-exercises -Configuration_Rules $Rule
Sometimes you want to have different rules applied to each folder in a bucket. You can define a folder-level rule by adding a prefix. For example:
$Rule = New-Object Amazon.S3.Model.LifecycleRule
$Rule.Transition = $Transition
$Rule.Expiration = $Expiration
$Rule.Prefix = "logs/"
$Rule.Status = 'Enabled'
Write-S3LifecycleConfiguration -BucketName pwsh-book-exercises -Configuration_Rules $Rule
Now, let's assume a user of our web site claims his data was deleted a few months ago and we need to understand why. We need to pull the log files from July 22 to diagnose the cause. First, we check whether the object exists and where it is stored by using Get-S3Object. For example:
Get-S3Object -BucketName pwsh-book-exercises -Key 'logs/2013-07-22.log'
This command returns the following output. Note that the log files have been moved to Glacier, but have not yet been deleted.
Key : logs/2013-07-22.log
BucketName : pwsh-book-exercises
LastModified : Mon, 22 July 2013 23:59:39 GMT
ETag : "793466320ce145cb672e69265409ffeb"
Size : 1147
Owner : Amazon.S3.Model.Owner
StorageClass : GLACIER
To restore the object, we use the Restore-S3Object command. Restore-S3Object requires the bucket and key. In addition, the Days parameter defines how long to keep the restored copy in S3. In the following example, I request that the object be restored for 7 days. This should be plenty of time to figure out what happened to our user's data. After 7 days, the restored copy is automatically deleted from S3, but the object is still stored in Glacier until the expiration date.
Restore-S3Object -BucketName pwsh-book-exercises -Key 'logs/2013-07-22.log' -Days 7
If you want to remove the life-cycle policy from a bucket, you can use the Remove-S3LifecycleConfiguration command. For example:
Remove-S3LifecycleConfiguration -BucketName pwsh-book-exercises
Life-cycle policies provide an easy solution to managing storage classes and minimizing your S3 spend. Next we look at replicating data from one bucket to another.
Cross-Region Replication
As I already mentioned, S3 provides 99.999999999% durability and 99.99% availability. This is enough for almost any use case; however, there may be times when you want even greater durability or availability. S3 replication allows you to replicate the objects in one bucket to a second bucket. If S3 were to fail in one region, you could still access your objects in another region. Let's set up replication from Northern Virginia to Ohio.
I'll start by creating a new bucket in Northern Virginia (us-east-1) and enabling versioning. Note that versioning must be enabled on the source bucket.
New-S3Bucket -BucketName pwsh-book-exercises-source -Region us-east-1
Write-S3BucketVersioning -BucketName pwsh-book-exercises-source -VersioningConfig_Status 'Enabled'
Next, I will create a second bucket in Ohio (us-east-2). Versioning must be enabled on the destination bucket as well.
New-S3Bucket -BucketName pwsh-book-exercises-destination -Region us-east-2
Write-S3BucketVersioning -BucketName pwsh-book-exercises-destination -VersioningConfig_Status 'Enabled'
Now that the buckets are created, we need to create an IAM role that grants S3 permission to access our data. The following policy allows S3 to read from the source bucket and write to the destination bucket. If you need to review IAM roles and policies, go back to Chapter .
$AssumeRolePolicy = @"
{
  "Version":"2008-10-17",
  "Statement":[
    {
      "Sid":"",
      "Effect":"Allow",
      "Principal":{"Service":"s3.amazonaws.com"},
      "Action":"sts:AssumeRole"
    }
  ]
}
"@
$AccessPolicy = @"
{
  "Version":"2012-10-17",
  "Statement":[
    {
      "Effect":"Allow",
      "Action":[
        "s3:GetReplicationConfiguration",
        "s3:ListBucket"
      ],
      "Resource":[
        "arn:aws:s3:::pwsh-book-exercises-source"
      ]
    },
    {
      "Effect":"Allow",
      "Action":[
        "s3:GetObjectVersion",
        "s3:GetObjectVersionAcl",
        "s3:GetObjectVersionTagging"
      ],
      "Resource":[
        "arn:aws:s3:::pwsh-book-exercises-source/*"
      ]
    },
    {
      "Effect":"Allow",
      "Action":[
        "s3:ReplicateObject",
        "s3:ReplicateDelete",
        "s3:ReplicateTags"
      ],
      "Resource":"arn:aws:s3:::pwsh-book-exercises-destination/*"
    }
  ]
}
"@
$Role = New-IAMRole -RoleName 'CrossRegionReplication' -AssumeRolePolicyDocument $AssumeRolePolicy
Write-IAMRolePolicy -RoleName $Role.RoleName -PolicyName 'ReplicateSourceToDestination' -PolicyDocument $AccessPolicy
Now we can configure replication. Similar to the life-cycle rules we just covered, we are going to use a .Net object to describe the replication rule. Each rule has an ID to help you keep track of what is being replicated. The destination is a second .Net object and identifies the destination bucket by ARN.
$Rule = New-Object Amazon.S3.Model.ReplicationRule
$Rule.Id = 'MyFirstRule'
$Rule.Status = 'Enabled'
$Rule.Destination = New-Object Amazon.S3.Model.ReplicationDestination
$Rule.Destination.BucketArn = 'arn:aws:s3:::pwsh-book-exercises-destination'
Finally, we call Write-S3BucketReplication to configure the rule and specify the source bucket to apply the rule to.
Write-S3BucketReplication -BucketName pwsh-book-exercises-source -Configuration_Rule $Rule -Configuration_Role $Role.Arn
At this point S3 will begin to replicate changes from the source bucket to the destination bucket. In the preceding rule, I am replicating everything in the bucket. If you want to replicate specific folders, you can modify the prefix. Note that you must delete MyFirstRule before adding this one.
$Rule = New-Object Amazon.S3.Model.ReplicationRule
$Rule.Id = 'MySecondRule'
$Rule.Prefix = 'MyFolder/'
$Rule.Status = 'Enabled'
$Rule.Destination = New-Object Amazon.S3.Model.ReplicationDestination
$Rule.Destination.BucketArn = 'arn:aws:s3:::pwsh-book-exercises-destination'
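Like the first rule, this one takes effect only once it is written to the source bucket. Assuming the same role created earlier, applying it looks just as before:
Write-S3BucketReplication -BucketName pwsh-book-exercises-source -Configuration_Rule $Rule -Configuration_Role $Role.Arn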
In the preceding examples, the destination copy uses the same storage class as the source copy. You might want to replicate to a different storage class. For example, you might save money on the second copy by using Infrequent Access. Infrequent Access stores data for about half the cost of standard, but you are charged to read the data. This makes sense if the second copy will only be read in the rare case that the primary copy fails. You can control the destination storage class by specifying it in the destination object. For example:
$Rule = New-Object Amazon.S3.Model.ReplicationRule
$Rule.Id = 'MyThirdRule'
$Rule.Status = 'Enabled'
$Rule.Destination = New-Object Amazon.S3.Model.ReplicationDestination
$Rule.Destination.BucketArn = 'arn:aws:s3:::pwsh-book-exercises-destination'
$Rule.Destination.StorageClass = 'STANDARD_IA'
We are getting close to the end. Before we move on, let’s cover tagging.
Tagging
We have seen the power of tagging in EC2. S3 also supports tagging at both the bucket and object level. To tag a bucket, use the Write-S3BucketTagging command and a few .Net classes. For example:
$Tag = New-Object Amazon.S3.Model.Tag
$Tag.Key = 'Owner'
$Tag.Value = 'Brian Beach'
Write-S3BucketTagging -BucketName pwsh-book-exercises -TagSets $Tag
You can also get the tags using the Get-S3BucketTagging command:
Get-S3BucketTagging -BucketName pwsh-book-exercises
And you can remove all tags using the Remove-S3BucketTagging command:
Remove-S3BucketTagging -BucketName pwsh-book-exercises -Force
Tagging individual objects is similar. We can use the Write-S3ObjectTagSet command to add tags to an object. For example:
$Tags = New-Object Amazon.S3.Model.Tag
$Tags.Key = "Owner"
$Tags.Value = "Brian Beach"
Write-S3ObjectTagSet -BucketName pwsh-book-exercises -Key 'HelloWorld.txt' -Tagging_TagSet $Tags
Just like buckets, you can query the tags for an object using the Get-S3ObjectTagSet command.
Get-S3ObjectTagSet -BucketName pwsh-book-exercises -Key 'HelloWorld.txt'
And, of course, you can delete them as well. Don't forget to add the Force parameter to suppress the confirmation prompt.
Remove-S3ObjectTagSet -BucketName pwsh-book-exercises -Key 'HelloWorld.txt' -Force
In the next section, we will look at a few miscellaneous commands and then move on to the exercises.
Miscellaneous S3 Options
In this section we will look at a few miscellaneous options, none of which are big enough to warrant their own section.
Pagination
As you add more and more objects to S3, it can become very difficult to sort through them all. AWS gives you the ability to list files in batches. This is really convenient if you are trying to display the objects on a web page or other user interface.
Imagine you have hundreds of files in a bucket and you need to browse through them all. The following example will return the first ten objects in the bucket:
$Objects = Get-S3Object -BucketName pwsh-book-exercises -MaxKeys 10
After you browse through these first ten, you want to get ten more. You can use the Marker parameter to tell S3 where the last batch ended, so it returns the next ten objects. For example:
$Objects = Get-S3Object -BucketName pwsh-book-exercises -MaxKeys 10 -Marker $Objects[9].Key
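If you want to walk an entire bucket this way, you can keep requesting batches until one comes back smaller than MaxKeys. This is a sketch built from the same parameters used above; the loop body is a placeholder for whatever processing you need:
$Marker = $null
do {
    $Batch = @(Get-S3Object -BucketName pwsh-book-exercises -MaxKeys 10 -Marker $Marker)
    $Batch | ForEach-Object { $_.Key }   # placeholder: process each object in the batch
    if ($Batch.Count -gt 0) { $Marker = $Batch[-1].Key }
} while ($Batch.Count -eq 10)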
Encryption
When you upload an object to S3, you can have S3 encrypt the file before saving it. To enable encryption, use the ServerSideEncryption parameter.
Write-S3Object -BucketName pwsh-book-exercises -Key 'HelloWorld.txt' -Content "Hello World!!!" -ServerSideEncryption AES256
Encryption is a critical part of your security strategy, but maintaining an audit log is equally important. Let’s look at how to enable logging.
Logging
S3 supports access logs to audit access to the objects in your bucket. When you enable logging, S3 writes log files to the bucket you specify. In this example, I am going to create a new bucket to hold all my log files. I am adding the log-delivery-write canned ACL to grant the logging service the ability to write log files.
New-S3Bucket -BucketName pwsh-book-exercises-logging -CannedACLName log-delivery-write
Now I am going to enable logging on pwsh-book-exercises, writing the log files to the pwsh-book-exercises-logging bucket I just created. You should specify a prefix so you can keep track of which logs came from which source.
Write-S3BucketLogging -BucketName pwsh-book-exercises -LoggingConfig_TargetBucketName pwsh-book-exercises-logging -LoggingConfig_TargetPrefix 'Logs/pwsh-book-exercises/'
If you do enable logging, you should consider enabling a life-cycle policy to clean up the log files over time. Let’s wrap up with a quick look at the ability to manage content types.
Content Type
When you upload an object, the content type is set to "application/octet-stream" by default. You can optionally include the content type to tell the client what type of file it is. For example, your browser will always download files of type "application/octet-stream". If you want the browser to display the file, change the type to "text/plain."
Write-S3Object -BucketName pwsh-book-exercises -Key 'HelloWorld.txt' -Content "Hello World!!!" -ContentType 'text/plain'
We will see an example of content type used in Exercise 11.1 where we create a static web site.
Exercise 11.2: Using Pre-Signed URLs
At the beginning of this chapter, we discussed enabling anonymous access to a bucket, and I mentioned there is a better way: pre-signed URLs. This is a really simple command to use and does not warrant an exercise of its own, but it is a great opportunity to describe how AWS authentication works using access keys.
Imagine that you run a help desk and you often need to make tools and patches available to customers. You want these tools available only to customers who call the help desk. Furthermore, customers should not be able to download the tools later or share the link with friends. You could create a username and password for the user, but then you have to manage another user. This is a great use case for a pre-signed URL.
A pre-signed URL has been signed with a secret key. In addition, the URL includes an expiration date, after which it can no longer be used. Note that the URL has been signed with the secret key, but does not include the secret key. This allows AWS to prove the authenticity of the URL without exposing the secret key to the customer.
In fact, this is how all AWS web service calls work. Your secret key is never sent to AWS. Whenever we use a PowerShell method, PowerShell creates the request and includes a digital signature to prove that the user knows the secret.
Let's get back to the help desk. You want to create a pre-signed URL. PowerShell has a command for this called Get-S3PreSignedURL. You need to pass in your access key and secret key as well as the HTTP verb, bucket, key, and expiration date.
Note You should use StoredCredentials rather than passing the access keys explicitly. (See Chapter for details.) I am including them here only to help explain how the encryption works.
#Authentication Keys
$AccessKey = 'AKIAJ5N3RMX5LGUMP6FQ'
$SecretKey = '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'
#Web Query
$Verb = "GET"
$ExpirationDate = [DateTime]::Parse('2019-01-01')
$Bucket = 'pwsh-book-exercises'
$Key = 'HelloWorld.txt'
Get-S3PreSignedURL -Verb $Verb -Expires $ExpirationDate -Bucket $Bucket -Key $Key -AccessKey $AccessKey -SecretKey $SecretKey
The preceding code returns a URL like the following, which you can share with your customer. Notice that the URL includes the access key and expiration date we supplied. The expiration date has been converted to seconds since January 1, 1970. In addition, the URL includes a signature created by the PowerShell command. Also notice that your secret key is not included in the URL.
https://s3.amazonaws.com/MyBucket/MyPath/MyFile.txt?AWSAccessKeyId=
AKIAIQPQNCQG3EYO6LIA&Expires=1388552400&Signature=wBUgYztEdlE%2Btw9argXicUKvftw%3D
You can share this URL with your customer, who can then download a single file. The customer does not have the secret key and therefore cannot use it for anything else. In addition, AWS will refuse the URL after the expiration date. If the customer changes anything in the URL, the signature becomes invalid and AWS will refuse the request. What a simple solution to a difficult problem.
While the Get-S3PreSignedURL command is really simple to use, this is a great opportunity to see how AWS signatures work. Let’s write our own code to create a signature so we better understand how it works. If you’re not interested, feel free to skip the rest of this example, but remember the Get-S3PreSignedURL command.
First, we will accept the same parameters as the Get-S3PreSignedURL command. My script only supports GET requests, but you could easily add support for other HTTP verbs.
Param
(
    [string][parameter(mandatory=$true)]$AccessKey,
    [string][parameter(mandatory=$true)]$SecretKey,
    [string][parameter(mandatory=$false)]$Verb = 'GET',
    [DateTime][parameter(mandatory=$true)]$Expires,
    [string][parameter(mandatory=$true)]$Bucket,
    [string][parameter(mandatory=$true)]$Key
)
Next, we must calculate the expiration. Remember that the expiration is expressed in seconds since January 1, 1970. Also note that I am converting the time to UTC because the AWS servers may be in a different time zone than our client.
$EpochTime = [DateTime]::Parse('1970-01-01')
$ExpiresSeconds = ($Expires.ToUniversalTime() - $EpochTime).TotalSeconds
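The same conversion is easy to check outside PowerShell. The following is a minimal Python sketch (Python rather than PowerShell, purely for an independent check); it computes the epoch seconds for the sample expiration date of January 1, 2019, used earlier.

```python
from datetime import datetime, timezone

# Seconds between the Unix epoch and the expiration date, both in UTC.
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
expires = datetime(2019, 1, 1, tzinfo=timezone.utc)
expires_seconds = int((expires - epoch).total_seconds())
print(expires_seconds)  # 1546300800
```

Working in UTC on both sides avoids the time zone mismatch described above.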
Then, we need to canonicalize the input parameters to be signed. Before we can sign the data, we must agree on how the data will be formatted. If both sides don’t agree on a common format, the signatures will not match. This process is called canonicalization.
For AWS, we include the following data, separated by newline characters:
HTTP verb
MD5 hash of the content
Content type
Expiration date
Optional HTTP headers
URL-encoded path
In our case, we are only supporting GET; therefore, the content and content type will always be blank. In addition, I am not supporting any HTTP headers.
$Path = [Amazon.S3.Util.AmazonS3Util]::UrlEncode("/$Bucket/$Key", $true)
$Data = "$Verb`n`n`n$ExpiresSeconds`n$Path"
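To make the layout concrete, here is a small Python sketch (again, just an illustration outside PowerShell) that builds the string to sign for the sample bucket, key, and expiration used earlier. The two consecutive blank lines stand in for the empty content hash and content type.

```python
# Canonical string to sign for a GET pre-signed URL.
verb = "GET"
expires_seconds = 1546300800  # 2019-01-01 00:00:00 UTC
path = "/pwsh-book-exercises/HelloWorld.txt"

# Verb, blank Content-MD5, blank Content-Type, expiration, path.
string_to_sign = f"{verb}\n\n\n{expires_seconds}\n{path}"
print(repr(string_to_sign))
# 'GET\n\n\n1546300800\n/pwsh-book-exercises/HelloWorld.txt'
```

Both sides must produce this string byte for byte, or the signatures will not match.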
Now that we have the canonicalized data, we can use the .NET crypto libraries to sign it with our secret key. Here I am using the HMAC-SHA1 algorithm to generate the signature. Note that you must be very careful with how the data is encoded. The secret key must be UTF-8 encoded, and the resulting signature must be URL encoded.
$HMAC = New-Object System.Security.Cryptography.HMACSHA1
$HMAC.Key = [System.Text.Encoding]::UTF8.GetBytes($SecretKey)
$signature = $HMAC.ComputeHash(
    [System.Text.Encoding]::UTF8.GetBytes($Data))
$signature_encoded = [Amazon.S3.Util.AmazonS3Util]::UrlEncode(
    [System.Convert]::ToBase64String($signature), $true)
Finally, we can build the URL. The result should be identical to what Get-S3PreSignedURL returned earlier.
"https://s3.amazonaws.com/$Bucket/$Key" + "?AWSAccessKeyId=$AccessKey&Expires=$ExpiresSeconds&Signature=$signature_encoded"
That may have been a bit more than you wanted to know, but now that you know how to sign a request, you can call the S3 web service methods directly in any language.
Summary
In this chapter, we reviewed Simple Storage Service (S3). S3 allows you to store a seemingly limitless number of objects in the cloud. We learned to create and manage buckets and folders, and we learned to upload and download objects.
We learned how versioning can be used to store multiple versions of a document as it changes over time. We also learned to use life-cycle policies to create retention rules and how to use Glacier cold storage to reduce costs for long-term storage.
In the exercises, we created a static web site hosted entirely in S3 and then learned to create a pre-signed URL that can be shared without needing AWS credentials. We also learned how AWS uses digital signatures in authentication. In the next chapter, we will learn how to use PowerShell to automate Identity and Access Management.