Azure Blob Storage Versioning

Introduction

Azure Blob Storage provides a highly scalable and cost-effective solution for storing and managing various types of unstructured data. This service is designed to accommodate a wide range of data, from text and images to videos and log files. Azure Blob Storage presents three fundamental resources for data management needs:

  1. Blobs: Blobs, or Binary Large Objects, are the primary entities stored within Azure Blob Storage. Blobs can be classified into three types:

    • Block Blobs: Ideal for large-scale storage and streaming of data, block blobs break down files into blocks and upload them in parallel for efficient data transfer.
    • Append Blobs: Append blobs are optimized for scenarios requiring high-speed append operations, such as logging or recording telemetry data.
    • Page Blobs: Commonly used for virtual machine disks, page blobs offer random access to read and write data on fixed-size pages.
  2. Containers: Containers are logical groupings within a storage account that let you organize and manage your blobs effectively.

  3. Storage Accounts: Storage accounts are the top-level entities that provide a unique namespace for your storage resources and are associated with a specific Azure region.

However, as data undergoes changes, it's essential to maintain a version history for audit trails, compliance, and disaster recovery purposes. This is where Azure Blob Storage Versioning comes into play.

What is Azure Blob Storage Versioning?

Azure Blob Storage Versioning is a feature that allows users to keep track of different versions of a blob (file) in a container. It ensures that any updates or deletions made to a blob are recorded, preserving the previous states of the object. This feature is part of Azure's data protection strategy, alongside container soft delete (which restores deleted containers) and blob soft delete (which restores deleted blobs).

How Blob Versioning Works

Blob Versioning operates by assigning a unique identifier, known as a version ID, to each version of a blob. The version ID is immutable and can be used to access that specific version. When a blob is updated, a new version with a distinct version ID is created; when a blob is deleted, its current version is retained as a previous version.

Consider a scenario where a text document is uploaded as a blob.

  • Initially, there exists only one version - the current version.
  • If modifications are made to the document, the current version transitions to a previous version, and a new version is generated to capture the updated content.
  • The new version then becomes the current one.
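The scenario above can be sketched with a small in-memory model. This is a toy illustration of the behavior described here, not the Azure SDK: the class name `VersionedBlob`, its methods, and the fixed clock are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional


@dataclass
class VersionedBlob:
    """Toy in-memory model of a single blob with versioning enabled."""

    versions: Dict[str, bytes] = field(default_factory=dict)  # version ID -> content
    current_id: Optional[str] = None
    # Hypothetical fixed clock so that the IDs are deterministic in this sketch.
    clock: datetime = datetime(2024, 1, 1, tzinfo=timezone.utc)

    def write(self, content: bytes) -> str:
        """Store content as a new version and make it the current version."""
        self.clock += timedelta(seconds=1)
        version_id = self.clock.strftime("%Y-%m-%dT%H:%M:%S.%f0Z")
        self.versions[version_id] = content  # earlier versions are left untouched
        self.current_id = version_id
        return version_id

    def read(self, version_id: Optional[str] = None) -> bytes:
        """Read a specific version, or the current version when no ID is given."""
        return self.versions[version_id or self.current_id]


blob = VersionedBlob()
v1 = blob.write(b"first draft")   # the only version, and the current one
v2 = blob.write(b"second draft")  # v1 becomes a previous version
print(blob.read())    # content of the current version
print(blob.read(v1))  # content of the previous version
```

After the second write, both versions remain readable, and only the most recent one is the current version.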

Version ID

  • A version ID is a timestamp indicating when the version was created.
  • You can perform specific operations on a particular version by providing its version ID. If the version ID is omitted, the operation targets the current version.

For instance, after a write operation, Azure Storage returns the x-ms-version-id header in the response, containing the version ID for the newly created current version.
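As a rough sketch of how a request can target one version, the Blob service REST API accepts the version ID in a `versionid` query parameter on the blob URL. The helper name `blob_version_url` and the account, container, and blob names below are hypothetical; authorization is left out.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def blob_version_url(account: str, container: str, blob_name: str,
                     version_id: Optional[str] = None) -> str:
    """Build a blob URL; without a version ID it addresses the current version."""
    base = f"https://{account}.blob.core.windows.net/{container}/{quote(blob_name)}"
    if version_id is None:
        return base
    # The version ID is a timestamp, so its colons are percent-encoded.
    return f"{base}?{urlencode({'versionid': version_id})}"


# Hypothetical names, for illustration only.
print(blob_version_url("myaccount", "docs", "report.txt"))
print(blob_version_url("myaccount", "docs", "report.txt",
                       "2024-01-01T00:00:01.0000000Z"))
```

The version ID passed in would normally be the value returned in the x-ms-version-id response header of an earlier write.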

Versioning on Write Operations

  • With blob versioning enabled, every write operation such as uploading, modifying or copying to a blob triggers the creation of a new version.
  • Certain operations, like Put Page for page blobs and Append Block for append blobs, don't create new versions. To capture changes from these operations, manual snapshots are recommended.

Versioning on Delete Operations

  • Deleting a blob without specifying a version ID turns the current version into a previous version, so the blob no longer has a current version.
  • All existing previous versions are retained.
  • You can also target a specific version for deletion by providing its version ID. In case blob soft delete is enabled, the version is retained in the system until the soft delete retention period expires.
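The delete rules above can be illustrated with a toy model. It is a simplified sketch rather than Azure code: the class `DeletableBlob` and the short IDs such as `v1` are hypothetical, and soft delete is not modeled.

```python
class DeletableBlob:
    """Toy model of delete behavior for one blob with versioning enabled."""

    def __init__(self):
        self.versions = {}      # version ID -> content
        self.current_id = None  # None means the blob has no current version
        self._next = 1

    def write(self, content):
        version_id = f"v{self._next}"
        self._next += 1
        self.versions[version_id] = content
        self.current_id = version_id
        return version_id

    def delete(self, version_id=None):
        if version_id is None:
            # The current version is demoted to a previous version; nothing is removed.
            self.current_id = None
            return
        # A targeted delete removes only the named version.
        del self.versions[version_id]
        if version_id == self.current_id:
            self.current_id = None  # simplification of this sketch


blob = DeletableBlob()
v1 = blob.write("first")
v2 = blob.write("second")
blob.delete()    # no version ID: v2 becomes a previous version
blob.delete(v1)  # version ID given: only v1 is removed
print(sorted(blob.versions), blob.current_id)
```

Writing to the blob again after `delete()` would create a fresh current version while the retained previous versions stay in place.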

The following images illustrate the operation of deleting a blob and creating a new blob version with new data.

[Figures: deleting a blob; creating a new blob version with new data]

Access Tiers

There are different access tiers in Azure Blob Storage:

  1. Hot Access Tier: Designed for data that is frequently accessed or modified. It offers low-latency access, with the highest storage cost and the lowest access cost of the three tiers.
  2. Cool Access Tier: Suitable for less frequently accessed data that still requires quick retrieval. It has lower storage costs than the Hot tier but higher access costs.
  3. Archive Access Tier: Used for data that is rarely accessed. It offers the lowest storage costs, but data must be rehydrated before it can be read, which can take hours.

Azure Storage allows any version of a block blob, including the current one, to be moved to a different access tier. This is done with the Set Blob Tier operation, and it can save money by placing older versions in cooler storage tiers.
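Tiering older versions one by one with Set Blob Tier can be tedious, so it is often automated with a lifecycle management policy. The sketch below builds such a policy document as JSON; the rule name and the 90-day and 365-day thresholds are illustrative assumptions, not recommendations, and the policy still has to be applied to the storage account separately.

```python
import json


def version_tiering_policy(cool_after_days: int, delete_after_days: int) -> dict:
    """Build a lifecycle policy that tiers and later deletes previous versions."""
    return {
        "rules": [
            {
                "enabled": True,
                "name": "tier-previous-versions",  # hypothetical rule name
                "type": "Lifecycle",
                "definition": {
                    "filters": {"blobTypes": ["blockBlob"]},
                    "actions": {
                        # The "version" action group applies to previous versions only.
                        "version": {
                            "tierToCool": {
                                "daysAfterCreationGreaterThan": cool_after_days
                            },
                            "delete": {
                                "daysAfterCreationGreaterThan": delete_after_days
                            },
                        }
                    },
                },
            }
        ]
    }


# Illustrative thresholds: move versions to Cool after 90 days, delete after 365.
policy = version_tiering_policy(cool_after_days=90, delete_after_days=365)
print(json.dumps(policy, indent=2))
```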

Enable or Disable Blob Versioning

Enabling or disabling Blob Storage Versioning can be done in the Data protection section of the storage account's settings.

Disable Blob Versioning

Disabling Blob Versioning in Azure Storage has the following effects:

  • Disabling Blob Versioning does not erase existing blobs, versions, or snapshots. Instead, it halts the creation of new versions.
  • Modifying the current version after versioning is disabled results in the creation of a blob that isn't versioned. Subsequent updates overwrite the data without retaining the previous state.

It's essential to note that object replication relies on Blob Versioning. Therefore, before disabling versioning, any object replication policies on the account must be deleted.

Example:

  • Let's say a file, 'blob.txt', is initially uploaded with Blob Versioning enabled. This creates Version 1 (V1).
  • After two revisions, Version 2 (V2) and Version 3 (V3) are generated to capture the updated content.
  • If Blob Versioning is disabled and further changes are made, the current version isn't versioned, and any existing versions remain accessible as previous versions.
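The 'blob.txt' example can be traced with a toy model. This is only a sketch of the behavior described above, not Azure code: the class `ToggleBlob` and the labels V1 to V3 are hypothetical.

```python
class ToggleBlob:
    """Toy model of one blob whose account can switch versioning on and off."""

    def __init__(self):
        self.versioning_enabled = True
        self.previous = []   # retained versions as (version ID, content) pairs
        self.current = None  # (version ID or None, content)
        self._next = 1

    def write(self, content):
        if self.current is not None and self.current[0] is not None:
            # A versioned current version is kept as a previous version.
            self.previous.append(self.current)
        if self.versioning_enabled:
            self.current = (f"V{self._next}", content)
            self._next += 1
        else:
            # Unversioned write: any earlier unversioned content is overwritten.
            self.current = (None, content)


blob = ToggleBlob()
for text in ("one", "two", "three"):
    blob.write(text)               # creates V1, V2 and V3

blob.versioning_enabled = False
blob.write("four")                 # current blob is no longer versioned
blob.write("five")                 # overwrites "four" without keeping it
print([vid for vid, _ in blob.previous], blob.current)
```

In this model V1, V2 and V3 all remain as previous versions, while the content written between the last two updates is lost.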

Here is the difference in data management behavior when the blob versioning is enabled or disabled.

| Aspect | Blob Versioning Enabled | Blob Versioning Disabled |
|---|---|---|
| Version Creation | Automatically creates a new version on modification or creation. | Halts the creation of new versions. |
| Previous Versions | Previous versions are accessible and preserved. | Previous versions remain accessible and preserved. |
| Overwriting Current Version | Doesn't overwrite current version; creates a new version. | Overwrites current version; no new version is created. |

Blob Versioning and Soft Delete

The soft delete feature provides an added layer of protection by retaining deleted blobs and their versions for a specified retention period. This reduces the risk of data loss from accidental deletion.

Let us explore different cases of how this feature can help us in data protection.

Overwriting a Blob:

With both features active, overwriting a blob automatically generates a new version. This new version isn't subject to soft deletion and remains intact even after the soft-delete retention period expires. No soft-deleted snapshots are created in this process.

Deleting a Blob or Version:

  • When you delete a blob, the current version becomes a previous version and the blob no longer has a current version. No new version is created, and no soft-deleted snapshots are generated. The soft delete retention period doesn't apply to the deleted blob.
  • Deleting a previous version initiates soft deletion for that specific version. The soft-deleted version persists until the soft-delete retention period is over, after which it's permanently removed.

Restoring a Soft-Deleted Version:

  • Azure provides the Undelete Blob operation to restore soft-deleted versions within the designated retention period.
  • This operation restores all soft-deleted versions associated with the blob. Note that it's not possible to selectively restore a single soft-deleted version.
  • It's important to understand that restoring soft-deleted versions does not automatically promote any version to become the current version.
  • To restore the current version, you must first reinstate all soft-deleted versions and then utilize the Copy Blob operation to create a new current version.
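The restore sequence above can be sketched with a toy model. It is an illustration of the described behavior, not the Azure SDK: the class `SoftDeleteBlob`, its method names, and the short version IDs are hypothetical, and the retention period is not modeled.

```python
class SoftDeleteBlob:
    """Toy model of one blob with versioning and soft delete both enabled."""

    def __init__(self):
        self.versions = {}      # active versions: version ID -> content
        self.soft_deleted = {}  # soft-deleted versions still within retention
        self.current_id = None
        self._next = 1

    def write(self, content):
        version_id = f"v{self._next}"
        self._next += 1
        self.versions[version_id] = content
        self.current_id = version_id
        return version_id

    def delete_blob(self):
        # The current version becomes a previous version; nothing is soft-deleted.
        self.current_id = None

    def delete_version(self, version_id):
        # Deleting a previous version soft-deletes only that version.
        self.soft_deleted[version_id] = self.versions.pop(version_id)

    def undelete(self):
        # Restores every soft-deleted version at once; no version becomes current.
        self.versions.update(self.soft_deleted)
        self.soft_deleted.clear()

    def copy_to_current(self, version_id):
        # Stand-in for Copy Blob: the copied content becomes a new current version.
        return self.write(self.versions[version_id])


blob = SoftDeleteBlob()
v1 = blob.write("first")
v2 = blob.write("second")
blob.delete_blob()        # v2 becomes a previous version
blob.delete_version(v1)   # v1 is soft-deleted
blob.undelete()           # v1 is back, but there is still no current version
restored = blob.copy_to_current(v1)
print(restored, blob.versions[restored])
```

Note that `undelete` takes no version ID, mirroring the rule that soft-deleted versions cannot be restored selectively.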

Blob Versioning and Blob Snapshots

  • Blob Snapshots offer a point-in-time copy of a blob, allowing users to create read-only versions for backup or archival purposes.
  • When used together with Blob Storage Versioning, snapshots enhance data resilience by providing additional layers of redundancy.

You can also take a snapshot of a versioned blob. Taking the snapshot creates a new version at the same time, and a new current version is created as well. This is shown in the diagram below.

[Diagram: Blob Snapshots]

In the diagram, versions and snapshots with version IDs 2 and 3 contain identical data, illustrating the simultaneous creation of versions and snapshots.

Authorize Operations on Blob Versions

Azure provides robust authentication and authorization mechanisms for managing permissions effectively:

Azure Role-Based Access Control (Azure RBAC):

  • Azure RBAC enables precise control over access rights, enhancing security.
  • It grants permissions to a Microsoft Entra ID (formerly Azure Active Directory, Azure AD) security principal, such as a user, group, or application.
  • Microsoft recommends Microsoft Entra ID authorization for its stronger security and ease of use.

Deleting a blob version requires specific permissions, which helps prevent accidental or unauthorized deletions:

| Description | Operation | RBAC Action Required | Built-in Role Support |
|---|---|---|---|
| Deleting the Current Version | Delete Blob | delete | Storage Blob Data Contributor |
| Deleting a Previous Version | Delete Blob | deleteBlobVersion/action | Storage Blob Data Owner |
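The table can be read as a lookup from the delete target to the required data action. The sketch below checks that lookup against two role definitions. The action lists are abbreviated assumptions for illustration; consult the actual built-in role definitions before relying on them. The helper name `can_delete` is hypothetical.

```python
from fnmatch import fnmatchcase

BLOBS = "Microsoft.Storage/storageAccounts/blobServices/containers/blobs"

# Full names of the two actions that appear in shortened form in the table.
REQUIRED_ACTION = {
    "current": f"{BLOBS}/delete",
    "previous": f"{BLOBS}/deleteBlobVersion/action",
}

# Abbreviated data actions per role, assumed here for illustration only.
ROLE_DATA_ACTIONS = {
    "Storage Blob Data Contributor": [
        f"{BLOBS}/read",
        f"{BLOBS}/write",
        f"{BLOBS}/delete",
    ],
    "Storage Blob Data Owner": [f"{BLOBS}/*"],
}


def can_delete(role: str, target: str) -> bool:
    """Return True if the role's data actions cover deleting the given target."""
    required = REQUIRED_ACTION[target]
    # Wildcard patterns such as ".../blobs/*" are matched with fnmatch.
    return any(fnmatchcase(required, pattern) for pattern in ROLE_DATA_ACTIONS[role])


for role in ROLE_DATA_ACTIONS:
    print(role, can_delete(role, "current"), can_delete(role, "previous"))
```

Under these assumptions, a principal with only the Contributor role can delete the current version but not a previous version, which matches the table.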

Shared Access Signatures (SAS):

  • SAS tokens offer a flexible way to grant limited access to Azure Storage resources, enhancing security and control.
  • A SAS can delegate access to a blob version: set the signed resource type to bv (blob version) so that the token applies to operations on a specific version.

Account Access Keys:

  • Shared Key authorization signs each request with a storage account access key.
  • Operations on blob versions can be authorized with Shared Key by using the account access keys.

Pricing and Billing

Versioning Costs:

Blob versions, much like snapshots, are charged at the same rate as active data. The billing structure depends on whether you've explicitly set the tier for the current or previous versions of a blob (or snapshots).

Billing Based on Blob Tiers:

  • If you haven't altered the tier for a blob or its versions, charges apply for unique data blocks across the blob, its versions, and any associated snapshots.
  • When you change the tier of a blob or one of its versions, you are charged for the entire object, even if the blob and its versions later end up in the same tier again.

Considerations:

  • For frequently overwritten data, enabling versioning may lead to higher storage capacity costs.
  • A large number of versions per blob can increase latency for blob listing operations.

To address these concerns, consider moving frequently updated data into a separate storage account without versioning enabled.

Scenarios

Unique Block Charges:

If blob versions don't have explicitly set tiers, charges apply for unique data blocks across all versions and any associated snapshots. Shared data across versions is billed only once.

Example: the blob has one previous version and has not been updated since that version was created. Charges apply only to the three unique blocks (1, 2, and 3).

Replacing Blocks:

Replacing a block within a block blob results in it being charged as a unique block. Even if it has the same ID and data as in the previous version, it's considered distinct, incurring additional charges.

One block is updated (block 3), even though it has the same ID and data as before. It's considered a new unique block, resulting in charges for four blocks.

No Identical Data Determination:

Blob storage treats every uploaded and committed block as unique, regardless of block ID or content. This means even identical blocks are billed separately.

Optimizing Update Operations:

With versioning enabled, use operations like Put Block and Put Block List for fine-grained control over blocks. Put Blob, however, replaces the entire content, potentially leading to extra charges.

The current version is entirely updated and contains none of its original blocks. Charges apply for all eight unique blocks - four in the current version and four combined in the two previous versions. This occurs with Put Blob operations, replacing the entire blob content.
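The block counts in these scenarios can be reproduced with a short calculation. In this sketch each committed block is modeled as a pair of block ID and upload number, so a re-uploaded block counts as new even when its ID and data are unchanged. The representation and the function name `billed_blocks` are assumptions made for illustration; actual billing is computed by Azure.

```python
def billed_blocks(*block_lists):
    """Count unique committed blocks across a blob, its versions and snapshots."""
    unique = set()
    for blocks in block_lists:
        unique.update(blocks)
    return len(unique)


# Each block is (block ID, upload number).
original = [(1, 1), (2, 1), (3, 1)]

# No updates since the version was created: current and previous share all blocks.
print(billed_blocks(original, original))

# Block 3 re-uploaded with the same ID and data: it still counts as a new block.
current = [(1, 1), (2, 1), (3, 2)]
print(billed_blocks(current, original))

# Put Blob replaces everything: four new blocks in the current version,
# plus four unique blocks shared between the two previous versions.
rewritten = [(5, 1), (6, 1), (7, 1), (8, 1)]
older = [(1, 1), (2, 1), (3, 1)]
newer = [(1, 1), (2, 1), (4, 1)]
print(billed_blocks(rewritten, older, newer))
```

The three calls correspond to the three-block, four-block and eight-block cases described in this section.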

Feature Support

Blob Storage Versioning applies to block blobs, append blobs, and page blobs, with one caveat: Append Block and Put Page operations don't create new versions, so changes made to append blobs and page blobs through those operations must be captured with manual snapshots.

Conclusion

  • Azure Blob Storage Versioning allows you to capture and manage different versions of your data, protecting against accidental deletions or overwrites.
  • Azure Blob Storage offers three primary resources: storage accounts, containers, and blobs (block blobs, append blobs, and page blobs). Append Block and Put Page operations don't create new versions, so use snapshots to capture those changes.
  • Versioning is triggered by write operations to a blob and managed through unique version IDs; each blob has at most one current version at a time.
  • Overwriting a blob creates a new version, even when Soft Delete is enabled. You can restore soft-deleted versions during the retention period.
  • Authorization for blob versions can be done through Azure RBAC, SAS tokens, or account access keys. Careful consideration of access control helps prevent unauthorized deletions.
  • Blob snapshots are read-only point-in-time copies for backup or archival purposes. Taking a snapshot of a versioned blob also creates a new version.
  • Blob versions are billed at the same rate as active data, and the charge depends on whether blob tiers are explicitly set. Features such as Lifecycle Management can be used to reduce cost.