Lifecycle Management in Azure Blob


Overview

In cloud computing, managing data efficiently is a major concern. Azure Blob Storage provides a reliable and cost-effective solution for storing many types of unstructured data.

However, as data accumulates, it becomes crucial to implement effective management strategies. The Lifecycle Management feature of Azure Blob Storage lets users automate this data management.

What is Lifecycle Management in Azure Blob?

Lifecycle Management in Azure Blob is a feature that enables users to define rules and policies for the automated management of their data stored in Blob Storage. This feature uses access tiers of blob storage to optimize storage costs, improve data accessibility, and ensure compliance with data retention policies.

There are different access tiers in Azure Blob Storage:

  1. Hot Access Tier:
    Designed for frequently accessed or modified data, with low-latency access and the highest storage cost.
  2. Cool Access Tier:
    Suitable for less frequently accessed data which still requires quick retrieval. It has lower storage costs compared to the Hot tier.
  3. Cold Access Tier:
    Suitable for data that is accessed even less often but must remain online. It has lower storage costs than the Cool tier.
  4. Archive Access Tier:
    Used for data that is rarely accessed. It offers the lowest storage costs but comes with a retrieval time measured in hours.

Here are some key capabilities of Lifecycle Management:

  • Immediate Transition:
    You can instantly move blobs from cool or cold storage tiers to the hot tier as soon as they're accessed.
  • Tier Transition based on Access:
    Current versions, previous versions, or snapshots of a blob can be moved to a cooler storage tier if they haven't been accessed or modified for a specific period.
  • Automated Deletion:
    Lifecycle Management allows for the automatic deletion of blobs, whether they are current versions, previous versions, or snapshots, once they reach the end of their lifecycles.
  • Scheduled Rule Execution:
    Rules can be defined at the storage account level to run once per day. This ensures that data management processes are consistently applied.
  • Granular Application of Rules:
    Rules can be applied at both the container level and to specific subsets of blobs. This can be based on name prefixes or blob index tags, offering fine-grained control over the application of policies.

Example:

Consider a scenario where a company stores a large amount of historical data in Azure Blob Storage. This data is rarely accessed but needs to be retained for a certain period due to compliance reasons.

With Lifecycle Management, the company can set up a rule that automatically moves this data to the cool tier after a predefined period, then to the archive tier, and eventually deletes it. This significantly reduces storage costs without compromising compliance requirements.
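To make the scenario concrete, here is a minimal Python sketch that builds such a policy as a dictionary and prints it as JSON. The scenario only speaks of a "predefined time", so the rule name, the `historical/` prefix, and the 30/180/2555-day thresholds are hypothetical values chosen for illustration. The key names follow the Azure lifecycle policy schema.

```python
import json

# Hypothetical thresholds: the scenario only says "after a predefined time".
COOL_AFTER_DAYS = 30
ARCHIVE_AFTER_DAYS = 180
DELETE_AFTER_DAYS = 2555  # about seven years; an illustrative retention period


def historical_data_policy(prefix: str) -> dict:
    """Build a policy that cools, then archives, then deletes aging block blobs."""
    return {
        "rules": [
            {
                "enabled": True,
                "name": "historical-data-lifecycle",  # hypothetical rule name
                "type": "Lifecycle",
                "definition": {
                    "filters": {"blobTypes": ["blockBlob"], "prefixMatch": [prefix]},
                    "actions": {
                        "baseBlob": {
                            "tierToCool": {"daysAfterModificationGreaterThan": COOL_AFTER_DAYS},
                            "tierToArchive": {"daysAfterModificationGreaterThan": ARCHIVE_AFTER_DAYS},
                            "delete": {"daysAfterModificationGreaterThan": DELETE_AFTER_DAYS},
                        }
                    },
                },
            }
        ]
    }


print(json.dumps(historical_data_policy("historical/"), indent=2))
```

The thresholds increase from cool to archive to delete, so each blob moves through the tiers in order as it ages.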


Lifecycle Policy Definition

A lifecycle policy in Azure Blob Storage consists of a set of rules and actions that dictate how objects are managed over time. A storage account has a single lifecycle policy, defined at the storage account level. Each rule in it can be scoped to specific containers or subsets of blobs through its filters.

A lifecycle management policy is represented as a JSON document containing the criteria and actions that govern the management of data within Azure Blob Storage.

Within a lifecycle management policy, there exists a collection of rules. A minimum of one rule is mandatory within a policy, while you can define up to a maximum of 100 rules.

Every rule within a policy is characterized by the following parameters:

  • Name:
    A string representing the rule's name. This can encompass up to 256 alphanumeric characters and is case-sensitive. It must be unique within the policy.
  • Enabled:
    A boolean parameter that provides the option to temporarily disable a rule. If not explicitly set, the default value is true.
  • Type:
    An enum value. Currently the only valid value is Lifecycle, which indicates that the rule is a lifecycle management rule.
  • Definition:
    An object that specifies the lifecycle rule. This definition is composed of a filter set and an action set.
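The rule-level constraints listed above can be checked on the client before a policy is submitted. The following Python sketch is a hypothetical validator; `validate_policy` and `is_enabled` are illustrative names, not part of any Azure SDK. It checks:

- the rule count
- name length and uniqueness
- the Lifecycle type
- the presence of the filter set and action set described above

It does not check which characters a rule name may contain.

```python
from typing import Any

MAX_RULES = 100
MAX_NAME_LENGTH = 256


def is_enabled(rule: dict[str, Any]) -> bool:
    """'enabled' is optional and defaults to true when it is not set."""
    return rule.get("enabled", True)


def validate_policy(policy: dict[str, Any]) -> list[str]:
    """Return the problems found in a policy's rule-level parameters (empty if none)."""
    problems: list[str] = []
    rules = policy.get("rules", [])
    if not 1 <= len(rules) <= MAX_RULES:
        problems.append(f"a policy needs 1 to {MAX_RULES} rules, found {len(rules)}")

    seen: set[str] = set()
    for rule in rules:
        name = rule.get("name", "")
        if not 1 <= len(name) <= MAX_NAME_LENGTH:
            problems.append(f"rule name {name!r} must be 1 to {MAX_NAME_LENGTH} characters")
        # Names are case-sensitive: "Archive" and "archive" do not collide.
        if name in seen:
            problems.append(f"duplicate rule name {name!r}")
        seen.add(name)
        if rule.get("type") != "Lifecycle":
            problems.append(f"rule {name!r} must have type 'Lifecycle'")
        definition = rule.get("definition", {})
        if "filters" not in definition or "actions" not in definition:
            problems.append(f"rule {name!r} needs a filter set and an action set")
    return problems


rule = {"name": "archive", "type": "Lifecycle",
        "definition": {"filters": {"blobTypes": ["blockBlob"]}, "actions": {}}}
print(validate_policy({"rules": [rule]}))        # []
print(validate_policy({"rules": [rule, rule]}))  # reports the duplicate name
print(is_enabled(rule))                          # True
```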

Lifecycle Management Rule Definition

Within a lifecycle policy, users can define rules that govern the actions taken on specific objects. Each rule definition consists of two key components:

  • The Filter set restricts the rule's actions to a specific subset of objects within a container or specific object names. It serves as a criteria-based filter that refines the scope of the rule.
  • The Action Set applies the designated actions (such as tiering or deletion) to the filtered set of objects based on the specified conditions.
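To show how a filter set narrows a rule's scope, here is an illustrative Python model of filter matching. It is not Azure's actual evaluator, and the `Blob` class and `matches_filters` function are hypothetical. It makes three stated assumptions:

- A blob matches if its `container/name` path starts with any listed prefix.
- Omitting `prefixMatch` covers the whole account.
- Every blob index tag condition (`==` only) must hold.

```python
from dataclasses import dataclass, field


@dataclass
class Blob:
    path: str        # "container/blob-name"
    blob_type: str   # "blockBlob", "appendBlob", or "pageBlob"
    tags: dict[str, str] = field(default_factory=dict)  # blob index tags


def matches_filters(blob: Blob, filters: dict) -> bool:
    """Illustrative model of a rule's filter set; not Azure's actual evaluator."""
    # blobTypes is treated as required: a blob of an unlisted type never matches.
    if blob.blob_type not in filters.get("blobTypes", []):
        return False
    # Assumption: matching ANY listed prefix is enough; no prefixMatch = whole account.
    prefixes = filters.get("prefixMatch", [])
    if prefixes and not any(blob.path.startswith(p) for p in prefixes):
        return False
    # Assumption: every blob index tag condition must hold; only "==" is modeled.
    for cond in filters.get("blobIndexMatch", []):
        if cond.get("op") != "==" or blob.tags.get(cond["name"]) != cond["value"]:
            return False
    return True


compliance = {"blobTypes": ["blockBlob"], "prefixMatch": ["compliance-data/"]}
print(matches_filters(Blob("compliance-data/2020/audit.csv", "blockBlob"), compliance))  # True
print(matches_filters(Blob("reports/audit.csv", "blockBlob"), compliance))               # False
```

Only blobs that pass this filter step are considered by the action set.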

Consider a scenario where a company needs to archive certain types of data for compliance purposes. They have a container named compliance-data where these files are stored. The goal is to move any data that hasn't been modified for 180 days to the lower-cost archive tier. A lifecycle management policy that achieves this is built from the following elements.

  • The rule is named compliance-archival and is set to be enabled.
  • Rule Actions:
    Executed on the filtered blobs when the specified conditions are met.
    • tierToCool/tierToCold:
      Move blobs to cooler storage tiers, optimizing for cost.
    • tierToArchive:
      Move blobs to the archive tier for long-term, cost-effective storage.
    • delete:
      Permanently remove blobs based on specified conditions.
  • Action Run Condition:
    The condition to run the action.
    • daysAfterModificationGreaterThan:
      This applies to the current version of a blob and triggers actions if the blob hasn't been modified for a specified number of days.
    • daysAfterCreationGreaterThan:
      This applies to the current version, previous versions, or blob snapshots and is triggered based on the creation time of the blob.
    • daysAfterLastAccessTimeGreaterThan:
      This applies to the current version when access tracking is enabled and triggers based on the last access time of the blob.
    • daysAfterLastTierChangeGreaterThan:
      Applies only to tierToArchive actions and can be used only together with the daysAfterModificationGreaterThan condition. It is evaluated against the time of the blob's last tier change.
  • Filters:
    Limits the rule actions to a specific subset of blobs within the storage account.
    • blobTypes:
      Specifies the type of blob (blockBlob or appendBlob).
    • prefixMatch:
      Limits the rule to blobs with names starting with compliance-data/.
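Putting these elements together, a minimal Python sketch of the compliance-archival policy looks like this. The dictionary mirrors the JSON document Azure expects. `baseBlob` targets the current version of each blob. `blockBlob` is chosen because tiering applies only to block blobs. The helper function name is hypothetical.

```python
import json


def compliance_archival_policy() -> dict:
    """Archive block blobs under compliance-data/ that are unmodified for 180 days."""
    return {
        "rules": [
            {
                "enabled": True,
                "name": "compliance-archival",
                "type": "Lifecycle",
                "definition": {
                    "filters": {
                        "blobTypes": ["blockBlob"],  # tiering applies to block blobs
                        "prefixMatch": ["compliance-data/"],
                    },
                    "actions": {
                        # baseBlob = the current version of each matching blob
                        "baseBlob": {
                            "tierToArchive": {"daysAfterModificationGreaterThan": 180}
                        }
                    },
                },
            }
        ]
    }


print(json.dumps(compliance_archival_policy(), indent=2))
```

You could save the printed JSON as policy.json and apply it with the Azure CLI:

`az storage account management-policy create --account-name <account> --resource-group <group> --policy @policy.json`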

Lifecycle Policy Runs

  • Lifecycle policies in Azure Blob Storage are evaluated by the system at regular intervals, known as policy runs.
  • Azure Blob Storage executes lifecycle policies once a day, providing an automated approach to manage your data.
  • After configuring or modifying a policy, it may take up to 24 hours for the changes to become effective.
  • Once the policy is active, it might take an additional 24 hours for some actions to commence. Consequently, the entire lifecycle policy process can take up to 48 hours to complete.
  • Should you choose to disable a policy, no new policy runs will be scheduled. However, if a run is already underway, it will persist until its completion.
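Conceptually, each daily run compares every matching blob against the rule's conditions. The Python sketch below models that comparison for one blob's current version, using only daysAfterModificationGreaterThan. It rests on two assumptions and one documented behavior:

- Assumption: age is counted in whole elapsed days.
- Documented by Microsoft: when several actions apply, the least expensive one wins (delete before tierToArchive before tierToCool).
- Assumption: the position of tierToCold in that order.

It ignores the scheduling latency described above, and `action_for_run` is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed precedence when several conditions are met: the least expensive action wins
# (delete, then archive, then cold, then cool).
ACTION_PRIORITY = ["delete", "tierToArchive", "tierToCold", "tierToCool"]


def action_for_run(base_blob_actions: dict, last_modified: datetime,
                   run_time: datetime) -> Optional[str]:
    """Return the action one daily run would take on a blob's current version, or None."""
    age_days = (run_time - last_modified).days  # assumption: whole days elapsed
    for action in ACTION_PRIORITY:
        threshold = base_blob_actions.get(action, {}).get("daysAfterModificationGreaterThan")
        if threshold is not None and age_days > threshold:
            return action
    return None


actions = {
    "tierToCool": {"daysAfterModificationGreaterThan": 30},
    "tierToArchive": {"daysAfterModificationGreaterThan": 180},
    "delete": {"daysAfterModificationGreaterThan": 365},
}
run = datetime(2024, 6, 1, tzinfo=timezone.utc)
for age in (30, 31, 200, 400):
    print(age, action_for_run(actions, run - timedelta(days=age), run))
# prints: 30 None / 31 tierToCool / 200 tierToArchive / 400 delete
```

A blob exactly 30 days old is not yet moved, because the conditions are strict "greater than" comparisons.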

Upon the successful execution of a lifecycle management policy, the platform generates a LifecyclePolicyCompleted event. This event reports the actions performed as defined by the policy and contains the following fields:

  • topic:
    Indicates the resource's topic, which includes details about the subscription, resource group, and storage account.
  • subject:
    Specifies the category and type of the event. In this case, it's related to Blob Data Management and LifeCycle Management, providing a Summary Report.
  • eventType:
    Identifies the type of event.
  • eventTime:
    Represents the exact date and time when the event occurred. It's formatted in ISO 8601.
  • id:
    Provides a unique identifier for the event.
  • data:
    Contains detailed information related to the event. This includes:
    • scheduleTime:
      Indicates when the lifecycle policy was scheduled.
    • deleteSummary:
      Offers a summary of the results for blobs scheduled for deletion. It includes the total count of objects, the number of successful deletions, and a list of any errors encountered.
    • tierToCoolSummary:
      Provides a summary of results for blobs scheduled for tier-to-cool operation, including total object count, successful operations, and any encountered errors.
    • tierToArchiveSummary:
      Offers a summary of results for blobs scheduled for tier-to-archive operation, similar to the above.
  • dataVersion:
    Specifies the version of the data schema being used.
  • metadataVersion:
    Indicates the version of the metadata schema.
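As an illustration of consuming this event, the following Python sketch computes how many objects each action failed to process. The payload is hypothetical and its values are made up. The summary field names totalObjectsCount, successCount, and errorList are assumptions modeled on Microsoft's published sample, so verify them against the events you actually receive. `failed_counts` is a hypothetical helper.

```python
# Hypothetical payload shaped like the fields above; all values are made up, and the
# summary field names (totalObjectsCount, successCount, errorList) are assumptions.
sample_event = {
    "eventType": "Microsoft.Storage.LifecyclePolicyCompleted",
    "subject": "BlobDataManagement/LifeCycleManagement/SummaryReport",
    "dataVersion": "1",
    "metadataVersion": "1",
    "data": {
        "scheduleTime": "2024/05/24 22:57:29.3260160",
        "deleteSummary": {"totalObjectsCount": 16, "successCount": 14, "errorList": ""},
        "tierToCoolSummary": {"totalObjectsCount": 0, "successCount": 0, "errorList": ""},
        "tierToArchiveSummary": {"totalObjectsCount": 5, "successCount": 5, "errorList": ""},
    },
}


def failed_counts(event: dict) -> dict[str, int]:
    """Map each action summary to the number of objects not processed successfully."""
    return {
        key.removesuffix("Summary"): summary["totalObjectsCount"] - summary["successCount"]
        for key, summary in event["data"].items()
        if key.endswith("Summary")
    }


print(failed_counts(sample_event))  # {'delete': 2, 'tierToCool': 0, 'tierToArchive': 0}
```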

This LifecyclePolicyCompleted event provides a detailed report of the actions performed by the lifecycle policy, including the number of successful operations and any encountered errors for each type of action.

Examples of Lifecycle Policies

Example - 1: Move Infrequently Accessed Data to a Cooler Tier

Consider a scenario where you have a storage account containing numerous blobs, and you want to optimize costs by moving less frequently accessed data to a cooler storage tier. The following policy can be used to achieve that.

In this example, blobs in the data/documents and backup/archives prefixes will be moved to the cool storage tier if they haven't been accessed for more than 30 days. This helps to reduce storage costs without compromising accessibility.
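One way to express this rule is sketched below in Python and serialized to JSON. It uses the daysAfterLastAccessTimeGreaterThan condition, so it assumes last access time tracking is enabled on the storage account. The rule name and the `cool_after_no_access` helper are hypothetical.

```python
import json


def cool_after_no_access(prefixes: list[str], days: int = 30) -> dict:
    """Rule that moves block blobs under `prefixes` to cool after `days` without access."""
    return {
        "enabled": True,
        "name": "cool-inactive-data",  # hypothetical rule name
        "type": "Lifecycle",
        "definition": {
            "filters": {"blobTypes": ["blockBlob"], "prefixMatch": prefixes},
            "actions": {
                "baseBlob": {
                    # Relies on last access time tracking being enabled on the account.
                    "tierToCool": {"daysAfterLastAccessTimeGreaterThan": days}
                }
            },
        },
    }


policy = {"rules": [cool_after_no_access(["data/documents", "backup/archives"])]}
print(json.dumps(policy, indent=2))
```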

Example - 2: Archive Data After Ingestion

Suppose you have a container where you ingest large volumes of data, but over time, this data becomes less frequently accessed. To optimize costs, you can set up a policy to automatically transition this data to the archive tier shortly after it's ingested.

Here, blobs in the ingestion/container prefix will be moved to the archive tier as soon as a daily policy run picks them up after ingestion, rather than after they have aged. This ensures that the data is stored efficiently without incurring unnecessary costs.
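A minimal Python sketch of this rule follows. It uses a zero-day daysAfterModificationGreaterThan threshold so that newly ingested blobs qualify for archiving without waiting for them to age. The rule name is hypothetical.

```python
import json

archive_on_ingest = {
    "rules": [
        {
            "enabled": True,
            "name": "archive-after-ingest",  # hypothetical rule name
            "type": "Lifecycle",
            "definition": {
                "filters": {"blobTypes": ["blockBlob"], "prefixMatch": ["ingestion/container"]},
                "actions": {
                    "baseBlob": {
                        # Zero-day threshold: archive as soon as a policy run picks the blob up.
                        "tierToArchive": {"daysAfterModificationGreaterThan": 0}
                    }
                },
            },
        }
    ]
}

print(json.dumps(archive_on_ingest, indent=2))
```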

Example - 3: Expire Data Based on Age

In many scenarios, data is expected to have a defined retention period. Consider a use case where you want to delete all block blobs that haven't been modified in the last 365 days.

This policy ensures that any block blob not modified within the last year will be automatically deleted, helping to maintain compliance with retention policies.
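Sketched in Python, the rule needs only a blob type filter and a delete action. Leaving out prefixMatch makes the rule apply to every block blob in the storage account. The rule name is hypothetical.

```python
import json

expire_after_one_year = {
    "rules": [
        {
            "enabled": True,
            "name": "expire-after-365-days",  # hypothetical rule name
            "type": "Lifecycle",
            "definition": {
                # No prefixMatch: the rule covers every block blob in the account.
                "filters": {"blobTypes": ["blockBlob"]},
                "actions": {
                    "baseBlob": {"delete": {"daysAfterModificationGreaterThan": 365}}
                },
            },
        }
    ]
}

print(json.dumps(expire_after_one_year, indent=2))
```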

Example - 4: Delete Data with Blob Index Tags

In some cases, you might want to apply deletion policies based on specific attributes or tags associated with the data. Consider a scenario where you want to delete all block blobs tagged with Project = Contoso.

This policy will delete all block blobs with the specified tag, providing a targeted approach to data management.
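A Python sketch of this rule uses a blobIndexMatch filter for the Project = Contoso tag. It pairs the filter with a zero-day delete threshold, so the tag rather than the blob's age decides what is deleted. The rule name is hypothetical.

```python
import json

delete_contoso_blobs = {
    "rules": [
        {
            "enabled": True,
            "name": "delete-project-contoso",  # hypothetical rule name
            "type": "Lifecycle",
            "definition": {
                "filters": {
                    "blobTypes": ["blockBlob"],
                    # Blob index tag condition: Project == Contoso
                    "blobIndexMatch": [{"name": "Project", "op": "==", "value": "Contoso"}],
                },
                "actions": {
                    # Zero-day threshold: the tag, not the age, selects what is deleted.
                    "baseBlob": {"delete": {"daysAfterModificationGreaterThan": 0}}
                },
            },
        }
    ]
}

print(json.dumps(delete_contoso_blobs, indent=2))
```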

Example - 5: Manage Previous Versions

For data that undergoes frequent modification, keeping previous versions can be crucial. With blob versioning enabled, a lifecycle policy can manage these versions automatically. This policy moves previous versions older than 90 days to the cool tier and deletes versions older than a year.

In this example, the policy ensures that previous versions are managed efficiently, reducing storage costs while maintaining accessibility.
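In Python form, the actions for previous versions go under the `version` key rather than `baseBlob`, and they use daysAfterCreationGreaterThan. This is a sketch, and the rule name is hypothetical.

```python
import json

manage_previous_versions = {
    "rules": [
        {
            "enabled": True,
            "name": "manage-previous-versions",  # hypothetical rule name
            "type": "Lifecycle",
            "definition": {
                "filters": {"blobTypes": ["blockBlob"]},
                "actions": {
                    # "version" targets previous versions, not the current (base) blob.
                    "version": {
                        "tierToCool": {"daysAfterCreationGreaterThan": 90},
                        "delete": {"daysAfterCreationGreaterThan": 365},
                    }
                },
            },
        }
    ]
}

print(json.dumps(manage_previous_versions, indent=2))
```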

Feature Support

Azure Blob Lifecycle Management is supported for both block blobs and append blobs, providing comprehensive coverage for a wide range of use cases. However, it is important to note that this feature is not currently available for page blobs.

Regional Availability and Pricing

  • Azure Blob Lifecycle Management is available in all Azure regions.
  • Creating and running lifecycle management policies is free. You are billed the standard operation charges for the tier changes (Set Blob Tier calls) that a policy triggers.
  • Delete operations are also free of cost, but may involve charges through other services like Microsoft Defender for Storage, if configured.

Known Issues and Limitations

  • Premium block blob storage accounts do not yet support tiering. Tiering is restricted to block blobs in all other account types, excluding append and page blobs.
  • Lifecycle management policies must be read or written in their entirety; partial updates are not supported.
  • Each rule is limited to 10 case-sensitive prefixes and up to 10 blob index tag conditions.
  • Enabling firewall rules for your storage account may block lifecycle management requests. To unblock these requests, exceptions for trusted Microsoft services must be provided.
  • A lifecycle management policy cannot alter the tier of a blob using an encryption scope.
  • The delete action in a lifecycle management policy is incompatible with blobs in an immutable container.
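Two of these limitations can be handled on the client side, as the Python sketch below shows. The per-rule prefix and tag limits can be checked before submitting a policy. Because partial updates are not supported, changing one rule means rebuilding the whole policy document and writing it back, for example after reading it with `az storage account management-policy show` and then applying the result with `az storage account management-policy create`. `check_rule_limits` and `upsert_rule` are hypothetical helpers.

```python
import copy

MAX_PREFIXES = 10
MAX_TAG_CONDITIONS = 10


def check_rule_limits(rule: dict) -> list[str]:
    """Report filters that exceed the per-rule limits on prefixes and tag conditions."""
    filters = rule.get("definition", {}).get("filters", {})
    problems = []
    if len(filters.get("prefixMatch", [])) > MAX_PREFIXES:
        problems.append(f"{rule['name']}: more than {MAX_PREFIXES} prefixes")
    if len(filters.get("blobIndexMatch", [])) > MAX_TAG_CONDITIONS:
        problems.append(f"{rule['name']}: more than {MAX_TAG_CONDITIONS} tag conditions")
    return problems


def upsert_rule(policy: dict, new_rule: dict) -> dict:
    """Return a complete new policy with `new_rule` added, or replaced if its name exists.

    Partial updates are not supported, so the whole returned document must be written back.
    """
    updated = copy.deepcopy(policy)
    rules = [r for r in updated.get("rules", []) if r["name"] != new_rule["name"]]
    rules.append(copy.deepcopy(new_rule))
    updated["rules"] = rules
    return updated


old = {"rules": [{"name": "expire", "type": "Lifecycle", "definition": {}}]}
new_rule = {"name": "expire", "type": "Lifecycle",
            "definition": {"filters": {"prefixMatch": [f"logs/{i}" for i in range(11)]}}}
print(check_rule_limits(new_rule))               # ['expire: more than 10 prefixes']
print(len(upsert_rule(old, new_rule)["rules"]))  # 1: the existing "expire" rule is replaced
```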

Conclusion

  • Azure Blob Lifecycle Management is a powerful tool for automating data management tasks in Azure Blob Storage.
  • Policy Definition involves creating a JSON document containing rules. Each rule includes a name, type, and definition consisting of filter and action sets.
  • Filters narrow down rule actions to a specific subset of blobs. These include blob types, prefix matches, and blob index tag conditions.
  • Actions are applied based on conditions defined in the policy. These can include tiering, deletion, and more.
  • Policy Runs occur once daily, and changes may take up to 48 hours to complete. Disabling a policy stops future runs but allows ongoing runs to be completed.
  • LifecyclePolicyCompleted Event provides detailed reports on policy actions, including successful and error counts for each action type.
  • Examples of Policies demonstrate scenarios like moving data to a cooler tier based on last access time, archiving data after ingestion, expiring data by age, deleting data by blob index tags, and managing previous versions.
  • Known Issues and Limitations include limitations on premium block blob storage, restrictions on partial updates to policies, and considerations for firewall rules.