Best Practices For CDN Origin Storage
Because a CDN is a highly scalable network, it handles most requests from edge cache without impacting the content distributor’s origin and application infrastructure. However, the content must still be available for retrieval on a cache miss or when the request has to pass through to the application. Whether the assets are videos, images, files, software binaries, or other objects, they must reside in an origin storage system that is accessible by the CDN. The origin storage system therefore becomes a critical performance component on every cache miss or application request.
Most CDNs have historically offered file-based solutions architected to permanently store content on disk and act as the origin server and location for CDN cache-fill. Alternatives include object stores, general-purpose cloud storage services such as Amazon S3, and CDN-based cloud storage solutions with object-based architectures. What’s interesting about origin storage services within CDNs is that they should be able to offer an advantage over monolithic cloud environments. CDNs are essentially large distributed application platforms, and one of those applications is storage.
To be clear, origin storage is fundamentally different from the caching applications that CDNs also provide. Storage implies some permanence and durability, whereas with caching, objects are ultimately evicted when they become less popular or expire. All CDNs operate some form of distributed architecture that is connected with last-mile telco providers. If storage is distributed throughout the CDN in multiple locations, requests from end users for content that is not already in cache can be served significantly faster from a nearby storage location. Performance suffers, however, if a request has to traverse the CDN network and potentially the open Internet to reach remote origin storage.
Some content distributors have decided that the availability and durability of a single storage location offered by a cloud storage provider are good enough, and that their applications will continue to work even if the provider experiences some issues. With hindsight and experience, though, it is clear that this is not always the case, as the fallout from the Eastern USA S3 outage in early 2017 showed. The solution offered by Amazon is to use their tools to architect and build your own HA solution, with redundancy across multiple storage locations and versioning of your objects. This is a complex change in operations from simply uploading content to a single location, and the operational overhead and cost of doing it for multi-terabyte or petabyte libraries is significant.
I also hear from a lot of customers who focus on the cost of storage at rest but don’t consider the additional costs of replicating content or all of the storage access fees. For traditional cloud storage workflows, the cost of accessing content can exceed the cost of storing it. Content owners should pick a CDN that charges a flat fee to store multiple copies of a customer’s content, without any additional charges for moving content into storage or accessing it when users request it. In many cases, storage from a CDN is actually more cost effective than storage from a traditional cloud provider for customers who need to access content frequently. Storing content in multiple locations also allows faster delivery of content that is not already in the cache. While it can be difficult to assign a specific value to delivery performance, the improved customer satisfaction that comes with faster delivery can potentially outweigh any additional cost of replicating content closer to users. Storage costs can be relatively small compared to the benefits of customer satisfaction and retention that the improved performance brings.
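To make the cost comparison concrete, here is a rough back-of-the-envelope sketch. Every rate and volume in it is invented for illustration and is not any provider's published pricing; the function names and the billing structure (separate charges for storage, replication, egress, and requests versus one flat per-GB fee) are my assumptions about how the two models differ.

```python
def usage_based_monthly_cost(stored_gb, read_gb, get_requests, copies, rates):
    """Monthly cost when storage, replication, and access are billed separately."""
    at_rest = stored_gb * copies * rates["storage_per_gb_month"]
    # Assumption: each copy beyond the first is transferred once, in this month.
    replication = stored_gb * (copies - 1) * rates["replication_per_gb"]
    access = (read_gb * rates["egress_per_gb"]
              + get_requests / 10_000 * rates["per_10k_get_requests"])
    return {"at_rest": at_rest, "replication": replication,
            "access": access, "total": at_rest + replication + access}


def flat_fee_monthly_cost(stored_gb, flat_rate_per_gb_month):
    """Monthly cost when one flat fee covers all copies, ingest, and access."""
    return {"total": stored_gb * flat_rate_per_gb_month}


# Made-up rates for illustration only.
HYPOTHETICAL_RATES = {
    "storage_per_gb_month": 0.02,
    "replication_per_gb": 0.02,
    "egress_per_gb": 0.05,
    "per_10k_get_requests": 0.004,
}

# A 100 TB library, read 1.5 times over in the month, kept in two locations.
usage = usage_based_monthly_cost(
    stored_gb=100_000, read_gb=150_000, get_requests=20_000_000,
    copies=2, rates=HYPOTHETICAL_RATES)
flat = flat_fee_monthly_cost(stored_gb=100_000, flat_rate_per_gb_month=0.05)

for name, value in usage.items():
    print(f"usage-based {name}: ${value:,.2f}")
print(f"flat-fee total: ${flat['total']:,.2f}")
```

With these invented inputs the access charges come out higher than the at-rest charges, which is the pattern customers tend to overlook. Plugging in your own volumes and quoted rates is the only way to know how the comparison falls for a real library.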
At the Content Delivery Summit last month, I had a conversation with Limelight Networks about what customers are asking for when it comes to origin storage and what a CDN should be doing to provide better performance than cloud storage providers. Limelight said they considered why and how a company would choose object storage integrated with a CDN, as well as the challenges of migrating content, and architected a solution around those questions. The result is a purpose-built feature called Intelligent Ingest, which automates the movement of objects into integrated origin storage based on either audience demand or a manifest of files. In load-on-demand mode, when an audience request requires retrieval from origin storage, the content is delivered and loaded into edge cache. In addition, the content is automatically stored in Limelight’s origin storage services. In manifest mode, content distributors provide a list of content to migrate and parameters to control the rate of migration.
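The load-on-demand behavior is easier to see in code. What follows is only a minimal in-memory sketch of the idea as it was described to me, not Limelight's implementation or API. The class, the method names, and the assumption that a miss is filled from the customer's existing origin are all mine.

```python
class LoadOnDemandOrigin:
    """Toy model: edge cache in front of CDN origin storage in front of an existing origin."""

    def __init__(self, fetch_from_existing_origin):
        self.edge_cache = {}        # evictable copies
        self.origin_storage = {}    # durable copies held by the CDN
        self.fetch_from_existing_origin = fetch_from_existing_origin

    def handle_request(self, path):
        """Return (body, source) and populate cache and storage along the way."""
        if path in self.edge_cache:
            return self.edge_cache[path], "edge-cache"
        if path in self.origin_storage:
            body, source = self.origin_storage[path], "origin-storage"
        else:
            # First request for this object: pull it from the existing origin
            # and keep a durable copy so later misses never go back there.
            body = self.fetch_from_existing_origin(path)
            self.origin_storage[path] = body
            source = "existing-origin"
        self.edge_cache[path] = body
        return body, source

    def evict_from_cache(self, path):
        """Simulate the cache dropping an unpopular or expired object."""
        self.edge_cache.pop(path, None)


def fetch_from_existing_origin(path):
    # Stand-in for an HTTP fetch from the customer's current origin.
    return f"bytes of {path}".encode()


cdn = LoadOnDemandOrigin(fetch_from_existing_origin)
print(cdn.handle_request("/video/intro.mp4")[1])
cdn.evict_from_cache("/video/intro.mp4")
print(cdn.handle_request("/video/intro.mp4")[1])
```

The point of the sketch is the second request after eviction: it is a cache miss, but it is served from the CDN's own storage rather than from the customer's infrastructure, so the library migrates itself in order of audience demand.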
In choosing an origin storage solution, content owners should look for one that replicates content automatically to multiple locations based on regional policies. Customers can choose policies based on audience location, such as a single region like the Americas, EMEA or APAC, weighted policies across geographies, or fully global policies. Content is then automatically positioned close to the audience, and future origin storage calls, whether due to cache misses or refresh checks, are automatically served from the best origin storage location available for that request.
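As a rough illustration of how a regional policy could drive both replica placement and the choice of storage location for a given request, consider the sketch below. The location names, the policy table, and the use of measured latency as the definition of "best" are assumptions made for the example, and weighted policies are left out to keep it short.

```python
# Hypothetical storage locations per region; the names are illustrative only.
REGION_LOCATIONS = {
    "americas": ["us-east", "us-west", "sa-east"],
    "emea": ["eu-west", "eu-central", "me-south"],
    "apac": ["ap-northeast", "ap-southeast"],
}

# A policy is the set of locations that hold a replica of the content.
POLICY_LOCATIONS = dict(REGION_LOCATIONS)
POLICY_LOCATIONS["global"] = [
    loc for locations in REGION_LOCATIONS.values() for loc in locations
]


def best_origin(policy, latency_ms, unavailable=()):
    """Pick the lowest-latency available replica for a request.

    latency_ms maps location name to the latency measured from the edge
    node handling the request; locations without a measurement are skipped.
    """
    candidates = [
        loc for loc in POLICY_LOCATIONS[policy]
        if loc in latency_ms and loc not in unavailable
    ]
    if not candidates:
        raise LookupError(f"no reachable storage location for policy {policy!r}")
    return min(candidates, key=lambda loc: latency_ms[loc])


# Latencies as seen from a hypothetical edge node in western Europe.
latency_from_edge = {"eu-west": 12, "eu-central": 20, "us-east": 90, "ap-southeast": 210}

print(best_origin("emea", latency_from_edge))
print(best_origin("emea", latency_from_edge, unavailable={"eu-west"}))
```

A real CDN would fold in load, health, and cost signals as well, but the shape is the same: the policy decides where copies live, and each origin call is routed to the best copy that is reachable at that moment.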
There are a number of workflows and use cases where these features could be useful for customers. For new content production, one could automate the movement of new content into the CDN storage environment as it is published and end users start requesting it. It could also be useful if you are migrating a library from your existing solution to a CDN’s origin storage, which Limelight said is a frequent use case. Enabling load-on-demand lets audience requests determine which assets to migrate, and providing a manifest of files automates migration of those assets. Another use case is pre-positioning, sometimes known as pre-caching or cache warming. In advance of a launch, the CDN could distribute all the necessary files across its origin storage, and when the launch is pushed live, the CDN handles the subsequent traffic spikes, offloading demand from the customer’s infrastructure.
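Manifest-driven migration and pre-positioning come down to the same loop: walk a list of objects and copy each one into origin storage no faster than an agreed rate. Here is a minimal sketch under that assumption. The function name, the objects-per-second parameter, and the copy callback are hypothetical and do not correspond to any vendor's interface.

```python
import time


def migrate_manifest(manifest, copy_object, max_objects_per_second=5.0, sleep=time.sleep):
    """Copy each path in the manifest, pausing between objects to cap the rate.

    copy_object is whatever moves one object into origin storage; it is
    expected to raise OSError on failure. Returns (migrated, failed).
    """
    if max_objects_per_second <= 0:
        raise ValueError("max_objects_per_second must be positive")
    interval = 1.0 / max_objects_per_second
    migrated, failed = [], []
    for index, path in enumerate(manifest):
        if index > 0:
            sleep(interval)  # simple fixed-interval throttle
        try:
            copy_object(path)
            migrated.append(path)
        except OSError as error:
            # Record the failure and keep going so one bad object
            # does not stall the rest of the library.
            failed.append((path, str(error)))
    return migrated, failed


origin_storage = {}


def copy_into_origin_storage(path):
    # Stand-in for the real transfer from the existing origin.
    origin_storage[path] = f"bytes of {path}".encode()


launch_manifest = ["/game/installer.bin", "/game/patch-1.bin", "/game/trailer.mp4"]
migrated, failed = migrate_manifest(
    launch_manifest, copy_into_origin_storage, max_objects_per_second=50)
print(f"migrated {len(migrated)} objects, {len(failed)} failures")
```

Throttling matters because the source of the migration is usually the customer's existing origin, which is still serving production traffic while the library is being moved.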
Looking at the CDN market today, it is clear the emphasis is not only on high-quality, highly efficient delivery but also on the range of services provided to help manage production workflows and improve the end-user experience. Moving more logic to the CDN edge for smarter request handling and, as discussed above, automating content migration and distribution to improve performance and QoE are areas where CDNs can make a clear difference.