Object storage
Object storage (also referred to as object-based storage) is a general term for the way we organize and work with units of storage called objects. Every object contains three things:
- The data itself. The data can be anything you want to store, from a family photo to a 400,000-page manual for assembling an aircraft.
- An expandable amount of metadata. The metadata is defined by whoever creates the object; it contains contextual information about what the data is, what it should be used for, how confidential it is, or anything else relevant to the way the data is used.
- A globally unique identifier. The identifier is an address given to the object so that it can be found anywhere in a distributed system. This way, you can find the data without knowing its physical location, which could be in different parts of a data center or in different parts of the world.
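The three parts of an object can be modeled in a few lines of Python. This is a minimal sketch, not how any particular product stores objects: `StoredObject` and `FlatObjectStore` are hypothetical names, and a random UUID stands in for whatever identifier scheme a real system uses. The point it illustrates is that the caller keeps only the identifier and never needs to know which disk or data center holds the bytes.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """The three parts of an object: data, expandable metadata, unique ID."""
    data: bytes
    metadata: dict = field(default_factory=dict)
    # A random UUID stands in for the store's globally unique identifier.
    object_id: str = field(default_factory=lambda: str(uuid.uuid4()))


class FlatObjectStore:
    """Hypothetical flat namespace: objects are looked up by ID only."""

    def __init__(self) -> None:
        self._objects = {}

    def put(self, data: bytes, **metadata: str) -> str:
        obj = StoredObject(data=data, metadata=dict(metadata))
        self._objects[obj.object_id] = obj
        return obj.object_id  # the caller keeps only this address

    def get(self, object_id: str) -> StoredObject:
        return self._objects[object_id]


store = FlatObjectStore()
photo_id = store.put(b"<jpeg bytes>", content_type="image/jpeg", confidentiality="family only")
photo = store.get(photo_id)
print(photo.metadata["confidentiality"])  # family only
```

Because metadata is just keyword arguments here, any number of new fields can be attached without changing the schema, which mirrors the "expandable" property described above.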
Object storage works very well for unstructured data sets that are read often but rarely written. Static web content, data backups, archival images, and multimedia files (videos, pictures, or music) are best stored as objects. Databases kept in an object storage environment should ideally hold unstructured data sets, and their use cases should not call for a large number of writes or incremental updates.
Geographically distributed back-end storage is another great use case for object storage. Object storage applications present themselves as network storage and support extensible metadata for efficient distribution and parallel access to objects. That makes object storage well suited to spreading back-end storage clusters across multiple data centers.
Object storage took off because it greatly simplified the developer experience. Because the API consists of standard HTTP requests, libraries were quickly developed for most programming languages. Saving a blob of data became as easy as an HTTP PUT request to the object store. Retrieving the file and metadata is a normal GET request. Further, most object storage services can also serve the files publicly to your users, removing the need to maintain a web server to host static assets.
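To see why an HTTP-based API is so approachable, the following sketch runs a tiny in-process object store and talks to it using only the Python standard library: a PUT stores bytes under a key, and a GET returns them together with their metadata. `ToyObjectHandler` is a hypothetical toy, not any real product. Sending user metadata as `x-amz-meta-*` headers follows Amazon S3's convention, but real services also require authenticated requests (for S3, AWS Signature Version 4), which this sketch deliberately omits.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

BLOBS = {}  # key (URL path) -> (body bytes, metadata headers)


class ToyObjectHandler(BaseHTTPRequestHandler):
    """Hypothetical minimal object store: the URL path is the object key."""

    def do_PUT(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        # Keep user metadata sent as x-amz-meta-* headers (S3's convention).
        meta = {k.lower(): v for k, v in self.headers.items()
                if k.lower().startswith("x-amz-meta-")}
        BLOBS[self.path] = (body, meta)
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self):
        if self.path not in BLOBS:
            self.send_error(404)
            return
        body, meta = BLOBS[self.path]
        self.send_response(200)
        for name, value in meta.items():
            self.send_header(name, value)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass


# Port 0 lets the OS pick a free port on localhost.
server = ThreadingHTTPServer(("127.0.0.1", 0), ToyObjectHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}"

# Saving a blob is a PUT...
put = urllib.request.Request(f"{base}/photos/cat.jpg", data=b"fake-jpeg-bytes",
                             method="PUT", headers={"x-amz-meta-owner": "alice"})
urllib.request.urlopen(put).close()

# ...and retrieving it, with its metadata, is a GET.
with urllib.request.urlopen(f"{base}/photos/cat.jpg") as resp:
    data = resp.read()
    owner = resp.headers["x-amz-meta-owner"]

server.shutdown()
server.server_close()
print(data, owner)  # b'fake-jpeg-bytes' alice
```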
On top of that, object storage services charge only for the storage space you use (some also charge per HTTP request, and for transfer bandwidth). This is a boon for small developers, who can get world-class storage and hosting of assets at costs that scale with use.
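The pay-per-use model is simple arithmetic: each line of the bill is proportional to how much of that resource you consumed. The sketch below assumes a bill made of exactly the three components mentioned above (storage, requests, transfer), and the rates are made-up placeholders, not real prices from any provider; check your provider's pricing page for actual figures and for any free tiers or minimums.

```python
def monthly_cost(gb_stored, requests, gb_transferred,
                 per_gb, per_1k_requests, per_gb_transferred):
    """Usage-based bill: every term is proportional to what you actually use."""
    storage = gb_stored * per_gb
    request_fees = requests / 1000 * per_1k_requests
    transfer = gb_transferred * per_gb_transferred
    return storage + request_fees + transfer


# Made-up placeholder rates, NOT real prices from any provider.
RATES = dict(per_gb=0.02, per_1k_requests=0.005, per_gb_transferred=0.09)

hobby = monthly_cost(10, 50_000, 5, **RATES)             # small side project
growing = monthly_cost(1_000, 5_000_000, 500, **RATES)   # 100x the usage
print(round(hobby, 2), round(growing, 2))
```

Because there is no fixed term, multiplying usage by 100 multiplies the bill by 100, and a project with almost no traffic pays almost nothing.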
Amazon S3
Amazon Simple Storage Service (S3) is storage for the internet: you can store and retrieve any amount of data, at any time, from anywhere. This makes web-scale computing easier for developers and gives them access to the same infrastructure that Amazon uses to run its own global network of websites. The Amazon S3 API also offers a common path for rapid development and for building hybrid cloud deployments at scale.
S3 is Amazon’s object storage service. It is a highly durable, scalable, fast, and secure storage system, available through a web interface and an API, that lets you and me upload and download any amount of data from anywhere in the world. You can store any kind of data, such as images, documents, and binaries, as long as a single object does not exceed 5 TB.
By default, every time you upload something to S3, it is stored redundantly on multiple devices across multiple Availability Zones within an AWS Region. That makes losing data extremely unlikely: S3 Standard is designed for 99.999999999% (eleven nines) durability. Who cares if one of the hard drives crashes? There are multiple copies of your data sitting on other physical drives!
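The idea behind replication can be shown with a deliberately simplified toy model. S3's real replication is internal and not exposed like this; `ReplicatedStore`, `lose_zone`, and the zone names are hypothetical. Every write is copied to all zones, so a read succeeds as long as at least one copy survives.

```python
class ReplicatedStore:
    """Toy model: each write is copied to every zone; a read needs just one copy."""

    def __init__(self, zones):
        # One dict per zone stands in for the drives in that zone.
        self.replicas = {zone: {} for zone in zones}

    def put(self, key, data):
        for drive in self.replicas.values():
            drive[key] = data

    def lose_zone(self, zone):
        # Simulate a failure: the zone and every copy in it disappear.
        del self.replicas[zone]

    def get(self, key):
        for drive in self.replicas.values():
            if key in drive:
                return drive[key]
        raise KeyError(key)


store = ReplicatedStore(["zone-a", "zone-b", "zone-c"])
store.put("backup.tar", b"important bytes")
store.lose_zone("zone-a")
store.lose_zone("zone-b")
print(store.get("backup.tar"))  # b'important bytes'
```

Even after two of the three zones are gone, the object is still readable from the remaining copy.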
S3 uses buckets to group objects. A bucket is a container for objects, and it has a globally unique name. By unique we really mean unique: you have to choose a bucket name that isn’t used by any other AWS customer in any region.
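Besides being unique, a bucket name must follow S3's naming rules. The function below is a sketch that checks only a subset of the documented rules for general purpose buckets (3 to 63 characters; lowercase letters, digits, periods, and hyphens; starts and ends with a letter or digit; no two adjacent periods; not formatted as an IP address; no `xn--` prefix). The function name is hypothetical. It cannot check global uniqueness: only S3 can tell you a name is taken, for example by rejecting `CreateBucket` with a `BucketAlreadyExists` error.

```python
import ipaddress
import re

# First and last characters: lowercase letter or digit; 3 to 63 characters total.
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def looks_like_valid_bucket_name(name: str) -> bool:
    """Check a subset of S3's general purpose bucket naming rules."""
    if not _BUCKET_RE.fullmatch(name):
        return False
    if ".." in name:             # no two adjacent periods
        return False
    if name.startswith("xn--"):  # reserved prefix
        return False
    try:
        ipaddress.IPv4Address(name)  # must not look like an IP address
    except ipaddress.AddressValueError:
        return True
    return False


print(looks_like_valid_bucket_name("my-photo-backups-2024"))  # True
print(looks_like_valid_bucket_name("My_Bucket"))              # False
```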
You can have one or more buckets. For each bucket, you can control access to it (who can create, delete, and list objects in the bucket), view access logs for it and its objects, and choose the geographical region where Amazon S3 will store the bucket and its contents.
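Per-bucket access control is commonly expressed as a JSON bucket policy. The sketch below builds the widely used public-read policy, which is what static web hosting typically relies on: it allows anyone (`"Principal": "*"`) to perform `s3:GetObject` on every object in the bucket. The bucket name `example-bucket` and the helper name are hypothetical, and granting public read access is only appropriate for content that really is meant to be public.

```python
import json


def public_read_policy(bucket: str) -> str:
    """Bucket policy letting anyone (Principal "*") GET objects in `bucket`."""
    policy = {
        "Version": "2012-10-17",  # the current access-policy language version
        "Statement": [
            {
                "Sid": "PublicReadGetObject",
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:GetObject",
                # "/*" applies the statement to every object, not the bucket itself.
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }
    return json.dumps(policy, indent=2)


print(public_read_policy("example-bucket"))
```

Such a document could be attached with `aws s3api put-bucket-policy --bucket example-bucket --policy file://policy.json` or with boto3's `put_bucket_policy(Bucket=..., Policy=...)`. S3 Block Public Access, which is enabled by default on new buckets, has to be relaxed first or the public policy will be rejected.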
Typical use cases are as follows:
- Backing up and restoring files with S3 and the help of the AWS CLI
- Archiving objects with Amazon S3 Glacier to save money compared to standard Amazon S3 storage
- Integrating Amazon S3 into applications with the help of the AWS SDKs to store and fetch objects such as images
- Hosting static web content that can be viewed by anyone with the help of S3
- Building data pipelines with S3 Event Notifications and Lambda
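For the last use case, S3 Event Notifications can invoke a Lambda function with an event whose `Records` list describes each affected object. The handler below is a minimal sketch that only logs what arrived; the bucket name and key are hypothetical, and the sample event is trimmed to the fields the handler reads (real events carry many more). Object keys in these events are URL-encoded, which is why the handler decodes them with `unquote_plus`.

```python
import urllib.parse


def handler(event, context):
    """Lambda entry point for S3 Event Notifications (context is unused here)."""
    objects = []
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Keys arrive URL-encoded (a space becomes '+'), so decode them first.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        print(f"{record['eventName']}: s3://{bucket}/{key}")
        objects.append(f"s3://{bucket}/{key}")
    return {"objects": objects}


# Trimmed sample event: only the fields the handler reads are included.
sample_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "example-uploads"},
                "object": {"key": "photos/family+trip.jpg"},
            },
        }
    ]
}
result = handler(sample_event, None)
```

In a real pipeline, the loop body is where the next stage would run, such as generating a thumbnail or loading a log file into a database.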
MinIO
MinIO is an S3-compatible object storage server that you can run yourself. It was originally released under Apache License v2 and has been licensed under the GNU AGPLv3 since 2021. The MinIO server is designed to be minimal and scalable, and it is light enough to be bundled with the application stack, much like Node.js and Redis.
As an object store, MinIO can store unstructured data such as photos, videos, log files, backups and container images. The maximum size of an object is 5TB.
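Because MinIO speaks the S3 API, moving between it and Amazon S3 is mostly a matter of changing the endpoint. The hypothetical helper below sketches how an object's URL is formed in the two addressing styles: virtual-hosted style, where the bucket becomes part of the host name (the style AWS recommends), and path style, where the bucket is the first path segment (the style a default MinIO server on port 9000 is usually addressed with). The region, bucket, and key are illustrative, and request signing is omitted.

```python
from urllib.parse import quote


def object_url(endpoint: str, bucket: str, key: str, virtual_hosted: bool) -> str:
    """Build an object URL for an S3-compatible endpoint (hypothetical helper)."""
    scheme, host = endpoint.split("://", 1)
    path = quote(key, safe="/")  # percent-encode spaces etc., keep '/' separators
    if virtual_hosted:
        # Bucket becomes a subdomain: https://bucket.s3.region.amazonaws.com/key
        return f"{scheme}://{bucket}.{host}/{path}"
    # Path style: bucket is the first path segment: http://localhost:9000/bucket/key
    return f"{scheme}://{host}/{bucket}/{path}"


aws = object_url("https://s3.eu-west-1.amazonaws.com", "example-bucket", "logs/app 1.log", True)
minio = object_url("http://localhost:9000", "example-bucket", "logs/app 1.log", False)
print(aws)    # https://example-bucket.s3.eu-west-1.amazonaws.com/logs/app%201.log
print(minio)  # http://localhost:9000/example-bucket/logs/app%201.log
```

In practice an S3 SDK builds these URLs for you; with boto3, for instance, pointing `boto3.client("s3", endpoint_url=...)` at a MinIO server is typically all that changes in application code.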
More details can be found on the MinIO website.
Further reading