Object Storage / S3
Object storage is a type of data store that originated in cloud environments and can be used to store and share data. An object consists of a unique name, the actual data, and associated metadata (such as access permissions and user-defined metadata). In contrast to file systems, objects are not stored in a hierarchy but in flat containers (so-called buckets). Access is possible via HTTP-based protocols/APIs such as S3 or Swift.
Access
Staff and students can request access to the object storage service via the TU Dresden self-service portal. In doing so, the user name and the email address are transmitted to the service and used to identify the user account.
Once access is granted, the user receives credentials tied to the ZIH login, consisting of an Access Key (comparable to a pseudonymized user ID) and a Secret Key (a password to be kept secret). Both can be viewed in the self-service portal. If required, a new Secret Key can be generated at any time.
Neither separate group access nor access that is not linked to a ZIH login is provided. For such purposes, a ZIH function login can be requested and its associated S3 access used.
Limitations
The maximum storage space (quota) provided to a user is limited, initially to 20 GB, but can be increased to up to 200 GB via the self-service portal. Additional storage space requirements can be met upon request.
Each user may create up to 50 buckets in which to store objects in a structured manner. The number of objects is currently not limited but may be subject to practical limitations.
Security notes
Attention: by default, all data in the object store is stored unencrypted. The underlying storage system does not encrypt the data either, which is why it is absolutely necessary to make sure that confidential data is only stored in the object storage in encrypted form.
If the client software itself does not support encryption, you can use server-side encryption, which is specified in the S3 protocol. In this case, the data is encrypted during storage using a key provided by the client. The transport path is encrypted (HTTPS).
Client configuration (example)
A commonly used client software is s3cmd, which can be controlled from the command line. The tool can be configured interactively after installation:
s3cmd --configure
- Enter Access Key and Secret Key
- Keep the default region (Enter)
- S3 Endpoint: s3.zih.tu-dresden.de
- DNS-style bucket+hostname:port template for accessing a bucket: %(bucket)s.s3.zih.tu-dresden.de
If encryption is desired:
- Encryption password: <password>
- Path to GPG program [/usr/bin/gpg]: <Enter>
- Use HTTPS protocol: Yes
- HTTP proxy server name: <Enter>
This creates the configuration file ~/.s3cfg. The file can also be written by hand; a minimal version looks as follows:
[default]
access_key = XXXXXXXXXXXXXXXXXX
secret_key = YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY
host_base = s3.zih.tu-dresden.de
host_bucket = %(bucket)s.s3.zih.tu-dresden.de
use_https = True
Create a bucket for testing:
s3cmd mb s3://testbucket
s3cmd ls
Upload a file as an object:
echo "Hello world" > mytestfile.txt
s3cmd put mytestfile.txt s3://testbucket
s3cmd info s3://testbucket/mytestfile.txt
Note on client-side encryption with s3cmd: encryption must be enabled explicitly. Objects are only stored encrypted if the -e parameter is passed to put, or if encrypt = True is set in the configuration file.
Namespace
Each user is assigned their own namespace for buckets, so that bucket names chosen by different users cannot collide. However, this affects the sharing of objects with other users and the public provision of URLs: in these cases, the namespace must be specified so that the bucket can be identified uniquely.
Enable public access to objects
s3cmd setacl --acl-public s3://testbucket/mytestfile.txt
s3cmd info s3://testbucket/mytestfile.txt
Access: curl https://s3.zih.tu-dresden.de/zihloginname:testbucket/mytestfile.txt
Note: Unauthenticated access (e.g. via a web browser) cannot use subdomain-based bucket names in this case, because every bucket is located in the namespace of its owner and this namespace must be specified explicitly. The colon (:) is not a valid character in domain names, which is why only the path-based form can be used here.
Revoke public access: s3cmd setacl --acl-private s3://testbucket/mytestfile.txt
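The path-based form used for public access can be assembled mechanically from the ZIH login (the namespace), the bucket name, and the object key. The following is a minimal sketch only; the function name public_object_url is made up for illustration, and object keys containing characters that would need URL encoding are not handled.

```shell
# Sketch: build the path-style URL of a publicly readable object.
# Arguments: <zih-login (namespace)> <bucket> <object-key>
# Assumption: the object key needs no URL encoding.
public_object_url() {
  printf 'https://s3.zih.tu-dresden.de/%s:%s/%s\n' "$1" "$2" "$3"
}

public_object_url zihloginname testbucket mytestfile.txt
# prints https://s3.zih.tu-dresden.de/zihloginname:testbucket/mytestfile.txt
```

The resulting URL can be passed to curl or opened in a browser as long as the object's ACL is public.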
Pre-signed URLs
So-called pre-signed URLs can be used to make objects accessible to a restricted group of people who do not need S3 access credentials of their own. To do this, a signed link is created with one's own user account; anyone who knows this link can access the object. Such pre-signed URLs are usually created with a limited validity period, so that they expire after some time. The expiry time is given either as an absolute Unix timestamp or as an offset in seconds:
s3cmd signurl s3://testbucket/mytestfile.txt $(date -d 'now + 1 year' +%s)
or:
s3cmd signurl s3://testbucket/mytestfile.txt +3600
Since the assignment to a namespace is done via the access key contained in the URL, an explicit specification of the namespace is not necessary.
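The first signurl variant relies on date -d, which is specific to GNU date. As a sketch of an alternative, the absolute expiry time can be computed with plain shell arithmetic. The helper name expiry_in_days is hypothetical, and the sketch assumes that date +%s is available, which is the case on GNU and BSD systems.

```shell
# Sketch: compute an absolute expiry time (Unix epoch seconds) that lies
# a given number of days in the future.
# Argument: <days>
expiry_in_days() {
  echo $(( $(date +%s) + $1 * 86400 ))   # 86400 seconds per day
}

# Hypothetical usage: a link that stays valid for 7 days.
#   s3cmd signurl s3://testbucket/mytestfile.txt "$(expiry_in_days 7)"
```

For short validity periods, the relative form (+3600 for one hour) is simpler and needs no helper.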
Data security
All data in the object storage is stored redundantly by the underlying storage system (Ceph replicated pool), across two sites, to protect against the failure of individual components. However, there is no separate backup, so data cannot be recovered if the entire storage system fails (no disaster recovery).
To make changes traceable and to protect against accidental changes, so-called versioned buckets are supported. Whenever an object is changed, the previous state is saved and can be restored, similar to a snapshot in file system-based storage. The additional versions count toward the user quota but are stored incrementally, so that only the changed portion occupies storage space.
It is also possible to protect objects from subsequent modification and deletion for a certain period of time (so-called object locking), which means that a write-once-read-many (WORM) storage model can be implemented, providing effective protection against ransomware, for example.