Object Storage / S3
Object storage is a data storage system that originated in cloud environments and can be used to store and share data. An object consists of a unique name, the actual data, and associated metadata (such as access permissions and user-defined metadata). In contrast to file systems, objects are not stored in a hierarchy but in flat containers (so-called buckets). Access is via HTTP-based protocols/APIs such as S3 or Swift.
Access
Employees and students can request or set up access to the object storage service via the TU Dresden self-service portal. The user name and e-mail address are transmitted to the service and used to identify the user account.
Once access has been granted, you receive credentials tied to your ZIH login: an access key (comparable to a pseudonymized user ID) and a secret key (a password that must be kept secret). Both can be viewed in the self-service portal, and a new secret key can be generated at any time.
Group access and access that is not linked to a ZIH login are not provided. For such purposes, request a ZIH functional login and use the S3 access associated with it.
Limitations
The maximum storage space (quota) made available to a user is limited, initially to 20 GB, but can be increased to up to 200 GB via the self-service portal. Additional storage space requirements can be met on request.
Each user may create up to 50 buckets in which they can store their objects in a structured manner. The number of permitted objects is currently not limited, but may be subject to practical limitations.
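The limits above can be checked with a small calculation. The following is only a sketch: the helper name usage_percent is hypothetical, and it assumes that 1 GB of quota is counted as 1024^3 bytes, which the service description does not state. The number of bytes in use has to be supplied by you, for example from the output of s3cmd du.

```shell
#!/bin/sh
# usage_percent USED_BYTES QUOTA_GB
# Prints the used share of the quota as a whole percentage (rounded down).
# Assumption: 1 GB of quota = 1024 * 1024 * 1024 bytes.
usage_percent() {
    quota_bytes=$(( $2 * 1024 * 1024 * 1024 ))
    echo $(( $1 * 100 / quota_bytes ))
}

# Example values: 5 GiB in use against the initial 20 GB quota.
used_bytes=5368709120
echo "Quota used: $(usage_percent "$used_bytes" 20)%"

# The number of buckets can be compared with the limit of 50 by counting
# the lines printed by "s3cmd ls" (one line per bucket), e.g.:
#   s3cmd ls | wc -l
```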
There is no automatic backup from the ZIH for the data in S3.
Security information
Attention: By default, all data in the object storage is stored unencrypted. The underlying storage system does not encrypt the data either, which is why it is essential to ensure that confidential data is only stored in encrypted form in the object storage.
If the client software itself does not support encryption, the server-side encryption specified in the S3 protocol can be used instead. In this case the data is encrypted on the server, at the time it is stored, with a key provided by the client. Data in transit is encrypted in any case (HTTPS).
Client configuration (example)
A frequently used client software is s3cmd, which can be controlled via the command line. The tool can be configured interactively after installation:
s3cmd --configure
- Enter access key and secret key
- Keep the default region (Enter)
- S3 endpoint: s3.zih.tu-dresden.de
- DNS-style bucket+hostname:port template for accessing a bucket: %(bucket)s.s3.zih.tu-dresden.de
If encryption is required:
- Encryption password: <password>
- Path to GPG program [/usr/bin/gpg]: <Enter>
- Use HTTPS protocol: Yes
- HTTP Proxy server name: <Enter>
This creates the configuration file ~/.s3cfg. Alternatively, the file can be written by hand with the following minimal content:
[default]
access_key = XXXXXXXXXXXXXXXXXXXXXXXX
secret_key = YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY
host_base = s3.zih.tu-dresden.de
host_bucket = %(bucket)s.s3.zih.tu-dresden.de
use_https = True
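The minimal configuration can also be generated by a script, for example when setting up several machines. The following sketch writes it to a separate file so that an existing ~/.s3cfg is not overwritten. The variable names S3CFG_PATH, ACCESS_KEY and SECRET_KEY and the file name s3cfg-zih.example are assumptions made for this example; the key values are the same placeholders as above and must be replaced with your own.

```shell
#!/bin/sh
# Target file; deliberately not ~/.s3cfg so nothing existing is overwritten.
S3CFG_PATH="${S3CFG_PATH:-./s3cfg-zih.example}"

# Placeholders; replace with the keys shown in the self-service portal.
ACCESS_KEY="${ACCESS_KEY:-XXXXXXXXXXXXXXXXXXXXXXXX}"
SECRET_KEY="${SECRET_KEY:-YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY}"

# The file contains the secret key, so a newly created file
# should be readable by its owner only.
umask 077

cat > "$S3CFG_PATH" <<EOF
[default]
access_key = ${ACCESS_KEY}
secret_key = ${SECRET_KEY}
host_base = s3.zih.tu-dresden.de
host_bucket = %(bucket)s.s3.zih.tu-dresden.de
use_https = True
EOF

echo "Wrote $S3CFG_PATH"
```

s3cmd reads a configuration file other than ~/.s3cfg when it is passed with the -c option, e.g. s3cmd -c ./s3cfg-zih.example ls.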
Create a bucket for testing:
s3cmd mb s3://testbucket
s3cmd ls
Upload a file as an object:
echo "Hello world" > mytestfile.txt
s3cmd put mytestfile.txt s3://testbucket
s3cmd info s3://testbucket/mytestfile.txt
Note on the client-side encryption of s3cmd: Objects are only stored in encrypted form if encryption is activated explicitly, either by passing the -e parameter to put or by setting encrypt = True in the configuration file.
Namespace
Each user is assigned their own namespace for buckets so that there are no collisions between users when assigning bucket names. However, this affects the sharing of objects with other users and the public provision of URLs. The namespace must always be specified in order to uniquely identify buckets.
Enable public access to objects
s3cmd setacl --acl-public s3://testbucket/mytestfile.txt
s3cmd info s3://testbucket/mytestfile.txt
Access: curl https://s3.zih.tu-dresden.de/zihloginname:testbucket/mytestfile.txt
Note: Unauthenticated access (e.g. via a web browser) cannot use subdomain-based bucket names here, because every bucket lives in the namespace of its owner and that namespace must be specified explicitly. A colon (:) is not a valid character in domain names, so the namespace can only be given in the path-based form shown above.
Revoke public access: s3cmd setacl --acl-private s3://testbucket/mytestfile.txt
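The path-based form of a public URL can be assembled from the ZIH login (the namespace), the bucket name and the object name. The helper name public_url below is hypothetical and the sketch is deliberately simple: it does not URL-encode object names, so it is only suitable for names without spaces or special characters.

```shell
#!/bin/sh
# public_url LOGIN BUCKET OBJECT
# Prints the path-based URL of a public object: namespace and bucket are
# joined with a colon and placed in the path, not in the host name.
public_url() {
    printf 'https://s3.zih.tu-dresden.de/%s:%s/%s\n' "$1" "$2" "$3"
}

url=$(public_url zihloginname testbucket mytestfile.txt)
echo "$url"

# The object can then be fetched without credentials, e.g.:
#   curl -fsS -O "$url"
```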
Pre-signed URLs
So-called pre-signed URLs make objects accessible to a restricted group of people without requiring them to have S3 credentials of their own. You create a link signed with your own credentials, and anyone who holds the link can access the object. Pre-signed URLs are usually given a limited period of validity, specified either as an absolute Unix timestamp or as a number of seconds relative to now (prefixed with +).
s3cmd signurl s3://testbucket/mytestfile.txt $(date -d 'now + 1 year' +%s)
or:
s3cmd signurl s3://testbucket/mytestfile.txt +3600
As the assignment to a namespace is made via the access key contained in the URL, it is not necessary to explicitly specify the namespace.
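The absolute expiry timestamp can also be computed with plain shell arithmetic, which avoids the GNU-specific date -d option. The helper names expiry_epoch and share_for are hypothetical; share_for is only defined here and not called, because it needs network access and a configured s3cmd.

```shell
#!/bin/sh
# expiry_epoch SECONDS
# Prints the Unix timestamp that lies SECONDS in the future.
expiry_epoch() {
    echo $(( $(date +%s) + $1 ))
}

# share_for S3_URI SECONDS
# Wraps "s3cmd signurl" with an absolute expiry timestamp.
share_for() {
    s3cmd signurl "$1" "$(expiry_epoch "$2")"
}

one_week=$(( 7 * 24 * 3600 ))
echo "A link created now and valid for one week expires at $(expiry_epoch "$one_week")"

# Example call (requires a configured s3cmd):
#   share_for s3://testbucket/mytestfile.txt "$one_week"
```

Short validity periods limit the damage if a link is passed on unintentionally, since a pre-signed URL works for anyone who has it.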
Data security
All data in the object storage is stored redundantly by the underlying storage system (Ceph replicated pool), across two locations, to protect against the failure of individual components. However, no separate backup exists, so the data cannot be restored if the entire storage system fails (i.e. there is no disaster recovery).
So-called versioned buckets are supported for the traceability of changes and to protect against accidental changes. Each time a change is made to an object, the old version is saved and can be restored, similar to a snapshot in file system-based storage. The additional versions count towards the user quota, but are saved incrementally so that only the changed portion takes up storage space.
Objects can also be protected from subsequent modification and deletion for a defined period of time (known as object locking). This implements a write-once-read-many (WORM) storage model and provides effective protection against ransomware, for example.