eu_s3_to_gcp.py

Copies one or more ENCODE files from AWS S3 storage to GCP storage using the Google Storage Transfer Service (STS). See encode_utils.transfer_to_gcp.Transfer for full documentation.

Note: Currently, only privileged users with appropriate DCC API keys can make use of this script, because the Google STS requires that the source buckets be publicly discoverable, yet the ENCODE bucket policies deny the action s3:GetBucketLocation to the public principal. Non-privileged users may find the alternative script eu_create_gcp_url_list.py to be a solution.

usage: eu_s3_to_gcp.py [-h] [-m DCC_MODE]
                       (-f FILE_IDS [FILE_IDS ...] | -i INFILE)
                       [-gb GCPBUCKET] -gp GCPPROJECT [-d DESCRIPTION]
                       [-c S3CREDS]
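
For example, a minimal invocation might look like the sketch below; the file identifier, bucket name, project name, and description are hypothetical placeholders:

    eu_s3_to_gcp.py -m prod -f ENCFF000AAA -gb my-gcp-bucket -gp my-gcp-project -d "Example transfer"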

Named Arguments

-m, --dcc-mode
The ENCODE Portal site ('prod', 'dev', or an explicit hostname, e.g. 'demo.encodedcc.org') to connect to.
-f, --file-ids
An alternative to --infile: one or more ENCODE file identifiers. Do not mix ENCODE files from different buckets.
-i, --infile
An alternative to --file-ids: the path to a file containing one or more file identifiers, one per line. Empty lines and lines starting with a '#' are skipped.
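
For instance, a file passed via --infile could be a plain-text listing such as the hypothetical sketch below, where the comment line and the empty line are both skipped:

    # ENCODE files to transfer (identifiers are hypothetical)
    ENCFF000AAA
    ENCFF000BBB

    ENCFF000CCC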
-gb, --gcpbucket
The name of the GCP bucket.
-gp, --gcpproject
The GCP project that is associated with gcp_bucket.
-d, --description
The description to show when querying transfers via the Google Storage Transfer API or via the GCP Console. May be left empty, in which case the default description will be the name of the first S3 file to be transferred.
-c, --s3creds
AWS credentials, provided in the form AWS_ACCESS_KEY_ID:AWS_SECRET_ACCESS_KEY. Ideally, they are stored in environment variables by those same names; however, for additional flexibility you can specify them here as well.
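
As a sketch, assuming hypothetical placeholder values, the credentials can be supplied either through the environment or directly on the command line:

    export AWS_ACCESS_KEY_ID=AKIAXXXXXXXXXXXXXXXX
    export AWS_SECRET_ACCESS_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    eu_s3_to_gcp.py -m prod -f ENCFF000AAA -gb my-gcp-bucket -gp my-gcp-project

    # or, passing the same values explicitly:
    eu_s3_to_gcp.py -m prod -f ENCFF000AAA -gb my-gcp-bucket -gp my-gcp-project -c "$AWS_ACCESS_KEY_ID:$AWS_SECRET_ACCESS_KEY"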