eu_register.py

Given a tab-delimited or JSON input file containing one or more records belonging to one of the profiles listed on the ENCODE Portal (such as https://www.encodeproject.org/profiles/biosample.json), either POSTS or PATCHES the records. The default is to POST each record; to PATCH instead, see the --patch option.

When POSTing file records, the md5sum of each file is calculated for you if you haven’t already provided the md5sum property. Then, after the POST operation completes, the actual file is uploaded to AWS S3. For this to work, you must set the submitted_file_name property to the full local path of the file to upload. Alternatively, you can set submitted_file_name to an existing S3 object, e.g. s3://mybucket/reads.fastq.
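If you would rather supply the md5sum property yourself, the value can be computed ahead of time. The following is a minimal sketch using only the Python standard library; the function name and chunk size are illustrative assumptions and this is not necessarily how the package computes the checksum internally.

```python
import hashlib


def md5sum(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of the file at `path`.

    The file is read in chunks so that large files (e.g. FASTQ) do not
    have to fit in memory. `chunk_size` is an arbitrary illustrative value.
    """
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The returned string can then be placed in the md5sum field of the corresponding record in your input file.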

Note that there is a special ‘trick’ defined in the encode_utils.connection.Connection() class that simplifies submission under certain profiles. It concerns the attachment property in any profile that employs it, such as the document profile. The trick works as follows: instead of constructing the attachment property’s object value as defined in the schema, simply use a single-key object of the following format:

{"path": "/path/to/myfile"}

and the attachment object will be constructed for you.
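As an illustration, a JSON input file for the document profile that uses this shorthand could be generated as sketched below. The helper name, the alias, and the file path are hypothetical placeholders, and any other properties that the document profile requires would need to be added to the record.

```python
import json


def build_document_record(alias, local_path):
    """Build a record that uses the single-key attachment shorthand.

    `alias` and `local_path` are placeholders supplied by the caller;
    other properties required by the profile are omitted from this sketch.
    """
    return {
        "aliases": [alias],
        # Shorthand: only the path is given here.
        "attachment": {"path": local_path},
    }


record = build_document_record("my-lab:protocol-1", "/path/to/myfile")
# The JSON input file may hold a single object or an array of objects.
input_json = json.dumps([record], indent=2)
```

The resulting text can be saved to a file and passed with -i, together with -p document.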


usage: eu_register.py [-h] [-m DCC_MODE] [-d] [--no-aliases]
                      [--no-upload-file] -p PROFILE_ID -i INFILE [-w]
                      [-r REMOVE_PROPERTY] [--patch | --rm-patch]

Named Arguments

-m, --dcc-mode
The ENCODE Portal site to connect to: ‘prod’ or ‘dev’, or an explicit host name, e.g. ‘demo.encodedcc.org’.
-d, --dry-run
Set this option to enable the dry-run feature, such that no modifications are performed on the ENCODE Portal. This is useful if you’d like to inspect the logs or ensure the validity of your input file.

Default: False

--no-aliases
Setting this option is NOT advised. Set this option for doing a POST when your input file doesn’t contain an ‘aliases’ column, even though this property is supported in the corresponding ENCODE profile. When POSTING a record to a profile that includes the ‘aliases’ property, this package requires the ‘aliases’ property be used for traceability purposes and because without this property, it’ll be very easy to create duplicate objects on the Portal. For example, you can easily create the same biosample as many times as you want on the Portal when not providing an alias.

Default: False

--no-upload-file
Don’t upload files when POSTing file objects.

Default: False

-p, --profile_id
The ID of the profile to submit to, e.g. use ‘genetic_modification’ for https://www.encodeproject.org/profiles/genetic_modification.json. The profile will be pulled down for type-checking in order to type-cast any values in the input file to the proper type (e.g. some values need to be submitted as integers, not strings).
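To give a feel for what such type-casting involves, here is a minimal sketch that converts a string read from the input file according to a JSON Schema type name. The function name and the set of handled types are assumptions made for illustration; the package’s own casting logic may differ.

```python
def cast_value(raw, schema_type):
    """Cast the string `raw` to the Python type implied by `schema_type`.

    `schema_type` is a JSON Schema type name such as "integer" or "string".
    Types not handled here are returned unchanged as strings.
    """
    if schema_type == "integer":
        return int(raw)
    if schema_type == "number":
        return float(raw)
    if schema_type == "boolean":
        lowered = raw.strip().lower()
        if lowered in ("true", "1"):
            return True
        if lowered in ("false", "0"):
            return False
        raise ValueError("Not a boolean value: %r" % raw)
    return raw
```

For example, a cell containing 3 in a column whose profile type is integer would be submitted as the number 3 rather than the string "3".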
-i, --infile

The JSON input file or tab-delimited input file.

The tab-delimited file format: Must have a field-header line as the first line. Any lines after the header line that start with a ‘#’ will be skipped, as will any empty lines. The field names must exactly match the property names in the corresponding profile. Non-schematic fields are allowed as long as they begin with a ‘#’; they will be skipped. If a property has an array data type (as indicated in the profile’s documentation on the Portal), the enclosing brackets ‘[’ and ‘]’ are optional. Values within the array must be comma-delimited. For example, if a property takes an array of strings, then you can use either of these as the value:

  1. str1,str2,str3
  2. [str1,str2,str3]

On the other hand, if a property takes a JSON object as a value, then the value you enter must be valid JSON. This is true anytime you have to specify a JSON object. Thus, if you are submitting a genetic_modification and you have two ‘introduced_tags’ to provide, you can supply them in either of the following two ways:

  1. {"name": "eGFP", "location": "C-terminal"},{"name": "FLAG", "location": "C-terminal"}
  2. [{"name": "eGFP", "location": "C-terminal"},{"name": "FLAG", "location": "C-terminal"}]
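The tab-delimited rules above can be sketched roughly as follows. This is an illustrative reading of the format, not the package’s actual parser; the function names are assumptions.

```python
import json


def read_rows(text):
    """Parse tab-delimited text into a list of {field: raw string} dicts.

    The first line is the header. Empty lines and lines starting with '#'
    are skipped, as are fields whose header name starts with '#'.
    """
    lines = text.splitlines()
    header = lines[0].split("\t")
    rows = []
    for line in lines[1:]:
        if not line.strip() or line.startswith("#"):
            continue
        cells = line.split("\t")
        rows.append({name: cell for name, cell in zip(header, cells)
                     if not name.startswith("#")})
    return rows


def parse_array(value):
    """Parse an array-typed cell; the enclosing brackets are optional."""
    text = value.strip()
    if text.startswith("[") and text.endswith("]"):
        text = text[1:-1].strip()
    if text.startswith("{"):
        # Array of JSON objects: the content must be valid JSON.
        return json.loads("[" + text + "]")
    # Array of plain strings: split on commas.
    return [item.strip() for item in text.split(",") if item.strip()]
```

Note that the object form requires straight double quotes, since the value is handed to a JSON parser.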

The JSON input file: Can be a single JSON object or an array of JSON objects. Key names must match property names of an ENCODE record type (profile).
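A reader for the JSON form has to accept both shapes. A minimal sketch, with a hypothetical function name, that normalizes either shape to a list of records:

```python
import json


def load_records(text):
    """Return a list of record dicts from JSON text.

    Accepts either a single JSON object or an array of JSON objects.
    """
    data = json.loads(text)
    if isinstance(data, dict):
        return [data]
    if isinstance(data, list) and all(isinstance(item, dict) for item in data):
        return data
    raise ValueError("Expected a JSON object or an array of JSON objects.")
```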

The following applies to both input file formats: When patching objects, you must specify the ‘record_id’ field to indicate the identifier of the record. Note that this is a special field that is not present in the ENCODE schema, and it doesn’t use the ‘#’ prefix to mark it as non-schematic. Here you can specify any valid record identifier (a UUID, accession, or alias).

Most profiles require the ‘award’ and ‘lab’ attributes. These may be set as fields in the input file, or they can be left out, in which case the default values for these attributes will be pulled from the environment variables DCC_AWARD and DCC_LAB, respectively.
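The fallback behavior for ‘award’ and ‘lab’ can be pictured with the sketch below. The function name is hypothetical, and the sketch only mirrors the rule described above rather than reproducing the package’s code.

```python
import os


def apply_defaults(record, environ=None):
    """Fill in 'award' and 'lab' from DCC_AWARD and DCC_LAB when absent.

    Values already present in `record` are left untouched. `environ`
    defaults to the process environment and can be overridden for testing.
    """
    if environ is None:
        environ = os.environ
    filled = dict(record)
    for prop, variable in (("award", "DCC_AWARD"), ("lab", "DCC_LAB")):
        if prop not in filled and variable in environ:
            filled[prop] = environ[variable]
    return filled
```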

-w, --overwrite-array-values
Only has meaning in combination with the --patch option. When this is specified, any keys with array values will be overwritten on the ENCODE Portal with the corresponding value to patch. The default action is to extend the array value with the patch value and then to remove any duplicates.

Default: False
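The difference between the default array handling and the overwrite behavior can be sketched as follows. This is an illustrative model with assumed names, not the package’s implementation.

```python
def merge_patch(existing, patch, overwrite_arrays=False):
    """Return `existing` updated with `patch`.

    By default, array values are extended with the patch value and
    duplicates are removed (first occurrence kept). With
    `overwrite_arrays=True`, the patch value replaces the array.
    """
    merged = dict(existing)
    for key, value in patch.items():
        current = merged.get(key)
        if (isinstance(value, list) and isinstance(current, list)
                and not overwrite_arrays):
            combined = []
            for item in current + value:
                if item not in combined:
                    combined.append(item)
            merged[key] = combined
        else:
            merged[key] = value
    return merged
```

For instance, patching aliases ["a", "b"] with ["b", "c"] yields ["a", "b", "c"] by default and ["b", "c"] when overwriting.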

-r, --remove-property
Only has meaning in combination with the --rm-patch option. Properties specified in this argument will be popped from the record fetched from the ENCODE Portal. Multiple properties can be given as a comma-delimited string.
--patch
Presence of this option indicates to PATCH an existing DCC record rather than register a new one.

Default: False

--rm-patch
Presence of this option indicates to remove a property, as specified by the -r argument, from an existing DCC record, and then PATCH it with the payload specified in -i.

Default: False
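Conceptually, the --rm-patch flow removes the named properties from the fetched record and then applies the payload. A minimal sketch under that reading, with a hypothetical function name and with no claim about how the package handles array values in this mode:

```python
def rm_patch(record, remove_property, payload):
    """Pop the comma-delimited properties from `record`, then apply `payload`.

    `record` stands in for the record fetched from the Portal and
    `payload` for the record given in the input file.
    """
    names = [name.strip() for name in remove_property.split(",") if name.strip()]
    updated = dict(record)
    for name in names:
        # Properties that are absent are ignored in this sketch.
        updated.pop(name, None)
    updated.update(payload)
    return updated
```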