encode_utils.utils

Contains utilities that don’t require authorization on the DCC servers.

encode_utils.utils.REQUEST_HEADERS_JSON = {'content-type': 'application/json'}

Stores the HTTP headers to indicate JSON content in a request.

encode_utils.utils.is_jpg_or_tiff(filename)

Checks if the provided file is an image file that is formatted as either JPEG or TIFF.

Parameters: filename – str. Local file.
Returns: False if the provided file is not a JPEG or TIFF image; otherwise the str ‘JPEG’ if this is a JPEG image, or ‘TIFF’ if this is a TIFF image.
Return type: bool or str
Raises: OSError – The provided file isn’t a recognized image format.
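
When an imaging library isn’t available, a similar JPEG/TIFF check can be approximated by comparing the first bytes of a file against the well-known signatures of the two formats. The sketch below is a stand-in written under that assumption, not the way is_jpg_or_tiff() is implemented; the helper names sniff_jpg_or_tiff and _write_temp are hypothetical, and unlike the documented function this version never raises OSError for unrecognized formats.

```python
import os
import tempfile

# File signatures ("magic numbers") for the two formats.
JPEG_MAGIC = b"\xff\xd8\xff"
TIFF_MAGICS = (b"II*\x00", b"MM\x00*")  # little-endian and big-endian TIFF


def sniff_jpg_or_tiff(filename):
    """Hypothetical helper: return 'JPEG', 'TIFF', or False based on magic bytes."""
    with open(filename, "rb") as fh:
        header = fh.read(4)
    if header.startswith(JPEG_MAGIC):
        return "JPEG"
    if header in TIFF_MAGICS:
        return "TIFF"
    return False


def _write_temp(data):
    """Demo helper: write bytes to a temporary file and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as fh:
        fh.write(data)
    return path


# Three small fake files: JPEG-like, TIFF-like, and plain text.
paths = [
    _write_temp(b"\xff\xd8\xff\xe0" + b"\x00" * 16),
    _write_temp(b"II*\x00" + b"\x00" * 16),
    _write_temp(b"plain text"),
]
results = [sniff_jpg_or_tiff(p) for p in paths]
for p in paths:
    os.remove(p)
```
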
encode_utils.utils.orient_jpg(image)

Given a JPG or TIFF, attempts to read the EXIF data to determine the orientation and then transform the image if needed. This function is called in connection.Connection.set_attachment().

EXIF (exchangeable image file format) metadata is supported only by JPG and TIFF formatted images, and such images aren’t required to set it. Imaging software sometimes sets EXIF to allow clients to read metadata such as what software took the picture and what orientation it’s in. This function is concerned only with the orientation, so that the image ends up in an upright position.

Note! Existing EXIF data will be lost for any transformed image. That’s not a big issue for orientation, however, since software should consider the orientation to be 1 when EXIF isn’t present.

Parameters: image – str or bytes instance. Use a string to supply the path to a local JPG or TIFF file. Use a bytes object if you already have the image data in memory.
Returns: Dictionary with the following keys:
  1. from - int. The orientation that was read in, or 0 if unknown.
  2. transformed - boolean. True if this function transformed the image, False otherwise. Note that False could mean either that the image didn’t need any transformation or that the need for a transformation could not be determined based on EXIF metadata or lack thereof.
  3. stream - A bytes instance.
Return type: dict
Raises: InvalidExifOrientation – The EXIF orientation data is present, but the orientation value isn’t in the expected range of [1..8].
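
The decision logic described above (0 for an unknown orientation, nothing to do for orientation 1, an error outside [1..8]) can be sketched without any image handling. This is only an illustration of the rules as documented: describe_orientation is a hypothetical helper, the needs_transform key is invented for the sketch, and the exception class is a local stand-in defined so the example runs on its own.

```python
class InvalidExifOrientation(Exception):
    """Local stand-in for the exception named above."""


def describe_orientation(orientation):
    """Hypothetical helper: summarize a raw EXIF orientation value.

    Pass None when no EXIF orientation tag was found.
    """
    if orientation is None:
        # Unknown orientation: reported as 0, and no transformation can be decided.
        return {"from": 0, "needs_transform": False}
    if orientation not in range(1, 9):
        raise InvalidExifOrientation(
            "Orientation {} is outside the expected range [1..8].".format(orientation)
        )
    # 1 means the image is already upright; 2 through 8 call for a flip and/or rotation.
    return {"from": orientation, "needs_transform": orientation != 1}


upright = describe_orientation(1)
rotated = describe_orientation(6)
unknown = describe_orientation(None)
```
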
encode_utils.utils.url_join(parts=[])

Useful for joining URL fragments to make a single cohesive URL, e.g. for searching. You can see several examples of its use in the connection.Connection class.
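
The exact joining rules aren’t spelled out here, so the following is a minimal sketch of one plausible approach, assuming fragments are joined with single slashes. The function join_url_parts and the example.org URL are hypothetical and only illustrate the idea.

```python
def join_url_parts(parts):
    """Hypothetical stand-in: join URL fragments with exactly one '/' between them."""
    # Strip leading and trailing slashes from each fragment and drop empty ones.
    cleaned = [str(p).strip("/") for p in parts if str(p).strip("/")]
    return "/".join(cleaned)


url = join_url_parts(["https://example.org/", "/search/", "?type=Experiment"])
```

Under this assumption the fragments above combine to https://example.org/search/?type=Experiment.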

encode_utils.utils.get_record_id(rec)

Extracts the most suitable identifier from a JSON-serialized record on the ENCODE Portal. This is useful, for example, for other applications that need to store identifiers of specific records on the Portal. The identifier chosen is the ‘accession’ if that property is present; otherwise the first alias if the ‘aliases’ property is present; otherwise the value of the ‘uuid’ property.

Parameters: rec – dict. The JSON-serialization of a record on the ENCODE Portal.
Returns: The extracted record identifier.
Return type: str
Raises: Exception – An identifier could not be extracted from the input record.
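
The precedence described above (accession, then first alias, then uuid) can be sketched as follows. This is an illustration rather than the library’s code: pick_record_id is a hypothetical name, and treating an empty value the same as a missing property is an assumption.

```python
def pick_record_id(rec):
    """Hypothetical helper: choose an identifier by accession > first alias > uuid."""
    if rec.get("accession"):
        return rec["accession"]
    aliases = rec.get("aliases")
    if aliases:
        return aliases[0]
    if rec.get("uuid"):
        return rec["uuid"]
    raise Exception("Could not extract an identifier from the record.")


# Illustrative records only; the identifier values are made up.
by_accession = pick_record_id({"accession": "ACC1", "aliases": ["lab:a1"], "uuid": "u-1"})
by_alias = pick_record_id({"aliases": ["lab:a1", "lab:a2"], "uuid": "u-1"})
by_uuid = pick_record_id({"aliases": [], "uuid": "u-1"})
```
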
encode_utils.utils.err_context(payload, schema)

Validates the schema instance against the provided JSON schema.

Parameters:
  • payload – dict.
  • schema – dict.
Returns:

None if there aren’t any instance validation errors. Otherwise, a two-item tuple where the first item is the main error message; the second is a dictionary-based error hash that contains the contextual errors. This latter item may be empty.

encode_utils.utils.calculate_md5sum(file_path)

Calculates the md5sum for a local file or an S3 URI. For an S3 URI, the md5sum is taken to be the object’s ETag.

Parameters: file_path – str. The path to a local file or an S3 URI, e.g. s3://bucket-name/key.
Returns: The md5sum.
Return type: str
Raises: FileNotFoundError – The given file_path does not exist.
encode_utils.utils.calculate_file_size(file_path)

Calculates the file size in bytes for a local file or an S3 URI.

Parameters: file_path – str. The path to a local file or an S3 URI, e.g. s3://bucket-name/key.
Returns: The file size in bytes.
Return type: int
Raises: FileNotFoundError – The given file_path does not exist.
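
For a local path, the same information is available from os.path.getsize(), which also raises FileNotFoundError for a missing file. This short sketch covers only the local case; local_file_size is a hypothetical name and S3 URIs are not handled.

```python
import os
import tempfile


def local_file_size(file_path):
    """Hypothetical helper: size of a local file in bytes."""
    # os.path.getsize raises FileNotFoundError if the path does not exist.
    return os.path.getsize(file_path)


# Demo on a temporary file holding 11 bytes.
fd, size_demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as fh:
    fh.write(b"hello world")
size = local_file_size(size_demo_path)
os.remove(size_demo_path)
```
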
encode_utils.utils.print_format_dict(dico, indent=2, truncate_long_strings=False)

Formats a dictionary for printing purposes to ease visual inspection. Wraps the json.dumps() function.

Parameters:
  • indent – int. The number of spaces to indent each level of nesting. Passed directly to the json.dumps() function.
  • truncate_long_strings – bool. Defaults to False. If True, then long strings will be truncated before the object is serialized.
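
Since the function wraps json.dumps(), its effect on a nested dictionary can be previewed with json.dumps() directly. The helper below, format_dict, is a hypothetical stand-in that only forwards indent; the truncation option is left out of the sketch.

```python
import json


def format_dict(dico, indent=2):
    """Hypothetical stand-in: pretty-print a dictionary via json.dumps()."""
    return json.dumps(dico, indent=indent)


formatted = format_dict({"a": {"b": 1}})
print(formatted)
```
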
encode_utils.utils.truncate_long_strings_in_objects(obj, max_num_chars=1000)

Recursively truncates long strings in JSON objects. Useful for reducing the size of log messages containing payloads with attachments that use data URLs.

Parameters:
  • obj – Any type supported by json.dump, usually called with a dict.
  • max_num_chars – int. The number of characters to truncate long strings to.
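
A recursive walk of this kind can be sketched as below. It is an illustration under assumptions: truncate_strings is a hypothetical name, strings are simply cut at max_num_chars with no marker appended, and tuples are returned as lists.

```python
def truncate_strings(obj, max_num_chars=1000):
    """Hypothetical helper: return a copy of obj with long strings cut short."""
    if isinstance(obj, str):
        return obj[:max_num_chars]
    if isinstance(obj, dict):
        return {key: truncate_strings(value, max_num_chars) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [truncate_strings(item, max_num_chars) for item in obj]
    # Numbers, booleans and None pass through unchanged.
    return obj


payload = {"attachment": {"href": "data:image/png;base64," + "A" * 5000}, "tags": ["x" * 20, 7]}
short = truncate_strings(payload, max_num_chars=10)
```
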
encode_utils.utils.clean_aliases(aliases)

Removes unwanted characters from the alias name. This function replaces:

  • both ‘/’ and ‘\’ with ‘_’.
  • ‘#’ with “”, as it is not allowed according to the schema.

Can be called prior to registering a new alias if you know it may contain such unwanted characters. You would then need to update your payload with the new alias to submit.

Parameters: aliases – list. One or more record alias names to submit to the Portal.
Returns: The cleaned alias.
Return type: str

Example:

clean_alias_name("michael-snyder:a/troublesome\alias")
# Returns michael-snyder:a_troublesome_alias
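
The replacements described for this function amount to a few str.replace() calls per alias. The sketch below shows that idea for a single alias and then applies it to a list; clean_alias is a hypothetical helper and the second alias is made up for the demonstration.

```python
def clean_alias(alias):
    """Hypothetical helper: replace '/' and '\\' with '_' and drop '#'."""
    return alias.replace("/", "_").replace("\\", "_").replace("#", "")


raw_aliases = ["michael-snyder:a/troublesome\\alias", "michael-snyder:lib#1"]
cleaned = [clean_alias(alias) for alias in raw_aliases]
```
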
encode_utils.utils.create_subprocess(cmd, check_retcode=True)

Runs a command in a subprocess and checks for any errors.

Creates a subprocess via a call to subprocess.Popen with the argument shell=True, and pipes stdout and stderr.

Parameters:
  • cmd – str. The command to execute.
  • check_retcode – bool. When True, a subprocess.SubprocessError is raised when the subprocess returns a non-zero return code. The error message will display the command that was executed along with its actual return code, as well as any messages that the subprocess sent to STDOUT and STDERR. When False, the subprocess.Popen instance will be returned instead and it is expected that the caller will call its communicate method.
Returns:

Two-item tuple containing the subprocess’s STDOUT and STDERR streams’ content if check_retcode=True, otherwise a subprocess.Popen instance.

Raises:

subprocess.SubprocessError – There is a non-zero return code and check_retcode=True.
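
The behavior described above can be sketched with subprocess.Popen as follows. This is a minimal illustration, not the library’s implementation: run_command is a hypothetical name, output is returned as bytes, and the wording of the error message is an assumption.

```python
import subprocess


def run_command(cmd, check_retcode=True):
    """Hypothetical stand-in: run cmd in a shell with stdout and stderr piped."""
    popen = subprocess.Popen(
        cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE
    )
    if not check_retcode:
        # The caller is expected to call popen.communicate() itself.
        return popen
    stdout, stderr = popen.communicate()
    if popen.returncode != 0:
        raise subprocess.SubprocessError(
            "Command '{}' failed with return code {}. STDOUT: {!r}. STDERR: {!r}.".format(
                cmd, popen.returncode, stdout, stderr
            )
        )
    return stdout, stderr


out, err = run_command("echo hello")
```

Because shell=True hands the whole string to the shell, commands built from untrusted input should be avoided or carefully quoted.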

encode_utils.utils.strip_alias_prefix(alias)

Splits alias on ‘:’ to strip off any alias prefix. Aliases have a lab-specific prefix with ‘:’ delimiting the lab name and the rest of the alias; this delimiter shouldn’t appear elsewhere in the alias.

Parameters: alias – str. The alias.
Returns: The alias without the lab prefix.
Return type: str

Example:

strip_alias_prefix("michael-snyder:B-167")
# Returns "B-167"
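
One way to perform such a split with plain string methods is shown below. The helper strip_prefix is hypothetical; returning an alias unchanged when it has no ‘:’ is an assumption of the sketch.

```python
def strip_prefix(alias):
    """Hypothetical helper: drop everything up to and including the first ':'."""
    # split(":", 1) yields [prefix, rest], or [alias] when there is no ':'.
    return alias.split(":", 1)[-1]


stripped = strip_prefix("michael-snyder:B-167")
no_prefix = strip_prefix("B-167")
```
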
encode_utils.utils.add_to_set(entries, new)

Adds an entry to a list, then passes the list through a set to ensure uniqueness before returning it.

Parameters:
  • entries – list.
  • new – (any datatype) The new member to add to the list.
Returns:

A deduplicated list.

Return type:

list
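
The append-then-deduplicate idea can be sketched in one line. Note two consequences of going through a set, which apply to this sketch and may or may not match the library: members must be hashable, and the original order is not guaranteed. The name add_unique is hypothetical.

```python
def add_unique(entries, new):
    """Hypothetical helper: add new to entries and return a deduplicated list."""
    return list(set(entries + [new]))


with_duplicate = add_unique(["a", "b"], "a")
with_new = add_unique(["a", "b"], "c")
```
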

encode_utils.utils.does_lib_replicate_exist(replicates_json, lib_accession, biological_replicate_number=False, technical_replicate_number=False)

Determines whether any of the replicates on the specified experiment belong to the specified library. Optional constraints are the ‘biological_replicate_number’ and the ‘technical_replicate_number’ properties of the replicates.

Parameters:
  • replicates_json – list. The value of the replicates property of an Experiment record.
  • lib_accession – str. The value of a library object’s accession property.
  • biological_replicate_number – int. The biological replicate number.
  • technical_replicate_number – int. The technical replicate number.
Returns:

The replicate UUIDs that pass the search constraints.

Return type:

list
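
The filtering described above can be sketched as a loop over the replicates. Several things here are assumptions rather than facts about the library: that each replicate embeds its library as a dictionary with an accession key, that each replicate carries a uuid key, and that a False replicate number means the constraint is not applied. The name find_lib_replicates and all sample values are hypothetical.

```python
def find_lib_replicates(replicates_json, lib_accession,
                        biological_replicate_number=False,
                        technical_replicate_number=False):
    """Hypothetical helper: UUIDs of replicates matching the library and numbers."""
    matches = []
    for rep in replicates_json:
        # Assumes the library object is embedded in the replicate.
        if rep["library"]["accession"] != lib_accession:
            continue
        if (biological_replicate_number
                and rep["biological_replicate_number"] != biological_replicate_number):
            continue
        if (technical_replicate_number
                and rep["technical_replicate_number"] != technical_replicate_number):
            continue
        matches.append(rep["uuid"])
    return matches


# Made-up replicates for illustration only.
replicates = [
    {"uuid": "rep-1", "library": {"accession": "LIB1"},
     "biological_replicate_number": 1, "technical_replicate_number": 1},
    {"uuid": "rep-2", "library": {"accession": "LIB1"},
     "biological_replicate_number": 2, "technical_replicate_number": 1},
    {"uuid": "rep-3", "library": {"accession": "LIB2"},
     "biological_replicate_number": 1, "technical_replicate_number": 1},
]
all_for_lib = find_lib_replicates(replicates, "LIB1")
only_bio_2 = find_lib_replicates(replicates, "LIB1", biological_replicate_number=2)
```
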

encode_utils.utils.remove_duplicate_objects(objects)

Checks for duplicates in array properties containing dictionary elements.

Parameters: objects – list.
Returns: Deduplicated list.
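
Dictionaries aren’t hashable, so they can’t be deduplicated with a set directly. One common workaround, sketched below as an assumption about how such a function could work, is to compare a canonical JSON serialization of each element. The name dedupe_objects is hypothetical, and the sketch keeps the first occurrence of each object.

```python
import json


def dedupe_objects(objects):
    """Hypothetical helper: drop repeated dictionaries, keeping first occurrences."""
    seen = set()
    unique = []
    for obj in objects:
        # sort_keys makes the key independent of dictionary insertion order.
        key = json.dumps(obj, sort_keys=True)
        if key not in seen:
            seen.add(key)
            unique.append(obj)
    return unique


deduped = dedupe_objects([{"a": 1, "b": 2}, {"b": 2, "a": 1}, {"a": 3}])
```
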