  1. UGENE
  2. UGENE-5976

VIROGENESIS: problems and user experience improvements connected with data

    Details

    • Type: Epic
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.28
    • Fix Version/s: 1.31
    • Component/s: NGS, Workflow
    • Labels:
    • Epic Name:
      VIROGENESIS-Data
    • Affect Type:
      Userdefined

      Description

      The main objective of this epic is to make working with the different file formats of reference and input data in the VIROGENESIS framework smooth and convenient.

      Archive file formats
      After Stage I (UGENE-5879) we have to unpack all data. For efficiency and a tangible saving of disk space, make it possible to work directly with the 7z and GZ archive formats.
      Note that 7z is used by the UGENE installer. On some data it is 5 times more efficient than GZ. GZ is a common format for storing reference data and NGS FASTQ files.
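
      A minimal sketch of reading a GZ-compressed FASTQ file without unpacking it first, using only the Python standard library. The function names are hypothetical, the archive is detected by file extension, and 4-line FASTQ records are assumed. 7z is not covered here because the standard library has no reader for it.

```python
import gzip


def open_maybe_gzip(path):
    """Open a text file, decompressing on the fly if the name ends with .gz."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def iter_fastq(path):
    """Yield (read_id, sequence, quality) tuples from a 4-line-per-record FASTQ."""
    with open_maybe_gzip(path) as handle:
        while True:
            header = handle.readline()
            if not header:
                break  # end of file
            if not header.strip():
                continue  # skip stray blank lines
            sequence = handle.readline().rstrip("\n")
            handle.readline()  # '+' separator line, ignored
            quality = handle.readline().rstrip("\n")
            yield header.rstrip("\n")[1:], sequence, quality


def count_fastq_records(path):
    """Count records without ever writing an unpacked copy to disk."""
    return sum(1 for _ in iter_fastq(path))
```

      The same reader works for plain and compressed input, so a caller does not need to know whether the data was unpacked.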

      Other format issues
      There are a few issues with the formats, for example:

      • The original CLARK tool supports only the "one FASTA per file" format, which makes it harder to use with data downloaded from the NCBI FTP.
      • Kraken requires a certain header format for input sequences in order to build a database, so RefSeq (or other sequence data) currently requires additional processing. It would be better to modify the headers on the fly.
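
      The two issues above can be sketched together: split a multi-record FASTA file into one file per record, optionally rewriting each header while it is written. This is an illustration only. All function names, the output file naming, and the accession-to-taxid mapping are hypothetical, and the `kraken:taxid|<id>` tag form is an assumption that should be checked against the manual of the Kraken version in use.

```python
import os


def read_fasta(handle):
    """Yield (header, sequence_lines) per record; header has no leading '>'."""
    header, seq = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, seq
            header, seq = line[1:], []
        elif header is not None and line:
            seq.append(line)
    if header is not None:
        yield header, seq


def add_taxid_tag(header, taxid_by_accession):
    """Append an assumed 'kraken:taxid|N' tag to the sequence ID when known."""
    parts = header.split(None, 1)
    if not parts:
        return header  # empty header, nothing to tag
    seq_id = parts[0]
    taxid = taxid_by_accession.get(seq_id)
    if taxid is None:
        return header  # unknown accession, leave header untouched
    new_id = f"{seq_id}|kraken:taxid|{taxid}"
    return f"{new_id} {parts[1]}" if len(parts) > 1 else new_id


def split_fasta(in_path, out_dir, taxid_by_accession=None):
    """Write every record to its own file and return the list of paths."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    with open(in_path) as handle:
        for index, (header, seq) in enumerate(read_fasta(handle), start=1):
            if taxid_by_accession:
                header = add_taxid_tag(header, taxid_by_accession)
            out_path = os.path.join(out_dir, f"record_{index:06d}.fasta")
            with open(out_path, "w") as out:
                out.write(f">{header}\n")
                for chunk in seq:
                    out.write(chunk + "\n")
            paths.append(out_path)
    return paths
```

      Rewriting headers during the split avoids a separate preprocessing pass over the downloaded data.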

      Data sources
      The data are:

      • User input data
      • Taxonomy
      • RefSeq
        • In terms of the default CLARK database
        • In terms of data for building databases for Kraken and CLARK
      • Uniprot (for DIAMOND)

      Note that MiniKraken (for Kraken) will be skipped for now, as it is not as big as the other data packages.

      Version control
      Data version control should be supported in the framework. The data update process should be automated.
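
      One possible shape for an automated update check, as a sketch only: compare a manifest of installed data package versions with a manifest of available versions, and report which packages need to be fetched. The manifest layout (package name mapped to a version string), the function name, and the version labels in the usage note are hypothetical and are not taken from the framework.

```python
def packages_to_update(installed, available):
    """Return sorted names of packages that are missing or have a different version.

    Both arguments are dicts mapping a package name to a version string.
    """
    return sorted(
        name
        for name, version in available.items()
        if installed.get(name) != version
    )
```

      For example, with installed = {"taxonomy": "v1", "refseq": "v7"} and available = {"taxonomy": "v2", "refseq": "v7", "uniprot": "v3"}, the function reports taxonomy and uniprot.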

              People

              Assignee:
              Unassigned
              Reporter:
              oigl Olga Golosova