  1. UGENE
  2. UGENE-6080

Add "Classification Report" workflow element


    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: virogenesis
    • Fix Version/s: 1.31
    • Component/s: NGS, Workflow
    • Labels:
      None
    • Affect Type:
      Userdefined

      Description

      Element name and description

      • Name of the element: "Classification Report"
      • Description of the element on the Scene: "Generate a detailed classification report"
      • Description of the element in the Property Editor:
        "Based on the input taxonomy classification data the element generates a detailed report and saves it in a tab-delimited text format."

      Input data

      There is one input port:

      Item | Value
      Port name in GUI | Input taxonomy data
      Port description | Input taxonomy data from one of the classification elements (Kraken, CLARK, etc.).
      Port ID in UWL | in
      Number of slots | 1
      Slot #1 name in GUI | Input tax data
      Slot #1 ID in UWL | tax_data
      Slot #1 data type | Taxonomy classification

      Parameters

      # | Parameter | Description | Value in GUI | Default value
      1 | Output file | Specify the output text file name. | A line edit with the browse button. The value is mandatory ("Required"). | Auto (equivalent to "input_file_name_report.txt")
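
A minimal sketch of how the "Auto" default could be resolved, assuming that "input_file_name" means the input file's base name without its last extension and that the report is named relative to it. The helper name `default_report_name` is hypothetical and is not part of UGENE.

```python
from pathlib import Path


def default_report_name(input_file: str) -> str:
    """Build '<input_file_name>_report.txt' from an input file path.

    Assumption: only the last extension is dropped, so 'sample.fastq.gz'
    would give 'sample.fastq_report.txt'.
    """
    stem = Path(input_file).stem
    return f"{stem}_report.txt"


if __name__ == "__main__":
    print(default_report_name("reads.fastq"))
```

Whether the real element strips compound extensions such as ".fastq.gz" is not stated in this issue, so that behavior is left as an open assumption.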

      Data processing by the element

      • The element takes input taxonomy classification data.
      • It generates the following header line in the output text file:
        tax_id\ttax_name\trank\tlineage\tkingdom_tax_id\tkingdom_name\tphylum_tax_id\tphylum_name\tclass_tax_id\tclass_name\torder_tax_id\torder_name\tfamily_tax_id\tfamily_name\tgenus_tax_id\tgenus_name\tspecies_tax_id\tspecies_name\tdirectly_num\tdirectly_proportion_all(%)\tdirectly_proportion_classified(%)\tclade_num\tclade_proportion_all(%)\tclade_proportion_classified(%)
        
      • For each taxID it generates a tab-delimited line in a text file with the following columns:
        • the read tax ID
        • scientific name of this taxon
        • rank of the tax ID (as in the NCBI taxonomy, including such values as "subspecies", "superorder", etc.)
        • lineage, i.e. all parent taxa in semicolon-separated list (see CLARK "estimate_abundance" output)
        • tax ID of the parent taxon at the kingdom level
        • the corresponding kingdom scientific name
        • ... (the same for all levels, "−" in case the info is not available)
        • "directly_num", i.e. the number of reads directly assigned to this taxon
        • proportion of the directly assigned reads among all reads (percentage value)
        • proportion of the directly assigned reads among classified reads only (percentage value)
        • "clade_num", i.e. the number of reads covered by the clade rooted at this taxon (the reads directly assigned to this taxon and to all its descendants)
        • proportion of reads, covered by the clade rooted at this taxon, in all reads
        • proportion of reads, covered by the clade rooted at this taxon, in classified reads
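
The header and the two read counts described above can be sketched as follows. This is an illustration only, not the UGENE implementation. It assumes the classification data is available as a read-to-taxID mapping, that tax ID 0 marks an unclassified read, and that the taxonomy is a child-to-parent mapping. The names `count_reads`, `percent`, `assignments`, and `parent` are hypothetical.

```python
from collections import Counter

# Ranks that get a "<rank>_tax_id" / "<rank>_name" column pair in the header.
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]

HEADER = (
    ["tax_id", "tax_name", "rank", "lineage"]
    + [f"{rank}_{suffix}" for rank in RANKS for suffix in ("tax_id", "name")]
    + [
        "directly_num",
        "directly_proportion_all(%)",
        "directly_proportion_classified(%)",
        "clade_num",
        "clade_proportion_all(%)",
        "clade_proportion_classified(%)",
    ]
)


def count_reads(assignments, parent):
    """Return (directly, clade) read counts per tax ID.

    assignments: read name -> tax ID (assumption: 0 means unclassified).
    parent: tax ID -> parent tax ID (assumption: the root maps to itself
    or is missing from the mapping).
    """
    directly = Counter(t for t in assignments.values() if t != 0)
    clade = Counter()
    for tax_id, num in directly.items():
        node, seen = tax_id, set()
        # Walk up to the root, crediting every ancestor's clade.
        while node is not None and node not in seen:
            clade[node] += num
            seen.add(node)
            node = parent.get(node)
    return directly, clade


def percent(part, whole):
    """Percentage value, with 0.0 for an empty denominator."""
    return 100.0 * part / whole if whole else 0.0


if __name__ == "__main__":
    parent = {562: 561, 561: 1, 1: 1}
    assignments = {"r1": 562, "r2": 562, "r3": 561, "r4": 0}
    directly, clade = count_reads(assignments, parent)
    total = len(assignments)
    classified = sum(directly.values())
    print("\t".join(HEADER))
    for tax_id in sorted(clade):
        print(tax_id, directly[tax_id], percent(directly[tax_id], total),
              clade[tax_id], percent(clade[tax_id], classified))
```

A taxon with no directly assigned reads still gets a clade count if any of its descendants has reads, which is why the loop iterates over `clade` rather than `directly` when printing.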

            People

            Assignee:
            varlax Alexey Varlamov
            Reporter:
            oigl Olga Golosova
            Assigned Tester:
            Kirill Rasputin
