UGENE-6290

Add "Analysis type" and other parameters to MetaPhlAn2


    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: master
    • Fix Version/s: 1.32
    • Component/s: NGS, Workflow
    • Labels:
      None
    • Story Points:
      3
    • Epic Link:
    • Sprint:
      DEV-32-4, DEV-32-5
    • Affect Type:
      Userdefined

      Description

      In UGENE-6254, the MetaPhlAn2 workflow element was created and six core parameters were added. Also add the parameters described below to the element.

      Parameters and their values

      • "Analysis type" - a combo box with values:
        • "Relative abundance" (this is the default value)
        • "Relative abundance with reads statistics"
        • "Reads mapping"
        • "Clade profiles"
        • "Marker abundance table"
        • "Marker presence table"
      • "Tax level" - the parameter is present only when "Analysis type" is equal to "Relative abundance" or "Relative abundance with reads statistics". It is a combo box with values:
        • All (this is the default value)
        • Kingdoms
        • Phyla
        • Classes
        • Orders
        • Families
        • Genera
        • Species
      • "Normalize by metagenome size" - the parameter is present only when "Analysis type" is equal to "Marker abundance table". It is a combo box with values "Skip" (default) and "Normalize".
      • "Presence threshold" - the parameter is present only when "Analysis type" is equal to "Marker presence table". It is an INT value >= 0. The default value is 1.
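The visibility rules above can be sketched as a small lookup table. This is only an illustration of the dependency logic, not UGENE code; the names `VISIBLE_WHEN` and `visible_parameters` are hypothetical.

```python
# Hypothetical sketch: which dependent parameters are shown for each
# "Analysis type" value, following the rules listed above.
VISIBLE_WHEN = {
    "Tax level": {
        "Relative abundance",
        "Relative abundance with reads statistics",
    },
    "Normalize by metagenome size": {"Marker abundance table"},
    "Presence threshold": {"Marker presence table"},
}


def visible_parameters(analysis_type):
    """Return the dependent parameters shown for the given analysis type."""
    return [name for name, types in VISIBLE_WHEN.items() if analysis_type in types]
```

For "Reads mapping" and "Clade profiles" the function returns an empty list, since neither analysis type has a dependent parameter.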

      Parameters and values IDs in UWL

      • "analysis-type", "rel-ab", "rel-ab-w-read-stats", "reads-map", "clade-profiles", "marker-ab-table", "marker-pres-table"
      • "tax-level", "all", "kingdoms", "phyla", "classes", "orders", "families", "genera", "species"
      • "normalize-by-size", "skip", "normalize"
      • "presence-threshold"
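As an illustration, the IDs above and the defaults from the previous section could be checked as follows. This is a minimal sketch, assuming parameters arrive as a plain dictionary keyed by UWL ID; `ALLOWED_VALUES`, `DEFAULTS` and `validate` are hypothetical names and not part of UGENE.

```python
# Hypothetical sketch: allowed UWL value IDs and defaults for the new parameters.
ALLOWED_VALUES = {
    "analysis-type": [
        "rel-ab", "rel-ab-w-read-stats", "reads-map",
        "clade-profiles", "marker-ab-table", "marker-pres-table",
    ],
    "tax-level": [
        "all", "kingdoms", "phyla", "classes",
        "orders", "families", "genera", "species",
    ],
    "normalize-by-size": ["skip", "normalize"],
}

DEFAULTS = {
    "analysis-type": "rel-ab",
    "tax-level": "all",
    "normalize-by-size": "skip",
    "presence-threshold": 1,
}


def validate(params):
    """Return a list of error messages for a dict of UWL parameter values."""
    errors = []
    for key, value in params.items():
        if key == "presence-threshold":
            # Must be an integer >= 0 (bool is rejected although it subclasses int).
            if isinstance(value, bool) or not isinstance(value, int) or value < 0:
                errors.append("presence-threshold must be an INT >= 0")
        elif key in ALLOWED_VALUES:
            if value not in ALLOWED_VALUES[key]:
                errors.append("unknown value '%s' for %s" % (value, key))
        else:
            errors.append("unknown parameter %s" % key)
    return errors
```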

      Parameters description

      • Analysis type
        Specify type of analysis to perform:
          * Relative abundance - profiling of metagenomes in terms of relative abundances (corresponds to "-t rel_ab")
          * Relative abundance with reads statistics - profiling of metagenomes in terms of relative abundances, with an estimate of the number of reads coming from each clade ("-t rel_ab_w_read_stats")
          * Reads mapping - mapping from reads to clades, the output contains only reads that hit a marker ("-t reads_map")
          * Clade profiles - normalized marker counts for clades with at least one non-null marker ("-t clade_profiles")
          * Marker abundance table - normalized marker counts (only counts > 0.0 are reported), optionally normalized by metagenome size ("-t marker_ab_table"), see also "Normalize by metagenome size" parameter
          * Marker presence table - list of markers present in the sample ("-t marker_pres_table"), see also "Presence threshold" parameter
        
      • Tax level
        The taxonomic level for the relative abundance output: all, kingdoms (Bacteria and Archaea) only, phyla only, etc. (--tax_lev)
        
      • Normalize by metagenome size
        If "Normalize" is selected, the total number of reads in the original metagenome is taken into account for normalization: UGENE calculates the number of reads in the input FASTA/FASTQ file and passes the "--nreads" parameter to MetaPhlAn2.
        
      • Presence threshold
        Specify a threshold for calling a marker present (corresponds to "--pres_th").
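The read counting mentioned under "Normalize by metagenome size" could look roughly like the sketch below. It assumes uncompressed input, FASTA headers starting with ">", and FASTQ records of exactly four lines (no wrapped sequences); the function name `count_reads` is hypothetical and this is not the UGENE implementation.

```python
def count_reads(lines):
    """Count sequences in an iterable of FASTA or FASTQ text lines.

    Assumption: FASTQ records occupy exactly four lines each.
    """
    first_char = None
    fasta_headers = 0
    non_empty_lines = 0
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # ignore blank lines
        if first_char is None:
            first_char = stripped[0]  # the first record decides the format
        non_empty_lines += 1
        if stripped.startswith(">"):
            fasta_headers += 1
    if first_char is None:
        return 0  # empty input
    if first_char == ">":
        return fasta_headers
    if first_char == "@":
        return non_empty_lines // 4
    raise ValueError("input does not look like FASTA or FASTQ")
```

A file can be passed directly, for example `with open(path) as f: n = count_reads(f)`, and the result used as the value for "--nreads".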
        

      Command

      • Add "-t VALUE" to the command with the corresponding value of the "Analysis type": "rel_ab", "rel_ab_w_read_stats", "reads_map", "clade_profiles", "marker_ab_table", "marker_pres_table".
      • In case "Tax level" is available, add "--tax_lev VALUE" with the corresponding value: "a", "k", "p", "c", "o", "f", "g", "s".
      • In case "Normalize by metagenome size" is present and set to "Normalize", calculate the number of sequences in the input file (or in the first file in case of a pair of input files) and add "--nreads VALUE" to the command, where VALUE is the calculated number.
      • In case "Presence threshold" is available, add "--pres_th VALUE" with the corresponding INT value.
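The four rules above can be sketched as a single function that turns UWL-style parameter values into extra command-line arguments. This is a minimal illustration, assuming parameters are passed as a dictionary keyed by UWL ID and that the read count has already been computed; `build_extra_args` and its `nreads` argument are hypothetical names.

```python
def build_extra_args(params, nreads=None):
    """Build the extra MetaPhlAn2 arguments described above.

    params: dict keyed by UWL parameter ID (missing keys fall back to defaults).
    nreads: number of sequences in the input file, needed only when
            "normalize-by-size" is set to "normalize".
    """
    analysis = params.get("analysis-type", "rel-ab")
    # The command-line values are the UWL IDs with "-" replaced by "_".
    args = ["-t", analysis.replace("-", "_")]

    if analysis in ("rel-ab", "rel-ab-w-read-stats"):
        # "all" -> "a", "kingdoms" -> "k", ..., "species" -> "s"
        args += ["--tax_lev", params.get("tax-level", "all")[0]]
    elif analysis == "marker-ab-table":
        if params.get("normalize-by-size", "skip") == "normalize":
            if nreads is None:
                raise ValueError("nreads is required when normalizing")
            args += ["--nreads", str(nreads)]
    elif analysis == "marker-pres-table":
        args += ["--pres_th", str(params.get("presence-threshold", 1))]
    return args
```

For example, `build_extra_args({"analysis-type": "marker-pres-table", "presence-threshold": 3})` yields `["-t", "marker_pres_table", "--pres_th", "3"]`.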

      Test plan

      See tests "Tests: Metagenomics > MetaPhlAn > Workflow element > Analysis types".
      The corresponding sample data and commands used to generate them are on the file server in folder ".../data/test_data/UGENE-6290".

              People

              Assignee:
              dsukhomlinov Dmitrii Sukhomlinov
              Reporter:
              oigl Olga Golosova
              Assigned Tester:
              Svetlana Samoilenko
