UGENE / UGENE-6094

Add "Improve Reads with Trimmomatic" workflow element


    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: virogenesis
    • Fix Version/s: 1.31
    • Component/s: NGS, Workflow
    • Labels:
      None
    • Story Points:
      2
    • Epic Link:
    • Sprint:
      DEV-31-1, DEV-31-2
    • Affect Type:
      Userdefined

      Description

      Element name and description

      • Name of the element: "Improve Reads with Trimmomatic"
      • Description of the element on the Scene: "Trim, crop and/or remove adapters for input Illumina FASTQ data."
      • Description of the element in the Property Editor:
        "Trimmomatic is a fast, multithreaded command line tool that can be used to trim and crop Illumina (FASTQ) data as well as to remove adapters."

      Input data

      There is one input port.

      • Port name in GUI: Input FASTQ file(s)
      • Port description: Provide URL(s) of FASTQ file(s) with Illumina reads. For single-end (SE) reads, use only the "Input FASTQ URL 1" slot. For paired-end (PE) reads, pass the "left" reads to "Input FASTQ URL 1" and the "right" reads to "Input FASTQ URL 2". See also the "Input data" parameter of the element.
      • Port ID in UWL: in
      • Number of slots: 1 or 2, depending on the value of "Input data"
      • Slot #1 name in GUI: Input FASTQ URL 1
      • Slot #1 ID in UWL: reads-url1
      • Slot #1 data type: String
      • Slot #2 name in GUI: Input FASTQ URL 2
      • Slot #2 ID in UWL: reads-url2
      • Slot #2 data type: String
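      The slot rule above can be checked with a small Python sketch. It assumes a message on the "in" port is modelled as a dict keyed by the UWL slot IDs; the function names and the dict representation are hypothetical illustrations, not UGENE code.

```python
from __future__ import annotations

# Values of the "Input data" combo box, as listed in the Parameters section.
SE_READS = "SE reads"
PE_READS = "PE reads"


def input_slot_ids(mode: str) -> list[str]:
    """Return the UWL slot IDs of the "in" port that are used in the given mode."""
    if mode == SE_READS:
        return ["reads-url1"]
    if mode == PE_READS:
        return ["reads-url1", "reads-url2"]
    raise ValueError(f"unknown input data mode: {mode!r}")


def check_input_message(mode: str, message: dict[str, str]) -> None:
    """Raise ValueError unless exactly the slots used by `mode` carry a non-empty URL."""
    expected = set(input_slot_ids(mode))
    provided = {slot for slot, url in message.items() if url}
    if provided != expected:
        raise ValueError(
            f"{mode}: expected slots {sorted(expected)}, got {sorted(provided)}"
        )
```

      For example, check_input_message("PE reads", {"reads-url1": "sample_1.fq.gz", "reads-url2": "sample_2.fq.gz"}) passes, while omitting "reads-url2" in PE mode raises an error.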

      Output data

      There is one output port.

      • Port name in GUI: Improved FASTQ file(s)
      • Port description: The port outputs URLs of the FASTQ files produced by Trimmomatic. For SE reads, one output file is produced for each input FASTQ file, and its URL is passed to the output slot "Output FASTQ URL 1". For PE reads, four output files are produced for each pair of input FASTQ files: paired "left" reads, unpaired "left" reads, paired "right" reads, and unpaired "right" reads. The URLs of the files with paired reads are passed to the output slots "Output FASTQ URL 1" and "Output FASTQ URL 2".
      • Port ID in UWL: out
      • Number of slots: 1 or 2, depending on the value of "Input data"
      • Slot #1 name in GUI: Output FASTQ URL 1
      • Slot #1 ID in UWL: reads-url1
      • Slot #1 data type: String
      • Slot #2 name in GUI: Output FASTQ URL 2
      • Slot #2 ID in UWL: reads-url2
      • Slot #2 data type: String
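      A minimal Python sketch of how the Trimmomatic output files map to the "out" port slots. The PeOutputs class and the function names are hypothetical; the sketch only encodes the rule stated above, namely that in PE mode the two paired files are passed on and the unpaired files are not.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PeOutputs:
    """The four files Trimmomatic writes for one pair of PE input files."""
    paired_1: str    # "left" reads that still have a "right" mate
    unpaired_1: str  # "left" reads whose mate was dropped
    paired_2: str    # "right" reads that still have a "left" mate
    unpaired_2: str  # "right" reads whose mate was dropped


def output_message_se(output_url: str) -> dict[str, str]:
    """SE: the single output file goes to "Output FASTQ URL 1"."""
    return {"reads-url1": output_url}


def output_message_pe(outputs: PeOutputs) -> dict[str, str]:
    """PE: only the paired files are passed to the port; unpaired files are not."""
    return {"reads-url1": outputs.paired_1, "reads-url2": outputs.paired_2}
```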

      Parameters

      1. Input data
         • Description: Set the type of the input reads: single-end (SE) or paired-end (PE). One or two slots of the input port are used, depending on the value of this parameter; pass the URL(s) of the data to these slots. Note that the paired-end mode uses the additional information contained in paired reads to better find adapter or PCR primer fragments introduced by the library preparation process.
         • Value in GUI: a combo box with the values "SE reads" and "PE reads".
         • Default value: "SE reads"
      2. Trimming steps
         • Description: Configure the trimming steps that Trimmomatic should perform.
         • Value in GUI: a line edit with the disabled text "Configure steps" and a browse button ("...") next to it. The button opens a dialog, see UGENE-6095.
         • Default value: the disabled label "Configure steps"
      3. Output file
         • Description: Specify the output file name. The parameter is only available if "Input data" is "SE reads".
         • Default value: Auto (equals "<input file base name>_trim.<input file extension>", e.g. "SRR519926_trim.fastq" for the input file "SRR519926.fastq")
      4. Paired output file 1
         • Description: Specify the output file name for "left" reads that have paired "right" reads. The parameter is only available if "Input data" is "PE reads".
         • Default value: Auto (append "P" to the base name, e.g. "sample_1P.fq.gz")
      5. Paired output file 2
         • Description: Specify the output file name for "right" reads that have paired "left" reads. The parameter is only available if "Input data" is "PE reads".
         • Default value: Auto (append "P" to the base name, e.g. "sample_2P.fq.gz")
      6. Unpaired output file 1
         • Description: Specify the output file name for "left" reads that have no pair. The parameter is only available if "Input data" is "PE reads".
         • Default value: Auto (append "U" to the base name, e.g. "sample_1U.fq.gz")
      7. Unpaired output file 2
         • Description: Specify the output file name for "right" reads that have no pair. The parameter is only available if "Input data" is "PE reads".
         • Default value: Auto (append "U" to the base name, e.g. "sample_2U.fq.gz")
      8. Generate detailed log
         • Description: Select "True" to generate a file with a log of all read trimmings (-trimlog). Each entry indicates:
           ◦ the read name
           ◦ the surviving sequence length
           ◦ the location of the first surviving base, a.k.a. the amount trimmed from the start
           ◦ the location of the last surviving base in the original read
           ◦ the amount trimmed from the end
         • Value in GUI: a combo box with the values "True" and "False".
         • Default value: "False"
      9. Log file
         • Description: Specify a text file to keep detailed information about read trimming. The parameter is only available if "Generate detailed log" is set to "True".
         • Default value: Auto (append "_trimlog.txt" to the base name, e.g. "SRR519926_trimlog.txt", "sample_trimlog.txt")
      10. Number of threads
         • Description: Use multiple threads (-threads).
         • Value in GUI: a spin box with values from 1 to the number of available cores.
         • Default value: the value from the Application Settings (the "Optimize for CPU count" option)
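      The "Auto" defaults above can be sketched in Python as follows. This is an illustration under two assumptions: a compression suffix (".gz" or ".bz2") is kept together with the FASTQ extension, as the note on compressed input requires, and the log base name is supplied by the caller, since how it is derived for a PE pair (e.g. "sample") is not specified. All function names are hypothetical.

```python
from __future__ import annotations

import os

# Compression suffixes that must stay on the output name (see "Additional notes").
COMPRESSED_SUFFIXES = (".gz", ".bz2")


def split_fastq_name(path: str) -> tuple[str, str]:
    """Split "dir/sample.fastq.gz" into ("sample", ".fastq.gz")."""
    stem, ext = os.path.splitext(os.path.basename(path))
    if ext in COMPRESSED_SUFFIXES:
        # Peel off the inner extension too, e.g. ".fastq" in "sample.fastq.gz".
        stem, inner = os.path.splitext(stem)
        ext = inner + ext
    return stem, ext


def auto_se_output(input_path: str) -> str:
    """Parameter 3: "<base name>_trim<extension>"."""
    stem, ext = split_fastq_name(input_path)
    return f"{stem}_trim{ext}"


def auto_pe_outputs(input_1: str, input_2: str) -> tuple[str, str, str, str]:
    """Parameters 4-7, in their listed order: paired 1, paired 2, unpaired 1, unpaired 2."""
    stem1, ext1 = split_fastq_name(input_1)
    stem2, ext2 = split_fastq_name(input_2)
    return (f"{stem1}P{ext1}", f"{stem2}P{ext2}", f"{stem1}U{ext1}", f"{stem2}U{ext2}")


def auto_log_file(base_name: str) -> str:
    """Parameter 9: "<base name>_trimlog.txt" (base name chosen by the caller)."""
    return f"{base_name}_trimlog.txt"
```

      For example, auto_se_output("sample.fastq.gz") gives "sample_trim.fastq.gz", matching the example in the notes on compressed input.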

      Data processing

      For each input FASTQ file (SE) or pair of FASTQ files (PE), the element should run Trimmomatic with the corresponding settings. For now, hardcode a single step, e.g. "HEADCROP:5". This will be changed in the scope of UGENE-6095, UGENE-6096, and UGENE-6097.
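      A Python sketch of the argument list for one Trimmomatic run, following Trimmomatic's documented command-line form (SE: one input and one output; PE: two inputs and four outputs, ordered paired 1, unpaired 1, paired 2, unpaired 2, then the steps). Invoking it through "java -jar" and the jar path are assumptions about the local installation, and the function name is hypothetical.

```python
from __future__ import annotations

from typing import Optional, Sequence


def trimmomatic_args(
    jar: str,
    inputs: Sequence[str],
    outputs: Sequence[str],
    steps: Sequence[str] = ("HEADCROP:5",),  # the temporary hardcoded step
    threads: int = 1,
    trimlog: Optional[str] = None,
) -> list[str]:
    """Build argv for one run. PE outputs must be: paired 1, unpaired 1, paired 2, unpaired 2."""
    if len(inputs) == 1 and len(outputs) == 1:
        mode = "SE"
    elif len(inputs) == 2 and len(outputs) == 4:
        mode = "PE"
    else:
        raise ValueError("expected 1 input/1 output (SE) or 2 inputs/4 outputs (PE)")
    if threads < 1:
        raise ValueError("threads must be >= 1")
    args = ["java", "-jar", jar, mode, "-threads", str(threads)]
    if trimlog is not None:
        # Only added when "Generate detailed log" is "True".
        args += ["-trimlog", trimlog]
    return args + list(inputs) + list(outputs) + list(steps)
```

      The resulting list could be passed to subprocess.run(args, check=True); keeping the arguments as a list avoids shell quoting problems with file paths.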

      Additional notes

      • Trimmomatic does not support FASTA input.
      • Compressed FASTQ input files (gzip or bzip2) are supported. In this case, the output file should have the same compression format and extension; for example, for the input file "sample.fastq.gz" the output should be "sample_trim.fastq.gz".
      • Note that the Trimmomatic documentation is Illumina-oriented. I'm not sure yet whether Trimmomatic can be used to process reads from other platforms, e.g. Ion Torrent.
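      Since FASTA input is not supported, the element could reject it before starting Trimmomatic. A minimal Python sketch, assuming compression is recognized by the ".gz"/".bz2" extension and that a file is treated as FASTQ when its first non-empty line starts with "@" (FASTA records start with ">"). This is a heuristic, not a full format validator, and the function names are hypothetical.

```python
from __future__ import annotations

import bz2
import gzip
from typing import IO


def open_reads(path: str) -> IO[str]:
    """Open a plain, gzip- or bzip2-compressed reads file as text, chosen by extension."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    if path.endswith(".bz2"):
        return bz2.open(path, "rt")
    return open(path, "rt")


def looks_like_fastq(path: str) -> bool:
    """Return True if the first non-empty line starts with "@" (FASTQ header)."""
    with open_reads(path) as handle:
        for line in handle:
            if line.strip():
                return line.startswith("@")
    return False  # empty file: nothing to trim
```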


              People

              Assignee:
              dsukhomlinov Dmitrii Sukhomlinov
              Reporter:
              oigl Olga Golosova
              Assigned Tester:
              Svetlana Samoilenko
              Watchers:
              2
