UGENE / UGENE-6398

Parse attribute column of a GTF file and create correct qualifiers

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.32
    • Fix Version/s: 33
    • Component/s: Basic-Nucl
    • Labels:
    • Story Points: 3
    • Sprint: DEV-33-2
    • Affect Type: Userdefined

      Description

      Currently, the 9th column (the attribute column) of a GTF file is parsed into qualifiers of the corresponding annotations. However, every qualifier is named "attr" instead of taking the attribute's own name. See the attached screenshot.

      For example, for the following line:

      chr20	ensembl_havana	gene	87250	97094	.	+	.	gene_id "ENSG00000178591"; gene_version "6"; gene_name "DEFB125"; gene_source "ensembl_havana"; gene_biotype "protein_coding";
      

      a "gene" annotation should be created with the following qualifiers:

      • gene_id (with the value ENSG00000178591, without the quote marks)
      • gene_version
      • gene_source
      • etc.
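
      The expected behavior can be sketched in a few lines of Python. This is only an illustration of the parsing rule described above, not the UGENE implementation: the function names parse_gtf_attributes and parse_gtf_line, the regular expression, and the dictionary layout are all assumptions made for this example. Qualifiers are kept as an ordered list of pairs because an attribute name may occur more than once on a line.

```python
import re

# Hypothetical helper pattern: one `name "value";` or `name value;` pair.
# Quoted values may contain semicolons; unquoted values may not.
_ATTR_RE = re.compile(r'\s*(\S+)\s+(?:"([^"]*)"|([^;\s]+))\s*;?')


def parse_gtf_attributes(column):
    """Split the 9th GTF column into (name, value) pairs, quotes removed."""
    pairs = []
    for match in _ATTR_RE.finditer(column):
        name = match.group(1)
        # group(2) is set for quoted values, group(3) for unquoted ones.
        value = match.group(2) if match.group(2) is not None else match.group(3)
        pairs.append((name, value))
    return pairs


def parse_gtf_line(line):
    """Turn one tab-separated GTF line into an annotation-like dict."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("a GTF line must have exactly 9 tab-separated columns")
    return {
        "feature": fields[2],          # annotation name, e.g. "gene"
        "start": int(fields[3]),
        "end": int(fields[4]),
        "strand": fields[6],
        "qualifiers": parse_gtf_attributes(fields[8]),
    }


if __name__ == "__main__":
    sample = (
        "chr20\tensembl_havana\tgene\t87250\t97094\t.\t+\t.\t"
        'gene_id "ENSG00000178591"; gene_version "6"; gene_name "DEFB125"; '
        'gene_source "ensembl_havana"; gene_biotype "protein_coding";'
    )
    for name, value in parse_gtf_line(sample)["qualifiers"]:
        print(name, "=", value)
```

      Run on the sample line above, the sketch yields one qualifier per attribute (gene_id, gene_version, gene_name, gene_source, gene_biotype), each named after the attribute rather than "attr".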

      For sample data, use, for example, the files "Homo_sapiens.GRCh38.dna.chromosome.20.fa.gz" (a sequence) and "chr20_ref.gtf" (the corresponding GTF file), located in the ".../test/UGENE-6398" folder on the file server.

              People

              Assignee:
              dsukhomlinov Dmitrii Sukhomlinov
              Reporter:
              oigl Olga Golosova
              Assigned Tester:
              Kirill Rasputin
              Watchers:
              1
