-
Type: Bug
-
Status: Tested
-
Priority: Major
-
Resolution: Fixed
-
Affects Version/s: 49
-
Fix Version/s: 50
-
Component/s: Basic-Nucl
-
Environment:
RHEL 8 Linux
-
Tests Type: Manual scenario
-
Sprint: DEV-50-RELEASE
-
Affect Type: Userdefined
In v49.1, LOCUS lines created by Ugene omit the field indicating the molecule type (e.g. DNA, mRNA, rRNA). This causes some programs to incorrectly read a Ugene-generated circular sequence as linear. The NCBI GenBank release notes prescribe the following format for a LOCUS line:
LOCUS       CP032762             5868661 bp    DNA     circular BCT 15-OCT-2018
------------+--------------+-+---------+---------+---------+---------+---------
1           13            28 30        40        50        60        70       79
I created a construct and saved it in GenBank format:
LOCUS pBS_SK-GUS 5978 bp circular 27-FEB-2024
This file will be read incorrectly as a linear sequence by programs that look for the word 'circular' in field 6 or at column 56, because without the molecule type 'circular' becomes field 5. It would be parsed correctly by programs that search for 'circular' anywhere on the LOCUS line. The line can be fixed manually:
LOCUS pBS_SK-GUS 5978 bp DNA circular 27-FEB-2024
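To illustrate the difference between the two kinds of reader described above, here is a minimal Python sketch. Both functions are hypothetical stand-ins for third-party parsers, not code from Ugene or any specific tool, and the fallback to "linear" when no topology is found is an assumption about how such parsers behave. The two LOCUS lines are copied from this report as they appear here, so the strict reader is modelled on whitespace-separated field position rather than on exact columns.

```python
def topology_by_field(locus_line: str) -> str:
    """Strict reader (hypothetical): expects topology as the 6th field."""
    fields = locus_line.split()
    if len(fields) > 5 and fields[5] in ("linear", "circular"):
        return fields[5]
    # Assumed fallback: treat the sequence as linear when field 6
    # is missing or holds something else (here, the date).
    return "linear"


def topology_by_token(locus_line: str) -> str:
    """Lenient reader (hypothetical): accepts 'circular' anywhere on the line."""
    return "circular" if "circular" in locus_line.split() else "linear"


# LOCUS lines as quoted in this report.
ugene_49 = "LOCUS pBS_SK-GUS 5978 bp circular 27-FEB-2024"
fixed = "LOCUS pBS_SK-GUS 5978 bp DNA circular 27-FEB-2024"

print(topology_by_field(ugene_49))  # linear   (misread: field 6 is the date)
print(topology_by_token(ugene_49))  # circular
print(topology_by_field(fixed))     # circular
print(topology_by_token(fixed))     # circular
```

Only the strict reader is affected by the missing molecule type, which matches the behaviour reported above.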
In previous versions of Ugene (e.g. v29) the molecule type field was written. I recommend revising Ugene to write LOCUS lines explicitly in the fixed columns described in the NCBI documentation at
https://ftp.ncbi.nlm.nih.gov/genbank/gbrel.txt
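As a rough sketch of what writing the LOCUS line at fixed columns could look like, the Python function below pads each field to the column ranges as I read them from the release notes. The function name `format_locus` and its parameters are hypothetical and are not part of Ugene. The sketch also assumes the locus name fits in 16 characters and does not handle longer names.

```python
def format_locus(name: str, length: int, mol_type: str, topology: str,
                 division: str, date: str, strand: str = "") -> str:
    """Build a fixed-column LOCUS line (hypothetical helper, not Ugene code)."""
    if topology not in ("linear", "circular"):
        raise ValueError("topology must be 'linear' or 'circular'")
    if len(name) > 16:
        raise ValueError("this sketch only handles names up to 16 characters")
    return (
        "LOCUS".ljust(12)          # columns 1-12: keyword plus padding
        + name.ljust(16)           # columns 13-28: locus name
        + " "                      # column 29
        + str(length).rjust(11)    # columns 30-40: length, right-justified
        + " bp "                   # columns 41-44
        + strand.ljust(3)          # columns 45-47: blank, ss-, ds- or ms-
        + mol_type.ljust(6)        # columns 48-53: molecule type, left-justified
        + "  "                     # columns 54-55
        + topology.ljust(8)        # columns 56-63: 'linear  ' or 'circular'
        + " "                      # column 64
        + division.ljust(3)        # columns 65-67: division code
        + " "                      # column 68
        + date                     # columns 69-79: dd-MMM-yyyy
    )


# Values taken from the NCBI example line quoted in this report.
line = format_locus("CP032762", 5868661, "DNA", "circular", "BCT", "15-OCT-2018")
print(line)
print(line[55:63])  # columns 56-63 hold the topology
```

With the molecule type always padded into columns 48-53, the topology stays at column 56 for both column-based and field-based readers.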
- relates to
-
UGENE-8062 Amino acid sequence is marked as DNA in GenBank file
- Open