In short, the following should be done:
- Update the executable files in the 64-bit Linux and macOS external tools packages.
- Update the command launches for DIAMOND build and classify.
- Slightly update the workflow element description.
- Update the parameters.
DIAMOND executable on 64-bit Linux
The executable file provided on the DIAMOND website does not work on all systems by default. The same happened when I compiled the source code: the resulting executable worked only on the system where it was built. It is required to:
- make the executable file non-dependent on a particular Linux system
- add the updated files to the 64-bit Linux external tool package
DIAMOND executable on 64-bit macOS
- build the source code
- add the updated files to the 64-bit macOS external tool package
Command launches
Since DIAMOND version 0.9.19, taxonomy files are passed to the tool when the database is built, not during alignment. Modify the tool launches accordingly (parameters "taxonmap" and "taxonnodes").
Build DIAMOND Database:
diamond makedb --in .../uniref50.fasta.gz -d .../uniref50.dmnd --taxonmap .../data/ngs_classification/taxonomy/prot.accession2taxid.gz --taxonnodes .../data/ngs_classification/taxonomy/nodes.dmp
Classify Sequences with DIAMOND:
diamond blastx -d .../uniref50.dmnd -f 102 other_parameters
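A minimal dry-run sketch of the two launches, for checking where the taxonomy options go. It only prints the command lines and does not call DIAMOND. The function names and all file names are hypothetical placeholders, and the `--in`, `-q` and `-o` options stand in for whatever "other_parameters" the real launch passes.

```shell
#!/bin/sh
# Dry-run sketch: print the DIAMOND launches instead of executing them.
# All file names below are hypothetical placeholders.

# makedb: taxonomy files are supplied here (DIAMOND >= 0.9.19).
diamond_build_cmd() {
    fasta=$1; db=$2; taxonmap=$3; taxonnodes=$4
    printf 'diamond makedb --in %s -d %s --taxonmap %s --taxonnodes %s\n' \
        "$fasta" "$db" "$taxonmap" "$taxonnodes"
}

# blastx: no taxonomy options; -f 102 selects the taxonomic classification output.
diamond_classify_cmd() {
    db=$1; query=$2; out=$3
    printf 'diamond blastx -d %s -q %s -o %s -f 102\n' "$db" "$query" "$out"
}

diamond_build_cmd uniref50.fasta.gz uniref50.dmnd prot.accession2taxid.gz nodes.dmp
diamond_classify_cmd uniref50.dmnd reads.fastq classification.txt
```

Keeping "taxonmap" and "taxonnodes" only in the build function mirrors the change described above: the classify launch no longer receives them.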
Update the workflow element description in the Property Editor
DIAMOND is a sequence aligner for protein and translated DNA searches, similar to the NCBI BLAST software tools, but it can be up to 20,000 times faster than BLAST. Using this workflow element, one can use DIAMOND for taxonomic classification of short DNA reads and longer sequences such as contigs. The lowest common ancestor (LCA) algorithm is used for the classification.
Update DIAMOND parameters
- Change the default value of the "Block size" parameter to "0.5" to increase the chances that DIAMOND can run on a typical computer with the default settings.
- Modify the description of the "Expected value" parameter:
Maximum expected value to report an alignment (--evalue/-e).
- Modify the description of the "Output file" parameter:
Specify the output file name. The output file is a tab-delimited file with the following fields:
* Query ID
* NCBI taxonomy ID (0 if unclassified)
* E-value of the best alignment with a known taxonomy ID found for the query (0 if unclassified)
- Add a new parameter "Top alignments percentage":
- Put this parameter under "Sensitive mode" in the Property Editor.
- The value should be entered via a spin box that accepts integer values from 0 to 100. Show "%" next to the value.
- The default value is "10".
- The description should be the following:
DIAMOND uses the lowest common ancestor (LCA) algorithm for taxonomy classification of the input sequences. This parameter specifies which alignments are taken into account during the calculation (--top). For example, the default value "10" means that, for each query, all alignments whose score is at most 10% lower than the best alignment score are used to calculate the lowest common ancestor.
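The selection step controlled by "Top alignments percentage" can be sketched as follows. This is an illustration only, not DIAMOND's implementation. It assumes the DIAMOND manual's reading of `--top` (keep alignments whose score is at most the given percentage lower than the best score for the query) and a hypothetical three-column, tab-separated hit list (query ID, subject ID, bit score). The LCA calculation over the selected hits is not shown.

```shell
#!/bin/sh
# Keep, for every query, the hits whose score is within TOP percent of that
# query's best score. Input columns (hypothetical, tab-separated):
#   query ID, subject ID, bit score
filter_top_hits() {
    top=$1; hits=$2
    # The file is read twice: pass 1 records the best score per query,
    # pass 2 prints the hits that reach the threshold.
    awk -F '\t' -v top="$top" '
        NR == FNR {
            if (!($1 in best) || $3 + 0 > best[$1]) best[$1] = $3 + 0
            next
        }
        $3 + 0 >= best[$1] * (1 - top / 100)
    ' "$hits" "$hits"
}

# Demo with made-up hits: for q1 the threshold is 90, so the hit with score 80 is dropped.
tmp=$(mktemp)
printf 'q1\tA\t100\nq1\tB\t95\nq1\tC\t80\nq2\tD\t50\n' > "$tmp"
filter_top_hits 10 "$tmp"
rm -f "$tmp"
```

With the value "0" only hits that tie with the best score remain, and with "100" every hit is kept, which matches the 0 to 100 range of the spin box.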
- relates to UGENE-6126 Update DIAMOND databases for UniRef50 and UniRef90 (Closed)