qseq and sseq in -outfmt 6 are not aligned

r0r

New member
Hi,

I recently realised that in the output format '-outfmt 6' the two fields 'qseq' and 'sseq' do not contain the aligned sequences with gaps, but only the residues that align (i.e. the string of amino acids or nucleotides with the insertions and deletions left out).
This differs from the corresponding 'qseq' and 'sseq' fields in BLAST output, where the indels are shown as gaps.

N.B.: insertions and deletions (shown as runs of '-') do seem to be present in the XML output format (fields Hsp_qseq and Hsp_hseq).

I was wondering whether it would be possible to put the indels back into the tabular output format (-outfmt 6), for the sake of compatibility with the BLAST format.
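Until such a field exists, one possible workaround (a sketch, not a Diamond feature) is to request the `btop` field, which is already in the tabular option list, together with the ungapped `qseq`, and to rebuild both gapped strings from them. The sketch assumes that Diamond's `btop` follows the BLAST BTOP convention: a number is a run of identical positions, and a two-character pair is a query residue followed by a subject residue, with '-' marking a gap. The function name `btop_to_gapped` is hypothetical.

```python
from typing import Tuple


def btop_to_gapped(qseq: str, btop: str) -> Tuple[str, str]:
    """Rebuild gapped query/subject strings from the ungapped aligned
    query segment (qseq) and a BTOP string.

    Assumes the BLAST BTOP convention: digits = run of identical
    positions, a character pair = (query residue, subject residue),
    '-' = gap. Hypothetical helper, not part of Diamond.
    """
    q_out, s_out = [], []
    qi = 0  # position in the ungapped query segment
    i = 0   # position in the BTOP string
    while i < len(btop):
        if btop[i].isdigit():
            # read a (possibly multi-digit) run length
            j = i
            while j < len(btop) and btop[j].isdigit():
                j += 1
            n = int(btop[i:j])
            run = qseq[qi:qi + n]
            if len(run) != n:
                raise ValueError("BTOP run extends past the end of qseq")
            # identical positions: same residues in query and subject
            q_out.append(run)
            s_out.append(run)
            qi += n
            i = j
        else:
            if i + 1 >= len(btop):
                raise ValueError("truncated BTOP pair")
            qc, sc = btop[i], btop[i + 1]
            if qc != "-":
                # a non-gap query character consumes one residue of qseq
                if qi >= len(qseq) or qseq[qi] != qc:
                    raise ValueError(f"BTOP residue {qc!r} disagrees with qseq at {qi}")
                qi += 1
            q_out.append(qc)
            s_out.append(sc)
            i += 2
    if qi != len(qseq):
        raise ValueError("BTOP does not cover the whole qseq")
    return "".join(q_out), "".join(s_out)


q_gapped, s_gapped = btop_to_gapped("MKTAYIAK", "3A-1-G1AS1")
print(q_gapped)  # MKTAY-IAK
print(s_gapped)  # MKT-YGISK
```

The residues for identical runs are taken from `qseq` because BTOP only stores their count; mismatches and gaps are spelled out in the BTOP string itself.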

Many thanks,
Romain
 

Benjamin Buchfink

Administrator
Staff member
Hi Romain,

No problem, I will add that feature, but I will probably make it a new output field instead of changing the behaviour of the current ones; otherwise it would break compatibility with previous Diamond versions.

Best regards,
Benjamin
 

r0r

New member
Hi Benjamin,

Apologies for the delay, I have been busy with other projects.
I have compiled two different versions of Diamond (trees 'd333bab199c8f107f22f04ac727820dfe23b6e5d' and 'e9ee266f97cd38ad57511790611ac67551c54403'), but in both cases I couldn't find the options qseq_gapped and sseq_gapped for the tabular output format.

I might have downloaded the wrong version (?). Any help would be appreciated.

Here are the tabular options I have (in both cases):

qseqid means Query Seq-id
qlen means Query sequence length
sseqid means Subject Seq-id
sallseqid means All subject Seq-id(s), separated by a ';'
slen means Subject sequence length
qstart means Start of alignment in query
qend means End of alignment in query
sstart means Start of alignment in subject
send means End of alignment in subject
qseq means Aligned part of query sequence
full_qseq means Query sequence
sseq means Aligned part of subject sequence
full_sseq means Subject sequence
evalue means Expect value
bitscore means Bit score
score means Raw score
length means Alignment length
pident means Percentage of identical matches
nident means Number of identical matches
mismatch means Number of mismatches
positive means Number of positive-scoring matches
gapopen means Number of gap openings
gaps means Total number of gaps
ppos means Percentage of positive-scoring matches
qframe means Query frame
btop means Blast traceback operations (BTOP)
staxids means unique Subject Taxonomy ID(s), separated by a ';' (in numerical order)
stitle means Subject Title
salltitles means All Subject Title(s), separated by a '<>'
qcovhsp means Query Coverage Per HSP
qtitle means Query title
qqual means Query quality values for the aligned part of the query
full_qqual means Query quality values


Best regards,
Romain
 

r0r

New member
Update:

My previous observations were based on Diamond versions compiled on macOS (with a lot of error messages during compilation).

I tried again on our Linux server (version d333bab199c8f107f22f04ac727820dfe23b6e5d) and it works fine: the aligned query and target sequences with indels are present in the tabular output, although these two new fields are not shown when typing './diamond help'.

To summarise:
- the two fields are not shown in the help message on either Mac or Linux
- the two fields are correctly generated in the output on our Linux server, but not on my Mac.
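To double-check that the new fields behave as described in the first post (the same residues as qseq/sseq, with the indels put back), a small consistency check can compare the gapped and ungapped fields. This is a sketch assuming each tabular line contains, in this hypothetical order, qseqid, sseqid, qseq, sseq, qseq_gapped and sseq_gapped; the helper names `check_gapped_pair` and `check_line` are illustrative only.

```python
def check_gapped_pair(qseq, sseq, qseq_gapped, sseq_gapped):
    """Return a list of inconsistencies between the ungapped and gapped
    fields (an empty list means they are consistent)."""
    problems = []
    # both gapped strings describe the same alignment columns
    if len(qseq_gapped) != len(sseq_gapped):
        problems.append("gapped query and subject have different lengths")
    # an alignment column should never be a gap in both sequences
    if any(q == "-" and s == "-" for q, s in zip(qseq_gapped, sseq_gapped)):
        problems.append("column with a gap in both sequences")
    # removing the gap characters should give back the ungapped fields
    if qseq_gapped.replace("-", "") != qseq:
        problems.append("gapped query does not reduce to qseq")
    if sseq_gapped.replace("-", "") != sseq:
        problems.append("gapped subject does not reduce to sseq")
    return problems


def check_line(line):
    """Check one tab-separated line laid out (hypothetically) as:
    qseqid, sseqid, qseq, sseq, qseq_gapped, sseq_gapped."""
    qseqid, sseqid, qseq, sseq, qg, sg = line.rstrip("\n").split("\t")
    return qseqid, sseqid, check_gapped_pair(qseq, sseq, qg, sg)


print(check_line("q1\ts1\tMKTAYIAK\tMKTYGISK\tMKTAY-IAK\tMKT-YGISK\n"))
# ('q1', 's1', [])
```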

Thanks a lot
Romain
 