Question about --taxonlist

Xingyu Luo

New member
I was trying to run blastp for about 2000 protein sequences against the nr database. Specifically, I want to search only against bacterial proteins, so I used --taxonlist 2. However, it only returned about 900 results. To find out what happened to the sequences that do not appear in the results, I tested 2 of them with NCBI blastp restricted to taxid:2, and both did return hits. Does anyone have an idea? Please help.
Here is my command:
sudo diamond blastp -f 6 qseqid sseqid pident nident length mismatch gapopen qstart qend sstart send evalue bitscore staxids stitle sscinames --taxonlist 2 --max-target-seqs 1 --db nr -q /mnt/e/dnae_Sequence/archaea_dnae.fasta -o /mnt/e/blast/archaea_dnae.txt
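To see exactly which queries are missing, one option is to compare the IDs in the query FASTA with the first column (qseqid) of the tabular output. Below is a minimal sketch, assuming DIAMOND reports qseqid as the FASTA header up to the first whitespace; the function names and the script name are hypothetical.

```python
import sys


def fasta_ids(path):
    """Return query IDs in file order: the first word of each '>' header line."""
    ids = []
    with open(path) as fh:
        for line in fh:
            if line.startswith(">"):
                header = line[1:].strip()
                if header:
                    # Assumption: DIAMOND's qseqid is the header up to the first whitespace.
                    ids.append(header.split()[0])
    return ids


def hit_ids(path):
    """Return the set of qseqid values (first column) in a DIAMOND -f 6 output file."""
    with open(path) as fh:
        return {line.rstrip("\n").split("\t", 1)[0] for line in fh if line.strip()}


def missing_queries(fasta_path, table_path):
    """Return queries present in the FASTA file but absent from the tabular output."""
    found = hit_ids(table_path)
    return [qid for qid in fasta_ids(fasta_path) if qid not in found]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python missing_queries.py archaea_dnae.fasta archaea_dnae.txt
    for qid in missing_queries(sys.argv[1], sys.argv[2]):
        print(qid)
```

The resulting list gives the exact sequences to spot-check against NCBI blastp with taxid:2.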
 

Benjamin Buchfink

Administrator
Staff member
It is probably a question of sensitivity, so you should try the more sensitive modes of Diamond, such as --sensitive or --very-sensitive. Another reason could be repeat masking, so you can try again with --masking 0.
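The suggested runs can be scripted so that each setting writes to its own output file and the results can be compared side by side. This is a minimal sketch that reuses the command from the first post (without sudo) as the base; BASE, VARIANTS and build_commands are hypothetical names, and the script only prints the command lines. Pass an argv list to subprocess.run(argv, check=True) to actually execute one.

```python
import shlex

# Base command copied from the first post; only the output file differs per run.
BASE = [
    "diamond", "blastp", "--db", "nr",
    "-q", "/mnt/e/dnae_Sequence/archaea_dnae.fasta",
    "--taxonlist", "2", "--max-target-seqs", "1",
    "-f", "6", "qseqid", "sseqid", "pident", "nident", "length", "mismatch",
    "gapopen", "qstart", "qend", "sstart", "send", "evalue", "bitscore",
    "staxids", "stitle", "sscinames",
]

# Settings suggested in the reply: sensitivity modes and disabling masking.
VARIANTS = {
    "default": [],
    "masking0": ["--masking", "0"],
    "sensitive": ["--sensitive"],
    "sensitive_masking0": ["--sensitive", "--masking", "0"],
    "very_sensitive": ["--very-sensitive"],
}


def build_commands(out_prefix):
    """Return {variant name: argv list}, each writing to its own output file."""
    return {
        name: BASE + extra + ["-o", f"{out_prefix}_{name}.txt"]
        for name, extra in VARIANTS.items()
    }


if __name__ == "__main__":
    for name, argv in build_commands("/mnt/e/blast/archaea_dnae").items():
        # Print a shell-quoted command line for each variant.
        print(shlex.join(argv))
```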
 

Xingyu Luo

New member
Thanks for the fast reply!

This is the result without the taxonomy filter (--taxonlist 2):
[Screenshot: results without --taxonlist 2]
This is the result with --masking 0 but without --sensitive or --very-sensitive:
[Screenshot: results with --masking 0]
When I use --sensitive, the results are better:
[Screenshot: results with --sensitive]
However, --very-sensitive causes a dramatic drop in the number of reported hits:
[Screenshot: results with --very-sensitive]
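To quantify the difference between runs, for example which queries have a hit with --sensitive but lose it with --very-sensitive, one could count distinct query IDs per output file. A minimal sketch, assuming each run was written to its own -f 6 file with qseqid in the first column; the function and file names are hypothetical.

```python
import sys


def queries_with_hits(path):
    """Return the distinct qseqid values (first column) in a DIAMOND -f 6 file."""
    with open(path) as fh:
        return {line.rstrip("\n").split("\t", 1)[0] for line in fh if line.strip()}


def compare_runs(paths):
    """Map each output file to the number of distinct queries with at least one hit."""
    return {path: len(queries_with_hits(path)) for path in paths}


def lost_between(before, after):
    """Sorted queries that had a hit in `before` but none in `after`."""
    return sorted(queries_with_hits(before) - queries_with_hits(after))


if __name__ == "__main__" and len(sys.argv) >= 3:
    # Usage: python compare_runs.py run_sensitive.txt run_very_sensitive.txt
    for path, n in compare_runs(sys.argv[1:]).items():
        print(f"{path}\t{n} queries with hits")
    print("lost in last run:", ", ".join(lost_between(sys.argv[1], sys.argv[-1])))
```

Counting distinct qseqid values rather than lines avoids counting one query twice when it has several HSPs against its single target.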
I reread the manual. Do I need to adjust --freq-sd?
My data are dnaE protein sequences from both eukaryotes and archaea, so could they count as high-frequency sequences? I am not sure.
Also, --unal 1 did not list the missing queries in the output.

BTW, I could not find --masking in the manual. Can you give me more detail about it?
 

Xingyu Luo

New member
I just sent you the data. Also, I checked the version: it is v0.9.25.126, which is probably too old. Is there a way to update it on Linux without using Bioconda, or do I just need to download it again?
 