Problem with blastdbcmd. “Error: [blastdbcmd] Skipped TRANSCRIPT/4865”
0
0
Entering edit mode
19 months ago

Hello, I am looking for help with running a Reciprocal BLAST. I cannot query any of my transcripts using blastdbcmd because I get this error message: Error: [blastdbcmd] Skipped TRANSCRIPT/4865. I appreciate any insight that can help resolve this issue!

This is the script I am using for the Reciprocal BLAST (macOS):

#Making a database
makeblastdb -in ~/Desktop/tegula_blast/Hq_transcripts_2.fa -dbtype 'nucl' -out ~/Desktop/tegula_blast/Tfunebralis_DB -parse_seqids

#Output message
Building a new DB, current time: 03/06/2024 11:23:46
New DB name:   /Users/lanigleason/Desktop/tegula_blast/Tfunebralis_DB
New DB title:  /Users/lanigleason/Desktop/tegula_blast/Hq_transcripts_2.fa
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 3000000000B
Adding sequences from FASTA; added 99388 sequences in 2.75653 seconds.

#Blast for all possible pairwise comparisons 
blastn -query ~/Desktop/tegula_blast/Hq_transcripts_1.fa -db ~/Desktop/tegula_blast/Tfunebralis_DB -out ~/Desktop/tegula_blast/eiseni_to_funebralis.txt -evalue 1E-20 -outfmt 6 -max_target_seqs 1

#Output message
Warning: [blastn] Examining 5 or more matches is recommended

#Retrieve subset of assembly using blastdbcmd

Here, I took the queried transcripts with matches, from previous blastn output (eiseni_to_funebralis.txt) and made a list of the transcript names (eiseni_names.txt).

An observation I make here is that there are multiple matches listed in the blastn output. My original fasta file with transcripts does not include multiple copies, so this has to be a product of the blastn command. I am wondering if this is a possible reason for the error in the next step.

blastdbcmd -db ~/Desktop/tegula_blast/Tfunebralis_DB -dbtype 'nucl' -entry_batch ~/Desktop/tegula_blast/eiseni_names.txt -out ~/Desktop/tegula_blast/eiseni_to_funebralis.subset.reciprocal.fasta

#Output message
Error: [blastdbcmd] Skipped transcript_4866
Error: [blastdbcmd] Skipped transcript_9439
Error: [blastdbcmd] Skipped transcript_9439
Error: [blastdbcmd] Skipped transcript_13586
Error: [blastdbcmd] Skipped transcript_17909
Error: [blastdbcmd] Skipped transcript_38053
Error: [blastdbcmd] Skipped transcript_38053
Error: [blastdbcmd] Skipped transcript_22088
Error: [blastdbcmd] Skipped transcript_34418
Error: [blastdbcmd] Skipped transcript_45393
Error: [blastdbcmd] Skipped transcript_45393
Error: [blastdbcmd] Skipped transcript_13587
Error: [blastdbcmd] Skipped transcript_13587
Error: [blastdbcmd] Skipped transcript_13587
etc…

Most of the transcripts are skipped in this step. As seen, all duplicates of the same transcript are skipped too.

I have tried using a smaller subset of transcripts and removing the duplicate matches to run the reciprocal blast but I encounter the same errors.

I also tried looking up database identifiers that I could be missing but I am not sure how to use these identifiers to query with blastdbcmd.

blastdbcmd -db ~/Desktop/tegula_blast/Tfunebralis_DB -dbtype 'nucl' -entry all -out ~/Desktop/tegula_blast/eiseni_to_funebralis.subset.reciprocal.all.fasta -outfmt "OID: %o GI: %g ACC: %a IDENTIFIER: %i"

Please let me know any ways to reduce the amount of transcripts skipped by blastdbcmd. Any recommended parameter changes and other explanations are welcome. Thank you!

blast • 411 views
ADD COMMENT

Login before adding your answer.

Traffic: 4364 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6