Hello, I am looking for help with running a Reciprocal BLAST. I cannot query any of my transcripts using blastdbcmd because I get this error message: Error: [blastdbcmd] Skipped TRANSCRIPT/4865
. I appreciate any insight that can help resolve this issue!
This is the script I am using for the Reciprocal BLAST (macOS):
#Making a database
makeblastdb -in ~/Desktop/tegula_blast/Hq_transcripts_2.fa -dbtype 'nucl' -out ~/Desktop/tegula_blast/Tfunebralis_DB -parse_seqids
#Output message
Building a new DB, current time: 03/06/2024 11:23:46
New DB name: /Users/lanigleason/Desktop/tegula_blast/Tfunebralis_DB
New DB title: /Users/lanigleason/Desktop/tegula_blast/Hq_transcripts_2.fa
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 3000000000B
Adding sequences from FASTA; added 99388 sequences in 2.75653 seconds.
#Blast for all possible pairwise comparisons
blastn -query ~/Desktop/tegula_blast/Hq_transcripts_1.fa -db ~/Desktop/tegula_blast/Tfunebralis_DB -out ~/Desktop/tegula_blast/eiseni_to_funebralis.txt -evalue 1E-20 -outfmt 6 -max_target_seqs 1
#Output message
Warning: [blastn] Examining 5 or more matches is recommended
#Retrieve subset of assembly using blastdbcmd
Here, I took the queried transcripts with matches, from previous blastn output (eiseni_to_funebralis.txt
) and made a list of the transcript names (eiseni_names.txt
).
An observation I make here is that there are multiple matches listed in the blastn output. My original fasta file with transcripts does not include multiple copies, so this has to be a product of the blastn command. I am wondering if this is a possible reason for the error in the next step.
blastdbcmd -db ~/Desktop/tegula_blast/Tfunebralis_DB -dbtype 'nucl' -entry_batch ~/Desktop/tegula_blast/eiseni_names.txt -out ~/Desktop/tegula_blast/eiseni_to_funebralis.subset.reciprocal.fasta
#Output message
Error: [blastdbcmd] Skipped transcript_4866
Error: [blastdbcmd] Skipped transcript_9439
Error: [blastdbcmd] Skipped transcript_9439
Error: [blastdbcmd] Skipped transcript_13586
Error: [blastdbcmd] Skipped transcript_17909
Error: [blastdbcmd] Skipped transcript_38053
Error: [blastdbcmd] Skipped transcript_38053
Error: [blastdbcmd] Skipped transcript_22088
Error: [blastdbcmd] Skipped transcript_34418
Error: [blastdbcmd] Skipped transcript_45393
Error: [blastdbcmd] Skipped transcript_45393
Error: [blastdbcmd] Skipped transcript_13587
Error: [blastdbcmd] Skipped transcript_13587
Error: [blastdbcmd] Skipped transcript_13587
etc…
Most of the transcripts are skipped in this step. As seen, all duplicates of the same transcript are skipped too.
I have tried using a smaller subset of transcripts and removing the duplicate matches to run the reciprocal blast but I encounter the same errors.
I also tried looking up database identifiers that I could be missing but I am not sure how to use these identifiers to query with blastdbcmd.
blastdbcmd -db ~/Desktop/tegula_blast/Tfunebralis_DB -dbtype 'nucl' -entry all -out ~/Desktop/tegula_blast/eiseni_to_funebralis.subset.reciprocal.all.fasta -outfmt "OID: %o GI: %g ACC: %a IDENTIFIER: %i"
Please let me know any ways to reduce the amount of transcripts skipped by blastdbcmd. Any recommended parameter changes and other explanations are welcome. Thank you!