BBMap filterbyname.sh to extract sequences out of a fasta file - wrong output
4.5 years ago

Hi guys,

I wanted to extract specific sequences out of a fasta file based on their accession numbers. I used BBMap's filterbyname.sh for this, and at first glance the result looked perfectly fine. I ran this command:

filterbyname.sh in=DB.fasta out=SequencesOfInterest.fasta names=Accessions_DB.txt include=t fixjunk

My name file contains 9499 accessions, but for some reason BBMap gives me 9504 sequences in the output file. I picked random sequences and double-checked whether the extraction worked properly, and so far it did. But is it possible that some accessions appear twice in the output? If so, how can I identify them? Or did I miss an important parameter in my command? I have never used this script before.

Thanks in advance for your help!

Cheers

filterbyname.sh bbmap • 4.0k views

Have you checked to see if you have some accessions twice in your input/source file? If you post examples of a couple of your fasta headers we can tell you how to extract/sort and identify if they are unique.


The format of the accessions varies a lot, depending on how the data was generated, e.g.

>Delta1_2004206643 orf2_0004_16 Aldehyde ferredoxin oxidoreductase [O.algarvensis Delta1] # COG COG2414(C)

>Host_265335_c1_seq1_5 [136 - 2] (REVERSE SENSE) len=239 path=[217 0-135 353 136-238]

>Delta3_G2_84585.peg.319 scaffold_0 Molybdopterin biosynthesis protein MoeA / Periplasmic molybdate-binding domain

Does that help?


You would find that many other programs would be defeated by those spaces in the fasta headers. Looks like BBMap did reasonably well. I would still suggest checking if your input had duplicate fasta headers (if not actual sequences).

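You could try grep "^>" DB.fasta | sort | wc -l and then grep "^>" DB.fasta | sort | uniq | wc -l on the input file and see if those two numbers match; if the second number is smaller, some headers are duplicated. Running grep "^>" DB.fasta | sort | uniq -c instead lists every header with the number of times it occurs. On a very large file the sort may need a significant amount of memory and may crash.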
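To list the duplicated accessions themselves rather than just compare counts, something like the sketch below might help. It assumes the accession is the text between ">" and the first whitespace in each header, which may not hold for every header style in your file. The demo file and its sequence names (seqA, seqB) are made up for illustration; point the variable at your own fasta file instead.

```shell
# Hypothetical demo input; replace "$fasta" with your own file (e.g. DB.fasta).
fasta=$(mktemp)
printf '>seqA first copy\nACGT\n>seqB other\nGGCC\n>seqA second copy\nTTAA\n' > "$fasta"

# Keep header lines only, drop the leading '>', take the first
# whitespace-delimited field as the accession, then print each
# accession that occurs more than once.
dups=$(grep '^>' "$fasta" | sed 's/^>//' | awk '{print $1}' | sort | uniq -d)
echo "$dups"   # prints: seqA
```

Running the same pipeline on both the source file and the output file should show whether the extra sequences come from accessions that were already repeated in the source.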


Great, thanks for your advice!
