Question: Output whole header line in fasta file using bwa
3.8 years ago by
United States
erikras1223 wrote:

Hello, I am trying to keep the complete header lines from my FASTA file when using BWA. Once I've mapped the reads to the reference genome, I want to extract the ones that mapped and output them to a FASTA file. I need the reads to have the complete header name they originally had.

After looking for a while I see the option: bwa mem -R '@RG\tID:foo\tSM:bar'. The problem is I don't understand this string I need to input, and I get an error every time I try to use it. I know the above string is just an example, but I would be very grateful if someone could explain this, or propose a different way to output the complete header line for the reads from bwa. Thanks


I'm a bit confused about what you're trying to do and why. Are you starting with a fasta file and you want to end up with a fasta file containing only the reads that map to the reference? What are you using read groups for? Are the read headers important to keep unchanged, or are you just trying to use them for extracting reads?

written 3.8 years ago by Brian Bushnell

I assume you want to save the entire fasta header (which has spaces in the name)? If that is the case you would need to convert those spaces to "_" and make the header one long string. By convention, the sequence identifier in a fasta header ends at the first whitespace and anything after it is treated as a description (which is how bwa is treating it, my guess).
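
The space-to-underscore conversion suggested here could be sketched in Python roughly as follows. This is a minimal illustration, not part of bwa or any other tool: the function names `underscore_headers` and `convert_file` are hypothetical, and it assumes a plain-text FASTA file in which header lines start with ">".

```python
import re


def underscore_headers(lines):
    """Yield FASTA lines, replacing whitespace in header lines with "_".

    Hypothetical helper for illustration; sequence lines pass through unchanged.
    """
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            # Replace each run of spaces/tabs with a single underscore so the
            # whole header becomes one whitespace-free token.
            line = ">" + re.sub(r"\s+", "_", line[1:].strip())
        yield line


def convert_file(src_path, dst_path):
    """Write a copy of the FASTA file at src_path with underscored headers."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in underscore_headers(src):
            dst.write(line + "\n")
```

Running `convert_file` on the reads before mapping would give the aligner headers with no whitespace left to split on. The same conversion would have to be applied to any header list used for later matching.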

modified 3.8 years ago • written 3.8 years ago by genomax

Yes, this is exactly my question. I just want to be able to save the whole line of the header, but bwa is chopping some of the info off. I am later doing a search with the original header line to match against the bwa reads produced and they don't match.
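
For the matching step described here, one possible workaround is to key the search on the part of the header before the first whitespace and look the full header up from that. The sketch below assumes that the names in the mapped output are exactly that first token and that those tokens are unique in the input file. The function name `header_lookup` is hypothetical.

```python
def header_lookup(fasta_lines):
    """Map each read ID (text before the first whitespace) to its full header.

    Hypothetical helper for illustration; assumes IDs are unique in the file.
    """
    lookup = {}
    for line in fasta_lines:
        if line.startswith(">"):
            full = line[1:].rstrip("\r\n")
            parts = full.split(None, 1)  # split once on any whitespace
            if parts:
                lookup[parts[0]] = full
    return lookup
```

With a lookup built from the original FASTA file, a truncated name taken from the mapped reads can be translated back with `lookup[name]` before comparing against the original header lines.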

written 3.8 years ago by erikras1223

Note that the default behavior of BBMap is to NOT chop off the header after the first whitespace, and it can directly output to fasta, like this: bbmap.sh in=sequences.fasta outm=mapped.fasta outu=unmapped.fasta ref=reference.fasta
written 3.8 years ago by Brian Bushnell

Either use BBMap or convert the spaces in the names to "_" like I said before, if you want to keep using bwa.

written 3.8 years ago by genomax