Question: Bwa Aln Output In .Bam
1
gravatar for Dawnoflife
6.4 years ago by
Dawnoflife10
Dawnoflife10 wrote:

How can I use the bwa aln function and get the output in .bam format.

I know bwasw gives the output in .sam so I just used SAMTOOLS to do the conversion but don't really know how to do it in this case as the output is in .sai.

What I have been trying to do is this:

  1. Create the indexes for all the .FNA files.

  2. Align them using aln command.

  3. Take this output and convert it to .sam.

  4. The sam files are very large to store. Hence convert them to .bam.

  5. Data abstraction, like counting the ranges of values can't be performed on the binary (.bam) file, hence I converted that to a .txt file using SAMTOOLS.

  6. Perform the desired data abstraction on the txt and get the results.

The tricky part is the placement of the files as I have to do this for about 1500 files. My code looks like this:

.././bwa index NC_008765.fna

.././bwa aln NC_008765.fna ../QUERY/updated_635_25bp.fasta > out.sai

../bwa samse NC_008765.fna out.sai ../QUERY/updated_635_25bp.fasta > aln.sam


../samtools-0.1.18/samtools-0.1.18/samtools view -bS -F 4 aln.sam > output.bam


../samtools-0.1.18/samtools-0.1.18/samtools view output2.bam > seemee.txt

awk '{ print $1 }' seemee.txt > comns.txt

sed 's/.\?.$//;s/$/00/;s/^00$/0/' comns.txt | sort | uniq -c | sort -nr > final.txt

I need to do this for all the 1000+ input files and 6 different query files. Does the general steps make sense? Can be it optimized? the code has been running on for 2 hours now and its only about 1/10th of its way through. Thanks for your help!

bam samtools bwa • 13k views
ADD COMMENTlink modified 4.9 years ago by Biostar ♦♦ 20 • written 6.4 years ago by Dawnoflife10

Dear dawnoflife, you do not need to convert to BAM to, in the end, convert it to a text file. SAM is already a text file, so do what you want to do with your seemee.txt file directly with your aln.sam file.

ADD REPLYlink written 6.4 years ago by toni2.1k

SAM files are too large. they take up 36 mb as compared to < 1 mb for text files. unless I can perform the -F 4 operation when receiving the sam file itself.

ADD REPLYlink written 6.4 years ago by Dawnoflife10

So just do not use "-bS -F 4" but "-S -F 4" , you're output will still be in SAM format (test), and then use it like your seemee.txt file

ADD REPLYlink written 6.4 years ago by toni2.1k

you could then convert to BAM when you're done for persistent storage.

ADD REPLYlink written 6.4 years ago by toni2.1k

Is there any way I can pipe the sam file to bam directly and do the data analysis in b/w. I don't want to store the sam file.

ADD REPLYlink written 6.4 years ago by Dawnoflife10
5
gravatar for Vikas Bansal
6.4 years ago by
Vikas Bansal2.3k
Berlin, Germany
Vikas Bansal2.3k wrote:

Use ->

bwa samse <in.db.fasta> <in.sai> <in.fq> > <out.sam> for single end reads.

bwa sampe <in.db.fasta> <in1.sai> <in2.sai> <in1.fq> <in2.fq> > <out.sam> for paired end

You can use some other parameters which are described here.

ADD COMMENTlink written 6.4 years ago by Vikas Bansal2.3k

So we don't use the indexes created with bwa index -a

ADD REPLYlink written 6.4 years ago by Dawnoflife10

Yes, the first step is to create index. and then you have 2 options -> aln and bwasw, depending on the type of reads. Since you used aln, it will give you binary file and then you have to use samse or sampe to get sam file.

ADD REPLYlink written 6.4 years ago by Vikas Bansal2.3k

Thanks for the help!

ADD REPLYlink written 6.4 years ago by Dawnoflife10

Actually, if I am trying to convert this bam file back to txt, it is giving me problems. The code below works when I convert the bam files obtained from

 bwasw .

for file in *.bam;  do /samtools-0.1.18/samtools-0.1.18/samtools view $file > ../$file.txt; done

The error I get is:

 EOF marker is absent. The input is probably truncated .

Will really appreciate the help!

ADD REPLYlink written 6.4 years ago by Dawnoflife10

Edited the question, I am having troubles converting the bam to .txt. I need to get data from the columns and I was planning on using

awk for it. Thanks for your help!

ADD REPLYlink written 6.4 years ago by Dawnoflife10

Have a look at Tony's answer. When you are using samtools view, you have to tell it that your file is in sam format (because by default it take input as bam) by adding -S argument.

ADD REPLYlink written 6.4 years ago by Vikas Bansal2.3k

I did use the -bS flag when using SamTools. Also how do I put code in the comments? It doesn't seem to work.

ADD REPLYlink written 6.4 years ago by Dawnoflife10

Not -bS , just -S because -b would output in BAM !!! Please read the samtools documentation calmly :)

ADD REPLYlink written 6.4 years ago by toni2.1k
3
gravatar for toni
6.4 years ago by
toni2.1k
Lyon
toni2.1k wrote:

'bwa aln' does not output in BAM format but in SAM which already is what you called a 'TXT' version of the BAM (binary format).

After that, [?]samtools[?] expects an input in BAM format so you have to specify the '-S' option to let it know that your input is in SAM format.

Synthesis :

bwa sampe <in.db.fasta> <in1.sai> <in2.sai> <in1.fq> <in2.fq> > <out.sam>

samtools view -S out.sam

You should be OK now.

ADD COMMENTlink written 6.4 years ago by toni2.1k

Is <in.db.fasta> the indexes created with

bwa index ? When i was just using by .fna file for this it did not accept the input. 

ADD REPLYlink written 6.4 years ago by Dawnoflife10

NO. As it is suggested by the name, it expects a FASTA file. This is the fasta sequence of your reference genome and this is already an input required by 'bwa index'.

ADD REPLYlink written 6.4 years ago by toni2.1k

<in.db.fasta> is your reference genome (eg hg18, hg19 in fasta format). The first step is to create index. Use the same file in bwa samse. It should be fine if all other index files are in same directory.

ADD REPLYlink written 6.4 years ago by Vikas Bansal2.3k

Can I just use .fna since its a FASTA file as well. I'll try it again but the .fna didn't work for me.

ADD REPLYlink written 6.4 years ago by Dawnoflife10

We cannot help you if you do not clarify what is not working.

ADD REPLYlink written 6.4 years ago by toni2.1k

Please see my edited question!

ADD REPLYlink written 6.4 years ago by Dawnoflife10
1
gravatar for Swbarnes2
6.4 years ago by
Swbarnes21.4k
Swbarnes21.4k wrote:

Are you doing something other than this to make out.bam?

bwa index ref.fa; 
bwa aln ref.fa read1.fq > r1.sai; 
bwa aln ref.fa read2.fq > r2.sai; 
bwa sampe ref.fa r1.sai r2.sai read1.fa read2.fq | samtools view -bSho out.bam -;

samtools view -S will do nothing. It will take a .sam file as input and output another .sam file.

(If you have a large index, like human, there's a slightly different index command)

ADD COMMENTlink written 6.4 years ago by Swbarnes21.4k

yes true, I wrote this without thinking but a simple 'cat' will then be enough

ADD REPLYlink written 6.4 years ago by toni2.1k

Kindly see the edited question!

ADD REPLYlink written 6.4 years ago by Dawnoflife10

Have you tried something like:

samtools view out.bam | awk '{print $1}' | sed 's/.\?.$//;s/$/00/;s/^00$/0/' - | sort | uniq -c | sort -nr > final.txt

And wouldn't it be easier to make an altered fastq and realign rather than altering the .bams?

ADD REPLYlink written 6.4 years ago by Swbarnes21.4k

Have you tried something like: samtools view out.bam | awk '{print $1}' | sed 's/.\?.$//;s/$/00/;s/^00$/0/' | sort | uniq -c | sort -nr > final.txt And wouldn't it be easier to make an altered fastq and realign rather than altering the .bams?

ADD REPLYlink written 6.4 years ago by Swbarnes21.4k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1844 users visited in the last hour