Question: How can I save the output from samtools and awk as a bam file?
0
gravatar for GK1610
4.2 years ago by
GK161080
United States
GK161080 wrote:

My input file .bam files with mapped reads is bam1 from step 1

Step 1 samtools view -F 0X4 > **bam1**

Step 2 I want to keep only those reads which are paired and mapped. so I got this flag no from here

samtools view **bam1** | awk '{ if ($2 == 99 || $2 == 147 || $2 == 83 || $2 == 163 || $2 == 67 || $2 == 131 || $2 == 115 || $2 ==179 && $9 >= -10000 && $9 <= 10000) > **bam2**

I have 20 bam2s from different samples and I want to merge them but I am getting this error bamtools merge ERROR: could not open input BAM file(s)... Aborting.

Step 3 I then redid step 2

samtools view -H **bam1** > bam2

samtools view **bam1** | awk '{ if ($2 == 99 || $2 == 147 || $2 == 83 || $2 == 163 || $2 == 67 || $2 == 131 || $2 == 115 || $2 ==179 && $9 >= -10000 && $9 <= 10000) > **bam2**

samtools view -bS t**bam2** >  **newbam2**

**[E::sam_parse1] missing SAM header
[W::sam_read1] parse error at line 1
[main_samview] truncated file.**

how can I resolve this issue?

chip-seq • 3.7k views
ADD COMMENTlink modified 2.5 years ago by genomax91k • written 4.2 years ago by GK161080

Where did you create a file called tbam2?

Also, if there is a header in bam1, surely the awk after the pipe removes it. Perhaps you should add another if statement there to deal with header lines.

Also, for more clarity, please use four spaces before commands or the "code sample" function.

Also, in your awk, do you think

&& $9 >= -10000 && $9 <= 10000

affects all the "ors" or just the last one?

$ cat test.file
1   1
2   2
$ awk 'BEGIN{OFS=FS="\t"}{if($1==1 || $1==2 && $2==2){print}}' test.file
1   1
2   2
$ awk 'BEGIN{OFS=FS="\t"}{if(($1==1 || $1==2) && $2==2){print}}' test.file
2   2
ADD REPLYlink modified 4.2 years ago • written 4.2 years ago by 5heikki9.0k
5
gravatar for Devon Ryan
4.2 years ago by
Devon Ryan97k
Freiburg, Germany
Devon Ryan97k wrote:

Sometimes it's easier to write a little python script:

#/usr/bin/env python
import pysam

bam = pysam.AlignmentFile("your file.bam")
output = pysam.AlignmentFile("your output.bam", template=bam)
for read in bam.fetch():
    if read.is_paired and not read.is_unmapped and not read.mate_is_unmapped and abs(read.template_length) < 10000:
        output.write(read)
bam.close()
output.close()

I haven't actually run that, so there may be a typo, but that'll be rather more convenient than a multistep samtools->awk->samtools thing.

ADD COMMENTlink modified 4.2 years ago • written 4.2 years ago by Devon Ryan97k
import pysam
bf = pysam.AlignmentFile(fname, "rb")
output = pysam.AlignmentFile("output.bam", "w", template=bf)
for read in bf.fetch(until_eof=True):
if read.is_paired and not read.is_unmapped and not read.mate_is_unmapped and abs(read.template_length) < 10000:
output.write(read)
bf.close()
output.close()

I am getting this error 
Traceback (most recent call last):
File "<stdin>", line 3, in <module>
File "pysam/calignmentfile.pyx", line 1345, in pysam.calignmentfile.AlignmentFile.write (pysam/calignmentfile.c:14774)
File "pysam/calignmentfile.pyx", line 1374, in pysam.calignmentfile.AlignmentFile.write (pysam/calignmentfile.c:14696)
ValueError: sam write failed

do you know what could have been wrong?

ADD REPLYlink modified 4.2 years ago • written 4.2 years ago by GK161080
1

Presumably 'wb' rather than 'w' on the third line.

ADD REPLYlink written 4.2 years ago by Devon Ryan97k
2
gravatar for Pierre Lindenbaum
4.2 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum131k wrote:

using samjs : https://github.com/lindenb/jvarkit/wiki/SamJS

java -jar dist/samjs.jar  -e 'record.getReadPairedFlag() && !record.getReadUnmappedFlag() && !record.getMateUnmappedFlag() && Math.abs(record.getInferredInsertSize())< 10000'   -o out.bam in.bam
ADD COMMENTlink modified 4.2 years ago • written 4.2 years ago by Pierre Lindenbaum131k
1

That's pretty neat...

ADD REPLYlink written 4.2 years ago by Brian Bushnell17k

Pierre, out of curiosity, can you comment on the use of JavaScript for bioinformatics tasks like this one? What advantages does it have over a Java or python solution? I know nothing about Javascript and it's not a language we see very often around here...

ADD REPLYlink written 4.2 years ago by dariober11k
1

, can you comment on the use of JavaScript for bioinformatics tasks

sure. Java contains a javascript engine .

One can insert any java object into a javascript context; Hence, any class that have been written for the java library for HTS (htsjdk) can be used as a simple javascript object without writing a new specific js API from scratch and we can stick to the BAM specification.

ADD REPLYlink modified 4.2 years ago • written 4.2 years ago by Pierre Lindenbaum131k

Thanks Pierre

One can insert any java object into a javascript context

But why would you want to do that in the first place? Why not just write a pure Java program? Is it because JavaScript is easier to develop than Java, a bit like Scala?

ADD REPLYlink written 4.2 years ago by dariober11k
1

Why not just write a pure Java program?

sounds like "why writing an awk script when you could write a C program" :-)

it's faster to write and run ( No need to create a source code + no need to compile). But the job will take more time than a pure compiled java program.

ADD REPLYlink modified 4.2 years ago • written 4.2 years ago by Pierre Lindenbaum131k
2
gravatar for GK1610
4.2 years ago by
GK161080
United States
GK161080 wrote:

solution

`samtools view -h **input.bam** | awk '$1~"@" || $2 == 99 || $2 == 147 || $2 == 83 || $2 == 163 '| awk '$1~"@" || $9 >= -10000 && $9 <= 10000'| samtools view -Shb - > **output.bam**

`java -jar dist/samjs.jar  -e record.getReadPairedFlag() && !record.getReadUnmappedFlag() && !record.getMateUnmappedFlag() && Math.abs(record.getInferredInsertSize())< 10000'   -o out.bam in.bam
ADD COMMENTlink modified 4.2 years ago • written 4.2 years ago by GK161080
1
gravatar for Simply Bioinformatics
2.5 years ago by
WashingtonDC
Simply Bioinformatics170 wrote:
samtools view --threads 2 -h in.bam | awk '{if($3 != "chrM" && $3 != "chrUn"){print $0}}' | samtools view --threads 2  -Shb - > out.bam
ADD COMMENTlink modified 2.5 years ago by genomax91k • written 2.5 years ago by Simply Bioinformatics170
awk -F '\t' '($0 ~/^@/ || !($3=="chrM" || $3=="chrUn"))'
ADD REPLYlink written 2.5 years ago by Pierre Lindenbaum131k
0
gravatar for GK1610
4.2 years ago by
GK161080
United States
GK161080 wrote:

okay I really want this to work

suggestion1. pysam doesnt work

suggestion2. picard--doesnt solve the problem of removing reads and mates with fragment length >10KB

all I want is to convert samtools view bam file + awk operations ---> readable bam file

samtools view bam1 | awk '{ if ($2 == 99 || $2 == 147 || $2 == 83 || $2 == 163 || $2 == 67 || $2 == 131 || $2 == 115 || $2 ==179 && $9 >= -10000 && $9 <= 10000) > "**readable bam file**"
ADD COMMENTlink modified 2.5 years ago by genomax91k • written 4.2 years ago by GK161080

doesnt solve the problem of removing reads and mates with fragment length >10KB

updated my code with :

&& Math.abs(record.getInferredInsertSize())< 10000

...

ADD REPLYlink written 4.2 years ago by Pierre Lindenbaum131k
0
gravatar for swbarnes2
4.2 years ago by
swbarnes28.8k
United States
swbarnes28.8k wrote:

For one thing, samtools view something.bam is going to output a sam, not a bam. You should be able to confirm this by just looking at it.

Second, you should only need those first 4 flag values, I do't think you can have a properly paired pair where both are in the same direction.

Third, you need to use >> to write in append mode. Your second command that has > bam2 is over-writing the results from the first one.

ADD COMMENTlink written 4.2 years ago by swbarnes28.8k

Second, you should only need those first 4 flag values, I do't think you can have a properly paired pair where both are in the same direction.

Así es la vida : Tophat : Sam-Flag 115 = Properly-Paired + Read.Reverse + Mate.Reverse ?

ADD REPLYlink written 4.2 years ago by Pierre Lindenbaum131k
0
gravatar for i.sudbery
2.5 years ago by
i.sudbery9.4k
Sheffield, UK
i.sudbery9.4k wrote:

Am being dense? What's wrong with:

samtools -F 12 -F 3 -bh input.bam  > bam2.bam

explaination:

-F excludes reads with any of the bits set. 12 is 0x4 + 0x8: Read is unmapped, mate is unmapped

-f includes only reads with all the bits set. 2 is 0x1 + 0x1: Read is paired. Read is in proper pair.

-b is output bam (rather than the default sam)

-h is output header.

Still would of course depend on your mapper setting the proper paired flag properly, but most do right?

ADD COMMENTlink written 2.5 years ago by i.sudbery9.4k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2293 users visited in the last hour