Apply conditions to forward/reverse strands on BAM file (RNA-Seq)
0
0
Entering edit mode
4.0 years ago
ek699 ▴ 10

Hi, I am new to bioinformatics and I am struggling with filtering out my bam file from STAR alignment (RNA-Seq).

After going through a few previous filtering process (properly paired, and uniquely mapping), the current bam file head looks like:

A00521:147:H2H22DSXY:1:2140:19777:35430 99      chr1    4691088 255     100M    =       4691193 205     CCTTCTCTCTCCTTGGCCATTGACTCTTTAACAACCCTGGAATGGTCTATATAAAAGCAACACTCCCCAGCTAAGGCAGCACGTAGCCCTCCTATTCTGC        :FFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF        NH:i:1  HI:i:1  AS:i:198        nM:i:0
A00521:147:H2H22DSXY:1:2140:19777:35430 147     chr1    4691193 255     100M    =       4691088 -205    TACTTCTGACAATGAAGTCCGTGATTCTTGAAGGTGACTGACGGAAGTTTCTATTTTCTCTACATCTAAATCTATGGCTGCCCTCAGACTGTCATAATTT        FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:        NH:i:1  HI:i:1  AS:i:198        nM:i:0
A00521:147:H2H22DSXY:1:2202:32841:28307 99      chr1    4692954 255     1S100M  =       4693013 158     GTCCTCTTCGAACTGCAACTTTTCTACGTTATCCCTTTTTTTTCTCTTCCCTAATCTCAGTTGGTGTTTTTTCTTGAAAGGCACAGCCATTCCAGGACTGG       F,FF,F:,FF:,FFFF,,F,,,::,,F,,F,FFFFFFF,,F:FFFFF:FFF,F::F:FFF,,::,FFF,,:FF::,F:F:FF,F:,FF,FF:,,FF,,F:F       NH:i:1  HI:i:1  AS:i:185        nM:i:6
A00521:147:H2H22DSXY:1:2202:32841:28307 147     chr1    4693013 255     99M     =       4692954 -158    TTGGTGTTTTTTCGTGAAAGGCACAGCCTGTCCAGGAATGGAAGCTGCCATTCCCGTGTCGTTGGTTTCCTGTACTACCTCCACTATGGTGGCTCATAC :::F::FFFF,:F,F:FFFFFF:FFFFF,,:FFF:FF,:F,FFFF,:F,F,,FFFF:::F::,::F,:FF,:,:F,FF:,FF:F:,,F,,F,FFFF,:F NH:i:1  HI:i:1  AS:i:185        nM:i:6
A00521:147:H2H22DSXY:1:2419:5240:32737  99      chr1    4724721 255     1S98M   =       4724765 144     ATCCCCCGTTGCACCCAGAATTTCTCTGAACTTAAGTCAGCTCAGCAAAAATGGAAAGTGAATGTCAGGGCAAAACCTGTACTAGAACCGTGGTAATAC FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFF NH:i:1  HI:i:1  AS:i:196        nM:i:0
A00521:147:H2H22DSXY:1:2419:5240:32737  147     chr1    4724765 255     100M    =       4724721 -144    CAAAAATGGAAAGTGAATGTCAGGGCAAAACCTGTACTAGAACCGTGGTAATACAGAGTAAATGAGTCCCTGACTCTTACGGGTCTTTCATTCTAGTGAG        FFFFFFFFFFFFFFFF,FFFF:FFFFFFFFFFFFFFFFFF,:FFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFF:FFFFFF:FFFFF        NH:i:1  HI:i:1  AS:i:196        nM:i:0
A00521:147:H2H22DSXY:1:2453:29595:28166 99      chr1    4740099 255     100M    =       4740159 160     GTGTCTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTTGTGTGACATATACATGAGTATCCACAGAAACCAGAAGAGGGTCTTGAATACGTGG        FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFF,FFF::FF::,FFF:FFFFFF:FFFFFFFF:F::FF:F,:FF:        NH:i:1  HI:i:1  AS:i:198        nM:i:0
A00521:147:H2H22DSXY:1:1428:4065:14011  99      chr1    4740101 255     101M    =       4740159 158     GTGTCTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTTGTGTGACATATACATGAGTATCCACAGAAACCAGAAGAGGGTCTTGAATACGTGGAGT       FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FF,F,FF:FF:FF:FFF,,F:FFFFFFFFFFFFFF:F,FFFFFFF       NH:i:1  HI:i:1  AS:i:195        nM:i:2
A00521:147:H2H22DSXY:1:1428:4065:14011  147     chr1    4740159 255     100M    =       4740101 -158    GAGTATCCACAGAAACCAGAAGAGGGTCTTGAATACGTGGAGTGGGAAGGAGCGGAGGTTGAGAGTTTCCTGCGGTAGGAACTGGGAACTGAAGTTGGGT        FFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF        NH:i:1  HI:i:1  AS:i:195        nM:i:2
A00521:147:H2H22DSXY:1:2453:29595:28166 147     chr1    4740159 255     100M    =       4740099 -160    GAGTATCCACAGAAACCAGAAGAGGGTCTTGAATACGTGGAGTGGGAAGGAGCGGAGGTTGAGAGTTTCCTGCGGTAGGAACTGGGAACTGAAGTTGGGT        FFFFFFFFFFFFF,FFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFF:FF:FFFFFFFFFFFFFFF        NH:i:1  HI:i:1  AS:i:198

And the current bam file flagstat looks like:

32151192 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
32151192 + 0 mapped (100.00% : N/A)
32151192 + 0 paired in sequencing
16075596 + 0 read1
16075596 + 0 read2
32151192 + 0 properly paired (100.00% : N/A)
32151192 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

From here, I am now trying to only keep reads based on following condition: For a read on forward strand, the start position of mate should greater than or equal to the start position of read, and conversely for reads on reverse strand.

For forward strand, I tried

samtools view -F 16 original.bam | awk '{if ($4<$8 || $4 ==$8) print $0}'

For reverse strand, I tried

samtools view -f 16 original.bam | awk '{if ($4>$8 || $4 ==$8) print $0}'

This works fine separately, but in this case, I need to merge two bam files at the end, since the original bam file was filtered separately.

Is there any way that I can apply those conditions all at once to the original bam file and output the new filtered bam file??

Or, is merging the two separated filtered bam files the best way?

Your help/assistance will be greatly appreciated.. Thank you so much.

RNA-Seq bam strands paired-ends STAR • 1.0k views
ADD COMMENT
0
Entering edit mode

Are you really finding a significant number of reads that are "properly paired" that fail your awk filter?

ADD REPLY
0
Entering edit mode

Yeah, I checked that there are still many reads that don't meet the conditions after filtered with "properly-paired".. I want to know how to pipe (combine) the two codes to generate one filtered bam file..

ADD REPLY

Login before adding your answer.

Traffic: 1944 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6