extract reads where one pair of read is at the annotated region and other not
1
0
Entering edit mode
7.8 years ago
nikkihathi ▴ 30

Hello!

I have a bam alignment file from lifescope. I would like to extract reads that fulfil following criteria:

  1. one pair is aligned to the annotated region (on exom) and
  2. the other pair aligned to the intron region

I tried the following command but was unable to grep the condition.

    samtools view -F 14 input_file.bam | grep -v "     =       " > ouput_file.sam

I have searched for the answer and played with different options for samtools view but failed to find that satisfies the criteria.

Thanks

SOLID bam RNA-Seq exome • 2.1k views
ADD COMMENT
0
Entering edit mode

There's no possible way to do this with a combination of samtools and grep. You're going to write a script using pysam or something similar to do this.

ADD REPLY
3
Entering edit mode
7.8 years ago

get the read names mapping the exons, and sort

samtools view -F 2820 -L exon.bed  in.bam | cut -f 1 | LC_ALL=C sort > name1.txt

get the read names mapping the introns, and sort

samtools view -F 2820 -L intron.bed  in.bam | cut -f 1 | LC_ALL=C sort > name2.txt

2820 : read unmapped, not primary alignment , read fails platform/vendor quality checks supplementary alignment

get the common read names:

comm -12 name1.txt  name2.txt

go back to the bam, and get the reads with those names.

EDIT: you might get some reads overlapping BOTH exon and intron. you can play with bedtools to remove those reads...

ADD COMMENT
0
Entering edit mode

The -F 2820 bit might not be necessary depending on the alignment, these could be primary mapped reads. I know when I process my exomes I map against the whole genome because of the way the baited capture fragments work. I expect to get primary, proper alignments that would meet these criteria.

ADD REPLY

Login before adding your answer.

Traffic: 2018 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6