Question: filtering paired end mapped reads form SAM/BAM file
0
gravatar for Raghav
4.8 years ago by
Raghav100
Allahabad, India
Raghav100 wrote:

Dear All,

I performed paired end mapping of transcriptome reads (genrated on illumina platform) by bowtie2. after mapping I used Samtools for calculation of mapping statistics by using samtools flagstat. and output is

68714536 + 0 in total (QC-passed reads + QC-failed reads)

0 + 0 duplicates

54522137 + 0 mapped (79.35%:-nan%)

68714536 + 0 paired in sequencing

34357268 + 0 read1

34357268 + 0 read2

44630360 + 0 properly paired (64.95%:-nan%)

46641928 + 0 with itself and mate mapped

7880209 + 0 singletons (11.47%:-nan%)

168844 + 0 with mate mapped to a different chr

145423 + 0 with mate mapped to a different chr (mapQ>=5)

how to interpret this maping output.

Apart from this, I want to filter only those mapped reads which are paired properly. For that perpose I used samtools again

samtools view  -f 1 -F 12 <bamfile> > pairedendmapps.bam

But I am not sure whether it working correctly or not.

waiting for your valuable comments

 

bam sam paired end maping bowtie2 • 5.7k views
ADD COMMENTlink modified 4.8 years ago by Devon Ryan90k • written 4.8 years ago by Raghav100
2
gravatar for Devon Ryan
4.8 years ago by
Devon Ryan90k
Freiburg, Germany
Devon Ryan90k wrote:

For getting only properly paired alignments, you're better off with samtools view -f 0x2 some_file. Your command will get discordant and concordant reads, just not singletons.

For interpreting the results of flagstat, it'd be helpful to know what's confusing. You have an ~80% alignment rate, which is OK for many organisms. 64% of all of the reads aligned as properly paired, which is good (i.e., ~80% of actual alignments are properly paired). You have a relatively high rate of pairs where only one read aligned (singletons). Of those, most didn't align with a mate aligning to a different chromosome...which is good unless you're expecting a lot of chromosomal rearrangements.

ADD COMMENTlink written 4.8 years ago by Devon Ryan90k

Thank you Sir for your kind reply

confusion:

46641928 + 0 with itself and mate mapped

168844 + 0 with mate mapped to a different chr

145423 + 0 with mate mapped to a different chr (mapQ>=5)

how to interpret these lines .

ADD REPLYlink written 4.8 years ago by Raghav100
1

The first number is the number of concordant and discordant mappings. The other two are looking at a subtype of discordant mappings. Normally we expect fragments that we sequence to contain sequence from a single chromosome. In some cases (e.g., in cancer), there may be chromosomal rearrangements, causing many reads to span multiple chromosomes. This can also happen due to artifacts introduced in library prep. and/or mapping. These last two numbers are giving you information on that. So you have 16884 reads where the mates mapped to different chromosomes. One possibility is that this is just due to one mate of the pair being poorly mapped, meaning that it would have a low MAPQ score. So the second one is giving you information about the number with a somewhat decent MAPQ score. In short, the 168844 number isn't due to poor mapping.

ADD REPLYlink written 4.8 years ago by Devon Ryan90k

A great answer definitely. I have a problem about the procedure after filter. I used the command samtools view -hf 0x2 some_file for the sam file generated by bwa-mem to get only properly paired alignments for further analysis. After using the samtools sort to sort the bam file, am error occurred when I used samtools view to check the sorted bam file.

[W::sam_read1] parse error at line 1 [main_samview] truncated file.

And this bam file cannot be used for index either. I guess it is probably because the header of filtered sam file is still same as the header of the original sam file? Looking forward to your reply. Much appreciate.

ADD REPLYlink modified 22 months ago • written 22 months ago by ghostforever.shi50
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1442 users visited in the last hour