Question

filter mapped and unmapped bam file

8.1 years ago

mgadrianam ▴ 30

Hi everyone,

I have a single-end library of microRNAs.

Raw data according to FastQC (Basic Statistics): total sequences 17830678
After trimming the 3' adapter with cutadapt: total sequences 17683138

I used samtools to count the number of mapped reads, with this command:

samtools view -F 0x4 AD_1.cutadapt.fastq.sam.bam.sorted.bam | cut -f1 | sort | uniq | wc -l

17334322

Now I am trying to keep the unmapped reads.

First I counted the unmapped reads with:

samtools view -f 0x4 AD_1.cutadapt.fastq.sam.bam.sorted.bam | wc -l

0

I tried a different command:

samtools view -f 4 -c  AD_1.cutadapt.fastq.sam.bam.sorted.bam

0

and, just to count all records:

samtools view -c AD_1.cutadapt.fastq.sam.bam.sorted.bam

35205186

I understand that this count includes all reads and also multireads,

but I want to keep the unmapped reads and it says 0. Can someone help with all this? Thank you very much for your help.

Adriana

rna-seq samtools • 2.6k views

ADD COMMENT • link updated 8.1 years ago by GouthamAtla 12k • written 8.1 years ago by mgadrianam ▴ 30

what is the output of

samtools flagstat AD_1.cutadapt.fastq.sam.bam.sorted.bam

ADD REPLY • link 8.1 years ago by GouthamAtla 12k

I ran the command and got:

35205186 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
35205186 + 0 mapped (100.00%:-nan%)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (-nan%:-nan%)
0 + 0 with itself and mate mapped
0 + 0 singletons (-nan%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

Does this mean that I only have mapped sequences? Why does it show different numbers, and what is -nan%? FastQC after cutadapt reports total sequences 17683138 under Basic Statistics. Could you please explain this discrepancy? When I ran

samtools view -F 0x4 AD_1.cutadapt.fastq.sam.bam.sorted.bam | cut -f1 | sort | uniq | wc -l

it gave me 17334322, which corresponds to 98.03% of the trimmed reads. What about the rest of the reads?

Thank you so much for your help. Adriana

ADD REPLY • link updated 8.1 years ago by Istvan Albert 100k • written 8.1 years ago by mgadrianam ▴ 30

See how you have 35 million alignments but only 17 million single-end reads? Clearly some (most) of your reads have multiple alignments, provided the same read name is not reused! nan means "not a number"; it shows up because flagstat divided by zero (for example, there are zero paired reads to take a percentage of).
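The numbers quoted in this thread can be checked with a few lines of arithmetic. This is only a sketch: the `percent` helper is a made-up name for illustration, and it returns NaN on a zero denominator to mimic the kind of division that produces the -nan% entries, not samtools' actual code.

```python
import math

# Counts quoted earlier in this thread.
total_alignments = 35205186      # samtools view -c
unique_read_names = 17334322     # cut -f1 | sort | uniq | wc -l on mapped records
reads_after_trimming = 17683138  # FastQC total sequences after cutadapt


def percent(numerator, denominator):
    """Return a percentage, or NaN when the denominator is zero (hypothetical helper)."""
    if denominator == 0:
        return math.nan
    return 100.0 * numerator / denominator


# Average number of alignment records per distinct read name.
alignments_per_name = total_alignments / unique_read_names

# Share of trimmed reads whose name appears among the mapped records.
aligned_fraction = percent(unique_read_names, reads_after_trimming)

# Zero properly paired reads out of zero paired reads: not a number.
paired_fraction = percent(0, 0)

print(f"alignments per unique read name: {alignments_per_name:.2f}")
print(f"unique mapped names vs trimmed reads: {aligned_fraction:.2f}%")
print(f"properly paired: {paired_fraction}%")
```

Roughly two alignment records per read name is what makes the total so much larger than the FastQC count.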

The same read gives a primary and one or more secondary alignments. If everything checks out, filtering with -F 256 (which excludes secondary alignments) should give you the same number as your count of unique read names.
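To make the flag logic concrete, here is a minimal sketch of how the FLAG column separates primary, secondary and unmapped records. The bit values come from the SAM specification; the four records and the `classify` function are invented for illustration and are not from the poster's file.

```python
from collections import Counter

# FLAG bits defined in the SAM specification.
UNMAPPED = 0x4         # tested by samtools view -f 4 / -F 4
SECONDARY = 0x100      # 256, excluded by samtools view -F 256
SUPPLEMENTARY = 0x800  # 2048

# Hypothetical SAM records: only QNAME (column 1) and FLAG (column 2) matter here.
sam_lines = [
    "read1\t0\tchr1\t100\t255\t22M\t*\t0\t0\t*\t*",    # primary, forward strand
    "read1\t256\tchr2\t500\t255\t22M\t*\t0\t0\t*\t*",  # secondary hit of read1
    "read2\t16\tchr1\t900\t255\t21M\t*\t0\t0\t*\t*",   # primary, reverse strand
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",             # unmapped
]


def classify(flag):
    """Label one record by its FLAG bits (hypothetical helper)."""
    if flag & UNMAPPED:
        return "unmapped"
    if flag & SECONDARY:
        return "secondary"
    if flag & SUPPLEMENTARY:
        return "supplementary"
    return "primary"


counts = Counter()
mapped_names = set()
for line in sam_lines:
    fields = line.split("\t")
    name, flag = fields[0], int(fields[1])
    kind = classify(flag)
    counts[kind] += 1
    if kind != "unmapped":
        mapped_names.add(name)

print(dict(counts))
print("unique mapped read names:", len(mapped_names))
```

In this toy input the number of primary records equals the number of unique mapped read names, which is the check suggested above. If the two differ on real data, read names are probably being reused.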

ADD REPLY • link 8.1 years ago by Istvan Albert 100k

Thank you very much for your answer.

When I ran

samtools view AD_1.cutadapt.fastq.sam.bam.sorted.bam -F 256 -c

I got 35205186. It looks like the total number of reads, not the number of unique reads.

ADD REPLY • link 8.1 years ago by mgadrianam ▴ 30

In this case it may just be that you have identically named reads that correspond to different sequences. That's why the count of unique read names is so much lower.

For example, if one were to merge paired-end files into one file and align them as single-end data, it would lead to a situation like the one you observe.
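A toy illustration of that scenario, assuming both mates of a pair carry the same read name (the names below are invented):

```python
from collections import Counter

# Hypothetical read names from an R1 file and an R2 file of the same run.
r1_names = ["frag1", "frag2", "frag3"]
r2_names = ["frag1", "frag2", "frag3"]

# Concatenating both files and treating the result as single-end input.
merged = r1_names + r2_names

name_counts = Counter(merged)
records = len(merged)              # what a plain record count would report
unique_names = len(name_counts)    # what cut -f1 | sort | uniq | wc -l would report
reused = sorted(name for name, n in name_counts.items() if n > 1)

print("records:", records)
print("unique names:", unique_names)
print("names used more than once:", reused)
```

Here the record count is twice the unique-name count even though every record is a primary alignment candidate, so -F 256 would not bring the two numbers together.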

ADD REPLY • link 8.1 years ago by Istvan Albert 100k

It looks like your alignment file does not have unmapped reads.

ADD REPLY • link 8.1 years ago by Istvan Albert 100k

score 0 · Answer 1 · 2016-03-15

8.1 years ago

GouthamAtla 12k

From the flagstat output, there are no unmapped reads in the BAM file. If it is paired-end data, make sure you have aligned it correctly, i.e. in paired-end mode.

ADD COMMENT • link 8.1 years ago by GouthamAtla 12k