Filter mapped and unmapped reads in a BAM file
8.1 years ago
mgadrianam

Hi everyone,

I have single-end libraries of microRNAs.

Raw data, FastQC Basic Statistics: Total Sequences 17830678
After cutadapt trimming of the 3′ adapter: Total Sequences 17683138

I used samtools to count the number of mapped reads with this command:

samtools view -F 0x4 AD_1.cutadapt.fastq.sam.bam.sorted.bam | cut -f1 | sort | uniq | wc -l

17334322
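The pipeline above keeps records whose FLAG does not have bit 0x4 (unmapped) set, takes the first column (QNAME), and counts distinct names (sort -u would do the same as sort | uniq here). Because it counts names rather than records, multi-mapped reads are counted once. A minimal Python sketch of the same logic, assuming plain SAM text lines as printed by samtools view; the function name and example records are made up for illustration:

```python
def count_mapped_read_names(sam_lines):
    """Count distinct QNAMEs among records whose FLAG lacks 0x4 (unmapped)."""
    names = set()
    for line in sam_lines:
        if not line or line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if (flag & 0x4) == 0:  # same test as samtools view -F 0x4
            names.add(qname)
    return len(names)


# Hypothetical records: read1 aligns twice, read2 once, read3 is unmapped.
example = [
    "@HD\tVN:1.6\tSO:coordinate",
    "read1\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "read1\t256\tchr2\t500\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "read2\t16\tchr1\t900\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
print(count_mapped_read_names(example))  # 2
```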

Now I want to keep the unmapped reads.

First I counted the unmapped reads:

samtools view -f 0x4 AD_1.cutadapt.fastq.sam.bam.sorted.bam | wc -l

0

I tried a different command:

samtools view -f 4 -c  AD_1.cutadapt.fastq.sam.bam.sorted.bam

0
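For reference, samtools view -f INT keeps only records that have all bits of INT set in FLAG, -F INT drops records that have any of those bits set, and -c prints the count instead of the records. So -f 4 counts records flagged as unmapped (0x4), and a result of 0 means the file contains no such records at all. A small Python sketch of that filter logic; the function name and FLAG values are illustrative, not taken from this file:

```python
def count_filtered(flags, require=0, exclude=0):
    """Mimic samtools view -f REQUIRE -F EXCLUDE -c on a list of FLAG values."""
    return sum(
        1
        for flag in flags
        if (flag & require) == require and (flag & exclude) == 0
    )


UNMAPPED = 0x4

# Hypothetical file where the aligner also wrote one unmapped record (FLAG 4).
with_unmapped = [0, 16, 256, 4]
# Hypothetical file where unmapped reads were never written.
without_unmapped = [0, 16, 256, 0]

print(count_filtered(with_unmapped, require=UNMAPPED))     # 1
print(count_filtered(with_unmapped, exclude=UNMAPPED))     # 3
print(count_filtered(without_unmapped, require=UNMAPPED))  # 0
```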

And counting all records:

samtools view -c AD_1.cutadapt.fastq.sam.bam.sorted.bam

35205186

I understand this includes all the reads and also the multi-mapped reads.

But I want to keep the unmapped reads, and the count is 0. Can someone help me with all this? Thank you very much for your help.

Adriana

rna-seq samtools • 2.6k views

What is the output of

samtools flagstat AD_1.cutadapt.fastq.sam.bam.sorted.bam

I ran the command:

35205186 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
35205186 + 0 mapped (100.00%:-nan%)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (-nan%:-nan%)
0 + 0 with itself and mate mapped
0 + 0 singletons (-nan%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

Does this mean that I only have the mapped sequences? Why does it show different numbers, and what is -nan%? FastQC Basic Statistics on the FASTQ after cutadapt shows Total Sequences 17683138. Could you please explain this discrepancy? When I ran

samtools view -F 0x4 AD_1.cutadapt.fastq.sam.bam.sorted.bam | cut -f1 | sort | uniq | wc -l

it gave me 17334322, which corresponds to 98.03%. What about the rest of the reads?

Thank you so much for your help,
Adriana
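The numbers in the post can be reconciled with a little arithmetic. This sketch assumes every read name in the trimmed FASTQ is unique and that unmapped reads are simply absent from the BAM; under those assumptions the "rest of the reads" are the names that never appear in it, and the record count implies about two alignments per mapped read name:

```python
trimmed_reads = 17683138  # FastQC Total Sequences after cutadapt
mapped_names = 17334322   # distinct QNAMEs from samtools view -F 0x4 | cut -f1 | sort | uniq
alignments = 35205186     # samtools view -c (all records)

print(f"{mapped_names / trimmed_reads:.2%}")  # 98.03%
print(trimmed_reads - mapped_names)           # 348816 reads not present in the BAM
print(f"{alignments / mapped_names:.2f}")     # 2.03 records per mapped read name
```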


See how you have 35 million alignments but only 17 million single-end reads? Unless the same read name is reused, many of your reads must have multiple alignments (roughly two records per mapped read name on average). nan means "not a number"; it appears because flagstat divided by zero: there are no QC-failed reads and no paired reads, so those percentages are 0/0.
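Each flagstat percentage is a count divided by a total, printed once for QC-passed and once for QC-failed reads. Floating-point 0.0/0.0 in C gives NaN; Python raises ZeroDivisionError instead, so this illustrative sketch (the function name is hypothetical) returns NaN explicitly to reproduce the "mapped" line above:

```python
import math


def flagstat_percent(count, total):
    """Percentage as flagstat prints it; a zero total gives NaN, like 0.0/0.0 in C."""
    return count / total * 100.0 if total else math.nan


passed = flagstat_percent(35205186, 35205186)  # QC-passed: mapped / total
failed = flagstat_percent(0, 0)                # QC-failed: no reads at all
print(f"mapped ({passed:.2f}%:{failed:.2f}%)")  # mapped (100.00%:nan%)
```

Python prints plain nan; the leading minus in flagstat's -nan is a detail of how the C library prints that particular NaN value.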

A multi-mapped read normally gives one primary alignment and one or more secondary alignments (flag 256, i.e. 0x100). If everything checks out, filtering with -F 256 should give you the same number as your count of unique read names.
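The check suggested above compares the number of records without the secondary bit (0x100) with the number of distinct read names. A Python sketch on hypothetical (QNAME, FLAG) pairs: when each read has one primary record the two counts match, while two different reads sharing one name are both primary and break the equality. Like -F 256, this ignores the supplementary bit (0x800):

```python
def primary_count(records):
    """Count records lacking 0x100 (secondary), like samtools view -F 256 -c."""
    return sum(1 for _, flag in records if (flag & 0x100) == 0)


def distinct_names(records):
    """Count distinct read names (QNAMEs)."""
    return len({qname for qname, _ in records})


# readA: one primary hit plus two secondary hits (256, and 272 = 256 + reverse strand).
well_formed = [("readA", 0), ("readA", 256), ("readA", 272), ("readB", 16)]
# Two different reads that happen to share the name "readC".
reused_name = [("readC", 0), ("readC", 16), ("readD", 0)]

print(primary_count(well_formed), distinct_names(well_formed))  # 2 2
print(primary_count(reused_name), distinct_names(reused_name))  # 3 2
```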


Thank you very much for your answer.

When I ran

samtools view AD_1.cutadapt.fastq.sam.bam.sorted.bam -F 256 -c

I got 35205186. That looks like the total number of alignments, not the number of unique reads.


In that case, it may just be that you have identically named reads that correspond to different sequences. That would explain why your count of unique read names is so much lower.

For example, if one were to merge paired-end files into one file and align them as single-end data, it would lead to a situation like the one you observe.
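One way to test this explanation is to group records by read name and check whether one name carries more than one sequence. SAM stores SEQ reverse-complemented for reverse-strand alignments (flag 0x10), so the sketch below undoes that before comparing, and it skips records whose SEQ is "*" (allowed, for example, on secondary alignments). The function names and records are hypothetical:

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def original_sequence(seq, flag):
    """Return the read as sequenced, undoing SAM's reverse complement for 0x10."""
    if flag & 0x10:
        return seq.translate(_COMPLEMENT)[::-1]
    return seq


def names_with_multiple_sequences(sam_lines):
    """List read names whose records carry more than one distinct sequence."""
    seqs_by_name = defaultdict(set)
    for line in sam_lines:
        if not line or line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag, seq = fields[0], int(fields[1]), fields[9]
        if seq == "*":  # sequence not stored for this record
            continue
        seqs_by_name[qname].add(original_sequence(seq, flag))
    return sorted(name for name, seqs in seqs_by_name.items() if len(seqs) > 1)


example = [
    # readA: one read, forward hit plus a reverse-strand secondary hit
    "readA\t0\tchr1\t100\t255\t6M\t*\t0\t0\tAACCGT\tIIIIII",
    "readA\t272\tchr2\t300\t255\t6M\t*\t0\t0\tACGGTT\tIIIIII",
    # readB: one name used for two different sequences
    "readB\t0\tchr1\t500\t255\t6M\t*\t0\t0\tTTTTGG\tIIIIII",
    "readB\t0\tchr3\t700\t255\t6M\t*\t0\t0\tGGGCCC\tIIIIII",
]
print(names_with_multiple_sequences(example))  # ['readB']
```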


It looks like your alignment file does not have unmapped reads.
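Since the BAM holds no unmapped records, one option is to recover them from the trimmed FASTQ: keep every FASTQ record whose name does not appear among the mapped alignments. A minimal sketch, assuming a plain 4-lines-per-record FASTQ and a set of mapped names such as the output of the cut -f1 | sort | uniq pipeline from the question; the function name and reads are hypothetical. A SAM QNAME cannot contain whitespace, so only the first token of each FASTQ header is compared. This only works if read names are unique, which the replies above suggest may not hold here:

```python
import io


def unmapped_fastq_records(fastq, mapped_names):
    """Yield FASTQ records (4 lines each) whose read name is not in mapped_names."""
    while True:
        header = fastq.readline()
        if not header:
            break
        seq, plus, qual = fastq.readline(), fastq.readline(), fastq.readline()
        name = header[1:].split()[0]  # drop '@' and anything after whitespace
        if name not in mapped_names:
            yield header + seq + plus + qual


fastq_text = (
    "@read1 extra\nACGTACGT\n+\nIIIIIIII\n"
    "@read2\nTTTTCCCC\n+\nIIIIIIII\n"
    "@read3\nGGGGAAAA\n+\nIIIIIIII\n"
)
mapped = {"read1", "read3"}  # hypothetical names seen in the BAM

unmapped = list(unmapped_fastq_records(io.StringIO(fastq_text), mapped))
print("".join(unmapped), end="")
```

Alternatively, many aligners can write unmapped reads themselves; which option applies depends on the aligner, which the post does not name.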

8.1 years ago

From the flagstat output, there are no unmapped reads in the BAM file. If it is paired-end data, make sure you have aligned it correctly, i.e. in paired-end mode.
