Question: Filter mapped and unmapped reads from a BAM file
Asked 3.1 years ago by mgadrianam (Colombia):

Hi everyone,

I have single-end libraries of microRNAs.

Raw data, FastQC Basic Statistics: total sequences 17830678
After trimming the 3' adapter with cutadapt: total sequences 17683138

To count the number of mapped reads with samtools, I used this command:

samtools view -F 0x4 AD_1.cutadapt.fastq.sam.bam.sorted.bam | cut -f1 | sort | uniq | wc -l

17334322

Now I am trying to keep the unmapped reads.

First I counted the unmapped reads with:

samtools view -f 0x4 AD_1.cutadapt.fastq.sam.bam.sorted.bam | wc -l

0

I tried a different command:

samtools view -f 4 -c  AD_1.cutadapt.fastq.sam.bam.sorted.bam

0

and, just to count all records:

samtools view -c AD_1.cutadapt.fastq.sam.bam.sorted.bam

35205186

I understand that this count includes all reads, including multi-mapped reads.

However, I want to keep the unmapped reads, and the count is 0. Can someone help with this? Thank you very much for your help.

Adriana

Tags: rna-seq, samtools

What is the output of

samtools flagstat AD_1.cutadapt.fastq.sam.bam.sorted.bam
Comment written 3.1 years ago by geek_y

I ran the command; this is the output:

35205186 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
35205186 + 0 mapped (100.00%:-nan%)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (-nan%:-nan%)
0 + 0 with itself and mate mapped
0 + 0 singletons (-nan%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

Does this mean that I have only mapped sequences? Why does it show different numbers, and what is -nan%? After cutadapt, FastQC reports 17683138 total sequences in its Basic Statistics. Could you please explain this discrepancy? When I ran

samtools view -F 0x4 AD_1.cutadapt.fastq.sam.bam.sorted.bam | cut -f1 | sort | uniq | wc -l

it gave me 17334322, which corresponds to 98.03% of the trimmed reads. What happened to the rest of the reads?

Thank you very much for your help. Adriana

Comment written 3.1 years ago by mgadrianam (modified 3.1 years ago by Istvan Albert)

See how you have 35 million alignments but only 17 million single-end reads? Clearly some (most) of your reads have multiple alignments, provided the same read name is not reused. "nan" means "not a number"; it appears because flagstat divided by zero when computing those percentages.
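The numbers in this thread can be checked with a few lines of arithmetic. The sketch below is only an illustration: the `percent` helper is a hypothetical stand-in that mimics how a percentage turns into NaN when the denominator is zero, and is not taken from the samtools source.

```python
import math

total_alignments = 35205186  # reported by: samtools view -c
unique_names = 17334322      # unique read names among mapped records
trimmed_reads = 17683138     # FastQC total after cutadapt


def percent(part, whole):
    """Percentage of part in whole; NaN when whole is zero (0 of 0)."""
    return math.nan if whole == 0 else 100.0 * part / whole


# Average number of alignment records per unique read name.
print(round(total_alignments / unique_names, 2))
# Share of the trimmed reads whose name appears in the BAM file.
print(round(percent(unique_names, trimmed_reads), 2))
# A category with zero members out of zero, e.g. the paired-end lines here.
print(percent(0, 0))
```

Roughly two records per read name is what prompts the multiple-alignment explanation, and the 0-of-0 case is where the "-nan%" entries come from.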

A multi-mapped read yields one primary alignment and one or more secondary alignments. If everything checks out, filtering with -F 256 (which excludes secondary alignments) should give you the same number as the count of unique read names.
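To make the flag logic concrete, here is a minimal Python sketch that decodes the FLAG column of a few made-up SAM records. The bit values 0x4 (unmapped), 0x100 (secondary) and 0x800 (supplementary) are from the SAM specification; the `classify` and `summarize` helpers and the toy records are hypothetical and only for illustration.

```python
from collections import Counter

UNMAPPED = 0x4         # 4: read has no alignment
SECONDARY = 0x100      # 256: secondary alignment of a multi-mapped read
SUPPLEMENTARY = 0x800  # 2048: part of a split (chimeric) alignment


def classify(flag):
    """Return a coarse label for one SAM FLAG value."""
    if flag & UNMAPPED:
        return "unmapped"
    if flag & SECONDARY:
        return "secondary"
    if flag & SUPPLEMENTARY:
        return "supplementary"
    return "primary"


def summarize(sam_lines):
    """Tally record classes and count unique read names in SAM text lines."""
    classes = Counter()
    names = set()
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        names.add(fields[0])                    # column 1: QNAME
        classes[classify(int(fields[1]))] += 1  # column 2: FLAG
    return classes, len(names)


# Toy records, truncated to the first four SAM columns.
toy = [
    "@HD\tVN:1.0",
    "r1\t0\tchr1\t100",
    "r1\t256\tchr2\t500",
    "r2\t16\tchr1\t200",
    "r3\t4\t*\t0",
]
classes, n_names = summarize(toy)
print(dict(classes), n_names)
```

On real data the lines would come from `samtools view -h` on the BAM file. Note that `-F 256` only drops records with the secondary bit set, so its count corresponds to primary + supplementary + unmapped in this tally.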

Comment written 3.1 years ago by Istvan Albert

Thank you very much for your answer.

When I ran

samtools view AD_1.cutadapt.fastq.sam.bam.sorted.bam -F 256 -c

35205186

it looks like the total number of reads, not the number of unique reads.

Comment written 3.1 years ago by mgadrianam

In this case it may just be that you have identically named reads that correspond to different sequences. That's why the count of your unique read names is so much lower.

For example, if one were to merge paired-end files into one file and align them as single-end data, it would lead to a situation like the one you observe.
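One way to test this hypothesis is to look for read names that occur on more than one non-secondary record, which should not happen for single-end data with unique names. The following is a minimal sketch on toy SAM lines; `reused_names` is a hypothetical helper, and the records are invented for illustration.

```python
from collections import Counter


def reused_names(sam_lines):
    """Return read names seen on more than one non-secondary record."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        qname, flag = line.split("\t")[:2]
        if int(flag) & (0x100 | 0x800):  # skip secondary and supplementary
            continue
        counts[qname] += 1
    return {name: n for name, n in counts.items() if n > 1}


toy = [
    "readA\t0\tchr1\t100",   # e.g. originally mate 1
    "readA\t16\tchr1\t180",  # e.g. originally mate 2, same name after merging
    "readB\t0\tchr2\t50",
    "readB\t256\tchr3\t70",  # genuine secondary alignment, ignored
]
print(reused_names(toy))
```

A non-empty result on the real file would support the idea that the same name is shared by different sequences.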

ADD REPLYlink written 3.1 years ago by Istvan Albert ♦♦ 80k

It looks like your alignment file does not have unmapped reads.
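If the alignment was produced in a way that leaves unmapped reads out of the file, they can still be recovered by comparing the read names in the trimmed FASTQ with those in the alignment. Below is a minimal sketch of that set difference. It assumes a well-formed FASTQ with four lines per record and unique read names (which, given the discussion above, may not hold for this data set); `fastq_names` and `aligned_names` are hypothetical helpers and the data are invented.

```python
def fastq_names(fastq_lines):
    """Yield read names from 4-line FASTQ records (text after '@', up to the first space)."""
    for i, line in enumerate(fastq_lines):
        if i % 4 == 0:  # header line of each record
            yield line[1:].split()[0]


def aligned_names(sam_lines):
    """Return the set of read names that have at least one mapped record."""
    names = set()
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        qname, flag = line.split("\t")[:2]
        if not int(flag) & 0x4:  # 0x4 set means unmapped
            names.add(qname)
    return names


fastq = [
    "@r1 extra", "ACGT", "+", "IIII",
    "@r2", "TTGA", "+", "IIII",
    "@r3", "GGCC", "+", "IIII",
]
sam = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t10",
    "r3\t16\tchr1\t40",
    "r3\t256\tchr1\t90",
]
mapped = aligned_names(sam)
missing = [name for name in fastq_names(fastq) if name not in mapped]
print(missing)
```

The names in `missing` are the reads that never made it into the alignment as mapped records; they could then be pulled out of the FASTQ file by name.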

Comment written 3.1 years ago by Istvan Albert

Answer written 3.1 years ago by geek_y (Barcelona/CRG/London/Imperial):

From the flagstat output, there are no unmapped reads in the BAM file. If it is paired-end data, make sure you have aligned it correctly, i.e. in paired-end mode.
