8.1 years ago
mgadrianam
Hi everyone,
I have single-end libraries; they are microRNA libraries.
Raw data according to FastQC (Basic Statistics): total sequences 17830678
After trimming the 3' adapter with cutadapt: total sequences 17683138
I used samtools to count the number of mapped reads with this command:
samtools view -F 0x4 AD_1.cutadapt.fastq.sam.bam.sorted.bam | cut -f1 | sort | uniq | wc -l
17334322
Now I am trying to keep the unmapped reads.
First I counted the unmapped reads with:
samtools view -f 0x4 AD_1.cutadapt.fastq.sam.bam.sorted.bam | wc -l
0
I tried a different command:
samtools view -f 4 -c AD_1.cutadapt.fastq.sam.bam.sorted.bam
0
and, just to count all records:
samtools view -c AD_1.cutadapt.fastq.sam.bam.sorted.bam
35205186
I understand that this count includes all reads, including multi-mapped reads,
but I want to keep the unmapped reads and it says 0. Can someone help with this? Thank you very much for your help.
Adriana
what is the output of
I ran the command:
Does it mean that I have only the mapped sequences? Why does it show me different numbers? What is -nan%? Why does FastQC, after cutadapt, show total sequences 17683138 in Basic Statistics? Could you please explain this discrepancy to me? When I ran
it gave me 17334322, which corresponds to 98.03%. What about the rest of the reads?
Thank you very much for your help. Adriana
See how you have 35 million alignments but only 17 million single end reads? Clearly some (most) of your reads have multiple alignments, provided the same read name is not reused!
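As a rough illustration of the difference between counting alignment records and counting unique read names, here is a minimal sketch on made-up data. The function `toy_sam` and its three-column records (read name, FLAG, reference) are hypothetical stand-ins for the output of `samtools view`, not the poster's file.

```shell
# Hypothetical toy records: read name, FLAG, reference name (tab separated).
# read1 and read3 each have a primary and a secondary (FLAG bit 256) alignment.
toy_sam() {
  printf 'read1\t0\tchr1\n'
  printf 'read1\t256\tchr2\n'
  printf 'read2\t16\tchr1\n'
  printf 'read3\t0\tchr3\n'
  printf 'read3\t272\tchr5\n'
}

# Every line is one alignment record, so this counts alignments.
total_alignments=$(toy_sam | wc -l | tr -d ' ')

# Collapsing on column 1 counts distinct read names instead.
unique_reads=$(toy_sam | cut -f1 | sort -u | wc -l | tr -d ' ')

echo "alignments: $total_alignments"
echo "unique read names: $unique_reads"
```

On this toy input the alignment count is larger than the unique-name count, which is the same pattern as 35205186 records versus 17334322 names.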
nan means "not a number" and is there because they divided by zero. The same read gives a primary and one or more secondary alignments. If all checks out, filtering with the flag -F 256 should give you the same number as the unique read names that you have.
Thank you very much for your answer.
When I ran samtools view AD_1.cutadapt.fastq.sam.bam.sorted.bam -F 256 -c I got 35205186; it looks like the total number of records, not the number of unique reads.
In this case it may just be that you have identically named reads that correspond to different sequences. That's why the count for your unique read names is so much lower.
For example, if one were to merge paired-end files into one file and align them as single-end data, it would lead to a situation like the one you observe.
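One way to check for reused read names is to look at the FASTQ headers directly. The sketch below uses a made-up three-record FASTQ written to a temporary file (the names `readA` and `readB` are hypothetical) and assumes the standard four-lines-per-record layout.

```shell
# Hypothetical toy FASTQ: "readA" appears twice with different sequences,
# as could happen if two mate files were concatenated into one.
toy_fastq=$(mktemp)
printf '@readA\nACGT\n+\nIIII\n@readB\nTTGA\n+\nIIII\n@readA\nGGCC\n+\nIIII\n' > "$toy_fastq"

# Header lines are lines 1, 5, 9, ... in a four-line-per-record FASTQ.
total_records=$(awk 'NR % 4 == 1' "$toy_fastq" | wc -l | tr -d ' ')

# uniq -d prints each name that occurs more than once (input must be sorted).
reused_names=$(awk 'NR % 4 == 1 { print $1 }' "$toy_fastq" | sort | uniq -d | wc -l | tr -d ' ')

echo "records: $total_records"
echo "names used more than once: $reused_names"
rm -f "$toy_fastq"
```

If the same check on the trimmed FASTQ reports any reused names, the unique-name count from the BAM will underestimate the number of reads.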
It looks like your alignment file does not have unmapped reads.
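To make the flag filters used in this thread concrete, here is a small sketch in plain shell arithmetic that mimics how `-f` (keep records with all the given bits set) and `-F` (drop records with any of the given bits set) act on the FLAG column. The helper `count_matching` and the flag values are hypothetical and only for illustration; they are not part of samtools.

```shell
# count_matching <keep|drop> <mask> <flag>...
#   keep: count flags with all mask bits set   (like samtools view -f <mask>)
#   drop: count flags with no mask bits set    (like samtools view -F <mask>)
# Defined with ( ) so its variables stay inside a subshell.
count_matching() (
  mode=$1
  mask=$2
  shift 2
  n=0
  for flag in "$@"; do
    bits=$(( flag & mask ))
    if [ "$mode" = keep ]; then
      if [ "$bits" -eq "$mask" ]; then n=$(( n + 1 )); fi
    else
      if [ "$bits" -eq 0 ]; then n=$(( n + 1 )); fi
    fi
  done
  echo "$n"
)

# Made-up FLAG values: none of them has bit 4 (read unmapped) set.
flags="0 256 16 0 272"

unmapped=$(count_matching keep 4 $flags)    # like -f 4
mapped=$(count_matching drop 4 $flags)      # like -F 4
primary=$(count_matching drop 256 $flags)   # like -F 256

echo "unmapped: $unmapped, mapped: $mapped, not secondary: $primary"
```

When no record carries bit 4, the `-f 4` count is 0 no matter how many reads failed to align, which fits an aligner or conversion step that did not write unaligned reads to the file.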