Variation In Unique Read Statistics

10.5 years ago

alok.helix ▴ 120

Hello, and thank you for reading this post!

I am trying to compare the statistics for mapping reads to the reference genome. In my TopHat-based alignment, I counted uniquely mapped reads with grep -w "NH:i:1" and with grep -w "HI:i:1", and both gave me the same number. I also used a ready-made Python script (BAMstat.py) from the RNAseqQC toolkit to count unique reads, and it agreed with my command-line count as well.
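As a rough sketch of what the two grep commands measure, the Python below counts SAM records by the value of one optional tag. It assumes plain SAM text as input (a BAM file would first be converted with `samtools view`) and uses hypothetical toy records. Because grep matches lines, a uniquely mapped pair contributes two alignment lines, and whether NH:i:1 and HI:i:1 agree depends on how the aligner numbers its hits (the toy data assumes hit indices start at 1).

```python
def parse_tags(fields):
    """Map the optional SAM fields (column 12 onward, TAG:TYPE:VALUE) to {TAG: VALUE}."""
    tags = {}
    for field in fields[11:]:
        tag, _type, value = field.split(":", 2)
        tags[tag] = value
    return tags


def count_tag_value(sam_lines, tag, value):
    """Return (alignment lines, distinct QNAMEs) whose optional `tag` equals `value`.

    Header lines starting with '@' are skipped. Like grep, the first number
    counts lines; in paired-end data both mates share one QNAME, so the second
    number counts fragments rather than individual reads.
    """
    n_lines = 0
    names = set()
    for line in sam_lines:
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if parse_tags(fields).get(tag) == value:
            n_lines += 1
            names.add(fields[0])
    return n_lines, len(names)


# Hypothetical toy records: r1 is a uniquely mapped pair, r2 a single-end read
# with two hits (hit indices assumed to start at 1 here).
toy_sam = [
    "@HD\tVN:1.0",
    "r1\t99\tchr1\t100\t50\t10M\t=\t200\t110\tACGTACGTAC\tIIIIIIIIII\tNH:i:1\tHI:i:1",
    "r1\t147\tchr1\t200\t50\t10M\t=\t100\t-110\tACGTACGTAC\tIIIIIIIIII\tNH:i:1\tHI:i:1",
    "r2\t0\tchr1\t300\t3\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tNH:i:2\tHI:i:1",
    "r2\t256\tchr2\t300\t3\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tNH:i:2\tHI:i:2",
]

print(count_tag_value(toy_sam, "NH", "1"))  # (2, 1)
print(count_tag_value(toy_sam, "HI", "1"))  # (3, 2)
```

On real data the lines could be streamed from `samtools view accepted_hits.bam` (TopHat's default output name). Reporting distinct QNAMEs alongside line counts makes it explicit whether a figure refers to alignments or to fragments.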

I then aligned the paired-end Illumina data with BWA 0.6.2-r126 (aln followed by sampe) and generated SAM and BAM files. To count unique reads, I used grep -w "XT:A:U" and grep -w "X1:i:0". The two counts differed by about 5 million reads, with "X1:i:0" giving the higher number.
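A minimal sketch of why the two BWA filters need not agree, assuming the tag meanings of BWA's aln/sampe output: XT is the mapping type (U for unique, R for repeat), X0 the number of best hits and X1 the number of suboptimal hits. The reads are hypothetical. A read with several equally good hits is tagged XT:A:R yet can still carry X1:i:0, while a uniquely placed read that also has suboptimal hits is XT:A:U with X1 above zero.

```python
def optional_tags(sam_line):
    """Parse the optional TAG:TYPE:VALUE fields (column 12 onward) of one SAM line."""
    tags = {}
    for field in sam_line.rstrip("\n").split("\t")[11:]:
        tag, _type, value = field.split(":", 2)
        tags[tag] = value
    return tags


def count_bwa_filters(sam_lines):
    """Count alignment lines matching each of the two grep patterns.

    Assumes BWA aln/sampe tag meanings: XT = mapping type (U unique,
    R repeat), X0 = number of best hits, X1 = number of suboptimal hits.
    """
    counts = {"XT:A:U": 0, "X1:i:0": 0}
    for line in sam_lines:
        if line.startswith("@"):
            continue
        tags = optional_tags(line)
        if tags.get("XT") == "U":
            counts["XT:A:U"] += 1
        if tags.get("X1") == "0":
            counts["X1:i:0"] += 1
    return counts


def toy_line(name, tags):
    """Build a hypothetical single-end SAM line with placeholder mandatory fields."""
    fixed = [name, "0", "chr1", "100", "0", "10M", "*", "0", "0", "ACGTACGTAC", "IIIIIIIIII"]
    return "\t".join(fixed + tags)


toy_bwa_sam = [
    toy_line("a", ["XT:A:U", "X0:i:1", "X1:i:0"]),  # one best hit, no suboptimal hits
    toy_line("b", ["XT:A:U", "X0:i:1", "X1:i:2"]),  # unique best hit, but suboptimal hits exist
    toy_line("c", ["XT:A:R", "X0:i:3", "X1:i:0"]),  # three equally good hits
    toy_line("d", ["XT:A:R", "X0:i:2", "X1:i:0"]),  # two equally good hits
]

print(count_bwa_filters(toy_bwa_sam))  # {'XT:A:U': 2, 'X1:i:0': 3}
```

With these toy records the X1:i:0 filter counts more lines than XT:A:U because it also admits repeat reads. This illustrates one possible contributor to the gap; it is not a measurement of the real data.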

When I ran the BAMstat.py script on the BWA output, I got significantly fewer uniquely aligned reads than the TopHat alignment reported. Why is there so much variation in these read statistics?
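Scripts that report "uniquely mapped" reads often decide uniqueness from the MAPQ column rather than from aligner-specific tags (RSeQC's bam_stat.py, for instance, has a minimum mapping-quality option for this), and MAPQ scales are not directly comparable across aligners. Assuming BAMstat.py behaves this way, which is not confirmed here, the sketch below counts primary mapped records at or above a cutoff. The cutoff of 30 and all MAPQ values are hypothetical.

```python
def sam_line(name, flag, mapq):
    """Build a hypothetical SAM line; only QNAME, FLAG and MAPQ matter here."""
    return "\t".join([name, str(flag), "chr1", "100", str(mapq), "10M",
                      "*", "0", "0", "ACGTACGTAC", "IIIIIIIIII"])


def mapq_unique_count(sam_lines, min_mapq=30):
    """Count primary, mapped alignment lines whose MAPQ is at least min_mapq.

    A hypothetical MAPQ-cutoff definition of 'unique'; the default of 30 is an
    assumption, not a value read from BAMstat.py.
    """
    count = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag, mapq = int(fields[1]), int(fields[4])
        if flag & 0x4 or flag & 0x100:  # skip unmapped (0x4) and secondary (0x100) records
            continue
        if mapq >= min_mapq:
            count += 1
    return count


# Hypothetical MAPQ values for the same three reads from two aligners whose
# MAPQ scales differ; these numbers are illustrative, not measured.
aligner_a = [sam_line("r1", 0, 50), sam_line("r2", 0, 50), sam_line("r3", 0, 3)]
aligner_b = [sam_line("r1", 0, 37), sam_line("r2", 0, 25), sam_line("r3", 0, 0)]

print(mapq_unique_count(aligner_a))               # 2
print(mapq_unique_count(aligner_b))               # 1
print(mapq_unique_count(aligner_b, min_mapq=20))  # 2
```

The same reads can therefore be "unique" under one aligner's MAPQ scale and not under another's, so a fixed cutoff is worth checking against each aligner's documented MAPQ values before comparing counts.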