Question

Mapping short 26-32nt RPF reads to human genome



4.7 years ago

Adrian Pelin ★ 2.6k

I was wondering what concerns or considerations apply when mapping short reads to the human genome. I realize that early Illumina technologies produced reads as short as 36 bp, whereas standard Illumina sequencing now provides reads longer than 100 bp, often paired-end. My reads come from ribosome profiling, in which the mRNA regions protected by the ribosome remain undigested. These fragments can be sequenced, but they are very short: only 26-32 nt.
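
Because ribosome-protected fragments occupy such a narrow size window, it is common to keep only reads in that range after adapter trimming and before mapping. A minimal sketch of that length filter follows. It assumes an already adapter-trimmed, uncompressed FASTQ with exactly four lines per record. The function names (`read_fastq`, `filter_by_length`, `make_toy_record`) are hypothetical helpers, not part of any tool.

```python
import io
from typing import Iterable, Iterator, TextIO, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield records from a plain 4-line-per-record FASTQ stream (assumption: no wrapped lines)."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def filter_by_length(records: Iterable[Record], min_len: int = 26, max_len: int = 32) -> Iterator[Record]:
    """Keep only reads whose length lies in [min_len, max_len], inclusive."""
    for header, seq, qual in records:
        if min_len <= len(seq) <= max_len:
            yield header, seq, qual


def make_toy_record(name: str, length: int) -> str:
    """Hypothetical helper that builds one FASTQ record of the given length for testing."""
    return f"@{name}\n{'A' * length}\n+\n{'I' * length}\n"


if __name__ == "__main__":
    # Toy input: four reads of 25, 26, 32 and 33 nt; only the middle two fall in 26-32 nt.
    toy = "".join(make_toy_record(f"r{i}", n) for i, n in enumerate([25, 26, 32, 33], start=1))
    kept = [h for h, _, _ in filter_by_length(read_fastq(io.StringIO(toy)))]
    print(kept)  # ['@r2', '@r3']
```

The bounds default to the 26-32 nt range from the question and are inclusive. Widen them if the library's size distribution turns out broader.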

1) What is the best genome to map to? Currently, I map all my transcriptome experiments to the dna_sm (repetitive regions soft-masked) Ensembl version of the human genome. Does that increase the odds of erroneous mapping, or should I be mapping to dna_rm (repetitive regions masked with N)?
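
In the Ensembl FASTA files, dna_sm writes repeats and low-complexity regions in lowercase, while dna_rm replaces those same positions with N. The small sketch below shows that relationship on a toy sequence. It assumes that converting lowercase to N reproduces the hard-masked file, which should hold approximately but has not been checked base by base here. Aligners generally treat lowercase bases as ordinary bases, whereas N positions cannot support a match, so the two files can lead to different alignments (worth confirming for the specific aligner). The function names are hypothetical.

```python
def soft_to_hard_mask(seq: str) -> str:
    """Replace soft-masked (lowercase) bases with N, approximating a hard-masked sequence."""
    return "".join("N" if base.islower() else base for base in seq)


def soft_masked_fraction(seq: str) -> float:
    """Fraction of positions that are soft-masked (lowercase)."""
    if not seq:
        return 0.0
    return sum(base.islower() for base in seq) / len(seq)


if __name__ == "__main__":
    # Hypothetical toy sequence: 'acgt' stands in for a region flagged as repetitive.
    sm = "ACGTacgtTTGA"
    print(soft_to_hard_mask(sm))     # ACGTNNNNTTGA
    print(soft_masked_fraction(sm))  # 0.3333333333333333
```

Running `soft_masked_fraction` over each chromosome of dna_sm gives a quick sense of how much sequence a switch to dna_rm would make unmappable.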

2) Which aligner/counter should I use? Typically I use HISAT2 and StringTie to map reads and obtain TPM values. Would something like bowtie2 work better? Should I pass any additional parameters to HISAT2?

3) When calculating transcript levels, do tools like Cufflinks and StringTie normalize by the total number of reads mapped to the genome, or only by the reads falling within the regions of the supplied GTF file?
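
For reference, the sketch below implements the textbook TPM and FPKM formulas to show where a library-size term enters. It does not show how StringTie or Cufflinks actually choose that term; that is the open question in point 3. The function names and toy numbers are hypothetical. TPM's denominator sums only over the transcripts supplied, while FPKM divides by a separate total (`total_mapped`) whose definition varies between tools and options.

```python
from typing import Dict

Values = Dict[str, float]


def tpm(counts: Values, lengths_bp: Values) -> Values:
    """Textbook TPM: per-base rates rescaled so they sum to one million.

    Only the transcripts passed in contribute to the denominator, so reads
    mapped to the genome outside these transcripts do not change the result.
    """
    rates = {t: counts[t] / lengths_bp[t] for t in counts}
    total = sum(rates.values())
    return {t: rate / total * 1e6 for t, rate in rates.items()}


def fpkm(counts: Values, lengths_bp: Values, total_mapped: float) -> Values:
    """Textbook FPKM: fragments per kilobase of transcript per million mapped fragments.

    `total_mapped` is the library-size term; which fragments it counts
    (all genome-mapped vs. annotation-compatible only) depends on the tool.
    """
    return {t: counts[t] * 1e9 / (lengths_bp[t] * total_mapped) for t in counts}


if __name__ == "__main__":
    # Hypothetical toy data for two transcripts.
    counts = {"tA": 100, "tB": 300}
    lengths = {"tA": 1000, "tB": 3000}
    print(tpm(counts, lengths))              # {'tA': 500000.0, 'tB': 500000.0}
    print(fpkm(counts, lengths, 1_000_000))  # {'tA': 100.0, 'tB': 100.0}
    print(fpkm(counts, lengths, 2_000_000))  # {'tA': 50.0, 'tB': 50.0}
```

In this toy case, doubling `total_mapped` (for example, by counting reads that fall outside the annotation) halves every FPKM value but leaves TPM unchanged.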

Thanks

RNA-Seq Ribosomal profiling Mapping RPF
