Hello, all,
Lately I have read a few papers that use intronic reads from RNA-seq data as a proxy for transcription rate, which is usually measured with nascent RNA-seq technologies. It looks like a very good idea, since standard RNA-seq is much easier to do than nascent RNA-seq. In the paper cited below, the authors found a saw-tooth pattern across transcripts (Fig. 2 of the paper). When I downloaded their raw FASTQ datasets and aligned them to hg38 using Subread, I didn't find that pattern. Intronic reads are more or less evenly distributed across introns, and I definitely don't see more reads on the 5' side than on the 3' side. In the paper, the reads were mapped with the AB SOLiD software tool, which is no longer available.
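To make "more reads on the 5' side than the 3' side" something that can be measured rather than judged by eye, here is a minimal sketch of one possible check. It assumes the read start positions for a single intron have already been extracted from the BAM file (for example from `samtools view` output) and that the intron coordinates are half-open. The function names `binned_counts` and `slope` and the choice of 10 bins are hypothetical, not anything taken from the paper. A clearly negative slope would be consistent with a 5'-to-3' decline; a slope near zero would match the flat distribution described above.

```python
from typing import List, Sequence


def binned_counts(read_starts: Sequence[int], intron_start: int, intron_end: int,
                  strand: str = "+", n_bins: int = 10) -> List[int]:
    """Count read starts per bin, with bin 0 at the 5' end of the intron.

    Coordinates are assumed half-open: [intron_start, intron_end).
    """
    length = intron_end - intron_start
    counts = [0] * n_bins
    for pos in read_starts:
        # Skip reads that start outside the intron.
        if not intron_start <= pos < intron_end:
            continue
        offset = pos - intron_start
        # On the minus strand the 5' end is at the high genomic coordinate.
        if strand == "-":
            offset = length - 1 - offset
        counts[offset * n_bins // length] += 1
    return counts


def slope(counts: Sequence[int]) -> float:
    """Least-squares slope of read count versus bin index."""
    n = len(counts)
    mean_x = (n - 1) / 2
    mean_y = sum(counts) / n
    num = sum((i - mean_x) * (c - mean_y) for i, c in enumerate(counts))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


if __name__ == "__main__":
    # Toy positions only, not real data from ERR042386.
    toy = binned_counts([0, 1, 2, 3, 30, 31, 60], 0, 100, "+", n_bins=4)
    print(toy, slope(toy))
```

Raw counts depend on sequencing depth and intron length, so the slope is only comparable between introns after normalising, for instance by the mean count per bin.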
My question is: when mapping intronic reads to a reference genome, is there anything special we need to pay attention to?
This is how I did the alignment:
subread-buildindex -c -M 8000MB -o hg38_color_index Homo_sapiens.GRCh38.dna.primary_assembly.fa
subread-align -T 10 -b -t 0 -i ~/human_genome_hg38/hg38_color_index -r ERR042386.fastq.gz -o ERR042386.bam
Nat Struct Mol Biol. 2011 Nov 6;18(12):1435-40. doi: 10.1038/nsmb.2143.
Total RNA sequencing reveals nascent transcription and widespread co-transcriptional splicing in the human brain.
Thanks
If you are trying to reproduce the analysis, first try to use the same tools as shown in the methods section.