12 months ago
harish
I am trying to find out whether whole-transcriptome sequencing also gives good coverage of miRNAs. When I load the BAM file for one of the samples and look at miRNA loci, I see reads in regions with no annotated gene (neither protein-coding nor RNA genes). How can RNA-seq cover these genomic regions? In the attached image, MIR4521 has some reads and the BORCS6 gene has none, but a small region between them is well covered. Is this a novel transcriptionally active region, or is it a sequencing artifact?
What kind of reads does your sequencing run produce: single-end or paired-end? If paired-end, you might want to restrict your BAM file to properly paired, mapped reads and see whether this signal disappears or decreases.
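For paired-end data, the filter comes down to testing bits of the SAM FLAG field. The sketch below is a minimal illustration on SAM text lines, not a replacement for a real BAM tool; the function names `is_properly_paired` and `keep_proper_pairs` are made up for this example. On the command line, `samtools view -b -f 2 in.bam > out.bam` should do the same job directly on a BAM file.

```python
# SAM FLAG bits, as defined in the SAM format specification
FLAG_PAIRED = 0x1        # read is one of a pair
FLAG_PROPER_PAIR = 0x2   # both mates aligned as the aligner expects
FLAG_UNMAPPED = 0x4      # the read itself is unmapped


def is_properly_paired(flag: int) -> bool:
    """True if the read is paired, mapped, and marked as a proper pair."""
    return (
        bool(flag & FLAG_PAIRED)
        and bool(flag & FLAG_PROPER_PAIR)
        and not (flag & FLAG_UNMAPPED)
    )


def keep_proper_pairs(sam_lines):
    """Yield header lines plus alignment lines whose FLAG marks a proper pair.

    Assumes tab-separated SAM text, where the FLAG is the second column.
    """
    for line in sam_lines:
        if line.startswith("@"):  # header lines pass through unchanged
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if is_properly_paired(int(fields[1])):
            yield line
```

With single-end reads none of these bits are set, so this filter would remove every read and is not informative.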
These are single-end sequencing runs. My guess is that these are coding regions that have not been assigned an HGNC symbol yet but may already have an Ensembl ID. Is my guess right?
The reads all seem to share a precise end point on the left side of that peak, which argues for a directed transcriptional stop rather than noise or gDNA contamination. It could be any unannotated transcriptional element. Also load the latest GENCODE GTF file; the GENCODE annotations are more comprehensive than RefSeq.
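Besides loading the GTF as an IGV track, you can ask it directly whether anything is annotated under the peak. Here is a rough sketch, assuming an uncompressed GTF with the usual nine tab-separated columns and `key "value";` attributes. The helper names and the coordinates in the usage example are placeholders, not real positions.

```python
def parse_gtf_line(line):
    """Parse one GTF line into a dict, or return None for comments/short lines."""
    if line.startswith("#"):
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 9:
        return None
    chrom, _source, feature, start, end, _score, strand, _frame, attrs = fields[:9]
    attributes = {}
    # Attributes look like: gene_id "X"; gene_name "Y";
    # Repeated keys (e.g. several "tag" entries) keep only the last value here.
    for item in attrs.split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attributes[key] = value.strip('"')
    return {
        "chrom": chrom,
        "feature": feature,
        "start": int(start),  # GTF coordinates are 1-based, inclusive
        "end": int(end),
        "strand": strand,
        "attributes": attributes,
    }


def overlapping_genes(gtf_lines, chrom, start, end):
    """Return gene records overlapping chrom:start-end (1-based, inclusive)."""
    hits = []
    for line in gtf_lines:
        rec = parse_gtf_line(line)
        if rec is None or rec["feature"] != "gene":
            continue
        if rec["chrom"] == chrom and rec["start"] <= end and rec["end"] >= start:
            hits.append(rec)
    return hits
```

Usage would look something like this, with your own file path and the peak coordinates read off IGV:

```python
with open("annotation.gtf") as handle:  # placeholder path
    for rec in overlapping_genes(handle, "chr17", 1000, 2000):  # placeholder region
        print(rec["attributes"].get("gene_id"), rec["attributes"].get("gene_name"))
```

If nothing comes back, the peak is unannotated in that release; if a gene record with only an Ensembl ID appears, that would support the guess above.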
I am very new to miRNA analysis. For now I am just looking at the BAM files in IGV to check whether the normalized counts for miRNA genes that I see at the end of the whole-transcriptome analysis actually reflect miRNA reads, or whether they are just counts from the host gene (which, because of its larger size, should have been sequenced).
Is this a good way to form an initial opinion, and are there other tools for a first look? Of course, if I find something reasonable, my collaborator will assist me with further analysis.
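One rough way to put a number on what you are eyeballing in IGV is to split the reads at a miRNA locus into those that lie entirely inside the annotated miRNA and those that extend past it. Reads that run past the annotated boundaries are more likely to come from a longer transcript such as the host gene. The sketch below assumes you have already extracted read start/end coordinates (for example from the BAM file) as 1-based inclusive pairs; `classify_reads` and its `slack` parameter are invented for this illustration.

```python
def classify_reads(reads, mir_start, mir_end, slack=0):
    """Count reads relative to an annotated miRNA interval.

    reads: iterable of (start, end) pairs, 1-based inclusive.
    slack: bases a read may extend past the annotation and still count
           as contained (hypothetical tolerance for imprecise boundaries).
    """
    counts = {"contained": 0, "spanning": 0, "outside": 0}
    for r_start, r_end in reads:
        if r_end < mir_start or r_start > mir_end:
            counts["outside"] += 1      # no overlap with the miRNA at all
        elif r_start >= mir_start - slack and r_end <= mir_end + slack:
            counts["contained"] += 1    # read fits inside the annotation
        else:
            counts["spanning"] += 1     # overlaps but runs past a boundary
    return counts
```

If most overlapping reads fall in the "spanning" group, the count reported for that miRNA gene probably reflects the surrounding transcript rather than the miRNA itself. This is only a first-pass heuristic, not a substitute for a dedicated small-RNA library and analysis.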