Hello,
As always, I keep running into issue after issue as I try to develop a pipeline for small RNA analysis using the Qiagen miRNA library kit. I aligned against my genome and got some successful results, but when it came to annotating with the GFF from miRBase, no features popped up. The reference genome I used is a prebuilt bowtie index of the NCBI reference, since it was available and I don't have the computing power to build an index on my computer. I am now stuck and really don't know how to move forward. I'm trying another build for alignment; maybe it's a coordinate issue or something, I'm not sure. Any help would be greatly appreciated. I'm so close to getting results, I know it, but I grow disheartened, as a significant amount of time was spent building this and frankly I don't want to just give up.
You can't mix and match genome sequence and annotations. If you used a prebuilt bowtie index from NCBI then you will need to find an annotation file that has miRNA/small RNA annotations in it.
The human annotation at https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.40_GRCh38.p14/GCF_000001405.40_GRCh38.p14_genomic.gtf.gz has miRNA in it. More than likely the chromosome names in your version of the pre-built indexes will not match those in this file. If that happens, create your own indexes using the genome sequence that corresponds to this annotation and repeat the alignment. The sequence is here: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.40_GRCh38.p14/GCF_000001405.40_GRCh38.p14_genomic.fna.gz
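A quick way to test for the naming mismatch described above is to compare the sequence names in the annotation with the names in the reference. The sketch below is only an illustration: the function names and the single GFF record are made up, and in practice the reference names would come from your BAM header or index rather than a hard-coded list.

```python
def gff_seqids(gff_lines):
    """Collect the sequence names (column 1) used by a GFF/GTF annotation."""
    names = set()
    for line in gff_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        names.add(line.split("\t")[0])
    return names


def naming_report(reference_names, annotation_names):
    """Show which sequence names are shared and which exist on one side only."""
    reference_names = set(reference_names)
    annotation_names = set(annotation_names)
    return {
        "shared": sorted(reference_names & annotation_names),
        "only_in_reference": sorted(reference_names - annotation_names),
        "only_in_annotation": sorted(annotation_names - reference_names),
    }


# Hypothetical input: a RefSeq-style name in the index, a UCSC-style name in the GFF.
example_gff = [
    "##gff-version 3",
    "chr1\t.\tmiRNA\t100\t121\t.\t+\t.\tID=example;Name=example-mir",
]
report = naming_report(["NC_000001.11"], gff_seqids(example_gff))
print(report)
```

An empty "shared" list means no read can ever be assigned to a feature, whatever the counting options are.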
As always Geno, thank you so much for helping. I thought the same thing, but I looked online and, according to everything I found, the GFF from miRBase should have been compatible. I'll see what I can do; like I said, I'm very limited by my laptop's available resources. Another reason I used the miRBase GFF with the prebuilt genome is that I saw other pipelines use miRBase annotations. Hopefully my project doesn't die here.
I looked at the GFF file from miRBase and that indeed is the case. So in theory it should work.
You will need to explain what the issue is. Are you using featureCounts?
Hello Geno,
You don't understand how much I appreciate you. Yes, I used featureCounts. My trimming is correct, my leniency on the trim settings is fine, and I get alignments fine. But featureCounts is where everything falls apart. I'm currently trying the UCSC hg38 build; maybe something will work. Essentially 95% of my alignments got no features; I was only able to get features attached to around 5%.
Current Feature: Status sample_dedup.bam
Assigned 26248
Unassigned_Unmapped 0
Unassigned_Read_Type 0
Unassigned_Singleton 0
Unassigned_MappingQuality 0
Unassigned_Chimera 0
Unassigned_FragmentLength 0
Unassigned_Duplicate 0
Unassigned_MultiMapping 0
Unassigned_Secondary 0
Unassigned_NonSplit 0
Unassigned_NoFeatures 10705776
Unassigned_Overlapping_Length 0
Unassigned_Ambiguity 1508
I also aligned using the UCSC genome and its GTF (results not shown): a slight improvement, but not a significant one.
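To put a number on a featureCounts summary like the one above, the assigned fraction can be computed directly from the table. This is a minimal sketch: the function names are made up, and only the three non-zero rows of the posted summary are included in the example text. For these counts the assigned fraction comes out well under 1%.

```python
def parse_summary(text):
    """Parse 'Status<whitespace>count' rows from a featureCounts summary."""
    counts = {}
    for line in text.strip().splitlines():
        parts = line.split()
        if len(parts) != 2 or not parts[1].isdigit():
            continue  # skip the header row and anything that is not a count
        counts[parts[0]] = int(parts[1])
    return counts


def assigned_fraction(counts):
    """Fraction of all counted alignments that were assigned to a feature."""
    total = sum(counts.values())
    return counts.get("Assigned", 0) / total if total else 0.0


# Non-zero rows copied from the summary posted above.
summary_text = """
Status sample_dedup.bam
Assigned 26248
Unassigned_NoFeatures 10705776
Unassigned_Ambiguity 1508
"""
counts = parse_summary(summary_text)
frac = assigned_fraction(counts)
print(f"assigned: {frac:.2%}")
```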
Can you post the output of
This should show us the "name" of the chromosome that is in your alignments and whether there are actual alignments. It would also help if you can post your alignment and featureCounts commands.
samtools view:
alignment commands:
featureCounts:
I posted both featureCounts commands I've used that didn't work well. I used Subread featureCounts, since it takes both GTF and GFF.
Looking at the alignment file we know the reference chromosome names match the GTF from miRBase. So that should be fine. Your reads are also aligning (the example above) so that should be ok as well.
Can you add -M (which will count multi-mapping reads, which these short reads are going to be) to your featureCounts command and see if that makes a difference? Also, adding -f to summarize reads at the miRNA/exon level would be useful.
Thank you, but it didn't make a difference. Let me see what I can find; there has to be a reason.
Hello Geno,
I figured out part of my issue: my GTF file was corrupt. I also changed my featureCounts params:
I then obtained:
That is 57% of my reads. I don't know whether this is good or not.
That is probably fine since you are counting only the miRNA using that GTF. You already compressed the data by deduplicating UMIs. Take a look at the alignments with IGV to make sure the reads are piling up under miRNA models. You can also experiment with removing --fraction and -M to see what effect that has.
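To get a feel for what toggling -M and --fraction changes, here is a toy model of the counting rule. It is not featureCounts' implementation: it ignores overlap rules, strandedness, and every other option. It assumes each alignment already comes with the feature it overlaps and with the read's NH value (number of reported alignments). The function name and the toy reads are made up.

```python
from collections import defaultdict


def count_features(alignments, count_multimappers=True, fractional=True):
    """Toy counting over (read_id, feature, nh) tuples.

    count_multimappers mimics -M; fractional mimics --fraction, where each
    alignment of a multi-mapping read contributes 1/nh instead of 1.
    """
    totals = defaultdict(float)
    for read_id, feature, nh in alignments:
        if feature is None:
            continue  # alignment overlaps no annotated feature
        if nh > 1:
            if not count_multimappers:
                continue  # without -M, multi-mapping reads are skipped
            weight = 1.0 / nh if fractional else 1.0
        else:
            weight = 1.0
        totals[feature] += weight
    return dict(totals)


# Hypothetical reads: read2 maps equally well to two copies of one miRNA.
toy = [
    ("read1", "mir-A", 1),
    ("read2", "mir-B-1", 2),
    ("read2", "mir-B-2", 2),
    ("read3", None, 1),
]
with_fraction = count_features(toy, count_multimappers=True, fractional=True)
whole_counts = count_features(toy, count_multimappers=True, fractional=False)
unique_only = count_features(toy, count_multimappers=False)
print(with_fraction, whole_counts, unique_only)
```

In this toy case, dropping --fraction doubles the total contributed by read2, and dropping -M removes both mir-B copies entirely. That is the kind of shift to look for when comparing runs.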