Question

Annotation Issue using miRBASE gff

0

Entering edit mode

5 months ago

anthony.santana.703.j ▴ 30

Hello,

So as always I run into issue upon issue as I try to develop a pipeline for small RNA analsysis using Qiagen mirna Library kit. So I aligned against my genome and got some successful results, but when it came to annotating using the gff on miRBase no features popped up. The reference genome I used is a prebuilt bowtie index of the NCBI reference since it was available and I dont have the computing power to build an index on my computer. I am now stuck and really dont know how to continue forward, Im trying to use another build for alignment maybe its a coordinate issue or something Im not sure any help would be greatly appreciated Im so close to getting results, I know it but I grow disheartened as a significant amount of time was spent building this and frankly I dont just want to give up.

NGS miRNA • 1.2k views

ADD COMMENT • link updated 5 months ago by GenoMax 154k • written 5 months ago by anthony.santana.703.j ▴ 30

0

Entering edit mode

You can't mix and match genome sequence and annotations. If you used a prebuilt bowtie index from NCBI then you will need to find an annotation file that has miRNA/small RNA annotations in it.

https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.40_GRCh38.p14/GCF_000001405.40_GRCh38.p14_genomic.gtf.gz human annotation has miRNA in it. More than likely your version of the pre-built indexes will not match the chromosome names in this file. If that happens, create your own indexes using the genome sequence that corresponds to this annotation and repeat alignment. Sequence is here: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.40_GRCh38.p14/GCF_000001405.40_GRCh38.p14_genomic.fna.gz

ADD REPLY • link 5 months ago by GenoMax 154k

0

Entering edit mode

As awlays Geno, thank you so much for helping, although I thought the same thing, I looked online the according to everything the gff from miRBASE should have been compatible. Ill see what I can do, like i said Im very limited due to my laptops available resources. Another reason I used the miRBASE gff with prebuitl genome is because I saw other pipelines use miRBASE annotations. Hopefully my project doesnt die here.

ADD REPLY • link 5 months ago by anthony.santana.703.j ▴ 30

1

Entering edit mode

I looked online the according to everything the gff from miRBASE should have been compatible.

I looked at the GFF file from miRBase and that indeed is the case. So in theory it should work.

but when it came to annotating using the gff on miRBase no features popped up

You will need to explain what is the issue? Are you using featureCounts?

ADD REPLY • link 5 months ago by GenoMax 154k

0

Entering edit mode

Hello Geno,

You dont understand how much I appreciate you, but yes I used featureCounts my trimming is correct my leniancy on those trim features is fine i get alignment fine. But featureCounts is where everything falls apart, currently trying to use the UCSC build hg38, maybe something will work. Essentially 95% of my alignments got no features i was only able to obtain around 5% which were able to get features attached.

Current Feature: Status sample_dedup.bam

Assigned 26248

Unassigned_Unmapped 0

Unassigned_Read_Type 0

Unassigned_Singleton 0

Unassigned_MappingQuality 0

Unassigned_Chimera 0

Unassigned_FragmentLength 0

Unassigned_Duplicate 0

Unassigned_MultiMapping 0

Unassigned_Secondary 0

Unassigned_NonSplit 0

Unassigned_NoFeatures 10705776

Unassigned_Overlapping_Length 0

Unassigned_Ambiguity 1508

Aligned using the gtf from UCSC genome(results not shown), slight improvement not significant.

ADD REPLY • link 5 months ago by anthony.santana.703.j ▴ 30

0

Entering edit mode

Can you post the output of

samtools view your_alignment.bam | head -8

This should show us the "name" of the chromosome that is in your alignments and if there are actual alignments. It would also help if you can post your alignment and featureCount commands.

ADD REPLY • link 5 months ago by GenoMax 154k

0

Entering edit mode

samtools view:

LH00150:566:22TM3VLT3:6:2196:18797:12321_CTTTTCTGCCTA   16      chr1    10571   255     16M     *       0       0       CGCCCTCGCGGTGATC        IIIIIIIIIIII9III     XA:i:1  MD:Z:13C2       NM:i:1  XM:i:11

LH00150:566:22TM3VLT3:6:2147:36214:7325_CGATCATGGACT    0       chr1    11110   255     16M     *       0       0       CGGTGGCGCGGCGCAT        IIIIIIIIIIIIIIII     XA:i:1  MD:Z:15G0       NM:i:1  XM:i:5

LH00150:566:22TM3VLT3:6:1267:34384:24555_AATACTGATGGT   0       chr1    11110   255     16M     *       0       0       CGGTGGCGCGGCGCAT        IIIIIIIIIIIII

alignment commands:

bowtie -a -v 1 --strata --best -m 5000 --chunkmbs 200  "${reference}/genome" "${trimmed_r1}" -S "${sample_id}.sam"

featureCount:

featureCounts -F GFF -s 0 -a "${gtf_file}" -o counts.txt -t miRNA -g Name "${dedup_bam}"(miRBASE)

featureCounts -F GTF -s 0 -a "${gtf_file}" -o counts.txt -t exon -g name_id "${dedup_bam}"(genome gtf)

i put both featureCounts ive used that didnt work well. I used subread featurecounts, since it takes both GTF and GFF.

ADD REPLY • link updated 5 months ago by GenoMax 154k • written 5 months ago by anthony.santana.703.j ▴ 30

1

Entering edit mode

Looking at the alignment file we know the reference chromosome names match the GTF from miRBase. So that should be fine. Your reads are also aligning (the example above) so that should be ok as well.

Can you add a -M (which will count multi-mapping reads, which these short reads are going to be) to your featureCounts and see if that makes a difference. Also adding a -f to summarize reads at the miRNA/exon level would be useful.

ADD REPLY • link 5 months ago by GenoMax 154k

0

Entering edit mode

Thank you but, didnt make a difference, let me see what I can find, there has to be a reason.

ADD REPLY • link 5 months ago by anthony.santana.703.j ▴ 30

0

Entering edit mode

Hello Geno,

I figured out part of my issue, my gtf file was corrupt, and I also changed my featureCounts params:

featureCounts -F GTF -s 0 -a "${gtf_file}" -M -f -o counts.txt -O --fraction -t transcript -g gene_id "${dedup_bam}"

I then obtained:

Status  124988-8_dedup.bam

Assigned    6133010

Unassigned_Unmapped 0

Unassigned_Read_Type    0

Unassigned_Singleton    0

Unassigned_MappingQuality   0

Unassigned_Chimera  0

Unassigned_FragmentLength   0

Unassigned_Duplicate    0

Unassigned_MultiMapping 0

Unassigned_Secondary    0

Unassigned_NonSplit 0

Unassigned_NoFeatures   4600522

Unassigned_Overlapping_Length   0

Unassigned_Ambiguity    0

Which is 57% of my reads, I dont know whether this is good or not.

ADD REPLY • link updated 5 months ago by GenoMax 154k • written 5 months ago by anthony.santana.703.j ▴ 30

0

Entering edit mode

That is probably fine since you are counting only the miRNA using that GTF. You already compressed the data by desuplicating UMI. Take a look at the alignments with IGV to make sure the reads are piling up under miRNA models. You can also experiment removing the --fraction and -M to see what effect that has.

ADD REPLY • link 5 months ago by GenoMax 154k