Question

miRBase GFF and mature FASTAs for a custom STAR genome index?

28 days ago

RNAseqer ▴ 260

I have performed single-cell miRNA-seq. I have the full GRCh38 genome's GTF and FASTA files. I am looking to align my reads to the genome to get a .bam file, get the mature miRNA count table from that using HTSeq, and continue to a Seurat analysis.

Now, as I understand it, there are STAR parameters that can be used to align snRNA-seq reads to the genome and generate .bam files using the full human genome GTF/FASTA as inputs. However, I have read in various places that STAR cannot handle mature miRNA-seq specifically, because the reads are so short. If so, my hope was to create a genome index from the miRBase GFF3 file (converted to GTF) and the mature miRNA FASTA file provided by miRBase. (Using the GRCh38 FASTA with the miRBase GFF3 has the odd issue of miRNAs falling outside the maximum length of the chromosomes. I'm not sure what's going on there, and I have not been able to find the exact FASTA file miRBase used.)
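The out-of-range coordinates mentioned above can be narrowed down with a quick consistency check. The following is a minimal sketch, not a definitive diagnosis: it assumes a plain-text FASTA and a tab-separated GFF3, and the function names `fasta_lengths` and `check_gff3` are made up for illustration. It reports features whose sequence name is absent from the FASTA (for example `chr1` versus `1`) separately from features that really do extend past the end of a sequence.

```python
import io


def fasta_lengths(handle):
    """Return {name: length}, taking the name as the header text up to the first whitespace."""
    lengths = {}
    name = None
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line.strip())
    return lengths


def check_gff3(handle, lengths):
    """Yield (seqid, start, end, problem) for every feature that does not fit the FASTA."""
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # skip header/comment and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete GFF3 feature line
        seqid, start, end = cols[0], int(cols[3]), int(cols[4])
        if seqid not in lengths:
            yield (seqid, start, end, "seqid not in FASTA")
        elif end > lengths[seqid]:
            yield (seqid, start, end, "end beyond sequence length")


if __name__ == "__main__":
    # Toy inputs; with real data, pass open("genome.fa") and open("hsa.gff3") instead
    # (both file names are placeholders).
    fasta = io.StringIO(">chr1 toy\nACGTACGTAC\n")
    gff3 = io.StringIO("chr1\t.\tmiRNA\t5\t20\t.\t+\t.\tID=toy\n")
    for problem in check_gff3(gff3, fasta_lengths(fasta)):
        print(problem)
```

If most features come back as "seqid not in FASTA", the mismatch is probably in the naming rather than in the coordinates themselves.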

First question: would it be appropriate to use the mature miRNA FASTA with the hsa.gff3 file to create a genome index for use in STAR? If so, how do you deal with the incompatibility caused by the individual FASTA entries being labelled with gene names, when STAR expects them to be labelled with chromosome names?

Once I have my .bam files sorted, I intend to use HTSeq's count feature (htseq-count) to get the count table for use in Seurat. Currently, using the full genome FASTA and GTF for GRCh38, I'm getting MANY multimapping reads, as would be expected given their length, but I was hoping that creating the .bam file with the miRBase GTF/FASTA would avoid this problem.

Second question: if I am way off base here and there is a straightforward way to do this, I would very much appreciate the help!

mirbase indexing star genome • 163 views

ADD COMMENT • link 27 days ago by RNAseqer ▴ 260

After converting the hairpin FASTA file from miRBase to DNA, the command below worked, leaving me with a whole new set of downstream problems: going from the indexed genome to a sorted .bam gives a BAM file whose reference names match none of the chromosomes in the GFF, so it is essentially useless. I'd love to know how to get around that little problem.
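For anyone repeating the RNA-to-DNA conversion step, here is a minimal sketch of one way to do it. It is not necessarily how the file used in this post was produced. The function name `rna_fasta_to_dna` and the `species_prefix` filter are assumptions added for illustration; pass an empty prefix to keep every record. Headers are cut down to the first whitespace-delimited token so that the reference names in the index are just the miRNA names.

```python
def rna_fasta_to_dna(lines, species_prefix="hsa-"):
    """Yield FASTA lines with U replaced by T.

    Only records whose name starts with species_prefix are kept, and each
    header is shortened to its first whitespace-delimited token.
    """
    keep = False
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            keep = name.startswith(species_prefix)
            if keep:
                yield ">" + name
        elif keep and line:
            # miRBase sequences are RNA; aligners expect a DNA alphabet.
            yield line.upper().replace("U", "T")


if __name__ == "__main__":
    # Toy records with made-up names and sequences, not real miRBase entries.
    toy = [
        ">hsa-mir-toy1 MI0000000 Homo sapiens toy stem-loop",
        "UGUCGGGUAG",
        ">cel-mir-toy2 MI0000001 Caenorhabditis elegans toy stem-loop",
        "UACACUGUGG",
    ]
    for out_line in rna_fasta_to_dna(toy):
        print(out_line)
```

With real data, the input would be an open handle on the miRBase hairpin FASTA and the output lines would be written to the file passed to `--genomeFastaFiles`.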

./STAR --runMode genomeGenerate --genomeDir miRNA_index --genomeFastaFiles /scratch/output.fasta --genomeSAindexNbases 6 --sjdbGTFtagExonParentTranscript /scratch/out.gff3
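One possible way around the mismatch between the BAM reference names and the GFF is to rewrite the annotation so that it is expressed in hairpin coordinates. What follows is a sketch under stated assumptions, not a tested pipeline. It assumes the miRBase GFF3 has `miRNA_primary_transcript` features carrying `ID` and `Name` attributes and `miRNA` features pointing at them through `Derives_from`. It also assumes that each hairpin FASTA record covers exactly the span of its primary transcript. The function names, the `exon` feature type and the `gene_id` attribute are choices made here to suit the htseq-count defaults.

```python
def parse_attrs(field):
    """Turn 'ID=x;Name=y' into {'ID': 'x', 'Name': 'y'}."""
    return dict(item.split("=", 1) for item in field.strip().split(";") if "=" in item)


def hairpin_relative_gtf(gff3_lines):
    """Yield GTF lines whose seqname is the hairpin name and whose
    coordinates are positions within that hairpin (1-based, inclusive)."""
    rows = []
    for line in gff3_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        rows.append((cols[2], int(cols[3]), int(cols[4]), cols[6], parse_attrs(cols[8])))

    # Index hairpins by their ID so that mature miRNAs can look them up.
    hairpins = {
        a["ID"]: (start, end, strand, a["Name"])
        for ftype, start, end, strand, a in rows
        if ftype == "miRNA_primary_transcript" and "ID" in a and "Name" in a
    }

    for ftype, start, end, strand, a in rows:
        if ftype != "miRNA" or "Name" not in a or a.get("Derives_from") not in hairpins:
            continue
        h_start, h_end, h_strand, h_name = hairpins[a["Derives_from"]]
        if h_strand == "-":
            # The hairpin sequence is the reverse complement of the genome here,
            # so positions are counted back from the genomic end of the hairpin.
            rel_start, rel_end = h_end - end + 1, h_end - start + 1
        else:
            rel_start, rel_end = start - h_start + 1, end - h_start + 1
        attrs = 'gene_id "%s"; transcript_id "%s";' % (a["Name"], a["Name"])
        # Reads aligned to a hairpin record are in its forward orientation.
        yield "\t".join(
            [h_name, "miRBase", "exon", str(rel_start), str(rel_end), ".", "+", ".", attrs]
        )
```

The resulting GTF uses the same names as a hairpin FASTA whose headers are the bare hairpin names, so a counting tool can match BAM reference names to annotation rows. Separately, as far as I know `--sjdbGTFtagExonParentTranscript` expects an attribute name rather than a file path, and annotation files are passed with `--sjdbGTFfile`, so the command above most likely builds the index without any annotation.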

ADD REPLY • link 27 days ago by RNAseqer ▴ 260