Hi everyone,
I have 6 files of paired-end 75 nt RNA-seq reads from HEK293 cells that I want to map to the AAV genome. I downloaded the reference genome (FASTA) and the annotation (GFF3/GTF) from NCBI. For mapping to the human genome I used the STAR mapper, which worked brilliantly. But I do not know how to proceed with the much smaller viral genome and its transcript variants. How do I quantify the different transcript variants in this case?
The GTF file has no "exon" entries, which seemed to cause fatal errors, so I replaced "CDS" with "exon" in the feature column. That did not work out as expected: a lot of the mapped reads are either ambiguous or multimapped.
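A minimal Python sketch of a CDS-to-exon substitution like the one described above, assuming a tab-separated, 9-column GTF with the feature type in the third column. The function names and file paths are hypothetical. Note that CDS coordinates exclude UTRs, so reads falling in UTRs will not overlap any "exon" after the swap. STAR also has a `--sjdbGTFfeatureExon` parameter (default `exon`) that can be set to `CDS` instead of editing the file; I have not checked whether the Galaxy wrapper exposes it.

```python
from typing import Iterable, Iterator


def cds_to_exon(gtf_lines: Iterable[str]) -> Iterator[str]:
    """Yield GTF lines, relabelling the feature column (3rd field) from CDS to exon."""
    for line in gtf_lines:
        # Pass comment/header lines and blank lines through unchanged
        if line.startswith("#") or not line.strip():
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # GTF has 9 tab-separated columns; column 3 is the feature type
        if len(fields) == 9 and fields[2] == "CDS":
            fields[2] = "exon"
        yield "\t".join(fields) + "\n"


def convert_file(src_path: str, dst_path: str) -> None:
    """Hypothetical helper: rewrite src_path into dst_path with CDS -> exon."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(cds_to_exon(src))
```

Usage would be something like `convert_file("aav.gtf", "aav.exon.gtf")`, with placeholder file names. Attributes such as `transcript_id`, which STAR uses to group exons into transcripts, are left untouched.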
STAR settings:
- (--genomeFastaFiles) AAV reference genome from NCBI
- (--genomeSAindexNbases) 5, based on the ~4700 bp genome
- (--sjdbGTFfile) GTF converted from the NCBI GFF3 file, with CDS replaced by exon
- (--sjdbOverhang) 148, because the average read length was 149
All other parameters were left as default.
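The two length-dependent settings can be checked against the rules of thumb in the STAR manual: `--genomeSAindexNbases` scaled down to about min(14, log2(GenomeLength)/2 - 1) for small genomes, and `--sjdbOverhang` ideally equal to the mate length minus 1. A minimal Python sketch of both formulas (the function names are my own):

```python
import math


def sa_index_nbases(genome_length: int) -> int:
    # STAR manual rule of thumb: min(14, log2(GenomeLength)/2 - 1), truncated to an integer
    return min(14, int(math.log2(genome_length) / 2 - 1))


def sjdb_overhang(mate_length: int) -> int:
    # STAR manual: ideally (mate_length - 1), i.e. per mate, not per read pair
    return mate_length - 1


print(sa_index_nbases(4700))  # 5
print(sjdb_overhang(75))      # 74
```

For the ~4700 bp genome this gives 5, matching the value above. For 75 nt mates the rule gives 74; as far as I know, the average input read length STAR reports for paired-end data is the sum of both mates, which would explain the value of 149.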
Btw, I am working with Galaxy.
Any tips are appreciated!
Cheers