
2) Created the genome file and built the index
3) Loaded sorted.gff using load from file
**It looked like this. The reference genes look normal.
4) After loading my BAM file, the results looked like this.
I'm not sure how to tell, from what I can currently see in IGV, whether this is caused by a mismatched GTF, and googling the problem gave no hints. But when I went through the annotation file (GTF), I found that most genes are annotated as CDS rather than exon.
I'm wondering whether this is why the genes can't be mapped to the reference annotation (does STAR map only exons?). Could you tell me more? Thank you so much!**
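One quick way to confirm the CDS-versus-exon observation is to tally the feature-type column (column 3) of the annotation. The following is only a minimal sketch: the function name and the example records are made up for illustration and are not taken from the real annotation file.

```python
from collections import Counter


def count_feature_types(lines):
    """Tally column 3 (the feature type) of GTF/GFF records."""
    counts = Counter()
    for line in lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # A GTF/GFF record has 9 tab-separated columns; ignore anything shorter.
        if len(fields) < 9:
            continue
        counts[fields[2]] += 1
    return counts


# Made-up records for illustration only (hypothetical IDs and coordinates).
example_records = [
    "##gff-version 3",
    "chr1\tsrc\tgene\t1\t900\t.\t+\t.\tID=gene0",
    "chr1\tsrc\tCDS\t1\t900\t.\t+\t0\tID=cds0;Parent=gene0",
    "chr1\tsrc\tCDS\t1200\t2000\t.\t-\t0\tID=cds1;Parent=gene1",
]
feature_counts = count_feature_types(example_records)
print(feature_counts.most_common())
```

To run it on a real file, pass an open file handle instead of the list, e.g. `count_feature_types(open("sorted.gff"))`. If `exon` is missing or rare in the output, any tool that counts over exon features by default will find little to count.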
I also tried kallisto, and it returned only 97 lines of results, far fewer than I expected; I was assuming there would be hundreds of lines or more.
Plus, looking back at the FastQC results, the difference between A/T and G/C is larger than 10%, and there are a lot of overrepresented sequences which, when I BLASTed them, belong to L. gasseri (the reference genome I used).
Do you think this suggests that the problem is the library prep itself? I really appreciate your kind reply!
Do you expect your organism to have transcripts with exons and introns? Bacteria generally don't. The IGV view looks a little weird; bacterial reads generally should not align with massive gaps.
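To put a number on the "massive gaps", one could check how many mapped reads carry a skipped-region (`N`) operation in their CIGAR string, since that is how a spliced alignment is recorded in SAM. This is a rough sketch that assumes plain SAM text as input (for example the output of `samtools view -h`); the function names and the toy records are hypothetical, and the toy sequences are not meant to match their CIGAR lengths.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def longest_skip(cigar):
    """Return the longest N (skipped region) length in a CIGAR string, or 0."""
    return max((int(n) for n, op in CIGAR_OP.findall(cigar) if op == "N"), default=0)


def gapped_read_fraction(sam_lines, min_skip=1):
    """Fraction of mapped records whose CIGAR has an N operation >= min_skip."""
    mapped = gapped = 0
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        # Column 6 is the CIGAR; "*" means no alignment is available.
        if len(fields) < 11 or fields[5] == "*":
            continue
        mapped += 1
        if longest_skip(fields[5]) >= min_skip:
            gapped += 1
    return gapped / mapped if mapped else 0.0


def toy_record(name, cigar):
    """Build a made-up SAM record; only the CIGAR column matters here."""
    return "\t".join([name, "0", "chr1", "100", "255", cigar, "*", "0", "0", "ACGT", "IIII"])


example_sam = [
    "@HD\tVN:1.6",
    toy_record("r1", "50M"),
    toy_record("r2", "20M1500N30M"),
    toy_record("r3", "*"),
]
print(gapped_read_fraction(example_sam))
```

For a bacterial sample, one would expect this fraction to be close to zero; a large value suggests the aligner is introducing splice-like gaps that have no biological basis.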
My bad, I just revised and updated my previous comments. Yes, bacteria generally don't have introns. But as you can see in my previous comments, in the annotation file for this bacterium, most genes are annotated as CDS rather than exon. I'm wondering whether this is why the genes can't be mapped to the reference annotation (does STAR map only exons?). And do you think the massive gaps could be due to incorrect library prep or something else? Thank you!
You might have to play with the settings of featureCounts to make sure that it counts reads aligning to CDS regions.
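As a sketch of what that could look like: featureCounts counts over `exon` features by default, and its `-t` option selects a different feature type, while `-g` selects the attribute used to group features into genes. The snippet below only builds and prints the command. The file names are placeholders, and `gene_id` is an assumption: use whichever attribute the CDS lines in your annotation actually carry.

```python
import shlex


def featurecounts_command(annotation, bam, output,
                          feature_type="CDS", group_attribute="gene_id"):
    """Build a featureCounts argument list that counts reads over `feature_type`."""
    return [
        "featureCounts",
        "-t", feature_type,     # feature type (column 3) to count; default is exon
        "-g", group_attribute,  # attribute used to group features (assumed here)
        "-a", annotation,       # annotation file
        "-o", output,           # output count table
        bam,                    # input alignment file
    ]


# Placeholder file names, not the ones from this thread.
cmd = featurecounts_command("annotation.gtf", "aligned.bam", "counts.txt")
print(shlex.join(cmd))
```

The printed line can be pasted into a shell, or the list can be passed to `subprocess.run(cmd, check=True)` once the paths point at real files.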
Thank you so much! The problem has been resolved!