Hello everyone! I'm doing QC analysis of my RNA-seq data so we can do the library prep for more samples.
We got one sample that failed (5M reads) and one that is good to go (60M reads). We got these read counts from FastQC on the FASTQ file after trimming adapters and low-quality bases (Q < 30).
But when I map in Galaxy with STAR (2-pass filter, hg38) and then process the alignments with htseq, we get 140 million aligned sequences, half duplicated and half unique, which is very good for transcript analysis.
My question is whether this is normal, because I've never seen the number of aligned sequences so much bigger than the number of reads (I'm a newbie). 140 million is more than we expected!
Also, less than 5% aligned to rRNA according to CollectRnaSeqMetrics, some aligned to intergenic regions (probably lncRNA) and some were small (microRNA).
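You have secondary alignments (a read mapped to more than one location) and supplementary alignments in your file. Each of these is a separate record, so the number of alignment records can be larger than the number of reads.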
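
If you want to check this yourself, here is a minimal Python sketch that tallies SAM records by type using the flag bits from the SAM specification (0x4 unmapped, 0x100 secondary, 0x800 supplementary). The function name `count_alignment_types` and the toy records are made up for illustration; it assumes you feed it plain-text SAM lines, for example the output of `samtools view -h` on your BAM.

```python
from collections import Counter

# Flag bits defined in the SAM specification
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def count_alignment_types(sam_lines):
    """Tally SAM records by type and count distinct mapped read names.

    Hypothetical helper for illustration; expects plain-text SAM lines.
    """
    counts = Counter()
    mapped_names = set()
    for line in sam_lines:
        # Skip blank lines and header lines (headers start with "@")
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        if flag & UNMAPPED:
            counts["unmapped"] += 1
            continue
        mapped_names.add(name)
        if flag & SECONDARY:
            counts["secondary"] += 1
        elif flag & SUPPLEMENTARY:
            counts["supplementary"] += 1
        else:
            counts["primary"] += 1
    counts["distinct_mapped_read_names"] = len(mapped_names)
    return counts


# Toy records (invented, not real data): read r1 has an extra secondary
# alignment, read r2 has an extra supplementary alignment, r3 is unmapped.
toy_sam = [
    "@HD\tVN:1.6\tSO:coordinate",
    "r1\t0\tchr1\t100\t3\t4M\t*\t0\t0\tACGT\tIIII",
    "r1\t256\tchr2\t500\t3\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t0\tchr1\t200\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t2048\tchr3\t900\t255\t2M2S\t*\t0\t0\tACGT\tIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]

for key, value in sorted(count_alignment_types(toy_sam).items()):
    print(f"{key}\t{value}")
```

In the toy input, four mapped records come from only two reads, which is the same effect as in your file on a small scale. If samtools is available, `samtools view -c -F 0x904 your.bam` (file name is a placeholder) should report the primary mapped records directly. Keep in mind that for paired-end data each mate is its own record, so compare against the read total across both FASTQ files.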