Question

RNASeq bulk transcriptomics analysis

1

Entering edit mode

4 weeks ago

rajdeepboral00 ▴ 70

The commands i used for my RNASeq bulk data analysis are

 1. trim_galore   --quality 30   --length 30   --cores 8   -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCA   -a2 AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT   --basename CC1   --paired   --fastqc   CC1_S16_L002_R1_001.fastq.gz   CC1_S16_L002_R2_001.fastq.gz
 2. STAR --runThreadN 8 --runMode genomeGenerate --genomeDir Index --genomeFastaFiles GRCm39.genome.fa --sjdbGTFfile gencode.vM37.basic.annotation.gtf --sjdbOverhang 149
 3. STAR   --genomeDir /Index   --runThreadN 10   --readFilesIn CC1_val_1.fq.gz CC1_val_2.fq.gz   --outFileNamePrefix CC1_   --readFilesCommand zcat   --outSAMtype BAM SortedByCoordinate   --outSAMunmapped Within   --quantMode GeneCounts

Before trimming with Trimgalore After trimming A snippet of my gene counts

Using these commands i am getting only 37%mapped and others are not...37% is way too less a number...Maybe i am making an error somewhere, can anyone please suggest what more i can include to get a better and more mapped result.

RNASeq Trimgalore genecounts STAR fastqc • 942 views

ADD COMMENT • link 26 days ago by rajdeepboral00 ▴ 70

0

Entering edit mode

that is a low number indeed, however it can also be reflecting your data quality (and thus not necessarily mistakes in your bioinfo part) . The commands you include look OK to me.

The biggest issue is that 'noFeature' category which means they are mapped but not assigned to a (gene) feature.

Can you asses your input data quality? Eg. (and most importantly) was the data rRNA depleted for instance? Is the RNAseq derived from the same species/strain , ...

ADD REPLY • link 4 weeks ago by lieven.sterck 15k

0

Entering edit mode

Where this noFeature reads can align to then? How to assess the quality> yes the method for library prep itself included rRNA depletion...RNASeq derived from the same species means?

ADD REPLY • link 4 weeks ago by rajdeepboral00 ▴ 70

2

Entering edit mode

The mouse genome is only about 2% coding exons. 98% of the genome is sequence that is not present in a mature mRNA transcript. Most of your reads are aligning to this sequence.

The three most common causes of this would be:

Your rRNA depletion failed, and your reads are aligning to rRNA repeats that are not annotated
Your reads map to introns, not exons. If you have done total RNA-seq, rather than poly-A selected RNA seq, I'd expect to up 50% of reads to map to introns, rather than exons. While each intron is lowly expressed, they represent 20-30x as much sequence, so generate many reads, in aggregate.
Your reads map to intergenic sequence. This would usually suggest that your samples were contaminated with genomic DNA - most likely your DNase treatment or column based purification was insufficiently efficient.

ADD REPLY • link 4 weeks ago by i.sudbery 21k

0

Entering edit mode

1)I have checked for rRNA, but no such contamination was found. 2) & 3) Yes that could be the possibility though.

Thanks.

ADD REPLY • link 4 weeks ago by rajdeepboral00 ▴ 70

0

Entering edit mode

Hopefully, because there aren't any other explanations :)

ADD REPLY • link 4 weeks ago by lieven.sterck 15k

0

Entering edit mode

just out of curiosity, and for completeness, can you provide the numbers that you used to get to the 'only 37% mapped' result?

ADD REPLY • link 4 weeks ago by lieven.sterck 15k

0

Entering edit mode

42439450 these are the reads that got mapped.

ADD REPLY • link 4 weeks ago by rajdeepboral00 ▴ 70

0

Entering edit mode

that is the number when summing all the reads assigned to genes then? (== mapped & assigned)

what is the number of reads in the initial input file (fastq) ?

ADD REPLY • link 4 weeks ago by lieven.sterck 15k

0

Entering edit mode

The initial fastq after trimming had 107653608 reads

ADD REPLY • link 4 weeks ago by rajdeepboral00 ▴ 70

0

Entering edit mode

Aha, I suspected something like this already :) :

as you have 107653608 input reads and only have 3098536 unmapped reads, you actually have a mapping rate of ~97%!! (far from the 37% you mentioned initially).

Yes, only have ~39% reads assigned , for explanations on that aspect see i.sudbery 's answer