RNASeq bulk transcriptomics analysis
4 weeks ago

The commands I used for my bulk RNA-seq data analysis are:

 1. trim_galore --quality 30 --length 30 --cores 8 -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCA -a2 AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT --basename CC1 --paired --fastqc CC1_S16_L002_R1_001.fastq.gz CC1_S16_L002_R2_001.fastq.gz
 2. STAR --runThreadN 8 --runMode genomeGenerate --genomeDir Index --genomeFastaFiles GRCm39.genome.fa --sjdbGTFfile gencode.vM37.basic.annotation.gtf --sjdbOverhang 149
 3. STAR --genomeDir /Index --runThreadN 10 --readFilesIn CC1_val_1.fq.gz CC1_val_2.fq.gz --outFileNamePrefix CC1_ --readFilesCommand zcat --outSAMtype BAM SortedByCoordinate --outSAMunmapped Within --quantMode GeneCounts

[Images: before trimming with Trim Galore; after trimming; a snippet of my gene counts]

Using these commands, I am getting only 37% of reads mapped, which is far too low. Maybe I am making an error somewhere. Can anyone please suggest what else I can include to get a better mapping rate?


That is indeed a low number; however, it may reflect your data quality rather than a mistake in the bioinformatics part. The commands you included look OK to me.

The biggest issue is the 'noFeature' category, which means the reads are mapped but not assigned to a (gene) feature.
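
To see how large the 'noFeature' category is relative to the assigned reads, the gene-count table can be tallied directly. The following is a minimal sketch, assuming the standard `ReadsPerGene.out.tab` layout written by `--quantMode GeneCounts` (four tab-separated columns, with the `N_unmapped`, `N_multimapping`, `N_noFeature` and `N_ambiguous` rows first). The function name `summarize_gene_counts` and the toy table are hypothetical and only for illustration.

```python
def summarize_gene_counts(lines, column=1):
    """Tally a STAR ReadsPerGene.out.tab table.

    column selects the count column: 1 = unstranded,
    2 = forward-stranded, 3 = reverse-stranded.
    Returns the N_* summary rows plus the sum over all gene rows.
    """
    summary = {"assigned": 0}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        name, count = fields[0], int(fields[column])
        if name.startswith("N_"):
            summary[name] = count          # summary rows such as N_noFeature
        else:
            summary["assigned"] += count   # reads assigned to a gene
    return summary


# Hypothetical toy table, NOT the poster's data.
toy = [
    "N_unmapped\t10\t10\t10",
    "N_multimapping\t5\t5\t5",
    "N_noFeature\t50\t80\t55",
    "N_ambiguous\t2\t0\t1",
    "GeneA\t30\t2\t28",
    "GeneB\t3\t1\t2",
]
toy_summary = summarize_gene_counts(toy)
print(toy_summary)
```

For a real run, an open file handle can be passed instead of the toy list, e.g. `summarize_gene_counts(open("CC1_ReadsPerGene.out.tab"))`, assuming the output prefix `CC1_` used in the STAR command above.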

Can you assess your input data quality? For example (and most importantly), was the data rRNA-depleted? Is the RNA-seq data derived from the same species/strain as the reference, ...


Where can these noFeature reads align to, then? How do I assess the quality? Yes, the library prep method itself included rRNA depletion. What do you mean by RNA-seq derived from the same species?


The mouse genome is only about 2% coding exons. 98% of the genome is sequence that is not present in a mature mRNA transcript. Most of your reads are aligning to this sequence.

The three most common causes of this would be:

  1. Your rRNA depletion failed, and your reads are aligning to rRNA repeats that are not annotated
  2. Your reads map to introns, not exons. If you have done total RNA-seq rather than poly-A selected RNA-seq, I'd expect up to 50% of reads to map to introns rather than exons. While each intron is lowly expressed, introns represent 20-30x as much sequence as exons, so in aggregate they generate many reads.
  3. Your reads map to intergenic sequence. This would usually suggest that your samples were contaminated with genomic DNA - most likely your DNase treatment or column based purification was insufficiently efficient.

1) I have checked for rRNA, and no such contamination was found. 2) & 3) Yes, those are possible.

Thanks.


Hopefully, because there aren't any other explanations :)


Just out of curiosity, and for completeness, can you provide the numbers that you used to get to the 'only 37% mapped' result?


42439450 reads got mapped.


Is that the number you get when summing all the reads assigned to genes, then? (== mapped & assigned)

What is the number of reads in the initial input file (fastq)?


The initial fastq after trimming had 107653608 reads


Aha, I suspected something like this already :)

As you have 107653608 input reads and only 3098536 unmapped reads, you actually have a mapping rate of ~97% (far from the 37% you mentioned initially).

Yes, you only have ~39% of reads assigned; for explanations of that aspect, see i.sudbery's answer.
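
The arithmetic behind these two percentages can be sketched as follows. This is only an illustration using the three numbers quoted in this thread; it assumes all three count the same unit (reads or read pairs) and, as in the reply above, treats every read that is not unmapped as mapped, multimappers included. The helper `percent` is a hypothetical name.

```python
# Numbers quoted in the thread.
input_reads = 107_653_608     # reads in the trimmed fastq
unmapped_reads = 3_098_536    # reads reported as unmapped
assigned_reads = 42_439_450   # reads assigned to genes (mapped & assigned)


def percent(part, total):
    """Return part as a percentage of total."""
    return 100.0 * part / total


# Mapping rate: everything that is not unmapped, relative to the input.
mapping_rate = percent(input_reads - unmapped_reads, input_reads)

# Assignment rate: reads counted towards a gene, relative to the input.
assignment_rate = percent(assigned_reads, input_reads)

print(f"mapping rate:    {mapping_rate:.1f}%")
print(f"assignment rate: {assignment_rate:.1f}%")
```

The gap between the two rates is the fraction of reads that mapped somewhere in the genome but were not counted towards any gene.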


Thank you for the reply
