Question: counting ERCC spike-ins in RNA-seq data
2.2 years ago by
alirezamomeni707 wrote:

I used ERCC spike-ins in my RNA-seq experiment. I have aligned my data and now have BAM files. To count the reads per gene I used htseq-count (which needs a GTF file). I also have to count the ERCCs (I have 98 spike-ins), for which I have a FASTA file. Do you know how I can count the ERCCs?

rna-seq • 1.5k views

Ideally you would have appended the ERCC FASTA to the genome before aligning your data. Since you have not done that, you would need to create a new "genome" (and a GTF file to go with it) and then align and count again.
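
As a rough sketch of the "new genome" step, the two FASTA files can simply be concatenated before building the aligner index. The helper name `concat_files` and the file names in the note below are placeholders, not anything from this thread. The only subtlety handled here is a missing final newline, which would otherwise glue the first ERCC header onto the last genome sequence line.

```python
def concat_files(inputs, output):
    """Concatenate files in order, making sure each one ends with a newline."""
    with open(output, "wb") as out:
        for path in inputs:
            last = b"\n"  # an empty input file needs no extra newline
            with open(path, "rb") as src:
                while True:
                    chunk = src.read(1 << 20)  # copy in 1 MiB pieces
                    if not chunk:
                        break
                    out.write(chunk)
                    last = chunk[-1:]
            if last != b"\n":
                # Avoid fusing the next file's first header onto this file's last line.
                out.write(b"\n")
```

Something like `concat_files(["genome.fa", "ERCC.fa"], "genome_plus_ERCC.fa")` would produce the combined reference. The same call should work for joining the gene GTF with a spike-in GTF, as long as the sequence names in the GTF match the FASTA headers.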

written 2.2 years ago by genomax

Is it not possible to align to the ERCC FASTA and its GTF after the initial alignment, using the aligned BAM? (Aligned BAM = aligned to a reference FASTA other than ERCC.)

written 2.2 years ago by cpad0112

You can filter them out and quantify them with BBMap's Seal using the aligned bam.

seal.sh in=aligned.bam ref=ERCC.fa out=filtered.bam stats=stats.txt k=31
written 2.2 years ago by Brian Bushnell

Since ERCC sequences should be totally diverse it should not matter what you use.

written 2.2 years ago by genomax

Thanks. Actually, I have aligned to the ERCCs (I made an index from the FASTA file), but I do not have a GTF file for them. That is the main problem.

written 2.2 years ago by alirezamomeni707

You make one up yourself.
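
One way this could be done, as a minimal sketch rather than a vetted tool: treat each spike-in as a single-exon gene spanning its whole sequence, and write one GTF line per FASTA record. The function names, the `source` label and the choice of the `+` strand are assumptions made for illustration. The `exon` feature type and `gene_id` attribute are used because those are what htseq-count looks for by default.

```python
def fasta_lengths(fasta_path):
    """Yield (name, length) for each record in a FASTA file."""
    name, length = None, 0
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    yield name, length
                # Keep only the first word of the header, as most aligners do.
                tokens = line[1:].split()
                name, length = (tokens[0] if tokens else ""), 0
            elif name is not None:
                length += len(line)
    if name is not None:
        yield name, length


def write_spike_gtf(fasta_path, gtf_path, source="ERCC"):
    """Write one full-length 'exon' line per FASTA record (assumed + strand)."""
    with open(gtf_path, "w") as out:
        for name, length in fasta_lengths(fasta_path):
            attrs = f'gene_id "{name}"; transcript_id "{name}";'
            fields = [name, source, "exon", "1", str(length), ".", "+", ".", attrs]
            out.write("\t".join(fields) + "\n")
```

Calling `write_spike_gtf("ERCC.fa", "ERCC.gtf")` (placeholder file names) would give a GTF whose sequence names match the FASTA headers, which is what the counting step needs. If the library is unstranded or reverse-stranded, the strand setting of the counting tool would have to be adjusted accordingly.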

written 2.2 years ago by genomax
2.2 years ago by
Charles Plessy (Japan) wrote:

You can filter out and count the spike sequences (and rRNA, and linkers) before alignment with TagDust 2. In the following document on GitHub, I used it to detect ArrayControl spikes, but it also works with ERCC ones. For maximal accuracy, make sure to use the translated sequences available from the NIST, and not the plasmid insert sequences (see "Patch ERCC spike sequences to get their real 5-prime ends" for the long story). In any case, if you use the TagDust approach, make sure your sequences do not contain common parts such as polyA tails.
