I used ERCC spike-ins in my RNA-seq experiment. I have aligned the data and now have BAM files. To count the reads per gene I used htseq-count (which needs a GTF file). I also need to count the ERCC reads (I have 98 spike-ins), and I have a FASTA file of the ERCC sequences. Does anyone know how I can count the ERCCs?
Question: counting ERCC spike-ins in RNA-seq data
alirezamomeni707 • 0 wrote:
modified 3.5 years ago by Charles Plessy • 2.7k • written 3.5 years ago by alirezamomeni707 • 0
Charles Plessy • 2.7k wrote:
You can filter out and count the spike sequences (as well as rRNA and linkers) before alignment with TagDust 2. In the following document on GitHub, I used it to detect ArrayControl spikes, but it also works with ERCC ones. For maximal accuracy, make sure to use the translated sequences available from the NIST, and not the plasmid insert sequences (see "Patch ERCC spike sequences to get their real 5-prime ends" for the long story). In any case, if you use the TagDust approach, make sure your sequences do not contain common parts such as polyA tails.
Ideally you would have appended the ERCC FASTA to the genome before aligning your data. Since you have not done that, you would need to create a new "genome" (and a GTF file to go with it) and then align and count again.
Is it not possible to align to the ERCC FASTA and its GTF after alignment, using the aligned BAM? (Aligned BAM = aligned against a reference FASTA that does not include the ERCCs.)
You can filter them out and quantify them with BBMap's Seal using the aligned BAM.
Since the ERCC sequences should be completely distinct, it should not matter which one you use.
Thanks. Actually, I have aligned to the ERCCs (I built an index from the FASTA file), but I do not have a GTF file for them. That is the main problem.
You make one up yourself.
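To make "make one up yourself" concrete, here is a minimal sketch in Python, not a definitive recipe. It assumes each ERCC sequence in the FASTA can be treated as a single-exon gene on the plus strand spanning its full length, and that the first word of each FASTA header is the reference name used in the BAM. The function names (`fasta_lengths`, `ercc_gtf_lines`) and the toy sequences in the demo are made up for illustration. The output uses the `exon` feature type and the `gene_id` attribute, which are what htseq-count looks for by default.

```python
import io


def fasta_lengths(handle):
    """Yield (name, length) for each record in an open FASTA file."""
    name, length = None, 0
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, length
            # Assumption: the first word of the header is the reference name.
            name, length = line[1:].split()[0], 0
        else:
            length += len(line)
    if name is not None:
        yield name, length


def ercc_gtf_lines(handle, source="ERCC"):
    """Yield one GTF 'exon' line per sequence, covering its full length."""
    for name, length in fasta_lengths(handle):
        attributes = f'gene_id "{name}"; transcript_id "{name}";'
        # GTF columns: seqname, source, feature, start, end,
        # score, strand, frame, attributes (1-based, inclusive coordinates).
        fields = [name, source, "exon", "1", str(length),
                  ".", "+", ".", attributes]
        yield "\t".join(fields)


# Toy demo: the sequences below are invented, not real ERCC sequences.
toy_fasta = io.StringIO(">ERCC-00002 toy record\nACGTACGT\nACGT\n>ERCC-00003\nGGGCCC\n")
for gtf_line in ercc_gtf_lines(toy_fasta):
    print(gtf_line)
```

For a real file, you would open the ERCC FASTA with `open(...)` instead of the `io.StringIO` toy input and write the lines to a `.gtf` file. Since each spike-in is its own reference sequence, the per-reference mapped read counts reported by `samtools idxstats` on the ERCC-aligned BAM should also be a reasonable cross-check.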