Reference genome for mapping RNA-seq spike-in dataset
6.9 years ago

Spike-in controls are a common way to evaluate statistical methods for finding differentially expressed genes. Given FASTQ files that contain ERCC control reads and the corresponding GTF file for the ERCCs, how should one do the alignment step with TopHat? For instance, if the samples are human, we have the FASTQ files, the hg19 reference, and the ERCC.gtf file. How can one use TopHat to align the FASTQ files to the reference genome when they include ERCC reads? Should we combine the hg19 reference genome with the ERCC.gtf file? The following article is an example of this situation:

How should we add the ERCC control information to the reference genome used in TopHat?

Thanks for the help.

RNA-Seq alignment • 7.8k views
6.9 years ago

You're on the right track. The ERCC sequences should be available as FASTA, which you can append to your reference genome as additional chromosomes. TopHat/Bowtie will then place the reads that belong to them onto those chromosomes. If you're using a GTF, append the ERCC entries there too, keeping in mind that they are unspliced, single-exon transcripts.
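
To make that concrete, here is a minimal Python sketch of the two steps: concatenating the genome and ERCC FASTA files, and writing one single-exon GTF record per ERCC sequence. The function names and the file names in the usage note (`hg19.fa`, `ERCC92.fa`, and so on) are hypothetical, and the `gene_id`/`transcript_id` attributes are simply set to the sequence name, which is an assumption rather than anything your ERCC.gtf necessarily uses.

```python
def concatenate(paths, out_path):
    """Write the given text files one after another into out_path.

    A newline is added after any file that does not end with one, so the
    first header of the next file never gets glued onto a sequence line.
    """
    with open(out_path, "w") as out:
        for path in paths:
            last = ""
            with open(path) as src:
                for line in src:
                    out.write(line)
                    last = line
            if last and not last.endswith("\n"):
                out.write("\n")


def fasta_lengths(fasta_path):
    """Return {sequence_name: length} for every record in a FASTA file."""
    lengths = {}
    name = None
    with open(fasta_path) as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Use the first word of the header as the sequence name.
                name = line[1:].split()[0]
                lengths[name] = 0
            elif name is not None:
                lengths[name] += len(line)
    return lengths


def single_exon_gtf(fasta_path, source="ERCC"):
    """Build one GTF exon line per sequence, spanning the whole sequence.

    Assumption: each spike-in is an unspliced transcript on the + strand,
    and gene_id / transcript_id are both set to the sequence name.
    """
    lines = []
    for name, length in fasta_lengths(fasta_path).items():
        attrs = f'gene_id "{name}"; transcript_id "{name}";'
        fields = [name, source, "exon", "1", str(length), ".", "+", ".", attrs]
        lines.append("\t".join(fields))
    return lines
```

With hypothetical file names, `concatenate(["hg19.fa", "ERCC92.fa"], "hg19_ERCC.fa")` would produce the combined reference, and `concatenate(["genes.gtf", "ERCC.gtf"], "genes_ERCC.gtf")` the combined annotation. The Bowtie index has to be rebuilt from the combined FASTA before running TopHat, and the sequence names in the GTF need to match the FASTA headers exactly.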

Be aware that some reads from the human genome will also align to some of the ERCC chromosomes. It's not many, but it's not zero.
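
If you want to see how large that effect is in your own data, one option is to align a sample without spike-ins against the combined reference and count what lands on the ERCC sequences. The sketch below is a rough illustration, not a tested pipeline: it assumes the alignments are available as SAM text lines (for example, converted from TopHat's BAM output with `samtools view`), and that the spike-in sequence names start with `ERCC-`. The function name and the prefix default are my own choices.

```python
from collections import Counter


def count_spikein_alignments(sam_lines, prefix="ERCC-"):
    """Count primary mapped alignments per spike-in reference sequence.

    sam_lines: any iterable of SAM-format text lines.
    prefix:    assumed naming prefix of the spike-in sequences.
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # not a valid alignment record
        flag = int(fields[1])
        rname = fields[2]
        # Skip unmapped (0x4), secondary (0x100), supplementary (0x800).
        if flag & 0x4 or flag & 0x100 or flag & 0x800:
            continue
        if rname.startswith(prefix):
            counts[rname] += 1
    return counts
```

Any non-zero counts from a sample that had no ERCC mix added would give a rough idea of the background to expect on each control.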
