Using iGenomes to run an RNA-seq analysis with TopHat/Cufflinks, but how do I add the ERCC sequences to the reference transcriptome and reference genome?
7.8 years ago
BioinfGuru ★ 1.7k

My task is to perform RNAseq analysis on the UCSC RN4 genome by first mapping the reads I have to the transcriptome and then mapping the leftover reads to the genome.

I have downloaded the genome from iGenomes (http://support.illumina.com/sequencing/sequencing_software/igenome.html) because it seems to have all files in the format required for TopHat/Bowtie.

My first problem is adding the ERCC sequence file (http://www3.appliedbiosystems.com/cms/groups/mcb_support/documents/generaldocuments/cms_095047.txt). I really don't know which files these sequences must be added to. I know they must be added to both the reference transcriptome and the genome, but I don't know where in the iGenomes download to find the appropriate files.

Advice would be much appreciated.

Thanks.

RNA-Seq Rna-seq analysis tophat ERCC mapping • 2.9k views

Thanks for your reply Genomax....here is what I have done:

Using the ERCC files with the iGenomes download:

  1. Append ERCC92.gtf to genes.gtf with the command "cat ERCC92.gtf >> genes.gtf"
  2. Append ERCC92.fa to genome.fa with the command "cat ERCC92.fa >> genome.fa"
  3. Add ERCC92.fa to the directory containing the chromosomes

Is this sufficient? How do I "rebuild all of the index files" as you suggest? Is this still needed?

Regards.


Rebuilding aligner indexes is needed since the pre-existing indexes do not have information about the additional sequences you added to the genome file.

Please do all these steps in a separate directory (move the appended genome and GTF files there). That way there is less chance of using files from the original iGenomes download by mistake.
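
As a rough sketch of that advice (the function name `append_ercc` and all paths are made up for illustration, nothing here ships with iGenomes), the combined files could be written into a fresh directory instead of appending in place with `>>`. Passing the files through `awk '1'` also guards against a missing final newline in genome.fa, which would otherwise glue the first ERCC header onto the last sequence line:

```shell
# Sketch only: function name and paths are assumptions, not part of iGenomes.
append_ercc() {
    # usage: append_ercc <genome.fa> <genes.gtf> <ERCC92.fa> <ERCC92.gtf> <outdir>
    genome_fa=$1; genes_gtf=$2; ercc_fa=$3; ercc_gtf=$4; outdir=$5
    mkdir -p "$outdir" || return 1
    # Write fresh copies so the original download is never modified.
    # awk '1' prints every line newline-terminated, even the last one.
    awk '1' "$genome_fa" "$ercc_fa" > "$outdir/genome.fa" || return 1
    awk '1' "$genes_gtf" "$ercc_gtf" > "$outdir/genes.gtf" || return 1
    # Print the number of sequences in the combined FASTA as a quick check.
    grep -c '^>' "$outdir/genome.fa"
}
```

Something like `append_ercc Sequence/WholeGenomeFasta/genome.fa genes.gtf ERCC92.fa ERCC92.gtf ercc_genome` would then leave the originals untouched; adjust the paths to wherever genes.gtf sits in your download.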

Are you planning to use TopHat (you should consider using HISAT2 instead)? If yes, then you would use the bowtie2-build program from Bowtie 2 (you will need to download it if you don't have it). Details are in the TopHat manual.
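
A minimal sketch of the rebuild, assuming the combined genome.fa lives in its own directory (the function name `build_bowtie2_index` is hypothetical). `bowtie2-build` takes the input FASTA followed by the index basename; using `genome` as the basename in the same directory keeps the index files next to genome.fa:

```shell
# Sketch only: function name and directory layout are assumptions.
build_bowtie2_index() {
    # usage: build_bowtie2_index <directory containing the combined genome.fa>
    dir=$1
    if ! command -v bowtie2-build >/dev/null 2>&1; then
        echo "bowtie2-build not found on PATH" >&2
        return 1
    fi
    # First argument: input FASTA. Second argument: basename for the index files.
    bowtie2-build "$dir/genome.fa" "$dir/genome"
}
```

After `build_bowtie2_index ercc_genome`, TopHat would be pointed at the new basename and the combined annotation, roughly `tophat -G ercc_genome/genes.gtf -o tophat_out ercc_genome/genome reads.fastq` (reads.fastq and tophat_out are placeholders). Check the option names against the manual for your TopHat version.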


Excellent...I have that....thank you so much. I just started the build programme. I'm guessing this is going to take a few hours. This is part of a University assignment so I have to use TopHat for now.

Thanks again

7.8 years ago
GenoMax 141k

If you need to add the ERCC sequences then they would have to be appended to the genome.fa file in the Sequence/WholeGenomeFasta directory.

Keep in mind that as soon as you do that you would need to rebuild all of the index files (or ones you are planning to use at least). You would still be able to use the GTF file for genome data but if you need to count the ERCC spike-in then information for those sequences would have to be added to the GTF files.
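
Counting only works if every sequence name in column 1 of the GTF also appears as a header in the FASTA. One way to check this after combining the files is sketched below; the function name `gtf_names_missing_from_fasta` is hypothetical, and it assumes the FASTA sequence name is the first word after `>`:

```shell
# Sketch only: the function name is hypothetical.
gtf_names_missing_from_fasta() {
    # usage: gtf_names_missing_from_fasta <genome.fa> <genes.gtf>
    # Prints each GTF sequence name (column 1) that has no matching FASTA header.
    awk '
        # First file (FASTA): remember the first word of every header line.
        NR == FNR { if (/^>/) { sub(/^>/, ""); seen[$1] = 1 } next }
        # Second file (GTF): skip comment lines and blank lines.
        /^#/ || NF == 0 { next }
        # Report each unknown sequence name once.
        !($1 in seen) && !reported[$1]++ { print $1 }
    ' "$1" "$2"
}
```

Empty output from `gtf_names_missing_from_fasta genome.fa genes.gtf` means every annotated sequence, ERCC entries included, has a matching FASTA record.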

Can you align/count to the ERCC spike-in separately (since those sequences would presumably be very different) using iGenomes files as they are?
