Expected Run Times for USeq MakeTranscriptome and NovoAlign
5.4 years ago

Dear community,

Does anybody have experience with run times for RNA-Seq alignment with NovoAlign? In particular, USeq MakeTranscriptome, which creates the annotations as suggested in [1], seems to take quite a long time. It would help me to have some estimate of how long the alignment, or at least some of its steps, takes to finish, or alternatively how large the resulting files are.

I am building the transcriptome annotations for both hg19 and hg38 with the following options:

```
java -jar /opt/useq/Apps/MakeTranscriptome -f <path-to-hg19-fasta.gz-files-per-chromosome-from-ucsc-golden-path> -u <path-to-refFlat-txt-from-ucsc-genome-browser> -r 96 -n 60000 -m 10 -s
```

This has now been running for three days on hg19. Afterwards, I will run NovoAlign with the following commands:

```
novoindex -n hg19 <path-for-index> <path-to-masked-genome> <path-to-transcriptome-file-1> <path-to-transcriptome-file-2>
```

```
novoalign -o SAM -f <forward-fastq> <reverse-fastq> -d <path-to-index> -r All 10 -v 0 70 70 '[>]([^:]*)'
```
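The quoted regular expression at the end of the novoalign command captures everything after the `>` of a FASTA header up to the first colon. As far as I understand (please check the `-v` entry in the NovoAlign documentation), novoalign uses this captured group to decide which reference sequences belong to the same chromosome. The sketch below only shows what the pattern captures, using Python's `re` as a stand-in for novoalign's regex engine; the header lines are hypothetical.

```python
import re

# The pattern quoted at the end of the novoalign command above.
GROUP_PATTERN = re.compile(r"[>]([^:]*)")

# Hypothetical FASTA header lines: a plain chromosome and two junction-style
# names. The "chrom:start-end_start-end" layout is an assumption for illustration.
headers = [">chr1", ">chr1:14829-14970_15038-15115", ">chrX:100-196_300-396"]

for h in headers:
    match = GROUP_PATTERN.search(h)
    print(f"{h!r} -> {match.group(1)!r}")
```

If the assumption holds, the genome sequence `chr1` and any transcriptome sequence whose name starts with `chr1:` fall into the same group.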

Finally, I will need to convert the transcriptome coordinates back to genomic coordinates:

```
java -jar /opt/useq/Apps/SamTranscriptomeParser -f <path-to-sam-file> -a 50000 -n 100 -u
```
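Converting coordinates means mapping a position on a spliced sequence (several genomic blocks concatenated) back onto the genome. A minimal sketch of that lookup, assuming a hypothetical naming scheme `chrom:start1-end1_start2-end2` with 0-based, half-open blocks, is shown below. This is not SamTranscriptomeParser's implementation, and USeq's real sequence names may differ.

```python
from typing import List, Tuple

# ASSUMPTION (hypothetical format): a junction/transcript sequence is named
# "chrom:start1-end1_start2-end2...", listing the genomic blocks that were
# concatenated, in 0-based half-open coordinates. USeq's real naming may differ.

def parse_blocks(name: str) -> Tuple[str, List[Tuple[int, int]]]:
    """Split a name such as 'chr1:100-200_300-400' into ('chr1', [(100, 200), (300, 400)])."""
    chrom, spans = name.split(":", 1)
    blocks = []
    for span in spans.split("_"):
        start, end = span.split("-")
        blocks.append((int(start), int(end)))
    return chrom, blocks


def to_genome(name: str, offset: int) -> Tuple[str, int]:
    """Map a 0-based offset in the concatenated sequence to a 0-based genomic position."""
    chrom, blocks = parse_blocks(name)
    remaining = offset
    for start, end in blocks:
        length = end - start
        if remaining < length:
            return chrom, start + remaining
        remaining -= length
    raise ValueError(f"offset {offset} lies beyond the end of {name}")


print(to_genome("chr1:100-200_300-400", 0))    # ('chr1', 100)
print(to_genome("chr1:100-200_300-400", 150))  # ('chr1', 350)
```

A real converter also has to rewrite the CIGAR string (an `N` operation for the intron between blocks) and handle reverse-strand sequences; this sketch covers only the position lookup.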

I am running everything on a machine with 8x Intel(R) Xeon(R) CPU E7-8870 @ 2.40GHz and 48 GB RAM, for a GM12878 data set with approximately 118,000,000 paired reads [2].

I would greatly appreciate estimates for any of these steps!

Cheers, Tamara

[1] [NovoAlign User Guide](http://www.novocraft.com/documentation/novoalign-2/novoalign-user-guide/rnaseq-analysis-mrna-and-the-spliceosome/)

[2] [ENCODE experiment ENCSR000COQ (GM12878)](https://www.encodeproject.org/experiments/ENCSR000COQ/)

5.3 years ago

OK, I'll answer this myself in case anyone is interested in the future. Note that I had to modify the setup further, as described below.

TL;DR: with annotations other than the ones recommended in the user guide, the whole process would take significantly more than two days in total.

MaskExonsInFastaFiles

Although not listed in the question, this step is recommended by the user guide [1]; it takes only two minutes.

MakeTranscriptome

novoindex did not work with transcriptome files built from the RefFlat annotations [2] referenced in the user guide, because the resulting files were too large. Instead, I used the BED annotation from Baruzzo et al. [3], converted to RefFlat with their scripts [4]; MakeTranscriptome took 25 hours on this annotation. The resulting files need to be processed with a Perl script from the user guide to remove duplicates anyway, so I omitted the -s option of MakeTranscriptome. The Perl script took 33 minutes.
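The user guide's Perl script is not reproduced here. As a hedged illustration only, assuming "duplicates" means FASTA records with identical sequences, a Python stand-in that keeps the first record per sequence could look like this; `read_fasta` and `dedupe_by_sequence` are hypothetical names, not part of USeq or NovoAlign.

```python
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, str]  # (header line, sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) records from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def dedupe_by_sequence(records: Iterable[Record]) -> Iterator[Record]:
    """Keep only the first record for each distinct sequence (compared case-insensitively)."""
    seen = set()
    for header, seq in records:
        key = seq.upper()  # ignore soft-masking (lowercase) when comparing
        if key not in seen:
            seen.add(key)
            yield header, seq


fasta = [">a", "ACGT", "AC", ">b", "acgtac", ">c", "TTTT"]
for header, seq in dedupe_by_sequence(read_fasta(fasta)):
    print(header, seq)
```

For genome-scale junction files, storing a hash of each sequence (for example with `hashlib.sha1`) instead of the sequence itself would reduce memory use.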

novoindex

novoindex takes 28 minutes with the masked genome and annotation files as input (two minutes without the annotation files).
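Adding up the preparation steps reported so far gives the time spent before any read is aligned. This small tally uses the wall-clock times from this answer, assuming the 25 hours refers to the MakeTranscriptome run:

```python
# Measured wall-clock minutes for the preparation steps reported in this answer.
PREP_MINUTES = {
    "MaskExonsInFastaFiles": 2,
    "MakeTranscriptome (Baruzzo et al. annotation)": 25 * 60,
    "Perl duplicate-removal script": 33,
    "novoindex (masked genome + annotations)": 28,
}

total = sum(PREP_MINUTES.values())
hours, minutes = divmod(total, 60)
print(f"preparation before alignment: {total} min = {hours} h {minutes} min")
```

So roughly 26 hours pass before novoalign even starts.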

novoalign

I tested novoalign on a simulated data set with 10,000,000 paired-end reads, also provided by Baruzzo et al. [5]; the alignment took two hours without annotations. With annotations, I aborted the run after 24 hours because I ran out of time, so I did not get to test SamTranscriptomeParser.
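To relate the test run to the GM12878 data set from the question, one can extrapolate the measured times. This sketch assumes that run time scales linearly with the number of read pairs and that both counts refer to pairs on the same machine; real data may behave differently from simulated reads.

```python
# Linear extrapolation of novoalign wall-clock time to the GM12878 data set.
# ASSUMPTION: run time scales linearly with the number of read pairs, same machine.

MEASURED_PAIRS = 10_000_000   # simulated Baruzzo et al. data set
TARGET_PAIRS = 118_000_000    # GM12878 data set (ENCSR000COQ), approximate


def extrapolate_hours(measured_hours: float, measured_pairs: int, target_pairs: int) -> float:
    """Scale a measured run time by the ratio of read-pair counts."""
    return measured_hours * target_pairs / measured_pairs


without_annotations = extrapolate_hours(2.0, MEASURED_PAIRS, TARGET_PAIRS)
# The annotated run was aborted after 24 h, so this is only a lower bound.
with_annotations_lower_bound = extrapolate_hours(24.0, MEASURED_PAIRS, TARGET_PAIRS)

print(f"without annotations: ~{without_annotations:.1f} h")
print(f"with annotations: > {with_annotations_lower_bound:.1f} h "
      f"(~{with_annotations_lower_bound / 24:.1f} days)")
```

Under these assumptions, aligning the full data set would take about a day without annotations and well over a week with them.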

[1] [NovoAlign User Guide](http://www.novocraft.com/documentation/novoalign-2/novoalign-user-guide/rnaseq-analysis-mrna-and-the-spliceosome/)

[2] [UCSC RefFlat annotations](http://genome.ucsc.edu/cgi-bin/hgTables?hgsid=212485719&clade=mammal&org=Human&db=hg19&hgta_group=genes&hgta_track=refGene&hgta_table=refFlat&hgta_regionType=genome&position=chr21%3A33031597-33041570&hgta_outputType=all&hgta_outFileName=refFlat.txt.gz)

[3] Baruzzo et al. annotations

[4] [Baruzzo et al. conversion scripts](https://bitbucket.org/baruz/aligner-benchmark/src/b40bdc7a34dc/scripts/?at=master)

[5] [Baruzzo et al. data set](http://bp1.s3.amazonaws.com/human_t1r1.tar.bz2)
