Tophat Protocol - nature paper
1
0
Entering edit mode
9.4 years ago
David_emir ▴ 490

Hi All,

I was doing differential gene and transcript expression analysis by following the Tophat protocol (published in Nature protocol in March 2012) on the same example D. melanogaster as it is mentioned in the paper. But I am confused of an error/warning message at step 1 itself: align the RNA -seq reads to the genome using the command:

[root@BIO-DT-415 RNA_SEQ]# tophat -p 8 -G genes.gtf -o C1_R1_thout genome C1_R1_1.fq C1_R1_2.fq

Here, genes.gtf was collected from iGenome package of Ensembl_BDGP5.25 of the species. The reference genome "genome.fa" was also from the same package.

After executing the command , I am getting the following error:

`[2014-11-25 14:05:25] Beginning TopHat run (v2.0.13)
-----------------------------------------------
[2014-11-25 14:05:25] Checking for Bowtie
          Bowtie version:     2.2.4.0
[2014-11-25 14:05:25] Checking for Bowtie index files (genome)..
[2014-11-25 14:05:25] Checking for reference FASTA file
[2014-11-25 14:05:25] Generating SAM header for genome
IOError: [Errno 2] No such file or directory: 'C1_R1_1.fq'

I fail to understand what went wrong, please let me know how to create 'C1_R1_1.fq' .

I have following files from the RAW data downloaded from GEO: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE32038

NGS/RNA_SEQ/GSE32038_RAW/GSM794483_C1_R1.accepted_hits.bam
NGS/RNA_SEQ/GSE32038_RAW/GSM794483_C1_R1.transcripts.gtf.gz
NGS/RNA_SEQ/GSE32038_RAW/GSM794484_C1_R2.accepted_hits.bam
NGS/RNA_SEQ/GSE32038_RAW/GSM794484_C1_R2.transcripts.gtf.gz
NGS/RNA_SEQ/GSE32038_RAW/GSM794485_C1_R3.accepted_hits.bam
NGS/RNA_SEQ/GSE32038_RAW/GSM794485_C1_R3.transcripts.gtf.gz
NGS/RNA_SEQ/GSE32038_RAW/GSM794486_C2_R1.accepted_hits.bam
NGS/RNA_SEQ/GSE32038_RAW/GSM794486_C2_R1.transcripts.gtf.gz
NGS/RNA_SEQ/GSE32038_RAW/GSM794487_C2_R2.accepted_hits.bam
NGS/RNA_SEQ/GSE32038_RAW/GSM794487_C2_R2.transcripts.gtf.gz
NGS/RNA_SEQ/GSE32038_RAW/GSM794488_C2_R3.accepted_hits.bam
NGS/RNA_SEQ/GSE32038_RAW/GSM794488_C2_R3.transcripts.gtf.gz

I am not sure or I am totally lost on how to get this working. Please let me know your suggestions, I am totally new to this right now I am sounding very stupid, but I am totally helpless. Need your views guys.

Thanks a ton!

Ateeq

RNA-Seq • 4.5k views
ADD COMMENT
5
Entering edit mode
9.4 years ago

You actually wanted to download this file. The file names seem to have been chosen a bit poorly, so I'm surprised more people haven't run into this exact problem (well, they probably have, but then just gave up rather than asking for help as you wisely did).

Let me know if that file doesn't have the correct fastq files in it and I'll have a look again.

ADD COMMENT
0
Entering edit mode

Hi Devon !!!

Thanks a lot ! i got the files and its perfectly working till now . I am sure i will get few more doubts, i will come back here again.

Thank you so much, Its great help for me .

ADD REPLY
0
Entering edit mode

Hi Devon,

I tried aligning the RNA-seq reads to the genome using tophat, used the following command

[root@BIO-DT-415 RNA_SEQ]# tophat -p 8 -G genes.gtf -o C1_R1_thout genome GSM794483_C1_R1_1.fq GSM794483_C1_R1_2.fq

It's throwing error:

[2014-11-26 16:00:53] Building transcriptome data files C1_R1_thout/tmp/genes
[2014-11-26 16:00:57] Building Bowtie index from genes.fa
[2014-11-26 16:04:35] Mapping left_kept_reads to transcriptome genes with Bowtie2
[2014-11-26 16:15:06] Mapping right_kept_reads to transcriptome genes with Bowtie2
[2014-11-26 16:25:31] Resuming TopHat pipeline with unmapped reads
[2014-11-26 16:25:31] Reporting output tracks
    [FAILED]
Error running /usr/local/bin/tophat_reports --min-anchor 8 --splice-mismatches 0 --min-report-intron 50 --max-report-intron 500000 --min-isoform-fraction 0.15 --output-dir C1_R1_thout/ --max-multihits 20 --max-seg-multihits 40 --segment-length 25 --segment-mismatches 2 --min-closure-exon 100 --min-closure-intron 50 --max-closure-intron 5000 --min-coverage-intron 50 --max-coverage-intron 20000 --min-segment-intron 50 --max-segment-intron 500000 --read-mismatches 2 --read-gap-length 2 --read-edit-dist 2 --read-realign-edit-dist 3 --max-insertion-length 3 --max-deletion-length 3 -z gzip -p8 --inner-dist-mean 50 --inner-dist-std-dev 20 --gtf-annotations genes.gtf --gtf-juncs C1_R1_thout/tmp/genes.juncs --no-closure-search --no-coverage-search --no-microexon-search --sam-header C1_R1_thout/tmp/genome_genome.bwt.samheader.sam --report-discordant-pair-alignments --report-mixed-alignments --samtools=/usr/local/bin/samtools_0.1.18 --bowtie2-max-penalty 6 --bowtie2-min-penalty 2 --bowtie2-penalty-for-N 1 --bowtie2-read-gap-open 5 --bowtie2-read-gap-cont 3 --bowtie2-ref-gap-open 5 --bowtie2-ref-gap-cont 3 genome.fa C1_R1_thout/junctions.bed C1_R1_thout/insertions.bed C1_R1_thout/deletions.bed C1_R1_thout/fusions.out C1_R1_thout/tmp/accepted_hits C1_R1_thout/tmp/left_kept_reads.m2g.bam C1_R1_thout/tmp/left_kept_reads.bam C1_R1_thout/tmp/right_kept_reads.m2g.bam C1_R1_thout/tmp/right_kept_reads.bam
Loaded 51950 junctions

I am not able to decode this, can you help me please ?

Thanks a lot.

-Ateeq

ADD REPLY
2
Entering edit mode

Usually that happens when you run out of memory. The only real way to diagnose why it failed is to run that line manually yourself and see what the underlying error message is that it reports. This is assuming that all of the temp. files are still there.

ADD REPLY
0
Entering edit mode

Thanks Devon, thanks for coming to my rescue again! I have increased the RAM size to 16 gigs, now it's running fine. Hoping to finish this protocol by this week! I am sure I will be back again to this forum. Thank you so much, it's great help for me!

ADD REPLY

Login before adding your answer.

Traffic: 2585 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6