Question: RNA seq NGS data analysis
3.0 years ago, poonam.bi01 wrote:

Hello, I am running TopHat to align the query sequences against the reference genome sequence with this command, and I am getting this error:

tophat2 -o SRR643880.sra_out --num-threads 5 --segment-length 18 GCF_000002495 SRR643880.fastq 

 Beginning TopHat run (v2.0.13)
-----------------------------------------------
 Checking for Bowtie
Bowtie version:2.2.4.0
Checking for Bowtie index files (genome)..   
Checking for reference FASTA file
Warning: Could not find FASTA file GCF_000002495.fa
Reconstituting reference FASTA file from Bowtie index
Executing: /usr/bin/bowtie2-inspect GCF_000002495 > SRR643880.sra_out/tmp/GCF_000002495.fa
Generating SAM header for GCF_000002495
Preparing reads
left reads: min. length=36, max. length=36, 12103256 kept reads (9047 discarded)
 Mapping left_kept_reads to genome GCF_000002495 with Bowtie2 
 Mapping left_kept_reads_seg1 to genome GCF_000002495 with Bowtie2 (1/2)
 Mapping left_kept_reads_seg2 to genome GCF_000002495 with Bowtie2 (2/2)
 Searching for junctions via segment mapping
Coverage-search algorithm is turned on, making this step very slow
Please try running TopHat again with the option (--no-coverage-search) if this step takes too much time or memory.
 Retrieving sequences for splices
 Indexing splices
Building a SMALL index
Mapping left_kept_reads_seg1 to genome segment_juncs with Bowtie2 (1/2)
Mapping left_kept_reads_seg2 to genome segment_juncs with Bowtie2 (2/2)
Joining segment hits
Reporting output tracks
[FAILED]
Error running /usr/bin/tophat_reports --min-anchor 8 --splice-mismatches 0 --min-report-intron 50 --max-report-intron 500000 --min-isoform-fraction 0.15 --output-dir SRR643880.sra_out/ --max-multihits 20 --max-seg-multihits 40 --segment-length 18 --segment-mismatches 2 --min-closure-exon 100 --min-closure-intron 50 --max-closure-intron 5000 --min-coverage-intron 50 --max-coverage-intron 20000 --min-segment-intron 50 --max-segment-intron 500000 --read-mismatches 2 --read-gap-length 2 --read-edit-dist 2 --read-realign-edit-dist 3 --max-insertion-length 3 --max-deletion-length 3 -z gzip -p5 --no-closure-search --no-microexon-search --sam-header SRR643880.sra_out/tmp/GCF_000002495_genome.bwt.samheader.sam --report-discordant-pair-alignments --report-mixed-alignments --samtools=/usr/bin/samtools_0.1.18 --bowtie2-max-penalty 6 --bowtie2-min-penalty 2 --bowtie2-penalty-for-N 1 --bowtie2-read-gap-open 5 --bowtie2-read-gap-cont 3 --bowtie2-ref-gap-open 5 --bowtie2-ref-gap-cont 3 SRR643880.sra_out/tmp/GCF_000002495.fa SRR643880.sra_out/junctions.bed SRR643880.sra_out/insertions.bed SRR643880.sra_out/deletions.bed SRR643880.sra_out/fusions.out SRR643880.sra_out/tmp/accepted_hits SRR643880.sra_out/tmp/left_kept_reads.mapped.bam,SRR643880.sra_out/tmp/left_kept_reads.candidates.bam SRR643880.sra_out/tmp/left_kept_reads.bam
Warning: no input BAM records found.
rna-seq • 1.4k views
modified 2.9 years ago by genomax • written 3.0 years ago by poonam.bi01

@poonam.bi01: You are not running the latest versions of TopHat/Bowtie. With the Tuxedo programs you should generally use the latest versions, since issues like this may have been fixed in newer releases.
Another thing to check is that you are not running out of storage space or hitting a quota.
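
A minimal sketch of the version check, assuming `tophat2` and `bowtie2` are the executable names on your PATH (as in the log above). The helper name `report_version` is made up for this example.

```shell
#!/usr/bin/env bash
# Print the first line of a tool's --version output,
# or a note if the tool is not on PATH.
report_version() {
    tool="$1"
    if command -v "$tool" >/dev/null 2>&1; then
        "$tool" --version 2>&1 | head -n 1
    else
        echo "$tool: not found on PATH"
    fi
}

report_version tophat2
report_version bowtie2
```

Compare the reported versions with the current releases listed on each tool's download page before re-running.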

modified 2.9 years ago • written 2.9 years ago by genomax

Where's the error? Did you get any outputs?

Are you referring to "Warning: Could not find FASTA file GCF_000002495.fa"?

written 3.0 years ago by jotan

I am getting the warning message "Warning: no input BAM records found."

written 3.0 years ago by poonam.bi01

Sorry, didn't see that at the end.

Did you get any output files? Tophat writes out temporary files.
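
One way to answer this is to list everything TopHat left in the output directory and flag zero-byte files. This is only a sketch: `list_outputs` is a hypothetical helper, and the directory name is taken from the `-o` option in the command above.

```shell
#!/usr/bin/env bash
out_dir="SRR643880.sra_out"   # output directory from the tophat2 command above

# List every file under a directory with its size in bytes,
# marking files that are completely empty.
list_outputs() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "no such directory: $dir"
        return 1
    fi
    find "$dir" -type f | sort | while IFS= read -r f; do
        size=$(wc -c < "$f" | tr -d ' ')
        if [ "$size" -eq 0 ]; then
            echo "EMPTY  $f"
        else
            echo "$size  $f"
        fi
    done
}

list_outputs "$out_dir" || true
```

A size check is coarse: a BAM file that holds only a header is not zero bytes. If samtools is installed, `samtools view -c <file.bam>` counts the records in a given BAM file, which is a more direct test.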

written 3.0 years ago by jotan

3.0 years ago, kanika.151 (United States) wrote:

Can you please post the command you used to create the Bowtie index?

Is it single-end or paired-end data?

Also, make sure that you have enough memory and disk space available on the server when you run TopHat2. It may be that the tmp files were created empty because there was no space left to store their contents.
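
A rough sketch of these checks, assuming a Linux server. `avail_kb` is a hypothetical helper; `free` and `quota` may not be installed everywhere, so both are guarded.

```shell
#!/usr/bin/env bash
# Available space, in KB, on the filesystem that holds the given path.
avail_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Run this from the directory where TopHat writes its output.
echo "free space here (KB): $(avail_kb .)"

# Memory summary in MB; `free` is Linux-specific, so skip it elsewhere.
if command -v free >/dev/null 2>&1; then
    free -m
fi

# Per-user disk quota, if the quota tools are installed.
if command -v quota >/dev/null 2>&1; then
    quota -s || true
fi
```

Running it once before and once after a TopHat run shows whether free space drops close to zero while the temporary files are being written.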

written 3.0 years ago by kanika.151

bowtie2-build -f GCF_000005425.2_Build_4.0_genomic.fna GCF_000005425

I used this command to build the Bowtie index, and there is no memory problem.

How can I tell whether the data is single-end or paired-end?
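
bowtie2-build writes six files for a small index, named after the basename: `.1.bt2`, `.2.bt2`, `.3.bt2`, `.4.bt2`, `.rev.1.bt2` and `.rev.2.bt2`. The sketch below checks that all six exist for the basename passed to tophat2; `check_bt2_index` is a hypothetical helper. Note that the basename in the build command above (GCF_000005425) differs from the one used with tophat2 (GCF_000002495); whether that is a typo in the post is unclear.

```shell
#!/usr/bin/env bash
# Report which of the six small-index files exist for a Bowtie 2 basename.
# Large indexes use the .bt2l extension instead and are not covered here.
# The return code is the number of missing files (0 means complete).
check_bt2_index() {
    base="$1"
    missing=0
    for suffix in 1.bt2 2.bt2 3.bt2 4.bt2 rev.1.bt2 rev.2.bt2; do
        if [ -e "$base.$suffix" ]; then
            echo "found    $base.$suffix"
        else
            echo "missing  $base.$suffix"
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}

check_bt2_index GCF_000002495 || echo "index for GCF_000002495 is incomplete in $(pwd)"
```

The "Could not find FASTA file GCF_000002495.fa" warning in the log goes away if a FASTA file with the same basename sits next to the index. TopHat itself reconstitutes one with `bowtie2-inspect GCF_000002495 > GCF_000002495.fa`, so running that once in the index directory should have the same effect.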

written 3.0 years ago by poonam.bi01

Paired-end files usually follow the naming convention filename_1 and filename_2.

If these are your own data, ask the person who created them.

If this is downloaded data, check the documentation.

If downloaded as an SRA file, use the --split-3 option of the SRA Toolkit's fastq-dump (fastq-dump --split-3 filename.sra).
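
The checks above can be sketched as follows. With `--split-3`, paired reads go to `_1.fastq` and `_2.fastq` files, while unpaired reads go to a single `.fastq` file. The file name `SRR643880.sra` and the helpers `count_reads` and `first_header` are assumptions made for this example.

```shell
#!/usr/bin/env bash
# Count reads in an uncompressed FASTQ file, assuming 4 lines per record.
count_reads() {
    awk 'END { print NR / 4 }' "$1"
}

# Print the first read header; some pipelines append /1 or /2 to mark mates.
first_header() {
    head -n 1 "$1"
}

# Assumption: SRR643880.sra sits in the current directory and the SRA Toolkit
# is installed; otherwise this step is skipped.
if command -v fastq-dump >/dev/null 2>&1 && [ -f SRR643880.sra ]; then
    fastq-dump --split-3 SRR643880.sra
    # A single SRR643880.fastq suggests single-end data;
    # an SRR643880_1.fastq / SRR643880_2.fastq pair suggests paired-end.
    ls SRR643880*.fastq
fi

if [ -f SRR643880.fastq ]; then
    echo "reads:  $(count_reads SRR643880.fastq)"
    echo "header: $(first_header SRR643880.fastq)"
fi
```

For a paired-end run, the two files should contain the same number of reads, which `count_reads` can confirm.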

modified 3.0 years ago • written 3.0 years ago by jotan

And for TopHat I used this command:

tophat2 -o SRR643880.sra_out --num-threads 5 --segment-length 18 GCF_000002495 SRR643880.fastq

[2016-04-12 18:53:56] Beginning TopHat run (v2.0.13)
-----------------------------------------------
[2016-04-12 18:53:56] Checking for Bowtie
          Bowtie version:    2.2.4.0
[2016-04-12 18:53:56] Checking for Bowtie index files (genome)..
[2016-04-12 18:53:56] Checking for reference FASTA file
    Warning: Could not find FASTA file GCF_000002495.fa
[2016-04-12 18:53:56] Reconstituting reference FASTA file from Bowtie index
  Executing: /usr/bin/bowtie2-inspect GCF_000002495 > SRR643880.sra_out/tmp/GCF_000002495.fa
[2016-04-12 18:53:57] Generating SAM header for GCF_000002495
[2016-04-12 18:53:57] Preparing reads
     left reads: min. length=36, max. length=36, 12103256 kept reads (9047 discarded)
[2016-04-12 18:54:55] Mapping left_kept_reads to genome GCF_000002495 with Bowtie2 
[2016-04-12 18:57:57] Mapping left_kept_reads_seg1 to genome GCF_000002495 with Bowtie2 (1/2)
[2016-04-12 19:00:18] Mapping left_kept_reads_seg2 to genome GCF_000002495 with Bowtie2 (2/2)
[2016-04-12 19:02:07] Searching for junctions via segment mapping
    Coverage-search algorithm is turned on, making this step very slow
    Please try running TopHat again with the option (--no-coverage-search) if this step takes too much time or memory.
[2016-04-12 19:05:13] Retrieving sequences for splices
[2016-04-12 19:05:14] Indexing splices
Building a SMALL index
[2016-04-12 19:05:15] Mapping left_kept_reads_seg1 to genome segment_juncs with Bowtie2 (1/2)
[2016-04-12 19:06:40] Mapping left_kept_reads_seg2 to genome segment_juncs with Bowtie2 (2/2)
[2016-04-12 19:11:19] Joining segment hits
[2016-04-12 19:12:52] Reporting output tracks
    [FAILED]
Error running /usr/bin/tophat_reports --min-anchor 8 --splice-mismatches 0 --min-report-intron 50 --max-report-intron 500000 --min-isoform-fraction 0.15 --output-dir SRR643880.sra_out/ --max-multihits 20 --max-seg-multihits 40 --segment-length 18 --segment-mismatches 2 --min-closure-exon 100 --min-closure-intron 50 --max-closure-intron 5000 --min-coverage-intron 50 --max-coverage-intron 20000 --min-segment-intron 50 --max-segment-intron 500000 --read-mismatches 2 --read-gap-length 2 --read-edit-dist 2 --read-realign-edit-dist 3 --max-insertion-length 3 --max-deletion-length 3 -z gzip -p5 --no-closure-search --no-microexon-search --sam-header SRR643880.sra_out/tmp/GCF_000002495_genome.bwt.samheader.sam --report-discordant-pair-alignments --report-mixed-alignments --samtools=/usr/bin/samtools_0.1.18 --bowtie2-max-penalty 6 --bowtie2-min-penalty 2 --bowtie2-penalty-for-N 1 --bowtie2-read-gap-open 5 --bowtie2-read-gap-cont 3 --bowtie2-ref-gap-open 5 --bowtie2-ref-gap-cont 3 SRR643880.sra_out/tmp/GCF_000002495.fa SRR643880.sra_out/junctions.bed SRR643880.sra_out/insertions.bed SRR643880.sra_out/deletions.bed SRR643880.sra_out/fusions.out SRR643880.sra_out/tmp/accepted_hits SRR643880.sra_out/tmp/left_kept_reads.mapped.bam,SRR643880.sra_out/tmp/left_kept_reads.candidates.bam SRR643880.sra_out/tmp/left_kept_reads.bam
Warning: no input BAM records found.
modified 2.9 years ago by genomax • written 3.0 years ago by poonam.bi01

2.9 years ago, kanika.151 (United States) wrote:

What do you have in your output directory?

Your data seems to have been downloaded from the SRA website. The description there states whether the data is PE or SE, where PE stands for paired-end and SE stands for single-end.

Your SRR643880.fastq could be the left or the right FASTQ file of a PE dataset. If that is the case, you need to find the other half. I think it is a paired-end file and this is the right FASTQ file. You need to find the left one, since nothing got aligned to the genome and those alignments should come from left.fastq.

command for PE data:

/opt/tophat2.10/tophat2 -p 8 -o tophat_out <indexsuffixfilename> <left.fastq> <right.fastq>

written 2.9 years ago by kanika.151

Try to keep discussion as comments (do not post a new answer) unless you are offering a new answer.


The dataset in question is not a PE dataset. SRR643880 is SE.

written 2.9 years ago by genomax