Question: tophat2 syntax for the reference index
8 days ago by
United Kingdom
marongiu.luigi90 wrote:

Hello,

I want to use TopHat to align reads for RNA-seq analysis. I downloaded the sequences SRR390728_1.fastq and SRR390728_2.fastq and quality-trimmed them with

java -jar /usr/bin/trimmomatic PE SRR390728_1.fastq SRR390728_2.fastq paired1.fq unpaired_1.fq paired2.fq unpaired_2.fq SLIDINGWINDOW:4:20 MINLEN:20 ILLUMINACLIP:/usr/local/lib/Trimmomatic/adapters/TruSeq2-PE.fa:2:30:10:1:true

and then downloaded the reference human sequences Homo_sapiens.GRCh38.dna.toplevel.fa and Homo_sapiens.GRCh38.90.gtf, which I renamed Hsapiens_GRCh38.fa and Hsapiens_GRCh38.gtf respectively. I then indexed them with

bowtie2-build -f  Hsapiens_GRCh38.fa Hsapiens_GRCh38
tophat2 -G Hsapiens_GRCh38.gtf --transcriptome-index=Hsapiens_GRCh38.tr Hsapiens_GRCh38

and then copied everything into the same folder, so that the content of the working folder is:

$ ls
align.sam               Hsapiens_GRCh38.rev.1.bt2l  SRR390728_2.fastq
Hsapiens_GRCh38.1.bt2l  Hsapiens_GRCh38.rev.2.bt2l 
Hsapiens_GRCh38.2.bt2l  Hsapiens_GRCh38.tr          
Hsapiens_GRCh38.3.bt2l  paired1.fq                  unpaired_1.fq
Hsapiens_GRCh38.4.bt2l  paired2.fq                  unpaired_2.fq
Hsapiens_GRCh38.fa      Hsapiens_GRCh38.gtf     SRR390728_1.fastq

with Hsapiens_GRCh38.tr being a folder.

Then I ran the alignment with:

$ tophat2 -o SRR390728_aln --transcriptome-index=Hsapiens_GRCh38.tr Hsapiens_GRCh38 paired1.fq paired2.fq

[2017-10-12 11:26:19] Beginning TopHat run (v2.1.1)
-----------------------------------------------
[2017-10-12 11:26:19] Checking for Bowtie
          Bowtie version:    2.2.9.0
[2017-10-12 11:26:19] Checking for Bowtie index files (transcriptome)..
Error: Could not find Bowtie 2 index files Hsapiens_GRCh38.tr.*.bt2l)

Then I gave the index as a path by adding './' and got:

$ tophat2 -o SRR390728_aln --transcriptome-index=./Hsapiens_GRCh38.tr Hsapiens_GRCh38 paired1.fq paired2.fq
[2017-10-12 11:27:36] Beginning TopHat run (v2.1.1)
-----------------------------------------------
[2017-10-12 11:27:36] Checking for Bowtie
          Bowtie version:    2.2.9.0
[2017-10-12 11:27:37] Checking for Bowtie index files (genome)..
[2017-10-12 11:27:37] Checking for reference FASTA file
[2017-10-12 11:27:37] Generating SAM header for Hsapiens_GRCh38
Error: Opening file ./Hsapiens_GRCh38.tr.gff

The content of Hsapiens_GRCh38.tr is:

./Hsapiens_GRCh38.tr$ ls
Hsapiens_GRCh38.1.bt2  Hsapiens_GRCh38.fa         Hsapiens_GRCh38.rev.2.bt2
Hsapiens_GRCh38.2.bt2  Hsapiens_GRCh38.fa.tlst    Hsapiens_GRCh38.ver
Hsapiens_GRCh38.3.bt2  Hsapiens_GRCh38.gff
Hsapiens_GRCh38.4.bt2  Hsapiens_GRCh38.rev.1.bt2

My questions are:

  1. Have I made an error in the syntax? (Where does the name Hsapiens_GRCh38.tr.gff come from?)

  2. Is it possible to give a path to the .tr folder and the other indices, so that I can use a single folder for all the alignments?

Thank you

rna-seq • 134 views
modified 7 days ago by e.rempel510 • written 8 days ago by marongiu.luigi90

I want to use Tophat to align reading for RNA-seq analysis.

The only reason to continue using TopHat at this point is nostalgia.

There are much better and more accurate tools that you should switch to. Even the authors of TopHat suggest using HISAT2 (their newer tool). STAR, BBMap, or any other splice-aware aligner are excellent alternatives.

written 8 days ago by genomax34k

I know, it is nostalgia indeed. I am already switching to STAR and HISAT, but I wanted to get to know TopHat for completeness. The index was built in the folder above Hsapiens_GRCh38.tr (the latter was actually created by TopHat).

written 8 days ago by marongiu.luigi90

Was the transcriptome index built inside the Hsapiens_GRCh38.tr folder? Are the missing files in there?

written 8 days ago by genomax34k

The indexing appeared to complete fine; there were no error messages. I listed the files in use above, and it does not look to me as if anything is missing.

written 8 days ago by marongiu.luigi90
7 days ago by
e.rempel510
Germany, Heidelberg, COS
e.rempel510 wrote:

OK, let's start from the beginning. First, I would update Bowtie2 to the current version (2.3.3.1). You must also check the .gtf and .fasta files for compatibility (same chromosome names, etc.).

Bowtie index (creates the hg38.* files):

bowtie2-build -f Hsapiens_GRCh38.fa hg38

The command for transcriptome index generation must be (creates the Hsapiens_GRCh38.* files inside Hsapiens_GRCh38.tr):

tophat2 -G Hsapiens_GRCh38.gtf --transcriptome-index=Hsapiens_GRCh38.tr/Hsapiens_GRCh38 hg38

TopHat2 run:

tophat2 -o SRR390728_aln --transcriptome-index=Hsapiens_GRCh38.tr/Hsapiens_GRCh38 hg38 paired1.fq paired2.fq
written 7 days ago by e.rempel510
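
The compatibility check suggested above (same chromosome names in the .gtf and .fasta files) can be sketched in shell as follows. This is only a minimal sketch, not part of TopHat or Bowtie: the function names fasta_seqnames, gtf_seqnames and missing_in_fasta are hypothetical helpers, and it assumes an uncompressed FASTA whose sequence name is the first word of each header line and a tab-separated GTF with '#' comment lines.

```shell
# Hypothetical helpers (not TopHat/Bowtie commands).

# Print the sorted, unique sequence names from FASTA headers:
# the text after '>' up to the first whitespace.
fasta_seqnames() {
    grep '^>' "$1" | sed 's/^>//' | awk '{print $1}' | sort -u
}

# Print the sorted, unique names in column 1 of a GTF,
# skipping comment lines that start with '#'.
gtf_seqnames() {
    grep -v '^#' "$1" | cut -f1 | sort -u
}

# Print the names used in the GTF ($2) that are absent from the FASTA ($1).
# Empty output means every annotated sequence exists in the reference.
# The body runs in a subshell so the temporary variables stay local.
missing_in_fasta() (
    fa_names=$(mktemp)
    gtf_names=$(mktemp)
    fasta_seqnames "$1" > "$fa_names"
    gtf_seqnames "$2" > "$gtf_names"
    # comm -13 keeps only the lines unique to the second file.
    comm -13 "$fa_names" "$gtf_names"
    rm -f "$fa_names" "$gtf_names"
)
```

For the files in this thread the call would be: missing_in_fasta Hsapiens_GRCh38.fa Hsapiens_GRCh38.gtf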

Thank you. I have upgraded Bowtie to 2.3.3.1 using the precompiled version for 64-bit Linux; the version I was using before, 2.2.9.0, was built from source with make. However, while the old version generated the index overnight, with the new one I had to stop the process after 48 h because it was still running. Is that normal? Could the slowdown be due to the precompiled version?

written 5 days ago by marongiu.luigi90

It worked. I built bowtie2 2.3.3.1 from source and this time everything went smoothly. I could even use a fixed folder where I am keeping the reference indices (/home/RefSeq/Human):

tophat2 -o SRR390728_aln --transcriptome-index=/home/RefSeq/Human/Hsapiens_GRCh38.tr/Hsapiens_GRCh38 /home/RefSeq/Human/hg38 paired1.fq paired2.fq
written 3 days ago by marongiu.luigi90
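
The shared reference folder used above can be wrapped in a small helper so that the paths are written only once. This is a sketch only: tophat_cmd and the REF_DIR variable are hypothetical names, the folder layout is assumed from the commands in this thread, and the function only prints the tophat2 command line instead of running it.

```shell
# Assumed layout, taken from the commands in this thread:
#   $REF_DIR/hg38.*                                  genome index (bowtie2-build)
#   $REF_DIR/Hsapiens_GRCh38.tr/Hsapiens_GRCh38.*    transcriptome index (tophat2 -G)
REF_DIR=${REF_DIR:-/home/RefSeq/Human}

# Hypothetical helper: print the tophat2 command for one paired-end sample.
#   $1 = output directory, $2 = mate 1 reads, $3 = mate 2 reads
tophat_cmd() {
    printf 'tophat2 -o %s --transcriptome-index=%s %s %s %s\n' \
        "$1" \
        "$REF_DIR/Hsapiens_GRCh38.tr/Hsapiens_GRCh38" \
        "$REF_DIR/hg38" \
        "$2" "$3"
}
```

Calling tophat_cmd SRR390728_aln paired1.fq paired2.fq prints the same command as in the post above, which can then be inspected before it is run.
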
8 days ago by
e.rempel510
Germany, Heidelberg, COS
e.rempel510 wrote:

You should include the prefix Hsapiens_GRCh38 in the path to the index:

--transcriptome-index=Hsapiens_GRCh38.tr/Hsapiens_GRCh38
written 8 days ago by e.rempel510
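
The error messages in this thread suggest that TopHat treats the --transcriptome-index value as a file prefix rather than a folder: it tried to open ./Hsapiens_GRCh38.tr.gff and looked for Hsapiens_GRCh38.tr.*.bt2l, while the files actually present are Hsapiens_GRCh38.tr/Hsapiens_GRCh38.gff, Hsapiens_GRCh38.tr/Hsapiens_GRCh38.1.bt2 and so on. A minimal sketch of a pre-flight check under that assumption follows; check_tr_prefix is a hypothetical helper, not a TopHat command.

```shell
# Hypothetical helper: succeed (exit status 0) if "$1" looks like a usable
# transcriptome-index prefix, i.e. "$1.gff" exists and at least one
# "$1.*.bt2" or "$1.*.bt2l" file exists. Problems are reported on stderr.
check_tr_prefix() {
    tr_prefix=$1
    if [ ! -f "$tr_prefix.gff" ]; then
        echo "missing: $tr_prefix.gff" >&2
        return 1
    fi
    # An unmatched glob stays a literal string, which then fails the -f test.
    for idx_file in "$tr_prefix".*.bt2 "$tr_prefix".*.bt2l; do
        if [ -f "$idx_file" ]; then
            return 0
        fi
    done
    echo "missing: $tr_prefix.*.bt2 or $tr_prefix.*.bt2l" >&2
    return 1
}
```

With the folder listing from the question, check_tr_prefix Hsapiens_GRCh38.tr/Hsapiens_GRCh38 would succeed, while check_tr_prefix Hsapiens_GRCh38.tr would fail because Hsapiens_GRCh38.tr.gff does not exist.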

I got the following:

$ tophat2 -o SRR390728_aln --transcriptome-index=Hsapiens_GRCh38.tr/Hsapiens_GRCh38 paired1.fq paired2.fq

[2017-10-12 16:05:10] Beginning TopHat run (v2.1.1)
-----------------------------------------------
[2017-10-12 16:05:10] Checking for Bowtie
          Bowtie version:    2.2.9.0
[2017-10-12 16:05:10] Checking for Bowtie index files (transcriptome)..
[2017-10-12 16:05:10] Checking for Bowtie index files (genome)..
Error: Could not find Bowtie 2 index files (paired1.fq.*.bt2l)

$ tophat2 -o SRR390728_aln --transcriptome-index=./Hsapiens_GRCh38.tr/Hsapiens_GRCh38 paired1.fq paired2.fq


[2017-10-12 16:05:32] Beginning TopHat run (v2.1.1)
-----------------------------------------------
[2017-10-12 16:05:32] Checking for Bowtie
          Bowtie version:    2.2.9.0
[2017-10-12 16:05:32] Checking for Bowtie index files (transcriptome)..
[2017-10-12 16:05:32] Checking for Bowtie index files (genome)..
Error: Could not find Bowtie 2 index files (paired1.fq.*.bt2l)

I reckon the base name 'Hsapiens_GRCh38' should really be on its own.

written 8 days ago by marongiu.luigi90

I even tried to remove the .tr from the index name:

$ tophat2 -o SRR390728_aln --transcriptome-index=./Hsapiens_GRCh38 Hsapiens_GRCh38 paired1.fq paired2.fq

[2017-10-12 16:12:02] Beginning TopHat run (v2.1.1)
-----------------------------------------------
[2017-10-12 16:12:02] Checking for Bowtie
          Bowtie version:    2.2.9.0
[2017-10-12 16:12:02] Checking for Bowtie index files (genome)..
Error: Could not find Bowtie 2 index files (Hsapiens_GRCh38.*.bt2l)

Yet the *.bt2l files are in the current folder. Even after moving them into Hsapiens_GRCh38.tr, I get an error:

$ tophat2 -o SRR390728_aln --transcriptome-index=Hsapiens_GRCh38.tr Hsapiens_GRCh38 paired1.fq paired2.fq

[2017-10-12 16:16:32] Beginning TopHat run (v2.1.1)
-----------------------------------------------
[2017-10-12 16:16:32] Checking for Bowtie
          Bowtie version:    2.2.9.0
[2017-10-12 16:16:32] Checking for Bowtie index files (transcriptome)..
Error: Could not find Bowtie 2 index files Hsapiens_GRCh38.tr.*.bt2l)
$ tophat2 -o SRR390728_aln --transcriptome-index=./Hsapiens_GRCh38.tr Hsapiens_GRCh38 paired1.fq paired2.fq

[2017-10-12 16:16:51] Beginning TopHat run (v2.1.1)
-----------------------------------------------
[2017-10-12 16:16:51] Checking for Bowtie
          Bowtie version:    2.2.9.0
[2017-10-12 16:16:51] Checking for Bowtie index files (genome)..
Error: Could not find Bowtie 2 index files (Hsapiens_GRCh38.*.bt2l)
written 8 days ago by marongiu.luigi90