Question

Tophat input ambiguity

9.3 years ago

Prasad ★ 1.6k

Hi all,

I need help understanding my output. I have Illumina paired-end data, and I think I made a mistake while running TopHat. Instead of running

tophat2 --output-dir PABPC --num-threads 15 --solexa-quals Index/Planarian sample1_R1.fastq sample1_R2.fastq

I executed

tophat2 --output-dir PABPC --num-threads 15 --solexa-quals Index/Planarian sample1_R1.fastq,sample1_R2.fastq

The input files are separated by a comma instead of a space. I checked the total number of aligned reads in both cases, and they are the same. I also checked the number of transcripts generated by running Cufflinks: the first command yields fewer transcripts than the second (~9k).

How does this affect the downstream analysis? I assume the transcripts generated from the second command are wrong, given that the data are paired-end.

RNA-Seq Assembly • 2.4k views

Updated 2.1 years ago by Ram 43k • written 9.3 years ago by Prasad ★ 1.6k

Ram · Answer 1 · 2014-12-23

9.3 years ago

GouthamAtla 12k

If you have paired-end data, you must run the first command. Aligning the reads as paired-end gives a better picture of novel splicing, gene fusion events, etc.

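The second command aligns the data in single-end mode, because TopHat treats a comma-separated list as multiple files of the same kind of reads rather than as the two mates of a pair. This gives less (and possibly incorrect) information about alternative splicing events, gene fusions, placement of reads arising from repetitive regions, etc.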
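To make the difference between the two commands concrete, here is a minimal sketch. It assumes the documented TopHat usage pattern, where the index base is followed by one read argument (single-end) or two (paired-end), and each argument may itself be a comma-separated list of files. The helper name tophat_read_mode is hypothetical and is not part of TopHat; it only counts the read arguments the shell would pass.

```shell
# Hypothetical helper (not part of TopHat): given the read arguments that
# follow the index base, report how they would be interpreted.
tophat_read_mode() {
    case $# in
        1) echo "single-end" ;;   # one argument, possibly a comma-joined list
        2) echo "paired-end" ;;   # left mates, then right mates
        *) echo "unexpected" ;;   # anything else does not match the usage pattern
    esac
}

# The space-separated form passes two arguments, the comma-joined form only one.
tophat_read_mode sample1_R1.fastq sample1_R2.fastq
tophat_read_mode sample1_R1.fastq,sample1_R2.fastq
```

The shell splits arguments on spaces, not commas, so the comma-joined form reaches the program as a single read argument.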

Updated 2.1 years ago by Ram 43k • written 9.3 years ago by GouthamAtla 12k

Thank you. If I set aside alternative splicing events and gene fusions and concentrate only on the assembled transcripts, their annotation, and their expression profiles, how reliable are these transcripts compared with those from the first command?

Updated 2.1 years ago by Ram 43k • written 9.3 years ago by Prasad ★ 1.6k

Paired-end sequencing is a way to get effectively longer reads without losing quality. A read pair together with its insert spans around 500 bp (assuming Illumina HiSeq). Longer reads are generally better in any kind of data analysis. Hence results (assembly, alignment, etc.) from long reads, i.e. using the pairing information, are more reliable than those from single-end data.
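One way to check whether an existing alignment carries pairing information is to count records whose SAM FLAG has the "paired" bit (0x1) set. The sketch below assumes SAM text on standard input; the helper name count_paired_flags is hypothetical, and the sample records are made up purely for illustration.

```shell
# Hypothetical helper: read SAM records on stdin and print how many have the
# "paired" bit (0x1) set in the FLAG column (field 2). Header lines (@...) are skipped.
count_paired_flags() {
    awk -F '\t' '!/^@/ && int($2) % 2 == 1 { n++ } END { print n + 0 }'
}

# Made-up records: flags 99 and 147 mark paired reads, flag 0 an unpaired read.
printf 'r1\t99\tchr1\t100\nr1\t147\tchr1\t300\nr2\t0\tchr1\t500\n' | count_paired_flags
```

With samtools installed, the same helper could be fed from the TopHat output, for example `samtools view PABPC/accepted_hits.bam | count_paired_flags`. A count of zero would suggest the reads were aligned in single-end mode.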

Updated 2.1 years ago by Ram 43k • written 9.3 years ago by GouthamAtla 12k