Hi all,
I need help understanding some output. I have Illumina paired-end data, and I think I made a mistake while running TopHat. Instead of running
tophat2 --output-dir PABPC --num-threads 15 --solexa-quals Index/Planarian sample1_R1.fastq sample1_R2.fastq
I executed
tophat2 --output-dir PABPC --num-threads 15 --solexa-quals Index/Planarian sample1_R1.fastq,sample1_R2.fastq
The input files are separated by a comma instead of a space. I checked the total number of aligned reads in both cases, and they are the same. I also checked the number of transcripts generated by running Cufflinks: the first command gives fewer transcripts than the second (~9k).
How does this affect the downstream analysis? The transcripts generated from the second command are wrong if we are treating the data as paired-end.
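For context on why the two commands differ: the TopHat usage line is `tophat [options] <index_base> <reads1_1[,...,readsN_1]> [reads1_2,...,readsN_2]`, so a comma-separated list inside one argument is read as several files of the same kind, and paired-end mode needs a second, space-separated argument. A minimal sketch of that rule follows; the function name `tophat_read_mode` is hypothetical and not part of TopHat.

```sh
# Hypothetical helper (not part of TopHat): report how the read arguments
# that follow the index base would be interpreted under the usage line above.
tophat_read_mode() {
    case $# in
        1) echo "single-end: $1" ;;                       # one argument, even if comma-separated
        2) echo "paired-end: mates1=$1 mates2=$2" ;;      # two space-separated arguments
        *) echo "unexpected number of read arguments: $#" >&2; return 1 ;;
    esac
}

tophat_read_mode sample1_R1.fastq sample1_R2.fastq    # first command: paired-end
tophat_read_mode sample1_R1.fastq,sample1_R2.fastq    # second command: single-end
```

Under this reading, the second command aligns both files as unpaired reads, which would explain why the aligned read count matches while the assembled transcripts differ.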
Thank you. If I set aside alternative splicing events and gene fusions and concentrate only on the assembled transcripts, their annotation, and their expression profiles, how reliable are these transcripts compared with those from the first command?
Paired-end sequencing is a way to get effectively longer reads without losing quality. With paired-end data, the two reads together with the insert span around 500 bp (assuming Illumina HiSeq). Longer reads are generally better for data analysis, so assemblies and alignments that use the pairing information are more reliable than those from single-end data.
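To confirm whether an existing run actually used the pairing information, one option is to look at the SAM FLAG field, where bit 0x1 is set only for reads aligned as part of a pair. A rough sketch, assuming samtools is installed and the TopHat output is at `PABPC/accepted_hits.bam`; the function name `count_paired_flag` is made up for illustration.

```sh
# Hypothetical helper: read SAM records on stdin and count how many have
# FLAG bit 0x1 set (read was aligned as part of a pair).
count_paired_flag() {
    awk -F '\t' '
        /^@/ { next }                              # skip SAM header lines
        { total++; if ($2 % 2 == 1) paired++ }     # an odd FLAG means bit 0x1 is set
        END { printf "%d of %d records flagged as paired\n", paired, total }
    '
}

# Usage (assumes samtools is on PATH and the output directory from the commands above):
# samtools view PABPC/accepted_hits.bam | count_paired_flag
```

For the comma-separated run, the paired count would be expected to be zero, while the space-separated run should flag essentially every record as paired.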