paired-end sequence with two separate files
7.3 years ago

I have an RNA sample that was paired-end sequenced. What should I do with the two separate
files? Do I simply merge them before analysis, or is something else needed? Thank you!

RNA-Seq • 2.8k views

You should not merge the two FASTQ files. Instead, provide both of them to the aligner at the same time. For example, the TopHat manual states:

When running TopHat with paired reads it is critical that the _1 files and the _2 files appear in separate comma-delimited lists, and that the order of the files in the two lists is the same. TopHat allows the use of additional unpaired reads to be provided after the paired reads. These unpaired reads can be either given at the end of the paired read files on one side (as reads that can no longer be paired with reads from the other side), or they can be given in separate file(s) which are appended (comma delimited) to the list of paired input files on either side e.g.:

tophat [options]* <genome_index_base> PE_reads_1.fq.gz,SE_reads.fa PE_reads_2.fq.gz


or

tophat [options]* <genome_index_base> PE_reads_1.fq.gz PE_reads_2.fq.gz,SE_reads.fa
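With several samples or lanes, keeping the two comma-delimited lists in the same order by hand is easy to get wrong. Below is a minimal Python sketch that builds the two lists from a set of file names. It assumes the mate files differ only by `_1.` versus `_2.` in the file name, as in the manual's example; the function name `build_tophat_command` is hypothetical, and no TopHat options are added.

```python
def build_tophat_command(index_base, fastq_files):
    """Build a tophat argument list with mate files in matching order.

    Assumes (not guaranteed by TopHat itself) that mates are named
    <sample>_1.<ext> and <sample>_2.<ext>.
    """
    files = set(fastq_files)
    # Collect the first-mate files, sorted so the order is reproducible.
    mates1 = sorted(
        f for f in files if "_1." in f.rpartition("/")[2]
    )
    mates2 = []
    for m1 in mates1:
        head, sep, name = m1.rpartition("/")
        # Derive the expected second-mate name from the first-mate name.
        m2 = head + sep + name.replace("_1.", "_2.", 1)
        if m2 not in files:
            raise ValueError(f"no mate file found for {m1}")
        mates2.append(m2)
    # Two separate comma-delimited lists, same order in both.
    return ["tophat", index_base, ",".join(mates1), ",".join(mates2)]
```

The returned list can be passed to `subprocess.run` once TopHat is installed; any options would go between `"tophat"` and the index base.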


Thank you Ashutosh, I see. Because my samples are from bacteria, I actually use Bowtie and then HTSeq. Do you think these two will handle the "additional unpaired reads" the way TopHat does? How do I pass the two files to Bowtie and HTSeq?

Thank you!


If I understand you correctly, you have a pair of FASTQ files (two files) and a file that contains unpaired (orphan) reads. I don't think you can use Bowtie to align all of these reads together, though I may be wrong. You can always align them separately and merge the two BAM files (one from the paired-end FASTQ files, one from the orphan or single reads). The tricky part is that HTSeq takes into account whether the data are paired-end. With paired-end data, if both reads of a pair align to the same exon, they contribute only a single count for that gene. I am not sure how HTSeq will handle a merged BAM file (I would have to go through the source code), because the SAM flag for a mapped paired-end read whose mate does not map differs from that of a mapped single-end read. Alternatively, you can calculate the counts separately for the two BAM files and merge the counts instead of the BAM files. Frankly, I have never tried quantifying expression this way.
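The "merge the counts instead of the BAM files" idea can be sketched in Python as follows. This assumes each count table is the usual two-column, tab-separated htseq-count output (feature ID, then count), with the summary rows such as `__no_feature` summed like any other row. The function names `read_counts` and `merge_counts` are hypothetical, and, as said above, this approach is untested for real quantification.

```python
from collections import Counter


def read_counts(text):
    """Parse a tab-separated 'feature<TAB>count' table into a Counter."""
    counts = Counter()
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        feature, value = line.split("\t")
        counts[feature] += int(value)
    return counts


def merge_counts(paired_text, orphan_text):
    """Sum per-feature counts from the paired run and the orphan run."""
    merged = read_counts(paired_text)
    # Counter.update adds counts rather than replacing them.
    merged.update(read_counts(orphan_text))
    return dict(merged)
```

For example, a gene with 10 counts in the paired-end table and 2 in the orphan table ends up with 12 in the merged table.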


Thank you Brian, I understand this is just a general question (I am kind of new to this field but learning).

I want to look at the difference in gene regulation between two defined conditions (I guess this is differential expression of the transcriptome). Certainly we want to see roughly which kinds of genes are actively expressed or repressed (and, hopefully, by how much).

Do you mean we just need to take one of the two files (R1 or R2) for quantification/analysis?

Thank you Brian again


Yes, just map the reads simultaneously with an aligner that can handle paired reads, which most can. Then you can calculate expression from the resulting SAM file. Or, for convenience, BBMap can map paired reads and directly output RPKM values as a file, skipping the intermediate steps.
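For reference, RPKM is reads per kilobase of gene length per million mapped reads. A minimal Python sketch of the standard formula is below. It assumes you already have per-gene counts and gene lengths in base pairs, and it uses the sum of the counted reads as the "mapped reads" total, which is an assumption; BBMap's own calculation may differ in detail. The function name `rpkm` is hypothetical.

```python
def rpkm(counts, lengths_bp):
    """Compute RPKM per gene from read counts and gene lengths (bp).

    RPKM = count * 1e9 / (gene_length_bp * total_counted_reads)
    """
    total = sum(counts.values())
    if total == 0:
        # No reads counted at all: every gene has zero expression.
        return {gene: 0.0 for gene in counts}
    return {
        gene: count * 1e9 / (lengths_bp[gene] * total)
        for gene, count in counts.items()
    }
```

For paired-end data the same formula is usually applied to fragments (read pairs) rather than individual reads, giving FPKM.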

7.3 years ago

So, this is not a tutorial; it should be classified as a question. As for the answer - if you are doing a quantification study, just adapter-trim them and then map them. Merging won't be helpful unless you are doing assembly.

Though it would help if you better described your experiment.


Changed it from tutorial to question.