I started out with two massive paired-end FASTQ files, each containing reads from a dozen barcodes. Using our lab's custom scripts, I performed paired-end adapter trimming with cutadapt and then split the reads for each barcode into their own FASTA file. I independently verified that every read in the .1.fa file corresponds to its mate in the .2.fa file. However, the files are in FASTA rather than FASTQ format.
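A pairing check like the one described above can be sketched in Python. This is a minimal sketch, not our lab's actual script. It assumes that mates share the same read name apart from an optional trailing /1 or /2, and it ignores anything after the first whitespace in a header. The function names fasta_ids, base_id and check_pairing are hypothetical.

```python
import sys
from itertools import zip_longest


def fasta_ids(lines):
    """Yield the read ID (first whitespace-delimited token after '>') of each FASTA header, in file order."""
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            yield tokens[0] if tokens else ""


def base_id(read_id):
    """Strip a trailing '/1' or '/2' mate suffix, if present (assumed naming convention)."""
    if read_id.endswith(("/1", "/2")):
        return read_id[:-2]
    return read_id


def check_pairing(lines1, lines2):
    """Compare mate IDs record by record.

    Returns (records compared, list of (index, id1, id2) mismatches). A missing
    record in either file (unequal read counts) shows up as None.
    """
    mismatches = []
    count = 0
    for i, (id1, id2) in enumerate(zip_longest(fasta_ids(lines1), fasta_ids(lines2))):
        count += 1
        if id1 is None or id2 is None or base_id(id1) != base_id(id2):
            mismatches.append((i, id1, id2))
    return count, mismatches


if __name__ == "__main__":
    # Usage (hypothetical file names): python check_pairs.py sample.1.fa sample.2.fa
    with open(sys.argv[1]) as fa1, open(sys.argv[2]) as fa2:
        n, bad = check_pairing(fa1, fa2)
    print(f"{n} records compared, {len(bad)} mismatches")
    for i, id1, id2 in bad[:10]:
        print(f"  record {i}: {id1} vs {id2}")
```

Because the functions take any iterable of lines, an open file handle and a plain list of strings both work. This avoids loading multi-gigabyte files into memory.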
When I aligned one of the resulting barcode sets to a transcriptome with TopHat2, essentially every pair came out as a discordant alignment. Here is the end of alignment_summary.txt:
Aligned pairs:  17746091
     of these:   2799869 (15.8%) have multiple alignments
                17746090 (100.0%) are discordant alignments
 0.0% concordant pair alignment rate.
I'm very confident that each .1.fa read is paired with the correct .2.fa read: I ran cutadapt in paired-end mode, and the demultiplexing script applied the same sorting to both the .1.fastq and .2.fastq files. Yet TopHat2 reports the pairs as 100% discordant. What is the issue here?
I repeated the alignment on a subset of my FASTA files (only 50 reads each), and it again reported 100% discordant alignments. I can see two possible explanations: 1) my data are very bad and none of the pairs align properly, or 2) the FASTQ files contain some information that my FASTA files lack, and its absence is causing the problem.
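To keep a small test subset paired, I need to take the same first N records from each mate file so that record i in one file still matches record i in the other. Here is a minimal sketch in Python, assuming plain FASTA records that may span several sequence lines. The names head_fasta and subset_pair and the ".subset.fa" output suffix are hypothetical.

```python
import sys


def head_fasta(lines, n_records):
    """Return the lines belonging to the first n_records FASTA records.

    Records are counted by header lines, so multi-line sequences are kept whole.
    Any lines before the first header are skipped.
    """
    out = []
    seen = 0
    for line in lines:
        if line.startswith(">"):
            seen += 1
            if seen > n_records:
                break
        if seen:
            out.append(line)
    return out


def subset_pair(path1, path2, n_records, suffix=".subset.fa"):
    """Write the first n_records records of each mate file to <path><suffix>."""
    for path in (path1, path2):
        with open(path) as src:
            kept = head_fasta(src, n_records)
        with open(path + suffix, "w") as dst:
            dst.writelines(kept)


if __name__ == "__main__":
    # Usage (hypothetical file names): python subset_pairs.py sample.1.fa sample.2.fa 50
    subset_pair(sys.argv[1], sys.argv[2], int(sys.argv[3]))
```

Taking the head of each file only preserves pairing if the two files are already in the same order. Running the pairing check on the subset first is a cheap way to confirm that.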
Is there a way to run paired-end data with FASTA input in TopHat2? Or do I need to go back and modify my scripts so that they output FASTQ files throughout?
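If FASTQ does turn out to be required, one stopgap would avoid rerunning the whole pipeline: wrap the existing FASTA records as FASTQ with a constant placeholder quality. This is my own assumption, not something TopHat2 documents. The sketch below uses "I" (Phred 40 in Phred+33 encoding) for every base. Any quality-aware step then sees uniformly high qualities rather than real ones. The function names fasta_records and to_fastq are hypothetical.

```python
import sys


def fasta_records(lines):
    """Yield (header, sequence) tuples, joining multi-line sequences."""
    header, seq = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        elif header is not None:
            seq.append(line.strip())
    if header is not None:
        yield header, "".join(seq)


def to_fastq(lines, qual_char="I"):
    """Yield one FASTQ record (as a string) per FASTA record, with a constant placeholder quality."""
    for header, seq in fasta_records(lines):
        yield f"@{header}\n{seq}\n+\n{qual_char * len(seq)}\n"


if __name__ == "__main__":
    # Usage (hypothetical file names): python fa2fq.py sample.1.fa sample.1.fq
    # Run once per mate file; record order, and therefore pairing, is preserved.
    with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
        dst.writelines(to_fastq(src))
```

The cleaner fix is still to have the demultiplexing script write FASTQ directly, so the real base qualities from the sequencer are kept.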