Hi all,
I'm currently analyzing RNA-Seq data from paired-end NGS sequencing. Right at the start I noticed that 2 of the 9 samples have different read counts in the forward and reverse FASTQ files.
For example:
- XXXX_1.fq.gz - Total Sequences -> 50592434; Total Bases -> 7.5 Gbp
- XXXX_2.fq.gz - Total Sequences -> 38414256; Total Bases -> 5.7 Gbp
I have very little data, so I can't throw these samples away, even though I know they were not of good quality at sequencing time.
Since the library was prepared with the SMARTer Stranded RNA-Seq kit, I should trim the first three nucleotides before genome assembly. I tried to use cutadapt in PE mode, but these 2 samples produced an error like this one:
Reads are improperly paired. Read name 'ST-E00144:1102:H7W5CCCX2:7:1101:7446:1520 1:N:0:NAGATCAT+NGATCTCG' in file 1 does not match 'ST-E00144:1102:H7W5CCCX2:8:1101:9110:1520 2:N:0:NAGATCAT+NGATCTCG' in file 2.
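To see where the two files go out of sync, one option is to walk both files in parallel and report the first record whose read IDs differ. The following is only a minimal sketch: the function names `read_names` and `first_mismatch` are made up for illustration, and it assumes standard 4-line FASTQ records where the read ID is the header text before the first space (as in the error above).

```python
import gzip
from itertools import zip_longest


def read_names(path):
    """Yield the read ID of every record in a FASTQ file (gzipped or plain)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 0:  # header line of each 4-line record
                name = line[1:].split()[0]  # drop '@' and the comment part
                if name.endswith(("/1", "/2")):  # older naming style
                    name = name[:-2]
                yield name


def first_mismatch(path1, path2):
    """Return (record_index, id1, id2) for the first unpaired record, else None.

    If one file is shorter, the missing ID is reported as None.
    """
    pairs = zip_longest(read_names(path1), read_names(path2))
    for idx, (id1, id2) in enumerate(pairs):
        if id1 != id2:
            return idx, id1, id2
    return None


# Example call with the file names from the post:
# print(first_mismatch("XXXX_1.fq.gz", "XXXX_2.fq.gz"))
```

If this returns `None` for a sample, the two files are in sync record by record; anything else marks the point where the pairing breaks.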
What should I do in this situation? I know I will hit exactly the same issue when I start genome assembly, so I need to sort it out now.
Should I run these two samples as single-end reads (but then what happens later, when the rest of the samples are paired-end)? Maybe you know a better solution for this kind of issue.
Thank you in advance for any help!
Someone before me already trimmed the adapters from the library, and now I need to trim the first 3 nucleotides (this kind of library requires it). I am doing this for both paired-end files together. Oddly, only 2 samples don't match and the other 7 are fine, so I think the sequencing went poorly for them (the plant material was of poor quality).
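For reference, removing a fixed number of leading bases just means dropping the same number of characters from the sequence line and from the quality line of each record. A minimal sketch, assuming a record is held as a tuple of its four lines without newlines; `trim_start` is a hypothetical helper, and whether read 1, read 2, or both need trimming depends on the kit protocol and is not settled here:

```python
def trim_start(record, n=3):
    """Remove the first n bases and their quality scores from one FASTQ record.

    `record` is a (header, sequence, plus, quality) tuple of strings.
    """
    header, seq, plus, qual = record
    return header, seq[n:], plus, qual[n:]


rec = ("@read1", "ACGTACGT", "+", "IIIIIIII")
print(trim_start(rec))  # ('@read1', 'TACGT', '+', 'IIIII')
```

In cutadapt the same operation is done with `-u 3` for the first read and `-U 3` for the second read, but paired-end mode still requires both input files to contain the same reads in the same order.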
Non-matching read files should have nothing to do with the sequencing itself. For PE sequencing the sequencer always produces an identical number of reads in both files. It is likely that the data files were trimmed independently. Please check the remaining samples with repair.sh just to be safe.

OK, thank you very much for your help. I'll try repair.sh. Thanks also for the suggestion to check the remaining samples, that's very helpful in this situation.
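To make the idea of re-pairing concrete: reads whose ID occurs in both files are written out in matching order, and reads that lost their mate go to a separate singletons file. The sketch below only illustrates that idea and is not how repair.sh is implemented. The names `records` and `repair_pairs` are hypothetical, and it keeps the whole second file in memory, so for files with tens of millions of reads a dedicated tool is the practical choice.

```python
import gzip


def _open(path, mode):
    """Open a plain or gzipped file in text mode."""
    return gzip.open(path, mode) if path.endswith(".gz") else open(path, mode)


def records(path):
    """Yield (read_id, record_text) for every 4-line FASTQ record."""
    with _open(path, "rt") as handle:
        while True:
            lines = [handle.readline() for _ in range(4)]
            if not lines[0]:  # end of file
                break
            text = "".join(lines)
            if not text.endswith("\n"):
                text += "\n"
            yield lines[0][1:].split()[0], text


def repair_pairs(in1, in2, out1, out2, singles):
    """Write properly paired reads to out1/out2 and unpaired reads to singles.

    Returns (number_of_pairs, number_of_singletons).
    """
    mates = dict(records(in2))  # assumption: file 2 fits in memory
    n_pairs = n_single = 0
    with _open(out1, "wt") as o1, _open(out2, "wt") as o2, _open(singles, "wt") as s:
        for name, rec in records(in1):
            mate = mates.pop(name, None)
            if mate is None:
                s.write(rec)
                n_single += 1
            else:
                o1.write(rec)
                o2.write(mate)
                n_pairs += 1
        for rec in mates.values():  # reads left over from file 2
            s.write(rec)
            n_single += 1
    return n_pairs, n_single
```

After re-pairing, both output files contain the same reads in the same order, so paired-end tools such as cutadapt should accept them, and the samples can stay in paired-end mode like the other seven.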