Difference between R1 and R2 FASTQ files from a paired-end RNA-seq run
2
0
4.5 years ago

Hi there, I recently performed a MiSeq 80 bp paired-end run to measure CRISPR sgRNA coverage from a plasmid. I got two files back: R1 and R2. However, the number of lines in R1 (66331572) is much higher than in R2 (566768). I counted them with

cat R1.fastq | wc -l

What does this mean? Thanks

rna-seq RNA-Seq • 6.0k views
ADD COMMENT
2

The first thing to do is make sure you downloaded the entire file (i.e. the file is not partially downloaded or corrupted). Verify this by computing the md5 checksum of each file; it should match the md5 sum the company provides. You can also check whether the file size the company reports is the same as the size of your downloaded file.
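
The checksum comparison can be scripted. This is a minimal sketch, assuming GNU md5sum is installed (on macOS the rough equivalent is md5 -q); verify_md5 is a hypothetical helper name, and the expected value is whatever checksum the provider lists for the file:

```shell
# Hypothetical helper: compare a file's md5 sum with the value from the provider.
# Assumes GNU coreutils `md5sum` is on the PATH.
verify_md5() {
    file=$1
    expected=$2
    # md5sum prints "<hash>  <filename>"; keep only the hash
    actual=$(md5sum "$file" | awk '{print $1}')
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file (got $actual)"
        return 1
    fi
}

# Usage (the checksum string comes from the sequencing provider):
# verify_md5 R2.fastq "<md5 from provider>"
```

A mismatch means the download should be repeated before looking for any other explanation.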

Also, run tail R2.fastq, which may show whether the download was partial (e.g. the last line might be truncated). If it was, re-download the files.
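
Looking at the tail can be complemented by a line-count check, since each FASTQ record occupies four lines (so wc -l reports four times the number of reads). A rough sketch, assuming uncompressed files with exactly four lines per record; fastq_summary is a hypothetical helper name:

```shell
# Hypothetical helper: report the read count and flag a possibly truncated file.
# Assumes an uncompressed FASTQ with exactly four lines per record.
fastq_summary() {
    rc=0
    # $(( ... )) strips the padding some `wc` implementations add
    lines=$(( $(wc -l < "$1") ))
    echo "$1: $lines lines, $((lines / 4)) complete reads"
    if [ $((lines % 4)) -ne 0 ]; then
        echo "$1: line count is not a multiple of 4, file may be truncated"
        rc=1
    fi
    # A non-empty file whose last byte is not a newline probably ends mid-line
    if [ -s "$1" ] && [ $(( $(tail -c 1 "$1" | wc -l) )) -eq 0 ]; then
        echo "$1: last line has no trailing newline, file may be truncated"
        rc=1
    fi
    return $rc
}

# Usage:
# fastq_summary R1.fastq
# fastq_summary R2.fastq
```

Passing both checks does not prove the file is complete (a download can stop exactly at a record boundary), so the checksum remains the more reliable test.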

If you confirm that your files are fully intact, then I'll see what others have to suggest about why the number of lines differs between R1 and R2.

ADD REPLY
1

This is not right. If you've verified that the files have been correctly downloaded (does the provider supply an MD5?), then you need to go back to your sequencing provider and query this.

ADD REPLY
0

Base calling is performed independently for each read. Some sequencers remove low-quality reads without removing the corresponding mate, so you end up with singletons.

ADD REPLY
0

Is this some artifact of enabling trimming in bcl2fastq?

ADD REPLY
0

Yes, the filtering could be a cause.

ADD REPLY
1
4.5 years ago

You might be able to repair this using repair.sh from the bbmap package (install via bioconda if you like).

By repair I mean getting R1 and R2 files of the same length, containing the corresponding reads and headers in the same order.

e.g.

repair.sh -Xmx40g in=A1_03_S2_2_R1.fastq in2=A1_03_S2_2_R2.fastq out1=A1_03_S2_2b_R1.fastq out2=A1_03_S2_2b_R2.fastq outs=singletons1.fq overwrite=true
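
To check whether two files are actually in sync (same number of reads, same read names in the same order), before or after running repair.sh, something like the following could be used. This is only a sketch: read_names and pairs_in_sync are hypothetical helpers, not part of BBTools, and it assumes four-line FASTQ records whose names either end in /1 and /2 or carry the mate number after a space, as in recent Illumina headers.

```shell
# Hypothetical helper: print one read name per record, without the mate suffix.
# Takes the first whitespace-delimited field of each header line and drops a
# trailing /1 or /2 if present.
read_names() {
    awk 'NR % 4 == 1 { name = $1; sub(/\/[12]$/, "", name); print name }' "$1"
}

# Hypothetical helper: succeed only if both files list the same read names
# in the same order.
pairs_in_sync() {
    n1=$(mktemp) && n2=$(mktemp) || return 2
    read_names "$1" > "$n1"
    read_names "$2" > "$n2"
    if cmp -s "$n1" "$n2"; then
        echo "in sync"
        rc=0
    else
        echo "out of sync"
        rc=1
    fi
    rm -f "$n1" "$n2"
    return $rc
}

# Usage:
# pairs_in_sync A1_03_S2_2b_R1.fastq A1_03_S2_2b_R2.fastq
```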
ADD COMMENT
1

I would not attempt a repair, or trust its output, when 99% of the data is missing.

ADD REPLY
0
4.5 years ago

Thanks very much, everyone, for the responses. The sequencing facility confirmed that the run was single-end, so R2 should be empty/irrelevant.

ADD COMMENT
1

I think that facility desperately needs to review its procedures, and you should probably think twice before using it again (if that's an option). If the data is single-end, why on Earth would they deliver an R2? Also, if you requested paired-end (inferring from your post), why did you get single-end? Lots of red flags.

[edit] wrote "red lights", meant "red flags" :S

ADD REPLY
