Question: Extract 1M reads from paired end fastqs
acorella (30), United States, wrote 4.4 years ago:

Hi,

I have paired-end reads in 2 separate fastq files. I want to take a subset of these reads for a bowtie run to get the insert size. I am familiar with how to break up an individual file into 1 million reads (e.g. here: https://www.biostars.org/p/66864/).

My Question: Do I need to ensure my reads are in the same order in each file before I do this? If so, how do I do this?

Thanks!

Tag: rna-seq

However, do I need to ensure my reads are in the same order in each file before I do this? If so, how do I do this?

If you have not done anything to the files (other than using a paired-end aware trimming program) then the reads should be in order in R1/R2 files.

If you suspect that the pairing is broken, the files can be repaired as follows (repair.sh is from the BBMap suite):

repair.sh in1=r1.fq.gz in2=r2.fq.gz out1=fixed1.fq.gz out2=fixed2.fq.gz outsingle=singletons.fq.gz
written 4.4 years ago by genomax (91k)
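
If the two files are untouched and still in matching order, as described above, one simple way to get a subset is to take the same number of records from the top of each file. The following is a minimal sketch, assuming standard 4-line FASTQ records (no wrapped sequence lines); the function names head_fastq and first_pairs are made up for illustration. Note that this gives the first N pairs, not a random sample.

```shell
# Print the first N_READS records of a FASTQ file (plain or gzipped).
# Usage: head_fastq FILE N_READS
# Assumes each record is exactly 4 lines.
head_fastq() {
    case "$1" in
        *.gz) gzip -dc "$1" ;;
        *)    cat "$1" ;;
    esac | head -n "$(( $2 * 4 ))"
}

# Write the first N_READS pairs of R1/R2 to OUT1/OUT2 (uncompressed).
# Usage: first_pairs R1 R2 N_READS OUT1 OUT2
# Mates stay together only if R1 and R2 list reads in the same order.
first_pairs() {
    head_fastq "$1" "$3" > "$4"
    head_fastq "$2" "$3" > "$5"
}
```

For example, `first_pairs r1.fq.gz r2.fq.gz 1000000 sub1.fq sub2.fq` would write 4,000,000 lines to each output file.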

Thank you! That was indeed the question I was trying to ask!

Is there a quick way you can tell if the pairing is broken?

written 4.4 years ago by acorella (30)

reformat.sh from the same package has an option to do that:

reformat.sh in1=r1.fq in2=r2.fq vpair

That will just verify that the names indicate the reads are in the same order in each file. Incidentally, you can also randomly sample 1M pairs from them, like this:

reformat.sh in1=r1.fq in2=r2.fq out1=sampled1.fq out2=sampled2.fq samplereadstarget=1m

If your reads are overlapping, you can discover the insert size with BBMerge; if not, you'll need to use mapping.

modified 4.4 years ago by genomax (91k) • written 4.4 years ago by Brian Bushnell (17k)
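
For readers without BBMap at hand, the same kind of name-order check can be roughly approximated with standard tools. This is only a sketch, not how reformat.sh does it: it assumes 4-line FASTQ records and read names that differ between mates only by a trailing /1 or /2, or by a comment after the first space. The function names read_ids and pairs_in_order are made up for illustration.

```shell
# Print one read ID per record: the header line without the leading "@",
# without anything after the first whitespace, and without a trailing /1 or /2.
# Usage: read_ids FILE   (plain or gzipped FASTQ, 4 lines per record assumed)
read_ids() {
    case "$1" in
        *.gz) gzip -dc "$1" ;;
        *)    cat "$1" ;;
    esac | awk 'NR % 4 == 1 { id = substr($1, 2); sub(/\/[12]$/, "", id); print id }'
}

# Compare the ID lists of two files line by line.
# Returns 0 if they are identical, 1 if names or read counts differ.
# Usage: pairs_in_order R1 R2
pairs_in_order() {
    ids1=$(mktemp)
    ids2=$(mktemp)
    read_ids "$1" > "$ids1"
    read_ids "$2" > "$ids2"
    if cmp -s "$ids1" "$ids2"; then
        echo "OK: read names are in the same order"
        status=0
    else
        echo "MISMATCH: read names or read counts differ"
        status=1
    fi
    rm -f "$ids1" "$ids2"
    return "$status"
}
```

Running `pairs_in_order r1.fq r2.fq` only compares names; like the check described above, it says nothing about whether the sequences themselves belong together.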

Duplicate of "Selecting Random Pairs From Fastq?"?

written 4.4 years ago by geek_y (11k)
Biomonika (Noolean) (3.1k), State College, PA, USA, wrote 4.4 years ago:

seqtk sample with a fixed seed should work for you. Take a look here:

Selecting Random Pairs From Fastq?

written 4.4 years ago by Biomonika (Noolean) (3.1k)
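
The fixed-seed approach might look like the sketch below. It assumes seqtk is installed and on the PATH, and that both files contain the same reads in the same order; passing the same seed to both calls is what keeps mates together. The wrapper name sample_pairs and the seed value in the usage example are arbitrary choices for illustration.

```shell
# Sample the same N_READS reads from R1 and R2 by reusing one seed.
# Usage: sample_pairs R1 R2 N_READS SEED OUT1 OUT2
# Assumption: R1 and R2 hold the same reads in the same order; otherwise
# the two samples will not correspond to each other.
sample_pairs() {
    seqtk sample -s "$4" "$1" "$3" > "$5"
    seqtk sample -s "$4" "$2" "$3" > "$6"
}
```

For example, `sample_pairs r1.fq.gz r2.fq.gz 1000000 100 sub1.fq sub2.fq` would write two uncompressed files of 1M reads each.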