Question: eliminate empty reads in R1 and R2 fastq file
cabraham03 (Mexico) wrote, 14 months ago:

Hi, I have a genome in two fastq files, and I have tried to determine the depth of coverage using bwa, samtools, and BAMstats. The problem is that some sequences in the R1 file are empty while their mates in R2 are not, and some in R2 are empty while their mates in R1 are not, which causes errors in bwa. So I want to keep only the reads that are non-empty in both files (same number of reads, same order, and same names in both files). Any suggestions? I have tried to write a Perl script, but I can't get it to work. Is there any software that fixes this problem?

Thanks so much!

Tags: sequencing, dna, assembly, genome

You may check out skewer. It can discard empty read pairs and also performs quality trimming, which should benefit you if you want to assemble your genome. For standard purposes, I typically use:

./skewer -n -q 25 -Q 25 -m pe -l 25

It runs on paired-end data (-m pe), discards degenerate reads with many Ns (-n), trims the 3' end until it reaches a base of quality 25 or higher (-q 25), discards reads with a mean quality below 25 (-Q 25), and discards reads, together with their mates, that are shorter than 25 bp (-l 25). Multithreading is possible with -t.

written 14 months ago by ATpoint

"the problem is that some sequences in R1 file are empty but they could be not in R2, as well some of them in R2 are empty but not in R1"

It seems like your R1 and R2 files are not properly paired. Did you perform some pre-processing step before the analysis?

written 14 months ago by h.mon
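
One way to test whether the two files are still in sync is to compare them record by record. The following is only a minimal sketch, assuming uncompressed FASTQ files with exactly four lines per record. check_pairs is a hypothetical helper name (not part of any tool), and read names are compared after dropping anything after the first space and a trailing /1 or /2.

```sh
# Usage: check_pairs R1.fastq R2.fastq
# Reports the number of records, name mismatches and empty sequences.
# Assumption: uncompressed input, 4 lines per record, no tabs in read names.
check_pairs() {
    # First paste: put line i of R1 next to line i of R2.
    # Second paste: fold every 4 lines into one record of 8 tab-separated fields:
    # $1,$2 = names; $3,$4 = sequences; $5,$6 = "+" lines; $7,$8 = qualities.
    paste "$1" "$2" | paste - - - - |
    awk -F '\t' '
        {
            n++
            a = $1; b = $2
            sub(/ .*$/, "", a);     sub(/ .*$/, "", b)      # drop comment after first space
            sub(/\/[12]$/, "", a);  sub(/\/[12]$/, "", b)   # drop trailing /1 or /2
            if (a != b) mismatch++
            if (length($3) == 0) empty1++
            if (length($4) == 0) empty2++
        }
        END {
            printf "records=%d name_mismatches=%d empty_R1=%d empty_R2=%d\n",
                   n, mismatch, empty1, empty2
        }'
}
```

If one file is shorter than the other, paste pads the missing lines with empty fields, so the extra records show up as name mismatches and empty sequences. A non-zero name_mismatches count suggests the files need re-pairing rather than just filtering.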
finswimmer (Germany) wrote, 14 months ago:

repair.sh from BBTools could do this.

repair.sh in1=broken1.fq in2=broken2.fq out1=fixed1.fq out2=fixed2.fq outs=singletons.fq repair

fin swimmer

Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote, 14 months ago:

using paste + awk:

 paste <(gunzip -c R1_001.fastq.gz | paste - - - - ) <(gunzip -c R2_001.fastq.gz| paste - - - -) |awk -F '\t' '(length($2)>0 && length($6)>0)' |tr "\t" "\n"

The output is an interleaved fastq stream that you can pipe into bwa mem (or save to a file and pass to it) with the option:

   -p         first query file consists of interleaved paired-end sequences
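
If two separate files are preferred over an interleaved stream (as asked in the question), the same paste + awk idea can be adapted. This is only a sketch and not part of the original answer: filter_pairs and the output file names are hypothetical, and it assumes uncompressed FASTQ input with four lines per record and the same read order in both files.

```sh
# Usage: filter_pairs R1.fastq R2.fastq out_R1.fastq out_R2.fastq
# Keeps only pairs in which BOTH mates have a non-empty sequence,
# preserving read order and names in both output files.
filter_pairs() {
    # Create/truncate the outputs so they exist even if no pair passes.
    : > "$3"
    : > "$4"
    # First paste: line i of R1 next to line i of R2.
    # Second paste: fold 4 lines into one record of 8 tab-separated fields:
    # $1,$2 = names; $3,$4 = sequences; $5,$6 = "+" lines; $7,$8 = qualities.
    paste "$1" "$2" | paste - - - - |
    awk -F '\t' -v o1="$3" -v o2="$4" '
        length($3) > 0 && length($4) > 0 {
            printf "%s\n%s\n%s\n%s\n", $1, $3, $5, $7 > o1
            printf "%s\n%s\n%s\n%s\n", $2, $4, $6, $8 > o2
        }'
}
```

For gzipped input, the files would need to be decompressed first, or the two inputs could be supplied through process substitution with gunzip -c as in the command above. The two output files can then be passed to bwa mem as ordinary paired files, without -p.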