Question: Remove Singletons From Trimmed Fastq Files
Alex Paciorkowski (Rochester, NY USA) wrote, 5.6 years ago:

How are you removing singletons from whole exome data fastq files from which the adapter sequences have already been removed? We have some data where the paired-end files do not match up, and we believe this is due to singletons that need to be removed. What tools are currently out there?

Have checked Trimming Algorithm but do not see this specifically addressed.

Many thanks.


Did you see this thread: "How to sort two mate pair (fastq) files so that the order of the identifiers is the same?" Are you still using a trimmer that can't natively handle paired-end reads, or is this just an older dataset? If the former, you might consider just switching trimmers.

written 5.6 years ago by Devon Ryan

Looks promising, will take a look.

written 5.6 years ago by Alex Paciorkowski

Are the two paired files sorted in the same order excluding singletons?

written 5.6 years ago by Damian Kao

Gabriel R. (Center for Geogenetik, Københavns Universitet) wrote, 5.6 years ago:

We solved the problem in a very simple way: ditch fastq, use BAM even for unaligned reads, and let the flags do their magic.


Can you elaborate? I don't understand what this means. How are you mapping BAMs?

written 5.6 years ago by Alex Paciorkowski

You are not mapping the BAMs. You merely convert from fastq to BAM and work on those files as your raw, unmapped data. This saves you a lot of trouble keeping track of which reads are paired or unpaired, read group info, etc.

written 5.6 years ago by Gabriel R.
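
For illustration, the idea of keeping unaligned reads in SAM/BAM can be sketched in Python as below. This is only a sketch, not the workflow used in the answer: the helper names (`unaligned_sam_line`, `pair_to_sam`) are hypothetical, and storing a singleton with only the "unmapped" bit set is an assumption. The FLAG bit values and the 11 mandatory columns come from the SAM specification.

```python
# Sketch only: build unaligned SAM text records by hand to show which FLAG
# bits carry the pairing information. Helper names are hypothetical.

# FLAG bits as defined in the SAM specification
PAIRED = 0x1         # template has multiple segments (read is paired)
UNMAPPED = 0x4       # this read is unmapped
MATE_UNMAPPED = 0x8  # the mate is unmapped
READ1 = 0x40         # first read of the pair
READ2 = 0x80         # second read of the pair


def unaligned_sam_line(name, seq, qual, flag):
    """Return one SAM line for an unmapped read (no reference, no position)."""
    # Columns: QNAME FLAG RNAME POS MAPQ CIGAR RNEXT PNEXT TLEN SEQ QUAL
    fields = [name, str(flag), "*", "0", "0", "*", "*", "0", "0", seq, qual]
    return "\t".join(fields)


def pair_to_sam(name, read1, read2):
    """read1 and read2 are (seq, qual) tuples, or None if that mate is missing."""
    if read1 is not None and read2 is not None:
        base = PAIRED | UNMAPPED | MATE_UNMAPPED
        return [
            unaligned_sam_line(name, read1[0], read1[1], base | READ1),
            unaligned_sam_line(name, read2[0], read2[1], base | READ2),
        ]
    # Assumption: a singleton is stored as an unpaired, unmapped read.
    seq, qual = read1 if read1 is not None else read2
    return [unaligned_sam_line(name, seq, qual, UNMAPPED)]


if __name__ == "__main__":
    for line in pair_to_sam("read_a", ("ACGT", "IIII"), ("TTGA", "IIII")):
        print(line)
    for line in pair_to_sam("read_b", ("GGCC", "IIII"), None):
        print(line)
```

Under these assumptions a complete pair carries FLAG 77 and 141, while a singleton carries FLAG 4, so the pairing status travels with each read instead of depending on two fastq files staying in sync. In practice a dedicated converter would write the actual BAM file.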

And then you convert back to fastq for mapping? I am still not sure what exactly you are doing. Which "flags" are you talking about? Can you please elaborate? Is there a workflow you can reference or point to as an illustration?

modified 5.6 years ago • written 5.6 years ago by Alex Paciorkowski

> And then you convert back to fastq for mapping?

Yes and no: we use file descriptors. Bowtie apparently still cannot read BAM, so we call it like this, using process substitution:

bowtie2 -1 <(samtools view -f "0x40" -Y input.bam) -2 <(samtools view -f "0x80" -Y input.bam)

Use this custom version of samtools: https://github.com/udo-stenzel/samtools-patched

> I am still not sure what exactly you are doing. Which "flags" are you talking about? Can you please elaborate?

Every read in a BAM file has a set of binary flags combined into a single number. These flags tell us whether the read is paired, mapped, properly paired, QC-failed, etc.; see http://samtools.sourceforge.net/SAM1.pdf

> Is there a workflow you can reference or point to as an illustration?

Unfortunately not really. I suggest becoming more familiar with the BAM format and regular Unix concepts like pipes, file descriptors, etc. Good luck and have fun!

modified 5.6 years ago • written 5.6 years ago by Gabriel R.
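
To make the flags concrete, here is a small Python sketch that decodes a FLAG value and mimics the `-f` filter used in the command above (`samtools view -f INT` keeps only reads that have every bit of INT set). The bit values follow the SAM specification; the short labels and the function names are made up for this example.

```python
# Sketch only: decode a SAM FLAG value into readable labels.
# Bit values are from the SAM specification; labels are informal.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "read1",
    0x80: "read2",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the labels of all bits that are set in `flag`."""
    return [label for bit, label in FLAG_BITS.items() if flag & bit]


def passes_f(flag, required):
    """Mimic `samtools view -f required`: True if all required bits are set."""
    return (flag & required) == required


if __name__ == "__main__":
    for flag in (77, 141, 4):
        print(flag, decode_flag(flag), "kept by -f 0x40:", passes_f(flag, 0x40))
```

With this, `-f 0x40` selects the first read of each pair and `-f 0x80` the second, which is how the two process substitutions in the bowtie2 call each deliver one mate.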
Rm (Danville, PA) wrote, 5.6 years ago:

Try Sickle in paired-end mode (sickle pe) for paired-end trimming. Or, if the reads are already trimmed, use cmpfastq to write the common (still paired) reads and the singletons to separate files.


This looks helpful as well -- thanks!

written 5.6 years ago by Alex Paciorkowski
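
As a rough illustration of what separating common reads from singletons involves (this is not the actual implementation of cmpfastq or any other tool named in this thread), the Python sketch below matches reads from two fastq files by identifier. It assumes four lines per record and that mates share an ID once an optional trailing `/1` or `/2` and any comment after the first whitespace are removed. The function names are hypothetical, and both files are held in memory, which may not be practical for a full exome run.

```python
# Sketch only: split two FASTQ inputs into matched pairs and singletons.
# Assumes 4-line records and mate IDs that match after dropping /1 or /2.

def read_fastq(lines):
    """Yield (read_id, record) tuples from an iterable of FASTQ lines."""
    stripped = (line.rstrip("\n") for line in lines)
    # zip over the same iterator four times groups lines into records;
    # an incomplete trailing record is silently dropped.
    for header, seq, plus, qual in zip(stripped, stripped, stripped, stripped):
        read_id = header[1:].split()[0]      # drop '@' and any comment
        if read_id.endswith(("/1", "/2")):   # drop old-style mate suffix
            read_id = read_id[:-2]
        yield read_id, (header, seq, plus, qual)


def split_pairs(fq1_lines, fq2_lines):
    """Return (paired1, paired2, singletons) as lists of 4-tuples."""
    reads1 = dict(read_fastq(fq1_lines))
    reads2 = dict(read_fastq(fq2_lines))
    # Keep the order of file 1 so both paired outputs stay in sync.
    common = [rid for rid in reads1 if rid in reads2]
    paired1 = [reads1[rid] for rid in common]
    paired2 = [reads2[rid] for rid in common]
    singletons = [rec for rid, rec in reads1.items() if rid not in reads2]
    singletons += [rec for rid, rec in reads2.items() if rid not in reads1]
    return paired1, paired2, singletons


if __name__ == "__main__":
    fq1 = ["@a/1", "ACGT", "+", "IIII", "@b/1", "GGGG", "+", "IIII"]
    fq2 = ["@b/2", "CCCC", "+", "IIII", "@c/2", "TTTT", "+", "IIII"]
    p1, p2, single = split_pairs(fq1, fq2)
    print(len(p1), "pairs,", len(single), "singletons")
```

With real data, open file handles can be passed in place of the lists, for example `split_pairs(open("R1.fastq"), open("R2.fastq"))` with hypothetical file names, and the three results written out to separate files.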