Question

How can I use BLAST to extract chloroplast sequences from DNA reads?

10.2 years ago

AcademicDialysis ▴ 70

I have whole-genome sequencing (WGS) reads and am trying to extract the chloroplast sequences from them.

This paper (http://www.sciencedirect.com/science/article/pii/S2214540013000169) mentions that, to do this, they BLASTed their reads against all of the known genomes in the same family. For me, this family would be Fabaceae.
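
A minimal sketch of the read-selection step in that BLAST approach, assuming the reference FASTA has already been turned into a database with `makeblastdb -in refs.fasta -dbtype nucl` and the reads (converted to FASTA) searched with `blastn -query reads.fasta -db refs.fasta -outfmt 6 -out hits.tsv`. The file names and the identity, length, and e-value cutoffs are illustrative assumptions, not values taken from the paper.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
BLAST6_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def reads_with_hits(handle, min_pident=80.0, min_length=100, max_evalue=1e-10):
    """Return the IDs of reads with at least one hit passing all cutoffs.

    The default cutoffs are placeholders (assumptions); tune them for your data.
    """
    keep = set()
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(BLAST6_FIELDS, row))
        if (float(hit["pident"]) >= min_pident
                and int(hit["length"]) >= min_length
                and float(hit["evalue"]) <= max_evalue):
            keep.add(hit["qseqid"])
    return keep


# Two made-up hit lines: read1 passes the cutoffs, read2 does not.
example = io.StringIO(
    "read1\trefA\t95.0\t250\t12\t0\t1\t250\t1000\t1249\t1e-100\t400\n"
    "read2\trefA\t75.0\t60\t15\t0\t1\t60\t5000\t5059\t1e-3\t50\n"
)
print(sorted(reads_with_hits(example)))  # prints ['read1']
```

With a real file, `reads_with_hits(open("hits.tsv"))` gives the set of read IDs to pull out of the original FASTQ files for assembly.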

Does anyone know of a quicker way to do this than manually downloading every FASTA file containing Fabaceae chloroplast sequences from NCBI? Or of a better way to extract chloroplast sequences from my reads? I do know that chloroplast DNA should be more abundant in the reads than nuclear or mitochondrial DNA, because it is present in many more copies per cell.

Info about reads: 300 bp average length, paired-end, from Illumina MiSeq

Thanks in advance!

Tags: chloroplast, alignment, database, BLAST

Ram · Answer 1 · 2015-05-05

4

10.2 years ago

Pierre Lindenbaum 166k

Search NCBI for chloroplast + Fabaceae:

http://www.ncbi.nlm.nih.gov/nuccore?term=%28%22chloroplast%22%5BFilter%5D%29%20AND%20%22fabaceae%22%5BOrganism%5D

and download the sequences as fasta.
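
For a scripted alternative to downloading by hand, the same search can be sent to NCBI's E-utilities. This is only a sketch under assumptions: the search term is decoded from the URL above, the batch size and output path are arbitrary, and the result set will likely include partial chloroplast records as well as complete genomes, so it may need filtering afterwards.

```python
import json
import time
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"
# Same query as the nuccore search URL above, URL-decoded.
TERM = '("chloroplast"[Filter]) AND "fabaceae"[Organism]'


def esearch_url(term, db="nuccore"):
    """URL that runs the search and keeps the result set on NCBI's history server."""
    params = {"db": db, "term": term, "usehistory": "y", "retmode": "json"}
    return f"{EUTILS}/esearch.fcgi?" + urlencode(params)


def efetch_url(webenv, query_key, retstart=0, retmax=200, db="nuccore"):
    """URL that fetches one batch of the stored result set as FASTA text."""
    params = {
        "db": db, "WebEnv": webenv, "query_key": query_key,
        "rettype": "fasta", "retmode": "text",
        "retstart": retstart, "retmax": retmax,
    }
    return f"{EUTILS}/efetch.fcgi?" + urlencode(params)


def download_fasta(term, out_path, batch=200):
    """Download every record matching `term` into one FASTA file (needs network)."""
    with urlopen(esearch_url(term)) as resp:
        result = json.load(resp)["esearchresult"]
    total = int(result["count"])
    with open(out_path, "w") as out:
        for start in range(0, total, batch):
            url = efetch_url(result["webenv"], result["querykey"], start, batch)
            with urlopen(url) as resp:
                out.write(resp.read().decode())
            time.sleep(0.4)  # NCBI allows about 3 requests/second without an API key
    return total
```

Calling `download_fasta(TERM, "fabaceae_cp.fasta")` (a hypothetical file name) would write all matching sequences to a single FASTA file.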

Index the FASTA with bwa index, then map your reads to it with bwa mem.
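
The two bwa steps could be wrapped as below. This is a sketch that assumes bwa is installed and on the PATH; the file names and thread count are hypothetical.

```python
import subprocess


def bwa_commands(ref_fasta, reads_r1, reads_r2, threads=4):
    """Build the argument lists for `bwa index` and paired-end `bwa mem`."""
    index_cmd = ["bwa", "index", ref_fasta]
    mem_cmd = ["bwa", "mem", "-t", str(threads), ref_fasta, reads_r1, reads_r2]
    return index_cmd, mem_cmd


def run_mapping(ref_fasta, reads_r1, reads_r2, out_sam, threads=4):
    """Index the reference, then write the bwa mem alignments to out_sam."""
    index_cmd, mem_cmd = bwa_commands(ref_fasta, reads_r1, reads_r2, threads)
    subprocess.run(index_cmd, check=True)
    # bwa mem writes SAM to standard output, so redirect it into the file.
    with open(out_sam, "w") as sam:
        subprocess.run(mem_cmd, stdout=sam, check=True)


if __name__ == "__main__":
    # Hypothetical file names; replace them with your own.
    for cmd in bwa_commands("fabaceae_cp.fasta", "reads_R1.fastq", "reads_R2.fastq"):
        print(" ".join(cmd))
```

Run as a script, this only prints the two command lines; `run_mapping(...)` actually executes them.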

Since the closest reference is only in the same family, I don't think my consensus sequences would be very long. Should I just BLAST and then use the matching reads for de novo assembly? Or should I still use bwa, with all of those sequences as the reference?

Thanks!

written 10.2 years ago by AcademicDialysis ▴ 70

Hi Pierre,

I am working with WGS data that includes chloroplast and mitochondrial DNA. I want to separate the reads originating from the chloroplast and mitochondria from the nuclear reads, and keep both sets.

For the chloroplast, I have run bwa index on the complete chloroplast genome of a related species and mapped my reads to it with bwa mem.

What are the next steps to separate out (and keep) the reads that mapped to the chloroplast?

I really appreciate any help you can provide. Allison

4.9 years ago by AllisonAnne • 0
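
One possible next step, sketched here under the assumption that the bwa mem output is available as SAM text, is to split the records on the "read unmapped" bit (0x4) of the SAM FLAG field. Reads without that bit are chloroplast candidates; reads with it are everything else. The function name is hypothetical, and the split is per read, so the two mates of a pair can end up in different groups.

```python
UNMAPPED = 0x4  # SAM FLAG bit: this read did not map to the reference


def split_sam(lines):
    """Split SAM alignment lines into (mapped, unmapped) lists.

    Header lines (starting with '@') are skipped. Each read is judged on its
    own flag, so mates of one pair may be placed in different lists.
    """
    mapped, unmapped = [], []
    for line in lines:
        if line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        if flag & UNMAPPED:
            unmapped.append(line)
        else:
            mapped.append(line)
    return mapped, unmapped


# Made-up records: flag 99 is a mapped first mate, flag 77 an unmapped first mate.
example = [
    "@HD\tVN:1.6",
    "readA\t99\tcp_ref\t1\t60\t4M\t=\t10\t13\tACGT\tIIII",
    "readB\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
mapped, unmapped = split_sam(example)
print(len(mapped), len(unmapped))  # prints 1 1
```

On real data the same split is usually done with samtools: `samtools view -b -F 4 aln.sam > mapped.bam` keeps the mapped reads and `samtools view -b -f 4 aln.sam > unmapped.bam` keeps the rest, and `samtools fastq` converts either file back to FASTQ.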

Ram · Answer 2 · 2015-05-06

3

10.2 years ago

5heikki 11k

Assuming the chloroplast genome differs from the host in GC% and codon usage, the quickest way would be to bin the reads based on tetramer frequencies.
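
A minimal sketch of the features such binning would start from: GC fraction and a 256-value tetramer frequency profile per sequence. The clustering step itself is left out, and the function names are hypothetical. Profiles from single 300 bp reads are likely to be noisy, so this may work better on merged pairs or assembled contigs.

```python
from collections import Counter
from itertools import product

# All 256 possible tetramers over A, C, G, T, in a fixed order.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def gc_fraction(seq):
    """Fraction of G and C among the unambiguous bases of seq."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def tetramer_profile(seq):
    """Relative frequency of each tetramer, in the order of TETRAMERS.

    Windows containing ambiguous bases such as N are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[t] for t in TETRAMERS)
    return [counts[t] / total if total else 0.0 for t in TETRAMERS]


print(gc_fraction("GGCCAATT"))       # prints 0.5
print(tetramer_profile("AAAAA")[0])  # prints 1.0 (every window is AAAA)
```

The resulting vectors can be passed to any clustering method to look for a group of sequences whose composition differs from the nuclear background.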
