Question: How can I use BLAST to extract chloroplast sequences from DNA reads?
3.7 years ago by
United States
AcademicDialysis60 wrote:

I'm trying to extract the chloroplast sequences from my reads, which were produced by whole-genome sequencing.

This paper mentions that, to do this, the authors BLASTed their reads against all of the known genomes in the same family. For me, that family would be Fabaceae.

Does anyone know of a quicker way to do this than manually downloading every FASTA file containing Fabaceae chloroplast sequences from NCBI? Or of a better way to extract chloroplast sequences from my reads? I do know that chloroplast DNA should be more abundant in the reads than other DNA, because it is present in more copies per cell than nuclear or mitochondrial DNA.

Info about reads: paired-end reads from an Illumina MiSeq, 300 bp on average


Thanks in advance!

modified 3.7 years ago by 5heikki8.0k • written 3.7 years ago by AcademicDialysis60
3.7 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum116k wrote:

Search NCBI Nucleotide for chloroplast + Fabaceae, i.e. the query `(chloroplast[Filter]) AND "fabaceae"[Organism]`,

and download the sequences as FASTA.
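If clicking through the web interface is too slow, the same search can be scripted. Below is a minimal sketch that assumes the NCBI E-utilities `esearch` and `efetch` endpoints with the history server; the function names, batch size, and pause between requests are my own choices, not something from the thread. Note that this query returns every matching chloroplast record, not only complete genomes.

```python
import json
import time
import urllib.parse
import urllib.request

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"
QUERY = '(chloroplast[Filter]) AND "fabaceae"[Organism]'


def esearch_url(term, db="nuccore"):
    """Build an ESearch URL that stores the hits on the NCBI history server."""
    params = {"db": db, "term": term, "usehistory": "y",
              "retmode": "json", "retmax": 0}
    return f"{EUTILS}/esearch.fcgi?{urllib.parse.urlencode(params)}"


def efetch_url(webenv, query_key, retstart, retmax, db="nuccore"):
    """Build an EFetch URL for one batch of FASTA records from a stored search."""
    params = {"db": db, "WebEnv": webenv, "query_key": query_key,
              "retstart": retstart, "retmax": retmax,
              "rettype": "fasta", "retmode": "text"}
    return f"{EUTILS}/efetch.fcgi?{urllib.parse.urlencode(params)}"


def download_fasta(term, out_path, batch=200, pause=0.4):
    """Run the search, then append the FASTA records to out_path in batches."""
    with urllib.request.urlopen(esearch_url(term)) as resp:
        result = json.load(resp)["esearchresult"]
    count = int(result["count"])
    with open(out_path, "w") as out:
        for start in range(0, count, batch):
            url = efetch_url(result["webenv"], result["querykey"], start, batch)
            with urllib.request.urlopen(url) as resp:
                out.write(resp.read().decode())
            time.sleep(pause)  # stay under NCBI's request-rate limit
    return count
```

Calling `download_fasta(QUERY, "fabaceae_chloroplast.fasta")` (the output file name is just an example) would write everything into one multi-FASTA file that can be indexed directly.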

Index the FASTA with `bwa index` and map your reads with `bwa mem`.
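The two bwa steps could be driven from a script roughly as follows. This is a sketch, not a tested pipeline: it assumes `bwa` is on the PATH, the file names passed in are placeholders, and the `mapped_read_names` helper is my own addition that reads the SAM output and keeps every read whose FLAG does not have the "unmapped" bit (0x4) set.

```python
import subprocess


def bwa_commands(ref_fasta, reads_1, reads_2, threads=4):
    """Return the `bwa index` and paired-end `bwa mem` command lines."""
    index_cmd = ["bwa", "index", ref_fasta]
    mem_cmd = ["bwa", "mem", "-t", str(threads), ref_fasta, reads_1, reads_2]
    return index_cmd, mem_cmd


def run_bwa(ref_fasta, reads_1, reads_2, sam_out, threads=4):
    """Index the reference, then map the read pairs and write SAM to sam_out."""
    index_cmd, mem_cmd = bwa_commands(ref_fasta, reads_1, reads_2, threads)
    subprocess.run(index_cmd, check=True)
    with open(sam_out, "w") as out:
        subprocess.run(mem_cmd, stdout=out, check=True)


def mapped_read_names(sam_lines):
    """Collect names of reads with at least one mapped record (FLAG bit 0x4 unset)."""
    names = set()
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue
        if not int(fields[1]) & 0x4:
            names.add(fields[0])
    return names
```

Because both mates share a read name, a pair ends up in the set as soon as either mate maps, which is usually what you want when pulling out read pairs for a later assembly.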


written 3.7 years ago by Pierre Lindenbaum116k

Since the closest reference is only in the same family, I don't think my consensus sequences would be very large. Should I just BLAST and then use the matching reads for de novo assembly? Or should I still use bwa, with all of those sequences as the reference?
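For the BLAST-then-assemble route, one possible way to pick the reads is sketched below. It assumes the search was run with tabular output (`-outfmt 6`, whose default columns start with qseqid, sseqid, pident, length); the identity and length cutoffs are arbitrary placeholders that would need tuning for a reference that is only in the same family.

```python
import csv


def reads_with_hits(blast_tsv_lines, min_identity=80.0, min_length=100):
    """Return IDs of query reads with at least one hit passing both cutoffs."""
    keep = set()
    for row in csv.reader(blast_tsv_lines, delimiter="\t"):
        # Skip blank, comment, or truncated lines (outfmt 6 has 12 columns).
        if len(row) < 12 or row[0].startswith("#"):
            continue
        qseqid, pident, length = row[0], float(row[2]), int(row[3])
        if pident >= min_identity and length >= min_length:
            keep.add(qseqid)
    return keep
```

The resulting set of read IDs can then be used to pull the matching pairs out of the original FASTQ files before handing them to an assembler.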



written 3.7 years ago by AcademicDialysis60
3.7 years ago by
5heikki8.0k wrote:

Assuming the chloroplast genome differs from the host in GC% and codon usage, the quickest way would be to bin the reads based on tetramer frequencies.
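To make the tetramer idea concrete, here is a minimal sketch of the per-read features one might bin on: GC fraction and a 256-element tetranucleotide frequency vector. The helper names are hypothetical, the actual clustering step is left out, and reverse-complement merging of tetramers (which is common in practice) is omitted for brevity.

```python
from collections import Counter
from itertools import product

# All 256 tetramers in a fixed order, so every read gets a comparable vector.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def gc_fraction(seq):
    """Fraction of G and C among the unambiguous (A/C/G/T) bases."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def tetramer_frequencies(seq):
    """Relative frequency of each tetramer; windows containing N are ignored."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[t] for t in TETRAMERS)
    return [counts[t] / total if total else 0.0 for t in TETRAMERS]


def euclidean(a, b):
    """Euclidean distance between two frequency vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
```

One caveat: a 300 bp read contains only 297 tetramer windows spread over 256 bins, so single-read profiles are noisy. Comparing each read to the profile of a known chloroplast sequence with `euclidean`, or pooling the two mates of a pair, may work better than clustering individual reads.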

written 3.7 years ago by 5heikki8.0k
