Question

Removal of Prevotella copri sequences from a FASTQ/A file

1

7.6 years ago

tmad109 ▴ 10

Hi All,

We have FASTQ files from a gut microbiome project. I used the 16S metagenomics kit from Ion Torrent, sequenced the samples on an Ion PGM, and analyzed the data in Ion Reporter. Now I want to remove the Prevotella copri sequences from the FASTQ files and reanalyze them. Has anyone done something like this? QIIME has a script for filtering sequences (http://qiime.org/scripts/filter_fasta.html), but I do not have QIIME output files such as otu_map.txt. My plan is to remove the P. copri sequences from the FASTQ files and rerun the analysis in Ion Reporter.

Your input is much appreciated. Many thanks, Thilini

16S amplicon sequencing • 1.8k views

updated 7.6 years ago by GenoMax 142k • written 7.6 years ago by tmad109 ▴ 10

score 1 · Answer 1 · 2016-10-05

1

7.6 years ago

WouterDeCoster 47k

I'm unfamiliar with Ion Torrent data, but I guess this should work: map all reads to the Prevotella copri genome and then extract those that don't map (and are therefore attributable to other species).

7.6 years ago by WouterDeCoster 47k
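
The "extract those that don't map" step can be sketched with standard shell tools. This is only a minimal sketch under stated assumptions: an aligner of your choice has already produced a plain-text list of the read IDs that mapped to P. copri (one ID per line, without the leading `@`), and the FASTQ file uses strict four-line records. The function name `drop_mapped_reads` and the file names in the usage comment are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every FASTQ record whose read ID is NOT in the ID list.
# Usage: drop_mapped_reads mapped_ids.txt reads.fq > clean_reads.fq
# Assumes four-line FASTQ records and one read ID per line in the list (no leading "@").
drop_mapped_reads() {
    awk '
        FILENAME == ARGV[1] { drop[$1] = 1; next }   # ID list: remember IDs to drop
        FNR % 4 == 1 {                               # header line of a FASTQ record
            id = substr($1, 2)                       # strip "@", ignore any description
            keep = !(id in drop)
        }
        keep                                         # print all four lines of kept records
    ' "$1" "$2"
}
```

Matching on the read ID rather than on the sequence keeps the quality lines paired with their reads, so the output should still be a valid FASTQ file for reanalysis.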

0

Thanks for the suggestion. Can you suggest which software I can use for this purpose? My FASTQ files contain short reads, and I have a Prevotella copri FASTQ file that contains its whole genome sequence. Many thanks, Thilini

7.6 years ago by tmad109 ▴ 10

0

I think the method suggested by GenoMax will be more efficient and less error-prone for this application.

7.6 years ago by WouterDeCoster 47k

score 1 · Answer 2 · 2016-10-05

1

7.6 years ago

GenoMax 142k

You can use bbsplit.sh from BBMap for this purpose. You provide bbsplit.sh with the Prevotella copri sequence, and it bins the reads into two files: one for Prevotella and one for the rest.

7.6 years ago by GenoMax 142k
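
Once the reads have been binned into two files, a quick read count can confirm that nothing was lost. The following is a minimal sketch, assuming uncompressed four-line FASTQ files and assuming each input read ends up in exactly one of the two bins; the helper names `count_reads` and `check_split` and the file names in the usage comment are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count reads in an uncompressed four-line FASTQ file.
count_reads() {
    awk 'END { print int(NR / 4) }' "$1"
}

# Hypothetical helper: succeed only if the two bins together hold every input read.
# Usage: check_split orig_reads.fq prevotella_bin.fq other_bin.fq
check_split() {
    local total binned
    total=$(count_reads "$1")
    binned=$(( $(count_reads "$2") + $(count_reads "$3") ))
    echo "input=$total binned=$binned"
    [ "$total" -eq "$binned" ]
}
```

A mismatch would suggest that some reads were written to neither file or to both, which is worth investigating before reanalysis.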

0

Hi, thanks for the input. I installed BBMap and ran it, but it didn't work. I assume this is because my FASTQ files contain short reads while the Prevotella copri FASTQ file contains the whole genome. Do you have any suggestions on how to do it? Many thanks, Thilini

7.6 years ago by tmad109 ▴ 10

0

You need to run the bbsplit.sh program from the BBMap suite. You must have the Prevotella genome in a FASTA file; if you don't, you should be able to get it from NCBI easily. You would do something like

bbsplit.sh in=orig_reads.fq ref=prevotella.fa basename=out_%.fq outu=clean_reads.fq

7.6 years ago by GenoMax 142k
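
Since the reference has to be a FASTA file and the asker mentions having the genome as FASTQ, a conversion may be needed first. This is a minimal sketch, assuming the FASTQ file really holds assembled sequence in strict four-line records rather than raw sequencing reads (if it holds raw reads, downloading the assembled genome from NCBI, as suggested above, is the safer route). The function name `fastq_to_fasta` and the file names in the usage comment are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert four-line FASTQ records to FASTA.
# Usage: fastq_to_fasta genome.fq > genome.fa
fastq_to_fasta() {
    awk '
        NR % 4 == 1 { print ">" substr($0, 2) }   # "@name ..." becomes ">name ..."
        NR % 4 == 2 { print }                     # sequence line, copied unchanged
    ' "$1"
}
```

The resulting FASTA file could then be passed as the `ref=` argument in the command above.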