Removal of Prevotella copri sequences from a FASTQ/FASTA file
7.6 years ago
tmad109 ▴ 10

Hi All,

We have FASTQ files from a gut microbiome project. I used the 16S metagenomics kit from Ion Torrent, sequenced the samples on an Ion PGM, and analyzed the data in Ion Reporter. I now want to remove the Prevotella copri sequences from the FASTQ files and reanalyze them in Ion Reporter. Has anyone done something like this? QIIME has a script for filtering sequences (http://qiime.org/scripts/filter_fasta.html), but I do not have QIIME output files such as otu_map.txt.

Your input is much appreciated. Many thanks, Thilini

16S amplicon sequencing • 1.8k views
7.6 years ago

I'm unfamiliar with Ion Torrent data, but I think this should work: map all reads to the Prevotella copri genome, then extract the reads that do not map (and are therefore attributable to other species).
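
To make the "extract the reads that do not map" step concrete, here is a minimal sketch. It assumes your aligner of choice has already written a SAM file for single-end reads, and that each record carries its sequence and quality string. The function name unmapped_sam_to_fastq is made up for illustration. In practice, samtools fastq -f 4 does the same job more robustly.

```shell
# Hypothetical helper: reads SAM on stdin, writes unmapped reads as FASTQ to stdout.
# Assumes single-end reads with SEQ (column 10) and QUAL (column 11) present.
unmapped_sam_to_fastq() {
  awk -F'\t' '
    /^@/ { next }              # skip SAM header lines
    int($2 / 4) % 2 == 1 {     # FLAG bit 0x4 set means the read is unmapped
      printf "@%s\n%s\n+\n%s\n", $1, $10, $11
    }
  '
}
```

You would call it as unmapped_sam_to_fastq < mapped.sam > non_prevotella.fq, where both file names are placeholders.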


Thanks for the suggestion. Can you tell me which software I can use for this purpose? My FASTQ files contain short reads, and I have a Prevotella copri FASTQ file that contains its whole genome sequence. Many thanks, Thilini


I think the method suggested by GenoMax will be more efficient and less error-prone for this application.

7.6 years ago
GenoMax 142k

You can use bbsplit.sh from BBMap for this purpose. You provide bbsplit with the Prevotella copri sequence and it bins the reads into two files: one for Prevotella and one for the rest.


Hi, thanks for the input. I installed BBMap and ran it, but it didn't work. I assume this is because my FASTQ files contain short reads while the Prevotella copri FASTQ file contains the whole genome. Do you have any suggestions on how to do this? Many thanks, Thilini


You need to run the bbsplit.sh program from the BBMap suite. The Prevotella genome must be in a FASTA file; if you don't have one, you should be able to get it from NCBI easily. You would do something like:

bbsplit.sh in=orig_reads.fq ref=prevotella.fa basename=out_%.fq outu=clean_reads.fq
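
After a run like this, a quick sanity check is to compare read counts before and after filtering. The sketch below assumes plain, uncompressed FASTQ with exactly four lines per record; count_fastq_reads is a made-up helper name, not part of BBMap.

```shell
# Hypothetical helper: prints the number of reads in a 4-line-per-record FASTQ file.
count_fastq_reads() {
  # Each record is 4 lines (header, sequence, "+", qualities), so reads = lines / 4.
  awk 'END { print int(NR / 4) }' "$1"
}
```

For example, count_fastq_reads orig_reads.fq minus count_fastq_reads clean_reads.fq should roughly equal the number of reads that were binned to Prevotella.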
