Question: Extracting contaminated reads from the sequenced data
9 weeks ago by
manjumoorthy950 wrote:

I have sequencing data from an organism, but it contains three 16S rRNA sequences that belong to three different organisms. I suspect the sample is contaminated. How can I extract the contigs belonging to each organism present in the sequence data?

modified 9 weeks ago by genomax63k • written 9 weeks ago by manjumoorthy950

Hello,

bbduk might help you. From the web manual:

bbduk.sh in=reads.fq out=unmatched.fq outm=matched.fq ref=phix.fa k=31 hdist=1 stats=stats.txt

This will remove all reads that have a 31-mer match to PhiX (a common Illumina spike-in, which is included in /bbmap/resources/), allowing one mismatch. The “outm” stream will catch reads that matched a reference k-mer. This allows you to split a set of reads based on the presence of something. “stats” will produce a report of which contaminant sequences were seen, and how many reads had them.

In the ref parameter you can define more than one reference. Have a look at bbduk.sh --help for more options.

fin swimmer

written 9 weeks ago by finswimmer11k

Thanks a lot. Let me check and let you know.

written 9 weeks ago by manjumoorthy950

Hi Finswimmer,

I observed that increasing the k-mer value decreases the number of matched reads. What is the ideal k-mer size for paired reads of length 150 bp? How does the interpretation of the matched reads change with the k-mer size?

written 9 weeks ago by kspata40

Because the k= value is used to find the initial match, setting it too high means the BBMap tools will find fewer initial matches, or none at all. So no surprise there. Generally, setting k= to something between 20 and 30 is fine for most applications. Smaller values require more memory.
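
The trend kspata observed can be reproduced with a toy model. The sketch below is not BBDuk's implementation: it assumes a read counts as matched when at least one of its k-mers occurs exactly in the reference, with no mismatch tolerance (hdist) and no reverse complements. The function names and the sequences are hypothetical and chosen only for illustration.

```python
def kmers(seq, k):
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def split_reads(reads, reference, k):
    """Split reads into (matched, unmatched) by exact shared k-mers.

    Toy model only: real tools also handle mismatches,
    reverse complements and base qualities.
    """
    ref_kmers = kmers(reference, k)
    matched, unmatched = [], []
    for read in reads:
        if kmers(read, k) & ref_kmers:
            matched.append(read)
        else:
            unmatched.append(read)
    return matched, unmatched


# Hypothetical reference and reads (not real data).
reference = "ACGTTGCAAGGCTTAACCGGTATCGA"
reads = [
    "ACGTTGCAAGGC",  # identical to the start of the reference
    "ACGTTGCTAGGC",  # same region with one mismatch in the middle
    "TTTTTTTTTTTT",  # unrelated sequence
]

for k in (2, 4, 8):
    matched, _ = split_reads(reads, reference, k)
    print(f"k={k}: {len(matched)} of {len(reads)} reads matched")
```

In this toy data the unrelated read matches at k=2 purely by chance, while at k=8 the read with a single mismatch no longer shares any exact k-mer with the reference. That is the trade-off behind picking a mid-range k: short k-mers produce spurious hits, long k-mers lose reads that carry sequencing errors or real variation.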

modified 9 weeks ago • written 9 weeks ago by genomax63k

Thank you, this too worked for me.

written 9 weeks ago by manjumoorthy950
9 weeks ago by
genomax63k (United States) wrote:

If you truly feel that there are three organisms, then you can use bbsplit.sh (from the BBMap suite) to bin your reads into the respective organismal pools. This will generally work well as long as the bacteria are distinct enough. You can decide what to do with reads that multi-map (map to all three reference genomes), e.g. keep them in all bins or toss them.

Use the answer here and ask if you have any questions: A: Tool to separate human and mouse RNA seq reads

Since you have bacterial data you could turn off maxindel=0.
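
The binning idea, including the choice about multi-mapping reads, can be illustrated with a small sketch. This is not how bbsplit.sh works internally (it maps reads to the references); the sketch simply assumes each read goes to the reference with which it shares the most exact k-mers. The function names, the "all"/"toss" policy labels and the sequences are hypothetical.

```python
from collections import defaultdict


def kmer_set(seq, k):
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def bin_reads(reads, references, k=5, ambiguous="toss"):
    """Bin reads by the reference sharing the most k-mers with each read.

    ambiguous="all"  -> a tied read is placed in every tied bin
    ambiguous="toss" -> a tied read is discarded
    Reads sharing no k-mer with any reference go to "unmatched".
    """
    if ambiguous not in ("all", "toss"):
        raise ValueError("ambiguous must be 'all' or 'toss'")
    ref_kmers = {name: kmer_set(seq, k) for name, seq in references.items()}
    bins = defaultdict(list)
    for read in reads:
        read_kmers = kmer_set(read, k)
        scores = {name: len(read_kmers & ks) for name, ks in ref_kmers.items()}
        best = max(scores.values())
        if best == 0:
            bins["unmatched"].append(read)
            continue
        winners = [name for name, score in scores.items() if score == best]
        if len(winners) == 1:
            bins[winners[0]].append(read)
        elif ambiguous == "all":
            for name in winners:
                bins[name].append(read)
        # ambiguous == "toss": the tied read is dropped
    return dict(bins)


# Hypothetical references sharing a conserved stretch (GGATCCAAGG).
references = {
    "orgA": "ACGTACGGATCCAAGGTTTT",
    "orgB": "TGCATGGGATCCAAGGCCCC",
}
reads = ["ACGTACGG", "AAGGCCCC", "GGATCCAAGG", "TTTTTTTT"]

print(bin_reads(reads, references, ambiguous="toss"))
print(bin_reads(reads, references, ambiguous="all"))
```

The third read comes from the stretch shared by both references, much like a conserved 16S region, so it is the one affected by the policy: it is dropped under "toss" and lands in both bins under "all".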

modified 9 weeks ago • written 9 weeks ago by genomax63k

Yes, thank you. I could separate the reads mapping to each reference genome, but I still have one question. When I use Bowtie or Bowtie2, the paired-end reads from my data do not map to the reference genome, even though the 16S sequence of the reference genome is present in my data. Why could that happen?

written 9 weeks ago by manjumoorthy950