Is it possible to estimate the proportion of contamination from GC content?
3.5 years ago

Hello all,

I have multiple FASTQ files coming from different samples. Two of them show a significantly different GC content plot. I am wondering, is it possible from there to estimate the percentage of contaminated reads?

Thanks

RNA-Seq • 717 views

is it possible from there to estimate the percentage of contaminated reads?

I don't think so. A shifted GC plot may hint at contamination, e.g. with rRNA or a different species, but you can't determine the percentage of contaminant reads unless you go looking for those reads.
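For reference, the GC plot is just a histogram of per-read GC fractions, which is why it can flag a sample as odd without saying which reads are foreign. A minimal Python sketch, assuming plain 4-line FASTQ records (the function names and toy reads are made up for illustration):

```python
import io

def gc_fractions(fastq_handle):
    """Yield the GC fraction of each read in a 4-line-per-record FASTQ stream."""
    for i, line in enumerate(fastq_handle):
        if i % 4 == 1:  # second line of each record is the sequence
            seq = line.strip().upper()
            if seq:
                yield (seq.count("G") + seq.count("C")) / len(seq)

def gc_histogram(fractions, bins=10):
    """Count reads per GC bin; bin k covers [k/bins, (k+1)/bins)."""
    counts = [0] * bins
    for f in fractions:
        # A read with GC fraction 1.0 goes into the last bin.
        counts[min(int(f * bins), bins - 1)] += 1
    return counts

# Toy input (made-up reads, not real data).
toy = io.StringIO("@r1\nGGCC\n+\nIIII\n@r2\nATAT\n+\nIIII\n@r3\nGCAT\n+\nIIII\n")
hist = gc_histogram(gc_fractions(toy), bins=4)
```

Comparing such histograms between samples shows that the distributions differ, but two organisms with overlapping GC ranges cannot be separated this way, so the histogram alone gives no reliable read count.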


What do you mean by "a significant GC content plot"?


A significantly different GC plot, my bad!


As genomax mentioned, no, you are not going to be able to determine this from GC content. If you have a large proportion of reads that don't map to the genome of your target organism, there are a few methods you could try.
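If a reference for the target organism exists, a first number to look at is the fraction of reads that fail to map. A rough Python sketch, assuming the alignments are available as uncompressed SAM text (the function name and the toy records are illustrative, not from any particular tool):

```python
def unmapped_fraction(sam_lines):
    """Return the fraction of primary SAM records flagged as unmapped (0x4)."""
    total = unmapped = 0
    for line in sam_lines:
        if line.startswith("@"):  # header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        flag = int(fields[1])
        if flag & 0x900:  # skip secondary (0x100) and supplementary (0x800)
            continue
        total += 1
        if flag & 0x4:  # read unmapped
            unmapped += 1
    return unmapped / total if total else 0.0

# Toy SAM content (made-up records).
toy_sam = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r3\t256\tchr1\t200\t0\t4M\t*\t0\t0\tACGT\tIIII",
    "r4\t16\tchr1\t300\t60\t4M\t*\t0\t0\tACGT\tIIII",
]
frac = unmapped_fraction(toy_sam)
```

This is only a rough indicator: unmapped reads also include low-quality reads and sequence missing from the reference, and reads from a closely related contaminant may still map.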


I am doing de novo assembly. So far I am thinking of doing a pre-assembly with the samples that have a good GC content, then BLASTing all the transcripts to remove foreign ones. Then I would map all my reads to this transcriptome, and eventually do a final assembly with all the mapped reads.
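The filtering step of such a plan could look roughly like this in Python. It assumes the BLAST results have already been reduced to a three-column tab-separated table of transcript ID, taxon label and bit score; that layout, the function name and the toy hits are all hypothetical:

```python
import csv
import io

def foreign_transcripts(hits_tsv, target_taxa):
    """Return transcript IDs whose best-scoring hit is outside target_taxa."""
    best = {}  # transcript ID -> (bit score, taxon label) of its best hit
    for qseqid, taxon, bitscore in csv.reader(hits_tsv, delimiter="\t"):
        score = float(bitscore)
        if qseqid not in best or score > best[qseqid][0]:
            best[qseqid] = (score, taxon)
    return {q for q, (_, taxon) in best.items() if taxon not in target_taxa}

# Toy hit table (made-up IDs, labels and scores).
toy_hits = io.StringIO(
    "t1\tplant\t500\n"
    "t1\tfungus\t120\n"
    "t2\tbacteria\t300\n"
    "t3\tfungus\t90\n"
    "t3\tplant\t80\n"
)
to_remove = foreign_transcripts(toy_hits, target_taxa={"plant"})
```

Transcripts with no hit at all never appear in the table, so they are neither kept nor removed by this step and need a separate decision.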


If you know/suspect that there is contamination, it may be best to address it up front before doing the assembly.


I have thought about it, but I can't BLAST 200 Gb of reads; the data is considerably reduced after assembly. Besides, it seems to be a multi-species contamination and I don't have the full genomes/transcriptomes of the associated species. So the other alternative, identifying the contaminants from a subset and then mapping all the reads to their full genomes/transcriptomes, seems complicated.
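For the subset idea, a random subsample is enough to put a number on the contaminated fraction even when the contaminants cannot be removed from the full data. A sketch in Python, assuming 4-line FASTQ records; it uses reservoir sampling so the file is read once, and a Wilson interval for the proportion (both are choices made here for illustration, and the counts in the example are invented):

```python
import io
import math
import random

def reservoir_sample_fastq(handle, k, seed=0):
    """Return k records (lists of 4 lines) drawn uniformly from a FASTQ stream."""
    rng = random.Random(seed)
    sample, record, n = [], [], 0
    for line in handle:
        record.append(line)
        if len(record) == 4:  # one complete FASTQ record
            n += 1
            if len(sample) < k:
                sample.append(record)
            else:
                j = rng.randrange(n)  # uniform in [0, n)
                if j < k:
                    sample[j] = record
            record = []
    return sample

def wilson_interval(hits, n, z=1.96):
    """Approximate 95% Wilson interval for a proportion hits/n (for z=1.96)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = hits / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

# Toy FASTQ with 10 made-up reads; draw 3 of them.
toy = io.StringIO("".join(f"@r{i}\nACGT\n+\nIIII\n" for i in range(10)))
subset = reservoir_sample_fastq(toy, k=3)

# Invented example: 50 of 1000 sampled reads classified as contaminant.
low, high = wilson_interval(50, 1000)
```

The sampled reads would then be classified (for example by BLAST), and the interval applies to whatever fraction of them is called contaminant. It reflects sampling error only, not classification errors.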


Then you may want to treat your data as if it was a metagenomic dataset and use an assembler like metaSPAdes.
