Question: Find Contamination From SOLiD Sequencing Data
chjiao3456 (Michigan State University, USA) wrote, 7.2 years ago:
I'm dealing with HGS data generated by a SOLiD 3 sequencer. In the pre-processing step, I want to find the contaminating reads in the sequencing data. But since the data is in color space, I can't find software that does this (some tools, such as SeqTrim, are designed for the 454 and Illumina platforms). Has anyone run into the same problem? How can I deal with contamination or artifacts in color-space data?

What do you mean by contamination? DNA from a different genome?

written 7.2 years ago by Istvan Albert

Please define "contamination". If you are referring to primers, rRNA (in RNA-seq) or barcodes, these can easily be filtered out in the Bioscope filtering step.

written 7.2 years ago by GAO Yang

Thanks for your answer. I'm mainly concerned with adaptor (primer?) sequences and vector contamination. We used MNase-seq to generate the data from human nucleosomes, so there are no rRNA or barcodes in the datasets. In this situation, can I use the software you recommended?

written 7.2 years ago by chjiao3456

Yes, vector contamination and sequences from adaptors

written 7.2 years ago by chjiao3456
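For adaptor and vector sequences specifically, one option is to screen reads without leaving color space at all: encode the known contaminant into colors and count how many of a read's color k-mers it shares. The sketch below is a hypothetical illustration (Python 3.9+), not a feature of Bioscope or any tool mentioned in this thread. It assumes csfasta-style reads (a primer base such as `T` followed by digits `0`-`3`, with `.` for a missing call), contaminant sequences containing only A/C/G/T, and the standard SOLiD 2-base encoding in which, with A=0, C=1, G=2, T=3, each color is the XOR of two adjacent base codes. The function names, `k` and the example sequence are made up.

```python
# Hypothetical sketch: screen color-space reads against known contaminants
# (adaptor/vector) by shared color k-mers, without decoding to base space.
# Assumes the SOLiD 2-base code: A=0, C=1, G=2, T=3, color = XOR of neighbours.

BASES = "ACGT"
CODE = {b: i for i, b in enumerate(BASES)}


def to_colors(seq: str) -> str:
    """Color string of a base sequence: one color per adjacent base pair."""
    return "".join(str(CODE[a] ^ CODE[b]) for a, b in zip(seq, seq[1:]))


def color_kmers(colors: str, k: int) -> set[str]:
    """All length-k windows, skipping any that contain a missing call '.'."""
    return {colors[i:i + k] for i in range(len(colors) - k + 1)
            if "." not in colors[i:i + k]}


def build_index(contaminants: list[str], k: int) -> set[str]:
    """Color k-mers of each contaminant, in both orientations."""
    index: set[str] = set()
    for seq in contaminants:
        colors = to_colors(seq.upper())
        index |= color_kmers(colors, k)
        # The reverse complement of a sequence has the reversed color string,
        # because complementing both bases of a pair leaves the color unchanged.
        index |= color_kmers(colors[::-1], k)
    return index


def contamination_fraction(read: str, index: set[str], k: int) -> float:
    """Fraction of a read's color k-mers that occur in the contaminant index.

    The read is csfasta-style ('T' + colors). The primer base and the first
    color (which encodes the primer/read junction) are skipped.
    """
    kmers = [read[i:i + k] for i in range(2, len(read) - k + 1)]
    kmers = [km for km in kmers if "." not in km]
    if not kmers:
        return 0.0
    return sum(km in index for km in kmers) / len(kmers)


# Usage with a made-up contaminant sequence.
vector = "ACGTTGCAACGGATCCTAGG"
k = 12
index = build_index([vector], k)
contaminated_read = "T0" + to_colors(vector)  # junction color is arbitrary here
print(contamination_fraction(contaminated_read, index, k))  # 1.0
```

A read could then be flagged when, say, more than half of its color k-mers hit the index; that threshold and `k = 12` are arbitrary illustrative choices, not recommended values.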
Istvan Albert (University Park, USA) wrote, 7.2 years ago:

There are very few (if any) tools that can do this for color-space data. Perhaps Bioscope (ABI's proprietary tool) has utilities that partially help, but I don't have access to it.

The methods you link to depend heavily on BLAST for alignment, so your only option is to convert your sequences to base space.

In general, converting to base space is frowned upon and strongly discouraged. However, in my own observations on realistic datasets, the conversion reduced the number of correctly mapped reads by only 10% or so, which was much better than I expected. The loss of data was definitely worth the ability to use a wider range of tools.

That being said, your experience might differ. In the end, the conversion is easy to do, so it is certainly worth trying.
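How simple the conversion is can be seen from a naive decoder. This is a minimal sketch, not the converter used in the answer above (the thread does not name one). It assumes csfasta-style reads whose first character is the primer base (e.g. `T`) followed by color digits, and the standard SOLiD 2-base encoding in which, with A=0, C=1, G=2, T=3, each color is the XOR of two adjacent base codes. The name `decode_colorspace` is hypothetical.

```python
# Minimal sketch of naive color-space to base-space decoding.
# Assumes the SOLiD 2-base code: A=0, C=1, G=2, T=3, color = XOR of neighbours.

BASES = "ACGT"
CODE = {b: i for i, b in enumerate(BASES)}


def decode_colorspace(read: str) -> str:
    """Decode a csfasta-style read such as 'T0123', dropping the primer base.

    Raises ValueError on a missing call ('.') or any other non-color
    character, because every base after that point would be undefined.
    """
    prev = CODE[read[0]]  # primer base: not part of the sequenced fragment
    out = []
    for color in read[1:]:
        if color not in "0123":
            raise ValueError(f"cannot decode color {color!r}")
        prev ^= int(color)  # next base = previous base XOR color
        out.append(BASES[prev])
    return "".join(out)


print(decode_colorspace("T0123"))  # TGAT
```

Because each base is computed from the one before it, the decoder has no way to recover from a bad or missing color; it simply carries on (or stops, for `.`), which is the main reason the conversion is usually discouraged.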


In fact, I had already aligned my sequencing data to the human genome with the Corona software and found that many reads contain one color-space error (over 20% of all mapped reads). This means these reads will decode to wrong sequences when translated to base space, and that is quite a large proportion. Can BLAST deal with color-space data directly?

written 7.2 years ago by chjiao3456
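The concern about reads with one color error can be made concrete. The sketch below encodes a made-up sequence into color space, changes a single color, and decodes it naively. It assumes the standard SOLiD 2-base encoding (A=0, C=1, G=2, T=3; each color is the XOR of two adjacent base codes) and csfasta-style reads with a leading primer base; the sequence, error position and helper names are all hypothetical.

```python
# Sketch: one color error corrupts every downstream base after naive decoding.
# Assumes the SOLiD 2-base code: A=0, C=1, G=2, T=3, color = XOR of neighbours.

BASES = "ACGT"
CODE = {b: i for i, b in enumerate(BASES)}


def encode(primer: str, seq: str) -> str:
    """Encode bases as a csfasta-style read: primer base + colors."""
    codes = [CODE[b] for b in primer + seq]
    return primer + "".join(str(a ^ b) for a, b in zip(codes, codes[1:]))


def decode(read: str) -> str:
    """Naive decode: each base is the previous base XOR the next color."""
    prev = CODE[read[0]]
    bases = []
    for color in read[1:]:
        prev ^= int(color)
        bases.append(BASES[prev])
    return "".join(bases)


def mismatches(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


true_seq = "ACGTTGCAAC"         # made-up fragment
read = encode("T", true_seq)    # "T3131013101"

# Simulate a single sequencing error in the 4th color (read index 4).
pos = 4
wrong = str((int(read[pos]) + 1) % 4)
bad_read = read[:pos] + wrong + read[pos + 1:]

print(mismatches(read, bad_read))              # 1 color differs
print(mismatches(true_seq, decode(bad_read)))  # 7 bases differ
```

A genuine single-base difference, by contrast, changes two adjacent colors, so an aligner working in color space can tell an isolated color error apart from a SNP; decoding first throws that distinction away.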
chjiao3456 (Michigan State University, USA) wrote, 7.2 years ago:

I have seen one tool that can trim adaptors from color-space datasets.
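Adaptor trimming itself can also be expressed directly in color space. This is a hypothetical sketch, not the (unnamed) tool mentioned above. It assumes csfasta-style reads (`T` + colors) and the standard SOLiD 2-base encoding (A=0, C=1, G=2, T=3; color = XOR of adjacent bases). The key detail: if the insert is `L` bases long, the color at index `L` after the primer base encodes the insert/adaptor junction and cannot be known in advance, but every color after it should equal the adaptor's own colors, so the search skips the junction color. Partial adaptors at the read end are allowed; `min_overlap`, `max_mismatches` and the sequences in the example are arbitrary.

```python
# Hypothetical sketch: trim a 3' adaptor from a color-space read.
# Assumes the SOLiD 2-base code: A=0, C=1, G=2, T=3, color = XOR of neighbours.

BASES = "ACGT"
CODE = {b: i for i, b in enumerate(BASES)}


def to_colors(seq: str) -> str:
    """Color string of a base sequence: one color per adjacent base pair."""
    return "".join(str(CODE[a] ^ CODE[b]) for a, b in zip(seq, seq[1:]))


def trim_3prime_adaptor(read: str, adaptor: str, min_overlap: int = 5,
                        max_mismatches: int = 1) -> str:
    """Return primer base + the first L colors for the smallest insert length L
    whose following colors (after the junction) match the adaptor's colors,
    or the read unchanged if no such L exists."""
    colors = read[1:]
    ad = to_colors(adaptor)
    for L in range(len(colors)):
        tail = colors[L + 1:L + 1 + len(ad)]  # skip the junction color at L
        if len(tail) < min_overlap:
            break  # overlaps only get shorter from here on
        if sum(a != b for a, b in zip(tail, ad)) <= max_mismatches:
            return read[:1 + L]
    return read


# Usage with made-up sequences: a 6-base insert followed by the adaptor.
insert, adaptor = "ACGTAC", "GGATCCTAGGTT"
read = "T" + str(CODE["T"] ^ CODE["A"]) + to_colors(insert + adaptor)
print(trim_3prime_adaptor(read, adaptor))  # T313131
```

The trimmed read is still in color space, so it can go straight to a color-space aligner; decoding `T313131` would give back the 6-base insert `ACGTAC`.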
