Find Contamination From SOLiD Sequencing Data
11.9 years ago
chjiao3456 ▴ 40
I'm working with HGS data generated by SOLiD 3 sequencing, and in the pre-processing step I want to find the contaminating reads. Because the data is in color space, I can't find any software that does this. (Tools such as SeqTrim are designed for the 454 and Illumina platforms.) Has anyone run into the same problem? How can I screen color-space data for contamination or artifacts?
• 2.3k views

What do you mean by contamination? DNA from a different genome?


Please define "contamination". If you are referring to primers, rRNA (in RNA-seq), and barcodes, these can easily be filtered out in the Bioscope filter step.


Thanks for your answer. For now I'm mainly considering sequences from adaptors (primers?) and vector contamination. We used MNase-seq to generate the data from human nucleosomes, so there are no rRNA reads or barcodes in the datasets. In this situation, can I use the software you recommended?


Yes, vector contamination and sequences from adaptors.

11.9 years ago

There are very few tools (if any) that can do this for color-space data. Perhaps Bioscope (ABI's proprietary tool) has utilities that partially help, but I don't have access to it.

The methodologies that you link to depend heavily on BLAST for alignment, so your only option is to convert your sequences to base space.

In general, converting to base space is much frowned upon and strongly discouraged. In my own practical observations on realistic datasets, however, the conversion reduced the number of correctly mapped reads by only 10% or so, which was a whole lot better than I expected. The loss of data was definitely worth gaining the ability to use a wider range of tools.

That being said, your experience might be different. In the end, the conversion is easy to do and thus certainly worth trying.
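To illustrate how simple the conversion is, here is a minimal sketch in Python. It assumes reads in the usual csfasta layout (a known primer base followed by the colors 0-3, with `.` for a missing call) and relies on the fact that, with the base codes A=0, C=1, G=2, T=3, a SOLiD color equals the XOR of two adjacent bases. The function name is my own, not part of any tool.

```python
# Assumed 2-bit base codes: with A=0, C=1, G=2, T=3 the SOLiD color
# between two adjacent bases is the XOR of their codes.
BASES = "ACGT"

def colorspace_to_basespace(read: str) -> str:
    """Decode a color-space read such as 'T0123' into bases.

    The first character is the primer base. It anchors the decoding but
    is not part of the sequenced insert, so it is dropped from the result.
    """
    prev = BASES.index(read[0].upper())
    decoded = []
    for i, color in enumerate(read[1:]):
        if color not in "0123":
            # A missing call ('.') makes every later base unknowable.
            decoded.append("N" * (len(read) - 1 - i))
            break
        prev ^= int(color)
        decoded.append(BASES[prev])
    return "".join(decoded)

print(colorspace_to_basespace("T0123"))  # TGAT
print(colorspace_to_basespace("T01.3"))  # TGNN
```

The decoded reads can then be written out as ordinary FASTA and passed to BLAST-based screening tools. Quality values are ignored here; a real pipeline would have to decide how to carry them over.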


In fact, I have already aligned my sequencing data to the human genome with the Corona software and found that a lot of reads (over 20% of all mapped reads) contain one color-space error, which means these reads will give wrong sequences when translated. That is quite a big proportion. Can BLAST deal with color-space data directly?
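The concern about single color errors can be made concrete with a small sketch. The colors below are made up for illustration, and the decoder assumes the standard XOR relationship between base codes (A=0, C=1, G=2, T=3) and colors. One miscalled color changes every base downstream of it, not just one position.

```python
BASES = "ACGT"  # assumed codes A=0, C=1, G=2, T=3; color = XOR of neighbours

def decode(primer: str, colors: str) -> str:
    """Translate colors to bases, starting from the known primer base."""
    prev = BASES.index(primer)
    out = []
    for c in colors:
        prev ^= int(c)
        out.append(BASES[prev])
    return "".join(out)

true_colors = "0123012"  # hypothetical error-free color calls
bad_colors = "0133012"   # same read with the third color miscalled (2 -> 3)

good = decode("T", true_colors)
bad = decode("T", bad_colors)
mismatches = sum(a != b for a, b in zip(good, bad))

print(good)        # TGATTGA
print(bad)         # TGCGGTC
print(mismatches)  # 5
```

A single color error therefore corrupts the read from that position to its end after translation, which is the usual argument for aligning in color space where possible.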

11.9 years ago
chjiao3456 ▴ 40

I have seen one tool that can trim adaptors from color-space datasets.
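For anyone who wants to see the idea behind trimming in color space, here is a rough sketch, not the method of any particular tool. The adaptor is encoded into colors and searched for directly in the read. The adaptor sequence `ACGGTCAT` and the function names are placeholders invented for this example, and only exact matches are handled.

```python
BASES = "ACGT"  # assumed codes A=0, C=1, G=2, T=3; color = XOR of neighbours

def to_colors(seq: str) -> str:
    """Encode a base sequence as the colors between its adjacent bases.

    A sequence of n bases yields n-1 colors. The color joining it to
    whatever precedes it depends on that preceding base and is left out.
    """
    codes = [BASES.index(b) for b in seq.upper()]
    return "".join(str(a ^ b) for a, b in zip(codes, codes[1:]))

def trim_adapter_colors(read: str, adapter_seq: str) -> str:
    """Clip an exactly matching adaptor from a read like 'T0123...'."""
    primer, colors = read[0], read[1:]
    idx = colors.find(to_colors(adapter_seq))
    if idx == -1:
        return read  # no adaptor found, leave the read untouched
    # Also drop the junction color just before the match: it links the
    # last insert base to the first adaptor base.
    return primer + colors[:max(idx - 1, 0)]

ADAPTER = "ACGGTCAT"  # hypothetical adaptor, for illustration only
print(to_colors(ADAPTER))                              # 1301213
print(trim_adapter_colors("T012331301213", ADAPTER))   # T0123
```

Real data would need mismatch tolerance and partial matches at the read end, so treat this only as a demonstration that adaptor screening does not strictly require conversion to base space.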
