Question

Processing SOLiD data but cannot find relevant documentation on tools (BFAST)

0

7.4 years ago

ehzed ▴ 40

I realize that almost no one uses SOLiD sequencing anymore, but unfortunately this paper does, and for my project (chicken-related) it would be useful to include as many relevant datasets as possible in my analysis. Since a new chicken reference genome assembly came out in January, I need to align those SOLiD reads to the new reference genome.

Nowhere in the paper that I linked does it say whether the reads are paired-end or single-end, only that they are 35 bp long. Does anyone know how I can tell whether the reads are paired-end or single-end? I am just assuming that they are paired-end because there are two runs per sample on SRA.

There is very little documentation available online for SNP calling from SOLiD reads. From reading forums, I think BFAST should be used for aligning, and then I can convert the alignment to BAM format and use GATK to carry on my analysis. However, I haven't found any documentation on which trimmers to use, or on whether trimming is needed at all, since SOLiD reads are so short. Additionally, the BFAST website provides neither instructions nor the creator's contact information, and the manual that I found online is outdated and incomplete.

So here are my questions:

1) Does anyone know when BFAST is advantageous over BFAST-BWA and vice versa? In what situations would you use one over the other?

2) There are three options for alignment (match, easyalign, and localalign). How do I determine which one to use?

I know this is quite a lot for one person to answer. If anyone could just point me to an online resource or tutorial, I would really appreciate it. Thank you!

sequencing alignment SNP • 1.4k views

ADD COMMENT • link updated 7.4 years ago by colindaven 6.4k • written 7.4 years ago by ehzed ▴ 40

score 1 · Answer 1 · 2016-12-07

1

7.4 years ago

colindaven 6.4k

I never tried BFAST; it always appeared a bit tricky. I had the best results by a long way with NovoalignCS (commercial, but the trial period would probably be useful for you).

Other aligners include SHRiMP, which may not be available any more. The original bowtie1 supports SOLiD reads quite well, I believe, and might be useful for getting a BAM for SNP calling.

Other authors on Biostars have this (ancient) list, which is probably still relevant: Which Programs Are You Relying On For Solid Data Analysis?

I only had data from the 5500xl, which produces 75 bp forward and 35 bp reverse reads. The reverse reads were always really bad; perhaps 15 bp were useful. About 60 bp of the forward read was generally OK in my datasets. Read quality could be improved by using the tool SAET (at least for human data).

I wrote all this up in this document way back:

https://docs.google.com/document/d/1NCV3Li5gO8-lEPWSK6eTuvk2MYhO0RBQqoa1y5H_WHU/edit?usp=sharing
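As a rough illustration of the trimming described above (about 60 usable bases of the forward read and about 15 of the reverse read in those 5500xl datasets), here is a minimal Python sketch. It assumes the reads are in csfasta-style color space, where each sequence line starts with a primer base followed by one color call per position, and that qualities come as a space-separated `.qual` line. The function names `trim_colorspace` and `trim_qualities` are hypothetical, and the cutoffs are one person's observations on their own data, not general recommendations; for 35 bp reads the right cutoff would need to be checked against the quality profile.

```python
def trim_colorspace(read: str, keep: int) -> str:
    """Keep the leading primer base plus the first `keep` color calls.

    Assumes a csfasta-style read such as 'T0120130213', where read[0]
    is the primer base and the remaining characters are colors
    (0-3, or '.' for a missing call).
    """
    if keep < 0:
        raise ValueError("keep must be non-negative")
    return read[: keep + 1]


def trim_qualities(qual_line: str, keep: int) -> str:
    """Trim a space-separated .qual line to its first `keep` values."""
    if keep < 0:
        raise ValueError("keep must be non-negative")
    return " ".join(qual_line.split()[:keep])


if __name__ == "__main__":
    print(trim_colorspace("T0120130213", 5))   # T01201
    print(trim_qualities("10 20 30 40", 2))    # 10 20
```

Whichever tool does the trimming, the color-space read and its quality line have to be cut to the same length, otherwise the aligner will see mismatched records.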

ADD COMMENT • link 7.4 years ago by colindaven 6.4k

0

Thanks for your help! For your dataset, could you tell which reads were forward and which ones were reverse by looking at the headers of each read? Since the paper that I linked didn't give any clear information, I am now trying to figure out what kind of reads I have by looking at the header directly.
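One way to approach the header check is to tally the suffix at the end of each read name and to see how many read IDs the two runs of a sample share. The Python sketch below assumes native SOLiD-style names such as `>853_22_43_F3`, where the text after the last underscore is a tag (for example `F3` or `R3`). This is an assumption: files downloaded from SRA may have had their names rewritten, in which case the tags will not be present. The function names are hypothetical.

```python
from collections import Counter


def _name(header: str) -> str:
    """Strip the leading '>' or '@' and any trailing description."""
    return header.lstrip(">@").split()[0]


def tag_counts(headers):
    """Count the suffix after the last underscore in each read name."""
    counts = Counter()
    for header in headers:
        _, sep, tag = _name(header).rpartition("_")
        counts[tag if sep else "(no tag)"] += 1
    return counts


def shared_fraction(headers_a, headers_b):
    """Fraction of read IDs (tag removed) from A that also occur in B."""
    def ids(headers):
        out = set()
        for header in headers:
            name = _name(header)
            prefix, sep, _ = name.rpartition("_")
            out.add(prefix if sep else name)
        return out

    a, b = ids(headers_a), ids(headers_b)
    return len(a & b) / len(a) if a else 0.0


if __name__ == "__main__":
    run1 = [">1_2_3_F3", ">1_2_4_F3"]
    run2 = [">1_2_3_R3"]
    print(tag_counts(run1 + run2))
    print(shared_fraction(run1, run2))
```

If the two runs carry different tags and share most of their IDs, that suggests they are two ends of the same fragments. If they share almost none, they are more likely independent single-end runs.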

ADD REPLY • link 7.4 years ago by ehzed ▴ 40

0

No worries, but I haven't worked with SOLiD for four years now, so I can't really help. What I would do, if the read lengths are the same, is map a sample of each read set to a decent reference genome and check the orientations manually.

Try mapping a) as paired-end and b) each read of the pair separately; you should gain a lot of information just by checking the result in a genome browser. Also, the paired-end mapping rate should be very low if you falsely specify single-end reads as being part of a pair.
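Alongside the genome-browser check, the pairing statistics can be tallied from the aligner's SAM output. The sketch below is a minimal Python example that reads SAM text lines and uses only the standard FLAG bits from the SAM specification (0x1 paired, 0x2 properly paired, 0x4 unmapped, 0x8 mate unmapped, 0x10 reverse strand, 0x20 mate reverse strand, 0x40 first in pair, 0x100 secondary, 0x800 supplementary). The function name `summarise_pairs` is hypothetical, and how an aligner sets the proper-pair bit depends on its own settings.

```python
def summarise_pairs(sam_lines):
    """Tally mapping and pairing statistics from SAM text lines."""
    stats = {"reads": 0, "mapped": 0, "proper_pair": 0,
             "same_strand": 0, "opposite_strand": 0}
    for line in sam_lines:
        # Skip blank lines and the '@' header section.
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])
        # Ignore secondary and supplementary alignments.
        if flag & 0x100 or flag & 0x800:
            continue
        stats["reads"] += 1
        if not flag & 0x4:
            stats["mapped"] += 1
        if flag & 0x2:
            stats["proper_pair"] += 1
        # Count each pair once, via its first-in-pair record,
        # and only when both mates are mapped.
        if flag & 0x1 and flag & 0x40 and not flag & 0x4 and not flag & 0x8:
            if bool(flag & 0x10) == bool(flag & 0x20):
                stats["same_strand"] += 1
            else:
                stats["opposite_strand"] += 1
    return stats


if __name__ == "__main__":
    demo = [
        "@HD\tVN:1.6",
        "r1\t99\tchr1\t100\t60\t35M\t=\t300\t235\t*\t*",
        "r1\t147\tchr1\t300\t60\t35M\t=\t100\t-235\t*\t*",
        "r2\t77\t*\t0\t0\t*\t*\t0\t0\t*\t*",
        "r2\t141\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    ]
    print(summarise_pairs(demo))
```

A very low `proper_pair` count relative to `mapped`, or no consistent strand pattern, would be in line with the point above that unrelated reads forced into pairs should map poorly as pairs.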

ADD REPLY • link 7.4 years ago by colindaven 6.4k