I've been given a set of 454 sequences/results and I'm very new to this kind of data.
- 1.XXX.454Reads.fna (I guess these are the FASTA sequences for the reads...)
- 1.XX.454Reads.qual (...and these are the base qualities)
- a tgz file containing some binary *.clc files (?)
- 454AllStructVars.txt.gz, 454HCStructVars.txt.gz, 454AllDiffs.txt.gz, 454HCDiffs.txt.gz: these should be the allele calls.

I guess those files were generated by the 'Genome Sequencer FLX System', weren't they? If not, which tool produced them? I understand the *Diffs files; however, the content of *StructVars.txt is not clear to me. For example, how should I interpret the following output:

    >chr19 1988212 <-- ? ? ? 2 100.00 - Point
    Reads with Difference:
    chr19 1988172+ TTGTATTTTTGGTAGAGGCGGGATTTCATCATGTTGGCCAGACCTCGAGTGATC--CACCTGCCT-TGGCCTCCCAAAGT 1988248
    *
    GKF3EFN01B3QKI 237- GACCTCG--TGATCTGC-CC-GCCTCTG-CCTCCCAAAGT 203
    GKF3EFN01CM6BB 183+ GACCTCG--TGATCTGC-CC-GCCTCTG-CCTCCCAAAGT 217
    *
    Other Reads:

Does that mean that only the tails of the two reads were mapped to the reference (= deletion)? What is the <kbd>'*'</kbd> under the reference?
- Is there a way to convert these data to SAM/BAM?
- How can I get the genome coverage from these data?
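
For the SAM/BAM question, one route I can imagine (an assumption on my part, not something the files above confirm) is to re-align the reads myself with a mapper that writes SAM. Most mappers want FASTQ rather than separate FASTA and QUAL files, so a first step might be to merge the two. Below is a minimal sketch in plain Python; the function names `read_records` and `fna_qual_to_fastq` are made up for illustration, and it assumes both files list the reads in the same order and that the .qual file holds space-separated Phred scores.

```python
from itertools import zip_longest


def read_records(lines):
    """Yield (read_id, body_lines) for each '>'-headed record."""
    name, body = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, body
            # Keep only the identifier, drop the rest of the header.
            name, body = line[1:].split()[0], []
        else:
            body.append(line)
    if name is not None:
        yield name, body


def fna_qual_to_fastq(fna_lines, qual_lines):
    """Yield FASTQ records (Phred+33) from paired FASTA and QUAL lines.

    Assumption: both inputs contain the same reads in the same order.
    """
    pairs = zip_longest(read_records(fna_lines), read_records(qual_lines))
    for seq_rec, qual_rec in pairs:
        if seq_rec is None or qual_rec is None:
            raise ValueError("files have different numbers of reads")
        (seq_id, seq_body), (qual_id, qual_body) = seq_rec, qual_rec
        if seq_id != qual_id:
            raise ValueError(f"read order differs: {seq_id} vs {qual_id}")
        seq = "".join(seq_body)
        scores = [int(s) for s in " ".join(qual_body).split()]
        if len(seq) != len(scores):
            raise ValueError(f"length mismatch for {seq_id}")
        quals = "".join(chr(q + 33) for q in scores)  # Phred+33 encoding
        yield f"@{seq_id}\n{seq}\n+\n{quals}\n"


# Hypothetical usage (the file names are placeholders):
# with open("reads.fna") as fna, open("reads.qual") as qual, \
#         open("reads.fastq", "w") as out:
#     out.writelines(fna_qual_to_fastq(fna, qual))
```

The resulting FASTQ could then be aligned against the reference, and coverage computed from the sorted alignments, but I don't know whether that is the recommended way for 454 data.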
Please update the question title to be a question...