454 Sequences: How To Get A BAM + Coverage?
13.7 years ago

Hi all,

I've been given a set of 454 sequences/results and I'm very new to this kind of data:

  • 1.XXX.454Reads.fna (I guess these are the FASTA sequences for the reads...)
  • 1.XX.454Reads.qual (... and the qualities)
  • a tgz file containing some binary *.clc files (?)
  • 454AllStructVars.txt.gz, 454HCStructVars.txt.gz, 454AllDiffs.txt.gz, 454HCDiffs.txt.gz: these should be the allele calls. I guess those files were generated by the 'Genome Sequencer FLX System', weren't they? If not, what tool produced them? I understand the *Diffs files; however, the content of *StructVars.txt is not clear to me. For example, how should I interpret the following output:

    >chr19    1988212    <--        ?    ?    ?        2    100.00    -    Point
    Reads with Difference:
    chr19                   1988172+ TTGTATTTTTGGTAGAGGCGGGATTTCATCATGTTGGCCAGACCTCGAGTGATC--CACCTGCCT-TGGCCTCCCAAAGT 1988248
                                                                     *
    GKF3EFN01B3QKI              237-                                         GACCTCG--TGATCTGC-CC-GCCTCTG-CCTCCCAAAGT 203
    GKF3EFN01CM6BB              183+                                         GACCTCG--TGATCTGC-CC-GCCTCTG-CCTCCCAAAGT 217
                                                                     *
    Other Reads:
    

Does that mean that only the tails of two reads were mapped to the reference (= deletion)? What is the '*' under the reference?

  • Is there a way to transform those data to SAM/BAM?
  • How can I get the coverage of the genome with those data?

Many thanks,

Pierre

format bam coverage • 3.3k views

Please update the question title to be a question...

13.7 years ago
Casbon ★ 3.3k

StructVars.txt: see the manual here, p. 125 (by the way, you should use the -fd flag for full descriptions).

Transform to SAM/BAM: see my question and answer.

Or use sff2fastq and then BWA-SW (`bwa bwasw`), although BWA's alignments are not as good for homopolymers.
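If only the .fna and .qual files are at hand rather than the original SFF, the two can be merged into FASTQ before alignment. Below is a minimal sketch of that merge, assuming both files list the reads in the same order and that Phred+33 encoding is wanted; the function names `read_records` and `fna_qual_to_fastq` are hypothetical, not part of any 454 tool.

```python
def read_records(lines):
    """Yield (read_name, payload_lines) from FASTA-style text (.fna or .qual)."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, chunks
            # Keep only the first word of the header as the read name.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, chunks


def fna_qual_to_fastq(fna_lines, qual_lines):
    """Merge FASTA sequences and space-separated quality scores into FASTQ text.

    Assumption: both inputs contain the same reads in the same order.
    """
    out = []
    for (sname, seq_chunks), (qname, qual_chunks) in zip(
        read_records(fna_lines), read_records(qual_lines)
    ):
        seq = "".join(seq_chunks)
        scores = [int(q) for q in " ".join(qual_chunks).split()]
        if sname != qname or len(seq) != len(scores):
            raise ValueError(f"sequence/quality mismatch for read {sname!r}")
        # Phred+33: quality 0 maps to '!', 40 maps to 'I'.
        qual = "".join(chr(q + 33) for q in scores)
        out.append(f"@{sname}\n{seq}\n+\n{qual}\n")
    return "".join(out)
```

In practice the two arguments would be open file handles for the .fna and .qual files, and the returned text would be written to a .fastq file that the aligner can read.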

Coverage: the headers in 454AllContigs.fna will give you a quick and dirty coverage estimate; otherwise you are probably best off converting to SAM and using something else.
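Once the reads are in SAM, per-base depth follows from the POS and CIGAR fields of each record. The sketch below is one way to do it in plain Python, assuming uncompressed SAM text as input; `sam_depth` is a hypothetical helper, and a dedicated tool is the better choice for whole genomes. Deleted bases (CIGAR `D`) and skipped regions (`N`) are not counted as covered here.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def sam_depth(sam_lines):
    """Return a Counter mapping (reference_name, 1-based position) to read depth."""
    depth = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        flag, rname, pos, cigar = int(fields[1]), fields[2], int(fields[3]), fields[5]
        if flag & 0x4 or cigar == "*":  # unmapped read or no alignment
            continue
        ref_pos = pos
        for length, op in CIGAR_OP.findall(cigar):
            length = int(length)
            if op in "M=X":  # aligned bases: count them and advance
                for p in range(ref_pos, ref_pos + length):
                    depth[(rname, p)] += 1
                ref_pos += length
            elif op in "DN":  # deletion or skip: advance without counting
                ref_pos += length
            # I, S, H and P do not consume reference positions.
    return depth
```

Summing the values and dividing by the reference length gives a mean coverage figure; positions that never appear in the result have depth zero.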


The 454AlignmentInfo.tsv file contains per-base coverage stats. If you don't have it, just add the -info flag to the runMapping command.


The only trouble I've found with 454AlignmentInfo is that it leaves read direction off. It can be useful for certain applications to know if there is a read direction bias.


The only trouble I've noticed with that particular file is that it has no read orientation information. This may or may not matter, but for certain applications (e.g. exon capture) read direction bias can be useful to note. Otherwise, it's great.
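If the alignments are available as SAM, read direction can be recovered from the FLAG field, where bit 0x10 marks a read aligned to the reverse strand. A minimal sketch, assuming uncompressed SAM text as input (`strand_counts` is a hypothetical helper; secondary and supplementary alignments are not filtered out):

```python
from collections import Counter


def strand_counts(sam_lines):
    """Count mapped reads per (reference_name, strand) from SAM text lines."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 0x4:  # unmapped read
            continue
        strand = "-" if flag & 0x10 else "+"
        counts[(fields[2], strand)] += 1
    return counts
```

A strongly lopsided ratio between the "+" and "-" counts for a reference would be a hint of the direction bias mentioned above.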

13.7 years ago

The *.clc extension suggests that some of your files were created by CLC Genomics Workbench.

12.2 years ago
Gmoney ▴ 220

The newest version of gsMapper supposedly supports SAM/BAM output. I haven't upgraded yet, but the manual on their site has a screenshot of it.
