Why is a reference genome needed when calling variants?

Asked 2.1 years ago by ManuelDB

If the reads have already been aligned, why do we still need a reference genome when calling variants (for example with FreeBayes)? I did not expect this when I started exploring NGS bioinformatics pipelines for calling variants from FASTQ files.

Tag: NGS

Last updated 2.1 years ago by GenoMax.

Here is the sequence of a read:

CTTCAACAACGTCCACTCTTTCTGGAAAATCAATTGGTAGGAGAGAACAGTACATTTCACCATATGCAGA

Can you tell me whether there is a mutation, and where it is, without a reference?
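To make the point concrete: a mutation only exists relative to something else. The minimal Python sketch below assumes an ungapped alignment and uses a made-up reference (built by changing one base of the read above, purely for illustration, not real data). It shows that finding a mismatch is a comparison between two sequences; the helper name `find_mismatches` is hypothetical.

```python
def find_mismatches(read: str, reference: str, ref_start: int = 0) -> list[tuple[int, str, str]]:
    """Compare a read with the reference segment it aligned to.

    Assumes an ungapped alignment starting at 0-based ref_start.
    Returns (reference_position, reference_base, read_base) for each mismatch.
    """
    segment = reference[ref_start:ref_start + len(read)]
    if len(segment) != len(read):
        raise ValueError("read runs past the end of the reference")
    return [
        (ref_start + i, ref_base, read_base)
        for i, (ref_base, read_base) in enumerate(zip(segment, read))
        if ref_base != read_base
    ]


read = "CTTCAACAACGTCCACTCTTTCTGGAAAATCAATTGGTAGGAGAGAACAGTACATTTCACCATATGCAGA"

# Hypothetical reference for illustration only: 5 extra bases, then the read
# with its base at offset 30 changed from C to T.
reference = "GGGGG" + read[:30] + "T" + read[31:]

print(find_mismatches(read, reference, ref_start=5))  # prints [(35, 'T', 'C')]
```

On its own, the read string gives the function nothing to compare against, which is exactly the question above.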

Reply by Pierre Lindenbaum, 2.1 years ago

Thanks Pierre,

I had wrongly assumed that this information was already stored in the BAM file once the read was aligned.

Reply by ManuelDB, 2.1 years ago

"I thought that that information is in the BAM file when the read is aligned."

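The information is in the BAM file, but only for each read on its own, not as an identified SNP. Each alignment records how the read's nucleotides (if any) differ from the reference region the read aligned to, through the CIGAR string and, if the aligner writes them, the optional MD and NM tags. A variant caller still has to combine the evidence from all reads covering a position and compare it with the reference base at that position, which is why it needs the reference FASTA as well.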
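As a rough illustration of that per-read information, here is a minimal Python sketch that decodes an MD tag (format as described in the SAM specification's optional fields) into the reference positions of mismatches. It assumes the aligner actually wrote an MD tag, which not all aligners do (`samtools calmd` can add it, but it needs the reference FASTA to do so); `md_mismatches` is a hypothetical helper name. Note that MD stores the reference base at each mismatch, one read at a time, which is still far from a variant call.

```python
import re

# Tokens of an MD tag: a run of matching bases, a deletion (^ followed by the
# deleted reference bases), or a single mismatched reference base.
MD_TOKEN = re.compile(r"(\d+)|\^([A-Z]+)|([A-Z])")


def md_mismatches(md: str, pos: int = 1) -> list[tuple[int, str]]:
    """Return (reference_position, reference_base) for each mismatch in an MD tag.

    pos is the 1-based leftmost reference position of the alignment (SAM POS).
    Insertions and soft clips are not represented in MD, so they do not
    shift the reference positions computed here.
    """
    offset = 0
    mismatches = []
    for matches, deleted, ref_base in MD_TOKEN.findall(md):
        if matches:
            offset += int(matches)
        elif deleted:
            offset += len(deleted)  # deleted bases still occupy reference positions
        else:
            mismatches.append((pos + offset, ref_base))
            offset += 1
    return mismatches


# Hypothetical MD tag for a read aligned at POS 100:
# 10 matches, mismatch (reference A), 5 matches, 2-base deletion, 6 matches.
print(md_mismatches("10A5^AC6", pos=100))  # prints [(110, 'A')]
```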

To get this BAM file, one needs to start with a reference, index it and then align sequence data to that index, isn't that the case?
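Even after alignment, the caller itself is given the reference (FreeBayes, for example, takes it with `-f`). The toy Python sketch below is not how FreeBayes works internally; real callers also model base and mapping qualities, indels and haplotypes. It piles up already-aligned, ungapped reads and reports a site only when enough reads disagree with the reference base there. The reference, reads, thresholds and the name `naive_call` are all hypothetical.

```python
from collections import Counter


def naive_call(reference: str, reads: list[tuple[int, str]], min_depth: int = 3,
               min_alt_fraction: float = 0.3) -> list[tuple[int, str, str, int, int]]:
    """Toy variant caller over ungapped alignments (illustration only).

    reads: (0-based reference start, read sequence) pairs, i.e. already aligned.
    Returns (position, ref_base, alt_base, alt_count, depth) for each site where
    the most common non-reference base reaches min_alt_fraction of the depth.
    """
    # Pile up the bases observed at each reference position.
    pileup: dict[int, Counter] = {}
    for start, seq in reads:
        for i, base in enumerate(seq):
            pileup.setdefault(start + i, Counter())[base] += 1

    calls = []
    for pos in sorted(pileup):
        counts = pileup[pos]
        depth = sum(counts.values())
        # The reference is what defines "non-reference" at this position.
        ref_base = reference[pos]
        alts = [(n, b) for b, n in counts.items() if b != ref_base]
        if depth < min_depth or not alts:
            continue
        alt_count, alt_base = max(alts)
        if alt_count / depth >= min_alt_fraction:
            calls.append((pos, ref_base, alt_base, alt_count, depth))
    return calls


# Hypothetical reference and aligned reads; two of four reads show T at position 4.
reference = "ACGTACGTAC"
reads = [(0, "ACGTTCGT"), (2, "GTTCGTAC"), (1, "CGTACGTA"), (3, "TACG")]
print(naive_call(reference, reads))  # prints [(4, 'A', 'T', 2, 4)]
```

Without `reference`, the reads alone would only say that two bases were seen at position 4, not which one is the variant.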

Reply by GenoMax, 2.1 years ago