Question

Is there a base that will be ignored in variant calling?

1

8.6 years ago

L. A. Liggett ▴ 130

I am using freebayes and bwa-mem together to align and call variants on my sequencing reads. But I am doing some editing to the reads before aligning and calling variants. I want some loci to be ignored based on my edits.

For example, suppose a base's WT is G, but in a given read I see an A. If I change that A to an N, will the read still align properly to the genome while freebayes reports nothing for that locus from that read, so that I see neither the variant nor a read supporting the WT sequence?

alignment sequencing • 2.0k views

ADD COMMENT • link updated 8.6 years ago by John 13k • written 8.6 years ago by L. A. Liggett ▴ 130

0

Regardless of whether it's possible (I don't know), could you explain why you are doing this? It sounds like tricky business to me.

ADD REPLY • link 8.6 years ago by WouterDeCoster 48k

0

I can't really explain the whole experimental setup behind this, but there are particular pieces of the reads that need to be ignored, and so I am looking for a good way of ignoring bases before alignment.

ADD REPLY • link 8.6 years ago by L. A. Liggett ▴ 130

1

That should be fine. I agree with John's statement that it's better to edit the genome, unless this is a known positional artifact in your reads (like a cycle where everything was called as 'A'). If that's the case, masking the reads and mapping them will work to some extent; it depends on the number of masked bases. One is fine, but if every second base in the read were masked, the read would no longer map.

ADD REPLY • link 8.6 years ago by Brian Bushnell 20k

0

Agreed, good suggestion.

ADD REPLY • link 8.6 years ago by L. A. Liggett ▴ 130

score 3 · Accepted Answer · 2016-12-05

3

8.6 years ago

John 13k

If your sequencing data has quality scores (FASTQ), you should set the quality of the base to a very low value (depending on how your quality scores are encoded). The mapper will know to ignore it.

You can set the base to N, or you can even leave it as-is and hope the mapper maps it with mismatches (which it should), but dropping the quality is probably the best way.

Even better would be if you edited the genome you're mapping to ;)
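As a rough illustration of the quality-dropping idea, here is a minimal Python sketch that masks one base of a single FASTQ record. It assumes Phred+33 encoding (where `!` is the lowest quality, Phred 0); if your scores are encoded differently, the offset would need to change. The function name `mask_base` and its parameters are made up for this example and are not part of bwa or freebayes. Whether a given aligner or variant caller then disregards the base depends on the tool and its settings, which this sketch does not cover.

```python
# Assumption: qualities use Phred+33 encoding, so chr(33) == '!' means Phred 0.
PHRED_OFFSET = 33
MIN_QUAL_CHAR = chr(PHRED_OFFSET)


def mask_base(seq, qual, pos, to_n=False):
    """Return (seq, qual) with the base at 0-based position `pos` masked.

    The quality at `pos` is set to the lowest value; if `to_n` is True,
    the base itself is also replaced with 'N'.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    if not 0 <= pos < len(seq):
        raise IndexError("position is outside the read")

    new_qual = qual[:pos] + MIN_QUAL_CHAR + qual[pos + 1:]
    new_seq = (seq[:pos] + "N" + seq[pos + 1:]) if to_n else seq
    return new_seq, new_qual


# Mask the last base of a toy read, keeping the base call but dropping its quality.
seq, qual = mask_base("ACGTA", "IIIII", 4)
print(seq, qual)
```

Passing `to_n=True` would additionally replace the base call with N, which combines the two options mentioned above.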

ADD COMMENT • link 8.6 years ago by John 13k

1

This is a good suggestion. I agree that it would be much nicer to have an edited genome, but this is not a viable alternative based on the way I am running the experiment. But setting the quality score as very low is a great method of doing this. Thanks for the help.

ADD REPLY • link 8.6 years ago by L. A. Liggett ▴ 130