Question: Is there a base that will be ignored in variant calling
2.6 years ago, angrypigeon wrote:

I am using bwa-mem to align my sequencing reads and freebayes to call variants on them. I am doing some editing to the reads before aligning and calling variants, and I want some loci to be ignored based on my edits.

For example, suppose I have a base whose WT is G, but for a given read I see an A. If I change that A to an N, will the read still be properly aligned to the genome while freebayes reports no info for that locus from that read, such that I won't see the variant, but I also won't see a read supporting the WT sequence?

sequencing alignment • 642 views
modified 2.6 years ago by John • written 2.6 years ago by angrypigeon

Regardless of whether it's possible (I don't know), could you explain why you are doing this? It sounds like tricky business to me.

written 2.6 years ago by WouterDeCoster

I can't really explain the whole experimental setup behind this, but there are particular pieces of the reads that need to be ignored, and so I am looking for a good way of ignoring bases before alignment.

written 2.6 years ago by angrypigeon

That should be fine. I agree with John's statement that it's better to edit the genome, unless this is a known positional artifact in your reads (like a cycle where everything was called as 'A'). If that's the case, masking the reads and mapping them will work to some extent, depending on the number of masked bases. One masked base is fine, but if every second base in the read were masked, the read would no longer map.
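To make the "number of masked bases" point concrete, here is a minimal Python sketch that reports what fraction of a read is masked, so heavily masked reads can be flagged before mapping. The function names and the 0.1 cutoff are hypothetical choices for illustration only; the thread does not give a threshold, and the right one would depend on read length and the aligner's settings.

```python
def masked_fraction(seq):
    """Return the fraction of bases in seq that are 'N' (case-insensitive)."""
    if not seq:
        return 0.0
    return seq.upper().count("N") / len(seq)


def too_masked(seq, max_fraction=0.1):
    """Flag a read whose masked fraction exceeds max_fraction.

    max_fraction=0.1 is an arbitrary illustrative cutoff, not a recommendation.
    """
    return masked_fraction(seq) > max_fraction


if __name__ == "__main__":
    for read in ("ACGTACGTACGTACGTACGN", "ANANANANAN"):
        print(read, masked_fraction(read), too_masked(read))
```

A read with a single masked base passes this check, while one with every second base masked is flagged, which mirrors the two cases described above.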

modified 2.6 years ago • written 2.6 years ago by Brian Bushnell

Agreed, good suggestion.

written 2.6 years ago by angrypigeon
2.6 years ago, John (Germany) wrote:

If your sequencing data has quality scores (FASTQ), you should set the quality of the base to a very low value (depending on how your quality scores are encoded). The mapper will know to ignore it.

You can set the base to N, or you can even leave it as-is and hope the mapper maps it with mismatches (which it should), but dropping the quality is probably the best way.
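As a rough illustration of the two options above, here is a minimal Python sketch that edits one FASTQ record in memory: it lowers the quality at chosen positions and, optionally, also replaces the base with N. The function name, its parameters, and the use of 0-based positions are assumptions made for this example, and it assumes Phred+33 encoding, where `!` is the lowest quality (Q0); other encodings need a different character.

```python
def mask_fastq_record(seq, qual, positions, to_n=False, low_q_char="!"):
    """Return (seq, qual) with the given 0-based positions masked.

    The quality at each position is set to low_q_char ('!' is Q0 under
    Phred+33, which is assumed here). If to_n is True, the base itself
    is also replaced with 'N'.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    seq_chars = list(seq)
    qual_chars = list(qual)
    for pos in positions:
        if not 0 <= pos < len(seq_chars):
            raise IndexError(f"position {pos} is outside the read")
        qual_chars[pos] = low_q_char
        if to_n:
            seq_chars[pos] = "N"
    return "".join(seq_chars), "".join(qual_chars)


if __name__ == "__main__":
    # Hypothetical read: drop the quality of the last base only.
    print(mask_fastq_record("ACGTA", "IIIII", [4]))
    # Same read, but also replace the base with N.
    print(mask_fastq_record("ACGTA", "IIIII", [4], to_n=True))
```

Whether a low quality value is actually honoured is tool-specific, so it is worth checking the options of the aligner and caller you use. freebayes, for instance, documents a `--min-base-quality` option; check its default in your version before relying on it.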

Even better would be if you edited the genome you're mapping to ;)

written 2.6 years ago by John

This is a good suggestion. I agree that it would be much nicer to have an edited genome, but this is not a viable alternative based on the way I am running the experiment. But setting the quality score as very low is a great method of doing this. Thanks for the help.

written 2.6 years ago by angrypigeon