Question: Phasing alleles, indel gap filled with reference.
0
gravatar for Vincent Manzanilla
2.2 years ago by
Oslo
Vincent Manzanilla20 wrote:

Hello I have targeted enriched data from different plants that are, for some closely related and other a bit distant.

I have this idea to map then phase the alleles and call for the consensus for making matrices and at the end gene trees.

When I have a gap during the mapping, the phased alleles are calling the reference instead of having the gap; as shown in the figure. figure

Is there a way to overcome the this problem?

### Phasing alleles ###

samtools phase -AF -b $i.allele $i

### Phasing alleles ###
do echo -e "\n\nPhasing alleles"
echo -e "\n\n$i"

samtools phase -AF -b $i.allele $i;

### Sort phased bam files ###
echo -e "\n\nSort phased bam files..."
samtools sort -@ 32 $i.allele.0.bam -o $i.allele.0.sorted.bam
samtools sort -@ 32 $i.allele.1.bam -o $i.allele.1.sorted.bam

### mpileup and create fq files ###
echo -e "\n\nmpileup allele 0..."
samtools mpileup -u -f markers.fasta $i.allele.0.sorted.bam| bcftools call -c | vcfutils.pl vcf2fq > $i.allele.0.fq
echo -e "\n\nmpileup allele 1..."
samtools mpileup -u -f markers.fasta $i.allele.1.sorted.bam| bcftools call -c | vcfutils.pl vcf2fq > $i.allele.1.fq

### transform fq to fasta ###
echo -e "\n\nFastq to Fasta"
~/nobackup/seqtk/seqtk seq -a $i.allele.0.fq > $i.allele.0.fasta
~/nobackup/seqtk/seqtk seq -a $i.allele.1.fq > $i.allele.1.fasta
ADD COMMENTlink modified 2.2 years ago • written 2.2 years ago by Vincent Manzanilla20

I think this is because you are aligning against a reference in which you included the bases that are deleted in at least some subject. When you align reads with the deletion you do not align anything, so there is no SNP and the assigned base is the reference. You should filter the results, for example if no (or very few) reads align in some parts of your target, instead of the reference allele you put a gap.

ADD REPLYlink written 2.2 years ago by Fabio Marroni2.3k
0
gravatar for Vincent Manzanilla
2.2 years ago by
Oslo
Vincent Manzanilla20 wrote:

Thank you for your reply I think I found a way to mask the gap from the final fasta file. The solution was posted in this discussion , however I grep the tag DP=0 to have the position without coverage (gap).

here the updated code

for i in $(cat list);

### Phasing alleles ###
do echo -e "\n\nPhasing alleles"


samtools phase -AF -b $i.allele $i"_L006mapped.bam";

### Sort phased bam files ###
echo -e "\n\nSort phased bam files..."
samtools sort -@ 32 $i.allele.0.bam -o $i.allele.0.sorted.bam
samtools sort -@ 32 $i.allele.1.bam -o $i.allele.1.sorted.bam


### Generating the vcf file ###
echo -e "\n\nGenerating the vcf file..."
samtools mpileup -uv -f markers.fasta $i.allele.0.sorted.bam > $i.allele.0.vcf
samtools mpileup -uv -f markers.fasta $i.allele.1.sorted.bam > $i.allele.1.vcf

### Generating the bed file ###
echo -e "\n\nGenerating the bed file..."
grep DP=0 $i.allele.0.vcf | awk '{OFS="\t"; if ($0 !~ /\#/); print $1, $2-1, $2}' > $i.allele.0.bed
grep DP=0 $i.allele.1.vcf | awk '{OFS="\t"; if ($0 !~ /\#/); print $1, $2-1, $2}' > $i.allele.1.bed


### Generating the fasta file ###
echo -e "\n\nGenerating the fasta file..."
bcftools call -c $i.allele.0.vcf | vcfutils.pl vcf2fq |~/nobackup/seqtk/seqtk seq -a - > $i.allele.0.fasta
bcftools call -c $i.allele.1.vcf | vcfutils.pl vcf2fq |~/nobackup/seqtk/seqtk seq -a - > $i.allele.1.fasta


### Mask indel from the fasta file ###
echo -e "\n\nMask indel from the fasta file..."
bedtools maskfasta -fi $i.allele.0.fasta -fo $i.indel.0.fasta -bed $i.allele.0.bed
bedtools maskfasta -fi $i.allele.1.fasta -fo $i.indel.1.fasta -bed $i.allele.1.bed
done ;
ADD COMMENTlink modified 2.2 years ago • written 2.2 years ago by Vincent Manzanilla20
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1979 users visited in the last hour