Question: Phased Genome, Want Reads From Both Alleles
2.5 years ago, kpr wrote:

I have a phased genome, and I am trying to generate a counts matrix that contains the number of reads that map to each allele, for example something like this:

             Wild1   Wild2   Wild3   MT1   MT2   MT3
C1_000001    0       0       0       0     0     0
C1_000002    10      9       8       1     1     1
C1_000003    0       0       0       10    12    10
C1_000004    6       5       7       0     0     0

So far I have aligned the reads with Bowtie and TopHat and have an accepted_hits file, aln.bam.

I have sorted aln.bam with samtools but am having trouble with the next step. Based on what I have read so far, I think my next step is to generate a consensus sequence?

samtools mpileup -uf ref.fa aln.bam | bcftools call -c | vcfutils.pl vcf2fq > cns.fq

I want to make sure I understand this step and the subsequent steps.

  1. I don't understand this portion of the above line of code, and am having trouble finding documentation for it:

    vcfutils.pl vcf2fq > cns.fq

  2. I have read documentation saying that some of these functions can generate a consensus either across samples or across alleles. I want to make sure I am doing it per allele.
  3. It looks like this will generate a consensus FASTQ file. I am not sure what the next steps would be.
  4. Any other suggestions besides "Google Allele Specific Pipelines" would be helpful. Perhaps a link to one in particular that you would recommend.

Update: I don't need the whole matrix. A way to do this for just one gene would be sufficient.

Thanks in advance!

rna-seq • 626 views

Point 4 is quite demanding. People help whichever way they can.

modified 2.5 years ago • written 2.5 years ago by geek_y

I'm not trying to be rude, I just need something a little more than that at this point.

modified 2.5 years ago • written 2.5 years ago by kpr
2.5 years ago, geek_y (Barcelona) wrote:

Use ASEReadCounter from GATK.
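
As a rough sketch of what that could look like for a single gene, assuming GATK4 command-line syntax: `ref.fa` and `aln.bam` are the files from the question, while `het_sites.vcf.gz`, the function names, and the region argument are hypothetical placeholders. ASEReadCounter expects a VCF of biallelic heterozygous SNPs plus indexed inputs, so those would have to be prepared first. The second function simply sums the `refCount` and `altCount` columns of the resulting table, looking them up by header name rather than by position.

```shell
# Sketch only, assuming GATK4 is installed and on PATH.
# het_sites.vcf.gz is a hypothetical name for the heterozygous SNPs
# taken from the phased genome; it is not a file from the original post.
run_ase_counter() {
    bam=$1      # e.g. aln.bam (sorted, indexed, with read groups)
    region=$2   # interval covering the gene of interest, contig:start-end
    gatk ASEReadCounter \
        -R ref.fa \
        -I "$bam" \
        -V het_sites.vcf.gz \
        -L "$region" \
        -O "${bam%.bam}.ase.table"
}

# Sum reference-allele and alternate-allele read counts over all sites
# in an ASEReadCounter-style tab-separated table read from stdin.
# Columns are found via the header line, so column order does not matter.
sum_allele_counts() {
    awk -F'\t' '
        NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
        { ref += $(col["refCount"]); alt += $(col["altCount"]) }
        END { printf "%d\t%d\n", ref, alt }
    '
}
```

Running `run_ase_counter aln.bam <region>` once per sample and then `sum_allele_counts < aln.ase.table` would give one ref/alt pair per sample for that gene. Note that summing over sites is a crude aggregate: a read that overlaps two SNPs is counted twice.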

2.5 years ago, colindaven (Hannover Medical School) wrote:

This whole project is very demanding. I would recommend looking at phASER: https://github.com/secastel/phaser

However, I spent a long time working on this and got very little out of it. In my opinion, Nanopore sequencing is a better option for getting phased alleles. Good luck.


Thanks for the suggestion!

written 2.5 years ago by kpr