Question: Phased Genome, Want Reads From Both Alleles
13 months ago, kpr wrote:

I have a phased genome, and I am trying to generate a count matrix that contains the number of reads mapping to each of the two alleles. For example, something like this:

             Wild1  Wild2  Wild3  MT1  MT2  MT3
C1_000001    0      0      0      0    0    0
C1_000002    10     9      8      1    1    1
C1_000003    0      0      0      10   12   10
C1_000004    6      5      7      0    0    0

So far I have aligned the reads with Bowtie/TopHat and have an accepted_hits file, aln.bam.

I have used samtools to sort the aln.bam but am having trouble with the next step. Based on what I have read so far, I think my next step is to generate a consensus sequence?

samtools mpileup -uf ref.fa aln.bam | bcftools call -c | vcfutils.pl vcf2fq > cns.fq

I want to make sure I understand this step and the subsequent ones.

  1. I don't understand this portion of the above command, and am having trouble finding documentation on it:

    vcfutils.pl vcf2fq > cns.fq

  2. I have read documentation saying that we can use some of these functions to generate a consensus across samples, or alleles. I want to make sure I am doing it by alleles.
  3. It looks like this will generate a consensus fastq file. Not sure what the next steps would be.
  4. Any other suggestions besides "Google Allele Specific Pipelines" would be helpful. Perhaps a link of one in particular that you would recommend.

Update: I also don't need the whole matrix. Figuring out a way just to do it for one gene would be sufficient.

Thanks in advance!

rna-seq • 363 views

Point 4 is quite demanding. People help whichever way they can.

(comment written 13 months ago by geek_y)

I'm not trying to be rude, I just need something a little more than that at this point.

(comment written 13 months ago by kpr)
13 months ago, geek_y (Barcelona/CRG/London/Imperial) wrote:

Use ASEReadCounter from GATK.
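
A minimal sketch of how that could look, assuming GATK4 is installed as `gatk` and that you already have a VCF of the heterozygous sites from your phased genome. The file names and the two helper function names (`run_ase_counter`, `sum_region_counts`) are placeholders made up for illustration, and the column names in the second helper are what I would expect in the ASEReadCounter output table, so check them against your own output before relying on this.

```shell
# Sketch only: file names and function names below are hypothetical.

# Count reference- and alternate-allele reads at every site in a VCF.
# GATK generally expects the reference to have a .fai index and a .dict
# sequence dictionary, and the BAM to be coordinate-sorted and indexed.
# Arguments: reference FASTA, BAM, sites VCF, output table.
run_ase_counter() {
    gatk ASEReadCounter -R "$1" -I "$2" -V "$3" -O "$4"
}

# Sum refCount and altCount over one region (for example, one gene).
# Assumes a tab-separated table with a header row containing the columns
# contig, position, refCount and altCount; columns are looked up by name.
# Arguments: table, contig, start, end. Prints "refSum<TAB>altSum".
sum_region_counts() {
    awk -F'\t' -v c="$2" -v s="$3" -v e="$4" '
        NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
        $col["contig"] == c && $col["position"] >= s && $col["position"] <= e {
            ref += $col["refCount"]
            alt += $col["altCount"]
        }
        END { printf "%d\t%d\n", ref, alt }
    ' "$1"
}

# Example usage (not run here):
#   run_ase_counter ref.fa aln.sorted.bam het_sites.vcf.gz Wild1.ase.table
#   sum_region_counts Wild1.ase.table C1 1000 5000
```

Running this once per sample and pasting the per-gene sums together would give a table like the one in the question. Note that summing ref and alt counts across sites only gives per-haplotype counts if the reference allele sits on the same haplotype at every site in the gene; otherwise the counts need to be flipped according to the phased genotypes first. Reads that overlap several sites are also counted once per site.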

13 months ago, colindaven (Hannover Medical School) wrote:

This whole project is very demanding. I would recommend looking at phASER: https://github.com/secastel/phaser

However, I spent a long time working on this and got very little out of it. Nanopore is a better option for getting phased alleles, in my opinion. Good luck.


Thanks for the suggestion!

(comment written 13 months ago by kpr)