Question: Analysing BAM files
9 months ago, User000380 wrote:

Hello,

I have aligned around 3 varieties against 5 different reference genomes, so I now have several BAM files. Next, I want to see whether the varieties and references carry different SNPs (in the same region, of course), or whether there are regions with no reads or with exactly the same reads. In other words, maybe having the BAM file results as a table would be a good option. I don't think variant calling (.vcf) is useful in this case. Does anybody know how to transform a BAM file into a parsable table?

Any suggestions could be very helpful.

rna-seq bam • 405 views
modified 9 months ago by julien.fouret.fr20 • written 9 months ago by User000380

Do you have the coordinates for the same region across your references?

written 9 months ago by Brice Sarver3.5k

Nope... this is the problem: I need to somehow analyse SNPs even though I don't know the coordinates.

written 9 months ago by User000380

I don't think variant calling (.vcf) is useful in this case. Does anybody know how to transform a BAM file into a parsable table?

These discussions might be helpful:

modified 9 months ago • written 9 months ago by igor11k

9 months ago, julien.fouret.fr20 wrote:

Hi,

If you want the coordinates for the "same" region across the references, you can perform a genomic multiple alignment. You could use this tool: https://www.biorxiv.org/content/10.1101/730531v1

To be sure of the context: where do your 5 references come from? Different species?

To avoid multiple alignment at the genome level (which would be quite difficult to manage and to use later in your analysis), I would propose the following strategy.

  1. Choose one reference to use throughout; let's call it THE reference.
  2. Perform SNP analysis against THE reference (see GATK guidelines: https://www.broadinstitute.org/partnerships/education/broade/best-practices-variant-calling-gatk-1) based on DNA-Seq reads for your 3 varieties.
  3. Align your other references against THE reference and extract per-position variation. You could use samtools mpileup combined with bcftools, without any filter (see the samtools mpileup VCF output).
  4. Finally, merge the results of steps 2 and 3 to compare variants from the references and the varieties against THE reference, and against each other, with a multi-sample VCF analysis.
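
The multi-sample VCF from step 4 can also be flattened into the kind of parsable table the question asks for. Below is a minimal Python sketch, assuming the table text was exported with something like `bcftools query -f '%CHROM\t%POS\t%REF\t%ALT[\t%GT]\n'` and that the sample names (for example from `bcftools query -l`) are passed in the same order. The function names `read_genotype_table` and `discordant_sites` and the sample labels are hypothetical, chosen only for illustration.

```python
import csv
import io


def read_genotype_table(text, samples):
    """Parse tab-separated rows: CHROM, POS, REF, ALT, then one GT per sample.

    Assumption: the columns follow the bcftools query format string given
    in the lead-in, with genotypes in the same order as `samples`.
    """
    rows = []
    for fields in csv.reader(io.StringIO(text), delimiter="\t"):
        if not fields:
            continue  # skip blank lines
        chrom, pos, ref, alt = fields[:4]
        rows.append({
            "chrom": chrom,
            "pos": int(pos),
            "ref": ref,
            "alt": alt,
            "gt": dict(zip(samples, fields[4:])),
        })
    return rows


def discordant_sites(rows):
    """Keep sites where the called samples do not all share one genotype.

    Missing calls (any genotype containing '.') are ignored, and phased
    genotypes ('0|1') are compared as if unphased ('0/1'). Allele order
    is not normalised, so '0/1' and '1/0' would count as different.
    """
    kept = []
    for row in rows:
        called = {
            gt.replace("|", "/")
            for gt in row["gt"].values()
            if "." not in gt
        }
        if len(called) > 1:
            kept.append(row)
    return kept


if __name__ == "__main__":
    # Made-up example rows, not real data.
    example = (
        "chr1\t100\tA\tG\t0/0\t1/1\t./.\n"
        "chr1\t250\tC\tT\t1/1\t1/1\t1/1\n"
    )
    samples = ["varietyA", "varietyB", "ref2"]
    for row in discordant_sites(read_genotype_table(example, samples)):
        print(row["chrom"], row["pos"], row["gt"])
```

Sites where every called sample agrees are dropped, so the output lists only positions where varieties or references differ from each other. Positions with no coverage show up as missing genotypes rather than as rows of their own.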

Of note, you will need an alignment tool that works with "assembly to assembly" alignment; you could use this: https://github.com/lh3/minimap2
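
For the assembly-to-assembly step, the commands might look roughly like the ones built by this Python sketch. It only assembles shell command strings and does not run them. Several things here are assumptions rather than part of the answer: the `asm5` preset (a minimap2 preset for closely related assemblies; `asm10` and `asm20` exist for more divergent ones), the use of `bcftools mpileup` in place of `samtools mpileup` (VCF output moved to bcftools in recent releases), and all file names and function names, which are hypothetical.

```python
import shlex


def assembly_alignment_commands(the_reference, other_reference, out_prefix):
    """Build shell commands that align one assembly to THE reference.

    Returns two command strings: align and sort into a BAM, then index it.
    """
    q = shlex.quote
    bam = f"{out_prefix}.bam"
    return [
        # -a: SAM output; -x asm5: assembly-to-assembly preset (assumed suitable)
        f"minimap2 -ax asm5 {q(the_reference)} {q(other_reference)} "
        f"| samtools sort -o {q(bam)} -",
        f"samtools index {q(bam)}",
    ]


def joint_calling_command(the_reference, bams, out_vcf):
    """Build one unfiltered multi-sample calling pipeline over all BAMs.

    Every BAM is assumed to be aligned to THE reference, whether it holds
    variety reads or an aligned assembly.
    """
    q = shlex.quote
    bam_args = " ".join(q(b) for b in bams)
    return (
        f"bcftools mpileup -f {q(the_reference)} {bam_args} "
        f"| bcftools call -mv -Oz -o {q(out_vcf)}"
    )


if __name__ == "__main__":
    for cmd in assembly_alignment_commands("THE_ref.fa", "ref2.fa", "ref2_vs_THE"):
        print(cmd)
    print(joint_calling_command(
        "THE_ref.fa",
        ["varietyA.bam", "ref2_vs_THE.bam"],
        "all_samples.vcf.gz",
    ))
```

An aligned assembly contributes roughly one "read" per position, so depth-based filters would remove most of its calls; that is consistent with the advice above to run without any filter, but the resulting genotypes should be checked by hand before drawing conclusions.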

Good luck,

written 9 months ago by julien.fouret.fr20

The 5 genomes are just different varieties of the same species. So, just to be sure, you are proposing two methods: (a) a multiple alignment, or (b) the strategy in steps 1-4, right? Thanks for your useful suggestion.

written 9 months ago by User000380

Yes.

In my view, if these 5 varieties are very similar, the simplest option would be to avoid multiple alignment.

written 9 months ago by julien.fouret.fr20