Analysing BAM files
4.4 years ago
User000 ▴ 690

Hello,

I have aligned reads from 3 varieties against 5 different reference genomes, so I now have several BAM files. Next, I want to see whether the varieties and references differ in their SNPs (within the same region, of course), and whether there are regions with no reads at all or with exactly the same reads. In other words, having the BAM file contents as a table might be a good option. I don't think variant calling (.vcf) is useful in this case. Does anybody know how to transform a BAM file into a parsable table?

Any suggestions would be very helpful.

RNA-Seq bam • 2.0k views

Do you have the coordinates for the same region across your references?


Nope... this is the problem: I need to somehow analyse SNPs even though I don't know the coordinates.


"I don't think variant calling (.vcf) is useful in this case. Does anybody know how to transform a BAM file into a parsable table?"

These discussions might be helpful:
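
If a table is still what you are after, one option is to parse the text output of `samtools mpileup -f ref.fa sample.bam`. Below is a minimal sketch, assuming the standard six-column pileup layout for a single BAM (chromosome, 1-based position, reference base, depth, read bases, base qualities). The function names `count_bases` and `pileup_to_rows` are made up for illustration, and indels are skipped rather than counted.

```python
import re
from collections import Counter


def count_bases(ref_base, read_bases):
    """Count the bases observed at one pileup position (indels are skipped)."""
    counts = Counter()
    i = 0
    while i < len(read_bases):
        c = read_bases[i]
        if c == "^":
            # Read-start marker; the next character encodes mapping quality.
            i += 2
        elif c in "+-":
            # Indel: sign, length digits, then that many inserted/deleted bases.
            digits = re.match(r"\d+", read_bases[i + 1:]).group()
            i += 1 + len(digits) + int(digits)
        elif c in ".,":
            # Match to the reference on the forward (.) or reverse (,) strand.
            counts[ref_base.upper()] += 1
            i += 1
        elif c.upper() in "ACGTN":
            # Mismatch; lower case means reverse strand.
            counts[c.upper()] += 1
            i += 1
        else:
            # '$' (read end), '*' (deleted base), '<' / '>' (reference skip), etc.
            i += 1
    return counts


def pileup_to_rows(lines):
    """Turn pileup lines into dictionaries, one per position."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, depth, bases = fields[:5]
        counts = count_bases(ref, bases)
        row = {"chrom": chrom, "pos": int(pos), "ref": ref.upper(), "depth": int(depth)}
        for base in "ACGTN":
            row[base] = counts.get(base, 0)
        yield row


if __name__ == "__main__":
    example = ["chr1\t100\tA\t5\t..,^IGg$\tIIIII"]
    for row in pileup_to_rows(example):
        print(row)
```

Each row can then be written out with the `csv` module or loaded into a data frame. Positions with zero depth only appear in the pileup if you ask samtools for them (the `-a` option), so regions without reads may simply be missing from the table otherwise.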

4.4 years ago

Hi,

If you want the coordinates of the "same" region across the references, you can perform a multiple genome alignment. You could use this tool: https://www.biorxiv.org/content/10.1101/730531v1

To be sure of the context: where do your 5 references come from? Different species?

To avoid a multiple alignment at the genome level (which would be quite difficult to manage and to use later in your analysis), I would propose the following strategy.

  1. Choose one reference as the baseline; let's call it THE reference.
  2. Perform SNP analysis against THE reference (see the GATK guidelines: https://www.broadinstitute.org/partnerships/education/broade/best-practices-variant-calling-gatk-1) based on the DNA-Seq reads for your 3 varieties.
  3. Align your other references against THE reference and extract per-position variations. You could use the combination of samtools mpileup and bcftools without any filter; see samtools mpileup VCF output.
  4. Finally, merge the results of steps 2 and 3 into a multi-sample VCF analysis to compare the variants of the references and the varieties against THE reference and against each other.

Of note, you will need an alignment tool that supports assembly-to-assembly alignment; you could use minimap2: https://github.com/lh3/minimap2
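
To make steps 3 and 4 more concrete, here is a rough sketch that only builds the command lines and does not run anything. It assumes minimap2 with the `asm5` preset and uses `bcftools mpileup` followed by `bcftools call` for the pileup step. All file names and the helper names `alignment_and_calling_commands` and `merge_command` are placeholders of mine, and the options will probably need tuning for your data.

```python
import shlex


def alignment_and_calling_commands(the_ref, other_ref, prefix):
    """Commands to align one assembly to THE reference and call variants (step 3)."""
    sam = f"{prefix}.sam"
    bam = f"{prefix}.sorted.bam"
    bcf = f"{prefix}.bcf"
    vcf = f"{prefix}.vcf.gz"
    return [
        # Assembly-to-assembly alignment, SAM output.
        ["minimap2", "-ax", "asm5", "-o", sam, the_ref, other_ref],
        # Coordinate-sort and index the alignment.
        ["samtools", "sort", "-o", bam, sam],
        ["samtools", "index", bam],
        # Per-position pileup, then variant calling (variant sites only).
        ["bcftools", "mpileup", "-f", the_ref, "-Ou", "-o", bcf, bam],
        ["bcftools", "call", "-mv", "-Oz", "-o", vcf, bcf],
        # bcftools merge needs indexed, compressed input.
        ["bcftools", "index", vcf],
    ]


def merge_command(vcfs, merged):
    """Command to combine per-sample VCFs into one multi-sample VCF (step 4)."""
    return ["bcftools", "merge", "-Oz", "-o", merged, *vcfs]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    for cmd in alignment_and_calling_commands("the_ref.fa", "ref2.fa", "ref2_vs_the_ref"):
        print(shlex.join(cmd))
    print(shlex.join(merge_command(["ref2_vs_the_ref.vcf.gz", "variety1.vcf.gz"], "merged.vcf.gz")))
```

Printing the commands first lets you check them by eye; once they look right, each list can be passed to `subprocess.run(cmd, check=True)`.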

Good luck,


The 5 genomes are just different varieties of the same species. So, just to be sure, you are proposing two methods: (a) do a multiple alignment, or (b) follow the strategy in steps 1-4, right? Thanks for your useful suggestion.


Yes.

In my view, if these 5 varieties are very similar, the simplest approach would be to avoid the multiple alignment.
