Question: Changing Chromosome Notation in Bam Files to Include Sample ID
14 months ago, miles.thorburn wrote:

I've found various ways to change the notation of the chromosomes in my BAM files. However, would it be a bad idea to add my sample identifier to the chromosome notation? For example, changing chrX to chrX_US1 for the first US sample. I have a large data set and I'm going to be running analyses per chromosome, so I'm worried that once I start I won't be able to tell which sample each chromosome came from.

Prior to making my sorted consensus sequences, all the samples were mapped to the same reference genome, so they shouldn't need to be realigned. Instead, I'm just going to put all the sequences I'm comparing into the same .fasta file.

I am very new to this, so I could be making huge mistakes, hence asking here.

Thanks in advance.

Tags: bam, samtools, alignment

You should consider using read-groups instead of changing reference names.
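As a rough sketch of the read-group suggestion: a read group is declared by an `@RG` line in the BAM header, and its `SM` field carries the sample name, so the chromosome names can stay untouched. The Python below only builds that header line and a `samtools addreplacerg` command for one sample. The file names and the sample ID `US1` are placeholders, and the helper names are hypothetical.

```python
import shlex


def read_group_line(sample_id: str) -> str:
    """Return a SAM @RG header line that labels a read group with a sample name."""
    # ID is the read-group identifier; SM is the sample name (standard @RG tags).
    return "\t".join(["@RG", f"ID:{sample_id}", f"SM:{sample_id}"])


def addreplacerg_command(in_bam: str, out_bam: str, sample_id: str) -> list[str]:
    """Build (but do not run) a samtools addreplacerg call for one sample."""
    return [
        "samtools", "addreplacerg",
        "-r", f"ID:{sample_id}",   # repeated -r values are joined into one @RG line
        "-r", f"SM:{sample_id}",
        "-o", out_bam,
        in_bam,
    ]


# Placeholder file names; substitute the real per-sample BAM files.
cmd = addreplacerg_command("US1.sorted.bam", "US1.rg.bam", "US1")
print(read_group_line("US1"))
print(shlex.join(cmd))
```

With samtools installed, the command list could be passed to `subprocess.run(cmd, check=True)`, once per sample.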

Reply written 14 months ago by genomax

Thanks. That looks like it could be quite promising. However, I can't seem to find out whether the read-group information is retained when converting from .bam to .fasta format. Is it?

Reply written 14 months ago by miles.thorburn

If you split the BAM files into read-group-specific chunks, then you can retain that information indirectly. It would not be transferred directly to the FASTA files. You will need to rename the FASTA records after the fact to include the sample name in the headers.
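A minimal sketch of that renaming step, assuming each sample's consensus is in its own FASTA file and that appending `_<sample>` to the sequence name is the convention wanted. The function name and the example records are hypothetical.

```python
def tag_fasta_headers(fasta_text: str, sample_id: str) -> str:
    """Append _<sample_id> to the sequence name on every FASTA header line."""
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # Split the name from any description that follows the first space.
            name, _, description = line[1:].partition(" ")
            line = f">{name}_{sample_id}" + (f" {description}" if description else "")
        out.append(line)
    return "\n".join(out) + "\n"


# Made-up records standing in for one sample's consensus FASTA.
example = ">chrX consensus\nACGTACGT\n>chr2\nGGCC\n"
print(tag_fasta_headers(example, "US1"), end="")
```

Only the FASTA headers change here; the BAM files and their reference names are left alone.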

Reply written 14 months ago by genomax

You wrote

I have a large data set and I'm going to be running analyses per chromosome,

and

I can't seem to find out if the read-group information is retained when converting from .bam to .fasta format.

but since we have no idea what you are trying to accomplish, we can't really give you a good answer. In general, I agree with Noushin N that this sounds like a bad idea. I can't imagine a scenario in which this would be the best solution. In BAM files, read groups solve your problem, but you want (for unclear reasons) to keep that information in FASTA files. Also, when converting a BAM file back to FASTA, you lose the mapping location of each read.

Reply written 14 months ago by WouterDeCoster

Apologies for the lack of information. Ultimately I am going to be converting these files into .phylip format to use in the program VariScan. Unfortunately, there don't appear to be any direct conversions, so I have to convert the file into .fasta first.

For VariScan I need anywhere from 12 to 66 samples aligned by chromosome in one .phylip file. Thus, I need a way of identifying which sample is which. In my current workflow, I don't see a way to retain sample identity once they are in the same file.

I hope that clears up my aims and intentions.

Reply written 14 months ago by miles.thorburn

Not sure if this will help you, but BamBam may be worth a look.

Reply written 14 months ago by genomax

Thanks for your suggestions. I've finally come up with a solution, and it was so much easier than I had anticipated. You can directly edit the header of the consensus fasta sequence, which is apparently retained when converting to .phylip. All you need to do is keep a spreadsheet with the information usually kept in the fasta header.
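The approach above could look roughly like this in Python: short sample IDs become the sequence names in the .phylip file, and a small CSV plays the role of the spreadsheet. This is a sketch under assumptions: the sequences are already aligned to equal length, names are limited to 10 characters as in strict PHYLIP (whether VariScan requires that should be checked against its documentation), and all function names and example data are hypothetical.

```python
import csv
import io


def to_phylip(records: dict[str, str]) -> str:
    """Write aligned sequences as sequential PHYLIP, names padded to 10 characters."""
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same length")
    for name in records:
        if len(name) > 10:
            raise ValueError(f"name too long for strict PHYLIP: {name}")
    # First line: number of sequences, then alignment length.
    lines = [f" {len(records)} {lengths.pop()}"]
    lines += [f"{name:<10}{seq}" for name, seq in records.items()]
    return "\n".join(lines) + "\n"


def lookup_table(rows: list[tuple[str, str]]) -> str:
    """Return CSV text mapping each short PHYLIP name to its original FASTA header."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["phylip_name", "original_header"])
    writer.writerows(rows)
    return buf.getvalue()


# Made-up chrX consensus sequences for two samples.
chrx = {"US1": "ACGTACGT", "US2": "ACGTACGA"}
print(to_phylip(chrx), end="")
print(lookup_table([("US1", "chrX consensus sample US1"),
                    ("US2", "chrX consensus sample US2")]), end="")
```

Keeping the full header text in the lookup table means nothing is lost when the names are shortened for the .phylip file.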

Reply written 14 months ago by miles.thorburn
14 months ago, Noushin N (Baltimore, MD) wrote:

This doesn't sound like a good idea. I realize that you mention the reads have already been aligned to a common reference, but alignment is typically just the first step in the analysis pipeline. Renaming chromosomes to non-standard names will likely cause errors and/or inaccuracies in many downstream steps, such as annotation and variant calling.
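One way to see the problem, sketched in Python: downstream tools match reads to annotation by reference name, so a renamed contig simply has no counterpart. The check below parses `@SQ` lines from SAM header text (as printed by `samtools view -H`) and lists names missing from an expected set. The header and the expected names are made-up examples, and the function names are hypothetical.

```python
def contig_names(sam_header: str) -> list[str]:
    """Extract reference sequence names (SN) from the @SQ lines of a SAM header."""
    names = []
    for line in sam_header.splitlines():
        fields = line.split("\t")
        if fields[0] == "@SQ":
            names += [f[3:] for f in fields[1:] if f.startswith("SN:")]
    return names


def unknown_contigs(sam_header: str, expected: set[str]) -> list[str]:
    """Return header contigs that a downstream resource does not know about."""
    return [name for name in contig_names(sam_header) if name not in expected]


# Made-up header in which chrX has been renamed to chrX_US1.
header = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr1\tLN:1000\n@SQ\tSN:chrX_US1\tLN:500\n"
print(unknown_contigs(header, {"chr1", "chrX"}))
```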
