Question: Changing Chromosome Notation in Bam Files to Include Sample ID
2.0 years ago by
miles.thorburn90 wrote:

I've found various ways to change the notation of the chromosomes in my bam files. However, would it be a bad idea to add my sample identifier to the chromosome notation? For example, changing chrX to chrX_US1 for the first US sample. I have a large data set and I'm going to be running analyses per chromosome, so I'm worried once I start I won't be able to determine which chromosome came from where.

Prior to making my sorted consensus sequences, all the samples were mapped to the same reference genome, so they shouldn't need to be realigned. Instead, I'm just going to move all of those I'm comparing into the same .fasta file.

I am very new to this, so could be making huge mistakes, hence asking on here.

Thanks in advance.

bam samtools alignment • 664 views
modified 2.0 years ago by Noushin N570 • written 2.0 years ago by miles.thorburn90

You should consider using read-groups instead of changing reference names.

written 2.0 years ago by genomax78k
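
As a rough sketch of the read-group suggestion above: samtools provides an addreplacerg subcommand that can attach a read group to every read in a bam file. The Python helper below only builds the command line; it does not run it. The function name, the file names, and the use of the sample ID for both the ID and SM fields are assumptions made for illustration.

```python
import shlex


def addreplacerg_command(in_bam, out_bam, sample_id):
    """Build (but do not run) a samtools command that tags reads with a read group."""
    return [
        "samtools", "addreplacerg",
        "-r", f"ID:{sample_id}",  # read-group identifier (assumed equal to the sample ID)
        "-r", f"SM:{sample_id}",  # sample name recorded in the @RG header line
        "-o", out_bam,
        in_bam,
    ]


# Hypothetical file names for the first US sample.
cmd = addreplacerg_command("US1.sorted.bam", "US1.rg.bam", "US1")
print(shlex.join(cmd))
```

Running the resulting command (for example with subprocess.run(cmd, check=True)) leaves the chromosome names untouched and records the sample in the @RG header line and in each read's RG tag.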

Thanks. That looks like it could be quite promising. However, I can't seem to find out if the read-group information is retained when converting from .bam to .fasta format. Will this information be retained?

written 2.0 years ago by miles.thorburn90

If you split the bam files into read-group-specific chunks, you can retain that information indirectly. It would not be transferred directly to the fasta files. You will need to rename the fasta records after the fact to include the sample name in the headers.

written 2.0 years ago by genomax78k
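
A minimal sketch of the renaming step described above, assuming one consensus fasta per sample and that appending the sample ID to the record name (chrX becomes chrX_US1, as in the question) is acceptable. The function name and the example records are hypothetical.

```python
def tag_fasta_headers(lines, sample_id):
    """Yield FASTA lines with sample_id appended to each record name."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # Split the header into the record name and any free-text description.
            name, _, description = line[1:].partition(" ")
            tagged = f">{name}_{sample_id}"
            yield f"{tagged} {description}" if description else tagged
        else:
            # Sequence lines pass through unchanged.
            yield line


# Hypothetical consensus records for the first US sample.
consensus = [">chrX", "ACGTACGT", ">chr2 consensus", "GGCC"]
for line in tag_fasta_headers(consensus, "US1"):
    print(line)
```

Because only the fasta headers change, the bam files keep the standard chromosome names that downstream tools expect.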

You wrote

I have a large data set and I'm going to be running analyses per chromosome,

and

I can't seem to find out if the read-group information is retained when converting from .bam to .fasta format.

but since we have no idea what you are trying to accomplish, we can't really give you a good answer to this. In general, though, I agree with Noushin N that this sounds like a bad idea. I can't imagine a scenario in which this would be the best solution. In bam files, read groups solve your problem, but you want (for unclear reasons) to keep that information in fasta files. Also, when converting the bam file back to fasta, you lose the mapping location information.

modified 2.0 years ago • written 2.0 years ago by WouterDeCoster43k

Apologies for the lack of information. Ultimately I am going to be converting these files into .phylip format to use in the program VariScan. Unfortunately, there don't appear to be any direct conversions, so I have to convert the file into .fasta first.

For VariScan I need anywhere from 12 to 66 samples aligned by chromosome in one .phylip file. Thus, I need a way of identifying which sample is which. In my current workflow, I don't see a way to retain sample identity once they are in the same file.

I hope that clears up my aims and intentions.

written 2.0 years ago by miles.thorburn90

Not sure if this will help you: BamBam. It may be worth a look.

written 2.0 years ago by genomax78k

Thanks for your suggestions. I've finally come up with a solution, and it was so much easier than I had anticipated. You can directly edit the header of the consensus fasta sequence, which is apparently retained when converting to .phylip. All you need to do is keep a spreadsheet with the information usually kept in the fasta header.

written 2.0 years ago by miles.thorburn90
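
To illustrate the approach described above, here is a hedged sketch that takes already-tagged consensus records for one chromosome and writes them in sequential PHYLIP layout (a count line, then one name-plus-sequence line per sample). It assumes the strict 10-character name field of classic PHYLIP and equal-length sequences; whether VariScan expects exactly this layout is an assumption that should be checked against its documentation. The function names are made up for the example.

```python
def parse_fasta(lines):
    """Return a list of (name, sequence) pairs from FASTA lines."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            # Keep only the first word of the header as the record name.
            records.append([fields[0] if fields else "", ""])
        elif records:
            records[-1][1] += line
    return [(name, seq) for name, seq in records]


def to_phylip(records):
    """Format (name, sequence) pairs as sequential PHYLIP text."""
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same length")
    # Assumed strict PHYLIP: names are truncated to 10 characters.
    names = [name[:10] for name, _ in records]
    if len(set(names)) != len(names):
        raise ValueError("names are not unique within 10 characters")
    out = [f"{len(records)} {lengths.pop()}"]
    out += [f"{name:<10}{seq}" for name, (_, seq) in zip(names, records)]
    return "\n".join(out) + "\n"


# Hypothetical chrX consensus records from two samples.
fasta = [">chrX_US1 consensus", "ACG", "TAC", ">chrX_US2", "ACGTTC"]
print(to_phylip(parse_fasta(fasta)), end="")
```

The uniqueness check matters because truncating long headers to 10 characters could otherwise merge two samples under the same label.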
2.0 years ago by
Noushin N570 (Baltimore, MD) wrote:

This doesn't sound like a good idea. I realize that you mention the reads have already been re-aligned to a common reference, but alignment is typically just the first step in the analysis pipeline. Renaming chromosomes to non-standard names will likely cause errors and/or inaccuracies in many downstream steps, such as annotation and variant calling.

written 2.0 years ago by Noushin N570