8 months ago
amy__ ▴ 50
I am aware that similar questions have been asked before, but I cannot find a straightforward answer.
The headers for my bam look like this (just a snippet):
I have lifted my BED file over from hg19 to hg38 and would now like to relate these accession numbers to names like chr1, chr2, chr3, ...
I have looked at this https://github.com/dpryan79/ChromosomeMappings/blob/master/GRCh38_RefSeq2UCSC.txt but find it confusing. I have previously had to recode another BED file, so I know how to do it; it's just which accession code is associated with which chromosome that is confusing me.
The file from dpryan79 is pretty clear to me. What about it confuses you?
save the planet: don't post screenshots when you can just copy-n-paste the text.
Thanks! I was going to copy and paste it but the lines went all skew
See my guide on how to format plain text so it looks pretty: How to Use Biostars Part-3: Formatting Text and Using GitHub Gists
Hi! Thanks for your reply. Previously, when I've recoded the BED file to match the BAM accessions, it has looked like this (for a different dataset):
The dpryan79 file has a lot more entries further down, and I'm not quite sure which relate to the specific chromosomes. Also, what would a row like NT_113901.1 chrUn_GL000195v1 mean?
Basically, I want to be able to do the code above, but I'm not sure which bits to use.
Google "chrUn" to understand what those contigs are. dpryan's file is just a mapping table so you should be able to use it unless you have contigs not mapped in that table.
So my BED file currently only has chr1, chr2, etc. The mapping file has, for example, NT_187361.1 chr1_KI270706v1_random.
So if I am changing the IDs in the BED file to match the BAM, would it be appropriate to say chr1 should now be NT_187361.1 in the BED file?
The BAM file also has NC_000001.11, so would multiple IDs in the BED file be changed from chr1 to these new IDs from the BAM file?
What is "your bed file"? There is the BED file from the gist and your BAM file. Do not alter the BED. By what logic does
chr1 when it matches
chr1 is not the same as
My BED file is the exome panel kit used by the sequencing company; it contains the exome regions. I am using DeepVariant, which needs the chromosome IDs to match. Currently they do not, because the BED file uses chr1, chr2, ... and the BAM file has all the chromosome IDs shown in the header.
If the bed file does not match deepvariant will produce this error:
Thus I need my contig names in the bed to match the bam. There are so many headers that I am not sure which contigs to use. I have had a similar problem before and used the mutate code above to recode the bed file and then deepvariant worked but I don't know if that is the correct way.
I just want to make sure that this is the correct way, and if not, how would you do it?
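Before recoding anything, it can help to list which contig names in the BED are absent from the BAM header. The following is a minimal Python sketch, not the R code used in this thread; it assumes the header text was obtained with something like `samtools view -H`, and the function names and the toy header (including its placeholder lengths) are made up for illustration.

```python
def contigs_from_sam_header(header_text):
    """Return the set of SN: names found on @SQ lines of a SAM/BAM header."""
    names = set()
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        for field in line.split("\t")[1:]:
            if field.startswith("SN:"):
                names.add(field[3:])
    return names


def contigs_from_bed(bed_text):
    """Return the set of contig names in column 1 of a BED file."""
    names = set()
    for line in bed_text.splitlines():
        # Skip blank lines and optional BED header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        names.add(line.split("\t")[0])
    return names


def bed_contigs_missing_from_bam(bed_text, header_text):
    """Sorted list of BED contig names that the BAM header does not contain."""
    return sorted(contigs_from_bed(bed_text) - contigs_from_sam_header(header_text))


# Toy inputs: LN values are placeholders, not real contig lengths.
toy_header = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@SQ\tSN:NC_000001.11\tLN:1000\n"
    "@SQ\tSN:NT_187361.1\tLN:1000\n"
)
toy_bed = "chr1\t100\t200\nchr1\t300\t400\n"

print(bed_contigs_missing_from_bam(toy_bed, toy_header))
```

An empty list means every BED contig name is present in the BAM header; anything listed still needs to be renamed.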
Your mutate should work fine. Just make sure dpryan's mapping file has your BAM contig names in one column and the exome panel BED contig names in the other.
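For readers without the R code to hand, here is a rough Python equivalent of that renaming step. It is only a sketch: it assumes the mapping file is tab-separated with the RefSeq accession in column 1 and the UCSC name in column 2 (as the rows quoted in this thread suggest), and `load_ucsc_to_refseq` and `rename_bed` are hypothetical helper names.

```python
def load_ucsc_to_refseq(mapping_lines):
    """Build {UCSC name: RefSeq accession} from 'RefSeq<TAB>UCSC' rows."""
    mapping = {}
    for line in mapping_lines:
        fields = line.rstrip("\n").split("\t")
        # Skip rows where either column is missing or empty.
        if len(fields) < 2 or not fields[0] or not fields[1]:
            continue
        refseq, ucsc = fields[0], fields[1]
        mapping[ucsc] = refseq
    return mapping


def rename_bed(bed_lines, mapping):
    """Replace column 1 of each BED record using the mapping.

    Raises KeyError on a contig with no mapping, so nothing is silently dropped.
    """
    renamed = []
    for line in bed_lines:
        stripped = line.rstrip("\n")
        if not stripped.strip() or stripped.startswith(("#", "track", "browser")):
            renamed.append(stripped)
            continue
        fields = stripped.split("\t")
        if fields[0] not in mapping:
            raise KeyError(f"no mapping for contig {fields[0]!r}")
        fields[0] = mapping[fields[0]]
        renamed.append("\t".join(fields))
    return renamed


# Toy rows in the assumed layout of the mapping file.
toy_mapping_rows = [
    "NC_000001.11\tchr1\n",
    "NT_187361.1\tchr1_KI270706v1_random\n",
    "NT_113901.1\tchrUn_GL000195v1\n",
]
toy_bed_rows = ["chr1\t100\t200\tregionA\n"]

ucsc_to_refseq = load_ucsc_to_refseq(toy_mapping_rows)
for record in rename_bed(toy_bed_rows, ucsc_to_refseq):
    print(record)
```

Because the lookup is an exact match on the whole name, chr1 maps only to NC_000001.11 and never to chr1_KI270706v1_random's accession.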
Thank you for your comment. I checked with a senior bioinformatician I know, and what I did was correct.
For anyone in the future: NC_000001.11 etc. are RefSeq accession numbers. You can change the chromosome ID column of the BED file from chr1 etc. to NC_000001.11 etc. using my mutate code above in R. As Ram stated, the other headers relate to pseudo-contigs or other regions of the chromosome that aren't explicitly just chr1 etc. These won't appear in your BED file, as it only lists the exome regions.
Please correct me if I am wrong.
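To make the point about the other headers concrete, this small Python sketch keeps only the rows of the mapping whose RefSeq accession starts with `NC_`. The column order (RefSeq, then UCSC) is an assumption based on the rows quoted in this thread, and `primary_chromosome_map` is a made-up name.

```python
def primary_chromosome_map(mapping_text):
    """Return {UCSC name: RefSeq accession} for rows with an NC_ accession only."""
    result = {}
    for line in mapping_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 2:
            continue
        refseq, ucsc = fields[0].strip(), fields[1].strip()
        # NT_ rows (e.g. *_random and chrUn_* contigs) are skipped here.
        if refseq.startswith("NC_") and ucsc:
            result[ucsc] = refseq
    return result


toy_mapping_text = (
    "NC_000001.11\tchr1\n"
    "NT_187361.1\tchr1_KI270706v1_random\n"
    "NT_113901.1\tchrUn_GL000195v1\n"
)

print(primary_chromosome_map(toy_mapping_text))
```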
clash with each other. Just be sure to map chromosomes to NC_ IDs and not any other
In the end I did. Originally my BED file was hg19, so I used liftOver to get hg38, then got the GRCh38_RefSeq2UCSC.txt:
Also, for anyone curious: I actually used the wrong hg38 file, which is why I ended up with the RefSeq accession numbers in the first place - oops! Make sure you use the correct, reformatted reference genome and you won't have to do any of the above.