Question: Difference in read reference names when aligning reads.
8 months ago by
anthony.nash0 wrote:

This might be a newbie question. I'm a QM chemist stepping in for a bioinformatician at work, so I apologize in advance if I leave out information needed to answer it.

I have a documented series of steps to follow that allows me to align my paired-end reads to a human reference genome and then perform variant calling. I am using Samtools, GATK and Picard. I am also using the same reference genome .fa file as my colleague.

However, when I perform the variant calling and look inside the SAM file I generated, the only reference names I see are of the form "ref|NT_.....|". The original files generated by the bioinformatician have "NC...." as reference names. The code further down the pipeline requires the NC... naming structure.

I don't want this question to feel too much like a black box, but if I could get a general idea of what could have caused the difference in read reference names, I would be really grateful and I can go from there.


sequence assembly genome • 300 views
written 8 months ago by anthony.nash0

You must have aligned your data to a reference collection that had ref|NT.. names instead of the NC names.

written 8 months ago by genomax65k
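
One way to see which reference names an alignment carries is to read the SAM header, where each reference sequence is listed on an `@SQ` line with an `SN:` field. The following is a minimal sketch that assumes a plain-text SAM file; the helper name `sam_ref_names` and the file name `aln.sam` are placeholders, not something from this thread.

```shell
# List the reference sequence names recorded in a SAM header.
# Assumes a plain-text SAM file; "sam_ref_names" is a hypothetical helper name.
sam_ref_names() {
    # @SQ header lines are tab-separated; the name sits in the field starting with "SN:".
    awk -F'\t' '/^@SQ/ { for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) print substr($i, 4) }' "$1"
}

# Usage (placeholder file name):
# sam_ref_names aln.sam | head
```

For a BAM file, `samtools view -H` prints the same header as text, so its output could be piped through the same `awk` filter.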

Would this be something inside the human genome *.fa file?

written 8 months ago by anthony.nash0

Yes. Take a look at the output of grep "^>" .fa (it lists the sequence header lines) and see if that is what you have.

You need to use matched genome sequence/annotation for this reason.

written 8 months ago by genomax65k
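
The header check suggested above can be taken one step further by reducing each FASTA header to the name alone. This is a rough sketch, assuming the aligner takes the text after ">" up to the first whitespace as the reference name (BWA, for example, typically does); the helper name `fasta_names` and the file name `reference.fa` are placeholders.

```shell
# Print the sequence name from each FASTA header line:
# the text after ">" up to the first whitespace character.
# "fasta_names" is a hypothetical helper name.
fasta_names() {
    grep '^>' "$1" | sed -e 's/^>//' -e 's/[[:space:]].*$//'
}

# Usage (placeholder file name):
# fasta_names reference.fa | head             # inspect the first few names
# fasta_names reference.fa | grep -c '^NC_'   # count names that start with NC_
```

If the count of `NC_` names is zero, alignments against that file cannot report `NC_` reference names.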

Ah I see! Thanks. I didn't spot a single NC notation. The file I have is hg38_GRCh38.p12.allChr.fa - any idea where to get hold of the corresponding reference file with NC rather than ref|NT/NW? I appreciate your help, I'm a little out of my skill set and comfort zone at the moment.

written 8 months ago by anthony.nash0

Do you know where you got that file from? You can get matching reference and GTF files from this page. You will need to realign your data though.

Here is an informative blog post that you will find useful about which human reference to use.

modified 8 months ago • written 8 months ago by genomax65k

I am afraid I don't recall where that file on my system came from. Thank you for that information, I'll try and plod on from here.

written 8 months ago by anthony.nash0