4.0 years ago
chicheheda
I have accidentally aligned a large number of samples to a reference genome that includes patch chromosomes (specifically, GRCh38.p5). According to the STAR manual, I should only have used the major chromosomes plus the unplaced and unlocalized scaffolds (which I think is plain GRCh38, i.e. the primary assembly).
However, I don't know how much this matters. Some reads will map to genes on the patch chromosomes; could I assign these to the same gene on the main chromosomes? Or is the only way to fix this to realign everything against the correct reference?
I'm not sure what the consequences are other than, as you said, reads that would otherwise map to the major chromosomes might get mapped to patch chromosomes instead.
You could inspect a few BAM files and collect the distribution of reads per chromosome to see how often this happens. If only a negligible fraction of reads map outside the major chromosomes, it's unlikely to change your results much.
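A minimal sketch of that check, assuming you first run `samtools idxstats` on an indexed BAM (its output has four tab-separated columns: sequence name, sequence length, mapped count, unmapped count). The `looks_primary` rule below is a hypothetical naming heuristic for UCSC- or Ensembl-style contig names, not an official list; for RefSeq accessions or other naming schemes you would pass your own `is_primary` function. Note that idxstats counts alignment records, so a multimapping read contributes once per alignment.

```python
from typing import Callable, Optional, Tuple


def looks_primary(name: str) -> bool:
    # Hypothetical heuristic, adjust to your reference's naming:
    # UCSC-style GRCh38 fix patches / alternate loci end in "_fix" / "_alt",
    # Ensembl-style patch and haplotype scaffolds start with "CHR_".
    return not (name.endswith(("_fix", "_alt")) or name.startswith("CHR_"))


def patch_fraction(
    idxstats_text: str,
    is_primary: Optional[Callable[[str], bool]] = None,
) -> Tuple[int, int, float]:
    """Sum mapped counts from `samtools idxstats` output.

    Returns (mapped on primary contigs, mapped elsewhere, fraction elsewhere).
    """
    if is_primary is None:
        is_primary = looks_primary
    primary = other = 0
    for line in idxstats_text.splitlines():
        if not line.strip():
            continue
        name, _length, mapped, _unmapped = line.split("\t")
        if name == "*":
            # Unmapped reads without an assigned reference: ignore here.
            continue
        if is_primary(name):
            primary += int(mapped)
        else:
            other += int(mapped)
    total = primary + other
    return primary, other, (other / total if total else 0.0)
```

For example, after `samtools idxstats sample.bam > sample.idxstats.txt` you could call `patch_fraction(open("sample.idxstats.txt").read())` and look at the third value; with made-up counts of 900 records on `chr1` and 100 on a `_fix` contig it returns `(900, 100, 0.1)`.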
Possibly, you could identify those reads by name (including reads that have only some of their alignments on patch chromosomes), then either extract them from the BAM to FASTQ or subset your original FASTQ files to the matching reads, and remap them on their own to the assembly you want. Then remove the previously mapped copies of those reads from your BAM file and merge the newly mapped reads with the filtered BAM.
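The selection and filtering steps can be sketched with plain text processing, assuming uncompressed SAM lines (e.g. streamed from `samtools view -h sample.bam`) and standard 4-line FASTQ records. The function names and the `looks_primary` naming rule are hypothetical, not part of any tool; the rule is a heuristic for UCSC/Ensembl-style names that you would adapt to your reference. It also assumes SAM read names lack the `/1`/`/2` suffixes that may appear in the FASTQ headers.

```python
from typing import Callable, Iterable, Iterator, Set


def looks_primary(name: str) -> bool:
    # Hypothetical naming heuristic (adjust for your reference): UCSC-style
    # fix patches / alt loci end in "_fix" / "_alt"; Ensembl-style patch
    # scaffolds start with "CHR_".
    return not (name.endswith(("_fix", "_alt")) or name.startswith("CHR_"))


def reads_on_patches(
    sam_lines: Iterable[str],
    is_primary: Callable[[str], bool] = looks_primary,
) -> Set[str]:
    """Collect QNAMEs with at least one alignment record on a non-primary contig."""
    names: Set[str] = set()
    for line in sam_lines:
        if line.startswith("@"):  # SAM header line (QNAMEs cannot start with "@")
            continue
        fields = line.rstrip("\n").split("\t")
        qname, rname = fields[0], fields[2]
        if rname != "*" and not is_primary(rname):
            names.add(qname)
    return names


def drop_reads(sam_lines: Iterable[str], names: Set[str]) -> Iterator[str]:
    """Yield header lines plus every alignment record whose QNAME is not in `names`."""
    for line in sam_lines:
        if line.startswith("@") or line.split("\t", 1)[0] not in names:
            yield line


def subset_fastq(fastq_lines: Iterable[str], names: Set[str]) -> Iterator[str]:
    """Yield 4-line FASTQ records whose read ID (minus a /1 or /2 suffix) is in `names`."""
    it = iter(fastq_lines)
    for header, seq, plus, qual in zip(it, it, it, it):
        read_id = header[1:].split()[0]  # drop "@" and any comment after a space
        if read_id.endswith(("/1", "/2")):
            read_id = read_id[:-2]
        if read_id in names:
            yield from (header, seq, plus, qual)
```

The selected FASTQ records would then be remapped with STAR against a primary-assembly index and the result merged with the filtered alignments (for example with `samtools merge`). Because `drop_reads` keeps the original header, its @SQ lines still list the patch contigs, so it is worth checking the header of the merged file. For large BAMs, pysam would be a more practical way to do the same filtering without converting to SAM text.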