Remove scaffolds and other unplaced sequences before mapping?
8.1 years ago
yongxpeng • 0

Hi,
I downloaded reference genomes from Ensembl (FASTA format), but many of the sequences are labelled "dna:scaffold": https://github.com/CTLife/TEMP/tree/master/RefGenomes

For example, in Mouse_GRCm38 (mm10), should everything except chromosomes 1-19, Mt, X and Y be removed before mapping? https://github.com/CTLife/TEMP/blob/master/RefGenomes/Mouse_GRCm38.p4.txt

Likewise, Human_GRCh38.p5 (hg38) contains 516 sequences: https://github.com/CTLife/TEMP/blob/master/RefGenomes/Human_GRCh38.p5.txt. Apart from chromosomes 1-22, Mt, X and Y, should the others (such as CHR_HG2241_PATCH and KI270728.1) be removed before mapping?

RNA-Seq ChIP-Seq genome sequencing next-gen • 4.1k views
8.1 years ago
abascalfederico ★ 1.2k

The latest release of the human genome (I don't know about mouse) contains alternative contigs. To map against them you will need an alternative-contig-aware aligner such as BWA: https://github.com/lh3/bwa/blob/master/README-alt.md

If you are not using an aligner of this kind, it would be better to remove the alternative contigs. Otherwise a read may map both to a primary chromosome and to one or more alternative contigs, and be (incorrectly) treated as a non-uniquely mapped read.
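
As a rough illustration of removing the extra sequences before mapping, here is a minimal Python sketch that keeps only a whitelist of sequence names. It assumes Ensembl-style headers in which the first whitespace-delimited token after ">" is the sequence name (for example "1", "X", "MT", "JH584299.1"). The function names `primary_names` and `filter_fasta` are made up for this example, and the whitelist should be checked against the headers in your own file (Ensembl names the mitochondrion "MT").

```python
from typing import Iterable, Iterator, Set


def primary_names(n_autosomes: int) -> Set[str]:
    """Whitelist of primary sequence names: 1..n_autosomes plus X, Y, MT.

    Assumption: Ensembl-style names without a "chr" prefix.
    """
    return {str(i) for i in range(1, n_autosomes + 1)} | {"X", "Y", "MT"}


def filter_fasta(lines: Iterable[str], keep: Set[str]) -> Iterator[str]:
    """Yield only the FASTA lines that belong to records named in `keep`."""
    keeping = False
    for line in lines:
        if line.startswith(">"):
            # The sequence name is the first token of the header line.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            keeping = name in keep
        if keeping:
            yield line
```

To use it on a file, open the input and output FASTA files and write `filter_fasta(infile, primary_names(19))` for mouse or `primary_names(22)` for human to the output with `writelines`. The filtered reference then has to be re-indexed with whichever aligner you use.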

HTH

OK, thank you. I am using BWA, Bowtie2 and Subread to map ChIP-seq reads. But for RNA-seq reads, must the alternative contigs be removed?
What do you think about https://sequencing.qcfail.com/articles/genomic-sequence-not-in-the-genome-assembly-creates-mapping-artefacts/ ? It is a nice explanation of why we might not want to remove those extra sequences until after mapping.
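
If the extra sequences are kept during mapping and removed only afterwards, the filtering step could look roughly like the Python sketch below. It works on plain-text SAM lines and relies only on the SAM layout (tab-separated fields, reference name in the third column, `SN:` tag on `@SQ` header lines). The function name `filter_sam` is made up for this example. It is deliberately simplistic: unmapped reads (reference name `*`) are dropped, and flags and mate fields of reads whose mate mapped to a removed sequence are not updated.

```python
from typing import Iterable, Iterator, Set


def filter_sam(lines: Iterable[str], keep: Set[str]) -> Iterator[str]:
    """Yield SAM lines whose reference sequence name is in `keep`."""
    for line in lines:
        if line.startswith("@"):
            if line.startswith("@SQ"):
                # Drop @SQ header lines for sequences that are being removed.
                tags = line.rstrip("\n").split("\t")[1:]
                names = [t[3:] for t in tags if t.startswith("SN:")]
                if not names or names[0] not in keep:
                    continue
            yield line
        else:
            # Column 3 (index 2) of an alignment record is the reference name.
            fields = line.split("\t")
            if len(fields) > 2 and fields[2] in keep:
                yield line
```

For BAM files the same idea applies, but a dedicated tool or library is needed to read and write the binary format.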

If I understood correctly, that link is about repetitive sequences, not alternative contigs.

For RNA-seq, it depends. For example, if you want to analyse HLA genes, which are highly diverse, you would need the alternative contigs. I guess most people just ignore the alternative contigs because of the added complexity.
