Question

How to map reads onto human hg38 gene body regions instead of whole genome?

3.3 years ago

kindle.ama98 • 0

Hi there, I am thinking of mapping reads onto gene bodies only, to make off-target identification less strict. I am wondering whether a file with these regions already exists, or whether I have to extract them manually. Thanks!

alignment genome • 893 views

updated 3.3 years ago by jordi.planells ▴ 480 • written 3.3 years ago by kindle.ama98 • 0

score 1 · Answer 1 · 2021-01-13

I would get the FASTA file of the human transcripts and map the reads against it. You can get the RefSeq annotation from here. Then you can build an index with your favorite aligner and align against it.
Another option would be to extract the genes from a GTF annotation and use bedtools getfasta to get the FASTA sequences for the desired intervals.
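For the GTF route, a minimal Python sketch of the extraction step could look like the one below. It assumes the annotation contains rows whose third column is `gene` (some GTF flavors only carry `transcript` and `exon` rows) and that the attribute column has a `gene_id "..."` entry. The function name `gtf_genes_to_bed` is hypothetical. It converts the 1-based, inclusive GTF coordinates to 0-based, half-open BED coordinates.

```python
import re
from typing import Iterable, Iterator

# Matches the gene_id attribute in the 9th GTF column.
GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')


def gtf_genes_to_bed(lines: Iterable[str]) -> Iterator[str]:
    """Yield one 6-column BED line per 'gene' row of a GTF file."""
    for line in lines:
        # Skip blank lines and header/comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # A valid GTF row has 9 tab-separated columns; keep gene rows only.
        if len(fields) < 9 or fields[2] != "gene":
            continue
        chrom, strand = fields[0], fields[6]
        start, end = int(fields[3]), int(fields[4])
        match = GENE_ID_RE.search(fields[8])
        name = match.group(1) if match else "."
        # GTF is 1-based inclusive; BED is 0-based half-open.
        yield f"{chrom}\t{start - 1}\t{end}\t{name}\t0\t{strand}"
```

The resulting lines can be written to a file (say `genes.bed`, a placeholder name) and passed to bedtools, for example `bedtools getfasta -fi genome.fa -bed genes.bed -s -name -fo genes.fa`, where `genome.fa` stands for your hg38 FASTA. Overlapping genes will produce overlapping sequences, so reads may map to more than one entry.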

score 0 · Answer 2 · 2021-01-12

3.3 years ago

karl.stamm 4.1k

Can't say if it's a good idea or not, but the way I would do that is to mask the genome reference. Just hack up your genome reference to have a bunch of NNNN in the intergenic regions.
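To illustrate the masking idea, here is a minimal sketch in Python, assuming the gene intervals are given per chromosome as 0-based, half-open (BED-style) coordinates. The function name `mask_outside` is hypothetical, and for a whole genome you would apply it chromosome by chromosome or use a dedicated tool instead.

```python
from typing import Iterable, Tuple


def mask_outside(seq: str, keep: Iterable[Tuple[int, int]]) -> str:
    """Return seq with every base outside the 'keep' intervals replaced by N.

    Intervals are 0-based, half-open (start, end), as in BED files.
    """
    # Start from a fully masked sequence, then copy back the kept regions.
    masked = ["N"] * len(seq)
    for start, end in keep:
        # Clip intervals to the sequence boundaries.
        start, end = max(start, 0), min(end, len(seq))
        if start < end:
            masked[start:end] = seq[start:end]
    return "".join(masked)


print(mask_outside("ACGTACGTAC", [(2, 5), (7, 9)]))  # NNGTANNTAN
```

Because the chromosome lengths and coordinates are unchanged, alignments against the masked reference stay in normal hg38 coordinates. bedtools also offers `maskfasta`, which masks the intervals listed in a BED file, so with that tool you would supply the intergenic regions (the complement of the gene intervals) rather than the genes themselves.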

Or use a transcriptome mapper like STAR/RSEM directly, and forgo the genome mapping.
