Generate Read counts from bam file
5 months ago

I am currently working on a project on LHON (Leber hereditary optic neuropathy), a rare mitochondrial disorder that leads to progressive visual loss.

I have 9 RNA-seq FASTQ files: 3 from carriers, 3 from affected individuals, and 3 from controls. The data were downloaded from here: link

Which reference should I use for read mapping with this dataset: the human reference genome or the mitochondrial reference genome?

After read mapping, I need to generate read counts to use as input for DESeq2.

If I align the FASTQ files to the mitochondrial reference genome, I will need an annotation file (GTF) in the later steps to get read counts, but I cannot find one for the mitochondrial genome. How can I get this annotation file?

My other question: can I instead use the human reference genome (hg38) for read mapping, since its annotation file is available, and then generate read counts?

Please tell me which approach will be better.

RNAseq reference_genome Deseq2 read_counts • 704 views

Are you analysing RNA or DNA heteroplasmy levels?

5 months ago
ATpoint 82k

I don't follow. You always align to the full genome (which includes the mt genome in the case of human reference genomes).

Anyway, if you go to the link you provided and scroll down to the Supplementary file section, the authors already provide a matrix of raw counts, so why not just use that?
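
For what it's worth, getting such a matrix ready for DESeq2 mostly means pairing each sample column with its condition. Below is a minimal Python sketch of that step. The file layout (tab-separated, gene ID in the first column), the `<condition>_<replicate>` column naming, and the toy numbers are all assumptions for illustration; the real supplementary file may look different.

```python
import csv
import io

def read_counts(handle):
    """Parse a tab-separated matrix: first column gene ID, other columns samples."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]
    counts = {row[0]: [int(v) for v in row[1:]] for row in reader if row}
    return samples, counts

def condition_of(sample):
    """Hypothetical naming scheme: '<condition>_<replicate>'."""
    return sample.rsplit("_", 1)[0]

# Toy matrix with made-up counts, standing in for the downloaded file.
toy = io.StringIO(
    "gene\tcontrol_1\tcarrier_1\taffected_1\n"
    "MT-ND1\t120\t98\t75\n"
    "MT-ND4\t200\t180\t150\n"
)
samples, counts = read_counts(toy)

# One (sample, condition) pair per column, i.e. the sample table DESeq2 expects.
coldata = [(s, condition_of(s)) for s in samples]
print(coldata)
```

The resulting sample/condition pairs would then be written out alongside the counts and loaded in R for the actual DESeq2 run.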

5 months ago
Enrique • 0

Hello, I recommend using the mitochondrial reference genome. For the GTF file (or GFF; they are largely the same), check out this post: Where are the associated annotation (GFF) files for mitochondrial genomes on NCBI?


No, absolutely not. Mapping to such a tiny subset leads to false positives. Use the entire genome that includes the mt reference.
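
Note that a whole-genome annotation already carries the mitochondrial genes, so no separate mitochondrial GTF is needed. A minimal Python sketch to check this on your own file, assuming the mitochondrial contig is named `chrM` (UCSC/GENCODE style) or `MT` (Ensembl style); the function name is hypothetical, and the example records and coordinates are illustrative only:

```python
import io

# Assumption: the mitochondrial contig is called chrM or MT in the GTF.
MITO_NAMES = {"chrM", "MT"}

def mito_records(gtf_lines):
    """Yield GTF records whose seqname (column 1) is the mitochondrial contig."""
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header/comment and blank lines
        seqname = line.split("\t", 1)[0]
        if seqname in MITO_NAMES:
            yield line.rstrip("\n")

# Toy input standing in for a real annotation file.
example = io.StringIO(
    '#!genome-build GRCh38\n'
    'chr1\tsrc\tgene\t100\t900\t.\t+\t.\tgene_id "G1"; gene_name "GENE_A";\n'
    'chrM\tsrc\tgene\t3307\t4262\t.\t+\t.\tgene_id "G2"; gene_name "MT-ND1";\n'
)
for rec in mito_records(example):
    print(rec.split("\t")[8])  # attribute column of each mitochondrial record
```

With a real file you would pass the open file handle instead of the toy `StringIO` object.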


Much appreciated. If you don't use restrictive mapping parameters, it is better to use the entire genome to avoid false positives related to the "low complexity" of the mitochondrial genome.


It has nothing to do with low complexity. You always map to the entire genome since the reads can come from the entire genome. If you take away the true origin of the reads during mapping, the aligner will still try to match them elsewhere, leading to false results.
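
To make the point concrete, here is a toy Python sketch. It is a brute-force ungapped matcher, not how a production aligner works internally, and the sequences, contig names, and mismatch threshold are made up. The read comes from the "nuc" contig, which holds a slightly diverged copy of a stretch of "mt".

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def best_hit(read, references, max_mismatches=2):
    """Return (contig, offset, mismatches) of the best ungapped placement, or None."""
    best = None
    for name, seq in references.items():
        for i in range(len(seq) - len(read) + 1):
            mm = hamming(read, seq[i:i + len(read)])
            if mm <= max_mismatches and (best is None or mm < best[2]):
                best = (name, i, mm)
    return best

# Made-up sequences: "nuc" contains a diverged copy of the start of "mt".
full = {
    "mt":  "ACGTTGCAAGGCTTAACCGT",
    "nuc": "TTTTACGTTGCTAGGCTTTT",
}
mt_only = {"mt": full["mt"]}

read = "ACGTTGCTAGGC"  # true origin: "nuc", offset 4

print(best_hit(read, full))     # placed at its true origin with 0 mismatches
print(best_hit(read, mt_only))  # forced onto "mt" with 1 mismatch
```

With only "mt" available, the read still gets placed, just in the wrong location, and it would then be counted toward a mitochondrial gene it never came from.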
