Viral genes not showing up in combined mouse+virus alignment
3 months ago
cook.675 ▴ 220

I created combined MHV-A59 and mm10 FASTA and GTF files by concatenating them with the Linux cat command.

The last two entries of the mm10 annotation and the first two entries of the A59 annotation in the combined GTF look like this:

[Image not preserved: screenshot of the combined GTF entries]

I then built a STAR reference from the combined FASTA and GTF files and aligned my samples to it. The BAM output files show that the reference includes the MHV-A59 genome, which STAR treats as an extra chromosome. Here are the last five entries from the Log.out file listing the chromosomes. NC_001846.1 is the A59 genome and has 31357 bp:

58  NT_166452.1 20208   2737307648
59  NT_187064.1 114452  2737569792
60  NW_023337853.1  31129   2737831936
61  NC_005089.1 16299   2738094080
62  NC_001846.1 31357   2738356224

When I generate the counts table, all of the mouse gene names are in the table, but none of the A59 gene names, such as "N" and "ORF1ab", appear.

Thanks for any suggestions you can provide!

RNAseq • 405 views

Hello, I wanted to follow up: does anyone know why the viral genes are not showing up in the count matrix? I can't seem to solve this one.


Try taking out the additional header lines from the combined GTF file. NC_005089.1 lines should be immediately followed by NC_001846.1 lines.
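A minimal sketch of that suggestion, assuming the extra header lines are the usual '#'-prefixed comment lines at the top of the second GTF. The function name merge_gtf_lines and the example records are made up for illustration; they are not taken from the actual files.

```python
from typing import Iterable, List


def merge_gtf_lines(sources: Iterable[Iterable[str]]) -> List[str]:
    """Concatenate GTF line streams, keeping '#' header lines only from the first one."""
    merged = []
    for index, lines in enumerate(sources):
        for line in lines:
            text = line.rstrip("\n")
            if not text.strip():
                continue  # skip blank lines
            if text.startswith("#") and index > 0:
                continue  # drop header/comment lines from every file after the first
            merged.append(text + "\n")  # normalise line endings so records never fuse
    return merged


# Tiny in-memory example; these records are invented for illustration only.
mouse = [
    "#!genome-build mm10\n",
    'NC_005089.1\tsrc\texon\t1\t68\t.\t+\t.\tgene_id "mt-Tf";\n',
]
virus = [
    "#gtf-version 2.2\n",
    'NC_001846.1\tsrc\tCDS\t210\t13610\t.\t+\t0\tgene_id "ORF1ab";',
]
combined = merge_gtf_lines([mouse, virus])
```

With real files, open file handles could be passed instead of lists, since the function only iterates over lines.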


I figured it out! The featureCounts GTF.featureType option uses 'exon' by default, which is present in the mouse GTF, but the viral GTF file has no 'exon' features. The quickest fix was to change the viral 'CDS' features to 'exon' (only 6 edits at the end of the file) and re-run featureCounts.
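For anyone who would rather script the edit than do it by hand, here is a rough sketch, assuming a standard tab-separated GTF with the feature type in the third column. The helper names feature_types and cds_to_exon and the example records are hypothetical, not from the actual files.

```python
from collections import Counter
from typing import Iterable, List


def _fields(line: str) -> List[str]:
    """Split a GTF record into columns; return [] for comments or malformed lines."""
    if line.startswith("#"):
        return []
    parts = line.rstrip("\n").split("\t")
    return parts if len(parts) >= 9 else []


def feature_types(lines: Iterable[str], seqname: str) -> Counter:
    """Tally column-3 feature types for one chromosome, to see whether 'exon' exists."""
    counts = Counter()
    for line in lines:
        parts = _fields(line)
        if parts and parts[0] == seqname:
            counts[parts[2]] += 1
    return counts


def cds_to_exon(lines: Iterable[str], seqname: str) -> List[str]:
    """Relabel 'CDS' records as 'exon' on one chromosome; leave all other lines as-is."""
    out = []
    for line in lines:
        parts = _fields(line)
        if parts and parts[0] == seqname and parts[2] == "CDS":
            parts[2] = "exon"
            out.append("\t".join(parts) + "\n")
        else:
            out.append(line)
    return out


# Invented records for illustration only.
gtf = [
    'NC_005089.1\tsrc\texon\t1\t68\t.\t+\t.\tgene_id "mt-Tf";\n',
    'NC_001846.1\tsrc\tCDS\t210\t13610\t.\t+\t0\tgene_id "ORF1ab";\n',
    'NC_001846.1\tsrc\tCDS\t29669\t31033\t.\t+\t0\tgene_id "N";\n',
]
before = feature_types(gtf, "NC_001846.1")
fixed = cds_to_exon(gtf, "NC_001846.1")
after = feature_types(fixed, "NC_001846.1")
```

Running the tally first shows which feature types the viral chromosome actually carries before anything is rewritten. Restricting the rewrite to one chromosome name keeps the mouse 'CDS' records untouched.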
