How to safely rename the chromosome names in FASTA and GTF files
5.2 years ago
ddzhangzz ▴ 90

I have a FASTA file with headers like these:

>chr1 1
>chr2 2
>chr3 3
>chr4 4
>chr5 5
>chr6 6
>chr7 7
>chr8 8
>chr9 9
>chr10 10
>chr11 11
>chr12 12
>chr13 13
>chr14 14
>chr15 15
>chr16 16
>chr17 17
>chr18 18
>chr19 19
>chrX X
>chrY Y
>chrM MT
>GL456210.1 GL456210.1
>GL456211.1 GL456211.1
>GL456212.1 GL456212.1
>GL456213.1 GL456213.1
>JH584292.1 JH584292.1
>JH584293.1 JH584293.1
>JH584294.1 JH584294.1
>JH584295.1 JH584295.1
>JH584296.1 JH584296.1

And the GTF looks like this:

chr1    HAVANA  gene    3073253 3074322 .       +       .       gene_id "ENSMUSG00000102693.1"; gene_type "TEC"; gene_name "4933401J01Rik"; level 2; havana_gene "OTTMUSG00000049935.1";
chr1    HAVANA  transcript      3073253 3074322 .       +       .       gene_id "ENSMUSG00000102693.1"; transcript_id "ENSMUST00000193812.1"; gene_type "TEC"; gene_name "4933401J01Rik"; transcript_type "TEC"; tr
chr1    HAVANA  exon    3073253 3074322 .       +       .       gene_id "ENSMUSG00000102693.1"; transcript_id "ENSMUST00000193812.1"; gene_type "TEC"; gene_name "4933401J01Rik"; transcript_type "TEC"; transcript
chr1    ENSEMBL gene    3102016 3102125 .       +       .       gene_id "ENSMUSG00000064842.1"; gene_type "snRNA"; gene_name "Gm26206"; level 3;
chr1    ENSEMBL transcript      3102016 3102125 .       +       .       gene_id "ENSMUSG00000064842.1"; transcript_id "ENSMUST00000082908.1"; gene_type "snRNA"; gene_name "Gm26206"; transcript_type "snRNA"; tran
chr1    ENSEMBL exon    3102016 3102125 .       +       .       gene_id "ENSMUSG00000064842.1"; transcript_id "ENSMUST00000082908.1"; gene_type "snRNA"; gene_name "Gm26206"; transcript_type "snRNA"; transcript_n
chr1    HAVANA  gene    3205901 3671498 .       -       .       gene_id "ENSMUSG00000051951.5"; gene_type "protein_coding"; gene_name "Xkr4"; level 2; havana_gene "OTTMUSG00000026353.2";

I want to add the letter "m" (for mouse) to the front of these chromosome names, giving mchr1, mchr2, etc. How can I do that safely?


For the FASTA you can gsub ">chr" with ">mchr", and for the GTF you can simply prepend "m" to the first column ($1), both with awk. Please use Google for simple text manipulation tasks like these. Still, I recommend against it: this is not a standard notation, so it might cause trouble when you download other resources from the web, or at least it means additional preprocessing of everything you download (annotations etc.).
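
For anyone who prefers a script over an awk one-liner, here is a minimal Python sketch of the same idea. It rests on a few assumptions that are not in the original post: the file names in the usage note are hypothetical, the function names are made up for illustration, and the prefix is added to every sequence name (including scaffolds such as GL456210.1) so that the FASTA and the GTF stay consistent with each other. GTF comment lines starting with "#" are passed through unchanged.

```python
def rename_fasta_line(line, prefix="m"):
    """Prefix the sequence name on FASTA header lines; leave sequence lines alone."""
    if line.startswith(">"):
        # ">chr1 1" becomes ">mchr1 1"; the description after the name is kept.
        return ">" + prefix + line[1:]
    return line


def rename_gtf_line(line, prefix="m"):
    """Prefix the first column (seqname) of a GTF record."""
    # Comment/header lines and blank lines are not records, so keep them as-is.
    if line.startswith("#") or not line.strip():
        return line
    # The seqname is the first field, so prefixing the whole line is enough.
    return prefix + line


def rename_file(src, dst, rename_line, prefix="m"):
    """Stream src to dst line by line, applying rename_line to each line."""
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            fout.write(rename_line(line, prefix))


def fasta_names(path):
    """Collect sequence names (first word after '>') from a FASTA file."""
    names = set()
    with open(path) as fh:
        for line in fh:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    names.add(fields[0])
    return names


def gtf_names(path):
    """Collect the distinct seqnames (first column) from a GTF file."""
    names = set()
    with open(path) as fh:
        for line in fh:
            if line.strip() and not line.startswith("#"):
                names.add(line.split("\t", 1)[0])
    return names
```

Usage would look something like rename_file("genome.fa", "genome.m.fa", rename_fasta_line) and rename_file("genes.gtf", "genes.m.gtf", rename_gtf_line), again with placeholder file names. As a sanity check afterwards, gtf_names("genes.m.gtf") - fasta_names("genome.m.fa") should be an empty set; anything left over is a GTF seqname with no matching FASTA sequence.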
