Question: How to safely rename the chromosome names in FASTA and GTF files
ddzhangzz (United States) wrote, 4 weeks ago:

I have a FASTA file with headers like:

>chr1 1
>chr2 2
>chr3 3
>chr4 4
>chr5 5
>chr6 6
>chr7 7
>chr8 8
>chr9 9
>chr10 10
>chr11 11
>chr12 12
>chr13 13
>chr14 14
>chr15 15
>chr16 16
>chr17 17
>chr18 18
>chr19 19
>chrX X
>chrY Y
>chrM MT
>GL456210.1 GL456210.1
>GL456211.1 GL456211.1
>GL456212.1 GL456212.1
>GL456213.1 GL456213.1
>JH584292.1 JH584292.1
>JH584293.1 JH584293.1
>JH584294.1 JH584294.1
>JH584295.1 JH584295.1
>JH584296.1 JH584296.1

And in the GTF:

chr1    HAVANA  gene    3073253 3074322 .       +       .       gene_id "ENSMUSG00000102693.1"; gene_type "TEC"; gene_name "4933401J01Rik"; level 2; havana_gene "OTTMUSG00000049935.1";
chr1    HAVANA  transcript      3073253 3074322 .       +       .       gene_id "ENSMUSG00000102693.1"; transcript_id "ENSMUST00000193812.1"; gene_type "TEC"; gene_name "4933401J01Rik"; transcript_type "TEC"; tr
chr1    HAVANA  exon    3073253 3074322 .       +       .       gene_id "ENSMUSG00000102693.1"; transcript_id "ENSMUST00000193812.1"; gene_type "TEC"; gene_name "4933401J01Rik"; transcript_type "TEC"; transcript
chr1    ENSEMBL gene    3102016 3102125 .       +       .       gene_id "ENSMUSG00000064842.1"; gene_type "snRNA"; gene_name "Gm26206"; level 3;
chr1    ENSEMBL transcript      3102016 3102125 .       +       .       gene_id "ENSMUSG00000064842.1"; transcript_id "ENSMUST00000082908.1"; gene_type "snRNA"; gene_name "Gm26206"; transcript_type "snRNA"; tran
chr1    ENSEMBL exon    3102016 3102125 .       +       .       gene_id "ENSMUSG00000064842.1"; transcript_id "ENSMUST00000082908.1"; gene_type "snRNA"; gene_name "Gm26206"; transcript_type "snRNA"; transcript_n
chr1    HAVANA  gene    3205901 3671498 .       -       .       gene_id "ENSMUSG00000051951.5"; gene_type "protein_coding"; gene_name "Xkr4"; level 2; havana_gene "OTTMUSG00000026353.2";

I want to add the letter "m" (denoting mouse) to these chromosome names, giving mchr1, mchr2, etc. How can I do that safely?


For the FASTA you can use gsub to replace ">chr" with ">mchr", and for the GTF simply print "m"$1 in place of the first column, both with awk. Please use Google for simple text-manipulation tasks like these. Still, I recommend against renaming: this is not a standard notation, and it might cause trouble when you download other resources from the web, or at least it will require additional preprocessing of everything you download, such as annotations.

Reply written 4 weeks ago by ATpoint
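
The awk approach from the reply could look roughly like the sketch below. It is a minimal sketch under a few assumptions, not a tested pipeline: the function names and the file names in the usage comments are made up for illustration, the GTF is assumed to be tab-delimited with optional "#" header lines, and every sequence name gets the "m" prefix (including scaffolds such as GL456210.1, which a plain ">chr" to ">mchr" substitution would skip) so that both files stay consistent.

```shell
# Prefix every FASTA sequence name with "m"; sequence lines pass through unchanged.
prefix_fasta() {
    awk '/^>/ { sub(/^>/, ">m") } { print }'
}

# Prefix column 1 of a tab-delimited GTF with "m".
# Lines starting with "#" and empty lines are printed untouched.
prefix_gtf() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^#/ || NF == 0 { print; next }
         { $1 = "m" $1; print }'
}

# Print each sequence name used in the GTF (second argument) that has no
# matching header in the FASTA (first argument). No output means they agree.
missing_in_fasta() {
    awk '
        NR == FNR {
            if (/^>/) { sub(/^>/, "", $1); names[$1] = 1 }
            next
        }
        /^#/ || NF == 0 { next }
        !($1 in names) && !seen[$1]++ { print $1 }
    ' "$1" "$2"
}

# Usage (placeholder file names):
#   prefix_fasta < genome.fa      > genome.m.fa
#   prefix_gtf   < annotation.gtf > annotation.m.gtf
#   missing_in_fasta genome.m.fa annotation.m.gtf
```

Writing to new files rather than editing in place keeps the originals available, and running the name check afterwards is one way to confirm that the FASTA and the GTF still refer to the same sequence names.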