Question: How to safely rename the chromosome names in FASTA and GTF files
ddzhangzz (United States) wrote 6 months ago:

I have a FASTA file with headers like these:

>chr1 1
>chr2 2
>chr3 3
>chr4 4
>chr5 5
>chr6 6
>chr7 7
>chr8 8
>chr9 9
>chr10 10
>chr11 11
>chr12 12
>chr13 13
>chr14 14
>chr15 15
>chr16 16
>chr17 17
>chr18 18
>chr19 19
>chrX X
>chrY Y
>chrM MT
>GL456210.1 GL456210.1
>GL456211.1 GL456211.1
>GL456212.1 GL456212.1
>GL456213.1 GL456213.1
>JH584292.1 JH584292.1
>JH584293.1 JH584293.1
>JH584294.1 JH584294.1
>JH584295.1 JH584295.1
>JH584296.1 JH584296.1

And the GTF looks like this:

chr1    HAVANA  gene    3073253 3074322 .       +       .       gene_id "ENSMUSG00000102693.1"; gene_type "TEC"; gene_name "4933401J01Rik"; level 2; havana_gene "OTTMUSG00000049935.1";
chr1    HAVANA  transcript      3073253 3074322 .       +       .       gene_id "ENSMUSG00000102693.1"; transcript_id "ENSMUST00000193812.1"; gene_type "TEC"; gene_name "4933401J01Rik"; transcript_type "TEC"; tr
chr1    HAVANA  exon    3073253 3074322 .       +       .       gene_id "ENSMUSG00000102693.1"; transcript_id "ENSMUST00000193812.1"; gene_type "TEC"; gene_name "4933401J01Rik"; transcript_type "TEC"; transcript
chr1    ENSEMBL gene    3102016 3102125 .       +       .       gene_id "ENSMUSG00000064842.1"; gene_type "snRNA"; gene_name "Gm26206"; level 3;
chr1    ENSEMBL transcript      3102016 3102125 .       +       .       gene_id "ENSMUSG00000064842.1"; transcript_id "ENSMUST00000082908.1"; gene_type "snRNA"; gene_name "Gm26206"; transcript_type "snRNA"; tran
chr1    ENSEMBL exon    3102016 3102125 .       +       .       gene_id "ENSMUSG00000064842.1"; transcript_id "ENSMUST00000082908.1"; gene_type "snRNA"; gene_name "Gm26206"; transcript_type "snRNA"; transcript_n
chr1    HAVANA  gene    3205901 3671498 .       -       .       gene_id "ENSMUSG00000051951.5"; gene_type "protein_coding"; gene_name "Xkr4"; level 2; havana_gene "OTTMUSG00000026353.2";

I want to add the letter "m" (for mouse) to these chromosome names, e.g. mchr1, mchr2, etc. How can I do that safely?


For the FASTA, you can use gsub to replace >chr with >mchr, and for the GTF simply prepend "m" to the first column (print "m"$1 followed by the remaining fields); both can be done with awk. Please use Google for simple text-manipulation tasks like these. That said, I recommend against renaming: this is not a standard notation, and it may cause trouble when you download other resources from the web, or at least it will require extra preprocessing of everything you download (annotations etc.).
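As a rough illustration of the two transformations described above, here is a minimal sketch in Python rather than awk. It is not the answerer's code. It assumes that every sequence name should get the prefix (including the GL/JH scaffolds, so the FASTA and GTF stay consistent), that the GTF is tab-separated, and that GTF comment lines start with "#". The function names and the toy input are hypothetical, and the sequence letters in the demo are made up.

```python
import io
import sys

PREFIX = "m"  # the prefix asked for in the question ("m" for mouse)


def rename_fasta(src, dst, prefix=PREFIX):
    """Copy FASTA text from src to dst, prefixing the name on each '>' header line."""
    for line in src:
        if line.startswith(">"):
            # ">chr1 1" becomes ">mchr1 1"; the description after the space is untouched.
            line = ">" + prefix + line[1:]
        dst.write(line)


def rename_gtf(src, dst, prefix=PREFIX):
    """Copy GTF text from src to dst, prefixing column 1 (seqname) of each record."""
    for line in src:
        # Column 1 starts the line, so prefixing the line prefixes the seqname.
        # Comment lines ('#') and blank lines are copied unchanged.
        if line.strip() and not line.startswith("#"):
            line = prefix + line
        dst.write(line)


def fasta_names(src):
    """Return the set of sequence names (first word after '>') in FASTA text."""
    return {line[1:].split()[0] for line in src
            if line.startswith(">") and line[1:].strip()}


def gtf_names(src):
    """Return the set of seqnames (tab-separated column 1) in GTF text."""
    return {line.split("\t", 1)[0] for line in src
            if line.strip() and not line.startswith("#")}


if __name__ == "__main__":
    # Toy inputs for illustration only; the sequence letters are made up.
    fasta_in = io.StringIO(">chr1 1\nACGT\n>GL456210.1 GL456210.1\nTTGA\n")
    gtf_in = io.StringIO(
        "# toy comment\n"
        "chr1\tHAVANA\tgene\t3073253\t3074322\t.\t+\t.\t"
        "gene_id \"ENSMUSG00000102693.1\";\n"
    )
    fasta_out, gtf_out = io.StringIO(), io.StringIO()
    rename_fasta(fasta_in, fasta_out)
    rename_gtf(gtf_in, gtf_out)
    sys.stdout.write(fasta_out.getvalue())
    sys.stdout.write(gtf_out.getvalue())

    # Sanity check: every seqname used in the GTF should also exist in the FASTA.
    fasta_out.seek(0)
    gtf_out.seek(0)
    missing = gtf_names(gtf_out) - fasta_names(fasta_out)
    print("GTF seqnames missing from FASTA:", sorted(missing))
```

For real files, the same functions can be given handles from open(), writing to new output files so the originals stay untouched. Running the name check afterwards is a cheap way to confirm that the two files still agree before building an index from them.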

written 6 months ago by ATpoint