Question: Add `chr` to first column of specific rows?
8 days ago by
star80
Netherlands
star80 wrote:

I have a .gtf file like the one below. I would like to add "chr" to the first column of every row except the first 5 (header) rows.

#!genome-build GRCh37.p13
#!genome-version GRCh37
#!genome-date 2009-02
#!genome-build-accession NCBI:GCA_000001405.14
#!genebuild-last-updated 2013-09
1       ensembl chromosome      1       300239041       .       .       .       ID=1;Name=chromosome:AGPv1:1:1:300239041:1
1       ensembl exon    3       104     .       +       .       Parent=GRMZM2G060082_T01;Name=GRMZM2G060082_E07

Desired output:

#!genome-build GRCh37.p13
#!genome-version GRCh37
#!genome-date 2009-02
#!genome-build-accession NCBI:GCA_000001405.14
#!genebuild-last-updated 2013-09
chr1       ensembl chromosome      1       300239041       .       .       .       ID=1;Name=chromosome:AGPv1:1:1:300239041:1
chr1       ensembl exon    3       104     .       +       .       Parent=GRMZM2G060082_T01;Name=GRMZM2G060082_E07

I used the following code, but it adds "chr" to the first 5 lines as well.

cat Homo_sapiens.GRCh37.gtf | sed 's/^/chr/' > chr.gtf

Check if a line starts with a number (and X/Y/MT) and only then add chr using sed.

modified 8 days ago • written 8 days ago by genomax (60k)
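The suggestion above could be sketched roughly as follows. This is only an illustration: the file names `sample.gtf` and `sample.chr.gtf` and the sample records are made up for the demo, and the list of chromosome names (numbers, X, Y, MT) is taken from the comment rather than checked against a real annotation.

```shell
# Hypothetical sample file: two header lines and two tab-separated records.
{
  printf '#!genome-build GRCh37.p13\n'
  printf '#!genome-version GRCh37\n'
  printf '1\tensembl\texon\t3\t104\t.\t+\t.\tName=GRMZM2G060082_E07\n'
  printf 'MT\tensembl\texon\t5\t90\t.\t+\t.\tName=example\n'
} > sample.gtf

# Prefix "chr" only when the line begins with a number, X, Y or MT
# followed by whitespace. Header lines start with "#", so they never match.
sed -E 's/^([0-9]+|X|Y|MT)([[:space:]])/chr\1\2/' sample.gtf > sample.chr.gtf

cat sample.chr.gtf
```

Lines on any other sequence name (scaffolds, patches) are left untouched by this pattern, and MT becomes chrMT, whereas UCSC-style naming uses chrM, so the result may still need a follow-up rename depending on the downstream tool.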

This could be done in R, but it is better suited to command-line tools such as sed or awk.

written 8 days ago by zx8754 (6.2k)
8 days ago by
ATpoint12k
Germany
ATpoint12k wrote:
awk 'BEGIN {FS=OFS="\t"} NR > 5 {$1 = "chr" $1} {print}' in.gtf

Could have been found on google easily...
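A variant that does not depend on the header being exactly 5 lines might look like the sketch below. It assumes the file is tab-separated, as GTF normally is, and that header lines start with "#". The file names `demo.gtf` and `demo.chr.gtf` and the `gene_id`/`transcript_id` values are hypothetical.

```shell
# Hypothetical sample: one header line and one record whose last column contains spaces.
{
  printf '#!genome-build GRCh37.p13\n'
  printf '1\tensembl\texon\t3\t104\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n'
} > demo.gtf

# FS and OFS are both a tab, so when awk rebuilds the record after $1 changes,
# the spaces inside column 9 are left alone.
# Lines starting with "#" are printed unchanged, however many there are.
awk 'BEGIN { FS = OFS = "\t" } !/^#/ { $1 = "chr" $1 } { print }' demo.gtf > demo.chr.gtf

cat demo.chr.gtf
```

Setting the input separator matters here: with awk's default whitespace splitting, assigning to `$1` rebuilds the line with OFS between every whitespace-separated token, which would turn the spaces in the attribute column into tabs.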

8 days ago by
Valencia
Cristian.perez50 wrote:

I would have gone with:

cat <(grep '^#' file.txt) <(grep -v '^#' file.txt | sed 's/^/chr/')

This is more flexible, since it does not depend on the number of header lines.
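The same idea can probably be written as a single sed pass, which reads the file once and also works on piped input. The sketch below uses a made-up file `mixed.gtf` with a comment line placed between records to show the difference.

```shell
# Hypothetical sample with a "#" line in the middle of the records.
{
  printf '#!genome-build GRCh37.p13\n'
  printf '1\tensembl\texon\t3\t104\t.\t+\t.\tName=GRMZM2G060082_E07\n'
  printf '#a comment between records\n'
  printf 'X\tensembl\texon\t7\t99\t.\t-\t.\tName=example\n'
} > mixed.gtf

# The address /^#/! restricts the substitution to lines that do NOT start with "#".
sed '/^#/!s/^/chr/' mixed.gtf > mixed.chr.gtf

cat mixed.chr.gtf
```

Unlike the two-grep version, which writes all "#" lines first and the records afterwards, this keeps every line in its original position.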

8 days ago by
Chirag Parsania1.3k
University of Macau
Chirag Parsania1.3k wrote:

The R way, shown on the two example records only (the "#!" header lines are not handled here):

library(dplyr)  # provides %>% and mutate()

dd <- tibble::tribble(
  ~V1,       ~V2,          ~V3, ~V4,       ~V5, ~V6, ~V7, ~V8,                                               ~V9,
    1, "ensembl", "chromosome",   1, 300239041, ".", ".", ".",      "ID=1;Name=chromosome:AGPv1:1:1:300239041:1",
    1, "ensembl",       "exon",   3,       104, ".", "+", ".", "Parent=GRMZM2G060082_T01;Name=GRMZM2G060082_E07"
  )

# Prefix the first column with lowercase "chr", as asked in the question
dd2 <- dd %>% mutate(V1 = paste0("chr", V1))
dd2

# A tibble: 2 x 9
  V1    V2      V3            V4        V5 V6    V7    V8    V9
  <chr> <chr>   <chr>      <dbl>     <dbl> <chr> <chr> <chr> <chr>
1 chr1  ensembl chromosome     1 300239041 .     .     .     ID=1;Name=chromosome:AGPv1:1:1:300239041:1
2 chr1  ensembl exon           3       104 .     +     .     Parent=GRMZM2G060082_T01;Name=GRMZM2G060082_E07