Question: Add `chr` to first column of specific rows?
1
3 months ago by
star140
Netherlands
star140 wrote:

I have a .gtf file, shown below. I would like to add "chr" to the first column of the file, but not to the first 5 rows (the header lines).

#!genome-build GRCh37.p13
#!genome-version GRCh37
#!genome-date 2009-02
#!genome-build-accession NCBI:GCA_000001405.14
#!genebuild-last-updated 2013-09
1       ensembl chromosome      1       300239041       .       .       .       ID=1;Name=chromosome:AGPv1:1:1:300239041:1
1       ensembl exon    3       104     .       +       .       Parent=GRMZM2G060082_T01;Name=GRMZM2G060082_E07

output:

#!genome-build GRCh37.p13
#!genome-version GRCh37
#!genome-date 2009-02
#!genome-build-accession NCBI:GCA_000001405.14
#!genebuild-last-updated 2013-09
chr1       ensembl chromosome      1       300239041       .       .       .       ID=1;Name=chromosome:AGPv1:1:1:300239041:1
chr1       ensembl exon    3       104     .       +       .       Parent=GRMZM2G060082_T01;Name=GRMZM2G060082_E07

I used the following code, but it adds "chr" to the first 5 lines as well.

cat Homo_sapiens.GRCh37.gtf | sed 's/^/chr/' > chr.gtf
rna-seq alignment R • 230 views
modified 3 months ago by Chirag Parsania1.4k • written 3 months ago by star140

Check if a line starts with a number (and X/Y/MT) and only then add chr using sed.

modified 3 months ago • written 3 months ago by genomax65k
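
The suggestion above can be sketched as a single sed command. This is only a minimal sketch, assuming Ensembl-style names (1-22, X, Y, MT) in column 1; the files in.gtf and chr.gtf and the sample rows are placeholders made up for illustration. Lines that start with anything else, including the `#!` header lines, are left untouched.

```shell
# Build a small placeholder input (hypothetical, tab-separated data)
printf '#!genome-build GRCh37.p13\n1\tensembl\texon\t3\t104\nX\tensembl\texon\t5\t90\nMT\tensembl\texon\t1\t50\n' > in.gtf

# Address: only lines beginning with a digit, X, Y, or MT.
# Command: insert "chr" at the start of those lines.
sed -E '/^([0-9]|X|Y|MT)/s/^/chr/' in.gtf > chr.gtf

cat chr.gtf
```

Note that UCSC-style references name the mitochondrial genome chrM rather than chrMT, so that contig may need separate handling depending on the reference you want to match.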

This could be done in R, but it is better suited to command-line tools such as sed or awk.

written 3 months ago by zx87547.1k
3
3 months ago by
ATpoint15k
Germany
ATpoint15k wrote:
# Use tab as both input and output separator so spaces inside column 9 are preserved
awk 'BEGIN {FS = OFS = "\t"} NR > 5 {$1 = "chr" $1} {print}' in.gtf

Could have been found on google easily...

1
3 months ago by
Valencia
Cristian.perez50 wrote:

I would have gone with:

cat <(grep '^#' file.txt) <(grep -v '^#' file.txt  | sed 's/^/chr/g')

This is more dynamic, since it does not depend on the number of header lines.
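
The same header-aware idea can also be written as a single pass over the file, which avoids reading it twice and does not need bash process substitution. This is a minimal sketch, assuming every header line starts with `#` and every other line is a feature line; the sample content and the output name chr_file.txt are made up for illustration.

```shell
# Placeholder input: two header lines followed by two feature lines
printf '#!genome-build GRCh37.p13\n#!genome-version GRCh37\n1\tensembl\tchromosome\t1\t300239041\n1\tensembl\texon\t3\t104\n' > file.txt

# For every line that does NOT start with "#", prepend "chr"
sed '/^#/!s/^/chr/' file.txt > chr_file.txt

cat chr_file.txt
```

Because the test is on the leading `#` rather than on a line count, the command keeps working if the number of header lines changes between releases.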

0
3 months ago by
Chirag Parsania1.4k
University of Macau
Chirag Parsania1.4k wrote:

The R way:

dd <- tibble::tribble(
  ~V1,       ~V2,          ~V3, ~V4,       ~V5, ~V6, ~V7, ~V8,                                               ~V9,
    1, "ensembl", "chromosome",   1, 300239041, ".", ".", ".",      "ID=1;Name=chromosome:AGPv1:1:1:300239041:1",
    1, "ensembl",       "exon",   3,       104, ".", "+", ".", "Parent=GRMZM2G060082_T01;Name=GRMZM2G060082_E07"
  )

library(magrittr)  # provides the %>% pipe

# Prefix with lowercase "chr", as requested in the question
dd2 <- dd %>% dplyr::mutate(V1 = paste0("chr", V1))
dd2

# A tibble: 2 x 9
  V1    V2      V3            V4        V5 V6    V7    V8    V9
  <chr> <chr>   <chr>      <dbl>     <dbl> <chr> <chr> <chr> <chr>
1 chr1  ensembl chromosome     1 300239041 .     .     .     ID=1;Name=chromosome:AGPv1:1:1:300239041:1
2 chr1  ensembl exon           3       104 .     +     .     Parent=GRMZM2G060082_T01;Name=GRMZM2G060082_E07