Question: Add `chr` to first column of specific rows?
star (Netherlands) wrote 18 months ago:

I have a .gtf file like the one below. I would like to add "chr" to the first column of every row except the first 5 rows (the header lines). How can I do this?

#!genome-build GRCh37.p13
#!genome-version GRCh37
#!genome-date 2009-02
#!genome-build-accession NCBI:GCA_000001405.14
#!genebuild-last-updated 2013-09
1       ensembl chromosome      1       300239041       .       .       .       ID=1;Name=chromosome:AGPv1:1:1:300239041:1
1       ensembl exon    3       104     .       +       .       Parent=GRMZM2G060082_T01;Name=GRMZM2G060082_E07

Desired output:

#!genome-build GRCh37.p13
#!genome-version GRCh37
#!genome-date 2009-02
#!genome-build-accession NCBI:GCA_000001405.14
#!genebuild-last-updated 2013-09
chr1       ensembl chromosome      1       300239041       .       .       .       ID=1;Name=chromosome:AGPv1:1:1:300239041:1
chr1       ensembl exon    3       104     .       +       .       Parent=GRMZM2G060082_T01;Name=GRMZM2G060082_E07

I used the following code, but it adds "chr" to the first 5 lines as well.

cat Homo_sapiens.GRCh37.gtf | sed 's/^/chr/' > chr.gtf

Check if a line starts with a number (and X/Y/MT) and only then add chr using sed.

written 18 months ago by genomax85k
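The check suggested in the comment above could look roughly like the sketch below. The function name `add_chr_by_name` is hypothetical, and the list of chromosome names (digits, X, Y, MT) is an assumption taken from the comment; anything else in column 1, such as scaffold names, is left unchanged, which may or may not be what you want.

```shell
# Hypothetical helper: prefix "chr" only on lines whose first field is a
# chromosome name (digits, X, Y or MT). Header lines starting with "#"
# never match the address, so they pass through untouched.
add_chr_by_name() {
  sed -E '/^([0-9]+|X|Y|MT)[[:space:]]/ s/^/chr/' "$@"
}

# Small demonstration on made-up input (one header line, two records)
printf '#!genome-build GRCh37.p13\n1\tensembl\texon\t3\t104\nX\tensembl\texon\t5\t90\n' | add_chr_by_name
```

With a real file this would be called as `add_chr_by_name Homo_sapiens.GRCh37.gtf > chr.gtf`.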

This could be done in R, but it is better suited to command-line tools such as sed or awk.

written 18 months ago by zx87549.3k
ATpoint (Germany) wrote 18 months ago:
awk 'BEGIN {FS = OFS = "\t"} NR > 5 {$1 = "chr" $1} {print}' in.gtf

Could have been found on google easily...

Chirag Parsania (University of Macau) wrote 18 months ago:

The R way:

dd <- tibble::tribble(
  ~V1,       ~V2,          ~V3, ~V4,       ~V5, ~V6, ~V7, ~V8,                                               ~V9,
    1, "ensembl", "chromosome",   1, 300239041, ".", ".", ".",      "ID=1;Name=chromosome:AGPv1:1:1:300239041:1",
    1, "ensembl",       "exon",   3,       104, ".", "+", ".", "Parent=GRMZM2G060082_T01;Name=GRMZM2G060082_E07"
  )

# Prefix the first column with lowercase "chr", as asked in the question
dd2 <- dplyr::mutate(dd, V1 = paste0("chr", V1))
dd2

# A tibble: 2 x 9
  V1    V2      V3            V4        V5 V6    V7    V8    V9
  <chr> <chr>   <chr>      <dbl>     <dbl> <chr> <chr> <chr> <chr>
1 chr1  ensembl chromosome     1 300239041 .     .     .     ID=1;Name=chromosome:AGPv1:1:1:300239041:1
2 chr1  ensembl exon           3       104 .     +     .     Parent=GRMZM2G060082_T01;Name=GRMZM2G060082_E07
Cristian.perez (Valencia) wrote 18 months ago:

I would have gone with:

cat <(grep '^#' file.txt) <(grep -v '^#' file.txt  | sed 's/^/chr/g')

This is more dynamic, since it does not depend on the number of header lines.
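The same idea (leave lines starting with "#" alone, prefix everything else) can probably also be done in a single pass, which keeps the header and records in their original order even if comment lines appear later in the file. This is only a sketch; the function name `add_chr_skip_comments` is made up for illustration.

```shell
# Hypothetical helper: add "chr" to every line that does NOT start with "#".
# The "!" after the address negates it, so the substitution is skipped
# for header/comment lines.
add_chr_skip_comments() {
  sed '/^#/!s/^/chr/' "$@"
}

# Small demonstration on made-up input (two header lines, one record)
printf '#!genome-build GRCh37.p13\n#!genome-version GRCh37\n1\tensembl\texon\t3\t104\n' | add_chr_skip_comments
```

Note that, unlike a check on chromosome names, this prefixes every non-comment line, including scaffolds and empty lines.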
