Question: prepare file for depth of coverage
15 months ago, bioguy24 (190) wrote:

I am trying to prepare a ROD file for use with GATK DepthOfCoverage. I downloaded a standard hg19 RefSeq file, and I need to remove the non-standard contigs (everything other than chr1-22, chrX, chrY, and chrM) and sort the rest in karyotypic order. Is the command below the best way to do so? Thank you :).

cat getRefGene.txt | grep -v chrUn* | grep -v *random | grep -v chrM | grep -v *hap* | sort -k1,1 -V -s > output.txt

grep can accept multiple search patterns in the regex:

grep -vE 'random|chrM|hap' getRefGene.txt | sort -k1,1 > (...)

I would not use -V, as most tools expect standard rather than natural sort order. Can you show the content of getRefGene.txt and the expected output?
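To make the multi-pattern idea concrete, here is a minimal sketch run on a handful of made-up contig names (the names are hypothetical stand-ins for the chrom column, not taken from the real file). It assumes GNU grep and drops chrUn, random, and hap contigs while keeping chrM:

```shell
# Hypothetical contig names, one per line, standing in for the chrom column.
contigs=$(printf '%s\n' chr1 chr6_apd_hap1 chrUn_gl000211 chr1_gl000191_random chrM chrX)

# -E enables alternation with '|', -v inverts the match.
# Avoid a trailing '|': with GNU grep it adds an empty alternative that
# matches every line, so -v would then discard everything.
kept=$(printf '%s\n' "$contigs" | grep -vE 'chrUn|random|hap')

printf '%s\n' "$kept"
```

Only chr1, chrM, and chrX survive here. On the real file the patterns are matched against the whole line, so a gene name containing "hap" would also be dropped; restricting the test to the chrom column avoids that.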

written 15 months ago by ATpoint (41k)

Input file (getRefGene.txt):

#bin    name    chrom   strand  txStart txEnd   cdsStart    cdsEnd  exonCount   exonStarts  exonEnds    score   name2   cdsStartStat    cdsEndStat  exonFrames
0   NM_001308203.1  chr1    +   66999251    67216822    67000041    67208778    22  66999251,66999928,67091529,67098752,67105459,67108492,67109226,67136677,67137626,67138963,67142686,67145360,67154830,67155872,67160121,67184976,67194946,67199430,67205017,67206340,67206954,67208755,  66999355,67000051,67091593,67098777,67105516,67108547,67109402,67136702,67137678,67139049,67142779,67145435,67154958,67155999,67160187,67185088,67195102,67199563,67205220,67206405,67207119,67216822,  0   SGIP1   cmpl    cmpl    -1,0,1,2,0,0,1,0,1,2,1,1,1,0,1,1,2,2,0,2,1,1,
0   NM_032291.3 chr1    +   66999638    67216822    67000041    67208778    25  66999638,67091529,67098752,67101626,67105459,67108492,67109226,67126195,67133212,67136677,67137626,67138963,67142686,67145360,67147551,67154830,67155872,67161116,67184976,67194946,67199430,67205017,67206340,67206954,67208755,   67000051,67091593,67098777,67101698,67105516,67108547,67109402,67126207,67133224,67136702,67137678,67139049,67142779,67145435,67148052,67154958,67155999,67161176,67185088,67195102,67199563,67205220,67206405,67207119,67216822,   0   SGIP1   cmpl    cmpl    0,1,2,0,0,0,1,0,0,0,1,2,1,1,1,1,0,1,1,2,2,0,2,1,1,

I believe the expected output would keep only the rows whose column 3 (chrom) is chr1-22, chrX, chrY, or chrM. Thank you :).
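One way to get that output, sketched under a few assumptions: the file is tab-delimited with chrom in column 3 and txStart in column 5 as in the header above, and the wanted order is chr1..chr22, chrX, chrY, chrM. The function name filter_refgene is made up for illustration. GATK generally expects the contig order to match the reference dictionary, and some hg19 references list chrM first, so the rank numbers may need adjusting.

```shell
# Keep the header plus rows whose 3rd column is chr1-22, chrX, chrY or chrM,
# then order by contig rank and by txStart within each contig.
filter_refgene() {
  awk -F'\t' -v OFS='\t' '
    NR == 1 && /^#/ { print 0, $0; next }        # header gets rank 0, stays first
    $3 ~ /^chr([1-9]|1[0-9]|2[0-2]|X|Y|M)$/ {
      c = substr($3, 4)                          # strip the "chr" prefix
      if (c == "X") rank = 23
      else if (c == "Y") rank = 24
      else if (c == "M") rank = 25               # assumed: chrM last
      else rank = c + 0
      print rank, $0                             # prepend rank as a sort key
    }' "$1" |
  sort -t "$(printf '\t')" -k1,1n -k6,6n |       # field 6 is txStart after the rank
  cut -f2-                                       # drop the temporary rank column
}
```

Usage would be something like `filter_refgene getRefGene.txt > output.txt`. Matching on column 3 with an anchored pattern, rather than grepping whole lines, keeps a row from being dropped just because a gene name happens to contain "hap" or "random".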

modified 15 months ago • written 15 months ago by bioguy24 (190)