Question: prepare file for depth of coverage
29 days ago, bioguy24 (Chicago) wrote:

I am trying to prepare a ROD file for use with GATK DepthOfCoverage. I downloaded a standard hg19 RefSeq file and need to remove all non-standard contigs (everything other than chr1-22, chrX, chrY, and chrM) and sort the result in karyotypic order. Is the command below the best way to do so? Thank you :).

cat getRefGene.txt | grep -v chrUn* | grep -v *random | grep -v chrM | grep -v *hap* | sort -k1,1 -V -s > output.txt
depth of coverage • 132 views
written 29 days ago by bioguy24

grep can take several alternative patterns in a single extended regex (-E), separated by |:

grep -vE 'random|chrM|hap' getRefGene.txt | sort -k1,1 > (...)

I would not use -V, as most tools expect standard (lexicographic) rather than natural sort order. Can you show the content of getRefGene.txt and the expected output?
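To illustrate both points, here is a minimal sketch on a made-up list of contig names (the names are toy data, not taken from the real file). It drops unwanted contigs with one alternation regex and then compares standard byte-wise order with natural order. It assumes GNU coreutils, since -V is a GNU extension.

```shell
# Toy contig names (hypothetical sample, not taken from the real file).
contigs='chr10
chr2
chrUn_gl000211
chr1
chr6_apd_hap1
chrX
chr1_gl000191_random'

# One extended regex with alternation drops every line matching any pattern.
kept=$(printf '%s\n' "$contigs" | grep -vE 'chrUn|random|hap')

# Standard byte-wise order (LC_ALL=C pins the collation).
lexical=$(printf '%s\n' "$kept" | LC_ALL=C sort)

# Natural/version order; -V is a GNU coreutils extension.
natural=$(printf '%s\n' "$kept" | LC_ALL=C sort -V)

echo "lexical:" $lexical   # lexical: chr1 chr10 chr2 chrX
echo "natural:" $natural   # natural: chr1 chr2 chr10 chrX
```

Which of the two orders is right depends on what the downstream tool expects, so it is worth checking against the reference's own contig order.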

written 29 days ago by ATpoint

Input file (getRefGene.txt):

#bin    name    chrom   strand  txStart txEnd   cdsStart    cdsEnd  exonCount   exonStarts  exonEnds    score   name2   cdsStartStat    cdsEndStat  exonFrames
0   NM_001308203.1  chr1    +   66999251    67216822    67000041    67208778    22  66999251,66999928,67091529,67098752,67105459,67108492,67109226,67136677,67137626,67138963,67142686,67145360,67154830,67155872,67160121,67184976,67194946,67199430,67205017,67206340,67206954,67208755,  66999355,67000051,67091593,67098777,67105516,67108547,67109402,67136702,67137678,67139049,67142779,67145435,67154958,67155999,67160187,67185088,67195102,67199563,67205220,67206405,67207119,67216822,  0   SGIP1   cmpl    cmpl    -1,0,1,2,0,0,1,0,1,2,1,1,1,0,1,1,2,2,0,2,1,1,
0   NM_032291.3 chr1    +   66999638    67216822    67000041    67208778    25  66999638,67091529,67098752,67101626,67105459,67108492,67109226,67126195,67133212,67136677,67137626,67138963,67142686,67145360,67147551,67154830,67155872,67161116,67184976,67194946,67199430,67205017,67206340,67206954,67208755,   67000051,67091593,67098777,67101698,67105516,67108547,67109402,67126207,67133224,67136702,67137678,67139049,67142779,67145435,67148052,67154958,67155999,67161176,67185088,67195102,67199563,67205220,67206405,67207119,67216822,   0   SGIP1   cmpl    cmpl    0,1,2,0,0,0,1,0,0,0,1,2,1,1,1,1,0,1,1,2,2,0,2,1,1,

I believe the expected output would contain only chr1-22, chrX, chrY, and chrM in column 3. Thank you :).
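One possible way to get that output is sketched below. It whitelists column 3 instead of blacklisting patterns, then sorts on an explicit numeric rank. The function name filter_refgene is hypothetical, the input is assumed to be tab-separated as in UCSC refGene dumps, and the order chr1..chr22, chrX, chrY, chrM is an assumption that should be checked against the contig order of the reference actually used with GATK.

```shell
# Sketch: keep only chr1-22, chrX, chrY, chrM (column 3 of a tab-separated
# refGene-style table) and order rows chr1..chr22, chrX, chrY, chrM,
# then by txStart (column 5). Reads stdin, writes stdout.
# filter_refgene is a hypothetical helper name.
filter_refgene() {
  awk -F '\t' -v OFS='\t' '
    /^#/ { print 0, $0; next }                # header gets rank 0, stays first
    $3 ~ /^chr([1-9]|1[0-9]|2[0-2]|X|Y|M)$/ {
      c = substr($3, 4)                       # strip the "chr" prefix
      if (c == "X")      rank = 23
      else if (c == "Y") rank = 24
      else if (c == "M") rank = 25
      else               rank = c + 0
      print rank, $0                          # prepend a numeric sort key
    }
  ' | sort -t "$(printf '\t')" -s -k1,1n -k6,6n | cut -f2-
  # After the rank is prepended, txStart sits in field 6; cut drops the rank.
}
```

Usage would look like `filter_refgene < getRefGene.txt > output.txt`. Matching the contig name exactly in column 3 avoids accidentally dropping a row because a pattern such as "hap" happens to appear in a gene name.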

modified 29 days ago • written 29 days ago by bioguy24