"C" and "G" positions
4.1 years ago

I need to create a BED file with the positions of all "C"s, and another one with the positions of all "G"s, in the human genome build 38 (Human38). How can I do that?

R genome • 476 views

See the related thread: Locating A Sequence In A Fasta File.

Be aware that the output is going to be extremely large: assuming roughly 25% of nucleotides are C and another 25% are G, that is on the order of 750,000,000 entries for C and as many again for G.
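If you want the actual numbers rather than an estimate, you can count the bases before generating anything. This is only a minimal sketch: `count_base` is a hypothetical helper name, and it assumes a plain (uncompressed) FASTA file where every line that is not a `>` header is sequence.

```shell
# Hypothetical helper: count occurrences of one base (upper or lower case)
# in a plain FASTA file. Each occurrence becomes one BED line later on.
# Usage: count_base ref.fa C
count_base() {
  fasta=$1
  base=$2
  lower=$(printf '%s' "$base" | tr 'A-Z' 'a-z')
  upper=$(printf '%s' "$base" | tr 'a-z' 'A-Z')
  # Drop header lines, delete every character except the requested base,
  # then count the bytes that remain (tr -d ' ' strips wc padding).
  grep -v '^>' "$fasta" | tr -cd "$upper$lower" | wc -c | tr -d ' '
}
```

Running `count_base /path/ref.fa C` and `count_base /path/ref.fa G` gives the number of lines each BED file will contain.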


It seems like I'm missing something... /path/ref.fa.fai: what path should it be?


The path to your indexed FASTA reference (indexed with `samtools faidx`); the .fai file is the index that command writes next to the FASTA file.

4.1 years ago
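# samtools faidx -n 1 prints the FASTA header on line 1 and then one base per line, so the base on line NR is the 0-based BED interval [NR-2, NR-1)
cut -f 1 /path/ref.fa.fai | while read -r S ; do samtools faidx -n 1 /path/ref.fa "$S" | awk -v S="$S" '/^[CGcg]/ {printf("%s\t%d\t%d\n",S,NR-2,NR-1);}' ; done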
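Since the question asks for one file per base, here is a minimal sketch that writes the C and G positions to two separate BED files. It is not the method above: it reads the FASTA directly with awk instead of going through samtools, and the function name `split_cg_bed` and the output names `PREFIX.C.bed` / `PREFIX.G.bed` are hypothetical. It assumes an uncompressed FASTA and treats soft-masked (lower-case) bases like upper-case ones. Expect it to be slow on a whole genome, because it inspects every base in an awk loop.

```shell
# Hypothetical helper: write 0-based, half-open BED intervals for every C
# and every G in a plain FASTA file, one output file per base.
# Usage: split_cg_bed ref.fa out   ->   out.C.bed and out.G.bed
split_cg_bed() {
  fasta=$1
  prefix=$2
  # Start from empty files so a base that never occurs still yields a file.
  : > "$prefix.C.bed"
  : > "$prefix.G.bed"
  awk -v c_out="$prefix.C.bed" -v g_out="$prefix.G.bed" '
    /^>/ {
      # Header line: sequence name is the text up to the first whitespace.
      name = substr($1, 2)
      pos = 0            # number of bases already seen in this sequence
      next
    }
    {
      seq = toupper($0)
      n = length(seq)
      for (i = 1; i <= n; i++) {
        b = substr(seq, i, 1)
        if (b == "C")
          printf("%s\t%d\t%d\n", name, pos + i - 1, pos + i) > c_out
        else if (b == "G")
          printf("%s\t%d\t%d\n", name, pos + i - 1, pos + i) > g_out
      }
      pos += n
    }
  ' "$fasta"
}
```

For example, `split_cg_bed /path/ref.fa hg38` would produce `hg38.C.bed` and `hg38.G.bed` in the current directory.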