6 weeks ago
Yoosef ▴ 50

Hello I have some code to extract data from my .VCF file.

 cat '/home/yousef/Desktop/Haplotyp_results/INDEL/info_8indel.vcf' | sed 's/^.*;DP=\([0-9]*\);.*$/\1/'

I wanted to know the meaning of this section of the code:

 sed 's/^.*;DP=\([0-9]*\);.*$/\1/'

I would be pleased if you could point me to a source where I can learn the meaning and function of such code. Thanks

NGS Sequencing calling variant VCF • 303 views
6 weeks ago
lacb ▴ 120

This is a sed command with a substitution instruction (s/<regex>/<substitution>/) using a regular expression (regex).

There are many websites where you can learn how regex and the sed command work, but basically this is what the code does:

• it takes the lines from the VCF file
• ^.*;DP=\([0-9]*\);.*$ matches the whole line and captures the digits following the "DP=" tag (see https://regex101.com/ for an explanation of the regex)
• finally, it replaces the whole match (the line) with the captured number (\1) and prints it

So the whole command transforms the VCF file into a file containing only the depths of the variant calls.

6 weeks ago

This code is called a regex, short for regular expression. Phil Ewels gave a nice introductory talk about them a while ago, and the website regex101.com is very helpful for explaining them and trying them out.

sed is (mostly) a replacement tool: it takes a piece of text and replaces it with a modified version of the input. The basic notation is

 sed 's/ take this / replace it with that /'

The "replace it with that" part is in your case only \1, which refers to the contents between the escaped parentheses \( and \), here [0-9]*. The regex roughly says: in each line, search for exactly this combination of letters and symbols, ;DP=, and retain only the number (all the digits) that follows right thereafter.
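
To see the substitution in action, here is a minimal sketch that runs the same sed expression on a single record. The record itself is made up for illustration and does not come from the original file.

```shell
# Hypothetical VCF-style record (invented for illustration only)
line=$(printf 'chr1\t1042\t.\tAT\tA\t50\tPASS\tAC=2;DP=14;MQ=60')

# ^.*;DP=      everything from the start of the line up to and including ";DP="
# \([0-9]*\)   the digits that follow, captured as group 1
# ;.*$         the next ";" and the rest of the line
# \1           the replacement: only the captured digits
depth=$(printf '%s\n' "$line" | sed 's/^.*;DP=\([0-9]*\);.*$/\1/')

echo "$depth"   # prints 14
```

Because the pattern is anchored with ^ and $ and spans the whole line, the entire line is swapped for the captured digits.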

As you are learning regex and sed use, also start right away with avoiding any Useless Use of Cat ;-)

sed 's/^.*;DP=\([0-9]*\);.*$/\1/' < '/home/yousef/Desktop/Haplotyp_results/INDEL/info_8indel.vcf' > output.txt
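
One caveat: sed prints any line that does not match the pattern unchanged, so header lines (which start with #) pass through untouched. A possible variant, which is not part of the answers above, combines the -n option with the p flag so that only lines where the substitution succeeded are printed. The sample input below is invented for illustration.

```shell
# Made-up input: one header line and two records (illustration only)
sample=$(printf '%s\n' \
  '##fileformat=VCFv4.2' \
  'chr1 100 . A AT 50 PASS AC=1;DP=14;MQ=60' \
  'chr1 200 . GC G 40 PASS AC=2;DP=9;MQ=58')

# -n suppresses sed's automatic printing of every line;
# the trailing p prints a line only if the substitution was made on it
depths=$(printf '%s\n' "$sample" | sed -n 's/^.*;DP=\([0-9]*\);.*$/\1/p')

echo "$depths"
```

With this input the header line is dropped and only the two depth values remain. Note that the pattern requires a ; after the digits, so a record whose INFO field ends with the DP entry would not match and would also be skipped by this variant.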