Retrieve reads overlapping indels
1
0
Entering edit mode
2.8 years ago
yliueagle ▴ 290

I am doing allelic-specific RNA-seq analysis. Given a list of variants, is there an efficient program to separate reads at each variant position based on whether the variant is present? (especially for variant=indels since they accounted for 10% of my variant list). This looks quite obvious when visualized by IGV but I did not find a program doing it.

(Note: preferably I will need the output to be reads instead of other summarized statistics

Thanks

variants reads indel • 582 views
ADD COMMENT
0
Entering edit mode
2.8 years ago
Medhat 9.7k

Extract intervals as a bed file (do not forget to add padding like 30 bp).
Later, you need to use something like that:

 function extract_fastq {
   myFile=$1
   while read -r line; do a=$(echo $line | awk '{print $1":"$2"-"$3}'); samtools view  "${myFile}".bam  "$a" | awk '{print $1"\t"$10}' |  sort -u -k1,1 |  awk '{print ">"$1"\n"$2}' | bgzip > "${a/:/_}".fq.gz; done < "${myFile}".bed
 }

To use it:

extract_fastq your_bed_file
ADD COMMENT

Login before adding your answer.

Traffic: 2832 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6