How to extract reads/fragments aligned to intergenic regions from bam/sam file?
1
0
Entering edit mode
15 months ago
MACRODER • 0

Hi. I'm new to RNA-seq data analysis, and I'm trying to extract all reads/fragments aligned in intergenic regions from a sam/bam file. I have a .GTF file with genomic features. I'm working with an organism that does not have introns, so, I would like to use the end position of each gene as a start position of my region of interest, and the start position of the following gene as the end position of my roi. I would like the output to be a bam file with only the reads that map to intergenic regions. Can anyone help me? Sorry if it is a basic question.

And I would also like to construct a GTF file with this intergenic regions as if they were features, but my skills in programming languages are not really good.

Any help will be appreciated. Thank you

RNA-Seq alignment sequencing bam intergenic • 716 views
ADD COMMENT
1
Entering edit mode

You should be able to do it with bedtools intersect see here: https://bedtools.readthedocs.io/en/latest/content/tools/intersect.html

ADD REPLY
2
Entering edit mode
15 months ago

extract the gene positions and convert as bed.

awk -F '\t' '($3=="gene") {printf("%s\t%d\t%s\n",$1,int($4)-1,$5);}' in.gtf | sort -t $'\t' -k1,1 -k2,2n | bedtools merge > genes.bed

get the complement bed with bedtools complement

extract the bam:

samtools view -L intergenic in.bam
ADD COMMENT
0
Entering edit mode

Pretty sure bedtools needs -i - to read from stdin ;-)

ADD REPLY
0
Entering edit mode

I don't thing 'merge' needs it.

ADD REPLY
0
Entering edit mode

Thank you so much. I tried your code with minor modifications and I think it worked perfectly. If anyone ever has the same problem, I used the following:

1-Convert genes.gtf to genes.bed

awk -F '\t' '($3=="gene") {printf("%s\t%d\t%s\n",$1,int($4)-1,$5);}' genes.gtf | sort -t $'\t' -k1,1 -k2,2n | bedtools merge > genes.bed

2-Extract intergenic regions from genes.bed

bedtools complement -i genes.bed -g file.genome > intergenic.bed

file.genome is a file containing the length of the chromosomes/contigs

3-Output alignments in a bam file overlaping the intergenic.bed file

samtools view -b -L intergenic.bed input.bam | samtools sort > intergenic.bam
ADD REPLY
0
Entering edit mode

I don't think you need samtools sort if input.bam was already sorted.

Please check the green mark on the left to validate and close the quesion.

ADD REPLY
0
Entering edit mode

Yes, you are right, my input bam was already sorted. I supposed samtools sort was not necessary, but just in case I added it. Green mark checked. Thanks again

ADD REPLY

Login before adding your answer.

Traffic: 2252 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6