Using bedtools to extract the non-overlapping features between two GTFs
4.0 years ago
newbie ▴ 120

From one of my analyses, I have found some novel lncRNAs that are not annotated in GENCODE. They are in a GTF file that looks like this:

My gtf [example]:

chr17   StringTie       transcript      49187581        49191235        1000    +       .       gene_id "MSTRG.100038"; transcript_id "MSTRG.100038.1";  class_code "u"; transcript_length "1188"; lncRNA_type "LincRNA"; 
chr17   StringTie       exon    49187581        49187711        1000    +       .       gene_id "MSTRG.100038"; transcript_id "MSTRG.100038.1"; exon_number "1";  class_code "u"; transcript_length "1188"; lncRNA_type "LincRNA"; 
chr17   StringTie       exon    49190179        49191235        1000    +       .       gene_id "MSTRG.100038"; transcript_id "MSTRG.100038.1"; exon_number "2";  class_code "u"; transcript_length "1188"; lncRNA_type "LincRNA"; 
chr17   StringTie       transcript      49479713        49480376        1000    -       .       gene_id "MSTRG.100058"; transcript_id "MSTRG.100058.1";  class_code "u"; transcript_length "664"; lncRNA_type "LincRNA"; 
chr17   StringTie       exon    49479713        49480376        1000    -       .       gene_id "MSTRG.100058"; transcript_id "MSTRG.100058.1"; exon_number "1";  class_code "u"; transcript_length "664"; lncRNA_type "LincRNA"; 
chr17   StringTie       transcript      47869876        47875390        1000    -       .       gene_id "MSTRG.100064"; transcript_id "MSTRG.100064.9";  class_code "u"; transcript_length "5364"; lncRNA_type "LincRNA"; 
chr17   StringTie       exon    47869876        47873933        1000    -       .       gene_id "MSTRG.100064"; transcript_id "MSTRG.100064.9"; exon_number "1";  class_code "u"; transcript_length "5364"; lncRNA_type "LincRNA";

I also downloaded mitranscriptome.gtf from MiTranscriptome. Here are some example lines from that GTF:

chr1    mitranscriptome transcript      11017   15297   1000.0  -       .       tcat "pseudogene"; gene_id "G000001"; tss_id "TSS000001"; uce "FALSE"; transcript_id "T000001"; tstatus "annotated"; tgenic "NA"; func_name_final "NA";
chr1    mitranscriptome transcript      11017   29382   1000.0  -       .       tcat "pseudogene"; gene_id "G000001"; tss_id "TSS000002"; uce "FALSE"; transcript_id "T000002"; tstatus "annotated"; tgenic "NA"; func_name_final "NA";
chr1    mitranscriptome exon    11017   11526   1000.0  -       .       exon_number "0"; tcat "pseudogene"; gene_id "G000001"; tss_id "TSS000001"; uce "FALSE"; transcript_id "T000001"; tstatus "annotated"; tgenic "NA"; func_name_final "NA";
chr1    mitranscriptome exon    11017   11526   1000.0  -       .       exon_number "0"; tcat "pseudogene"; gene_id "G000001"; tss_id "TSS000002"; uce "FALSE"; transcript_id "T000002"; tstatus "annotated"; tgenic "NA"; func_name_final "NA";
chr1    mitranscriptome transcript      11993   13957   1000.0  +       .       tcat "pseudogene"; gene_id "G000002"; tss_id "TSS000003"; uce "FALSE"; transcript_id "T000003"; tstatus "annotated"; tgenic "NA"; func_name_final "NA";
chr1    mitranscriptome exon    11993   12227   1000.0  +       .       exon_number "0"; tcat "pseudogene"; gene_id "G000002"; tss_id "TSS000003"; uce "FALSE"; transcript_id "T000003"; tstatus "annotated"; tgenic "NA"; func_name_final "NA";
chr1    mitranscriptome exon    12613   12721   1000.0  +       .       exon_number "1"; tcat "pseudogene"; gene_id "G000002"; tss_id "TSS000003"; uce "FALSE"; transcript_id "T000003"; tstatus "annotated"; tgenic "NA"; func_name_final "NA";

I would like to compare the GTF of lncRNAs from my analysis against the MiTranscriptome GTF and keep only the truly novel lncRNAs, i.e. those that do not overlap anything in MiTranscriptome.

For this I ran the following:

bedtools intersect -v -b mitranscriptome.v2.gtf -a myAnalysis.lncRNAs.unique.gtf > myAnalysis.lncRNAs.unique.NOT.IN.MITRANSCRIPTOME.gtf

Is the above usage of bedtools intersect the right way to get the novel ones?
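
One thing that may matter with the command above: `-v` is evaluated per GTF line, so transcript and exon rows are tested independently and the output could contain only some rows of a transcript. A possible variant, sketched below, tests only the `transcript` rows and then pulls every row of the surviving transcripts from the original file. The function names are hypothetical, the GTFs are assumed to be tab-delimited, and `-s` (count only same-strand overlaps) is an optional assumption that can be dropped.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical; GTF input is assumed tab-delimited.

# Keep only 'transcript' rows (column 3) so overlap is judged on whole transcript spans.
transcripts_only() {
    awk -F'\t' '$3 == "transcript"' "$1"
}

# Read GTF rows on stdin and print the unique transcript_id values.
transcript_ids() {
    sed -n 's/.*transcript_id "\([^"]*\)".*/\1/p' | sort -u
}

# Print every row (transcript and exon) of GTF $2 whose transcript_id is listed in file $1.
filter_gtf_by_ids() {
    awk 'NR == FNR { keep[$1]; next }
         match($0, /transcript_id "[^"]+"/) {
             # skip the 15 characters of: transcript_id "   and drop the closing quote
             id = substr($0, RSTART + 15, RLENGTH - 16)
             if (id in keep) print
         }' "$1" "$2"
}

# $1 = my GTF, $2 = reference GTF, $3 = output GTF
novel_vs_reference() {
    local mine=$1 ref=$2 out=$3 tmp
    tmp=$(mktemp -d)
    transcripts_only "$mine" > "$tmp/mine.tx.gtf"
    transcripts_only "$ref"  > "$tmp/ref.tx.gtf"
    # -v: report A rows with no overlap in B
    # -s: only same-strand overlaps count (assumption; remove to ignore strand)
    bedtools intersect -v -s -a "$tmp/mine.tx.gtf" -b "$tmp/ref.tx.gtf" \
        | transcript_ids > "$tmp/novel.ids"
    filter_gtf_by_ids "$tmp/novel.ids" "$mine" > "$out"
    rm -rf "$tmp"
}
```

With these definitions loaded, a call such as `novel_vs_reference myAnalysis.lncRNAs.unique.gtf mitranscriptome.v2.gtf myAnalysis.lncRNAs.unique.NOT.IN.MITRANSCRIPTOME.gtf` would write complete transcript-plus-exon records for transcripts whose span overlaps no MiTranscriptome transcript on the same strand.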

gtf bedtools intersect mitranscriptome lncrna • 938 views