Question: Difference of transcripts between my GFF file and the IGV results
pablo160 wrote:

Hi,

I aligned my transcript reads, generated by the Isoseq tool, against my reference. Then I generated a GFF file.

When I compare the number of transcripts in the GFF file with what IGV shows, the two differ. For example, here is one specific scaffold of the reference (named Super-Scaffold_100047).

GFF file:

cat out.gff | grep "Super-Scaffold_100047" | awk '{print $3}' | grep "transcript" | wc -l

21 transcripts
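The count above matches the scaffold name as a substring anywhere in the line, so a longer name that starts the same way would also be counted. A minimal sketch of a stricter count, assuming a standard nine-column GFF/GTF layout; the helper name `count_transcripts`, the demo file, and the scaffold `Super-Scaffold_1000470` are hypothetical and only there to show the difference:

```shell
# Hypothetical helper: count "transcript" features on one exact sequence name.
# $1 = GFF/GTF file, $2 = scaffold name (compared to column 1 as a whole string)
count_transcripts() {
    awk -v scaffold="$2" \
        '$1 == scaffold && $3 == "transcript" { n++ } END { print n + 0 }' "$1"
}

# Small demo file. The first two lines come from the excerpt in this post;
# the third uses an invented scaffold whose name merely contains the first one.
demo_gff=$(mktemp)
{
  printf 'Super-Scaffold_100047\tPacBio\ttranscript\t281156\t287937\t.\t+\t.\tgene_id "PB.8691"; transcript_id "PB.8691.1";\n'
  printf 'Super-Scaffold_100047\tPacBio\texon\t281156\t281855\t.\t+\t.\tgene_id "PB.8691"; transcript_id "PB.8691.1";\n'
  printf 'Super-Scaffold_1000470\tPacBio\ttranscript\t100\t200\t.\t+\t.\tgene_id "PB.1"; transcript_id "PB.1.1";\n'
} > "$demo_gff"

# Exact match on column 1: prints 1 (the grep pipeline would report 2 here).
count_transcripts "$demo_gff" Super-Scaffold_100047
```

On the real file this would be called as `count_transcripts out.gff Super-Scaffold_100047`.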

IGV:

I count 25 transcripts, as you can see in the image.

[Image: IGV]

Here is the GFF file for the scaffold in question (only the beginning, because of the size of the file):

Super-Scaffold_100047   PacBio  transcript      281156  287937  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.1";
Super-Scaffold_100047   PacBio  exon    281156  281855  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.1";
Super-Scaffold_100047   PacBio  exon    282017  282094  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.1";
Super-Scaffold_100047   PacBio  exon    282323  282446  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.1";
Super-Scaffold_100047   PacBio  exon    283108  283380  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.1";
Super-Scaffold_100047   PacBio  exon    284686  284805  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.1";
Super-Scaffold_100047   PacBio  exon    287372  287937  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.1";
Super-Scaffold_100047   PacBio  transcript      281168  287895  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.2";
Super-Scaffold_100047   PacBio  exon    281168  281855  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.2";
Super-Scaffold_100047   PacBio  exon    282017  282094  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.2";
Super-Scaffold_100047   PacBio  exon    282323  282446  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.2";
Super-Scaffold_100047   PacBio  exon    283108  283380  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.2";
Super-Scaffold_100047   PacBio  exon    284686  284805  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.2";
Super-Scaffold_100047   PacBio  exon    287372  287895  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.2";
Super-Scaffold_100047   PacBio  transcript      281217  288458  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.3";
Super-Scaffold_100047   PacBio  exon    281217  281855  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.3";
Super-Scaffold_100047   PacBio  exon    282017  282094  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.3";
Super-Scaffold_100047   PacBio  exon    282323  282446  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.3";
Super-Scaffold_100047   PacBio  exon    283108  283380  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.3";
Super-Scaffold_100047   PacBio  exon    284686  284805  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.3";
Super-Scaffold_100047   PacBio  exon    287372  288458  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.3";
Super-Scaffold_100047   PacBio  transcript      281287  288086  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.4";
Super-Scaffold_100047   PacBio  exon    281287  281855  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.4";
Super-Scaffold_100047   PacBio  exon    282017  282094  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.4";
Super-Scaffold_100047   PacBio  exon    282323  282446  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.4";
Super-Scaffold_100047   PacBio  exon    283108  283380  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.4";
Super-Scaffold_100047   PacBio  exon    284686  284805  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.4";
Super-Scaffold_100047   PacBio  exon    287372  288086  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.4";
Super-Scaffold_100047   PacBio  transcript      281544  288081  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.5";
Super-Scaffold_100047   PacBio  exon    281544  281855  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.5";
Super-Scaffold_100047   PacBio  exon    282017  282094  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.5";
Super-Scaffold_100047   PacBio  exon    282323  282446  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.5";
Super-Scaffold_100047   PacBio  exon    283108  283380  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.5";
Super-Scaffold_100047   PacBio  exon    284686  284805  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.5";
Super-Scaffold_100047   PacBio  exon    287372  288081  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.5";
Super-Scaffold_100047   PacBio  transcript      281590  286734  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.6";
Super-Scaffold_100047   PacBio  exon    281590  281855  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.6";
Super-Scaffold_100047   PacBio  exon    282017  282094  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.6";
Super-Scaffold_100047   PacBio  exon    282323  282446  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.6";
Super-Scaffold_100047   PacBio  exon    283108  283380  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.6";
Super-Scaffold_100047   PacBio  exon    284686  286734  .       +       .       gene_id "PB.8691"; transcript_id "PB.8691.6";
Super-Scaffold_100047   PacBio  transcript      220944  223787  .       -       .       gene_id "PB.8692"; transcript_id "PB.8692.1";
Super-Scaffold_100047   PacBio  exon    220944  223787  .       -       .       gene_id "PB.8692"; transcript_id "PB.8692.1";
Super-Scaffold_100047   PacBio  transcript      311770  318491  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.1";
Super-Scaffold_100047   PacBio  exon    311770  312629  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.1";
Super-Scaffold_100047   PacBio  exon    312770  312942  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.1";
Super-Scaffold_100047   PacBio  exon    314316  314634  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.1";
Super-Scaffold_100047   PacBio  exon    314763  315028  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.1";
Super-Scaffold_100047   PacBio  exon    315299  315530  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.1";
Super-Scaffold_100047   PacBio  exon    315681  315995  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.1";
Super-Scaffold_100047   PacBio  exon    316110  316180  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.1";
Super-Scaffold_100047   PacBio  exon    316280  316494  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.1";
Super-Scaffold_100047   PacBio  exon    316582  316945  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.1";
Super-Scaffold_100047   PacBio  exon    317141  317314  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.1";
Super-Scaffold_100047   PacBio  exon    317520  317738  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.1";
Super-Scaffold_100047   PacBio  exon    317816  318154  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.1";
Super-Scaffold_100047   PacBio  exon    318223  318491  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.1";
Super-Scaffold_100047   PacBio  transcript      311770  318501  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.2";
Super-Scaffold_100047   PacBio  exon    311770  312050  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.2";
Super-Scaffold_100047   PacBio  exon    312246  312629  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.2";
Super-Scaffold_100047   PacBio  exon    312770  312942  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.2";
Super-Scaffold_100047   PacBio  exon    314316  314634  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.2";
Super-Scaffold_100047   PacBio  exon    314763  315028  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.2";
Super-Scaffold_100047   PacBio  exon    315299  315530  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.2";
Super-Scaffold_100047   PacBio  exon    315681  315995  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.2";
Super-Scaffold_100047   PacBio  exon    316110  316180  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.2";
Super-Scaffold_100047   PacBio  exon    316280  316494  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.2";
Super-Scaffold_100047   PacBio  exon    316582  316945  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.2";
Super-Scaffold_100047   PacBio  exon    317141  317314  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.2";
Super-Scaffold_100047   PacBio  exon    317520  317738  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.2";
Super-Scaffold_100047   PacBio  exon    317816  318154  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.2";
Super-Scaffold_100047   PacBio  exon    318223  318501  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.2";
Super-Scaffold_100047   PacBio  transcript      311770  316826  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.3";
Super-Scaffold_100047   PacBio  exon    311770  312629  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.3";
Super-Scaffold_100047   PacBio  exon    312770  312942  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.3";
Super-Scaffold_100047   PacBio  exon    314316  314634  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.3";
Super-Scaffold_100047   PacBio  exon    314763  315028  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.3";
Super-Scaffold_100047   PacBio  exon    315299  315530  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.3";
Super-Scaffold_100047   PacBio  exon    315681  315995  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.3";
Super-Scaffold_100047   PacBio  exon    316110  316180  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.3";
Super-Scaffold_100047   PacBio  exon    316280  316494  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.3";
Super-Scaffold_100047   PacBio  exon    316582  316826  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.3";
Super-Scaffold_100047   PacBio  transcript      311771  317638  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.4";
Super-Scaffold_100047   PacBio  exon    311771  312629  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.4";
Super-Scaffold_100047   PacBio  exon    312770  312942  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.4";
Super-Scaffold_100047   PacBio  exon    314316  314634  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.4";
Super-Scaffold_100047   PacBio  exon    314763  315028  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.4";
Super-Scaffold_100047   PacBio  exon    315299  315530  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.4";
Super-Scaffold_100047   PacBio  exon    315681  315995  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.4";
Super-Scaffold_100047   PacBio  exon    316110  316180  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.4";
Super-Scaffold_100047   PacBio  exon    316280  316494  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.4";
Super-Scaffold_100047   PacBio  exon    316582  316945  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.4";
Super-Scaffold_100047   PacBio  exon    317141  317314  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.4";
Super-Scaffold_100047   PacBio  exon    317520  317638  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.4";
Super-Scaffold_100047   PacBio  transcript      311771  316452  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.5";
Super-Scaffold_100047   PacBio  exon    311771  312629  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.5";
Super-Scaffold_100047   PacBio  exon    312770  312942  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.5";
Super-Scaffold_100047   PacBio  exon    314316  314634  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.5";
Super-Scaffold_100047   PacBio  exon    314763  315028  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.5";
Super-Scaffold_100047   PacBio  exon    315299  315530  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.5";
Super-Scaffold_100047   PacBio  exon    315681  315995  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.5";
Super-Scaffold_100047   PacBio  exon    316110  316180  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.5";
Super-Scaffold_100047   PacBio  exon    316280  316452  .       -       .       gene_id "PB.8693"; transcript_id "PB.8693.5";

Do you have an explanation?

Best

igv transcripts gff • 173 views
modified 10 weeks ago • written 11 weeks ago by pablo160

Since these are giant sequences, perhaps there is a secondary alignment for one or two of the reads (= transcripts)?

modified 11 weeks ago • written 11 weeks ago by GenoMax95k

Actually, these are full-length transcripts obtained from PacBio sequencing. That's why they are giant.

modified 11 weeks ago • written 11 weeks ago by pablo160

Is there a secondary alignment for one or two of them? That could lead to the extra alignments you see.
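One way to check for secondary or supplementary records is to look at the SAM FLAG field (0x100 marks secondary, 0x800 supplementary). A minimal sketch, assuming SAM text on stdin; the helper name `count_alignment_types` and the demo read names are hypothetical:

```shell
# Hypothetical helper: classify SAM records read from stdin by their FLAG bits.
count_alignment_types() {
    awk '
        /^@/ { next }                              # skip SAM header lines
        {
            flag = $2
            if (int(flag / 2048) % 2)     sup++    # 0x800: supplementary
            else if (int(flag / 256) % 2) sec++    # 0x100: secondary
            else if (int(flag / 4) % 2)   unm++    # 0x4: unmapped
            else                          pri++    # everything else: primary
        }
        END {
            printf "primary=%d secondary=%d supplementary=%d unmapped=%d\n",
                   pri, sec, sup, unm
        }
    '
}

# Invented demo records (only the FLAG column matters for this helper).
demo_sam() {
    printf 'read1\t0\tSuper-Scaffold_100047\t281156\t60\t100M\t*\t0\t0\t*\t*\n'
    printf 'read2\t16\tSuper-Scaffold_100047\t311770\t60\t100M\t*\t0\t0\t*\t*\n'
    printf 'read2\t256\tSuper-Scaffold_100047\t220944\t0\t100M\t*\t0\t0\t*\t*\n'
    printf 'read3\t2048\tSuper-Scaffold_100047\t281217\t60\t50M50H\t*\t0\t0\t*\t*\n'
}

# Prints: primary=2 secondary=1 supplementary=1 unmapped=0
demo_sam | count_alignment_types
```

With an indexed BAM, the real input would come from something like `samtools view my_alignment.bam Super-Scaffold_100047 | count_alignment_types`.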

modified 11 weeks ago • written 11 weeks ago by GenoMax95k

There isn't. I checked each transcript: they are all unique and align only once to the genome.

Also, when I count the number of transcripts specific to that scaffold in my alignment.bam file, I get 25 sequences.

Do you know if there is a way to count the number of alignments with IGV? I could check other scaffolds, but they have many more alignments, so counting them by hand would be tedious.

modified 11 weeks ago • written 11 weeks ago by pablo160

IGV is only a viewer. You can easily count the number of alignments with samtools idxstats.
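For reference, `samtools idxstats` prints one line per reference sequence with four columns: name, length, mapped count, unmapped count. A minimal sketch of pulling out the mapped count for one scaffold; the helper name `mapped_on` is hypothetical, and the demo input reuses the idxstats line reported further down in this thread:

```shell
# Hypothetical helper: print the mapped count (column 3) for one reference name.
# Reads idxstats-style text on stdin; $1 = reference name.
mapped_on() {
    awk -v ref="$1" '$1 == ref { print $3 }'
}

# Demo input in idxstats layout (the "*" line holds unplaced reads).
demo_idxstats() {
    printf 'Super-Scaffold_100047\t379068\t25\t0\n'
    printf '*\t0\t0\t0\n'
}

# Prints: 25
demo_idxstats | mapped_on Super-Scaffold_100047
```

On real data the BAM needs an index first (`samtools index my_alignment.bam`), after which `samtools idxstats my_alignment.bam | mapped_on Super-Scaffold_100047` gives the count.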

written 11 weeks ago by GenoMax95k

Hi,

Could you provide the mentioned GFF file?

written 11 weeks ago by Arsenal20

I updated my post with the GFF file, as you can see.

written 10 weeks ago by pablo160

Maybe there are identical isoforms that IGV collapses. Could you run your GFF file through AGAT to check whether it contains any identical isoforms:

agat_convert_sp_gxf2gxf.pl --gff input.gff -o output.gff
modified 10 weeks ago • written 10 weeks ago by Juke345.0k

I will install that tool. Actually, when I run samtools idxstats my_alignment.bam, I find the right number of reads/isoforms for the scaffold in question: Super-Scaffold_100047 379068 25 0. So the problem comes from the GFF file, which was not created correctly.
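
To avoid checking scaffolds one by one, the two counts can be compared for every scaffold at once. A minimal sketch, assuming the idxstats output has been saved to a file; the helper name `compare_counts` and the demo data (apart from the idxstats line quoted above) are hypothetical. A mismatch only flags a scaffold worth inspecting, it does not say which file is wrong:

```shell
# Hypothetical helper: list scaffolds where the BAM mapped count and the number
# of GFF "transcript" features differ.
# $1 = saved `samtools idxstats` output, $2 = GFF/GTF file
# Output columns: scaffold, BAM count, GFF count
compare_counts() {
    awk '
        NR == FNR { bam[$1] = $3; next }       # first file: idxstats columns
        $3 == "transcript" { gff[$1]++ }       # second file: count transcripts
        END {
            for (s in bam)
                if (bam[s] != gff[s] + 0) print s, bam[s], gff[s] + 0
        }
    ' "$1" "$2" | sort
}

# Demo inputs: one real idxstats line from this thread plus invented data.
demo_idx=$(mktemp)
demo_gff=$(mktemp)
printf 'Super-Scaffold_100047\t379068\t25\t0\ndemo_scaffold\t5000\t2\t0\n*\t0\t0\t0\n' > "$demo_idx"
{
  printf 'Super-Scaffold_100047\tPacBio\ttranscript\t281156\t287937\t.\t+\t.\ttranscript_id "PB.8691.1";\n'
  printf 'demo_scaffold\tPacBio\ttranscript\t1\t100\t.\t+\t.\ttranscript_id "demo.1";\n'
  printf 'demo_scaffold\tPacBio\ttranscript\t50\t150\t.\t+\t.\ttranscript_id "demo.2";\n'
} > "$demo_gff"

# Prints: Super-Scaffold_100047 25 1
compare_counts "$demo_idx" "$demo_gff"
```

On real data: `samtools idxstats my_alignment.bam > idx.txt`, then `compare_counts idx.txt out.gff`.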

written 10 weeks ago by pablo160