Creating a gene version of the Kallisto Gene Matrix: Perl
2.0 years ago
VenGeno ▴ 100

Hi, I am getting the following error when I use the Perl script from the rnabio.org tutorial:

Could not identify gene id from trans id: Soltu.Cru.S183680.1

This is the relevant part of the script:

#Build a map of transcript to gene IDs
my %trans;
my $gtf_fh = IO::File->new($gtf_file, 'r');
while (my $gtf_line = $gtf_fh->getline) {
  chomp($gtf_line);
  my @gtf_entry = split("\t", $gtf_line);
  next unless $gtf_entry[2] eq 'transcript';
  my $g_id = '';
  my $t_id = '';
  if ($gtf_entry[8] =~ /gene_id\s+\"(\w+)\"/){
    $g_id = $1;
  }
  if ($gtf_entry[8] =~ /transcript_id\s+\"(\w+)\"/){
    $t_id = $1;
  }
  die "\n\nCould not identify gene and transcript id in GTF transcript line:\n$gtf_line\n\n" unless ($g_id && $t_id);
  $trans{$t_id}{g_id} = $g_id;
}
$gtf_fh->close;

The related entry in my GTF file looks like this:

scaffold11494   MSU_Castle_Russet_v2.0  mRNA    8589    9215    .   +   .   gene_id "Soltu.Cru.S183680"; transcript_id "Soltu.Cru.S183680.1"; ID "Soltu.Cru.S183680.1"; Name "Soltu.Cru.S183680.1"; Parent "Soltu.Cru.S183680";

Can someone help me with this?

2.0 years ago
JC 13k

The problem is the regex: \w matches only alphanumeric characters and "_", but your gene and transcript IDs contain ".". Change /gene_id\s+\"(\w+)\"/ to /gene_id\s+\"(.+?)\"/ and /transcript_id\s+\"(\w+)\"/ to /transcript_id\s+\"(.+?)\"/.
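For anyone who wants to check the pattern before editing the tutorial script, here is a minimal sketch run against the attribute column from the question. The helper name parse_ids is hypothetical and not part of the rnabio.org script. It uses [^"]+ instead of .+?, which is an alternative way to capture everything up to the closing quote; for this input both should behave the same.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the tutorial script): extract gene_id and
# transcript_id from the attribute column (column 9) of a GTF line.
sub parse_ids {
    my ($attributes) = @_;
    # [^"]+ captures every character up to the closing quote, dots included.
    my ($g_id) = $attributes =~ /gene_id\s+"([^"]+)"/;
    my ($t_id) = $attributes =~ /transcript_id\s+"([^"]+)"/;
    return ($g_id, $t_id);
}

# Attribute column copied from the GTF entry in the question.
my $attributes = 'gene_id "Soltu.Cru.S183680"; transcript_id "Soltu.Cru.S183680.1"; '
               . 'ID "Soltu.Cru.S183680.1"; Name "Soltu.Cru.S183680.1"; Parent "Soltu.Cru.S183680";';

my ($g_id, $t_id) = parse_ids($attributes);
print "gene: $g_id\ntranscript: $t_id\n";

# For comparison: the original \w+ pattern cannot match, because \w excludes ".".
my $old_matches = $attributes =~ /gene_id\s+"(\w+)"/ ? 1 : 0;
print "old pattern matched: $old_matches\n";
```

One more thing that may be worth checking: the entry shown in the question has mRNA in column 3, while the script skips every line whose third column is not 'transcript'. If the GTF has no 'transcript' features, the map would stay empty even with the corrected regex.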


Hi JC, thank you for the answer.
