Question: Trinity, longest contigs
3.1 years ago, loly.pearl86 (Australia) wrote:

Hi All,

Can someone explain how I can find the longest and shortest contig lengths? This is the Trinity assembly summary I have:

################################
## Counts of transcripts, etc.
################################
Total trinity 'genes': 306290
Total trinity transcripts: 369174
Percent GC: 48.47

########################################
Stats based on ALL transcript contigs:
########################################

Contig N10: 5172
Contig N20: 3581
Contig N30: 2517
Contig N40: 1717
Contig N50: 1095

Median contig length: 333
Average contig: 659.77
Total assembled bases: 243568232

#####################################################
## Stats based on ONLY LONGEST ISOFORM per 'GENE':
#####################################################

Contig N10: 4276
Contig N20: 2665
Contig N30: 1655
Contig N40: 1006
Contig N50: 664

Median contig length: 313
Average contig: 540.34
Total assembled bases: 165499422

 

Thanks

3.1 years ago, iraun (Norway) wrote:

There are many ways to do this. If you want a quick command-line solution, this one-liner reads the FASTA file and reports the shortest and longest values it finds:

awk 'BEGIN{RS = ">" ; ORS = ""}NR==2{ min=length($2); max=length($2); next} max < length($2) {max=length($2)} min > length($2) {min=length($2)} END {print "Shortest: "min"\nLongest: " max"\n"}' file

 

Basically, you need to get the length of each sequence and keep track of the highest and lowest values. As I said, there are plenty of ways to do it, in different languages or with different tools. In my opinion, awk is great for this kind of situation.
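The approach described above (measure every sequence, keep the smallest and largest value) can be sketched as a small shell function. This is only a sketch under some assumptions: `fasta_min_max` is a made-up name, the input is assumed to be a plain FASTA file, and lengths are counted from sequence lines only, so header text and line wrapping should not affect the result.

```shell
# Hypothetical helper (the name is an assumption, not part of Trinity):
# print the shortest and longest sequence length in a FASTA file.
fasta_min_max() {
    awk '
        # Called whenever a record is complete: update the running min and max.
        function record_done() {
            if (!seen) return
            if (min == "" || len < min) min = len
            if (len > max) max = len
        }
        # A header line closes the previous record and starts a new one.
        /^>/ { record_done(); seen = 1; len = 0; next }
        # Any other line after the first header is sequence; strip a trailing
        # carriage return (Windows line endings) before counting.
        seen { sub(/\r$/, ""); len += length($0) }
        END  { record_done(); print "Shortest: " min; print "Longest: " (max + 0) }
    ' "$1"
}

# Usage (the file name is only an example):
#   fasta_min_max Trinity.fasta
```

Here `Trinity.fasta` stands for your own assembly file. Summing the sequence lines per record means the result does not depend on whether each sequence is written on one line or wrapped over several.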


Thanks for your quick reply, but I am not sure where I can run this script. It looks like it is meant for RStudio! I am working on Linux over SSH (using PuTTY). Can I run this script on the SSH command line? If not, can you please suggest another way that I can use on Linux?

 

thanks

Reply written 3.1 years ago by loly.pearl86

Yes, you can run it from the command line. Just move to the folder where your FASTA file is located, then copy and paste the command, replacing "file" with your file name.

Reply written 3.1 years ago by iraun

Thanks, I ran it.

It gives me this result: shortest = 0, and that is all!

Reply written 3.1 years ago by loly.pearl86

Sorry, I made a small mistake. Can you try again now?

Reply written 3.1 years ago by iraun

Thanks, it works now. But the results are unusual.

I have 5 assembled samples, and the output of the script was:

Sample 1: shortest = 1, longest = 14357

Samples 2, 3, 4: shortest = 7, longest = 9

Sample 5: shortest = 6, longest = 89

Does this mean anything, for example that my samples are of poor quality?

 

 

thanks

 

Reply written 3.1 years ago by loly.pearl86
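Values such as 7 to 9 are hard to reconcile with the summary in the question, which reports a median contig length above 300, so they probably say more about what the one-liner measures than about sample quality. One possible explanation, assuming the FASTA headers look like `>ID len=NNN path=[...]`: with `RS = ">"` and awk's default field splitting, `$2` is the second whitespace-separated token of the record, which would be the `len=NNN` text from the header rather than the sequence. The record below is invented purely to illustrate this.

```shell
# Invented Trinity-style record, used only for illustration.
record='>TRINITY_DN0_c0_g1_i1 len=1234 path=[0:0-1233]
ACGTACGTAC'

# Same record splitting as the one-liner: RS=">" with default field splitting.
# NF > 0 skips the empty record that precedes the first ">".
second_field_report() {
    awk 'BEGIN { RS = ">" } NF > 0 { print "second field: " $2; print "its length: " length($2) }'
}

printf '%s\n' "$record" | second_field_report
# second field: len=1234
# its length: 8
```

If the headers in your files follow this pattern, a reported length of 7 to 9 would correspond to contigs whose real lengths have three to five digits. Printing the first line of each file with `head -n 1` is a quick way to check what the headers actually look like.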