GenBank format download from JBrowse 1/2

lybCNU · January 22, 2023, 7:00am

As GBrowse is going to be retired in January 2023, is there any way to download GenBank format directly from JBrowse? GenBank files can be edited in SnapGene with their gene features intact. Manually adding features to sequences exported from JBrowse wastes too much of my time at present. It would be convenient if JBrowse could download GenBank format as well.

scottcain · February 3, 2023, 7:28pm

Hi, GenBank is a “janky” format and difficult to write correctly. I see that Snapgene supports GFF3 as well:

https://support.snapgene.com/hc/en-us/articles/10384245528212?input_string=gff3%3F

which JBrowse 1 will currently output and is a feature that is being worked on for JBrowse 2. Would that work?

Scott

lybCNU · February 4, 2023, 4:38am

That does not work. The coordinates are inconsistent.
When I tried changing the coordinates in Excel and saving the result as a small GFF, SnapGene said my GFF does not contain features.
Still not working. From the SnapGene link you provided, it seems that SnapGene does not support joined features with GFF3.
From my point of view, GenBank format is irreplaceable.
Is it possible to output a maximally simplified GenBank format like the file at the link below?
https://cloud.tsinghua.edu.cn/f/a88b5244ef5643ddb5a0/?dl=1
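
For anyone hitting the same coordinate mismatch, rebasing the GFF3 start and end columns can be scripted instead of done by hand in Excel, which also avoids a spreadsheet silently changing the tab-separated layout. This is only a sketch: `shift_gff3` and `region_start` are names made up for illustration, and whether SnapGene then accepts the file (in particular multi-segment features) is not confirmed anywhere in this thread.

```python
def shift_gff3(gff_text, region_start):
    """Rebase GFF3 coordinates so that region_start becomes position 1.

    Only columns 4 (start) and 5 (end) of 9-column feature lines are changed.
    Comment and directive lines (e.g. ##sequence-region) are passed through
    untouched, so they may need manual adjustment.
    """
    offset = region_start - 1
    out = []
    for line in gff_text.splitlines():
        cols = line.split("\t")
        if line.startswith("#") or len(cols) != 9:
            out.append(line)  # not a feature line, keep as-is
            continue
        cols[3] = str(int(cols[3]) - offset)
        cols[4] = str(int(cols[4]) - offset)
        out.append("\t".join(cols))
    return "\n".join(out) + "\n"
```

For example, a CDS at 1428..1659 in a region that starts at 1001 ends up at 428..659.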

lybCNU · February 4, 2023, 5:24am

LOCUS       Exported                1699 bp DNA     linear   UNA 04-FEB-2023
FEATURES             Location/Qualifiers
     source          1..1699
     CDS             join(428..659,794..1099)
     CDS             complement(join(1278..1411,1515..1650))
ORIGIN
       1 ttaatttgaaatagtttccattttttgataataatgaaaagctgctgaaaaaatggtttggcagttagcaattccaggaattttttcgagataagccataaattttaaaattatggaaattgatttacgtgtgtttttttctaattctaaattttttggtgacgttttccacgttgatttatttatttttcgaacccccctttccctcaaccaaaatagtatttattcttcagtttcaatattgtcaaaaagctcgatgcccgagtattttgaatcttctgcgatttcaattagaagaaatgctgcaggaaacgacgttcaaaaggtaattgaaagcatttagaacatctcataaagatgatgtttcagaacaaagttcaaaattggcttcacagtgtgatcgagcgtctcaagtggtggagtcccggacgatgtcagcagctcttcgtcgagaatgagctcatcgagctatgctacagagctcgtgagcagttctggaaaaacaaagtgaagctagatgtacgtttagcgtatgagggattagcaattcattttctaataatttcagatcgaagctcctgtcaaaatctgtggagacattcacggacagttcgaggacttgatggctctgttcgagttgaatgggtggcctgaagagcataagtaagccgccaatttgaatttggattagtatatgttttcatttcagatatctctttcttggtgattatgttgaccgtggtccattctccattgaagtcatcacactcctcttcacctttcaaatattgatgcctgacaaagtcttccttcttcgaggaaaccacgaaagccgccccgtcaatatgcaatatggattttatctggaatgcaagaagcgctactcagtcgccttgtatgatgcatttcaacttgcattcaattgtatgccactgtgcgctgtcgtgagcaagaagatcatatgtatgcatggaggaatatctgaagatctgattgacttgacgtaagatctttttccaatttccttatgtacttcaacaaccaatttccagacaactcgaaaagattgatcgtccatttgatattccggacattggcgtcatctccgacttgacctgggctgatcccgacgagaaggtcttcggatatgccgattctccacgtggcgcgggacgttctttcggtccgaatgcggtcaagaagttccttcaaatgcacaacctggatctagtcgttcgtgcccatcaggtcgtcatggatggttatgaattctttgcggaccgccaacttgtcacagtcttctcggcaccatcatactgcggacaattcgacaatgctgctgccgtgatgaatgttgacgacaaattgctctgtactttcacaatcttccgcccggatttgaaagttggcgacttcaagaagaaggacaagtgatattttgatttatcgaaataaagcattttttgtaccgtcttgattttcaggttaggctcgaatcacgcgcgcctgcttctcgaccttaaaaatgcctccaggtacaccaggaggcgagcccgctaagcaagaattccagcgccttctcccttctctcccgcttcctgagaatattgatgacataatcggtattctttttgtgtgtgcctgtatccattattcacgcacacaagaacaccaacaagcatgctggttttcttatata
//

scottcain · February 6, 2023, 11:54pm

I am working on a plugin or other method of generating that format; I’ll let you know what I come up with.

scottcain · February 7, 2023, 6:23pm

I am thinking about multiple approaches to this problem; while what I want is a JBrowse plugin (for either JB1 or JB2) that downloads in the format you want, a quicker solution for me to implement would be one where you get downloads from the gene and reference sequence tracks (GFF and FASTA) and then process those with a script to get GenBank format. My question for you is this: would you have a preference for a BioPerl based solution or a NodeJS based solution? Both would require you to install external software but if you already use one, I would go down that path. (I could probably have something sooner using BioPerl but if I do it in NodeJS it would probably find wider adoption).
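
The script-based route described above (GFF plus reference sequence in, GenBank out) could look roughly like the following. It is written in plain Python rather than BioPerl or NodeJS purely for illustration, it is not the script Scott had in mind, and every function name here is invented. It assumes the GFF3 coordinates are already relative to the downloaded sequence, groups CDS rows by their `Parent` attribute, and emits only the minimal record layout from the example posted earlier in the thread. The column widths are copied from that example and from the common flat-file layout, not checked against the full specification. Reading the FASTA file is left out; `seq` is the bare sequence string.

```python
from collections import defaultdict

def parse_cds(gff_text):
    """Collect CDS segments per Parent: {parent: [strand, [(start, end), ...]]}."""
    groups = defaultdict(lambda: ["+", []])
    for line in gff_text.splitlines():
        cols = line.split("\t")
        if line.startswith("#") or len(cols) != 9 or cols[2] != "CDS":
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        key = attrs.get("Parent", attrs.get("ID", "unknown"))
        groups[key][0] = cols[6]
        groups[key][1].append((int(cols[3]), int(cols[4])))
    return groups

def location(segments, strand):
    """Build a GenBank location string, e.g. complement(join(1..5,9..12))."""
    segs = sorted(segments)  # ascending order, also inside complement()
    parts = ",".join(f"{s}..{e}" for s, e in segs)
    loc = f"join({parts})" if len(segs) > 1 else parts
    return f"complement({loc})" if strand == "-" else loc

def origin_lines(seq):
    """Format the sequence as 60 bases per line in blocks of 10."""
    seq = seq.lower()
    lines = []
    for i in range(0, len(seq), 60):
        chunk = seq[i:i + 60]
        blocks = " ".join(chunk[j:j + 10] for j in range(0, len(chunk), 10))
        lines.append(f"{i + 1:>9} {blocks}")
    return lines

def to_genbank(seq, gff_text, name="Exported", date="04-FEB-2023"):
    """Return a minimal GenBank record with a source feature and CDS features."""
    out = [
        f"LOCUS       {name:<16}{len(seq):>12} bp DNA     linear   UNA {date}",
        "FEATURES             Location/Qualifiers",
        f"     source          1..{len(seq)}",
    ]
    for strand, segs in parse_cds(gff_text).values():
        out.append(f"     CDS             {location(segs, strand)}")
    out.append("ORIGIN")
    out.extend(origin_lines(seq))
    out.append("//")
    return "\n".join(out) + "\n"
```

Calling `to_genbank(seq, gff_text)` returns the record as one string that can be written to a `.gb` file. Qualifiers, mRNA and gene features are deliberately omitted to keep the sketch short.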

lybCNU · February 8, 2023, 2:14am

Dear Scott, thank you very much for your help with this matter.
For me personally, both BioPerl and NodeJS are OK. However, everyone else in our lab has no background in bioinformatics, and all of them need this feature. It would be hard to set up more than 10 Windows computers and make it work for everyone. A plugin would be a better way.
Thanks again.

scottcain · February 10, 2023, 9:51pm

While I was thinking about cobbling together a solution for JBrowse 1, I chatted with the JBrowse lead developer; he thought he could get something working for JBrowse 2, and there is already a working prototype. Please take a look at JBrowse and click on the three vertical dots in the track label to get the menu that has “save track data” as an option. Obviously, this is alpha software; I am reasonably sure it’s getting the coordinates right, but checking the output is a good idea! Please let me know what you think.

lybCNU · February 11, 2023, 1:55am

Hi Scott, thank you and the JBrowse 2 developer so much for developing this prototype so quickly.
At first it did not work, and then I found that I had used the gene model historical track and was getting the wrong GenBank output.
The prototype worked very well with the Curated Genes track.
Thank you very much for your efforts.

lybCNU · February 11, 2023, 2:23am

A very small suggestion: all downloaded files are named jbrowse_track_data. To reduce filename conflicts and confusion, it would be better to use the coordinates as the filename, like GBrowse does.
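
To make the naming idea concrete, here is a small sketch of a coordinate-based filename. The function `export_filename` and the pattern `<ref>_<start>-<end>.<ext>` are only an illustration of the suggestion, not what GBrowse or JBrowse actually produce, and the reference name used in the example is assumed from the gene IDs in this thread.

```python
import re

def export_filename(ref_name, start, end, ext="gb"):
    """Build a download name such as Cnig_chr_X_13673-15502.gb."""
    # Replace characters that are awkward in filenames with underscores.
    safe_ref = re.sub(r"[^A-Za-z0-9._-]+", "_", ref_name)
    return f"{safe_ref}_{start}-{end}.{ext}"
```

With this scheme two exports only collide if they cover exactly the same region.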

scottcain · February 13, 2023, 9:02pm

Thank you for the push–this was something that was on my radar for a while but having a user saying it’s important helps to make things happen!

I’ll pass on the suggestion to modify the file names; that’s a good one too!

lybCNU · February 20, 2023, 7:32am

     mRNA            complement(13673..15502)
                     /gene="gene:Cnig_chr_X.g24897"
                     /name=transcript:Cnig_chr_X.g24897
                     /id="transcript:Cnig_chr_X.g24897"
                     /info="method:InterPro accession:IPR013750 description:GHMP kinase, C-terminal domain 
method:InterPro accession:IPR014721 description:Ribosomal protein S5 domain 2-type fold, subgroup 
method:InterPro accession:IPR015192 description:Switch protein XOL-1, N-terminal 
method:InterPro accession:IPR015193 description:Switch protein XOL-1, GHMP-like 
method:InterPro accession:IPR020568 description:Ribosomal protein S5 domain 2-type fold"
                     /jbrowse_parent="gene:Cnig_chr_X.g24897"
                     /Name="Cnig_chr_X.g24897"
     CDS             complement(join(15426..15502,15288..15369,15060..15242,14642..14750,14435..14594,14020..14389,13673..13972))
                     /mRNA="transcript:Cnig_chr_X.g24897"

I found a bug. The CDS feature is not recognized in the GenBank output above. This error may originate from the long multi-line info qualifier in the mRNA feature.

scottcain · February 23, 2023, 12:10am

Interesting, if you manually take out the carriage returns in the “info” does it then work? I’m trying to figure out what we need to do generally, since that info section can frequently be quite long.

lybCNU · February 23, 2023, 1:45am

     mRNA            complement(13673..15502)
                     /gene="gene:Cnig_chr_X.g24897"
                     /name=transcript:Cnig_chr_X.g24897
                     /id="transcript:Cnig_chr_X.g24897"
                     /info="method:InterPro accession:IPR013750 description:GHMP kinase, C-terminal domain 
                     method:InterPro accession:IPR014721 description:Ribosomal protein S5 domain 2-type fold, subgroup 
                     method:InterPro accession:IPR015192 description:Switch protein XOL-1, N-terminal 
                     method:InterPro accession:IPR015193 description:Switch protein XOL-1, GHMP-like 
                     method:InterPro accession:IPR020568 description:Ribosomal protein S5 domain 2-type fold"
                     /jbrowse_parent="gene:Cnig_chr_X.g24897"
                     /Name="Cnig_chr_X.g24897"
     CDS             complement(join(15426..15502,15288..15369,15060..15242,14642..14750,14435..14594,14020..14389,13673..13972))
                     /mRNA="transcript:Cnig_chr_X.g24897"

The above format worked.
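
The difference between the two versions is that the continuation lines of the `info` qualifier now start in the qualifier column (21 spaces of indentation) instead of column 1. A writer could enforce that for any long value along the lines of the sketch below. Note that, unlike the hand-edited version above, it collapses the embedded line breaks and re-wraps the text. `format_qualifier` is a made-up helper, not the JBrowse implementation, and the 79-character width is an assumption chosen to stay within the usual 80-column flat-file line.

```python
import textwrap

QUALIFIER_INDENT = " " * 21  # qualifiers and their continuation lines start here

def format_qualifier(key, value, width=79):
    """Render /key="value" wrapped so every line carries the 21-space indent."""
    text = " ".join(value.split())  # collapse embedded newlines and extra spaces
    return textwrap.fill(
        f'/{key}="{text}"',
        width=width,
        initial_indent=QUALIFIER_INDENT,
        subsequent_indent=QUALIFIER_INDENT,
        break_on_hyphens=False,  # keep words like "C-terminal" on one line
    )
```

Each returned line begins with the indent, so a parser that treats a line with text in columns 1 to 20 as a new feature key will not misread the continuation lines.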