Online protein annotation tool

Mercator4 is an online tool to assign functional annotations to protein sequences of land plants (including flowering plants, ferns, horsetails, mosses, liverworts, and hornworts). Mercator4 can also annotate highly conserved proteins among the green algae groups of Archaeplastida. The results from user-submitted protein sequences can be visualized online and downloaded for further analysis.

The Mercator4 functional annotations are organized in a hierarchical framework (called the "MapMan4 framework", see figure), in which each child node term is more specialised than its parent node term. The framework has thirty top-level categories, whose branches end in protein categories at the leaf level. Protein sequences are only assigned to leaf-level categories, but the annotation is based on the full hierarchical path, including all levels.

A protein's context and category are represented by a hierarchical number. The first number of the hierarchy refers to one of the thirty Mercator4 top-level categories (see list in top figure). Protein sequences that cannot be categorized by Mercator4 are assigned by default to the top-level pseudo-category 35 not assigned (introduced in the former MapMan framework, Thimm et al. 2004). For a standard plant proteome, approximately 55% to 60% of the predicted protein sequences can currently be categorized by Mercator4.
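To make the numbering scheme concrete, here is a minimal Python sketch (not part of Mercator4) that splits a hierarchical number into the codes of all its ancestor nodes and reads off the top-level category. It assumes codes are plain dot-separated integers such as 35.1; the function names are hypothetical.

```python
def bin_path(code: str) -> list[str]:
    """Return the code of every node on the path to `code`, top level first.

    Hypothetical helper: assumes dot-separated integers such as "35.1".
    """
    parts = code.split(".")
    if not all(p.isdigit() for p in parts):
        raise ValueError(f"not a hierarchical number: {code!r}")
    # "1.2.3" -> ["1", "1.2", "1.2.3"]
    return [".".join(parts[: i + 1]) for i in range(len(parts))]


def top_level(code: str) -> int:
    """The first number of the hierarchy identifies the top-level category."""
    return int(bin_path(code)[0])


def is_not_assigned(code: str) -> bool:
    """True for the pseudo-category 35 'not assigned' and its children."""
    return top_level(code) == 35
```

For example, `bin_path("35.1")` returns `["35", "35.1"]`, so a protein in 35.1 also belongs to its parent, the top-level pseudo-category 35.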

Optionally, the protein annotation tool ProtScriber v.0.1.3 (Eiteneuer and Hallab, unpublished, available on GitHub) can try to assign a functional annotation to the protein. Another option is a local alignment search (BLAST) that assigns a protein annotation from Swiss-Prot. For an average plant proteome, a ProtScriber or Swiss-Prot annotation is available for more than 60% of the proteins, but these annotations are in general less specific. If an appropriate ProtScriber or Swiss-Prot annotation is available for a protein that Mercator4 could not categorize, its hierarchical number is 35.1 not assigned.annotated. If there is no Mercator4 categorization and neither a ProtScriber nor a Swiss-Prot annotation is available, the protein sequence is assigned to the pseudo-category 35.2 not assigned.not annotated.
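The fallback order described above can be summarised as a small decision function. This is a hedged sketch, assuming each protein has at most one Mercator4 leaf code and optional ProtScriber and Swiss-Prot descriptions (None when absent); the function and parameter names are hypothetical, and the returned labels simply reuse the pseudo-category names given in the text.

```python
from typing import Optional


def assign_pseudo_category(
    mercator_bin: Optional[str],
    protscriber_annotation: Optional[str],
    swissprot_annotation: Optional[str],
) -> str:
    """Illustrative version of the fallback rule (hypothetical helper)."""
    # A Mercator4 leaf category always takes precedence.
    if mercator_bin:
        return mercator_bin
    # Uncategorized, but some free-text annotation exists.
    if protscriber_annotation or swissprot_annotation:
        return "35.1 not assigned.annotated"
    # No categorization and no annotation at all.
    return "35.2 not assigned.not annotated"
```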

Job submission

On the Mercator4 job submission website:

After submitting the sequences, the Mercator4 website keeps you informed about the status of your job. In general, the functional annotation of a few thousand plant protein sequences takes only a few minutes. The Mercator4 annotation is faster without the ProtScriber or Swiss-Prot annotation options.

Protein function annotation results

When a Mercator4 job finishes, a list gives simple statistics on how many of the protein sequences were successfully categorized. The list uses the terms

Additionally, a bar chart summarising the protein assignments across the top-level context descriptions is displayed. Each bar represents a top-level context description and shows the percentage of its protein categories that are occupied by at least one of the submitted protein sequences.
The annotations can be downloaded for further processing on your local computer.
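The figures in the result summary and the bar chart can be reproduced from the per-protein assignments. The sketch below assumes two hypothetical inputs: the list of leaf-level category codes in the framework, and a mapping from protein identifier to assigned code (for instance parsed from the downloaded annotations, whose file layout is not described here). It computes the fraction of proteins outside pseudo-category 35 and, per top-level category, the percentage of leaf categories hit by at least one protein.

```python
from collections import defaultdict


def categorized_fraction(assignments):
    """Fraction of proteins placed outside pseudo-category 35.

    assignments: dict protein id -> assigned code (assumed input).
    """
    if not assignments:
        return 0.0
    hits = sum(1 for code in assignments.values() if code.split(".")[0] != "35")
    return hits / len(assignments)


def top_level_coverage(framework_leaves, assignments):
    """Percentage of each top-level category's leaf categories that received
    at least one submitted protein (the quantity each bar represents)."""
    leaves_per_top = defaultdict(set)
    for leaf in framework_leaves:
        leaves_per_top[leaf.split(".")[0]].add(leaf)
    occupied = set(assignments.values())
    return {
        top: 100.0 * len(leaves & occupied) / len(leaves)
        for top, leaves in leaves_per_top.items()
    }
```

With leaves `["1.1", "1.2", "2.1"]` and four proteins assigned to 1.1, 1.1, 2.1 and 35.2, three of four proteins are categorized and top-level category 1 is half covered.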

TreeViewer

The TreeViewer shows the protein categorization as a hierarchical tree, with annotation context descriptions as branch nodes and protein categories as leaf nodes. Expanding the tree at a location of interest displays the protein count per species per category. Hovering the mouse over a count tab pops up the names of the individual categorized proteins.
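The per-species counts shown when a node is expanded amount to grouping the categorized proteins by node and species. A possible sketch, assuming input records of (species, leaf code, protein name) and assuming that a context node collects the proteins of all its descendant leaves; the function name and data layout are illustrative, not the TreeViewer's internal format.

```python
from collections import defaultdict


def build_tree_counts(records):
    """Group (species, leaf_code, protein_name) records by tree node.

    Returns {node_code: {species: [protein names]}}. Every ancestor context
    node also collects the proteins of its descendant leaves (an assumption).
    """
    tree = defaultdict(lambda: defaultdict(list))
    for species, leaf, protein in records:
        parts = leaf.split(".")
        # Register the protein at the leaf and at each ancestor node.
        for depth in range(1, len(parts) + 1):
            node = ".".join(parts[:depth])
            tree[node][species].append(protein)
    # Convert to plain dicts so missing keys are not silently created.
    return {node: dict(by_species) for node, by_species in tree.items()}
```

The count displayed for a species at a node is then the length of the corresponding list, and the list itself holds the names shown on mouse-over.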

HeatmapViewer

The HeatmapViewer displays the comparison of two protein sets with protein categories as spots colored according to the comparison outcome.

The color of a spot indicates whether the protein category is present in one or both protein sets and, if it is present in both, whether one of the protein sets has more or fewer proteins assigned to that category. Hovering the mouse over a spot pops up the description of the protein category and its context. The HeatmapViewer creates diagrams in Scalable Vector Graphics (SVG) format, which can conveniently be downloaded with a browser that has an SVG export add-on installed (for example the add-on SVG Export).
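The comparison behind each heatmap spot can be sketched as a classification of per-category protein counts. The outcome labels below are hypothetical stand-ins for the HeatmapViewer's colours, which are not specified here; the counts for each set are assumed to be dictionaries from category code to number of assigned proteins.

```python
def compare_category(count_a: int, count_b: int) -> str:
    """Classify one protein category for a two-set comparison (illustrative labels)."""
    if count_a == 0 and count_b == 0:
        return "absent"
    if count_b == 0:
        return "only A"
    if count_a == 0:
        return "only B"
    if count_a > count_b:
        return "both, more in A"
    if count_a < count_b:
        return "both, more in B"
    return "both, equal"


def compare_sets(counts_a, counts_b):
    """counts_a, counts_b: dict category code -> number of assigned proteins."""
    categories = set(counts_a) | set(counts_b)
    return {
        code: compare_category(counts_a.get(code, 0), counts_b.get(code, 0))
        for code in sorted(categories)
    }
```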

Updates and legacy versions

The hierarchical framework for Mercator4 is regularly updated and extended (see "History"). The latest version of Mercator4 is release 5 (2022) with more than 5700 individual protein family categories. Although it is recommended to use the latest version of Mercator4, it is possible to submit sequences to legacy versions.

Mercator v.3.6 is an older release of the context-based annotation approach, based on a different annotation framework (Lohse et al. 2014, Thimm et al. 2004). The Mercator v.3.6 online tool is still available, but active maintenance has ended.

Publications

current version

  • Mercator4 v.5 (July 2022)

    • updated: annotation framework
      • BIN-01..BIN-28 + BIN-30 consist of 7544 nodes
        • 29 top-level context nodes
        • 1782 context nodes
        • 5733 protein categories

previous versions

  • Mercator4 v.4 (October 2021)

    • added: HeatmapView online tool
    • added: new top-level context (BIN-28 "Plant reproduction")
    • updated: annotation framework
      • BIN-01..BIN-28 + BIN-30 consist of 6897 nodes
        • 29 top-level context nodes
        • 1667 context nodes
        • 5201 protein categories
  • Mercator4 v.3 (July 2020)

    • added: download of FASTA-formatted annotation data
    • added: new top-level context (BIN-30 "Clade-specific metabolism")
    • updated: annotation framework
      • BIN-01..BIN-27 + BIN-30 consist of 6420 nodes
        • 28 top-level context nodes
        • 1573 context nodes
        • 4819 protein categories
  • Mercator4 v.2 (July 2019)

    • updated: annotation framework
      • BIN-01..BIN-27 consist of 5934 nodes
        • 27 top-level context nodes
        • 1456 context nodes
        • 4450 protein categories
  • Mercator4 v.1 (May 2018)

    • added: new (low-specificity) top-level context (BIN-50 "Enzyme classification")
      • 7 context nodes
      • 50 EC enzyme families
    • updated: annotation framework
      • BIN-01..BIN-27 consist of 5427 nodes
        • 27 top-level context nodes
        • 1304 context nodes
        • 4095 protein categories
  • Mercator4 beta v.0.3 (November 2017)

    • updated: annotation framework
      • BIN-01..BIN-27 consist of 5132 nodes
        • 27 top-level context nodes
        • 1222 context nodes
        • 3883 protein categories

Can I submit a FASTA-formatted file containing both DNA and protein sequences?

No, this will result in an error. FASTA-formatted files submitted to Mercator4 must contain exclusively DNA or exclusively protein sequences.
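If you are unsure what a FASTA file contains, a quick local check before submission can catch mixed files and a wrong 'Sequence type' setting. The Python sketch below is a heuristic and not the validation Mercator4 performs: it treats a sequence made only of the letters A, C, G, T, U and N as DNA and anything else as protein, so a very short protein consisting only of those letters would be misclassified. The function names are hypothetical.

```python
NUCLEOTIDE_CHARS = set("ACGTUN")  # assumption: a deliberately narrow alphabet


def read_fasta(text):
    """Minimal FASTA parser: yields (header, sequence) pairs."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def guess_type(sequence):
    """Heuristic: only nucleotide letters -> 'DNA', otherwise 'Protein'.

    Stop ('*') and gap ('-') characters are ignored.
    """
    letters = set(sequence.upper()) - {"*", "-"}
    return "DNA" if letters and letters <= NUCLEOTIDE_CHARS else "Protein"


def check_single_type(text):
    """Return the common sequence type, or raise if DNA and protein are mixed."""
    types = {guess_type(seq) for _, seq in read_fasta(text)}
    if len(types) != 1:
        raise ValueError(f"expected one sequence type, found {sorted(types) or 'none'}")
    return types.pop()
```

Comparing the result of `check_single_type` with the 'Sequence type' you intend to select also helps avoid the low assignment rate described in the next question.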

Very few of my sequences are assigned to functional BINs. Why?

You should verify that you selected the correct sequence type (DNA or Protein) before submitting the Mercator4 job. If you submit DNA sequences but specify the 'Sequence type' as 'Protein', no error will be generated, but very few sequences are likely to be assigned to functional BINs. If you are sure that you have selected the correct sequence type and the type is 'DNA', verify that you have submitted gene sequences without introns (introns must be removed). Mercator4 is designed for land plants: if you submit sequences from non-plant organisms, the classification and annotation rate will likely be low.

I get an error that my sequences are incompatible with Mercator4. What can I do?

Mercator4 has been upgraded to accept a number of ambiguous protein sequences. However, sequences still have to meet certain criteria. To validate your sequences for Mercator4, you can run your FASTA file through the 'Mercator4 Fasta Validator' tool, which gives a detailed report and can generate a 'Mercator4 valid' FASTA file with the offending records removed.

I get an Unknown Error / Internal Error / Server Error when running my job. What does this mean?

We try to handle every error scenario and provide a detailed description of why the job failed. If you experience such an error, please send an email to plabipd@gmail.com with the 'JOB ID' (starts with GFA-XXXXXXXX).

I ran my sequences on Mercator4 six months ago, but now the version has changed. Can I run my sequences against an older version?

Yes. We provide a 'Legacy Mercator' tool which will allow users to run against older versions of Mercator4.

My job has been queued for hours. Is it really running?

The HPC cluster is capable of running many jobs in parallel, but it can still be overloaded if many users submit jobs simultaneously. If your job has been queued for hours, submitting the same job again will not speed up the process. If your job has not completed after 4 hours, please contact us at plabipd@gmail.com and provide the 'JOB ID'.

My browser crashed while running a job, and now I cannot access my job anymore. What can I do?

As we do not require users to log in to submit a job, the only way we can track your job is through a 'browser session'. If your browser has crashed, a new session is created and the link to your jobs is lost. However, if you entered an email address when you submitted the job, you will still be notified (along with a link to the results) when the job has finished. If you did not enter an email address but have taken note of the 'JOB ID', you can email us at plabipd@gmail.com to get the results.

© 2022 Usadel lab, IBG-4, Forschungszentrum Jülich / Heinrich Heine University Düsseldorf