Wikibotanica

About Wikibotanica

This website will provide straightforward identification for plants worldwide. Plants will be recorded with validated, consistent characteristics to enable searches across all plants in the database. The goal is for a casual user to quickly and easily identify the plant in front of them.

This initial phase aims to establish a standard way to store plant characteristics, experiment with a search algorithm, populate the database and fix up the site formatting. I have started with readily identified garden plants for the initial entries; it takes me about 1–3 hours to collate the data for each plant (I'm not a botanist, so I'm learning). Surprisingly, it is often difficult to find detailed descriptions of common plants. For example, I could not find a morphological description of Clivia (Clivia miniata) in any of the floras available online. (Well, I did eventually find one, in the South African e-Flora, but it is stored in the Darwin Core Archive format and is not directly readable.)

Currently, the site displays entries using formal botanical terms, which is how the data is recorded. Once the initial proof of concept is complete, these terms will be translated into plain English with illustrations for general use, while retaining the formal terms for those skilled in the art. The site will then accept entries from the public, effectively becoming a structured wiki, with specialists overseeing plant records (similar to editors on wikipedia.org). I hope it will become useful for biologists, schools and anyone wanting to identify and work with plants.

You are welcome to contact me via email: phil (AT) wikibotanica.net

Plant Data Structure

When I set out to create a plant identification system and website, I soon found there is no standard way to organise plant data on a computer. Yes, there are websites such as https://worldfloraonline.org, https://powo.science.kew.org and https://gbif.org, as well as (proprietary) systems such as Lucid, BRAHMS, etc., but their data is often inconsistently presented or missing, and is not interoperable.

Botanical descriptions are most often written in unstructured and arbitrary text. This is understandable as we are trying to reduce the endless variability of nature to our rigid categories. Here is part of the leaf description of “old man banksia”, Banksia serrata, from Flora of Australia, illustrating the problem:

          [leaf blade] broadly oblong to narrowly obovate, 7–22 cm long, usually 2–4 cm wide, 
          truncate, serrate; margins slightly recurved; both surfaces tomentose, glabrescent 
          except pits in lower surface

The same leaf as described by PlantNet of the NSW Royal Botanic Gardens:

          [leaf blade] oblong to narrow-obovate, 5–20 cm long, 15–40 mm wide, apex truncate but 
          with a short mucro, base attenuate, margins ± toothed but entire for 1–5 c. from base, 
          lower surface rusty-tomentose but becoming ± glabrous
     

And, just to rub it in, here is the Flora of Victoria description:

          [leaf blade] broad-oblong to narrowly obovate, mostly 10–20 cm long, 1–4.5 cm wide, not 
          or slightly discolorous, upper surface green, shiny, lower surface at first tomentose; 
          margins serrate except near base, slightly recurved, apex obtuse to truncate, mucronate;
     
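These recognisably describe the same leaves, but each uses different wording and assumes contextual knowledge, making the descriptions very difficult, if not impossible, to parse by computer. Even the leaf width appears as “usually 2–4 cm”, “15–40 mm” and “1–4.5 cm”: three different units, ranges and qualifiers for the same measurement.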
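
To illustrate why even the numbers resist automation, here is a minimal Python sketch (hypothetical, not the code behind this site) that pulls the first measurement range out of a phrase and normalises it to centimetres. It assumes only mm and cm occur, and the names `parse_range_cm` and `TO_CM` are invented for the example.

```python
import re
from typing import Optional, Tuple

# Assumption: only millimetres and centimetres occur in the text.
TO_CM = {"mm": 0.1, "cm": 1.0}

# Matches ranges such as "2–4 cm" or "15-40 mm" (en dash or hyphen).
RANGE_RE = re.compile(r"(\d+(?:\.\d+)?)\s*[–-]\s*(\d+(?:\.\d+)?)\s*(mm|cm)")


def parse_range_cm(text: str) -> Optional[Tuple[float, float]]:
    """Return the first range found in `text` as (low, high) in cm, or None.

    Qualifiers such as "usually" or "mostly" are silently discarded.
    """
    match = RANGE_RE.search(text)
    if match is None:
        return None
    low, high, unit = match.groups()
    factor = TO_CM[unit]
    return (round(float(low) * factor, 2), round(float(high) * factor, 2))


for phrase in ["usually 2–4 cm wide", "15–40 mm wide", "1–4.5 cm wide"]:
    print(phrase, "->", parse_range_cm(phrase))
```

Given the full Flora of Australia phrase "7–22 cm long, usually 2–4 cm wide", the sketch returns the length rather than the width, because nothing in the free text says which number belongs to which characteristic. That missing context is what a structured record makes explicit.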

With a professional background in computer systems engineering and a BSc in Agricultural Entomology, I've approached the problem de novo from a technical standpoint, setting aside botanical conventions that are not relevant to the design in order to build a data-focused and ultimately accessible system.

Overall, I try to follow the physical structure of plants to define the data structure (function follows form?). Each plant becomes an object (programmatically, a nested dictionary) containing all its characteristics. For those familiar with plant ontology, this provides relations similar to 'is_a' and 'part_of', but expanded for plant descriptions. A representative set of designated characteristics and attributes is enforced for each plant to ensure consistency.
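
As a rough sketch of the idea (assumed, not the site's actual implementation), a plant can be held as a nested Python dictionary in which each level of nesting is a 'part_of' step, and consistency can be enforced by checking that a set of required paths is present. The `REQUIRED` list and the helper names `get_path` and `missing_characteristics` are hypothetical.

```python
from typing import Any, Dict, Iterable, List

# Hypothetical minimum set of characteristics; the site's real list differs.
REQUIRED = ["stem.leaf.type", "stem.leaf.shape", "stem.leaf.margin"]


def get_path(plant: Dict[str, Any], path: str) -> Any:
    """Walk a dotted path ("stem.leaf.margin") down through nested dicts.

    Each step is a part_of relation: margin is part of leaf, leaf of stem.
    """
    node: Any = plant
    for part in path.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def missing_characteristics(plant: Dict[str, Any],
                            required: Iterable[str] = REQUIRED) -> List[str]:
    """List the required paths that are absent or empty for this plant."""
    return [path for path in required if not get_path(plant, path)]


banksia = {
    "name": "Banksia serrata",
    "stem": {"leaf": {"type": "simple",
                      "shape": ["oblong", "obovate"],
                      "margin": ["serrate"]}},
}
incomplete = {"name": "unnamed draft", "stem": {"leaf": {"type": "simple"}}}

print(missing_characteristics(banksia))     # []
print(missing_characteristics(incomplete))  # ['stem.leaf.shape', 'stem.leaf.margin']
```

Rejecting (or flagging) records with missing paths is one simple way to guarantee that every plant can be compared on the same set of characteristics.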

Here is a rough merge of the above descriptions into this new format:

          stem.leaf.type: simple
          stem.leaf.shape: oblong, obovate
          stem.leaf.blade_length: 5-22cm
          stem.leaf.blade_width: 1.5-4cm
          stem.leaf.apex: obtuse, truncate, mucronate
          stem.leaf.base: attenuate
          stem.leaf.margin: serrate
          stem.leaf.profile: recurved
          stem.leaf.adaxial_surface: lustrous
          stem.leaf.adaxial_surface.colour: green
          stem.leaf.adaxial_surface.indumentum.hairs: tomentose, glabrate
          stem.leaf.adaxial_surface.indumentum.colour: rusty
          stem.leaf.abaxial_surface: foveate
          stem.leaf.abaxial_surface.colour: green, lighter_than_adaxial
          stem.leaf.abaxial_surface.indumentum.hairs: tomentose, glabrate
          stem.leaf.abaxial_surface.indumentum.colour: rusty
     
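Lines in this form map directly onto the nested dictionary. The following Python sketch (an assumed approach, not the site's code) builds that dictionary. One wrinkle shows up in the example: "stem.leaf.adaxial_surface" has its own state ("lustrous") as well as children such as ".colour", so each node here keeps its own states under a hypothetical `_value` key.

```python
from typing import Any, Dict, List

# Hypothetical key holding a node's own states, needed because a path such as
# "stem.leaf.adaxial_surface" has both a value ("lustrous") and children.
VALUE_KEY = "_value"


def to_nested(lines: List[str]) -> Dict[str, Any]:
    """Turn 'dotted.path: state, state' lines into a nested dictionary."""
    root: Dict[str, Any] = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        path, _, raw = line.partition(":")
        states = [s.strip() for s in raw.split(",") if s.strip()]
        node = root
        for part in path.strip().split("."):
            node = node.setdefault(part, {})  # create intermediate nodes
        node[VALUE_KEY] = states
    return root


record = to_nested([
    "stem.leaf.type: simple",
    "stem.leaf.apex: obtuse,truncate,mucronate",
    "stem.leaf.adaxial_surface: lustrous",
    "stem.leaf.adaxial_surface.colour: green",
])
leaf = record["stem"]["leaf"]
print(leaf["apex"][VALUE_KEY])                       # ['obtuse', 'truncate', 'mucronate']
print(leaf["adaxial_surface"][VALUE_KEY])            # ['lustrous']
print(leaf["adaxial_surface"]["colour"][VALUE_KEY])  # ['green']
```

Making every node a dictionary, even a leaf, costs a little verbosity but means a characteristic can later gain sub-characteristics without restructuring existing records.
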
This format provides a consistent basis for comparisons and queries across all plant groups. Each plant currently in the database has from around 60 to over 300 characteristics recorded, depending on its complexity and the completeness of the data I've been able to find and observe.
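
The search algorithm is still being experimented with, so the following is only a hypothetical Python sketch of the simplest possible query: score each plant by how many of the user's observations match its recorded states, treating a number as matching a recorded range such as "5-22cm". The flattened `Record` layout, the assumption that all ranges are in cm, and the second plant's data are invented for illustration.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical flattened record: dotted path -> set of recorded states.
Record = Dict[str, Set[str]]

# Ranges written as in the example above, e.g. "5-22cm" (assumed always cm).
RANGE_RE = re.compile(r"^(\d+(?:\.\d+)?)-(\d+(?:\.\d+)?)cm$")


def matches(recorded: Set[str], observed: str) -> bool:
    """True if the observed state is recorded, or if it is a number (cm)
    that falls inside a recorded range."""
    if observed in recorded:
        return True
    try:
        value = float(observed)
    except ValueError:
        return False  # a non-numeric state that simply does not match
    for state in recorded:
        found = RANGE_RE.match(state)
        if found and float(found.group(1)) <= value <= float(found.group(2)):
            return True
    return False


def rank(plants: Dict[str, Record],
         observed: Dict[str, str]) -> List[Tuple[str, int]]:
    """Score each plant by the number of observations it matches.

    Characteristics missing from a record neither score nor penalise.
    """
    scores = [
        (name, sum(1 for path, state in observed.items()
                   if path in record and matches(record[path], state)))
        for name, record in plants.items()
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


plants: Dict[str, Record] = {
    "Banksia serrata": {
        "stem.leaf.type": {"simple"},
        "stem.leaf.margin": {"serrate"},
        "stem.leaf.blade_length": {"5-22cm"},
    },
    "hypothetical_plant": {
        "stem.leaf.type": {"simple"},
        "stem.leaf.margin": {"entire"},
    },
}
observed = {
    "stem.leaf.type": "simple",
    "stem.leaf.margin": "serrate",
    "stem.leaf.blade_length": "12",
}
print(rank(plants, observed))  # [('Banksia serrata', 3), ('hypothetical_plant', 1)]
```

Because every plant uses the same paths, a query like this works unchanged across all plant groups; a real search would likely also weight characteristics by how well they discriminate between species.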

The structure is designed to be readily extended: genetic, biochemical, horticultural, ecological and other types of data can be incorporated.

About Copyright

As is well known, facts cannot be copyrighted, only their specific expression. So I've collated many facts into this system from numerous sources (including my own observations), the most important of which are cited in the references.

Many glossary entries are copied directly from Alison McAlister's work, which forms the basis of the glossary used by Flora of Australia, and I have regularly drawn on ideas from Prof. Peter Stevens' Angiosperm Phylogeny Website (APW) glossary. I am very grateful to McAlister and the Australian Government for making the glossary freely available, to Prof. Stevens, whose APW has been so helpful, and to Henk Beentje for his excellent glossary, from which I've occasionally cribbed the phrase that most clearly expresses a term's meaning.

For now, I intend everything published on the website to be licensed CC BY-SA, which means it can be freely used with attribution, and any derivative works must be shared under the same terms. This will be formalised as the site develops.