
To use the camalienr package, you need to be able to connect to the database that holds the data.

The camalienr package includes functions for connecting to and disconnecting from the database, but they require that a couple of environment variables be set in the .Renviron file. The easiest way to edit .Renviron is to call usethis::edit_r_environ() from RStudio, which opens the file for editing. The file might be empty at first, but that’s fine. The camalienr::ca_connect() function expects to find the following two variables:

CAMALIEN_USER=<YOUR-USER>
CAMALIEN_PASSWORD=<YOUR-PWD>
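
After saving .Renviron and restarting R, it can be worth confirming that the session actually sees both variables. The snippet below is a minimal sketch using only base R; the variable names come from the text above, while the check itself and the object names (`required`, `is_set`, `missing_vars`) are illustrative and not part of camalienr.

```r
# Variable names as listed above; the check is only an illustration.
required <- c("CAMALIEN_USER", "CAMALIEN_PASSWORD")

# Sys.getenv() returns "" for unset variables, so nzchar() is TRUE only for set ones.
is_set <- nzchar(Sys.getenv(required))
missing_vars <- required[!is_set]

if (length(missing_vars) > 0) {
  message("Not set: ", paste(missing_vars, collapse = ", "))
} else {
  message("Both variables are set.")
}
```

If a variable is reported as not set, the usual causes are a typo in the name or an R session that was not restarted after the file was edited.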

The database name, host and port are all set by ca_connect(), so you don’t have to worry about those. Once the environment variables are set (and you have restarted your R session), you are ready to use the camalienr package. Connecting to the database is now a simple function call:

library(camalienr)
#> camalienr v0.3.6
con <- ca_connect()

Notice that we store the connection as an object so we can reuse it. Now that we have the connection object, we can pass it to any of the ca_get family of functions from camalienr, for example ca_get_image():

con |> 
  ca_get_image()
#> # Source:   table<image> [?? x 10]
#> # Database: postgres  [au206907@ecos-postgis-prod.cluster-cjqc9gl8vdbe.eu-north-1.rds.amazonaws.com:5432/camalien]
#>    id               chunkid imagemetaid created             filename path  type 
#>    <chr>            <chr>   <chr>       <dttm>              <chr>    <chr> <chr>
#>  1 0d6827f0-7f85-4… 5145ab… 7b5e5abf-2… 2023-07-11 09:14:21 CT_2023… raw/… tiff 
#>  2 1124cb2f-1eb3-4… 5145ab… 7b5e5abf-2… 2023-07-11 09:14:21 CT_2023… raw/… jpg  
#>  3 cb6ca2a8-3996-4… 5145ab… de1405f1-a… 2023-07-11 09:14:21 CT_2023… raw/… tiff 
#>  4 64174680-e125-4… 5145ab… de1405f1-a… 2023-07-11 09:14:21 CT_2023… raw/… jpg  
#>  5 29c746a0-686f-4… 5145ab… 45137bcc-7… 2023-07-11 09:14:21 CT_2023… raw/… jpg  
#>  6 538d0d76-7491-4… 5145ab… 2de384a7-e… 2023-07-11 09:14:21 CT_2023… raw/… tiff 
#>  7 ec283755-61d9-4… 5145ab… 2de384a7-e… 2023-07-11 09:14:21 CT_2023… raw/… jpg  
#>  8 632c038b-42e6-4… 5145ab… 772bef87-d… 2023-07-11 09:14:21 CT_2023… raw/… tiff 
#>  9 869740ca-e97b-4… 5145ab… 772bef87-d… 2023-07-11 09:14:21 CT_2023… raw/… jpg  
#> 10 3bb4222b-0d28-4… 5145ab… 1b5ac0b3-8… 2023-07-11 09:14:21 CT_2023… raw/… tiff 
#> # ℹ more rows
#> # ℹ 3 more variables: md5expected <chr>, md5actual <chr>, state <chr>

The ca_get functions all return lazy tables, which means that although the tables in the database may contain millions of rows, only a few rows are downloaded at first. This is useful because we can build a pipeline of filtering and selection functions that is sent to the database as a query before the result is downloaded to our session.
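
To illustrate how a lazy pipeline behaves, here is a small sketch that does not need database access. It assumes the dplyr, dbplyr, DBI and RSQLite packages are installed, and it uses an in-memory SQLite table with made-up rows as a stand-in for the real image table; `con_demo` and the example rows are hypothetical, and only the column names `id`, `filename` and `type` are taken from the output above.

```r
library(dplyr)

# Stand-in for the real database: an in-memory SQLite table with made-up rows.
con_demo <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(con_demo, "image", data.frame(
  id       = c("a", "b", "c"),
  filename = c("one.tiff", "one.jpg", "two.jpg"),
  type     = c("tiff", "jpg", "jpg")
))

# tbl() gives a lazy table; the verbs below only build up a query.
jpgs <- tbl(con_demo, "image") |>
  filter(type == "jpg") |>
  select(id, filename)

show_query(jpgs)         # prints the SQL that would be sent to the database
result <- collect(jpgs)  # runs the query and returns a local tibble

DBI::dbDisconnect(con_demo)
```

With a real connection the same verbs should apply to the table returned by ca_get_image(), for example `con |> ca_get_image() |> filter(type == "jpg") |> collect()`. Calling collect() last means the filtering happens in the database and only the matching rows are transferred.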

Database structure

In the diagram below we show the tables and their relationships. Only key columns are shown for each table.

Darker green marks the tables relating to calls to and from Plantnet; lighter green marks the tables relating to the images themselves.

The most relevant tables for a consumer of the data all have a ca_get function.