To use the camalienr package you need to be able to connect to the database that holds the data.
The camalienr package includes functions for connecting to and
disconnecting from the database, but they require a couple of
environment variables to be set in your .Renviron file. The
easiest way to work with .Renviron is to call
usethis::edit_r_environ() from RStudio. This will open the
file for editing. The file might be empty at first, but that’s fine. The
camalienr::ca_connect() function expects to find the
following two variables:
CAMALIEN_USER=<YOUR-USER>
CAMALIEN_PASSWORD=<YOUR-PWD>
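After editing .Renviron and restarting R, you can confirm that both variables are visible before calling ca_connect(). The helper below is a minimal base-R sketch; missing_ca_vars() is a hypothetical name, not part of camalienr, and it only checks that the variables are non-empty, not that the credentials are valid.

```r
# Hypothetical helper (not part of camalienr): return the names of the
# credential variables that are unset or empty in the current R session.
missing_ca_vars <- function(vars = c("CAMALIEN_USER", "CAMALIEN_PASSWORD")) {
  values <- Sys.getenv(vars, unset = "")
  vars[values == ""]
}

# An empty result (character(0)) means both variables were found
missing_ca_vars()
```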
Database name, host and port are all set by
ca_connect(), so you don’t have to worry about those. Once
the environment variables are set (and you have restarted your R
session), you are ready to use the camalienr package. Connecting to the
database is now a simple function call:
library(camalienr)
#> camalienr v0.3.6
con <- ca_connect()
Notice that we store the connection as an object so we can reuse it.
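Because the connection is an ordinary R object, the same con can be passed to as many queries as you like, and it should be closed when you are finished. camalienr has its own disconnect function, which is not named in this section; the sketch below instead assumes that ca_connect() returns a DBI connection (the PostgreSQL output from ca_get_image() suggests so) and uses DBI::dbDisconnect(). with_ca_connection() is a hypothetical helper, not part of camalienr.

```r
# Hypothetical helper (not part of camalienr): open one connection, reuse it
# for everything inside `fn`, and always close it afterwards, even on error.
# Assumes the object returned by `connect()` is a DBI connection.
with_ca_connection <- function(fn, connect = camalienr::ca_connect) {
  con <- connect()
  on.exit(DBI::dbDisconnect(con), add = TRUE)
  fn(con)
}

# Example (needs access to the camalien database):
# n_jpg <- with_ca_connection(function(con) {
#   con |>
#     camalienr::ca_get_image() |>
#     dplyr::filter(type == "jpg") |>
#     dplyr::count() |>
#     dplyr::collect()
# })
```

Return collected data from fn rather than a lazy table: a lazy table still needs the connection, which is closed as soon as fn returns.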
Now that we have the connection object, we can pass it to any of the
ca_get family of functions from camalienr, for example
ca_get_image():
con |>
ca_get_image()
#> # Source: table<image> [?? x 10]
#> # Database: postgres [au206907@ecos-postgis-prod.cluster-cjqc9gl8vdbe.eu-north-1.rds.amazonaws.com:5432/camalien]
#> id chunkid imagemetaid created filename path type
#> <chr> <chr> <chr> <dttm> <chr> <chr> <chr>
#> 1 0d6827f0-7f85-4… 5145ab… 7b5e5abf-2… 2023-07-11 09:14:21 CT_2023… raw/… tiff
#> 2 1124cb2f-1eb3-4… 5145ab… 7b5e5abf-2… 2023-07-11 09:14:21 CT_2023… raw/… jpg
#> 3 cb6ca2a8-3996-4… 5145ab… de1405f1-a… 2023-07-11 09:14:21 CT_2023… raw/… tiff
#> 4 64174680-e125-4… 5145ab… de1405f1-a… 2023-07-11 09:14:21 CT_2023… raw/… jpg
#> 5 29c746a0-686f-4… 5145ab… 45137bcc-7… 2023-07-11 09:14:21 CT_2023… raw/… jpg
#> 6 538d0d76-7491-4… 5145ab… 2de384a7-e… 2023-07-11 09:14:21 CT_2023… raw/… tiff
#> 7 ec283755-61d9-4… 5145ab… 2de384a7-e… 2023-07-11 09:14:21 CT_2023… raw/… jpg
#> 8 632c038b-42e6-4… 5145ab… 772bef87-d… 2023-07-11 09:14:21 CT_2023… raw/… tiff
#> 9 869740ca-e97b-4… 5145ab… 772bef87-d… 2023-07-11 09:14:21 CT_2023… raw/… jpg
#> 10 3bb4222b-0d28-4… 5145ab… 1b5ac0b3-8… 2023-07-11 09:14:21 CT_2023… raw/… tiff
#> # ℹ more rows
#> # ℹ 3 more variables: md5expected <chr>, md5actual <chr>, state <chr>
The ca_get functions all return lazy tables. Although the tables in the
database may contain millions of rows, only a few rows are downloaded
at first. This is useful because we can build a pipeline of filtering
and selection steps that is sent to the database as a query before any
data is downloaded to our session.
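To make the lazy behaviour concrete, the sketch below builds a small pipeline with dplyr and dbplyr. So that it runs without access to the camalien database, it uses a throwaway in-memory SQLite table with a few of the image table's column names and made-up rows; with camalienr you would start from con |> ca_get_image() instead. It assumes that the ca_get functions return dbplyr lazy tables, which is what the printed output of ca_get_image() suggests, and it needs the DBI, RSQLite, dplyr and dbplyr packages.

```r
library(DBI)
library(dplyr)
library(dbplyr)

# Toy stand-in for the image table: made-up rows, only three columns.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "image", data.frame(
  id       = c("a", "b", "c", "d"),
  type     = c("tiff", "jpg", "tiff", "jpg"),
  filename = c("a.tiff", "b.jpg", "c.tiff", "d.jpg")
))

images <- tbl(con, "image")    # lazy table: no rows downloaded yet

jpgs <- images |>
  filter(type == "jpg") |>     # translated to a SQL WHERE clause
  select(id, filename)         # translated to the SELECT column list

show_query(jpgs)               # print the SQL that would be sent
jpgs_local <- collect(jpgs)    # run the query and download the result

dbDisconnect(con)
```

Only collect() pulls the matching rows into the R session; everything before it just composes the SQL shown by show_query().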
Database structure
In the diagram below we show the tables and their relationships. Only key columns are shown for each table.
Darker green tables relate to calls to and from Plantnet; lighter green tables relate to the images themselves.
The tables most relevant to consumers of the data each have a
corresponding ca_get function.
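Since the diagram shows only key columns, it can help to list every table and its columns from R. This is a minimal sketch using DBI's dbListTables() and dbListFields(), assuming the connection from ca_connect() is a DBI connection; describe_tables() is a hypothetical helper, and the list it returns may include tables that have no ca_get function.

```r
library(DBI)

# Hypothetical helper (not part of camalienr): return a named list with one
# character vector of column names per table visible on the connection.
describe_tables <- function(con) {
  tables <- dbListTables(con)
  fields <- lapply(tables, function(name) dbListFields(con, name))
  setNames(fields, tables)
}

# With camalienr (needs access to the camalien database):
# con <- camalienr::ca_connect()
# str(describe_tables(con))
```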