Fabio Marini

PostgreSQL and graph databases with Apache AGE, time for an update

Fabio Marini — Sun, 02 Apr 2023 10:21:23 GMT

It's been a while since I last posted about Apache AGE, and a lot of things happened since then. Let's have a look at the changes to the database itself and the developer experience.

What's new?

First of all, the project has now graduated from incubator to top-level Apache project, so you can expect it to have greater support from the Foundation, and less to worry about the possibility of it being abandoned anytime soon.

Second, the project now reached version 1.1, so it is now considered stable and usable for "serious projects". A lot of Open Source projects are considered to be production-ready way before reaching version 1.0, but you know, that little 1 in front of the version number is still something that can make you feel less anxious when relying on some tool for your project.

Getting started, with Docker, on an Apple Silicon Mac

When doing some experiments with a tool, be it an application or library, I usually prefer Docker instead of installing the tool directly on my Mac. This is just a way to keep my computer clean, and since I often need to have different versions of databases or other libraries inside my projects, so I prefer not to mess things up.

The official Docker image for Apache AGE is available on Docker Hub at the following link: https://hub.docker.com/r/apache/age.

If you have a look at the tags page, by the way, you can see that at the moment the images are all built for the AMD64 architecture. This doesn't mean that the application won't work on an Apple Silicon Mac, but that the performance is not the best possible, so as I did in my previous article, I'll show you how to build a custom image optimized for my machine.

Step 1: Clone the AGE repo locally

The first step to build the custom Docker image for my machine is to get a copy of the AGE repo locally, so let's clone it via git

git clone https://github.com/apache/age

The cloning process is quite fast since the repo is small, it just takes a few seconds depending on your Internet connection speed.

❯ git clone https://github.com/apache/age
Cloning into 'age'...
remote: Enumerating objects: 13026, done.
remote: Counting objects: 100% (1383/1383), done.
remote: Compressing objects: 100% (792/792), done.
remote: Total 13026 (delta 566), reused 1285 (delta 537), pack-reused 11643
Receiving objects: 100% (13026/13026), 36.17 MiB | 2.55 MiB/s, done.
Resolving deltas: 100% (4660/4660), done.

Now that we have a copy of the repo, let's move on to build the custom image.

Step 2: Build the Docker image

Inside the repo there's the official Dockerfile for Apache AGE, so let's see if we can build the image straight from it without any modification.

cd age
docker build -t apache/age .

The good news is that with the new release, the image builds correctly on Apple Silicon, without needing to edit the Dockerfile as I did in my previous post.

I now have my custom Apache AGE image built locally, let's try to run it to see if it works correctly.

Step 3: Run the locally built image

Now that the image is ready, let's run it to start playing with our graph database

docker run -it \
    -e POSTGRES_USER=age \
    -e POSTGRES_PASSWORD=developer \
    -e POSTGRES_DB=age \
    -p 5432:5432 \
    --name 'age-playground' \
    apache/age

In a couple of seconds, the instance starts correctly and PostgreSQL is ready to accept connections, so let's open our SQL editor and try to connect.

Next steps

Once the database is running you can connect and start playing with the SQL or graph part of it.

Please have a look at my previous article on Apache AGE for instructions on how to connect and create and query graphs.

PostgreSQL with Apache AGE - Playing more seriously with Graph Databases

Making something useful with a graph database, beyond the basic stuff. Creating multiple nodes and relationships and navigating the graph

Fabio MariniFabio Marini

Dirkeeper - a small utility to keep directories clean and move files around

Fabio Marini — Mon, 31 Oct 2022 18:29:42 GMT

Raise your hand if it happened to you too that an application working on a server for years ended up filling up the whole disk space blocking the entire server.

Too many files or no more space on disk

It is a pretty common problem if you work with applications that generate output on disk, that in one way or another, they also leave temporary files or logs spread across one or more directories. And sometimes you don't notice them until it's too late, because you already ended up with a directory with tens of thousands of small or empty files or with no space left on disk.

They are two faces of the same problem because a directory with too many files becomes impossible to navigate, and a disk with no space is the root of weird application behavior and crashes, sometimes making it even impossible to log into the server.

When a new file is generated, copy it over there

Another common issue I get, still related to files generated by applications is, as soon as a new file is generated, please copy it over there for that other application to process it. But not only that, if the file is named this way move it here, if it is named that other way move it there, and so on...

To handle this kind of requests without having to periodically check by hand all of the servers, and to play around with file handling in Go, I created Dirkeeper, a small command line utility that can be used directly or inside scripts to automate this kind of tasks.

Dirkeeper

GitHub - fabiomarini/dirkeeper: Dirkeeper - file handling utility

Dirkeeper - file handling utility. Contribute to fabiomarini/dirkeeper development by creating an account on GitHub.

GitHubfabiomarini

The tool is quite simple at the moment and has a few commands:

Directory management utilities

Usage:
  dirkeeper [command]

Available Commands:
  cleanold    clean old files
  completion  Generate the autocompletion script for the specified shell
  help        Help about any command
  match       match and process files
  watch       watch for new files and process them based on config rules

Flags:
  -h, --help   help for dirkeeper

Use "dirkeeper [command] --help" for more information about a command.

cleanold - clean old files

clean old files

Usage:
  dirkeeper cleanold [flags]

Flags:
  -d, --directory strings   List of directories to cleanup
      --dry-run             Only check for old files without deleting
  -h, --help                help for cleanold
      --max-age int         Maximum age of the file in days

The cleanold command is pretty simple, give it a comma-separated list of directories and the maximum number of days since the file creation, and dirkeeper will delete all the files older than the given age.

You can also specify the --dry-run flag to only check the matching files before deleting them.

match - execute a task on matching files

match and process files

Usage:
  dirkeeper match [flags]

Flags:
  -a, --action string      Action to execute
      --dest-dir string    Destination directory
  -d, --directory string   Base directory
      --dry-run            Do not execute action
  -h, --help               help for match
      --max-age int        Min file age in minutes
      --pattern strings    List of file name patterns
      --prefix strings     List of file name prefixes
      --suffix strings     List of file name suffixes

The match command checks one or more directories for files matching a given pattern and executes an action when a match is found.

The list of directories can be specified with the -d or --directory flag and is mandatory.

The matching rules can be specified using one or more of the --pattern --prefix and --suffix flags, where prefix and suffix accept a comma-separated list of prefixes and suffixes of the file name, while pattern accept a comma-separated list of regular expressions that the file name has to match.

In addition to these rules, a --max-age flag could be specified to indicate the minimum age of the file in minutes before executing the action. This could be useful when the file takes several seconds to be generated and prevent working on a partial file.

When one of the matching rules is triggered, then the action specified with the -a or --action flag is executed. Valid actions are copy, move, delete or copy-delete. The latter is useful when the system does not allow directly moving the file, like when working on remotely mounted directories.

watch - keep watching for new files and execute tasks on matching rule

watch for new files and process them based on config rules

Usage:
  dirkeeper watch [flags]

Flags:
  -c, --config string   Config file
      --debug           Enable debug log
      --frequency int   Watch frequency in seconds (default 10)
  -h, --help            help for watch

The watch command is similar to the match one, but runs periodically with a frequency in seconds specified by the --frequency flag.

The other main difference with the watch command is that it takes a config file as input so that it's not required to specify all the parameters on the command line.

The config file is in YAML format and an example is like the following:

watch:
  # Dry run indicates if the action should be executed or only logged
  dryRun: false
  # Can have a list of input directories to watch
  directories:
    # The path of the directory to watch
    - name: "/test/input"
      # The list of rules to apply
      rules:
        # The action to execute for every matching file, can be copy, move or delete
        - action: "move"
          pattern:
            # The list of pattern to match, as regular expressions on file name
            - "RY59A.*"
          prefix:
            # The list of prefixes to match
          suffix:
            # Th elist of suffixes to match
          # The destination directory for the copy or move actions
          destination: "/tmp/test/outputA"

        - action: "delete"
          pattern:
            # Regular expression of file name
            - "RY59B.*"
          destination: "/tmp/test/outputB"

Contributions and suggestions are welcome

I created this little utility for my customers, but it is open source and available on Github for anyone to use and extend.

If you find interesting additional commands or bugs you can open an issue directly on Github or contact me via Twitter @fabiomarininet.

Write Once Run Everywhere, is it still that the way to go?

Fabio Marini — Sat, 23 Jul 2022 17:21:15 GMT

Write Once Run Everywhere has always been the motto for the Java language and it first selling point, but is it still that important? Let me share some thoughts on the topic.

The Java programming language was created by Sun Microsystems back in late '90s as a language that tried to solve some of the complexities of software development, while at the same time preserving a syntax that could still be familiar for users coming from other popular languages like C and C++.

One build to rule them all

The main advantage at the time was the possibility to build the application only once to generate the output JAR or WAR file, and then being able to run it wherever a Java Virtual Machine is available instead of having to create different build environments for each operating system that needs to be supported.

This of course was a huge advantage for software developers that could simplify their build environments while also getting compatibility with operating systems that were not initially thought to be supported.

This latter point in fact was the main idea of Sun Microsystems, that created Java with the main goal of getting more software compatible with its hardware in return, so that it could be more appealing to potential customers, because the main business for Sun was in fact selling the hardware, not the software.

In the last twenty years, Java has become one of the main languages for building enterprise applications, where compatibility is a huge selling point and one of the main goals for software developers. The creation of frameworks like Spring and more recently Spring Boot then helped simplify many aspect of the web and enterprise application development giving to Java another big push.

Things change

With the advent of cloud computing the software development landscape has completely changed.

What used to be run on a bare metal dedicated server inside the company's datacenter, with overprovisioned resources and often a single application per machine, now needs to be run in a container inside a Kubernetes cluster in a datacenter somewhere around the world.

All while consuming the least possible resources to improve instance utilization and reduce costs and why not, with the possibility to scale down to zero instances or up as high as traffic requires.

The shift is so huge that now Java is not always the best choice for the task. And even in the enteprise world its position as the de-facto choice is starting to become less obvious.

The new Cloud Native requirements

Java is a nice language, easy and clean, compiles fast and is supported by every library, tool an service available. So why could it not the best option for the cloud? Because of its legacy. Let me explain.

The motto "Write Once Run Everywhere" come at a cost, the Java Virtual Machine and its system abstraction.

The JVM is a really powerful tool and it works wonderfully if your goal is compatibility, but its main focus on "Run Everywhere" introduced a lot of abstractions between the operating system and the application. Java also introduced the Garbage Collector to simplify the life of developers, and both of these choices is a tradeoff between performance and ease of use.

Java applications can be quite fast once running for a while, in fact tools like Cassandra and Kafka are built in Java, but the Just In Time compiler and the JVM both take some time to "warm up". This means that the time required to start the application and to reach top performance is not that low, and can range from seconds to tens of seconds or more. Not a problem for an always on system, but not the best if you need to continously spin up and stop instances.

In addition to this the Garbage Collector, and the memory model of the language, require a lot of RAM during runtime. In fact it is quite common to find even simple applications that take several GigaBytes of RAM after running for a while.

Now that microservices, containerization and serverless systems are becoming the norm, these metrics of startup time and memory requirements make Java "too expensive" for the "cloud native" world.

The contenders

In recent years other languages and frameworks like JavaScript / TypeScript with the advent of Node.js and more recently of Deno, or Go and Rust are gaining more and more traction on the developers because they challenge the two main drawbacks of Java described before.

JavaScript and TypeScript are becoming ubiquitous and are the lingua franca of the frontend development, while Go powers the backbone of the major cloud infrastructure tools like Docker, Kubernetes, Prometheus, or even database systems like InfluxDB and CockroachDB and these just to name a few.

Rust is the "most loved programming language" on StackOverflow for the last five years in a row and is becoming more and more popular and used by major cloud providers to power the foundations of their infrastructures, and it is also becoming the first choice language when working with WebAssembly.

None of them is the perfect choice for everything, and there is no such thing like "the perfect programming language". Each option has its own pros and cons and is more suitable for one or more contexts or projects.

Javascript is a dynamically typed language and it is also interpreted, so its performance and memory usage is not the best, but it can still be very fast when used with modern VM like V8 or JavaScriptEngine (as Bun has recently demonstated).

Even Go has a Garbage Collector, but the way the language works is way more efficient that the Java counterpart, and by compiling to native binary, it doesn't require a virtual machine. If compiled statically it also has zero external dependencies.

Rust is a powerful and very nice language, has great performance and optimizations and it also compile to static binaries with zero external dependencies. But it has a steep learning curve and even if things are improving, it is quite slow to compile.

Even with their cons, these three options are still serious contenders and oftwn way better choices for cloud projects if compared to Java.

The new landscape

As we saw there are now a lot of options to choose from when starting a new project, even an enterprise one, and the "run everywhere" thing took a different path, because in the last two decades Linux has become the de-facto operating system on servers so there is not anymore that need to support every operating system available. In fact some of the tools named before only run on Linux systems.

This means that the new requirements for a cloud native applications are now:

very fast startup time, so that they can be used in serverless or lambda platforms
low memory usage, to spin up more instances of the application on the same servers
(horizontal) scalability, to be able to react to spikes in traffic or to scale down to zero to reduce costs
resilience, to be able to resist to crashes and attacks

In addition to thiese requirements, the new direction in cloud development, and not only for that, must be Zero Trust, because security of data and personal information should always be one of the main focus of software and systems design.

That said Java is not going anywhere anytime soon, but it will probably lose some market share in the years to come. The language is moving toward native compilation, faster startup times and lower memory usage, but the plans to reach all of these goals still take a few years. And in the meantime new languages and tools will have probably widen the gap with Java.

The next years will tell which language will be the most "cloud native", in the meantime let's try to use the best tool among the available ones.

I recently moved to Go after more than twelve years of Java only projects and after the introduction of generics in version 1.18. I have to say that I quite love it. As many say Go is "easy to learn but hard to master" so the future will tell if I made the right choice.

PostgreSQL with Apache AGE - Playing more seriously with Graph Databases

Fabio Marini — Sun, 01 May 2022 10:20:14 GMT

In my initial post about PostgreSQL and Apache AGE I spent some time discussing the pros and cons of the technology and why it can be a great addition to the developer's toolbox. In this post, we will explore more features of the Apache AGE extension and the power of the OpenCypher query language.

Status of the technology

GitHub - apache/incubator-age: Graph database optimized for fast analysis and real-time data processing. It is provided as an extension to PostgreSQL.

Graph database optimized for fast analysis and real-time data processing. It is provided as an extension to PostgreSQL. - GitHub - apache/incubator-age: Graph database optimized for fast analysis a...

GitHubapache

Before we start exploring the power of AGE let's talk about the status of the project. If you have a look at the GitHub project page, the first thing you notice from the URL is that we are approaching version 1.0.0 but the entire project is still in the Apache Incubator phase, so it's not yet a mainstream project.

Don't get me wrong, I'm not meaning you should be scared or keep away from it, you just need to be careful when using AGE for your projects, and remember that a young project is likely subject to big changes, often with little to no backward compatibility. This is happening in each release at the moment since the APIs are still getting stabilized and optimized, and when you upgrade for example from release 0.7 to 1.0.0 there is no real upgrade path.

To keep experimenting during this phase of heavy development, I prefer using the Docker image and moving my data in and out from one release to the other. Sure this is not a safe nor nice way of working, since the risk of losing some data is still present, but I'm planning to use AGE in a real project that is in its infancy right now, so the time could be just right to get to a production release of both at the same time.

Even with these premises, the technology is already pretty solid and you can use it safely for some serious graph development.

Running AGE inside Docker (on Apple Silicon)

Side note: I recently moved to a new MacBook Pro with the Apple M1 Pro processor, and I have to admit that I'm loving it very much. I was worried about having a bunch of problems with the migration but as of now, aside from Virtual Machines for x86 that are incompatible with Apple Silicon by design, I've yet to find any issue with the new hardware. To be honest, I'm impressed by the performance difference over my previous 2019 MacBook Pro with Intel i9, I'm really happy to see serious competition to Intel's monopoly in the CPU market.

That said, the "official" Docker image for Apache AGE is still built for the Intel architecture, and even if it works flawlessly with my M1 Pro, I still prefer to have a native image to play with, so let's see how to build it from sources.

To create a build of AGE from sources, you just need to clone the repository locally and launch the Docker build command. At the moment I found that a dependency has changed inside the base Debian Stretch image used by the base Postgres:11 image, so you need a small change to the Dockerfile.

Open it in your editor and change the FROM line from:

FROM postgres:11
...

The original Dockerfile

FROM postgres:11-bullseye
...

The patched Dockerfile

This changes the base image from Debian Stretch to Debian Bullseye where the postgresql-server-dev-11 package is available.

After changing the Dockerfile open the terminal and run the following command to build the image:

docker build -t sorrell/apache-age .

The Docker build command

I kept the sorrell/apache-age tag that is the same as the official image so that I can swap it with a new official image whenever it will be available.

At the end of the build process the new AGE Docker image is ready to play with, and in my case, also built for my hardware architecture.

When running dockerized database images, I prefer to keep the data outside the container so that I can upgrade the image version while preserving the content. For that reason, I use a local ./db-data folder mounted as a volume for the container. The service is then run using Docker Compose with the following definition:

version: "3.3"
services:
  db:
    image: sorrell/apache-age
    environment:
      - POSTGRES_PASSWORD=P@s5W0rd$
    ports:
      - 5432:5432
    volumes:
      - ./db-data:/var/lib/postgresql/data

The docker-compose.yml for Apache AGE

To start the service, from the folder where your docker-compose.yml file is stored, you can run the following command:

docker-compose up

Connecting to the database

After the startup process is completed you are ready to start playing with AGE inside your preferred SQL tool. I usually use DBeaver for this or the Database Console inside IntelliJ Idea while I'm writing code.

The connection is the standard PostgreSQL one, but to get AGE to work we need some additional configuration. It is possible to set the search path in the connector parameters, or they can be specified directly after the connection is established.

CREATE EXTENSION IF NOT EXISTS age;
LOAD 'age';
SET search_path = ag_catalog, "$user", public;

Here we defined the age extension and loaded it, and we set the search path including ag_catalog to have the internal functions of AGE available without the need of prefixing them with the schema name.

Playing with the graph

Now that the database is ready let's play with it by creating a graph and putting some data inside it. In AGE each graph has its own schema inside Postgres, and the first thing to do before we can do anything is to create the graph.

SELECT create_graph('playground');

Now the graph is ready to be populated, and we have two possible ways of working:

define the structure of the graph by populating it with nodes and relations
create the structures for each node and relation beforehand

The first path is more dynamic so that you can define your graph while using it. This is a fast way, but you have to keep in mind that you cannot query things that don't yet exist. Like a relational database in fact, each node and relation is translated into one or more tables in the Postgres schema, and if you query a node with a specific label before creating it, the cypher query will return an error. That's why if you have your schema already defined it could be better to create all the structures before starting to populate them.

To create vertexes (nodes) and edges (relations) you can use the following functions:

-- Vertex definition
select create_vlabel(GRAPH_NAME,LABEL);

-- Edge definition
select create_elabel(GRAPH_NAME,LABEL);

An example schema

Let's imagine that we'd like to use AGE to model a basic product catalog.

An example catalog graph

We can create one or more catalogs, for example, an apparel one, and inside it, we want to manage collections, for example, the spring-summer and the fall-winter ones. Every collection is then made of items that can have sub items, and each item has its own attributes, images and attachments.

Each entity in this schema (the blue rectangles) can be translated into a Node or Vertex, and each relationship between nodes is then translated to a Relation or Edge inside the graph. As a convention, we'll use PascalCase for node labels (i.e. Catalog) and all upper snake case for relations (i.e. HAS_ATTRIBUTE).

Now that we know all our nodes and relations, we can start defining them inside the graph.

-- Nodes
select create_vlabel('playground','Catalog');
select create_vlabel('playground','Collection');
select create_vlabel('playground','Item');
select create_vlabel('playground','Attribute');
select create_vlabel('playground','Image');
select create_vlabel('playground','Attachment');

-- Relations
select create_elabel('playground','BELONG_TO');
select create_elabel('playground','PART_OF');
select create_elabel('playground','CHILD_OF');
select create_elabel('playground','HAS_ATTRIBUTE');
select create_elabel('playground','HAS_IMAGE');
select create_elabel('playground','HAS_ATTACHMENT');

Graph structure definition

As you can see, for relations we don't need to specify the connections, but just the label. The connection is then handled by the actual data.

A look under the hood

If we now have a look inside PostgreSQL, we will see that for each node and relation we created, AGE has defined a table to store the data. In addition to these tables there are also two generic _ag_label_edge and _ag_label_vertex.

Tables for our graph schema

In addition to these tables inside the playground schema (the name of the graph we created), there is an internal ag_catalog schema used by AGE to store the details of the graph.

Tables in the ag_catalog schema

The ag_graph table stores the information about the existing graphs, in our case there is one record referring to our playground graph.

The ag_label table contains one record for each table in our graph schema and the information about its function (edge or vertex) and the connection to the graph and the actual table.

Every vertex table is then defined by two columns, id and properties. The id is the internal ID of the record, while the properties are the internal "hashmap" in which you can store additional information for the node.

Every edge table is defined by four columns, id, start_id, end_id and properties. The id and properties columns have the same function of the same inside the node tables, while the start_id and end_id are the links to the ids of the nodes connected by the edge.

Filling the graph with some data

Now that we defined the schema and had a look under the hood at how things work, let's see how to fill the graph with some data about our catalog.

Creating the catalog

To create a node in our graph we can use the CREATE instruction of the OpenCypher language that AGE use (not every feature of Cypher is implemented at the moment). Since AGE works as an extension to PostgreSQL, the main language is SQL, and the Cypher part of the query is wrapped inside the cypher function.

select * from cypher('playground', $$
    CREATE (c:Catalog { code: "C001" })
    SET c.name = "Apparel Catalog"
    RETURN c
$$) as (catalog agtype);

This instruction creates a new node with the Catalog label and a property named code and value C001, then modifies the node adding a name property with value Apparel Catalog and returns the node just created.

The output of the command is the following:

{"id": 844424930131969, "label": "Catalog", "properties": {"code": "C001", "name": "Apparel Catalog"}}::vertex

As you can see, AGE created the node with the label Catalog, assigned it an id and set our additional properties inside the properties field, just as expected.

If you remember, when we created the graph schema and label definitions, we never defined some sort of primary key, and this is because in AGE there are no custom indexes (at least at the moment). This means that we have no "referential integrity" we are used to in traditional relational databases. In fact, if you try to run the same create query again you will get two identical nodes.

This opens up a new problem, we need to be careful when creating objects inside the graph otherwise we will end up with a lot of duplicates not connected to the right node and this will render the entire graph useless.

To "fix" this problem in Cypher there is a second way of creating objects, the MERGE instruction. Merge is like an UPSERT instruction meaning that it looks for a pattern and if it doesn't exist, it creates the required nodes.

Take the following example:

select * from cypher('playground', $$
    MERGE (c:Catalog { code: "C001" })
    RETURN c
$$) as (catalog agtype);

This checks if a node with a label Catalog and code equals to C001 exists in the graph, and if it doesn't, it creates it.

In this case we can't use the SET instruction that is similar to an UPDATE, but we need to split it into two separate commands:

one to create the node with the code property set to our "primary key"
one to set the additional properties

select * from cypher('playground', $$
    MERGE (c:Catalog { code: "C001" })
    RETURN c
$$) as (catalog agtype);

select * from cypher('playground', $$
    MATCH (c:Catalog { code: "C001" })
    SET c.name = "Apparel Catalog"
    RETURN c
$$) as (catalog agtype);

This way we can work around the missing index and ensure that the node is created only once.

You could be tempted to put all of your custom properties inside the MERGE instruction, but this is wrong, because as said, AGE works by pattern matching, so if you use more than just the key properties, you could end up duplicating the nodes just because the already existing one has a property with a different value.

So use the MERGE instruction with just the key property and then update the nodes or edges with the MATCH .. SET instruction.

Adding a collection to the catalog

Now that we have our catalog inside the graph, let's create a collection and connect it to the parent catalog with the BELONG_TO relationship.

select * from cypher('playground', $$
    MATCH (c:Catalog { code: "C001"} )
    MERGE (co:Collection { code: "SS22" })
    MERGE (co)-[:BELONGS_TO]->(c)
    RETURN co
$$) as (collection agtype);

In the example, we looked for the Catalog with code C001 keeping the reference in a variable c, then we created the Collection with code SS22 and the relationship between the two using the MERGE (co)-[:BELONG_TO]->(c) instruction.

One of the nice aspects of the Cypher language is that it is pretty graphical, in fact, it is quite clear that there is an oriented relationship between the Collection and the Catalog just by looking at the instruction.

Adding a couple of items to the collection

Let's now add a couple of items to the SS22 collection.

select * from cypher('playground', $$
    MATCH (co:Collection { code: "SS22" })
    MERGE (i1:Item { code: 'ISS001'}) 
    MERGE (i1)-[:PART_OF]->(co)
    MERGE (i2:Item { code: 'ISS002'}) 
    MERGE (i2)-[:PART_OF]->(co)
    MERGE (i3:Item { code: 'ISS003'}) 
    MERGE (i3)-[:PART_OF]->(co)
    RETURN [i1, i2, i3]
$$) as (items agtype);

Here we can see another useful feature of Cypher, the possibility to have multiple instructions inside a single query. In the example, we are creating three different Item nodes and connecting each with the SS22 collection.

Now things can get more interesting. Let's say we want to retrieve all the items inside the C001 catalog, we can do this with the explicit pattern matching as follows:

select * from cypher('playground', $$
    MATCH (c:Catalog { code: "C001" })<-[:BELONGS_TO]-(co:Collection)<-[:PART_OF]-(i:Item)
    RETURN i
$$) as (item agtype);

but we can also use a more compact and powerful representation, variable length paths. Take the following example:

select * from cypher('playground', $$
    MATCH (c:Catalog { code: "C001" })-[*]-(i:Item)
    RETURN i
$$) as (item agtype);

This tells the graph to return all the Items that have someway relation of arbitrary length with the catalog C001. This is where the power of graph databases can surpass the traditional relational model. This kind of query in a traditional model is really complex if not even impossible to accomplish, and forces you to design the data model for this kind of query with a lot of trade-offs.

Of course, this kind of unconstrained path matching should be used very carefully because on a huge graph, the complexity can explode, and with it also the time and resources needed to get the results.

Updating items

Now that we created a couple of items inside the collection, let's update them with some attributes. In the example the attributes are separate nodes connected to Item, so how do we handle them? In a way similar to what we did for collection and items: using a combination of MATCH and MERGE.

select * from cypher('playground', $$
    MATCH (i:Item)
    WHERE i.code = "ISS001"
    SET i.name = "T-Shirt"
    MERGE (i)-[:HAS_ATTRIBUTE { value: "S"}]->(:Attribute { name: "size" })
    RETURN i
$$) as (item agtype);

In the example we do many things in a single query:

we look for the Item with code = ISS001, this time using the WHERE clause that is very similar to the SQL counterpart
we update the Item with a property name with value T-Shirt
we then create an Attribute named size and connect it to the Item via an HAS_ATTRIBUTE relationship that also has a property named value with value S

This multiple-instruction capability is another useful feature of the Cypher language. Let's now add the color attribute to the item, this time with a single MERGE instruction.

select * from cypher('playground', $$
    MERGE (i:Item { code: "ISS001"})-[:HAS_ATTRIBUTE { value: "Blue"}]->(:Attribute { name: "color" })
    RETURN i
$$) as (item agtype);

If you noticed, in a graph also the edges can have properties, and this is because relations are first-class citizens inside a graph, and they carry the same if not more information than the nodes they connect.

Retrieving values from the graph

Now that we have a bunch of vertexes and edges in place, let's see how to get the information out of the graph. We previously saw how to get the Items of a Catalog but we retrieved the entire Vertex structure. In this case, we want to look for all the attributes of the ISS001 Item but we are interested only in the name and value pairs.

select * from cypher('playground', $$
    MATCH (:Item { code: 'ISS001'})-[v:HAS_ATTRIBUTE]->(a:Attribute)
    RETURN a.name, v.value
$$) as (name agtype, value agtype);

The result is the following:

name	value
"size"	"S"
"color"	"Blue"

You will notice that the values are surrounded by double quotes, and this is because AGE works with its own data type agtype that is a superset of JSON so the values you get in the results are JSON structures, in this case, JSON strings. You need to keep this in mind because you'll need to decode the value to remove the double quotes if you need the plain string content.

Another interesting feature of AGE and the Cypher language is that you can create custom return values, so for example, instead of returning the name and value pairs, we can ask the graph to return a list of JSON objects with the name and value properties with a query like this.

select * from cypher('playground', $$
    MATCH (:Item { code: 'ISS001'})-[v:HAS_ATTRIBUTE]->(a:Attribute)
    RETURN {
        name: a.name,
        value: v.value
    }
$$) as (attr agtype);

The result of the previous query will be the following:

attr
{"name": "size", "value": "S"}
{"name": "color", "value": "Blue"}

Playing with AGE in Java with MyBatis

As discussed in previous posts, for my work projects I usually use Java with Spring Boot and MyBatis. Since AGE has custom data types, MyBatis requires some additional customization to manage the results of the Mappers.

To help with this I created a small utility library that fills the gap and helps in using MyBatis with Apache AGE in a quite simple way.

Feel free to try it and if you have some questions you can open an issue directly on GitHub or contact me on Twitter @FabioMariniNet

GitHub - fabiomarini/age-utils: Apache AGE utilities for usage with Spring and MyBatis

Apache AGE utilities for usage with Spring and MyBatis - GitHub - fabiomarini/age-utils: Apache AGE utilities for usage with Spring and MyBatis

GitHubfabiomarini

Next steps

We still just scratched the surface of what a graph database can do, but we now have more knowledge of how things work inside AGE and how data are handled inside the Postgres schemas.

In future articles, we will go deeper into the model and try to see some more additional and powerful features of graph databases and AGE in particular.

Stay tuned for the next episodes.

Going reactive with PostgreSQL, Apache AGE and MyBatis

Fabio Marini — Sun, 03 Apr 2022 20:58:21 GMT

In my previous posts I described how to use MyBatis to interact with Apache AGE, the Graph database extension for PostgreSQL. In this post we will see how to write a reactive API using Spring WebFlux and MyBatis.

Traditional Spring Web APIs

If you have played with the Spring ecosystem and the Spring Boot framework you are probably used to the "traditional" way of building web applications and APIs, that is based on the Spring Web library built on top of Java Servlets.

I call these traditional because using Servlets has been the standard for almost the last two decades, and this is still the default way of building web applications in Java, being them built to run on Tomcat or on a Java Enterprise Edition application server like WebLogic or WebSphere.

There is nothing wrong on developing application with these building blocks, but in recent years the technology has changed a lot, and the processors inside computers are not scaling anymore "vertically" (by increasing the frequency) but "horizontally" (by increasing the number of cores). This means that the way we build applications needs to change to better exploit the power of these new CPUs.

In addition to this, servlets are still built on top of a mainly synchronous stack, so every request that comes to the application server keeps one thread busy until a response is sent back to the user. Servlet 3.0 and 3.1 are an evolution toward async execution but most of the business logic invoked by the servlets themselves is often still synchronous. This means that the application is still keeping threads busy while processing requests, and this limits the amount of calls that the application server is able to process and increases the resource utilization on the hardware.

The reactive evolution

In recent years the reactive approach was proposed and a few solutions have emerged to overcome this problem and increase the level of parallel calls while improving the overall utilization of the available processing power. Example of these libraries are Eclipse Vert.x, RxJava and Spring's own Project Reactor.

The latter is now part of the Spring Boot framework and you can choose which kind of application to build by swapping the starter dependencies of your project and following the different development paths related to the Web (Servlet) and WebFlux (Reactive) approaches.

All of these libraries are built on top of reactive streams, a new paradigm that changes the way information is processed. In a reactive stream, you reason in term of flow, the stream itself, and every computation should be thought in terms of processing one or more elements of a stream through Producers and Consumers.

In a stream context, you can have functions returning zero, one or N elements, and the basic types to work with a stream are Mono and Flux, with Mono being a stream of zero or one element, and flux being a stream of zero to N elements.

A stream should be thought as a pipeline, where elements enter on one side, are filtered and processed on one or more steps and then exit from the stream on the other side.

The main concept to keep in mind is that nothing happens inside the stream if no one subscribe to the output of the stream, so if you build the chain of calls to services and databases, but you don't subscribe to use the output of the stream, nothing gets invoked and processed.

This approach is more declarative, meaning that you declare the way every item of the stream should be processed by defining the steps of the pipeline that the item should travel from the Producer to the Consumer.

Another important concept is that the steps of the computation are not bound to a thread, and the reactive stream is built to use only a small amount of threads. This means that when a call reaches the endpoint, the application server (that is not Tomcat anymore but can be Netty or Undertow) takes the request and routes it to the business logic stream, then frees the thread that is immediately able to serve another request. When the backend business logic starts producing a response, the stream pipeline is followed backwards until a result is ready to be served back to the caller.

This way of processing, in addition to a better resource utilization, allows other features like backpressure, so that when the server reaches its limits, new calls are discarded after a timeout, preventing overload and denial of service.

Reactive Hello World with Spring Boot

To create a reactive Spring Boot application you can start with the Spring Initializer by selecting the Spring Reactive Web dependency.

Creating a Spring Reactive Web applicationT

Then pressing the Generate button at the bottom of the page you will get the template project ready to be imported inside your IDE of choice.

To add a simple HelloWorldController to the project, just create a new Class like the following:

@RestController
public class HelloController {

    @GetMapping("/")
    public Mono hello(String name) {
        return Mono.just("Hello ".concat(Optional.ofNullable(name).orElse("World!")));
    }
}

The HelloWorldController

As you can see the Controller is very similar to a standard Spring @RestController, with the only exception of returning a Mono instead of a plain String.

If you then start the project and point your browser to http://localhost:8080/?name=Fabio you should get the greeting Hello Fabio, or if you skip the name parameter you will get the Hello World! response.

Reactive DB access with R2DBC

When building a reactive application, as discussed, the entire processing chain should be reactive, meaning that no blocking calls should be used. This means that if you need to access a database in a reactive stream, you should not use JDBC, because it is built on a synchronous stack.

To solve this issue, the R2DBC project was created to build a new stack to interact with databases in a reactive fashion. The main databases are now supported, so your application can use MySQL, MariaDB, PostgreSQL, SQL Server, Oracle Database and other databases in a reactive way.

MyBatis in a reactive way

As discussed in other posts, my library of choice when dealing with databases is MyBatis, but as of now, it doesn't yet support reactive database access out of the box. To enable the support for reactive streams, I found a compatibility library that does just that, adds compatibility with both the R2DBC library and Spring.

GitHub - chenggangpro/reactive-mybatis-support: reactive mybatis support for reactive project using r2dbc

reactive mybatis support for reactive project using r2dbc - GitHub - chenggangpro/reactive-mybatis-support: reactive mybatis support for reactive project using r2dbc

GitHubchenggangpro

The good part of this library is that you can keep using MyBatis with almost no changes to the standard approach.

The differences are in the application.properties settings and in the signature of Mappers and TypeHandlers, the rest is pretty unchanged.

To add the library to your project (I use gradle but of course it works the same with Maven) you just have to add the dependencies to mybatis-r2dbc and mybatis-r2dbc-spring to the build.gradle file inside the dependencies

dependencies {
    implementation 'org.springframework.boot:spring-boot-starter-webflux'
    implementation 'pro.chenggang:mybatis-r2dbc:1.0.8.RELEASE'
    implementation 'pro.chenggang:mybatis-r2dbc-spring:1.0.8.RELEASE'
}

The build.gradle file for the project

Then in the application.properties file add the configurations for the database connection and mapper location.

spring.r2dbc.mybatis.r2dbc-url=r2dbc:postgresql://localhost:5432/postgres
spring.r2dbc.mybatis.username=postgres
spring.r2dbc.mybatis.password=postgres

r2dbc.mybatis.mapper-locations=classpath:mappers/*.xml

The application.properties file for the project

These are the main requirements to get MyBatis to work with reactive applications. After completing these steps we can create our mappers and start using them.

In our example we will build a couple of calls to retrieve data from a PostgreSQL database through Spring WebFlux.

The example refers to a SQL table named person that is defined as follows:

CREATE TABLE person (
    id UUID NOT NULL,
    name VARCHAR(100) NOT NULL,
    age INTEGER NOT NULL,
    PRIMARY KEY (id)
)

Let's start with the mapper XML declaration.

The PersonMapper XML declaration

Nothing too fancy here, just a couple of queries to retrieve all records from the person table or just the ones referring to a person with a specific name and to count the results of the latter.

These queries can be invoked through the Mapper interface named PersonMapper here described.

@Mapper
public interface PersonMapper {

    Flux findAllByName(@Param("name") String name,
                               @Param("maxItems") Integer maxItems
    );

    Mono countByName(@Param("name") String name);
}

The PersonMapper Mapper interface

Here the only difference you see from a standard Mapper interface is the usage of the Flux and Mono return types. The parameters are passed the same way as in the standard MyBatis.

Now to expose the queries to the world via REST APIs we need to declare a @RestController that calls the mapper methods. Here is the example controller:

@RestController("/persons")
public class PersonController {

    private final PersonMapper personMapper;

    public DemoController(PersonMapper personMapper) {
        this.personMapper = personMapper;
    }

    @GetMapping("/")
    public Flux getPersons(String name) {
        return personMapper.findAllByName(name, null);
    }

    @GetMapping("/paged")
    public Mono>> getPagedPersons(String name, Integer maxItems) {
        return Mono.zip(
            personMapper.findAllByName(name, maxItems).collectList(),
            personMapper.countByName(name)
        ).map(o -> new SizedResponse<>(o.getT1(), o.getT2()));
    }

}

The PersonController REST Controller

In this controller we can see a couple of example of basic reactive APIs, the getPersons method calls the mapper by passing it the name parameter and returning the list of results if any.

The second method, getPagedPersons does a similar job but it uses the Mono.zip method to bind together two different calls, one to the findAllByName query that looks for a page of result with maxItems results at most, and the other to the countByName query that return the total records matching the query. The result is then collected inside a SizedResponse custom object that contains both the data and the total count.

public class SizedResponse {

    private final T data;

    private final Long count;

    public SizedResponse(T data, Long count) {
        this.data = data;
        this.count = count;
    }

    public T getData() {
        return data;
    }

    public Long getCount() {
        return count;
    }
}

The SizedResponse object

Interacting with Apache AGE in a reactive API

Now that we have seen how to make MyBatis to work inside a Spring WebFlux application, let's see how to interact with Apache AGE inside the same application.

First we need to define a graph in AGE and create a couple of Item nodes inside it.

SELECT create_graph('playground');

SELECT * from cypher('playground', $$
  CREATE (i:Item { code: '1001', name: 'Item 1'})
  RETURN i
$$) as (a agtype);

SELECT * from cypher('playground', $$
  CREATE (i:Item { code: '1002', name: 'Item 2'})
  RETURN i
$$) as (a agtype);

SELECT * from cypher('playground', $$
  CREATE (i:Item { code: '1003', name: 'Item 3'})
  RETURN i
$$) as (a agtype);

Then we define our XML Mapper with the query for MyBatis:

The ItemMapper XML query definition

As you can see we have a simple Cypher query that looks for nodes labelled as Item and if we pass a code it filters the node with the given code. In addition to this if we pass a valid maxItems parameter we also limit the amount of items to the given value.

This is a quite standard MyBatis XML aside from the parameters that are handled with the ${} placeholder through the AgtypeWrapper we built in previous article and that you can find in my age-utils repository on GitHub.

The mapper interface for the previous query is the following

@Mapper
public interface ItemMapper {

    Flux findAll(
        @Param("code") AgtypeWrapper code,
        @Param("maxItems") AgtypeWrapper maxItems
    );

}

The ItemMapper interface

Here you can see that the mapper returns a Flux where an Item is defined as an extension to a GraphItem object

public class Item extends GraphItem {

}

The Item class

and the GraphItem that is also part of the age-utils repository is defined as follows:

public class GraphItem {
    private long id;
    private T properties;

    public GraphItem() {
    }

    public long getId() {
        return this.id;
    }

    public void setId(long id) {
        this.id = id;
    }

    public T getProperties() {
        return this.properties;
    }

    public void setProperties(T properties) {
        this.properties = properties;
    }
}

The GraphItem class

while the ItemProperties type is a Java bean representing the properties of an Item node.

public class ItemProperties {

    private String code;

    private String name;

    public String getCode() {
        return code;
    }

    public void setCode(String code) {
        this.code = code;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }
}

The ItemProperties class

The call to the API is then managed through an ItemController like the following:

@RestController("/items")
public class ItemController {

    private final ItemMapper itemMapper;

    public DemoController(ItemMapper itemMapper) {
        this.itemMapper = itemMapper;
    }

    @GetMapping("/")
    public Flux getItems(String code, Integer count) {
        return itemMapper.findAll(
                AgtypeWrapper.from(code),
                AgtypeWrapper.from(Optional.ofNullable(count).orElse(5))
        );
    }

}

The ItemController REST Controller

The getItems method takes two optional parameters that are passed to the mapper via the AgtypeWrapper that handles the conversion to make it compatible with AGE and the Cypher query.

Handling the result types of reactive queries

If you try to run the previous code right now you will get an error from the application, because MyBatis doesn't yet know how to handle the results of the query and how to map it to an Item. To fix this issue we need to use the JsonTypeHandler defined in previous article and extend it to make it compatible with the mybatis-r2dbc library.

Just as a reference, the JsonTypeHandler is defined as follows:

public class JsonTypeHandler extends BaseTypeHandler {
    private static final Logger logger = LoggerFactory.getLogger(JsonTypeHandler.class);
    private static final ThreadLocal objectMapper = ThreadLocal.withInitial(() -> {
        ObjectMapper objectMapper = new ObjectMapper();
        objectMapper.configure(SerializationFeature.WRITE_DATES_AS_TIMESTAMPS, false);
        return objectMapper;
    });

    @Override
    public void setNonNullParameter(PreparedStatement preparedStatement, int i, Object t, JdbcType jdbcType) throws SQLException {
        try {
            preparedStatement.setString(i, objectMapper.get().writeValueAsString(t));
        } catch (JsonProcessingException e) {
            throw new SQLException(e);
        }
    }

    @Override
    public T getNullableResult(ResultSet resultSet, String columnName) throws SQLException {
        String jsonString = resultSet.getString(columnName);
        return readFromString(jsonString);
    }

    @Override
    public T getNullableResult(ResultSet resultSet, int columnIdx) throws SQLException {
        String jsonString = resultSet.getString(columnIdx);
        return readFromString(jsonString);
    }

    @Override
    public T getNullableResult(CallableStatement callableStatement, int columnIdx) throws SQLException {
        String jsonString = callableStatement.getString(columnIdx);
        return readFromString(jsonString);
    }

    protected T readFromString(String jsonString) {
        if (jsonString == null) {
            return null;
        }
        try {
            return (T) objectMapper.get().readValue(jsonString, new TypeReference() {});
        } catch (Exception e) {
            logger.error("Error converting JSON value", e);
            return null;
        }
    }
}

The JsonTypeHandler class

It is just a standard MyBatis type handler that leverages the Jackson library to convert a JSON object to the corresponding Java type.

The reactive library adds a layer on top of standard TypeHandlers so we need to extend the previous class to make it compatible with the new reactive paradigm.

public abstract class JsonR2dbcTypeHandlerAdapter extends JsonTypeHandler implements R2dbcTypeHandlerAdapter {

    @Override
    public abstract Class adaptClazz();

    @Override
    public void setParameter(Statement statement, ParameterHandlerContext parameterHandlerContext, T parameter
    ) {
        statement.bind(parameterHandlerContext.getIndex(), parameter.toString());
    }

    @Override
    public T getResult(Row row, RowMetadata rowMetadata, String columnName) {
        String value = row.get(columnName, String.class);
        return value != null ? readFromString(value) : null;
    }

    @Override
    public T getResult(Row row, RowMetadata rowMetadata, int columnIndex) {
        String value = row.get(columnIndex, String.class);
        return value != null ? readFromString(value) : null;
    }

}

The JsonR2dbcTypeHandlerAdapter class

This is the JsonR2dbcTypeHandlerAdapter class that extends the previous JsonTypeHandler, and as you can see it is abstract, because to make it work correctly with mybatis-r2dbc we need to implement a type handler adapter for each specific Java type.

I extended the JsonTypeHandler instead of creating a different object for two main reasons: to reuse code as much as possible and because mybatis-r2dbc still leverages TypeHandlers in addition to the R2dbcTypeHandlerAdapter if you use result maps.

We don't need to handle the entire Item type, but just the ItemProperties because the result class is built through reflection at the property level so we just need to tell the library how to manage the internal property types if they are not base Java types.

The final implementation that is able to handle our ItemProperties type is the following:

public class ItemPropertiesTypeHandlerAdapter extends JsonR2dbcTypeHandlerAdapter{
    @Override
    public Class adaptClazz() {
        return ItemProperties.class;
    }
}

As you can see it just extends the JsonR2dbcTypeHandlerAdapter specifying our custom type. We need to override the adaptClazz method because it is abstract in the parent class.

With all these in place we can now call our API and get the results:

curl http://localhost:8080/items/\?code\=1001

[{
    "id":844424930131976,
    "properties":{
        "code":"1001",
        "name":"Item 1"
    }
}]

I hope you find this article useful to get started both with MyBatis and Spring WebFlux, and to start playing with Apache AGE in your reactive applications.

Error handling for REST APIs with Spring Boot in servlet and reactive applications

Fabio Marini — Sun, 20 Mar 2022 23:32:21 GMT

For this week post I initially planned to write about creating Reactive API with Spring Boot Web Flux, but as you know, things don't always go as planned.

One foundamental part of API creation is error handling, and for my projects I found what I think is a great library to standardize this, named error-handling-spring-boot-starter by Wim Deblauwe.

GitHub - wimdeblauwe/error-handling-spring-boot-starter: Spring Boot starter for configurable REST API error handling

Spring Boot starter for configurable REST API error handling - GitHub - wimdeblauwe/error-handling-spring-boot-starter: Spring Boot starter for configurable REST API error handling

GitHubwimdeblauwe

To me this is a great addition to any application exposing REST services because it intercepts all the unhandled exceptions (but you have tests to prevent this, aren't you? :-) ) and also managed ones, and returns a standard JSON structure to make the errors easier to consume in web applications.

The library manages most of the more common exceptions and it also allows you to add your custom ones in a really easy way, all while preserving the JSON response and without messing up your code.

All you need to do to get all this automagic time saver is to add the library dependency to your project, the Spring Boot AutoConfiguration will handle all the burdens of wiring up the configurations at boot time. You just have to do... nothing else than declaring your own Exceptions and throwing it where needed.

Adding the maven dependency


    io.github.wimdeblauwe
    error-handling-spring-boot-starter
    LATEST_VERSION_HERE

Adding the gradle dependency

compile 'io.github.wimdeblauwe:error-handling-spring-boot-starter:LATEST_VERSION_HERE'

How the error handling works

As the documentation suggests, by adding the library on the classpath, it will become active. It registers an @ControllerAdvice bean in the context that will act if an exception is thrown from a @RestController method.

All you need to do to manage your own custom Exceptions is to declare them like the following:

package com.company.application.user;

public class UserNotFoundException extends RuntimeException {
    public UserNotFoundException(UserId userId) {
        super("Could not find user with id " + userId);
    }
}

and you will get a response from the REST service like the following:

{
  "code": "USER_NOT_FOUND",
  "message": "Could not find user with id 123"
}

Nice, right?

And thsi is not only valid for business errors that lead to exceptions, but also for invalid requests. You can in fact add the @Valid annotation to your request parameters to require object validation (if a validator is on the classpath), and if something is not right, the library will handle the details for you.

Take the following example from the documentation, where an object is annotated with validation requirements:

public class ExampleRequestBody {
    @Size(min = 10)
    private String name;
    @NotBlank
    private String favoriteMovie;

    // getters and setters
}

and then used as a parameter for a REST service like the following:

import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

import javax.validation.Valid;

@RestController
@RequestMapping("/example")
public class MyExampleController {

    @PostMapping
    public MyResponse doSomething(@Valid @RequestBody ExampleRequestBody requestBody ) {
        // ...
    }
}

if the requestBody param is not valid, for example if the fields are empty, you will get a detailed response explaining what's wrong in your request:

{
  "code": "VALIDATION_FAILED",
  "message": "Validation failed for object='exampleRequestBody'. Error count: 2",
  "fieldErrors": [
    {
      "code": "INVALID_SIZE",
      "property": "name",
      "message": "size must be between 10 and 2147483647",
      "rejectedValue": ""
    },
    {
      "code": "REQUIRED_NOT_BLANK",
      "property": "favoriteMovie",
      "message": "must not be blank",
      "rejectedValue": null
    }
  ]
}

And you can also customize other details of the response like the status code just by annotating the custom Exception with the @ResponseStatus() like in the following example:

package com.company.application.user;

import org.springframework.http.HttpStatus;
import org.springframework.web.bind.annotation.ResponseStatus;

@ResponseStatus(HttpStatus.NOT_FOUND) (1)
public class UserNotFoundException extends RuntimeException {
    public UserNotFoundException(UserId userId) {
        super("Could not find user with id " + userId);
    }
}

There are a lot of other features in the library that are explained in the documentation that you can find here:

Error Handling Spring Boot Starter

Wim Deblauwe

Using the Error Handling library in Reactive projects

As I said at the beginning of this article, my initial intention was to create a demo project on how to write reactive REST APIs with Spring Boot WebFlux, based on Project Reactor with support for R2DBC (the reactive database access alternative to JDBC) and error handling managed by the aforementioned library, but... I discovered that it was only designed to work with Spring Web, that is the standard Servlet way of creating applications.

I also had a look at the repository to see if there was some Pull Request or Issue related to this requirement, and in fact I found a couple of issues on the topic, but no PRs.

Having access to the source code of a library is an invaluable thing both for work and as a learning tool, because you can understand how things work under the hood and, in case of a problem, what is going wrong. The other great value in Open Source is that you can fix or enhance things when you find something wrong or need something that is missing.

That said, I started looking to the Spring Boot documentation to see how error handling was supposed to be done in a Reactive project and how it was currently handled inside the library, and after a couple of hours of documenting and writing code I completed the addition of the support of the error handler for Reactive projects.

The great part of the addition is that it works seamlessly with the original code and transparently inside the Spring Boot application, so for you the developer, the user experience is exactly the same as before, just add the library dependency to your project and create your custom exception, the error handling is managed the same as in a traditional servlet application.

You can find my pull request here or you can use my fork of the library while waiting for the PR to be accepted.

Update: The PR has been accepted and the reactive support will be included in version 3.0.0 of the error-handling-spring-boot-starter library

Blending two worlds: Apache AGE with MyBatis (part 2)

Fabio Marini — Sun, 13 Mar 2022 23:02:41 GMT

In my previous post I talked about Apache AGE and how to use it in a Spring Boot application together with MyBatis. In this new article I will make things a little bit more interesting by explaining how to use dynamic queries and passing parameters and data to the database.

In the previous post, I showed how to get data from the graph database using a combination of custom Java objects and a JsonTypeHandler for MyBatis. In the example I used only static queries, meaning that no parameters nor data was passed to the query to get results.

Since no application can be really useful just by using static queries, we will now have a look at how to interact with the database in a more dynamic way, by passing data and parameters to the Cypher code.

Querying with Cypher

As previously discussed, to interact with Apache AGE we need to use the cypher function inside the FROM clause of the SQL query. The function takes three parameters:

the name of the graph
the Cypher query
an optional parameter object - more on this later

The example query used in my previous post was the following:

SELECT * FROM cypher('${schema}', $$
    MATCH (i:Item)
    RETURN id(i), properties(i)
$$) AS (id agtype, properties agtype)

Retrieve all the Item nodes

This just means: "match all the nodes tagged as Item and return their id and custom properties".

Cypher is the query language created by Neo4J to interact with graph databases in a clear and easy way. Apache AGE uses a dialect of Cypher named OpenCypher, but at the time of writing this article it only supports a subset of the complete language specification.

That said, in Cypher, like in SQL, there is often more than one way to do the same query, and you are free to choose the one that is best suited to the specific needs of your application.

Let's say that instead of all of the nodes tagged with Item, we just want one of them, the one with the custom property code with value 1001. This can be done in two ways:

by adding the custom property filter inside the MATCH instruction
by using a WHERE clause

The first way, using the MATCH filter is like the following:

SELECT * FROM cypher('${schema}', $$
    MATCH (i:Item { code: "1001" })
    RETURN id(i), properties(i)
$$) AS (id agtype, properties agtype)

MATCH with custom property

As you can see the parameter is used as a template inside the MATCH instruction. This way we can retrieve all nodes tagged with the Item label AND having a custom property named code with value 1001, just as desired, but we can only match by equality of the exact value of the property.

The second one using WHERE is the following:

SELECT * FROM cypher('${schema}', $$
    MATCH (i:Item)
    WHERE i.code = "1001"
    RETURN id(i), properties(i)
$$) AS (id agtype, properties agtype)

Using the WHERE clause

With the WHERE clause we can use more expressive conditions like in a traditional SQL query, using both operators (like =, <, > or =~ for like) and functions.

Passing parameters to the query

Aside writing static queries like the previous ones, Apache AGE supports the use of dynamic parameters, but in an impractical way in my opinion. The documentation explains how to use prepared statements to pass parameters to Cypher queries, but the only way is by explicitly defining the prepared statement like in the following example:

PREPARE find_item_query(agtype) AS
SELECT *
FROM cypher('graph', $$
    MATCH (i:Item) 
    WHERE i.code = $code //Cypher parameter
    RETURN i
$$, $1) //An SQL Parameter must be placed in the cypher function call
AS (i agtype);

To run the query, you then need to execute it with the EXECUTE statement, like the following example:

EXECUTE find_item_query('{"code": "1001"}');

This way you can pass a parameter object of type agtype (the custom JSON superset introduced by AGE), and it surely works if you call the query by hand, but it still has a couple of drawbacks:

first, you have to define all the queries before you can use them, and you are not allowed to call the PREPARE statement with the same name twice of course. If you need to change the query you first need to DEALLOCATE the prepared statement and them redefine it.
second, it doesn't work with the way MyBatis handles the query and pass parameters to it with the standard #{param} notation.

A viable workaround

In addition to the #{param} notation to pass parameters, MyBatis offers an additional way to pass dynamic content to a query, that is the ${param} notation. This is not the way prepared statement uses to pass parameters to the query, instead it is a more basic text replacement. You read it correctly, it's text substitution, meaning that the raw value of the parameter is copied inside the query text replacing the ${param} placeholder.

This of course is not a good way to pass parameters coming from the outside world because it can lead to any sort of SQL Injection attack. But the fact that a raw text replacement is unsafe doesn't mean that the entire process must be unsafe.

Let me explain. When you insert a ${param} placeholder in your query, MyBatis takes the param variable from the inputs to the query, gets it string content and replaces placeholder with the text. If you noticed, I used this exact same notation to pass the name of the graph to the cyhper function in my example query. I can do it safely because the schema variable comes from the application settings, so it is handled by me and I can be quite sure about its value.

So how can you make a totally unsafe text replacement into a safe alternative to pass parameters to a query?

As I said, to replace the placeholder with the final value, MyBatis takes the parameter and gets its string representation. This means that if you pass an object that is not a String, MyBatis invokes the toString() method to get the final String representation.

This means that we can create a wrapper object that takes the value of our parameter and translates it into its safe string representation. And since AGE internally uses the agtype for all the inputs and outputs we can build our wrapper on top of it.

The AgtypeWrapper

The idea is pretty simple, take the following code as an example

public class AgtypeWrapper {
    private final T object;

    public AgtypeWrapper(T object) {this.object = object;}

    public static  AgtypeWrapper from(T object) {
        return new AgtypeWrapper<>(object);
    }

    // body of the class skipped for brevity
    
    @Override
    public String toString() {
        if (object instanceof String
                || object instanceof Integer || object instanceof Long
                || object instanceof Double || object instanceof Float
                || object instanceof BigInteger
                || object instanceof BigDecimal
                || object instanceof Boolean) {
            return serializeAgtype(object);
        } else {
            return serializeAgtype(makeAgtypeFromObject(object));
        }
    }
}

This is the main Idea behind this wrapper, take the content of an object and convert it into its agtype representation, then pass it to the MyBatis query as text.

I know it can be a raw approach but it is very similar to the way TypeHandlers work when converting output data, and it works.

The following is an example of the usage of the wrapper:

FindItemParams params = new FindItemParams();
params.setCode(code);
GraphItem item = itemsMapper.findItemByParams(AgtypeWrapper.from(params))

example usage of the AgtypeWrapper inside a MyBatis Mapper

You define your parameter object, in this example by declaring a variable of type FindItemParams. To pass it to the mapper you then need to call the from(object) builder method of the AgtypeWrapper. Since the wrapper is not a string, MyBatis will then call the toString() method to get the string representation of the value, and this triggers the internal converison of our param into its agtype equivalent.

In my case the FinditemParams class only contains the code parameter, so the agtype representation is the following:

{
  code: "value"
}

agtype representation of the params variable

This looks like JSON but it doesn't have the double quotes around the property name, similar to a JavaScript object definition.

The query invoked by the mapper is the following:

SELECT * FROM cypher('${schema}', $$
    MATCH (i:Item ${params})
    RETURN id(i), properties(i)
$$) AS (id agtype, properties agtype);

where our params are passed as ${params} without the need of additional escaping.

The same can be done for basic types like String, Integer, Double, Boolean and so on since the wrapper will automatically escape the value while converting it to agtype.

To pass a string for example you can use the wrapper like this:

AgtypeWrapper.from("myvalue")

since the from method is generic and automatically creates the AgtypeWrapper wit the correct representation

public static  AgtypeWrapper from(T object) { /*...*/}

the AgtypeWrapper from method

How the wrapper works

The principle around which the wrapper works is pretty simple: when someone calls the toString method, it takes the internal value, converts it to agtype and then serializes the value to string.

The serializeAgtype method of the wrapper is a slightly modified version of the method with the same name inside the AgtypeUtil class you can find in the Java driver inside the AGE GitHub repository.

incubator-age/AgtypeUtil.java at master · apache/incubator-age

Graph database optimized for fast analysis and real-time data processing. It is provided as an extension to PostgreSQL. - incubator-age/AgtypeUtil.java at master · apache/incubator-age

GitHubapache

The makeAgtypeMapFromObject method uses Java reflection to get all the properties of the wrapped object and build an agtype representation. This can be a simple value if the object is of basic type, like a string, integer, boolean and so on, or an AgtypeMap or AgtypeList if it is a complex object or an array or collection.

The wrapper implementation currently handles the following property types:

String
Integer / Long
Float / Double
BigInteger
BigDecimal
Boolean
Array
Collection (like List or Set)
Map
Custom object following the JavaBean standard

You can find the AgtypeWrapper inside my age-utils repository:

GitHub - fabiomarini/age-utils: Apache AGE utilities for usage with Spring and MyBatis

Apache AGE utilities for usage with Spring and MyBatis - GitHub - fabiomarini/age-utils: Apache AGE utilities for usage with Spring and MyBatis

GitHubfabiomarini

Blending two worlds: Apache AGE with MyBatis (part 1)

Fabio Marini — Sun, 06 Mar 2022 21:55:22 GMT

In my previous post I introduced Apache AGE and the possibility to use Graph Databases in an existing project while at the same time keeping the relational structures in place.

As previously discussed, if you have an existing project with a lot of data structures already in place it would be quite expensive and dangerous to completely rewrite everything in a big bang approach using a completely different technology.

One of the advantages of using Apache AGE, is based on the fact that it's a PostgreSQL extension, so you can keep using your previous database and libraries while gradually adopting the new technology.

On the other side, a graph database is a very different world compared to the relational model, so you have to expect some changes to be required during the transition.

MyBatis

To build my projects I almost always use Java as the go-to technology together with the Spring Boot framework and a bunch of other libraries depending on the specific needs of the application. To access databases my preferred library is MyBatis, because it is rock solid and slim, it is perfectly integrated with the Spring Boot ecosystem and it allows me to decouple SQL from Java code through the use of mapper files (XML files with the SQL code).

I know you could think that there are other solutions like Hibernate that can hide the SQL details and bring some magic to data access, but I don't like some technical details of Hibernate like the use of proxies on the data objects and the use of session to "transparently" access relations.

MyBatis surely needs a bit more boilerplate configurations and code (there are IDE plugins and tools to generate them) when compared to Hibernate, but in return it gives more flexibility while keeping the door open to the possibility of tweaking the queries and use native features of databases (like AGE) that cannot be used otherwise.

In addition to this, I work on many enterprise projects when it is often required to interface with existing legacy databases, not designed for the requirements of tools like Hibernate, so MyBatis is a better solution to me because it allows me to map any existing database very easily.

The mapper files

In MyBatis you can use the @Query annotation to add your query on top of a mapper interface, but I strongly prefer to separate the SQL code from the Java sources for readability. The IDE plugins are more than good enough to help you move from Java code to the linked XML query location without messing up your source files.

Here is an example XML mapper file that contains a couple of queries, as you can see it is very easy to create these files and they help you separate application code from database code. It is also possible to use different XML files to add compatibility to other databases if you need.

The mapper tag has an attribute that refers to a Java Interface that is the one you will use in your code to access database, and the select tags (or insert, update and delete ones) refers to single queries. The id attribute of each query refers to a method of the mapper interface, while the parameters passed to the SQL are enclosed in #{} or ${} tags, like #{username} or ${schema}. These are placeholders that will be replaced at runtime by the real value of the parameters.

There are a lot of other features related to XML mapper files, like collection mapping, conditional blocks and so on. You can find the documentation of all the features and the XML syntax in the MyBatis Mapper documentation page.

When used in combination with the Spring Boot library, MyBatis becomes a little bit more "magic", meaning that you just need to provide the XML files with the queries and declare a Java Interface with the correct names and ids, while the glue code needed to make things work will be generated by MyBatis at runtime during the boot of the application.

The example interface for the previous mapper file is the following:

package net.fm.server.mapper.app;

import net.fm.server.model.app.User;
import org.apache.ibatis.annotations.Mapper;
import org.apache.ibatis.annotations.Param;

import java.util.List;

@Mapper
public interface UserMapper {
    User findById(@Param("id") String id);
    User findByEmail(@Param("email") String email);
    User findByUsername(@Param("username") String username);

    List findAuthoritiesByUserId(@Param("userId") String userId);
}

As you can see, you have to annotate the interface with @Mapper, match the interface name and package with the namespace attribute of the mapper tag inside the XML and the methods name with the id attribute of each query block. Then you can map the method parameters to the query parameters usin the @Param annotation and match the method return type with the resultType of the query blocks of the mapper file.

That's it, you can then autowire (aka Inject) the mapper interface in your Services and Components and you are ready to invoke your queries in pure Java code while forgetting the details of SQL Sessions, connections and so on.

Here is an example service using the UserMapper interface, as you can see the use of the database becomes transparent to the application business logic.

@Service
public class MyServiceImpl implements MyService {
    private final UserMapper userMapper;
    
    public MyServiceImpl(UserMapper userMapper) {
    	this.userMapper = userMapper;
    }
    
    @Override
    public User findUserByEmail(String userEmail) {
    	return userMapper.findByEmail(userEmail);
    }
}

Apache AGE with MyBatis mappers

The AGE extension adds the power of Graph Databases to PostgreSQL and introduces a new datatype called agtype that is a superset of JSONB. Internally the graph structures are stored inside a schema with the name of the graph and a bunch of tables with the ids and labels of edges and vertices, with a lot of agtype columns containing the additional information like the custom properties of edges and vertices. Even the return type of every query is in fact of type agtype.

This means that to use an AGE Cypher query inside of MyBatis we first need to find a way to map parameters and results to and from agtype.

Let's start with the output parameters, that are the results of the graph query. As discussed, every output in AGE will have type agtype but since it is a superset of JSON, it means that we are getting a JSON string as our output type.

Take the following Cypher query as an example, where Item is the label of our nodes and each node has a set of properties, mapped as a dynamic agtype/JSON object. The output will be a list of records with the ids of the nodes labelled with Item and the properties of each

SELECT * FROM cypher('${schema}', $$
	MATCH (i:Item)
	RETURN id(i), properties(i)
$$) AS (id agtype, properties agtype)

The example result is the following:

id	properties
844424930131985	{"code": "1001", "name": "Item 1", "code2": "P1001"}
844424930131986	{"code": "1002", "name": "Item 2", "code2": "P1002"}
844424930131987	{"code": "1003", "name": "Item 3", "code2": "P1003"}

Where the id field is a long value and the properties field is a JSON string.

To help us map from the internal representation to the Java object that should contain it, MyBatis uses TypeHandlers that are objects that map to and from parameters inside Prepared Statements.

To play with that query I created the following MyBatis Mapper interface:

package net.fm.server.mapper.graph;

import net.fm.server.datatype.graph.GraphItem;
import net.fm.server.datatype.graph.ItemProperties;
import org.apache.ibatis.annotations.Mapper;
import org.apache.ibatis.annotations.Param;

import java.util.List;

@Mapper
public interface ItemMapper {

    List> findItems();

    List> findItem(@Param("code") String code);

}

The expected result of the findItems query is List>. The List is handled transparently by MyBatis when there is more than one result in the output. The inner type, that is the GraphItem, is a simple generic bean that holds the id of the node and a generic properties object that map to the custom properties of the node or relationship

package net.fm.server.datatype.graph;

public class GraphItem {

    private long id;
    private T properties;

    public long getId() {
        return id;
    }

    public void setId(long id) {
        this.id = id;
    }

    public T getProperties() {
        return properties;
    }

    public void setProperties(T properties) {
        this.properties = properties;
    }
}

The properties object, in this example, is represented by the ItemProperties type that is another simple JavaBean that maps the output of the properties JSON.

package net.fm.server.datatype.graph;

public class ItemProperties {

    private String code;
    private String code2;	
    private String name;

    public String getCode() {
        return code;
    }

    public void setCode(String code) {
        this.code = code;
    }

    public String getCode2() {
        return code2;
    }

    public void setCode2(String code2) {
        this.code2 = code2;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }
}

To correctly map the query and the results we need the XML mapper file that is the following:

In this example you can see that at the top of the file there is a new tag, the resultMap tag. This is another powerful feature of MyBatis that allows us to specify the details of how we want the output to be mapped to the Java counterpart. In our case we have a GraphItem that has two properties:

the id property mapped with the tag (this is an optimization for caching reasons)
the properties property that is mapped via the tag to an ItemProperties object through a custom JsonTypeHandler class.

The latter is where the "magic" happens, meaning theat when the MyBatis gets the results of the query, for each record it invokes the type handler to map the JSON structure to our ItemProperties class and then sets it and the id inside the corresponding properties of the GraphItem object, then it collects all the results inside a list and returns them, reflecting the exact signature of our mapper method.

The custom JsonTypeHandler relies on the Jackson library to convert to and from JSON, and is defined as follows:

package net.fm.server.mapper.typehandler;

import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.SerializationFeature;
import org.apache.ibatis.type.BaseTypeHandler;
import org.apache.ibatis.type.JdbcType;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.sql.CallableStatement;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Map;

public class JsonTypeHandler extends BaseTypeHandler {
    private static final Logger logger = LoggerFactory.getLogger(JsonTypeHandler.class);
    private static final ThreadLocal objectMapper = ThreadLocal.withInitial(() -> {
        ObjectMapper objectMapper = new ObjectMapper();
        objectMapper.configure(SerializationFeature.WRITE_DATES_AS_TIMESTAMPS, false);
        return objectMapper;
    });
    private final Class type;

    public JsonTypeHandler(Class type) {this.type = type;}
    public JsonTypeHandler() {this.type = (Class) new Object().getClass();}

    @Override
    public void setNonNullParameter(PreparedStatement preparedStatement, int i, T t, JdbcType jdbcType) throws SQLException {
        try {
            preparedStatement.setString(i, objectMapper.get().writeValueAsString(t));
        } catch (JsonProcessingException e) {
            throw new SQLException(e);
        }
    }

    @Override
    public T getNullableResult(ResultSet resultSet, String columnName) throws SQLException {
        String jsonString = resultSet.getString(columnName);
        return readFromString(jsonString);
    }

    @Override
    public T getNullableResult(ResultSet resultSet, int columnIdx) throws SQLException {
        String jsonString = resultSet.getString(columnIdx);
        return readFromString(jsonString);
    }

    @Override
    public T getNullableResult(CallableStatement callableStatement, int columnIdx) throws SQLException {
        String jsonString = callableStatement.getString(columnIdx);
        return readFromString(jsonString);
    }

    private T readFromString(String jsonString) throws SQLException {
        if (jsonString == null) {
            return null;
        }
        try {
            if (type == null) {
                return (T) objectMapper.get().readValue(jsonString, Map.class);
            } else if (type.equals(String.class)) {
                return (T) jsonString.replaceAll("^\"|\"$", "");
            } else {
                return objectMapper.get().readValue(jsonString, type);
            }
        } catch (Exception e) {
            logger.error("Error converting JSON value", e);
            return null;
        }
    }
}

This allows the mapping of the JSON object to our Java counterpart, skipping properties that we don't need or know. In case we need to map all the properties we can use a Java Map object that is the most generic way to represent a JSON object in Java.

MyBatis parameters in Cypher queries

If you noticed, inside the mapper there is also another method, findItem that takes a code parameter to look for a specific item inside the graph.

Unfortunately, at the time of writing, AGE doesn't fully support Prepared Statement's parameter mapping, so we need a workaround to let MyBatis pass the parameter value to the query.

This can be done using the text replacement feature of MyBatis, that is the ${param} representation (instead of the #{param} one), but it must be used wisely since it is a plain replacement of the value, in fact I had to wrap the parameter with the single quotation mark ('${code}'). This can cause SQL Injection problems if the value is not correctly handled and escaped so we need to address it.

In the next article we will look at how to solve this issue with parameters mapping and how to pass complex objects to the CREATE and other statements to build more useful Cypher queries using MyBatis. The Cypher syntax is in fact slightly different from JSON when you need to represent the internal properties of a node (it doesn't wrap the property names in double quotes), so we need a different component to translate a complex object to the corresponding Cypher representation.

Going multi-model with PostgreSQL and Apache AGE: experimenting with Graph Databases

Fabio Marini — Sun, 27 Feb 2022 18:24:02 GMT

The Database landscape

When talking about Databases, for the majority of people the first thought goes to classic Relational Databases, or the SQL Databases as they are often referred to. In these Databases data is stored in rows and columns inside Tables that could be related via foreign keys, that are one or more columns that link back to the primary key (aka the ID) of the rows in another table.

The Relational Databases have been the norm for the last fifty years almost, but in recent years other kind of Databases have emerged to tackle with specific use cases in which the relational model could show its limits (mainly with performance and scalability).

In the last fifteen years many different type of Databases have been created in the form of Columnar Databases (like AWS Dynamo DB or Apache Cassandra for example), Document Databases (MongoDB or CouchDB) and Graph Databases (Neo4J).

Each of these new forms of Database have been created to better suit the needs of specific application domains and to increase the performance and scalability of traditional relational systems.

Why a Graph Database

Relational database are good for modeling pretty much every application domain if we talk about storing the data and being able to query it. But there are contexts in which the relational model is not the best choice in terms of performance or complexity. These are the contexts in which the relationships have the same or even greater meaning that the entities themselves.

Let's borrow an example from the book Graph Databases 2nd edition (you can grab a copy of it from the Neo4J website) about the context of a datacenter domain.

A Datacenter ER example model

This is still a pretty simple domain context but it has a sufficient number of entities and relationships to start making the queries on a traditional datbase big and slow.

Here is a possible modeling of the keys and relations on the previous ER model

The tables an relationships for the datacenter model

Now let's try to imagine a query that looks for the users impacted by the failure of one of the components of the infrastructure (Server, Load Balancer, Database and so on). You could end up with a monster query with many JOINs or with a lot of small queries each looking to a particular asset inside the model. Of course we will be able to get the answer to our question, but probably performance could become an issue, especially then the number of assets starts to grow.

Now let's try to model the previous domain as a Graph with assets and relations between them.

Example graph model for the Datacenter context

The first thing you can notice is that now relationships have become "first class citizens" inside the model, meaning that now every arrow has a specific meaning like HOSTED_BY, RUNS_ON, USES, USER_OF and so on. This allows the model to be more expressive just by looking to the diagram. In the relational model you only know that a table is linked to another but the meaning of that relationship could be difficult to get.

If now for example user User 3 of the previous image reports a problem, we can query the graph to find any possible asset involved by issuing the following query:

MATCH (user:User)-[*1..5]-(asset:Asset)
WHERE user.name = 'User 3' AND asset.status = 'down'
RETURN DISTINCT asset

Wait... what kind of query is that?!? It's a query language called Cypher and introduced by Neo4J that is designed to query graphs in an easy way. It's defined as "some sort of ASCIIart to navigate the graph".

The previous query translated in plain english says the following: Look for any User named 'User 3' connected by one to five degrees of relationship with an Asset whose state is 'down' then return the list of distinct assets found.

If you think to the same in SQL... is it even possible? Probabily yes, but not in such a short and expressive way. Now you should be starting to get the point of a graph model.

Neo4J

We previously talked about Neo4J as a graph database and for a good reason, it is the most widely known graph database on the market and it has been around for more than ten years now, so it is pretty solid and battle tested.

But if I have to find something I don't like in Neo4J, it's Graph Only, so there is no "smooth transition" from a traditional relational Database, and if you have some piece of software already written you have to rewrite everything to use the new data model. In addition to this, Neo4J is written in Java, and even if I use Java everyday for application development, I'm biased to think that a Database written in Java could be a bit memory hungry.

Don't get me wrong, Neo4J is still the best option if you need to go for a graph only data model, it has a lot of features and if you need support there is enterprise support from the company that develops it.

An hybrid approach

I want to introduce the power of a graph database inside my application, but as everything, there is no one-size-fits-all, and graph database are no exception, they are good for graph traversal but they are not the best option for every possible scenario.

The requirements of an application can span widely sometimes and it could be useful to have the best of all worlds when talking about database, so one part of the data model as a graph, another as a group of tables and maybe another inside a document collection. The problem is that I don't like the idea of having many different database engines in the same application.

Could there be a solution to that?

PostgreSQL and Apache AGE

PostgreSQL is a well known name in the database landscape and for many good reasons. To name a few, it is one of the most powerful Open Source databases available and it is rock solid and performant at the same time. There are examples of PostgreSQL databases of tens of TeraBytes.

One of the nice aspects of PostgreSQL is that it allows the use of extensions to augment the capabilities of the database itself, and there are a lot of powerful extensions that allow PostgreSQL to span many different contexts. Example of extensions are:

TimescaleDB : an extension to deal with time series and windowing
PostGIS : an extension to work with Geographical Information Systems (GIS), that allows to work with coordinates an geospatial calculations
ZomboDB : an extension to enhance full text search capabilities of PostgreSQL while integrating with Elasticsearch

Apache AGE (in which AGE stands for "A Graph Extension") is another of these extensions and it's a sort of bridge between worlds of relational, document and graph databases, leveraging the stability and solidity of the PostgreSQL database while enhancing its capabilities with the power of a Graph Database.

AGE is still in Apache Incubator, meaning it's still in the early phases of development (the current version i 0.7), but it is based on the work done by BitNine's AgensGraph even if following a different approach. AgensGraph is a commercial graph database built on a fork of PostgreSQL 10, while Apache AGE is built as an extension to the standard PostgreSQL (at the moment of writing it supports version 11, but support for newer releases is expected in 2022) so it can be used as an "upgrade" to the standard features without loosing the rest.

The good about this approach, is that you can mix the features of a relational model with the ones of a graph databasee, meaning that you can mix together SQL queries on tables with OpenCyhpher queries on graphs.

If you think that PostgreSQL also has support for JSON and JSONB datatypes so that you can also have part of the features of a Document Database, you can understand why this hybrid solution could be a good option.

Let's see how we can start playing with Apache AGE with Docker, then in the next post we will see some of the features of OpenCyhper and the integration with the standard relational world.

Getting started with Apache AGE using Docker

The fastest way to get started with Apache AGE is to run it in a prebuilt Docker container, and you can do it with the following command:

docker run -it -e POSTGRES_PASSWORD={MyPassword} -p {HostPort}:5432 sorrell/apache-age

This is the "official" Docker image but at the time of writing it is three months old, so it doesn't have all the recently added features and bug fixes. To test all of the features let's build a new image using the latest sources from the GitHub repo.

Building the latest release from sources using Docker

GitHub - apache/incubator-age: Graph database optimized for fast analysis and real-time data processing. It is provided as an extension to PostgreSQL.

GitHubapache

You can think that building a Docker image from the sources could be a complex process, but in reality it is very easy, just clone the repository on your machine with the following commands:

git clone https://github.com/apache/incubator-age.git
cd incubator-age

and tell Docker to build the image:

docker build -t apache/age .

After a couple of minutes (depending on the speed of your network connection and the power of your computer) the process should complete.

Launching an AGE instance with Docker

To start an instance of the newly built image with PostgreSQL 11 and the AGE extension you can run the following command:

docker run -it -e POSTGRES_PASSWORD=mypassword -p 5432:5432 apache/age

As soon as the container starts you are ready to connect to the database and start playing with SQL and OpenCypher.

If you prefer to have some graphical representation of the output of your queries, you can also experiment with the AGE Viewer, a node.js application (in early stages of development) that allows you to query and navigate the results in a graphical or tabular way directly inside the browser.

GitHub - apache/incubator-age-viewer: Graph database optimized for fast analysis and real-time data processing. It is provided as an extension to PostgreSQL.

Graph database optimized for fast analysis and real-time data processing. It is provided as an extension to PostgreSQL. - GitHub - apache/incubator-age-viewer: Graph database optimized for fast ana...

GitHubapache

Connecting to the database

Another nice aspect of using Apache AGE is that you are daling with a standard PostgreSQL database, so to connect to it you can use the standard psql command line client

psql -h 0.0.0.0 -p 5432 -U postgres

or your preferred SQL client, like DBeaver.

In case it's not automatically loaded at startup, you can enable the AGE extension by issuing the following SQL commands:

CREATE EXTENSION IF NOT EXISTS age;
LOAD 'age';
SET search_path = ag_catalog, "$user", public;

Now you are ready to start using your new graph database.

Create a Graph

The first thing you need to do to start playing with a grap model is... to create a graph. You can do it with the create_graph('GRAPH_NAME') function:

SELECT create_graph('my_graph_name');

Once the graph is created you can execute Cypher commands using the cypher('GRAPH_NAME', QUERY) function like the following:

SELECT * from cypher('my_graph_name', $$
  CypherQuery
$$) as (a agtype);

For example to create a node (or Vertex in the graph language) you can run:

SELECT * from cypher('my_graph_name', $$
  CREATE (a:User { firstName: 'Fabio', lastName: 'Marini'})
  RETURN a
$$) as (a agtype);

This creates a vertex with a label User and the firstName and lastName properties. Another good aspect of a graph database is that both Vertexes and Edges can contain properties that you can dynamically add and query.

We have just scratched the surface of what we can do with a graph database, you can have a look at the AGE's documentation to understand the features and the syntax of the OpenCypher query language

Apache AGE’s documentation — Apache AGE master documentation

Next steps

In the next article we will see how to maipulate the graph model and how to query it.

Make your site faster and more secure with AWS CloudFront CDN and HTTPS

Fabio Marini — Sat, 19 Feb 2022 22:03:34 GMT

In previous post we saw the basics of publishing a static website using AWS S3, but we ended up with a couple of raw edges that need smoothing for a good user experience.

The two main issues we have right now are the following:

the website is served from the public S3 URL
the pages can be viewed only via plain HTTP connection without encryption (no HTTPS)

We need to solve both of them to get the most out of our site, otherwise all the efforts spent into creating the site and the content inside it will be useless.

Moving to your custom domain

First of all to better promote your website you definitely need your own domain. Serving a website as a subdomain of some cloud provider won't help you much with SEO, so choose your own domain name and buy it from some provider, just search on Google for "buy domain" and you will be flooded by options like GoDaddy or NameCheap. You can buy a domain for a few dollars per year but choose a good provider with fast DNS servers, otherwise you could still end with a slow connection, and Google doesn't like slow sites, nor do users.

You don't need to buy any hosting service if you are already using AWS S3, so just buy the DNS management service for your domain.

If you prefer to go "all in" with AWS you can even use Amazon's Route 53 DNS Service to buy the domain and manage the DNS records. It should cost you less than 10 USD per year plus the cost of the domain. The latter depends on different factors like the first level domain you choose (.com, .org, .blog and so on).

Amazon Route 53 - Amazon Web Services

Amazon Web Services, Inc.

There is also a cost per query that you pay for, but it is so low (about 0,40 USD per million of queries) that at least to get started with a new site it should add up to a couple of dollars per year at most.

Adding HTTPS

Now that you have your shiny new personal domain name, you need to grab an SSL certificate to enable HTTPS, and for this you can use another AWS service, that is AWS Certificate Manager.

AWS Certificate Manager - Amazon Web Services (AWS)

AWS Certificate Manager from Amazon Web Services (AWS)

Amazon Web Services, Inc.

Using this service is a good option because it is of course well integrated in the AWS ecosystem and it is also free, which is not bad at all.

To create a new certificate open the AWS Console and select the Certificate Manager service from the main menu (or search it in the top bar).

AWS Certificate Manager console

In the right panel, choose the Request a certificate button to start the request wizard.

Request certificate procedure - Step 1

Select the Request a public certificate and clik Next

Request certificate procedure - Step 2

Insert the fully qualified domain name you bought before, like fabiomarini.net in my case, then click the Add another name to this certificate and add the same domain again with the www prefix (like www.fabiomarini.net), then leave the other options to their default values and click the Request orange button at the bottom.

Once the request process is completed, you have to go back to the AWS Certificate Manager console, select the new certificate you just created and scroll down to the Domains panel. Here, in the third column of the table you will find the validation entries that you need to add to your DNS to get the validation for the certificate.

Request certificate procedure - Step 3

The verfication records are CNAME entries that you need to register for every domain inside the certificate to confirm that you are the owner of that domain name.

Once you have your validation entries configured in your DNS it will usually take a few minutes for the new records to propagate on the Internet and for AWS to get them. Once everything is ok you will get a Success in the status column for every domain in the Domains panel.

Now you are ready to complete the setup with CloudFront CDN using the S3 bucket you previously created as a source for your website, and the new domain protected by HTTPS.

CloudFront Setup

To setup CloudFront, from the AWS Management Console, select CloudFront from the menu or search it from the search bar, then from the Distributions submenu, click the Create distribution button.

Cloudfront Create distribution - Step 1

In the Origin domain input field paste the public domain of your S3 bucket. Be careful not to select the S3 bucket from the list otherwise you will not be able to access your website. Instead go back to your S3 bucket and copy the URL from the Static website hosting panel at the bottom of the page in the Properties tab.

In the Origin path - optional field you can add /index.html or leave it blank, then in the Name field enter a name that remembers you the purpose of this distribution.

Scroll down a bit and in the Default cache behavior panel, under the Viewer protocol policy select Redirect HTTP to HTTPS. This is not strictly needed but it will drive all users to use HTTPS instead of HTTP, and this is a best practice.

Cloudfront Create distribution - Step 2

Scroll down a little bit again and look for the Custom SSL certificate - optional field, then click it and select the previously created certificate to enable HTTPS for your custom domain.

Cloudfront Create distribution - Step 3

Confirm the creation of the new distribution with the Create distribution button at the bottom of the page to get back to the main CloudFront console. Wait for the new distribution to complete the creation process and open it by clicking on its name to see the details.

In the General tab, the top most panel is the Details panel, in which you can find the domain address of the distribution. Copy it and add a new CNAME record in your DNS for the main domain and the www subdomain pointing to this address.

In case of the second level domain (the one without www), most DNS provider don't allow you to add a CNAME record, so you need to add one or more A records pointing to some IP address.

To get the IP addresses of your distribution open a shell on your computer and use dig or ping. Dig is better because it let you look into all the DNS records of the CloudFront distribution, while ping will only get one address at a time, but calling it a few times will give you all of the IP address of the distribution. Usually every CloudFront distribution should have at least 3 IP addresses so try to get them all.

Relying on the IP addresses of the CloudFront distribution is not the most reliable option because they could change in case of problems with the AWS datacenters, but they are stable enough to be used for non critical websites.

Checking the results

If you got to this point you should now be able to have your website served by the CloudFront CDN from the AWS endpoint closer to your location, so it should have the lowest latency.

You now have your static Ghost site served from AWS S3 through CloudFront using your custom domain and protected by HTTPS. And it should be fast, I mean, very fast.

To check this, try the Google Lighthouse benchmark after completing all the steps, but just wait a few minutes after completing the creation of the CloudFront distribution to let AWS distribute the content across its datacenters and get the best performance.

Serving a static website with AWS

Fabio Marini — Sat, 12 Feb 2022 18:37:41 GMT

In previous posts we saw how to use Ghost to build a personal blog or website and how to make a static version of it to make it faster and almost free to publish online. In this article we will have a look at how to use Amazon Web Services (aka AWS) to publish the site.

A brief introduction to AWS

In case you have never heard about AWS, it is the Amazon Cloud Computing branch, or a sort of supermarket for cloud computing services.

It started back in 2006 with the idea of selling internal experience that Amazon grew by managing their online store and datacenters while serving an always growing customer base.

It was one of the first if not the first modern Infrastructure as a Service provider and it has grown in the last fifteen years with an infinite catalog of services, covering almost every computing need you can have, from virtual servers to databases, storage, network, internet of things, artificial intelligence, and so on.

One of the advantages and disadvantages of these cloud services is pricing. AWS in particular has an aggressive and flexible pricing policy for some of its services that starts with on-demand pricing billed by the second of usage, to saving plans based on reservations if you know that you will be using one or more specific services for at least one year.

The dark side of cloud service pricing is that they often depends on so many variables that knowing the exact cost of the entire infrastructure can be compared to rocket science. But you have a near real-time view into the daily spending by service, so you can immediately know if you are misusing some of the resources or if you can use reservations to save more.

When talking about the cloud bill, Amazon is also a good advisor, because it ofter suggests you the best way to save money on your resource. I know it could look strange that a company that makes money on the resources you use suggests you how to spend less, but the true story is that by having every customer use the "right" amount of resources it can also serve more customers. So the money you save are the ones another customer could spend without Amazon having to spend more in infrastructure.

That said, to serve a static site you don't need that much infrastructure, and you can start with the simplest publishing solution, based on AWS Simple Storage Service, or S3.

Simple Storage Service (AWS S3)

S3 is one of the oldest services in AWS catalog, not meaning that is less useful or powerful, but only that is one of the first services offered to the customers together with Elastic Compute Cloud (or EC2).

S3 belongs to the category of Object Storage, that is a sort of cloud hard drive but you access it only via APIs and URLs. You can think of objects like files and folders on your computer, in fact they are very similar in the way they are used. The main difference is that you cannot access an S3 "folder" or bucket as it is named, like a real directory on your local hard drive. To do that you can use tools like Cyberduck that let you browse and edit objects like files while hiding the complexity of invoking the S3 APIs.

The good reasons to use S3 to serve a static website are mainly two:

the cost of the service
the speed and ease of access

Cost of Amazon S3

Talking about the cost of the S3 service, as previously discussed, there are multiple variables to consider to calculate the total cost, and they are mainly the amount of space used, the number of requests and the amount of traffic generated.

Storage cost

Starting with the cost of storage, the standard tier of S3 costs $ 0.023 per GB per month (at least while writing and in the eu-west-1 region, located in Ireland), and it is billed calculating the real daily used space. This means for example that if you use 10 GB for one day and then 1 GB for the rest of the month, you are paying only about $ 0.0307 for the month. Consider that your website will only take a few MegaBytes, so the space used is not an issue.

S3 Storage costs in eu-west-1 region

Requests cost

In addition to the storage cost, you have to calculate the amount of requests (the reads and writes to every object) that your website will generate.
Also for this cost the price is very low as you can see from the following screen of the pricing page.

S3 Requests cost in eu-west-1 region

While the site is new and you have a few users also this cost is not an issue.

Data transfer cost

The third main variable to keep in mind when calculating the price of S3 storage is the amount of traffic generated. In this case all the inbound traffic (the uploads to S3) are free, while the outbound traffic has a price per GB/month.

S3 Data Transfer cost in eu-west-1 region

The free tier

Free Cloud Computing Services - AWS Free Tier

Gain hands-on experience with the AWS platform, products, and services for free with the AWS Free Tier offerings. Browse 100 offerings for AWS free tier services.

Amazon Web Services, Inc.

To get started with its services, AWS gives all new customers a Free Tier that allows them to test the services for twelve months. This Free Tier also includes S3 so if you are just starting out with an AWS account you are likely not spending a cent for the entire first year.

AWS S3 Free Tier features

How to publish a static site to S3

To publish a static website to AWS S3 you have to access the Management Console and log into your account. If you don't have one you can create it on the fly from the login page.

The AWS Console login page

After signing into the account you can search the S3 service form the search bar at the top or browse the list of available services in the Services menu.

Accessing the S3 service from the AWS console

After accessing the S3 console, it is possible to create the new bucket that will host the website. Give it a name and choose the region in which you want to create the bucket. It should be the closest to your location to reduce the latency but also the closest to your desired users for the same reason.

So if you want to serve a website for US users you should choose one of the US regions, if you want to better serve European users you should choose an European region an so on. This is the base rule, but in the next article we will see how to serve globally with the best performance using AWS CloudFront Content Delivery Network.

Creating a bucket

In the S3 console, click the orange Create bucket button at the top, then, in the create bucket page, after you specified the name and chose the region, deselect the "Block Public Access settings for this bucket" option and confirm with the orange Create bucket button at the bottom of the page to complete the creation.

From the S3 console screen now click on the name of the newly created bucket to access additional settings, and select the Permissions tab, scroll to the Access control list (ACL) panel and click edit, then select the "Everyone (public access)" under the "Objects" column.

Then select the Properties tab and scroll down to the end of the page. There you find the Static website hosting settings, click Edit.

In the page, enable the Static website hosting by clicking on the Enable option and specify the index.html value inside the Index document field, then click the Save changes button at the bottom.

Enable static website hosting options

Static website hosting panel

Now you can see the public URL of your static website. If you click it a new tab opens in your browser showing you the content of the site that is still empty. To load the content and publish the website go back to the Objects tab of the bucket settings page and click the Upload button.

You can click the Add files or Add folders button to upload new files but you can also drag the files from your local folder and drop them inside the web page to upload them.

To start the upload press the Upload button at the bottom of the page and after completing the file transfer you should be able to see your new website from the public URL in the Properties tab.

If you have any issues with the site just published you can have a look at the AWS documentation page related to S3 website hosting.

Hosting a static website using Amazon S3 - Amazon Simple Storage Service

Host a static website on Amazon S3 by configuring your bucket for website hosting and then uploading your content to the bucket.

Amazon Simple Storage Service

The website should now be published and available to your users, but there are still a few raw edges to smooth. At the moment for example, the public URL is still the AWS one, while you should serve your site from your own domain.

In the next post we will see how to serve the site with HTTPS encryption and with the improved performance given by CloudFront, all almost for free.

Book Review: Show your work!

Fabio Marini — Sun, 06 Feb 2022 16:44:12 GMT

This book from Austin Kleon revolves around a simple concept: whatever work you are doing, start showing it with the world.

This seems like a controversial argument, because many of us prefer to keep our work concealed from the public, fearing that others will steal our work and our knowledge, but in reality the opposite is often true.

Even if you are a "master" in something, it doesn't mean that just by teching what you do to others, they are able to steal your work or beat you at your own game.

If a renowed chef makes cooking tutorials and show how he or she makes his masterpieces, it doesn't mean that whoever watches that tutorial is automatically a chef master himself or herself. It takes years of practice an errors before you can replicate the success of others and rarely (maybe never) success happens overnight.

Starting from this advice, the author invites us to share our daily work or passion, because by sharing freely with the world you can start making followers that share our same passion or do our same work, and this can grow overtime and even create links in real life instead of just virtual connections.

Another advice from the author is to be cautious with sharing your work or thoughts only via social networks, because the followers you get from that platforms is never really yours. They are ofter users lent from the platform to you based on algorithms that could change overnight depending on the interests of the owning company, to stimulate user attention and attract more money.

The real option is to own your content, buying your own domain name and starting a blog or website that is yours and represents your brand and your thoughts. This, coupled with the use of an old but always effective method to get your user up to date on important news. That old piece of technology is the newsletter, because one way or another, if you subscribe to it chances are that you are really interested in getting updates on the topic.

You should not get feared by the idea of sharing something not good enough. Everybody when starting will probably make mediocre content, but with time and practice the quality will improve and you will grow more followers, and this in the long time will help people find you and it could also become a full time job, like it is for many YouTubers and content creators around the world.

One thing to be careful about is feedback, because Internet (and the real world) is full of haters and dumb people ready to insult you and to show how stupid a human being could be. But this should not stop you, you must grow the ability to filter evil comments from real ones from people really interested in commenting and continuing the conversation.

In the end Show Your Work! is a relly nice small book that you should read if you are interested in becoming a content creator or you just want to share your passion with others. It's a quick read (about 130 pages) and it's packed with valuable advices and quotes that are useful to guide you through the path of of sharing your work or passion online.

Freezing the Ghost site to serve it statically

Fabio Marini — Sat, 05 Feb 2022 17:23:01 GMT

In previous post we saw how to build a blog or small website using Ghost as a content management platform of choice. In this article we will see how to build a static version of the entire site to prepare it for publishing.

Why a static site?

One of the first questions you can ask is: why making a static version if you already have Ghost serving the site?
Good question, let me explain the pros and cons of it, starting with the latter

CONS

need to rebuild the static version to release new content - still good for "infrequent" updates
no dynamic content - but it is possible to have client side dynamic integrations
no interactive features like search or membership area

PROS

the site is fast (I mean, really fast) - you are serving HTML files - and this helps with Google ranking
no need for powerful servers - no need for a Database or application servers, just a plain web server
no attack surface - there are no backends to compromise, just a folder with HTML files and images
nearly zero hosting fees - we'll see how to use AWS S3 to publish the site almost for free

If the cons aren't a stopper for you, a static site is still a very good option, in fact there are a lot of tools to generate static sites like Hugo, Zola, Jekyll, Eleventy (aka 11ty) and so on.

I still prefer the hybrid approach and the simplicity of Ghost to edit my articles instead of using Markdown files directly, so lets see how to generate a static version of a Ghost site and then how to publish it to AWS S3 and benefit from CloudFront to serve it as fast as possible for every user around the world.

The speed of a website is very important nowdays because search engines like Google decided to promote faster websites and have them ranked higher in the results, and this is crucial to the success of the site iself because "it only exists if it fits into page one of the search results".

Page two in Google search results is something like a mith, almost nobody has ever seen it. Did you?

Ghost Static Site Generator

To generate the static content from the Ghost site, being it running locally or on another location, I use Ghost Static Site Generator.

GitHub - Fried-Chicken/ghost-static-site-generator: Generate a static site from ghost and deploy using a CI

Generate a static site from ghost and deploy using a CI - GitHub - Fried-Chicken/ghost-static-site-generator: Generate a static site from ghost and deploy using a CI

GitHubFried-Chicken

Ghost Static Site Generator (or gssg that is the shell command name) is an open source tool built with Node.js around the wget command and works in a pretty simple way: given the URL of the homepage of a Ghost site, it scans the entire content and downloads every page and resource file locally.

It really is a bit more than that because it does a few more things that are mandatory to make the static content work correctly:

downloads every page as an html file
downloads every linked stylesheet (CSS) and javascript connected to the page
downloads the images inside every page and makes them into various optimized resolutions
rewrites the internal links in every page with the target URL

The last step is the most important, since you are not going to serve your site from localhost or some other local address, but from a public domain, and you need to have the links pointing to the correct page or resource in every page relative to that domain.

Installing gssg

To install gssg you need to have wget available locally, if you don't have it you can install it using Homebrew on a Mac ( brew install wget ) or Chocolatey on Windows ( choco install wget ). On a Linux machine wget is usually already installed by default, if missing you can install it with the specific package manager of your distribution, like apt-get, yum, dnf or others.

Once you have wget and a recent Node.js release installed and working locally, to install gssg you can use the following command:

npm install -g ghost-static-site-generator

After the installation completes you should have the gssg command available, as soon as you run it you will see the output of the tool starting to download from the default location (out of the box it is https://www.fabiomarini.net, or the default local Ghost URL) and the resulting files stored inside a static sub folder of the current path.

Command line parameters

You can customize a few details when running the gssg command to tell the tool which is the output domain and other settings.

Set the output domain, if you want for example to publish on https://www.mysite.com:

gssg --productionDomain http://www.mysite.com

Set the source domain, if you are not hosting the source Ghost locally, you need to tell gssg which is the source address:

gssg --sourceDomain http://myhiddenghostinstall:2368

Change the output directory. If you don't want your output to be stored inside the local ./static directory, you can change it with the option:

$ gssg --dest output-dir

Preview the site after it has been generated:

gssg --preview

There are another couple of options like silent mode (no shell output) --silent or fail on error --fail-on-error, you can read more about the details on the README of the project on GitHub.

After setting the correct options for your needs you should be able to get a static version of your Ghost site running locally and to preview it using the --preview option or with a tool like serve(https://github.com/vercel/serve#readme).

When everything is fine you are ready to publish your new statically built Ghost site to your preferred hosting servie. We will see how to do this using AWS S3 and CloudFront to get a super fast site almost for free in the next article.

Getting started with a Ghost site in 2022

Fabio Marini — Sat, 29 Jan 2022 17:47:11 GMT

To get started using Ghost we first need to have access to an instance of it. In this post we are going to have a look to the steps required to install Ghost locally on our machine.
I'm using a Mac so the instructions will be for MacOS, but you can find the specific instructions for Linux or Windows on the Ghost getting started page.

Installation of the prerequisites

Ghost is build using Node.js, so the first thing we need is a Node.js installation. You can install it directly on your machine but I highly suggest using Homebrew, a package manager for installing open source software (and not only that) on MacOS and to use NVM to manage the Node.js installations. So lets start by installing Homebrew.

Homebrew

The Missing Package Manager for macOS (or Linux).

Homebrew

To install Homebrew you just have to open your terminal application and paste the command you find in the homepage of Homebrew.
I prefer using iTerm2 instead of MacOS Terminal, but you can use your preferred terminal application.

❯ /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

The install command for Homebrew

Once the installation is complete, you can start using the brew command to manage the software on your machine.
To check if everything is fine in your installation run:

❯ brew doctor

Have a look to the command output and check if other commands are required to fix the installation, for example installing XCode command line tools, otherwise you should get a confirmation that everything is fine and you can start using Homebrew to install new packages.

Installing NVM and Node.js

GitHub - nvm-sh/nvm: Node Version Manager - POSIX-compliant bash script to manage multiple active node.js versions

Node Version Manager - POSIX-compliant bash script to manage multiple active node.js versions - GitHub - nvm-sh/nvm: Node Version Manager - POSIX-compliant bash script to manage multiple active nod...

GitHubnvm-sh

Node.js

Node.js® is a JavaScript runtime built on Chrome’s V8 JavaScript engine.

Node.jsNode.js

To install a version of Node.js locally, I suggest using NVM (that stands for Node Version Manager) that is a tool that allows you to have multiple versions of node installed and switch from one to the other with a simple command.
To install NVM we can now use our fresh installation of Homebrew with the command:

❯ brew install nvm

after brew does all its magic and installs the tool locally, we can run the following command to list the possible versions of Node to use.

❯ nvm list
        v8.17.0
->     v14.17.5
         system
default -> lts/fermium (-> N/A)
iojs -> N/A (default)
unstable -> N/A (default)
node -> stable (-> v14.17.5) (default)
stable -> 14.17 (-> v14.17.5) (default)
lts/* -> lts/gallium (-> N/A)
lts/argon -> v4.9.1 (-> N/A)
lts/boron -> v6.17.1 (-> N/A)
lts/carbon -> v8.17.0
lts/dubnium -> v10.24.1 (-> N/A)
lts/erbium -> v12.22.8 (-> N/A)
lts/fermium -> v14.18.2 (-> N/A)
lts/gallium -> v16.13.1 (-> N/A)

Listing installed and available node releases

In my case I have two releases managed by NVM itself (v8.17.0 and v14.17.5) and another one directly installed by MacOS. The arrow points to the version currently in use, to change the release in use we can run the following command:

❯ nvm use v14.17.5

❯ nvm use lts/fermium

Below the list of installed releases is the list of available releases that can be installed with codenames related to a specific version. The releases marked as LTS are the ones suggested because are the Long Term Support releases, or the ones that will get updates for longer period of time, so they are safer to use in production.

The release recommanded by Ghost is the v14 release, so looking to the output of the nvm list command we can find this as the stable or as the lts/fermium release. Lets choose the latter and install it with the following command:

❯ nvm install lts/fermium

To complete the installation of the prerequisites, we also need the Node.js Package Manager, that is required to install every node.js application dependencies.
We can install it with the following command:

❯ brew install npm

Ok, now that we have all the required tools we can get started with the Ghost installation.

Installing Ghost

After the installation of NVM and Node.js we can go on and install the Ghost Command line tools (ghost-cli) with the following command

❯ npm install ghost-cli@latest

and then we can finally install our local development version of Ghost

❯ mkdir ghost
❯ cd ghost
❯ ghost install local

Here is the installation steps in a fancy asciinema movie version

After completing these steps you can point your browser to the URL that you find at the end of the installation procedure and start playing with your new Ghost installation.

The Ghost welcome screen after a fresh install

Note: The steps shown in this tutorial install a local development instance of Ghost backed by a SQlite database, if you need a real production installation follow the procedure on the Ghost Getting Started page and install it with a MySQL or MariaDB database and configure the environment settings acconding to your local system.

The technology that runs the site

Fabio Marini — Sun, 23 Jan 2022 22:31:41 GMT

After finding the motivation to build my own website as discussed in the previous post, I had two main areas to focus on:

planning the content to share
choose the technology to use as a foundation for the website

On the content side I started brainstorming some possible ideas for articles and posts, sketched a few concepts for tutorials and projects to show the tools and techniques I use everyday and so on. But before I can share any content I need a platform on which to share it.

On the technology side I know I'm starting from scratch and with an audience to build, so investing money before even knowing if someone could be interested in what I'm creating can be both good and bad at the same time. It could be good because it would push me to create content regularly and to share more ideas. But it could also be bad because rushing, especially in the beginnings when I'm new in this world of content creation, could mean throwing away money (even if we are talking about a risible amount) and creating bad content. Which would lead to a dead end very quickly.

I have a few requirements to focus on. To start I'm looking for:

a content plaform easy to maintain so that I can spend more time creating and less doing maintenance on the platform itself
a tool with an easy to use editor, possibly a WYSIWYG one, to help me visualize the output while writing
a platform that can start small and scale up or out to the need
something based on a recent technological stack and that is of course mobile first and that can be customized and extended
but most of all, a techonogy to create the fastest possible webiste, since I don't like slow things

Which platform for a new website in 2022?

With these requirements in mind I started looking around for possible candidates, considering both cloud services and on premise solutions, without discarding the possibility to build something myself as a side project.

After a brief investigation online I restricted the choice to the following three options:

Wordpress - it is as old as the blog concept itself, written in PHP, it is a sort of de facto standard for building small sites and blogs, and it is the straight choice for beginners, or at least it was a few years ago. It is available as a SaaS solution or self hosted, and there are even plenty of hosting services that support Wordpress at almost every price range.

WordPress.com: Fast, Secure Managed WordPress Hosting

Create a free website or build a blog with ease on WordPress.com. Dozens of free, customizable, mobile-ready designs and themes. Free hosting and support.

WordPress.com

Medium - an interesting SaaS solution that allows to start for free and to grow with membership options. A lot of blogs both personal and corporate ones are hosted on Medium and it can be an easy platform to start with.

Medium – Where good ideas find you.

Medium is an open platform where readers find dynamic thinking, and where expert and undiscovered voices can share their writing on any topic.

Medium

Ghost - a more recent project, based on Node JS and available in SaaS or self hosted, basically similar to Wordpress in features but more interesting on the technical side as I'm already working with Javascript and node.js tools on a daily basis.

Ghost: Turn your audience into a business

The world’s most popular modern publishing platform for creating a new media platform. Used by Apple, SkyNews, Buffer, OpenAI, and thousands more.

Ghost - The Professional Publishing Platform

Long story short, after a quick comparison and after reading a few reviews of the three options, my choice was Ghost. The cloud service is very interesting and reasonably priced, but... since I'm a Software Engineer I wanted to play with the tool myself and dirt my hands building something on top of it instead of just subscribing to the cloud service and choosing a nice theme.

To be honest there was also a fourth option that was to use a headless CMS to host the content and build the entire frontend on top of it as a React application, but I temporarily discarded it to shorten the time required to build the website (not really discarded because Ghost can also serve as a headless CMS as I will discuss in a moment).

To keep things small but with a bit of customization, I opted to make a static site from the content coming from Ghost, so I had to choose how to transform the Ghost frontend to a static site that is FAST, but also nice and clean. The options were:

Next.js - The React framework to build static and dynamic sites with all the features you can imagine and with impressive speed.
Gatsby - a framework to build static and dynamic sites taking content from a headless CMS or other sources. The resulting site has impressive performance and the optimizations applied to images are among the best possible.
Fried Chicken's Ghost Static Site Generator - a tool that basically crawls the Ghost site and generate a static version of it replacing URLs and optimizing images.

Next.js

Next.js by Vercel - The React Framework

Production grade React applications that scale. The world’s leading companies use Next.js by Vercel to build static and dynamic websites and web applications.

The React Framework

I started playing with Next as my first option since taking content from Ghost and serving on a customized React frontend is pretty straightforward and you can get a working prototype in a matter of hours or even minutes if you already know the framework itself.

The first prototype was working pretty well and it was fast since the build task generated all the static pages upfront, ready to be served by a web server. But there was still something I didn't like: the optimized images plugin requires that the app is served from a running Next instance, so not that static anymore.

This requires a hosting supporting Node.js and makes part of the efforts of building the static version of the content upfront useless. Sure, Vercel (the company behind Next) has a cloud service to host Next (and other frameworks) on an optimized platform with a dedicated CDN (Content Delivery Network) and all the usual things you want for a top quality service, but this means that I also need to serve Ghost on a public service to let the image optimizer get the original content.

Instead of having to serve Ghost alone now I have two different services to manage... red light. This surely is a nice option for a bigger project, maybe for a big client, but for my little tiny newborn website is an overkill.

Gatsby

The Fastest Frontend Framework for Headless CMS’s

Gatsby is a React-based open source framework with performance, scalability and security built-in. Collaborate, build and deploy 1000x faster with Gatsby Cloud.

Gatsby

I already played with Gatsby a few months ago for a proof of concept project of a headless e-commerce platform (more about that on another post), and I loved it, so building a prototype with it should not be that difficult.

I started again with a new Gatsby project, added the Ghost source (there is an official plugin for that) and played a littte bit with the theme. Again, in a matter of a few hours I had a working prototype with a nice looking theme and the content served from Ghost. This time the build also generated every possible optimized version of every image so that it can be served in the most appropriate variant for each client. That means that the same image could be served as a jpg, png, webp or other format depending on the browser the user is using. Sure a top feature to get the best speed results, with the only drawback of taking up more disk space, but this is not a problem for me due to the fact that the site is very small at this stage.

I was quite impressed by the prototype performance and quality but... the internal links inside a Ghost post still refers to the Ghost URL, and the HTML structure of the content is deeply linked to the CSS classes of the Ghost content blocks. To be clear this is not a bug or a Gatsby defect (and also Next has the same problem), it is related to the Ghost API and its data structure.

The URL problem is not a difficult one to solve, since there is another plugin to handle this and rewrite to the correct addresses, but the HTML structure and CSS classes require me to rewrite the entire Ghost theme inside Gatsby. This is a one-time task, ok, but maybe it can create problems with future updates of Ghost and it also has the side effect of generating two different outputs of the same content, the Ghost preview based on the Ghost theme in use, and the final output based on the website styles.

We were so close to the target...

Using Gatsby is still an option but it requires more effort than expected for now, so yellow light, lets see if there is another option left.

Fried Chicken's Ghost Static Site Generator

GitHub - Fried-Chicken/ghost-static-site-generator: Generate a static site from ghost and deploy using a CI

Generate a static site from ghost and deploy using a CI - GitHub - Fried-Chicken/ghost-static-site-generator: Generate a static site from ghost and deploy using a CI

GitHubFried-Chicken

Ok, so lets try to keep it even simple and start with the easiest possible solution, we'll build a static version of the Ghost site using the Ghost Static Site Generator (gssg) open source tool.

gssg is a node.js command line utility that uses wget to save all the pages it can reach from the home URL of your Ghost site and process every page with some filters to replace urls and optimize images and dependencies. In its simplicity it does what it promises, that is, it generates a static version of the Ghost site inside a local folder that can be uploaded to your preferred hosting service and be served without external dependencies.

Since it does what I need, to get started with this site it is the easiest tool to use and the quickest solution to my needs, so... green light on it!

Next steps

Now that I have all that I need to get started, the next step is to start making content to share with you, so see you soon with the next article and if you want to get in touch with me, have a look at the contact page on how to contact me.