RabbitMQ plugins in Elixir

As RabbitMQ gradually adopts Elixir (e.g. for the next generation of its CLI tools), it's natural to want to write plugins in Elixir too. There is a blog post from 2013 on that topic, and most of it is still relevant, except for the build instructions: RabbitMQ's build system has since been completely revamped. This post fills that gap.

The RabbitMQ plugin development guide suggests that the easiest way to start a new plugin is to copy the simplest existing one, rabbitmq_metronome. I've re-implemented it in Elixir at https://github.com/binarin/rabbitmq_metronome_elixir. You can just fork it and start hacking (note that it only works with what is going to be the 3.7.0 release of RabbitMQ), or read on for the details that make it work.

Suppose we've created an Elixir project skeleton using mix new. RabbitMQ uses erlang.mk as its build system, so our first task is to integrate a few mix commands into the Makefile. Here is the snippet that hooks mix into the build process; it needs to be added to the Makefile of the original metronome plugin:

elixir_srcs := mix.exs \
               $(shell find config lib -name "*.ex" -o -name "*.exs")

app:: $(elixir_srcs) deps
	$(MIX) deps.get
	$(MIX) deps.compile
	$(MIX) compile

Running mix three times in a row is a bit expensive, so it's advisable to add an alias to mix.exs (the list below is the value of the aliases: key in the project definition):

[
  make_all: [
    "deps.get",
    "deps.compile",
    "compile",
  ],
]

Then we can replace the three mix calls with a single one in our Makefile:

app:: $(elixir_srcs) deps
	$(MIX) make_all

Another thing is that we can drop the PROJECT_DESCRIPTION, PROJECT_MOD and PROJECT_ENV variables from the Makefile: erlang.mk uses them only to generate an .app file, and mix already handles that task.

erlang.mk is the primary build system for RabbitMQ, so we need to tell mix not to fetch or build the dependencies that erlang.mk manages. For rabbit and rabbit_common, which are always direct dependencies, we add this to the dependency list in mix.exs:

[
  {
    :rabbit_common,
    # `deps` is the erlang.mk directory with dependencies
    path: "deps/rabbit_common",
    # "true" is the no-op shell command, so mix effectively skips compilation
    compile: "true",
    override: true
  },
  {
    :rabbit,
    path: "deps/rabbit",
    compile: "true",
    override: true
  },
]

Additional trouble can come from libraries that have transitive dependencies on RabbitMQ sub-projects. For example, the amqp Elixir library depends on amqp_client, which in turn depends on rabbit_common. If we don't explicitly tell mix that such a dependency is managed by erlang.mk, mix will try to fetch it itself, and it may fetch a version that is incompatible with our rabbit_common.

And that's it. There are some things that I haven't figured out yet, such as writing CT suites in Elixir, or why I sometimes need to delete the plugins and _build folders for my changes to take effect. But other than that everything is fine.
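
The stale-build workaround can be wrapped in a small shell helper. This is only a sketch: the function name reset_build_dirs is made up for illustration, and it assumes that plugins and _build sit directly under the plugin's root directory.

```shell
#!/bin/sh
# Hypothetical helper (not part of RabbitMQ, erlang.mk or mix): remove the
# directories that may hold stale build output for the plugin.
# Assumption: both directories live directly under the project root.
reset_build_dirs() {
    root=${1:-.}
    rm -rf "$root/plugins" "$root/_build"
}

# Example: clean the current project, then rebuild as usual.
# reset_build_dirs . && make
```

Everything else in the project, including the erlang.mk deps directory, is left untouched, so the next build only redoes the plugin itself.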

RabbitMQ net-split messages explanation

After experiencing a network problem, RabbitMQ writes a log record that looks like this:

=INFO REPORT==== 30-Jan-2017::19:04:04 ===
node [email protected]' down: connection_closed

In this case the reason is connection_closed. But it isn't always evident what a given reason actually means or what could have caused it, especially in some outright bizarre situations. Here I try to document all the reasons that I've seen and how to reproduce them.

connection_closed

This happens whenever a connection is closed using "normal" mechanisms. Some ways to reproduce it:

  • Stop a remote RabbitMQ node
  • Send RST from a remote node, e.g. using iptables
  • Attach to a running Erlang VM with gdb and execute call close(some-fd)
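
For the iptables route, one possible rule set is sketched below. It assumes the inter-node (Erlang distribution) connection uses port 25672, which is RabbitMQ's default; adjust it if your nodes are configured differently. The helper name rst_rules is hypothetical, and it only prints the commands so that they can be reviewed before being run as root on the remote node.

```shell
#!/bin/sh
# Hypothetical helper: print iptables rules that answer inter-node traffic
# with a TCP RST. Assumption: the distribution port is 25672 unless given.
rst_rules() {
    port=${1:-25672}
    # Match the connection whichever side opened it: for incoming packets the
    # distribution port is the destination port on the node that accepted the
    # connection, and the source port on the node that initiated it.
    echo "iptables -I INPUT -p tcp --dport $port -j REJECT --reject-with tcp-reset"
    echo "iptables -I INPUT -p tcp --sport $port -j REJECT --reject-with tcp-reset"
}

# Example: review the output, then apply it with
# rst_rules | sudo sh
```

To undo the experiment, run the same commands with -D in place of -I.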

net_tick_timeout

This happens whenever a remote node stops responding; to the sender it looks as if its packets are being blackholed. Some possible causes:

  • Loss of network connectivity between 2 machines
  • Death of a remote machine
  • Firewall rule that drops packets
  • Somebody is sending a very big chunk of data through the RabbitMQ cluster channel, e.g. an AMQP message so big that it saturates the network long enough for the net tick timeout (net_ticktime) to expire.
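
To get a feel for the last case, compare how long a single message occupies the link with the tick timeout. The sketch below assumes Erlang's default net_ticktime of 60 seconds and ignores protocol overhead and competing traffic; the function name and the example numbers are illustrative only.

```shell
#!/bin/sh
# Hypothetical helper: seconds needed to push SIZE bytes through a link of
# RATE bits per second (integer division, rounded down).
transfer_seconds() {
    size_bytes=$1
    rate_bits_per_sec=$2
    echo $(( size_bytes * 8 / rate_bits_per_sec ))
}

# A 10 GB message on a 1 Gbit/s link:
transfer_seconds 10000000000 1000000000   # prints 80
```

Eighty seconds is longer than the assumed 60-second tick time, so a transfer of that size could plausibly hold up ticks long enough for the peer to be declared down.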

disconnect

An explicit disconnect performed using erlang:disconnect_node/1, either by some internal RabbitMQ mechanism or by somebody messing around with rabbitmqctl eval.
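
A minimal sketch of the manual variant follows. The helper disconnect_expr and the node name rabbit@other-host are placeholders invented for this example; the helper only builds the Erlang expression, which is then handed to rabbitmqctl eval on a live node.

```shell
#!/bin/sh
# Hypothetical helper: build the Erlang expression that disconnects NODE.
# The node name is wrapped in single quotes so it is parsed as one atom.
disconnect_expr() {
    printf "erlang:disconnect_node('%s').\n" "$1"
}

# Example, on a running node (placeholder node name):
# rabbitmqctl eval "$(disconnect_expr rabbit@other-host)"
```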

etimedout

Another quite interesting reason. I believe this can happen only when the OS TCP stack is tuned in such a way that the TCP timeout is shorter than net_tick_timeout. On Linux it can be reproduced with some extreme tuning:

cd /proc/sys/net/ipv4
echo 2 > tcp_keepalive_intvl
echo 1 > tcp_keepalive_probes
echo 2 > tcp_keepalive_time
echo 1 > tcp_retries1
echo 2 > tcp_retries2
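
A rough way to see why these values are extreme is to estimate how long an idle connection survives before keepalive probing gives up on it: the idle time before the first probe plus one interval per unanswered probe. The sketch below computes only that keepalive estimate (it does not model the retransmission limits), and the function name is hypothetical. With the values above it gives 2 + 2 × 1 = 4 seconds, far below Erlang's default net_ticktime of 60 seconds.

```shell
#!/bin/sh
# Hypothetical helper: estimate, in seconds, how long a dead idle connection
# lasts before TCP keepalive declares it broken.
# Assumption: the sysctl files are read from DIR (default /proc/sys/net/ipv4).
keepalive_detect_seconds() {
    dir=${1:-/proc/sys/net/ipv4}
    idle=$(cat "$dir/tcp_keepalive_time")
    intvl=$(cat "$dir/tcp_keepalive_intvl")
    probes=$(cat "$dir/tcp_keepalive_probes")
    echo $(( idle + intvl * probes ))
}

# Example: show the estimate for the current machine.
# keepalive_detect_seconds
```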

econnreset

This is the strangest reason of all: I've seen it only in production logs and can't reproduce it myself. One very probable explanation is that an RST packet arrived with exceptionally bad timing, just after the socket was returned from epoll as ready, but before the read/write operation on it actually started.