Using Amazon SQS as a Messaging Bus

In a previous blog post I wrote about how we use a message queue server in a Message Bus design. Since that post was written we’ve moved from ActiveMQ to RabbitMQ, and now we are moving to Amazon Web Services’ Simple Queue Service (SQS). I’m going to cover:

  • What is SQS?
  • The Message Bus design pattern
  • How we’re using SQS to integrate with outside systems.
  • Bonus: I’ll also include how we’re looking at using SNS for notifications.

Intro to AWS SQS

Amazon Web Services (AWS) is a suite of hosted services run by Amazon and available on a pay-per-use basis. SQS (short for Simple Queue Service) is its message queue service. SQS operates much like other queue servers. You can:

  • Create queues
  • Place messages on queues
  • Read messages off of queues.

One important difference with SQS is that there is no guarantee about the order in which messages are delivered. In most other standalone queue servers, messages are delivered in a First-In, First-Out (FIFO) manner. The primary reason that messages can be delivered out of order with SQS is a feature called message visibility. When a client reads a message from SQS, the message is hidden from other readers for a period of time (the visibility timeout). This prevents multiple clients from getting the same message to process. But unlike some other queues, reading a message from the queue does not remove it. A client must explicitly acknowledge a message (in SQS terms, delete it) for it to be removed from the queue. If a client reads a message but doesn’t acknowledge it, the message becomes visible again once the visibility timeout expires. While the message is hidden, other messages can be read and completed, so they finish before the hidden message is delivered again.
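To make the visibility behavior concrete, here is a minimal in-memory sketch. It is a toy model, not the SQS API: the `ToyQueue` class, its method names, and the 30-second timeout are all made up for illustration, and time is passed in explicitly so the example runs without sleeping.

```python
from collections import OrderedDict


class ToyQueue:
    """In-memory stand-in for a queue with a visibility timeout (not the SQS API)."""

    def __init__(self, visibility_timeout=30):
        self.visibility_timeout = visibility_timeout
        self._messages = OrderedDict()  # message id -> body, in send order
        self._hidden_until = {}         # message id -> time it becomes visible again
        self._next_id = 0

    def send(self, body):
        self._next_id += 1
        self._messages[self._next_id] = body
        return self._next_id

    def receive(self, now):
        """Return (id, body) for the oldest visible message and hide it, or None."""
        for message_id, body in self._messages.items():
            if self._hidden_until.get(message_id, 0) <= now:
                self._hidden_until[message_id] = now + self.visibility_timeout
                return message_id, body
        return None

    def delete(self, message_id):
        """Acknowledge a message so it is removed for good."""
        self._messages.pop(message_id, None)
        self._hidden_until.pop(message_id, None)


queue = ToyQueue(visibility_timeout=30)
queue.send("A")
queue.send("B")

processed = []

# A client reads A but never deletes it (for example, it crashed).
id_a, _ = queue.receive(now=0)

# A is hidden, so the next read returns B, which is processed and deleted.
id_b, body_b = queue.receive(now=1)
queue.delete(id_b)
processed.append(body_b)

# Nothing is visible while A's timeout is still running.
assert queue.receive(now=2) is None

# After the timeout A is visible again and is finally processed.
id_a_again, body_a = queue.receive(now=31)
queue.delete(id_a_again)
processed.append(body_a)

print(processed)  # B was completed before A, even though A was sent first
```

The takeaway is that consumers should not assume that the order of processing matches the order of sending.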

What is a Message Bus

A Message Bus is a software design pattern that allows message senders and receivers to be decoupled. This means that senders and receivers can be added and removed without affecting the other senders and receivers attached to the bus.
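As a rough illustration of the pattern (not our production code), a bus can be as small as a table of handlers keyed by message type. The `MessageBus` class and the handler names below are hypothetical.

```python
from collections import defaultdict


class MessageBus:
    """Tiny in-process bus: senders publish by type, receivers subscribe by type."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # message type -> list of handlers

    def subscribe(self, message_type, handler):
        self._subscribers[message_type].append(handler)

    def unsubscribe(self, message_type, handler):
        self._subscribers[message_type].remove(handler)

    def publish(self, message_type, payload):
        """Deliver payload to every current subscriber; return how many received it."""
        handlers = list(self._subscribers.get(message_type, []))
        for handler in handlers:
            handler(payload)
        return len(handlers)


bus = MessageBus()
inventory_log, search_log = [], []
update_inventory = inventory_log.append  # stand-ins for real receivers
update_search = search_log.append

bus.subscribe("Item", update_inventory)
bus.subscribe("Item", update_search)
bus.publish("Item", {"id": 1})

# Removing one receiver does not affect the sender or the other receiver.
bus.unsubscribe("Item", update_search)
bus.publish("Item", {"id": 2})

print(inventory_log)  # both messages
print(search_log)     # only the first message
```

The sender only knows the message type, never the list of receivers, which is what keeps the two sides decoupled.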

While we started with a message bus in our design for using SQS as a queue, we are not currently taking full advantage of the Message Bus design pattern: we use one sender class and one process for reading messages and distributing them to the correct subscribers.

How we are using SQS

Now that we’ve covered SQS and the basic concept of a messaging bus, let’s look at how we put it all together at MerchantOS.

Generating Messages

We have many different message types that are generated and placed on our queue (Item, Customer, integration-specific messages, etc.). Since we generate all of these messages within our own application, we have simplified our message generation code to one class that is used across our system. Using one central message producer reduces the overall architectural complexity of generating messages and placing them on the queue. It also allows us to integrate other message generators in the future as we need or want them.
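A single producer class along these lines might look like the sketch below. The envelope fields (`id`, `type`, `account_id`, `created_at`, `payload`) and the `MessageProducer` name are assumptions for illustration, not our actual message format. The `send` callable stands in for whatever places a string on the queue, so the example runs against a plain list.

```python
import json
import time
import uuid


class MessageProducer:
    """One place where every message type is wrapped and put on the queue."""

    def __init__(self, send):
        self._send = send  # callable that places one string on the queue

    def publish(self, message_type, account_id, payload):
        # Hypothetical envelope: every message gets the same outer shape.
        envelope = {
            "id": str(uuid.uuid4()),
            "type": message_type,
            "account_id": account_id,
            "created_at": time.time(),
            "payload": payload,
        }
        self._send(json.dumps(envelope))
        return envelope["id"]


outbox = []  # stands in for the real queue
producer = MessageProducer(send=outbox.append)
producer.publish("Item", account_id=42, payload={"sku": "ABC-1"})
producer.publish("Customer", account_id=42, payload={"name": "Pat"})

print(len(outbox))  # two serialized messages waiting on the "queue"
```

Because the queue is hidden behind one callable, swapping queue servers only touches the code that builds `send`.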

Receiving Messages

On the receiving side we have used a few techniques to improve the reliability of receiving and dispatching messages.

Using a cron job to receive messages

To receive messages we use one cron process that wakes up and forks child processes. Each child process is responsible for taking one message off of the queue and forwarding it on to its appropriate destination. We previously used a separate process for each endpoint that needed a message. The downside to that technique was, again, complexity: each process had to be monitored, and issues had to be diagnosed per process when they arose.
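The parent/child arrangement can be sketched as follows. This is an illustration under assumptions, not our actual worker: `run_workers`, `receive`, and `handle` are hypothetical names, the code is POSIX-only because it uses `os.fork`, and it needs Python 3.9 or later for `os.waitstatus_to_exitcode`. In practice the parent would be started by cron on a schedule.

```python
import os


def run_workers(worker_count, receive, handle):
    """Fork worker_count children; each takes one message and handles it.

    Returns the exit code of every child, in the order they were started.
    """
    pids = []
    for _ in range(worker_count):
        pid = os.fork()
        if pid == 0:
            # Child: do one unit of work, then exit without running parent cleanup.
            code = 0
            try:
                message = receive()
                if message is not None:
                    handle(message)
            except Exception:
                code = 1
            finally:
                os._exit(code)
        pids.append(pid)

    # Parent: wait for every child so none are left as zombies.
    exit_codes = []
    for pid in pids:
        _, status = os.waitpid(pid, 0)
        exit_codes.append(os.waitstatus_to_exitcode(status))
    return exit_codes


def receive_one():
    # Placeholder for reading a single message from the queue.
    return {"type": "Item", "payload": {"id": 1}}


def forward(message):
    # Placeholder for forwarding the message to its destination.
    pass


exit_codes = run_workers(3, receive=receive_one, handle=forward)
print(exit_codes)  # one exit code per short-lived worker
```

A failing child only affects its own message: the parent sees a non-zero exit code and the other workers carry on.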

Reducing Coupling

While using one parent process that creates worker processes has potential pitfalls as a point of failure, we have mitigated those by reducing the coupling of the worker processes to the endpoints that they call. One technique we used was to have each worker call the endpoint (typically an RPC endpoint on our API), post its message payload, and disconnect without waiting for a response. Each endpoint becomes responsible for logging its own success or failure. By keeping the workers short-lived we reduce the overall impact of one worker failing.
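Here is one way the "post and disconnect" step could look. It is a sketch, not our worker code: the function name, host, port, and path are hypothetical, and a raw socket is used only to make it obvious that the response is never read.

```python
import json
import socket


def post_and_forget(host, port, path, payload, timeout=2.0):
    """Send an HTTP POST and close the connection without reading the response.

    Returns the number of bytes sent. Success or failure of the request itself
    is left to the endpoint to log.
    """
    body = json.dumps(payload).encode("utf-8")
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/json\r\n"
        f"Content-Length: {len(body)}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")
    request = head + body

    # The socket is closed as soon as the request has been handed to the OS.
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(request)
    return len(request)
```

A worker would call something like `post_and_forget("api.example.test", 80, "/rpc/item", message)` and then exit. The trade-off is that the worker cannot tell whether the endpoint accepted the message, which is why the endpoint has to do its own logging.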

Routing Messages

Another way that our queue usage has evolved is by consolidating all messages into one queue. Previously we used per-account queues, which allowed us to segregate accounts that generate many queue messages from those that generate only a few. In the long run we found that this failed to give us any practical benefit.

We now use a two-queue system where all messages are first placed on a “priority queue”. These messages are read off and dealt with by the worker processes described above. If one account is generating many messages, or if we have determined that the endpoint a message is destined for is saturated and cannot process any more messages, the affected messages are re-routed to a delayed queue. Messages on the delayed queue are read once the message volume has dropped or the endpoint is ready to receive messages again.
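The routing decision for a message read from the priority queue can be sketched like this. The `Router` class, the per-account limit, and the message fields (`account_id`, `endpoint`) are assumptions made for the example. How volume and saturation are actually measured is not shown here.

```python
from collections import Counter

DISPATCH, DELAY = "dispatch", "delay"


class Router:
    """Decide whether a message is handled now or moved to the delayed queue."""

    def __init__(self, per_account_limit):
        self.per_account_limit = per_account_limit  # hypothetical threshold
        self.seen = Counter()                       # messages seen per account
        self.saturated_endpoints = set()            # endpoints that cannot take more

    def route(self, message):
        self.seen[message["account_id"]] += 1
        if message["endpoint"] in self.saturated_endpoints:
            return DELAY
        if self.seen[message["account_id"]] > self.per_account_limit:
            return DELAY
        return DISPATCH

    def reset(self):
        """Start a new counting window, e.g. at the beginning of each run."""
        self.seen.clear()


router = Router(per_account_limit=2)
router.saturated_endpoints.add("/rpc/sync")

decisions = [
    router.route({"account_id": 1, "endpoint": "/rpc/item"}),
    router.route({"account_id": 1, "endpoint": "/rpc/item"}),
    router.route({"account_id": 1, "endpoint": "/rpc/item"}),  # account over its limit
    router.route({"account_id": 2, "endpoint": "/rpc/sync"}),  # saturated endpoint
]
print(decisions)
```

A message that gets `delay` would be written to the delayed queue and deleted from the priority queue, so the quiet accounts behind it are not held up.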

This arrangement of a limited number of message producers sending to a limited number of queues that are read by a limited number of consumers has helped to reduce our queue complexity, improve the transparency of the message flow through the system, and give us better monitoring of our queue status.

Integrating SNS

Our next step with SQS and our queue development is to add webhook callbacks so that outside systems can be notified about messages on the queue. We are currently evaluating Amazon SNS (Simple Notification Service), among other options, as a way to generate notifications. A general notification framework will allow for more complete workflows that include MerchantOS as one component, potentially including websockets for a richer client interface.

Wrap-up

Our queueing infrastructure has evolved over time to help us create decoupled integrations while reducing our application complexity. Amazon SQS is an integral piece of that effort, and it allows us to scale as our user base grows and our messages per user increase. We’ve got some exciting changes coming to our queueing infrastructure that will allow for richer interactions with API clients. Watch our blog for more posts as we release these changes.

2 thoughts on “Using Amazon SQS as a Messaging Bus”

  1. Tony said...

    I am wondering if this can be used for the following setup. Let’s say we have multiple routers out in the field and each router needs to be able to communicate with the main configuration server. That is, the server needs to be able to tell a specific device on an outside network to ping the server and ask if there is any new information available for it.

    I am looking for a solution where the server would be able to ping any specific internet device in real time, or with a really SHORT delay, to ask if there is any new configuration available for it. I am intending to use this on a new product I am working on. The product has a website where a user can configure a bunch of settings in the cloud. The web application needs to be able to communicate this back to any specific device with a unique ID.

  2. I don’t think a queue is the right solution for what you are describing. In theory it could work: the server could put updates for a device onto a queue, which the device would poll, acting on any update it finds. But it seems simpler to just have the device poll a service that tells it what the latest version (or update) is and then react accordingly.
    We’ve been looking into Cloud Print to do something like this. The way they solve this problem is to establish an XMPP connection (the protocol Jabber chat uses). The server can then signal to the clients over that protocol that there is some work to do. This replaces rapid polling.
