Microservices using Thrift RPC, Golang, and Nodejs (and GraphQL)

15.05.2020 — Microservices — 19 min read

What YAMLs are to Kubernetes, RPCs are to Microservices

RPC is, in my experience, one of the best ways to handle inter-service communication in a microservices architecture. A remote call looks and behaves much like a local procedure call. Simply call the function!
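To make "simply call the function" a little more concrete, here is a minimal in-process sketch of what an RPC stub does: it turns a method call into a serialised message, hands it to something that delivers it, and decodes the reply. Everything in it (makeStub, handleRequest, the JSON message shape) is hypothetical and is not how Thrift encodes messages; it only illustrates the idea.

```javascript
// A toy "server": maps procedure names to implementations.
const procedures = {
  ping: () => ({ message: "ping", version: 1 })
};

// Server side: decode the request, run the procedure, encode the reply.
function handleRequest(rawRequest) {
  const { method, params } = JSON.parse(rawRequest);
  if (!Object.prototype.hasOwnProperty.call(procedures, method)) {
    return JSON.stringify({ error: `unknown method: ${method}` });
  }
  return JSON.stringify({ result: procedures[method](...params) });
}

// Client side: a stub whose methods look local but go through the "wire".
// `send` stands in for a real network round trip.
function makeStub(send) {
  return new Proxy({}, {
    get: (_target, method) => (...params) => {
      const reply = JSON.parse(send(JSON.stringify({ method, params })));
      if (reply.error) throw new Error(reply.error);
      return reply.result;
    }
  });
}

const stub = makeStub(handleRequest);
console.log(stub.ping()); // reads like a local call
```

A real framework replaces the JSON strings with a compact binary encoding and the direct function call with a socket, but the caller's view stays the same.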

Apache Thrift

Apache Thrift is a complete RPC framework that was developed at Facebook and later donated to the Apache Software Foundation. The most notable software that uses Apache Thrift (simply "thrift" henceforth) is Cassandra. Being a complete RPC framework, it provides both an Interface Definition Language (IDL), in the form of .thrift files, for defining the services and the communication protocol they talk over.

gRPC is another (more popular) option, but it is not a complete RPC framework in the same sense: gRPC deals with just the communication aspect of services, while the IDL is defined through Protobufs.

I chose thrift because I have worked with it extensively on a year-long project of mine (will update the blog with it soon). There are some pros and cons of using gRPC over Apache Thrift and vice versa, but that's a story for another day.

Why Golang "AND" Nodejs

I recently began learning Golang and wanted to expand my horizons by building some projects with it. I picked Nodejs simply because I am quite adept at it (I'd like to believe). Also, Nodejs has excellent support for GraphQL thanks to Apollo GraphQL. A major advantage of RPCs is that they enable communication between services written in different languages, which makes them a good playground for the new language I am learning.

Directory structure

The project is available on GitHub, on branch v1.

- thrift-graphql-demo/
  - service-gql/
    - build/          // script to generate client and server stubs in Nodejs
    - deployments/    // contains Dockerfile
    - src/            // main application code including .thrift generated stubs, GraphQL schema, resolvers, etc
    - package.json
  - service-user/
    - build/          // script to generate client and server stubs in Golang
    - deployments/    // contains Dockerfile
    - cmd/            // main application code - server implementation
    - internal/       // .thrift file and generated stubs
    - go.sum
    - go.mod
  - service-post/
    - build/          // script to generate client and server stubs in Golang
    - deployments/    // contains Dockerfile
    - cmd/            // main application code - server implementation
    - internal/       // .thrift file and generated stubs
    - go.sum
    - go.mod
  - k8s/              // Bonus section - Deployment using Kubernetes
  - docker-compose.yaml

Thrift IDL

The idea is simple:

service-user stores user-related information
service-post stores posts from each user
service-gql acts as an aggregator and API Gateway

Let's start by defining a simple implementation for each of the 2 thrift servers. The following is the IDL for service-user. Its ping() procedure returns a PingResponse when called.

namespace go user
namespace js user

typedef i32 int

struct PingResponse {
  1: optional string message
  2: optional int version
}

service UserService {
  PingResponse ping(),
}

Similarly, the service-post IDL can be defined (just replace "user" with "post").

Generating stubs

The stubs can be generated by running...

./build/thrift-gen.sh

Internally, the script just runs the following command, with some helpers to create the output directory beforehand.

thrift -r -out ./internal/thrift-rpc --gen go ./internal/thrift/user.thrift

This generates the following files and directories under the internal directory.

- internal/
  - thrift/           //  Contains .thrift file
  - thrift-rpc/user/
    - user.go         //  the structs, types, and service definition from the IDL, as in PingResponse, int, UserService...
    - user-consts.go  //  the constants defined in the IDL

For Golang specifically, a couple of additional files are generated, including user_service-remote.go, which is a ready-made client. It is pretty handy for testing purposes, though our client implementation will actually be in Nodejs in another directory.

In the service-gql, the same bash script will generate the stubs for Nodejs in the same way.

Golang implementation

Now that our stubs are generated, we can implement them. For the simple IDL that we have defined, it is quite easy. Inside the cmd directory, create the following folders and files...

- cmd/
  - impl/             //  implementation of the service procedure
    - handler.go      //  barebones handler - UserHandler, PostHandler
    - ping.go         //  implementation of the ping() procedure
  - server/           //  the thrift server, which listens for incoming connections, resides here
    - server.go       //  `RunServer` which takes in the protocols, transports, and handlers to create the socket
  - thrift/           //  "package main"
    - main.go         //  takes the address to listen on as a flag and calls `RunServer` from `server.go`

Handler

The handler is a simple struct to which the procedure implementations are added.

// handler.go
package impl

type UserHandler struct{}

func NewUserHandler() *UserHandler {
  return &UserHandler{}
}

ping() implementation

ping() is a simple function that takes no parameters and returns a response of type PingResponse.

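// ping.go
package impl

import (
  "context"
  "log"

  // Also import the generated "user" package from internal/thrift-rpc;
  // its import path depends on your module name.
)

// ApiVersion is set from main.go before the server starts.
var ApiVersion user.Int

// Ping implements the ping() procedure declared in the IDL.
func (p *UserHandler) Ping(ctx context.Context) (*user.PingResponse, error) {
  message := "ping"
  log.Println(message)

  PingResponse := user.NewPingResponse()
  PingResponse.Message = &message
  PingResponse.Version = &ApiVersion

  return PingResponse, nil
}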

Note that ApiVersion is of type user.Int. This is because the IDL contains typedef i32 int, so the generated Go code exposes that alias as user.Int (an int32 underneath). I set its value in the main.go file.

Golang server

The server takes in transport (TTransportFactory), protocol (TProtocolFactory), address (string), and TLS enable (boolean) to create a socket listening on the port provided while running the program (next up).

The transport abstracts away how bytes are moved between client and server (sockets, buffering, framing). The protocol is the abstraction that defines how data is serialised into, and deserialised from, those bytes. To know more about the transport and protocol, I recommend going over to the official docs.

//server.go
package server

import (
  "log"

  "github.com/apache/thrift/lib/go/thrift"
  // Also import this module's impl package and the generated "user" package;
  // their import paths depend on your module name.
)

// RunServer listens on addr and serves UserService.
// secure is accepted but TLS is not wired up in this demo.
func RunServer(transportFactory thrift.TTransportFactory, protocolFactory thrift.TProtocolFactory, addr string, secure bool) error {
  transport, err := thrift.NewTServerSocket(addr)
  if err != nil {
    return err
  }

  handler := impl.NewUserHandler()
  processor := user.NewUserServiceProcessor(handler)
  server := thrift.NewTSimpleServer4(processor, transport, transportFactory, protocolFactory)

  log.Printf("Starting simple thrift server on: %s\n", addr)
  return server.Serve()
}

I have skipped enabling TLS in the server for this demo.
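A rough way to picture the two factories passed to RunServer: the protocol decides what the bytes look like, and the transport decides how and when they move. The sketch below is written in Javascript with hypothetical stand-ins (jsonProtocol, BufferedTransport) and plain JSON instead of Thrift's real compact encoding, so treat it as an illustration of the split rather than of the library.

```javascript
// "Protocol": how a message is turned into bytes and back.
const jsonProtocol = {
  encode: message => Buffer.from(JSON.stringify(message), "utf8"),
  decode: bytes => JSON.parse(bytes.toString("utf8"))
};

// "Transport": how bytes are moved. This one buffers writes and only
// hands them to the underlying sink (a socket in real life) on flush().
class BufferedTransport {
  constructor(sink) {
    this.sink = sink;
    this.chunks = [];
  }

  write(bytes) {
    this.chunks.push(bytes);
  }

  flush() {
    this.sink(Buffer.concat(this.chunks));
    this.chunks = [];
  }
}

const received = [];
const transport = new BufferedTransport(bytes => received.push(bytes));

transport.write(jsonProtocol.encode({ message: "ping", version: 1 }));
console.log(received.length); // nothing has reached the sink yet
transport.flush();
console.log(jsonProtocol.decode(received[0]));
```

Because the two layers are independent, either one can be swapped without touching the other, as long as client and server agree on both.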

Running the Golang server

Now everything is set up and ready to run. All we have to write is our main file. I take a flag to pass in the address to listen on dynamically, which defaults to localhost:9090.

// main.go
package main

import (
  "flag"
  "log"

  "github.com/apache/thrift/lib/go/thrift"
  // Also import this module's impl and server packages and the generated
  // "user" package; their import paths depend on your module name.
)

var (
  addr = flag.String("addr", "localhost:9090", "Thrift Address to listen on")
)

var ApiVersion = 1

func main() {
  flag.Parse()

  impl.ApiVersion = user.Int(ApiVersion)

  transportFactory := thrift.NewTBufferedTransportFactory(8192)
  protocolFactory := thrift.NewTCompactProtocolFactory()

  if err := server.RunServer(transportFactory, protocolFactory, *addr, false); err != nil {
    log.Println("error running thrift server: ", err)
  }
}

Wow, that was super easy 🤪 (No?)! All that we have to do now is run the commands from the CLI.

Say the magic words after me! GO... RUN ...

$ go run ./cmd/thrift/main.go
2020/05/15 00:33:15 Starting simple thrift server on: localhost:9090

Now we replicate the same behaviour in service-post (for simplicity, just copy over the same files and replace "User" with "Post").

Apollo GraphQL

Now that our Thrift server is up and running, we have to implement the client for it. We will use Nodejs for implementing GraphQL, which will act as an orchestrator for other services and should be the only exposed service. Let us begin!

I just want to point out that I have been using Nodejs for quite a few years now and I am a big fan of doing things in a "High cohesion and Low coupling" manner (or simply put - very modular).

Inside the src folder of service-gql, create the following files and folders...

- src/
  - resolvers/         //  GraphQL resolvers
    - post/            //  resolver for service-post
      - post.js        //  aggregator for all PostService procedures
      - postPing.js    //  implementation of ping() for service-post
    - user/            //  resolver for service-user
      - user.js        //  aggregator for all UserService procedures
      - userPing.js    //  implementation of ping() for service-user
    - resolvers.js     //  aggregator for resolvers of all services
  - thrift/            //  generated thrift IDLs
  - thriftClients/     //  UserService and PostService clients
  - typeDefs/          //  schema definition by using `gql` string templates
    - user.js          //  schema definition for UserService
    - post.js          //  schema definition for PostService
    - typeDefs.js      //  aggregator for all typeDefs
  - utils/             //  helper functions like logger
  - app.js             //  the "main" file to run

Schema definition

Let's get started with the typeDefs. I use Apollo GraphQL's gql string templates to create schema as it keeps things extremely modular which enables easy modification at a later point in time.

// typeDefs.js
const { gql } = require("apollo-server-express");
const user = require("./user");
const post = require("./post");

const base = gql`
  type PingResponse {
    message: String
    _version: Int
  }

  type Query {
    ping: PingResponse!
  }
`;

module.exports = [base, user, post];

This is the "base" of the schema; anything common to all services (like PingResponse) should go here. As long as PingResponse stays the same in both IDLs, it belongs in base; if it ever diverges, each variant should move into its own service's file.

Let us extend this base to include the UserService and PostService details.

// post.js
const { gql } = require("apollo-server-express");

module.exports = gql`
  extend type Query {
    postPing: PingResponse!
  }
`;

// user.js
const { gql } = require("apollo-server-express");

module.exports = gql`
  extend type Query {
    userPing: PingResponse!
  }
`;

Pretty simple, huh? As the name suggests, the extend keyword in GraphQL adds these fields to the Query type defined in base. Now that we are done with defining our schema, let's get on with our resolvers.

Thrift Client implementation

The thrift IDLs are the same as those in other services (simply copied over) under the thrift/ directory. The stubs for Nodejs are generated by running the following command from the root of service-gql directory.

$ ./build/thrift-gen.sh

The underlying command is similar to how it was for Golang. Now that the stubs are generated, let's proceed with the client implementation.

I will demonstrate UserService's client; PostService's is the same just with some variable name differences.

// userClient.js
const thrift = require("thrift");
const UserService = require("../thrift/user/UserService");
const logger = require("../utils/logger");

const SERVER_HOST =
  process.env.SERVICE_USER_HOST || process.env.SERVICE_USER_CLUSTER_IP_SERVICE_SERVICE_HOST;
const SERVER_PORT = parseInt(process.env.SERVICE_USER_PORT, 10);

logger.info(`userClient: ${SERVER_HOST} ${SERVER_PORT}`);

// Must match the transport and protocol used by the Golang server.
const thriftOptions = {
  transport: thrift.TBufferedTransport,
  protocol: thrift.TCompactProtocol
};

const connection = thrift.createConnection(SERVER_HOST, SERVER_PORT, thriftOptions);
let client;

connection.on("error", err => {
  logger.error(`userClient: ${JSON.stringify(err)}`);
});

connection.on("connect", () => {
  client = thrift.createClient(UserService, connection);
  logger.info("userClient: Connected to thrift server!");
});

connection.on("close", () => {
  logger.info("userClient: Disconnected from thrift server!");
  process.exit(1);
});

// Wrap in an arrow function so that end() keeps its `this` binding.
process.on("SIGTERM", () => connection.end());

/**
 * thrift user client
 * @param {string} func thrift function to call
 * @param {object[]} params params to pass to the thrift function
 */
const userClient = (func, params) =>
  new Promise((resolve, reject) => {
    client[func](...params)
      .then(resolve)
      .fail(reject);
  });

module.exports = userClient;

That is some client, right?! I shall explain...

I fetch the SERVER_HOST and SERVER_PORT from the environment variables. The variable that stands out is probably process.env.SERVICE_USER_CLUSTER_IP_SERVICE_SERVICE_HOST. It is set by Kubernetes, which injects a <SERVICE_NAME>_SERVICE_HOST variable into Pods for each Service; that is part of the Bonus! section towards the end of the blog.
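Reading the host and port can be made a bit more defensive. The following is only a sketch: resolveEndpoint is a hypothetical helper that is not part of the project, and it takes a plain object in place of process.env so that it is easy to try out.

```javascript
// Resolve a host/port pair from an environment-like object.
// `fallbackHostKey` is consulted only when the primary host key is unset.
function resolveEndpoint(env, hostKey, portKey, fallbackHostKey) {
  const host = env[hostKey] || env[fallbackHostKey];
  const port = Number.parseInt(env[portKey], 10);

  if (!host) {
    throw new Error(`missing host: set ${hostKey} or ${fallbackHostKey}`);
  }
  if (!Number.isInteger(port) || port <= 0 || port > 65535) {
    throw new Error(`invalid port in ${portKey}: ${env[portKey]}`);
  }
  return { host, port };
}

// Example with a plain object standing in for process.env.
const endpoint = resolveEndpoint(
  { SERVICE_USER_HOST: "localhost", SERVICE_USER_PORT: "9090" },
  "SERVICE_USER_HOST",
  "SERVICE_USER_PORT",
  "SERVICE_USER_CLUSTER_IP_SERVICE_SERVICE_HOST"
);
console.log(endpoint);
```

Failing early with a clear message is usually easier to debug than a connection attempt to host undefined or port NaN.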

Otherwise, it is actually pretty straightforward: use the same transport and protocol as the thrift server, create a connection, and log useful messages according to whether the connection was established or not. The reconnect strategy that I have chosen is - nothing! I rely on the application being deployed using Docker-Compose or Kubernetes, so that process.exit(1) triggers a restart of the container in which this service resides.
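The "exit and let the orchestrator restart me" idea can be isolated into a few lines. This is a sketch with a hypothetical helper (exitOnClose) and a plain EventEmitter standing in for the thrift connection; the exit function is injectable purely so the behaviour can be observed without killing the process.

```javascript
const { EventEmitter } = require("events");

// Wire a connection-like emitter so that losing the connection ends the
// process with a non-zero code. `exit` defaults to the real process.exit.
function exitOnClose(connection, exit = code => process.exit(code)) {
  connection.on("close", () => exit(1));
  return connection;
}

// Demo with a fake connection and a fake exit function.
const exitCodes = [];
const fakeConnection = exitOnClose(new EventEmitter(), code => exitCodes.push(code));
fakeConnection.emit("close");
console.log(exitCodes);
```

The non-zero exit code is what matters here: restart policies such as unless-stopped in Docker-Compose, or the default Pod restart policy in Kubernetes, bring the container back up after it exits.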

The magic part here is the userClient variable. Did you know that (almost) everything in Javascript is an object? I just make use of that: client[func] looks up the generated method by its name. I wanted to keep things clean and as reusable as possible. Simply pass in the function name as a string and the params as an array (the same ones the Golang handler takes) and the function gets called. Luckily, this looks easier (and/or messier) in Javascript. Its usage will be clearer in the next section.
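Here is a small sketch of that call-by-name idea with two guards added: one for a client that has not connected yet, and one for a misspelt procedure name. callByName and fakeClient are hypothetical; fakeClient merely stands in for the generated Thrift client.

```javascript
// Call a method on `client` by name, spreading `params` as arguments.
// Rejects (rather than throwing) when the client or the method is missing.
function callByName(client, func, params = []) {
  if (!client) {
    return Promise.reject(new Error("client is not connected yet"));
  }
  if (typeof client[func] !== "function") {
    return Promise.reject(new Error(`unknown procedure: ${func}`));
  }
  return Promise.resolve(client[func](...params));
}

// A hypothetical stand-in for the generated Thrift client.
const fakeClient = {
  ping: async () => ({ message: "ping", version: 1 }),
  add: async (a, b) => a + b
};

callByName(fakeClient, "add", [2, 3]).then(sum => console.log(sum));
```

The params array is spread in order, so it has to list the arguments exactly as the procedure declares them.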

Just a side note that Apache Thrift uses the Q library, which is archaic! It is from a time before Promises were natively part of the Javascript language specification. So userClient actually wraps the Q promise returned by Thrift in a native Promise. Neat, right?
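As an aside, native Promises can adopt any "thenable" on their own, which gives an alternative to wrapping by hand. The sketch below uses a hypothetical legacyPing function that returns a Q-style object with a then() method; it is not the Thrift client, only a stand-in to show the mechanism.

```javascript
// A hypothetical Q-style thenable: it has then() but is not a native Promise.
function legacyPing() {
  return {
    then(onFulfilled, _onRejected) {
      setTimeout(() => onFulfilled({ message: "ping", version: 1 }), 0);
    }
  };
}

// Promise.resolve() adopts any thenable and returns a native Promise.
const nativePing = Promise.resolve(legacyPing());

console.log(nativePing instanceof Promise); // true
nativePing.then(response => console.log(response.message));
```

Once adopted, the result works with await and the rest of the native Promise API, regardless of which library produced the original object.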

Resolvers implementation

The aggregator file resolvers.js goes like this...

// resolvers.js
const _ = require("lodash");
const user = require("./user/user");
const post = require("./post/post");

const base = {
  Query: {
    ping: () => ({ message: "ping", _version: 1 })
  }
};

module.exports = _.merge(base, user, post);

I use lodash's merge to deep-merge objects that share the same keys (here, Query), as we will see next in our implementation of UserService's resolver.
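Why a deep merge and not Object.assign or the spread operator? The sketch below shows the difference using mergeDeep, a tiny hypothetical stand-in for lodash's merge that only handles plain nested objects.

```javascript
const isPlainObject = value =>
  value !== null && typeof value === "object" && !Array.isArray(value);

// Recursively merge plain objects into `target` (simplified stand-in for _.merge).
function mergeDeep(target, ...sources) {
  for (const source of sources) {
    for (const [key, value] of Object.entries(source)) {
      if (isPlainObject(value)) {
        target[key] = mergeDeep(isPlainObject(target[key]) ? target[key] : {}, value);
      } else {
        target[key] = value;
      }
    }
  }
  return target;
}

const base = { Query: { ping: () => "base" } };
const user = { Query: { userPing: () => "user" } };
const post = { Query: { postPing: () => "post" } };

// Shallow merge: each later Query replaces the earlier one entirely.
const shallow = Object.assign({}, base, user, post);
console.log(Object.keys(shallow.Query)); // only postPing survives

// Deep merge: all three resolvers end up under the same Query key.
const deep = mergeDeep({}, base, user, post);
console.log(Object.keys(deep.Query)); // ping, userPing and postPing
```

With a shallow merge, two of the three resolvers would silently disappear from the schema, which is why the aggregator needs a recursive merge.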

The UserService's user.js aggregator looks like this...

// user.js
const userPing = require("./userPing");

module.exports = {
  Query: {
    userPing
  }
};

Since this is the aggregator, the implementation actually lies in another file.

// userPing.js
const userClient = require("../../thriftClients/user");

module.exports = async () => {
  try {
    const { message, version: _version } = await userClient("ping", []);
    return { message, _version };
  } catch (e) {
    throw e;
  }
};

Here we make use of our thriftClient which I have described in the previous section. Since ping() does not take in any param, I have passed an empty array. Otherwise, I would have passed an equal number of arguments as defined in the IDL (wrapped in desired class) AND in the same order.

You might wonder why there is a try-catch when I am just rethrowing the error, right? It is there so that, if I ever need to do rollbacks in multi-service operations, I have a place to act on the error message. That will be a blog post for another day though.

Now go on and do the same for PostService (just copy the files over and replace "user" with "post").

The legendary app.js

Now that all our services and their implementations have been defined, let's get on with the implementation of the HTTP server. It uses express and apollo-server-express to run the server. Here goes the implementation...

require("dotenv/config");
const express = require("express");
const compression = require("compression");
const bodyParser = require("body-parser");
const helmet = require("helmet");
const { ApolloServer } = require("apollo-server-express");
const typeDefs = require("./typeDefs/typeDefs");
const resolvers = require("./resolvers/resolvers");
const logger = require("./utils/logger");

const app = express();

app.set("port", process.env.GRAPHQL_PORT);
app.use(compression());
app.use(bodyParser.json());
app.use(bodyParser.urlencoded({ extended: true }));
app.use(helmet());

app.get("/healthz", (_req, res) => {
  res.json({
    health: "ok",
    version: 1
  });
});

const apolloServer = new ApolloServer({
  typeDefs,
  resolvers,
  formatResponse: response => {
    logger.info(JSON.stringify(response));
    return response;
  },
  formatError: error => {
    logger.error(JSON.stringify(error));
    return error;
  }
});

apolloServer.applyMiddleware({ app, path: "/" });

/** Start Express server. */
const server = app.listen(app.get("port"), () => {
  console.log(
    "  App is running at http://localhost:%d in %s mode",
    app.get("port"),
    app.get("env")
  );
  console.log("  Press CTRL-C to stop\n");
});

module.exports = { app, server };

If you are familiar with express then this will be self-explanatory. I will only point out the fact that to run the server, necessary environment variables like GRAPHQL_PORT have to be defined. To make life easier, I make use of the dotenv package which loads environment variables at runtime into the application from a special .env file. So let's define that in the root of service-gql directory.

GRAPHQL_PORT=3000
SERVICE_USER_HOST=localhost
SERVICE_USER_PORT=9090
SERVICE_POST_HOST=localhost
SERVICE_POST_PORT=9091

This assumes that service-user is running on localhost:9090 and service-post is running on localhost:9091.
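For a feel of what loading a .env file involves, here is a deliberately simplified sketch. parseDotenv and applyDefaults are hypothetical helpers, not dotenv's real implementation (which also handles quoting and other cases); the one behaviour mirrored here is that, by default, dotenv does not overwrite variables that are already set.

```javascript
// Parse KEY=VALUE lines, skipping blanks and # comments.
function parseDotenv(text) {
  const parsed = {};
  for (const line of text.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (trimmed === "" || trimmed.startsWith("#")) continue;
    const separator = trimmed.indexOf("=");
    if (separator === -1) continue;
    const key = trimmed.slice(0, separator).trim();
    parsed[key] = trimmed.slice(separator + 1).trim();
  }
  return parsed;
}

// Copy parsed values into `env` without overriding what is already there.
function applyDefaults(env, parsed) {
  for (const [key, value] of Object.entries(parsed)) {
    if (env[key] === undefined) env[key] = value;
  }
  return env;
}

const env = applyDefaults(
  { GRAPHQL_PORT: "4000" }, // pretend this was already set by the shell
  parseDotenv("GRAPHQL_PORT=3000\nSERVICE_USER_HOST=localhost\nSERVICE_USER_PORT=9090\n")
);
console.log(env);
```

This is also why variables supplied by the shell or a container runtime win over the values in the file.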

Now start the thrift servers and then let's run this server...

$ npm start # essentially running app.js
> service-gql@0.0.1 start /home/nsinvhal/Workspace/Go/src/github.com/ashniu123/thrift-graphql-demo/service-gql
> node src/app.js

(59254) [2020-05-15T17:29:27.651Z] info: userClient: localhost 9090
(59254) [2020-05-15T17:29:27.668Z] info: postClient: localhost 9091
  App is running at http://localhost:3000 in development mode
  Press CTRL-C to stop

(59254) [2020-05-15T17:29:27.752Z] info: userClient: Connected to thrift server!
(59254) [2020-05-15T17:29:27.753Z] info: postClient: Connected to thrift server!

Voila! All services are connected and running, exposed through a GraphQL server.

Since this is running in development mode, we can go ahead and check the Apollo GraphQL playground and see it in action!

Here is a screenshot of it in action!

Go ahead and try it yourself!
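If you would rather try it from a script than from the playground, something along these lines should work. It is a sketch that assumes the gateway is reachable on localhost and serves GraphQL at "/" as configured in app.js; buildPingBody and queryGateway are hypothetical helper names.

```javascript
const http = require("http");

// Build the JSON body for a query that hits all three ping fields of the schema.
function buildPingBody() {
  return JSON.stringify({
    query:
      "{ ping { message _version } userPing { message _version } postPing { message _version } }"
  });
}

// POST the query to the gateway and resolve with the parsed JSON response.
function queryGateway(port, host = "localhost", body = buildPingBody()) {
  return new Promise((resolve, reject) => {
    const options = {
      host,
      port,
      path: "/",
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "Content-Length": Buffer.byteLength(body)
      }
    };
    const req = http.request(options, res => {
      let data = "";
      res.setEncoding("utf8");
      res.on("data", chunk => (data += chunk));
      res.on("end", () => {
        try {
          resolve(JSON.parse(data));
        } catch (err) {
          reject(err);
        }
      });
    });
    req.on("error", reject);
    req.end(body);
  });
}
```

With all three services running, queryGateway(3000).then(console.log) sends the query; nothing is sent until the function is called.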

Final Words

This has been quite a long post and might seem like too much just to implement a simple ping() function between 2 services. I would not disagree, but consider the following fact - the skeleton has been laid out! Now adding more procedures anywhere is a piece of cake! All you have to do is the following...

Update the .thrift file and generate stubs through the thrift-gen.sh file
Copy the thrift definition over to service-gql and run the same file as above
Add server implementation in Golang
Add typeDefs and resolvers in Nodejs

That's it! And this is replicable for any other service(s) you might want to add.
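As an illustration of the last step, here is what a resolver for a new procedure might look like. Everything in it is hypothetical: getUser, its id and name fields, and fakeUserClient are not part of the project, and the fake only mimics the (func, params) shape of the thrift client so that the sketch runs on its own.

```javascript
// Hypothetical stand-in for the thrift client: same (func, params) call shape.
const fakeUserClient = async (func, params) => {
  const procedures = {
    getUser: id => ({ id, name: `user-${id}` })
  };
  return procedures[func](...params);
};

// Resolver factory, so the real client can be swapped for a fake in tests.
// GraphQL passes (parent, args, ...) to every resolver; only args is used here.
const makeUserResolver = userClient => async (_parent, args) => {
  const { id, name } = await userClient("getUser", [args.id]);
  return { id, name };
};

const userResolver = makeUserResolver(fakeUserClient);
userResolver(null, { id: 7 }).then(user => console.log(user));
```

The matching typeDefs entry would extend Query with a field that takes the same argument, in the same way userPing was added.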

Bonus: Deployment

Now let's get onto the deployment part. How do we deploy these microservices? I'll show 2 ways - one with Docker-Compose and the other with Kubernetes.

Bonus: Deploy using Docker-Compose

Docker-Compose is an amazing tool that can be used to quickly test out interconnected Docker containers. Although Docker Swarm can be used to mimic this in production, with the arrival of Kubernetes, this strategy is not popular.

Dockerfile

First we have to define the Dockerfile. It lies in the deployments folder of each service.

The Dockerfile for the service-user/post goes like this...

FROM golang:alpine

RUN apk add --no-cache git gcc musl-dev

WORKDIR /usr/app

COPY . .

RUN go mod download && go build -o main.o ./cmd/thrift

EXPOSE 9090

ENTRYPOINT ["/usr/app/main.o"]

Pretty simple, right? Well, the image size after this is a mind-boggling 450MiB. To decrease it, we can use a multi-stage build in which the final image is based on scratch and contains just the binary from the builder stage. Something to try out yourself.

Similarly, we define the Dockerfile for our service-gql.

FROM node:alpine

WORKDIR /usr/app

COPY ./package.json .

COPY ./package-lock.json .

RUN npm install --production

COPY ./src ./src

EXPOSE 3000

CMD ["npm", "start", "--production"]

This image makes good use of the caching strategy of Docker for subsequent builds.

Now let's define our docker-compose.yaml file in the root of the project directory.

version: "3"
services:
  service-user:
    build:
      context: ./service-user
      dockerfile: ./deployments/Dockerfile
    restart: unless-stopped
    command: -addr service-user:9090
    ports:
      - 9090:9090

  service-post:
    build:
      context: ./service-post
      dockerfile: ./deployments/Dockerfile
    restart: unless-stopped
    command: -addr service-post:9090
    ports:
      - 9091:9090

  service-gql:
    build:
      context: ./service-gql
      dockerfile: ./deployments/Dockerfile
    restart: unless-stopped
    ports:
      - 3000:3000
    depends_on:
      - service-user
      - service-post
    environment:
      - GRAPHQL_PORT=3000
      - SERVICE_USER_HOST=service-user
      - SERVICE_USER_PORT=9090
      - SERVICE_POST_HOST=service-post
      - SERVICE_POST_PORT=9090

I pass the -addr argument to the Golang servers so that they listen on their service name rather than localhost, which makes them reachable over the default network that Docker-Compose creates. service-gql depends on the other two services and gets their hostnames and ports through environment variables.

Now let's give it a shot!

$ docker-compose up

In the very first run, it will build the images, which will be used in subsequent runs. Here is a screenshot of how it looks in action.

Bonus: Deploy using Kubernetes

This is yet another section for deploying microservices using Kubernetes. To keep things easy, I will be using minikube.

As soon as you start minikube, be sure to enable the ingress addon with the following command...

$ minikube addons enable ingress

This enables ingress, which lets us use the ingress-nginx load-balancer to route external traffic to GraphQL. Also be sure to install the ingress-nginx load-balancer from here.

Here is what the cluster should look like at the end of our deployments...

Now let's get started.

Ingress

Ingress is used to manage the flow of external traffic to services within our cluster. We will define a simple ingress-service.yaml which will direct all traffic for the /graphql path to service-gql on port 3000. Other services (user and post) will not be accessible from outside directly, but only through GraphQL (as intended).

# ingress-service.yaml
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: ingress-service
  annotations:
    kubernetes.io/ingress.class: nginx
spec:
  rules:
    - http:
        paths:
          - path: /graphql
            backend:
              serviceName: service-gql-cluster-ip
              servicePort: 3000

The annotation implies that we wish to use the famous Nginx load-balancer.

Defining Deployments

Deployment is the go-to way of deploying Pods in a k8s cluster. Let's start by defining the YAML for service-user. The same YAML can be used for service-post too (just be sure to change the names).

# service-user-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: service-user-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: service-user
  template:
    metadata:
      labels:
        app: service-user
    spec:
      containers:
        - name: service-user
          image: ashniu123/thrift-graphql-demo-service-user:v1
          args:
            - -addr=:9090
          resources:
            limits:
              memory: "128Mi"
              cpu: "250m"
          ports:
            - containerPort: 9090

Please note that k8s does not build images like docker-compose does, so you will need to push a built image to a registry (most commonly Docker Hub).

Now let's create service-gql's Deployment. The only differences are that it includes the environment variables that we had passed through the .env file, and that the hosts of the services are no longer localhost but the ClusterIP Services we are going to define in the next section.

# service-gql-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: service-gql-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: service-gql
  template:
    metadata:
      labels:
        app: service-gql
    spec:
      containers:
        - name: service-gql
          image: ashniu123/thrift-graphql-demo-service-gql:v1
          resources:
            limits:
              memory: "512Mi"
              cpu: "250m"
          ports:
            - containerPort: 3000
          env:
            - name: GRAPHQL_PORT
              value: "3000"
            - name: SERVICE_USER_HOST
              value: service-user-cluster-ip
            - name: SERVICE_USER_PORT
              value: "9090"
            - name: SERVICE_POST_HOST
              value: service-post-cluster-ip
            - name: SERVICE_POST_PORT
              value: "9090"

Defining ClusterIP Service

ClusterIP Service is used to expose a Pod into the cluster so that it is accessible by other services (as seen in the YAMLs above). The YAML's structure is the same for all services. Below is the service-gql ClusterIP Service.

# service-gql-cluster-ip.yaml
apiVersion: v1
kind: Service
metadata:
  name: service-gql-cluster-ip
spec:
  type: ClusterIP
  ports:
    - port: 3000
      targetPort: 3000
  selector:
    app: service-gql

The port is what will be exposed within the Cluster, and targetPort is the port on the container that the ClusterIP Service forwards traffic to.

Now let's go up a directory and apply all YAMLs at once.

$ kubectl apply -R -f k8s/

Here is a screenshot of all Objects running in my minikube.

And here it is working with the Insomnia client. Be sure to run minikube ip to get the IP of the Cluster, as localhost:3000 will not show anything.

Amazing right?

If you liked the post, please share it!