nisheet sinvhal

Microservices using Thrift RPC, Golang, and Nodejs (and GraphQL)

2020-05-15 · microservices, rpc

What YAMLs are to Kubernetes, RPCs are to Microservices

RPC is, in my opinion, the best way for inter-service communication in a microservices architecture. It works much like a local procedure call: simply call the function!

Apache Thrift

Apache Thrift is a complete RPC framework. It was originally developed at Facebook and later came under the umbrella of the Apache Software Foundation. The most notable software that has used Apache Thrift (simply “thrift” henceforth) is Cassandra. Being a complete RPC framework, it provides two things: an Interface Definition Language (IDL), in the form of .thrift files, for defining the services, and the communication protocol for the services defined with it.

gRPC is another (arguably more popular) option, but it is not a complete RPC framework in the same sense. gRPC deals only with the communication aspect of services, while the IDL is defined through Protocol Buffers (Protobufs).

I chose thrift because I have worked with it extensively on a year-long project of mine (I will update the blog with it soon). There are pros and cons to using gRPC over Apache Thrift and vice versa, but that’s a story for another day.

Why Golang “AND” Nodejs

I recently began learning Golang and wanted to expand my horizons by building some projects with it. I chose Nodejs simply because I am quite adept at it (or so I’d like to believe). Also, Nodejs arguably has the best support for GraphQL, thanks to Apollo GraphQL. The main advantage of RPCs is that they enable communication between services written in different languages, and I found that to be a suitable experiment for the new language I am learning.

Directory structure

The project is available on GitHub, on branch v1.

- thrift-graphql-demo/
  - service-gql/
    - build/          // script to generate client and server stub in Nodejs
    - deployments/    // contains Dockerfile
    - src/            // main application code including .thrift, generated stubs, and graphql schema, resolvers, etc
    - package.json
  - service-user/
    - build/          // script to generate client and server stub in Golang
    - deployments/    // contains Dockerfile
    - cmd/            // main application code - server implementation
    - internal/       // .thrift file and generated stubs
    - go.sum
    - go.mod
  - service-post/
    - build/          // script to generate client and server stub in Golang
    - deployments/    // contains Dockerfile
    - cmd/            // main application code - server implementation
    - internal/       // .thrift file and generated stubs
    - go.sum
    - go.mod
  - k8s/              // Bonus section - Deployment using Kubernetes
  - docker-compose.yaml

Thrift IDL

The idea is simple:

  • service-user stores user related information
  • service-post stores posts from each user
  • service-gql acts as aggregator and API Gateway

Let’s start by defining the IDL for each of the 2 thrift servers. The following is the IDL for service-user. It declares a UserService with a single procedure, ping(), which returns a PingResponse when called.

namespace go user
namespace js user

typedef i32 int

struct PingResponse {
  1: optional string message
  2: optional int version
}

service UserService {
  PingResponse ping(),
}

Similarly, the service-post IDL can be defined: just replace “user” with “post” and “User” with “Post”.

Generating stubs

The stubs can be generated by running…

$ ./build/thrift-gen.sh

Internally, the script just runs the following command, plus some helpers that create the output directory beforehand.

thrift -r -out ./internal/thrift-rpc --gen go ./internal/thrift/user.thrift

This generates the following files and directories under the internal directory:

- internal/
  - thrift/           //  Contains .thrift file
  - thrift-rpc/user/
    - user.go         //  the structs, types and service definitions, as in PingResponse, Int, UserService...
    - user-consts.go  //  constants declared in the IDL (none in this demo)

For Golang specifically, 2 additional files are generated. One of them, user_service-remote.go, is a client. It is pretty handy for testing purposes, though our actual client implementation will be in Nodejs, in another directory.

In service-gql, an equivalent bash script generates the stubs for Nodejs in the same way.

Golang implementation

Now that our stubs are generated, we can implement them. For the simple IDL that we have defined, it is quite easy. Inside the cmd directory, create the following folders and files…

- cmd/
  - impl/             //  implementation of the service procedure
    - handler.go      //  barebones handler - UserHandler, PostHandler
    - ping.go         //  implementation of the ping() procedure
  - server/           //  where the thrift server that listens for incoming connections resides
    - server.go       //  `RunServer`, which takes in the protocols, transports and handlers to create the socket
  - thrift/           //  "package main"
    - main.go         //  parses the address-to-listen-on flag and calls `RunServer` from server.go

Handler

The handler is a simple struct to which the procedure implementations are added.

// handler.go
package impl

type UserHandler struct{}

func NewUserHandler() *UserHandler {
	return &UserHandler{}
}

ping() implementation

ping() is a simple function that takes no parameters and returns a response of type PingResponse.

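// ping.go
package impl

import (
	"context"
	"log"
	// plus the generated stubs package from internal/thrift-rpc/user (imported as user)
)

// ApiVersion is set from main.go before the server starts.
var ApiVersion user.Int

// Ping implements the ping() procedure declared in the IDL.
func (p *UserHandler) Ping(ctx context.Context) (*user.PingResponse, error) {
	message := "ping"
	log.Println(message)

	// Optional IDL fields are pointers in the generated Go struct.
	resp := user.NewPingResponse()
	resp.Message = &message
	resp.Version = &ApiVersion

	return resp, nil
}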

Note that ApiVersion is of type user.Int. The reason is that the IDL declares typedef i32 int, so the generated Go code exposes a named type Int (backed by int32) instead of a plain int32. I set its value in the main.go file.
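Why does Ping assign pointers and convert ApiVersion? The following hand-written approximation of the generated types makes it visible. It is not the actual generator output. The typedef becomes a distinct named type, and optional fields become pointers, so that "unset" (nil) can be told apart from a zero value. The generated code includes getters in this spirit; GetMessage and IsSetVersion below are simplified stand-ins.

```go
package main

import "fmt"

// Int mirrors `typedef i32 int`: a distinct named type whose underlying type is int32.
type Int int32

// PingResponse approximates the generated struct: optional fields are pointers,
// so nil means "not set" rather than "".
type PingResponse struct {
	Message *string
	Version *Int
}

// GetMessage is a simplified stand-in for the generated getters:
// it falls back to the zero value when the field is unset.
func (p *PingResponse) GetMessage() string {
	if p.Message == nil {
		return ""
	}
	return *p.Message
}

// IsSetVersion reports whether the optional field was populated.
func (p *PingResponse) IsSetVersion() bool { return p.Version != nil }

func main() {
	apiVersion := 1 // a plain int, like ApiVersion in main.go
	// Go never converts between named types implicitly, hence user.Int(ApiVersion).
	v := Int(apiVersion)
	msg := "ping"

	resp := &PingResponse{}
	fmt.Println(resp.GetMessage() == "", resp.IsSetVersion()) // true false

	resp.Message = &msg
	resp.Version = &v
	fmt.Println(resp.GetMessage(), *resp.Version) // ping 1
}
```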

Golang server

The server takes a transport factory (TTransportFactory), a protocol factory (TProtocolFactory), an address (string) and a TLS flag (boolean). It uses them to create a socket listening on the address passed in when the program is started (next up).

Transport abstracts how bytes are moved, that is, how they are read from and written to the underlying medium (a socket, an in-memory buffer, framed messages, etc.). Protocol defines how data structures are serialised into and deserialised from those bytes (binary, compact, JSON, etc.). Client and server must agree on both. To know more about transports and protocols, I recommend going over the official docs.
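To make the protocol side concrete: TCompactProtocol writes an i32 in two steps. It zigzag-encodes the value, then emits it as a variable-length integer, 7 bits per byte. The sketch below reproduces only that integer encoding. It leaves out field headers and does not use the thrift library. It shows why the value of a small field such as version = 1 occupies a single byte on the wire.

```go
package main

import "fmt"

// zigzag32 maps signed ints to unsigned so small magnitudes stay small:
// 0→0, -1→1, 1→2, -2→3, ...
func zigzag32(n int32) uint32 {
	return uint32((n << 1) ^ (n >> 31))
}

// uvarint writes 7 bits per byte, least significant group first,
// setting the high bit on every byte except the last.
func uvarint(u uint32) []byte {
	var out []byte
	for u >= 0x80 {
		out = append(out, byte(u&0x7f)|0x80)
		u >>= 7
	}
	return append(out, byte(u))
}

// encodeI32 is the value encoding only; a real message also carries field headers.
func encodeI32(n int32) []byte { return uvarint(zigzag32(n)) }

func main() {
	for _, n := range []int32{1, -1, 300} {
		fmt.Printf("%d -> % x\n", n, encodeI32(n))
	}
	// 1 -> 02
	// -1 -> 01
	// 300 -> d8 04
}
```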

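// server.go
package server

import (
	"log"

	"github.com/apache/thrift/lib/go/thrift"
	// plus this service's impl package and the generated user stubs package
)

// RunServer listens on addr and serves UserService requests.
// secure (TLS) is accepted but not implemented in this demo.
func RunServer(transportFactory thrift.TTransportFactory, protocolFactory thrift.TProtocolFactory, addr string, secure bool) error {
	transport, err := thrift.NewTServerSocket(addr)
	if err != nil {
		return err
	}

	// The processor routes incoming calls to the handler's methods.
	handler := impl.NewUserHandler()
	processor := user.NewUserServiceProcessor(handler)
	server := thrift.NewTSimpleServer4(processor, transport, transportFactory, protocolFactory)

	log.Printf("Starting simple thrift server on: %s\n", addr)
	return server.Serve()
}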

I have skipped enabling TLS in the server for this demo.

Running the Golang server

Now everything is set up and ready to run, and all we have left to write is our main file. I take a flag so that the address to listen on can be passed in dynamically. It defaults to localhost:9090.

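// main.go
package main

import (
	"flag"
	"log"

	"github.com/apache/thrift/lib/go/thrift"
	// plus this service's impl, server and generated user stubs packages
)

var (
	addr = flag.String("addr", "localhost:9090", "Thrift Address to listen on")
)

var ApiVersion = 1

func main() {
	flag.Parse()

	// user.Int is the Go type generated from `typedef i32 int`.
	impl.ApiVersion = user.Int(ApiVersion)

	// These must match the transport and protocol used by the clients.
	transportFactory := thrift.NewTBufferedTransportFactory(8192)
	protocolFactory := thrift.NewTCompactProtocolFactory()

	if err := server.RunServer(transportFactory, protocolFactory, *addr, false); err != nil {
		log.Println("error running thrift server:", err)
	}
}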

Phew, that was super easy 🤪 (no?)! All we have to do now is run it from the CLI.

Say the magic words after me! GO… RUN …

$ go run ./cmd/thrift/main.go
2020/05/15 00:33:15 Starting simple thrift server on: localhost:9090

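Now we replicate the same behaviour in service-post (for simplicity, just copy over the same files and replace “User” with “Post”). When running both locally, start service-post on a different port so it does not clash with service-user, e.g. with -addr localhost:9091.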

Apollo GraphQL

Now that our Thrift servers are up and running, we have to implement a client for them. We will use Nodejs to implement the GraphQL layer. It will act as an orchestrator for the other services and should be the only exposed service. Let us begin!

I just want to point out that I have been using Nodejs for quite a few years now, and I am a big fan of doing things in a “high cohesion, low coupling” manner (or, simply put, very modular).

Inside the src folder of service-gql, create the following files and folders…

- src/
  - resolvers/         //  graphql resolvers
    - post/            //  resolver for service-post
      - post.js        //  aggregator for all PostService procedures
      - postPing.js    //  implementation of ping() for service-post
    - user/            //  resolver for service-user
      - user.js        //  aggregator for all UserService procedures
      - userPing.js    //  implementation of ping() for service-user
    - resolvers.js     //  aggregator for resolvers of all services
  - thrift/            //  thrift IDLs and the stubs generated from them
  - thriftClients/     //  UserService and PostService clients
  - typeDefs/          //  schema definition using `gql` template literals
    - user.js          //  schema definition for UserService
    - post.js          //  schema definition for PostService
    - typeDefs.js      //  aggregator for all typeDefs
  - utils/             //  helper functions like logger
  - app.js             //  the "main" file to run

Schema definition

Let’s get started with the typeDefs. I use Apollo GraphQL’s gql template literal tag to create the schema, as it keeps things extremely modular and makes later modifications easy.

// typeDefs.js
const { gql } = require("apollo-server-express");
const user = require("./user");
const post = require("./post");

const base = gql`
  type PingResponse {
    message: String
    _version: Int
  }

  type Query {
    ping: PingResponse!
  }
`;

module.exports = [base, user, post];

This is the “base” of the schema. Things common to all services go here (like PingResponse). As long as PingResponse stays the same in both IDLs, it belongs in base. If it ever diverges, each service’s version should move into its own file.

Let us extend this base to include the UserService and PostService details.

// post.js
const { gql } = require("apollo-server-express");

module.exports = gql`
  extend type Query {
    postPing: PingResponse!
  }
`;

// user.js
const { gql } = require("apollo-server-express");

module.exports = gql`
  extend type Query {
    userPing: PingResponse!
  }
`;

Pretty simple, huh? As the name suggests, the extend keyword adds these fields to the Query type defined in base. Now that we are done defining our schema, let’s get on with the client and the resolvers.

Thrift Client implementation

The thrift IDLs are the same as those in the other services (simply copied over) and live under the thrift/ directory. The stubs for Nodejs are generated by running the following command from the root of the service-gql directory.

$ ./build/thrift-gen.sh

The underlying command is similar to how it was for Golang. Now that the stubs are generated, let’s proceed with the client implementation.

I will demonstrate UserService’s client; PostService’s is the same, just with some different variable names.

// thriftClients/user.js (required as ../../thriftClients/user)
const thrift = require("thrift");
const UserService = require("../thrift/user/UserService");
const logger = require("../utils/logger");

const SERVER_HOST =
  process.env.SERVICE_USER_HOST || process.env.SERVICE_USER_CLUSTER_IP_SERVICE_SERVICE_HOST;
const SERVER_PORT = parseInt(process.env.SERVICE_USER_PORT, 10);

logger.info(`userClient: ${SERVER_HOST} ${SERVER_PORT}`);

const thriftOptions = {
  transport: thrift.TBufferedTransport,
  protocol: thrift.TCompactProtocol
};

const connection = thrift.createConnection(SERVER_HOST, SERVER_PORT, thriftOptions);
let client;

connection.on("error", err => {
  logger.error(`userClient: ${JSON.stringify(err)}`);
});

connection.on("connect", () => {
  client = thrift.createClient(UserService, connection);
  logger.info("userClient: Connected to thrift server!");
});

connection.on("close", () => {
  logger.info("userClient: Disconnected from thrift server!");
  process.exit(1);
});

// Wrap in an arrow function so `end` runs with `this` bound to the connection
process.on("SIGTERM", () => connection.end());

/**
 * thrift user client
 * @param {string} func thrift function to call
 * @param {object[]} params params to pass to the thrift function
 */
const userClient = (func, params) =>
  new Promise((resolve, reject) => {
    client[func](...params)
      .then(resolve)
      .fail(reject);
  });

module.exports = userClient;

That is some client, right?! I shall explain…

I fetch SERVER_HOST and SERVER_PORT from environment variables. The notable one is probably process.env.SERVICE_USER_CLUSTER_IP_SERVICE_SERVICE_HOST. It is meant to be set by Kubernetes, which is covered in the Bonus! section towards the end of the blog.

Otherwise, it is actually pretty straightforward. Use the same transport and protocol as the thrift server, create a connection, and log useful messages depending on whether the connection was established or not. The reconnect strategy that I have chosen is... nothing! I assume the application is deployed using Docker-Compose or Kubernetes. Either one will treat process.exit(1); as a crash and restart the container in which this service runs.

The magic part here is the userClient function. Did you know that (almost) everything in Nodejs is an object? I just make use of that: client[func] looks up a procedure on the generated client by its name. I wanted to keep things clean and as reusable as possible. Simply pass in the procedure name as a string and the params as an array (just like you would pass arguments in Golang), and the function gets called. Depending on your taste, this looks either neat or messy in Javascript. Its usage will become clearer in the next section.
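Go has no client[func] indexing. The same dispatch-by-name idea can still be sketched with the reflect package. This Go analogue is for illustration only and is not part of the project. PostClient and GetPost are hypothetical, and since Go method names are capitalized, the name would be "Ping" rather than "ping". Unlike the Javascript version, the sketch checks the argument count and types up front. This is where the IDL's parameter order matters.

```go
package main

import (
	"fmt"
	"reflect"
)

// PostClient is a hypothetical stand-in for a generated Thrift client.
type PostClient struct{}

func (c *PostClient) Ping() (string, error)            { return "ping", nil }
func (c *PostClient) GetPost(id int32) (string, error) { return fmt.Sprintf("post %d", id), nil }

// call invokes client.<name>(params...) by name, the Go counterpart of
// client[func](...params). Methods must return (result, error).
func call(client interface{}, name string, params ...interface{}) (interface{}, error) {
	m := reflect.ValueOf(client).MethodByName(name)
	if !m.IsValid() {
		return nil, fmt.Errorf("no procedure %q", name)
	}
	t := m.Type()
	if t.NumOut() != 2 {
		return nil, fmt.Errorf("%s must return (result, error)", name)
	}
	if t.NumIn() != len(params) {
		return nil, fmt.Errorf("%s expects %d params, got %d", name, t.NumIn(), len(params))
	}
	args := make([]reflect.Value, len(params))
	for i, p := range params {
		v := reflect.ValueOf(p)
		// Positional check: param i must match the i-th declared parameter.
		if !v.IsValid() || !v.Type().AssignableTo(t.In(i)) {
			return nil, fmt.Errorf("%s param %d: want %s", name, i, t.In(i))
		}
		args[i] = v
	}
	out := m.Call(args)
	if err, ok := out[1].Interface().(error); ok && err != nil {
		return nil, err
	}
	return out[0].Interface(), nil
}

func main() {
	c := &PostClient{}
	r, err := call(c, "Ping")
	fmt.Println(r, err) // ping <nil>
	r, err = call(c, "GetPost", int32(7))
	fmt.Println(r, err) // post 7 <nil>
	_, err = call(c, "GetPost", 7) // untyped 7 becomes int, not int32
	fmt.Println(err)
}
```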

Just a side note: the Nodejs Thrift library uses the Q promise library, which is archaic! It dates from a time before Promises were natively part of the Javascript language specification. So userClient wraps the Q promise returned by Thrift into a native Promise. Neat, right?

Resolvers implementation

The aggregator file resolvers.js goes like this…

// resolvers.js
const _ = require("lodash");
const user = require("./user/user");
const post = require("./post/post");

const base = {
  Query: {
    ping: () => ({ message: "ping", _version: 1 })
  }
};

module.exports = _.merge(base, user, post);

I use lodash’s merge to deep-merge objects that share the same keys (every resolver module exports a Query object). We will see this next in our implementation of UserService’s resolver.
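Why a deep merge? Every resolver module exports its own Query object. A shallow merge such as Object.assign would let post's Query replace user's, and userPing would silently disappear. The Go sketch below mimics the relevant part of _.merge on nested maps, with strings standing in for resolver functions. It illustrates the behaviour and is not lodash's implementation, which also handles arrays, class instances and other cases.

```go
package main

import "fmt"

// Resolvers is a loose model of a resolver map; values are nested maps or leaves.
type Resolvers = map[string]interface{}

// merge deep-merges each src into dst (mutating dst, like _.merge):
// nested maps are merged key by key, any other value overwrites.
func merge(dst Resolvers, srcs ...Resolvers) Resolvers {
	for _, src := range srcs {
		for k, v := range src {
			sv, srcIsMap := v.(Resolvers)
			if !srcIsMap {
				dst[k] = v // leaf: later sources win
				continue
			}
			dv, dstIsMap := dst[k].(Resolvers)
			if !dstIsMap {
				dv = Resolvers{} // fresh map so dst never aliases src
				dst[k] = dv
			}
			merge(dv, sv)
		}
	}
	return dst
}

func main() {
	base := Resolvers{"Query": Resolvers{"ping": "base ping"}}
	user := Resolvers{"Query": Resolvers{"userPing": "user ping"}}
	post := Resolvers{"Query": Resolvers{"postPing": "post ping"}}

	merged := merge(base, user, post)
	fmt.Println(merged["Query"]) // map[ping:base ping postPing:post ping userPing:user ping]
}
```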

The UserService’s user.js aggregator looks like this…

// user.js
const userPing = require("./userPing");

module.exports = {
  Query: {
    userPing
  }
};

Since this is the aggregator, the implementation actually lies in another file.

// userPing.js
const userClient = require("../../thriftClients/user");

module.exports = async () => {
  try {
    const { message, version: _version } = await userClient("ping", []);
    return { message, _version };
  } catch (e) {
    throw e;
  }
};

Here we make use of userClient, which I described in the previous section. Since ping() does not take any params, I pass an empty array. Otherwise, I would pass the same number of arguments as defined in the IDL (wrapped in the desired class), AND in the same order.

You might wonder why there is a try-catch when I just rethrow the error, right? It is there so that, in multi-service operations, I could perform rollbacks depending on the error. That will be a blog post for another day though.

Now go on and do the same for PostService (just copy the files over and replace “user” with “post”).

The legendary app.js

Now that all our services and their implementations have been defined, let’s get on with the implementation of the HTTP server. It uses express and apollo-server-express to run the server. Here goes the implementation…

// app.js
require("dotenv/config");
const express = require("express");
const compression = require("compression");
const bodyParser = require("body-parser");
const helmet = require("helmet");
const { ApolloServer } = require("apollo-server-express");
const typeDefs = require("./typeDefs/typeDefs");
const resolvers = require("./resolvers/resolvers");
const logger = require("./utils/logger");

const app = express();

app.set("port", process.env.GRAPHQL_PORT);
app.use(compression());
app.use(bodyParser.json());
app.use(bodyParser.urlencoded({ extended: true }));
app.use(helmet());

app.get("/healthz", (_req, res) => {
  res.json({
    health: "ok",
    version: 1
  });
});

const apolloServer = new ApolloServer({
  typeDefs,
  resolvers,
  formatResponse: response => {
    logger.info(JSON.stringify(response));
    return response;
  },
  formatError: error => {
    logger.error(JSON.stringify(error));
    return error;
  }
});

apolloServer.applyMiddleware({ app, path: "/" });

/** Start Express server. */
const server = app.listen(app.get("port"), () => {
  console.log(
    "  App is running at http://localhost:%d in %s mode",
    app.get("port"),
    app.get("env")
  );
  console.log("  Press CTRL-C to stop\n");
});

module.exports = { app, server };

If you are familiar with express, this will be self-explanatory. I will only point out that the necessary environment variables, like GRAPHQL_PORT, must be defined to run the server. To make life easier, I use the dotenv package, which loads environment variables into the application at runtime from a special .env file. So let’s define that in the root of the service-gql directory.

GRAPHQL_PORT=3000
SERVICE_USER_HOST=localhost
SERVICE_USER_PORT=9090
SERVICE_POST_HOST=localhost
SERVICE_POST_PORT=9091

This assumes that service-user is running on localhost:9090 and service-post on localhost:9091.

Now start the thrift servers, and then let’s run this server…

$ npm start # essentially running app.js
> service-gql@0.0.1 start /home/nsinvhal/Workspace/Go/src/github.com/ashniu123/thrift-graphql-demo/service-gql
> node src/app.js

(59254) [2020-05-15T17:29:27.651Z] info: userClient: localhost 9090
(59254) [2020-05-15T17:29:27.668Z] info: postClient: localhost 9091
  App is running at http://localhost:3000 in development mode
  Press CTRL-C to stop

(59254) [2020-05-15T17:29:27.752Z] info: userClient: Connected to thrift server!
(59254) [2020-05-15T17:29:27.753Z] info: postClient: Connected to thrift server!

Voila! All services are connected and running, exposed through a GraphQL server.

[Screenshot: services running in terminal]

Since this is running in development mode, we can go ahead and open the Apollo GraphQL Playground to see it in action!

Here is a screenshot of it in action!

[Screenshot: GraphQL Playground]

Go ahead and try it yourself!

Final Words

This has been quite a long post, and it might seem like too much just to implement a simple ping() function between 2 services. I would not disagree, but consider this: the skeleton has been laid out! Now adding more procedures anywhere is a piece of cake! All you have to do is the following…

  1. Update .thrift file and generate stubs through the thrift-gen.sh file
  2. Copy the thrift definition over to service-gql and run the same file as above
  3. Add server implementation in Golang
  4. Add typeDefs and resolvers in Nodejs

That’s it! And this is replicable for any other service(s) you might want to add.

Bonus: Deployment

Now let’s get on to the deployment part. How do we deploy these microservices? I’ll show 2 ways: one with Docker-Compose and the other with Kubernetes.

Bonus: Deploy using Docker-Compose

Docker-Compose is an amazing tool for quickly testing out interconnected Docker containers. Docker Swarm can run a similar setup in production, but this strategy has become less popular since the arrival of Kubernetes.

Dockerfile

First we have to define the Dockerfile. It lies in the deployments folder of each service.

The Dockerfile for the service-user/post goes like this…

FROM golang:alpine

RUN apk add --no-cache git gcc musl-dev

WORKDIR /usr/app

COPY . .

RUN go mod download && go build -o main.o ./cmd/thrift

EXPOSE 9090

ENTRYPOINT ["/usr/app/main.o"]

Pretty simple, right? Well, the resulting image is a mind-boggling 450MiB. To shrink it, we can use a multi-stage build. Its final stage starts from the scratch image and contains just the binary from the builder stage, which should be built with CGO_ENABLED=0 so that it does not depend on libc (scratch has none). Something to try out yourself.

Similarly, we define the Dockerfile for our service-gql.

FROM node:alpine

WORKDIR /usr/app

COPY ./package.json .

COPY ./package-lock.json .

RUN npm install --production

COPY ./src ./src

EXPOSE 3000

CMD ["npm", "start", "--production"]

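This Dockerfile makes good use of Docker’s layer caching for subsequent builds. package.json and package-lock.json are copied and installed before the source code, so the npm install layer is only rebuilt when the dependencies change.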

Now let’s define our docker-compose.yaml file in the root of the project directory.

version: "3"
services:
  service-user:
    build:
      context: ./service-user
      dockerfile: ./deployments/Dockerfile
    restart: unless-stopped
    command: -addr service-user:9090
    ports:
      - 9090:9090

  service-post:
    build:
      context: ./service-post
      dockerfile: ./deployments/Dockerfile
    restart: unless-stopped
    command: -addr service-post:9090
    ports:
      - 9091:9090

  service-gql:
    build:
      context: ./service-gql
      dockerfile: ./deployments/Dockerfile
    restart: unless-stopped
    ports:
      - 3000:3000
    depends_on:
      - service-user
      - service-post
    environment:
      - GRAPHQL_PORT=3000
      - SERVICE_USER_HOST=service-user
      - SERVICE_USER_PORT=9090
      - SERVICE_POST_HOST=service-post
      - SERVICE_POST_PORT=9090

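I pass the -addr argument to the Golang servers because they have to listen on the default network that Docker-Compose creates; a server bound to localhost inside a container is not reachable from other containers. I also make service-gql depend on the other services and give it the proper environment variables. Note that depends_on only controls start order, not readiness. If service-gql starts before the thrift servers accept connections, the client’s process.exit(1) together with restart: unless-stopped takes care of retrying.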
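Both deployment styles feed the same -addr flag. docker-compose’s command: is appended to the image’s ENTRYPOINT, so the binary receives -addr service-user:9090. The Kubernetes Deployment later passes -addr=:9090. Go’s flag package accepts both the space-separated and the = forms, and an empty host (:9090) listens on all interfaces. The sketch below defines the flag from main.go on a private FlagSet so that parsing can be called repeatedly; parseAddr is a hypothetical helper, not part of the project. It covers the three cases.

```go
package main

import (
	"flag"
	"fmt"
)

// parseAddr parses args like main.go does, but on its own FlagSet so it can
// run repeatedly (and in tests) without touching os.Args.
func parseAddr(args []string) (string, error) {
	fs := flag.NewFlagSet("thrift", flag.ContinueOnError)
	addr := fs.String("addr", "localhost:9090", "Thrift Address to listen on")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	return *addr, nil
}

func main() {
	for _, args := range [][]string{
		nil,                            // local `go run`: default address
		{"-addr", "service-user:9090"}, // docker-compose `command:`
		{"-addr=:9090"},                // Kubernetes `args:`, all interfaces
	} {
		addr, err := parseAddr(args)
		fmt.Printf("%q -> %q %v\n", args, addr, err)
	}
}
```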

Now let’s give it a shot!

$ docker-compose up

In the very first run, it will build the images, which will be used in subsequent runs. Here is a screenshot of how it looks in action.

[Screenshot: docker-compose up]

Bonus: Deploy using Kubernetes

This section deploys the same microservices using Kubernetes. To keep things easy, I will be using minikube.

As soon as you start minikube, be sure to enable the ingress addon with the following command…

$ minikube addons enable ingress

This will enable ingress and will enable us to use the ingress-nginx loadbalancer to route external traffic to GraphQL. At the same time, be sure to install the ingress-nginx loadbalancer from here.

Here is what the cluster should look like at the end of our deployments…

[Diagram: k8s architecture]

Now let’s get started.

Ingress

Ingress is used to manage the flow of external traffic to services within our cluster. We will define a simple ingress-service.yaml that directs all traffic whose path starts with /graphql to service-gql on port 3000. The other services (user and post) will not be directly accessible from outside, only through graphql (as intended).

# ingress-service.yaml
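# networking.k8s.io/v1beta1 Ingress was removed in Kubernetes 1.22; newer clusters need networking.k8s.io/v1
apiVersion: networking.k8s.io/v1beta1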
kind: Ingress
metadata:
  name: ingress-service
  annotations:
    kubernetes.io/ingress.class: nginx
spec:
  rules:
    - http:
        paths:
          - path: /graphql
            backend:
              serviceName: service-gql-cluster-ip
              servicePort: 3000

The annotation specifies that we want this Ingress handled by the (famous) nginx load balancer.

Defining Deployments

Deployment is the “go-to” way of deploying Pods in a k8s cluster. Let’s start by defining the YAML for service-user. The same YAML can be used for service-post too (just be sure to change the names).

# service-user-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: service-user-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: service-user
  template:
    metadata:
      labels:
        app: service-user
    spec:
      containers:
        - name: service-user
          image: ashniu123/thrift-graphql-demo-service-user:v1
          args:
            - -addr=:9090
          resources:
            limits:
              memory: "128Mi"
              cpu: "250m"
          ports:
            - containerPort: 9090

Please note that k8s does not build images like docker-compose does. You will need to push a built image to a registry (Docker Hub is a popular choice).

Now let’s create service-gql’s Deployment. There are only two differences. First, it includes the environment variables we previously passed through the .env file (or otherwise). Second, the services’ hosts are no longer localhost but the ClusterIP Services we are going to define in the next section.

# service-gql-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: service-gql-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: service-gql
  template:
    metadata:
      labels:
        app: service-gql
    spec:
      containers:
        - name: service-gql
          image: ashniu123/thrift-graphql-demo-service-gql:v1
          resources:
            limits:
              memory: "512Mi"
              cpu: "250m"
          ports:
            - containerPort: 3000
          env:
            - name: GRAPHQL_PORT
              value: "3000"
            - name: SERVICE_USER_HOST
              value: service-user-cluster-ip
            - name: SERVICE_USER_PORT
              value: "9090"
            - name: SERVICE_POST_HOST
              value: service-post-cluster-ip
            - name: SERVICE_POST_PORT
              value: "9090"

Defining ClusterIP Service

A ClusterIP Service exposes Pods (matched by its selector labels) inside the cluster so that other services can reach them (as seen in the YAMLs above). The YAML’s structure is the same for all services. Below is the ClusterIP Service for service-gql.

# service-gql-cluster-ip.yaml
apiVersion: v1
kind: Service
metadata:
  name: service-gql-cluster-ip
spec:
  type: ClusterIP
  ports:
    - port: 3000
      targetPort: 3000
  selector:
    app: service-gql

port is the port the Service exposes within the cluster. targetPort is the port in the container to which the ClusterIP Service forwards traffic.

Now let’s go up a directory and apply all YAMLs at once.

$ kubectl apply -R -f k8s/

Here is a screenshot of all Objects running in my minikube.

[Screenshot: kubectl output on minikube]

And here it is working in the Insomnia client. Be sure to run minikube ip to get the IP of the cluster, since localhost:3000 will not show anything. Send requests to that IP on the /graphql path defined in the Ingress.

[Screenshot: Insomnia querying the Kubernetes deployment]

Amazing, right?