05/10/20

Git clone over gRPC

Clone git repositories over gRPC to simplify your backend infrastructure


Git's support for cloning over HTTPS and SSH always felt sufficient, until it didn't. It works perfectly fine for developers cloning a repository! But when we were building the infrastructure to power Encore's new web-based editor, we wanted to clone Git repositories programmatically, in a way that plugged easily into the rest of our infrastructure.

Like many backends, Encore uses gRPC internally for cross-service communication. This makes it trivial to do end-to-end authentication and encryption. We really didn't want to build a whole separate authentication mechanism with SSH keys just to be able to interact with Git repositories.

Enter git-remote-helpers

Git provides a mechanism called git remote helpers that delegates communication between the git client and server to an external program. So how difficult would it be to build a git remote helper that uses gRPC instead? Very easy, it turns out!

In summary, when git encounters a remote URL of the form <transport>://<address> whose transport it does not handle natively, it looks for an executable named git-remote-<transport> on the $PATH. It invokes it with either one or two arguments, where the first argument is the remote name (if available), and the second argument is the full remote URL (if available). If either argument is unavailable that argument is elided, which while annoying is not too difficult to handle.
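The argument handling described above can be sketched in a few lines. This is a minimal illustration rather than code from the helper itself: the name remoteArgs is hypothetical, and it assumes the URL is always the last argument, with the remote name (when present) right before it.

```go
package main

import (
	"errors"
	"fmt"
)

// remoteArgs is a hypothetical helper that splits the arguments git passes
// to a remote helper into a remote name and a URL. It assumes the URL is the
// last argument and that the remote name may be missing.
func remoteArgs(args []string) (name, url string, err error) {
	switch len(args) {
	case 0, 1:
		// Only the program name (or nothing at all) was given.
		return "", "", errors.New("no address given")
	case 2:
		// The remote name was elided; the single argument is the URL.
		return "", args[1], nil
	default:
		return args[1], args[2], nil
	}
}

func main() {
	// Simulated invocation for a configured remote named "origin".
	name, url, err := remoteArgs([]string{"git-remote-grpc", "origin", "grpc://localhost:8080/demo"})
	fmt.Println(name, url, err)
}
```

In the real helper the slice would be os.Args; a fixed slice is used here so the sketch runs on its own.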

Git remote helpers can support different capabilities and protocols for communicating with the remote server, but for our purposes the most useful (and flexible) capability is connect. The connect command is responsible for connecting to the remote server, invoking either git-receive-pack or git-upload-pack on the remote end, and connecting that process to stdin and stdout on the client side. This way we can let git handle the application protocol (which is the packfile transfer protocol, a complex beast) while we only have to implement the transport protocol.

Implementing git-remote-grpc

So with this functionality we can now build our own git-remote-grpc executable that communicates over gRPC.

If you wish to skip ahead, the full source code is available on GitHub.

Defining the schema

The first part is to define the gRPC schema using protobufs. Since we're implementing the connect capability, which requires bidirectional communication, gRPC streams are a perfect fit.

A slightly tricky aspect is that we need to send some additional metadata to establish the connection. This includes what "service" to invoke (git-receive-pack or git-upload-pack) as well as an identifier for the repository. This is easiest to do with header metadata for the stream, so these will not be part of the protobuf schema. The schema then looks like this:

    syntax = "proto3";

    import "google/protobuf/empty.proto";

    package gitpb;

    service Git {
      rpc Connect (stream Data) returns (stream Data);
    }

    message Data {
      bytes data = 1;
    }

We can generate the associated .go file with:

    $ cd path/to/repo
    $ protoc -I . --go_out=plugins=grpc,paths=source_relative:./ gitpb/gitpb.proto

Implementing the client

Next, let's write the main function for our program. Since we're only supporting the connect capability, a simple switch is enough.

    // Imports: bufio, fmt, os, strings.

    func main() {
        if err := run(os.Args); err != nil {
            fmt.Fprintf(os.Stderr, "%s: %v\n", os.Args[0], err)
            os.Exit(1)
        }
    }

    func run(args []string) error {
        stdin := bufio.NewReader(os.Stdin)
        stdout := os.Stdout

        // Read commands from stdin.
        for {
            cmd, err := stdin.ReadString('\n')
            if err != nil {
                return fmt.Errorf("unexpected error reading stdin: %v", err)
            }
            cmd = cmd[:len(cmd)-1] // skip trailing newline
            switch {
            case cmd == "capabilities":
                // Advertise connect as the only (mandatory) capability.
                if _, err := stdout.Write([]byte("*connect\n\n")); err != nil {
                    return err
                }
            case strings.HasPrefix(cmd, "connect "):
                service := cmd[len("connect "):]
                return connect(args, service, stdin, stdout)
            default:
                return fmt.Errorf("unsupported command: %s", cmd)
            }
        }
    }

This simply parses the commands and responds accordingly. When the connect command is invoked it delegates to the connect function, which we'll define below.

The connect function is responsible for parsing the remote address, establishing a gRPC connection, and proxying stdin/stdout data over the connection. We pass the "service" to call (git-upload-pack or git-receive-pack) as well as which repository to connect to as gRPC headers.

We'll start by defining a helper function to dial the gRPC server and set up the stream.

    // Imports: context, fmt, net/url, strings, google.golang.org/grpc,
    // google.golang.org/grpc/metadata, and the generated gitpb package.

    // grpcConnect parses the remote address from the args
    // and invokes the Connect endpoint.
    func grpcConnect(args []string, svc string) (gitpb.Git_ConnectClient, error) {
        var (
            addr *url.URL
            err  error
        )
        switch {
        case len(args) >= 3:
            addr, err = url.Parse(args[2])
        case len(args) == 2:
            addr, err = url.Parse(args[1])
        default:
            err = fmt.Errorf("no address given")
        }
        if err != nil {
            return nil, fmt.Errorf("parsing remote address: %v", err)
        }

        // Connect to gRPC. Here is where you would add authentication logic.
        // grpc.Dial requires a transport security option: pass TLS credentials
        // here, or add grpc.WithInsecure() when testing locally without
        // TLS certificates.
        conn, err := grpc.Dial(addr.Host, grpc.WithBlock())
        if err != nil {
            return nil, fmt.Errorf("dial %s: %v", addr.Host, err)
        }
        gitClient := gitpb.NewGitClient(conn)

        // Identify the repository by the address path.
        repoPath := strings.TrimPrefix(addr.Path, "/")
        ctx := metadata.AppendToOutgoingContext(context.Background(),
            "service", svc,
            "repository", repoPath,
        )
        return gitClient.Connect(ctx)
    }
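To make the address handling concrete, here is a small standalone sketch of how a remote URL splits into the dial target and the repository identifier. The helper name splitRemote is hypothetical, and the empty-host check is an addition for illustration; the rest mirrors the url.Parse and strings.TrimPrefix steps in grpcConnect.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// splitRemote is a hypothetical helper mirroring the parsing in grpcConnect:
// the URL host (including the port) is the dial target, and the path without
// its leading slash identifies the repository.
func splitRemote(raw string) (host, repo string, err error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", "", fmt.Errorf("parsing remote address: %v", err)
	}
	// Extra check, not in the original: a URL without a host cannot be dialed.
	if u.Host == "" {
		return "", "", fmt.Errorf("remote address %q has no host", raw)
	}
	return u.Host, strings.TrimPrefix(u.Path, "/"), nil
}

func main() {
	host, repo, err := splitRemote("grpc://localhost:8080/demo")
	fmt.Println(host, repo, err) // localhost:8080 demo <nil>
}
```

The host and repository end up in different places: the host is only used for dialing, while the repository travels to the server as stream metadata.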

Finally we're ready to implement connect(). It's a bit more complex than the previous segments, but I've tried to make everything as understandable as possible through comments.

    // Imports: fmt, io.

    // connect implements the "connect" capability by copying data
    // to and from the remote end over gRPC.
    func connect(args []string, svc string, stdin io.Reader, stdout io.Writer) error {
        stream, err := grpcConnect(args, svc)
        if err != nil {
            return err
        }

        // Communicate to Git that the connection is established.
        if _, err := stdout.Write([]byte("\n")); err != nil {
            return fmt.Errorf("writing stdout: %v", err)
        }

        // writerErr captures the result of the writer side.
        // nil indicates stdin was closed and everything was sent successfully.
        // A non-nil error indicates a failure to read stdin or to send on the stream.
        writerErr := make(chan error, 1)

        // Writer goroutine that reads from stdin and writes to the stream.
        // It sends on writerErr exactly once, when it is done.
        go func() {
            var buf [1024]byte
            for {
                n, readErr := stdin.Read(buf[:])
                if n > 0 {
                    if err := stream.Send(&gitpb.Data{Data: buf[:n]}); err != nil {
                        writerErr <- fmt.Errorf("write: %v", err)
                        return
                    }
                }

                if readErr != nil {
                    // We couldn't read from stdin. io.EOF indicates that stdin was closed,
                    // which is expected and translates into closing the write end of the stream.
                    if readErr == io.EOF {
                        stream.CloseSend()
                        writerErr <- nil
                    } else {
                        writerErr <- fmt.Errorf("reading stdin: %v", readErr)
                    }
                    return
                }
            }
        }()

        // Read from the stream and copy it to stdout.
        // If the reads complete successfully it waits
        // for the write end to complete before returning.
        for {
            msg, err := stream.Recv()
            if err == io.EOF {
                // No more data from the server.
                // Wait for the write end to complete.
                return <-writerErr
            } else if err != nil {
                return fmt.Errorf("read: %v", err)
            }
            if _, err := stdout.Write(msg.Data); err != nil {
                return fmt.Errorf("writing stdout: %v", err)
            }
        }
    }

That's all we need on the client side! Hooray! Now, over to the server side.

Implementing the gRPC server

The server side is even easier than the client side. It receives gRPC requests, validates the service and repository headers, and invokes the native git-receive-pack or git-upload-pack executable that ships with git, with the gRPC stream hooked up as stdin and stdout.

First we'll define the main Connect method on the server. It will use a few helpers that we will define later.

    // Imports: bytes, os/exec, google.golang.org/grpc/codes,
    // google.golang.org/grpc/status, and the generated gitpb package.

    type Server struct{}

    // Assert that Server implements gitpb.GitServer.
    var _ gitpb.GitServer = (*Server)(nil)

    func (s *Server) Connect(stream gitpb.Git_ConnectServer) error {
        // Parse the repository path and service to invoke from the gRPC header.
        // Note that the svc is invoked directly as an executable, so parseHeader
        // must validate these very carefully!
        svc, repoPath, err := parseHeader(stream)
        if err != nil {
            return err
        }

        // Run the service, using two helper types for streaming stdin/stdout over gRPC.
        var stderr bytes.Buffer
        cmd := exec.Command(svc, repoPath)
        cmd.Stdin = &streamReader{stream: stream}
        cmd.Stdout = &streamWriter{stream: stream}
        cmd.Stderr = &stderr
        if err := cmd.Run(); err == nil {
            return nil
        }
        return status.Errorf(codes.Internal, "%s failed: %s", svc, stderr.Bytes())
    }

Next we'll implement the parseHeader function that was referred to above:

    // Imports: google.golang.org/grpc/codes, google.golang.org/grpc/metadata,
    // google.golang.org/grpc/status, and the generated gitpb package.

    // The only executables the server is allowed to invoke.
    const (
        pushSvc  = "git-receive-pack" // invoked on the remote end for pushes
        fetchSvc = "git-upload-pack"  // invoked on the remote end for fetches and clones
    )

    // parseHeader parses the gRPC header and validates the service and repository paths.
    func parseHeader(stream gitpb.Git_ConnectServer) (service, repoPath string, err error) {
        ctx := stream.Context()
        md, ok := metadata.FromIncomingContext(ctx)
        if !ok {
            return "", "", status.Error(codes.InvalidArgument, "missing stream metadata")
        }

        repo := md.Get("repository")
        svc := md.Get("service")
        if len(repo) != 1 || len(svc) != 1 {
            return "", "", status.Errorf(codes.InvalidArgument, "invalid repository (%v) or service (%v)", repo, svc)
        }

        // DANGER: Check the service name against the whitelist to guard against remote execution.
        if svc[0] != pushSvc && svc[0] != fetchSvc {
            return "", "", status.Errorf(codes.InvalidArgument, "bad service: %s", svc[0])
        }

        // TODO: Change this to your own validation logic to make sure the repository is one
        // you want to expose. As written, this placeholder rejects every repository.
        if true {
            return "", "", status.Errorf(codes.InvalidArgument, "unknown repository: %s", repo[0])
        }
        return svc[0], repo[0], nil
    }

Reading and writing from the stream

Finally, we can write helper types that implement io.Writer and io.Reader using the gRPC stream. They look like this:

    // streamWriter implements io.Writer by sending the data on stream.
    type streamWriter struct {
        stream gitpb.Git_ConnectServer
    }

    func (sw *streamWriter) Write(p []byte) (int, error) {
        err := sw.stream.Send(&gitpb.Data{Data: p})
        if err != nil {
            return 0, err
        }
        return len(p), nil
    }

    // streamReader implements io.Reader by reading from stream.
    type streamReader struct {
        stream gitpb.Git_ConnectServer
        buf    []byte
    }

    func (sr *streamReader) Read(p []byte) (int, error) {
        // If we have remaining data from the previous message we received
        // from the stream, simply return that.
        if len(sr.buf) > 0 {
            n := copy(p, sr.buf)
            sr.buf = sr.buf[n:]
            return n, nil
        }

        // No more buffered data, wait for a new message from the stream.
        msg, err := sr.stream.Recv()
        if err != nil {
            return 0, err
        }
        // Read as much data as possible directly to the waiting caller.
        // Anything remaining beyond that gets buffered until the next Read call.
        n := copy(p, msg.Data)
        sr.buf = msg.Data[n:]
        return n, nil
    }

Putting it all together

That's it! By installing git-remote-grpc into $PATH and running the gRPC server, we can now clone over gRPC.

Let's set up a demo repository:

    $ mkdir ~/tmp/demo && cd ~/tmp/demo
    $ git init
    Initialized empty Git repository in ~/tmp/demo/.git/
    $ echo "Hello, world!" > example.txt
    $ git add -A && git commit -m 'Initial commit'

We can then run the gRPC server like so:

    $ cd $HOME/tmp && grpc-server -addr=localhost:8080
    2020/05/11 12:00:00 listening for grpc on localhost:8080

And finally we can clone over gRPC:

    $ git clone grpc://localhost:8080/demo
    remote: Enumerating objects: 3, done.
    remote: Counting objects: 100% (3/3), done.
    remote: Total 3 (delta 0), reused 0 (delta 0)
    Receiving objects: 100% (3/3), done.
    $ cat demo/example.txt
    Hello, world!

And there you have it.

Encore

This blog is presented by Encore, the Development Platform for startups building event-driven and distributed systems.
