Dlog

Memory Efficient Content Hash Calculation In Go With io.MultiWriter and Streaming

Introduction

In modern web applications, efficiently processing and storing uploaded files is a common requirement. One important aspect is calculating a hash of the file content to ensure data integrity, verify uniqueness, and optimize storage. In this blog post, we will explore how to achieve this in Go using the io.MultiWriter function.

io.MultiWriter

Let’s start by examining the code snippet below, which demonstrates the usage of io.MultiWriter in an HTTP handler that expects file uploads:

package main

import (
	"crypto/sha256"
	"fmt"
	"io"
	"net/http"
	"os"
)

func main() {
	http.HandleFunc("/upload", func(w http.ResponseWriter, r *http.Request) {
		var err error
		var tf *os.File

		tf, err = os.CreateTemp(os.TempDir(), "upload-*")
		if err != nil {
			w.WriteHeader(http.StatusInternalServerError)
			return
		}
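		// Clean up on every exit path. After a successful rename both calls
		// return errors, which are deliberately ignored.
		defer func() {
			tf.Close()
			os.Remove(tf.Name())
		}()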

		hash := sha256.New()
		mw := io.MultiWriter(tf, hash)

		_, err = io.Copy(mw, r.Body)
		if err != nil {
			w.WriteHeader(http.StatusInternalServerError)
			return
		}

		// close the temp file so that it can be moved via renaming
		err = tf.Close()
		if err != nil {
			w.WriteHeader(http.StatusInternalServerError)
			return
		}

		err = os.Rename(tf.Name(), fmt.Sprintf("%x", hash.Sum(nil)))
		if err != nil {
			w.WriteHeader(http.StatusInternalServerError)
			return
		}
	})
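	// ListenAndServe only returns on failure, so report the error and exit.
	if err := http.ListenAndServe(":1337", nil); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}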
}

Here’s a breakdown of what’s happening in the code snippet:

  1. We define an HTTP handler function for the “/upload” endpoint, which is responsible for handling file uploads.
  2. Inside the handler function, a temporary file (tf) is created using os.CreateTemp to temporarily store the uploaded file.
  3. To calculate the hash of the request body content, a sha256 hash digest is created using sha256.New().
  4. The io.MultiWriter function is used to create a writer that duplicates its writes to all provided writers (our temp file tf and the sha256 hash digest), similar to the Unix tee(1) command.
  5. The io.MultiWriter (mw) is passed to io.Copy to efficiently copy the request body content to the temporary file and the hash digest.
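  6. After the request body has been copied successfully, the temporary file is closed. This releases the file handle before the file is moved via renaming, which matters because some platforms, such as Windows, can refuse to rename a file that is still open.
  7. Finally, the temporary file is renamed using the calculated hash as the new filename. Because the name is derived from the content, identical uploads map to the same file, and the name can later be used to verify the content. Note that os.Rename can fail when the temp directory and the destination are on different filesystems. In that case, or when a longer-term storage destination is used, this step can be swapped out for copying the temp file contents to the destination.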

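The incremental nature of hashing functions is one of the factors in this code that enables memory efficiency. The hash function (sha256 in this case) does not need to store all of the contents that have been written to it. Instead, it processes the data in chunks as it is received and keeps only a small, fixed amount of internal state. The other factor is io.Copy, which moves the request body through a small intermediate buffer instead of reading it into memory in full. Together they keep memory usage low and allow efficient handling of large files.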

The line fmt.Sprintf("%x", hash.Sum(nil)) specifically handles the conversion of the hash value to a hexadecimal string representation:

  • hash.Sum(nil) returns the hash value as a byte slice. The Sum method is called on the hash object, which is a value of the hash.Hash interface returned by sha256.New(), backed by a concrete digest type inside the crypto/sha256 implementation.
  • Sum appends the current hash to the byte slice it is given and returns the resulting slice. Passing nil means there is no existing slice to append to, so we get back a new slice that contains only the hash. The argument is not hashed as additional data.
  • The Sum method calculates the hash based on the data that has been written to the digest through the io.MultiWriter, which, in this case, is the request body content.
  • For SHA-256, the resulting byte slice is 32 bytes long.
  • To convert the byte slice to a hexadecimal string representation, the fmt.Sprintf function is used:
    • The %x verb in the format string specifies that the byte slice should be formatted as a hexadecimal string.
    • fmt.Sprintf returns the formatted string representation of the byte slice.

Conclusion

By following this approach, you can efficiently calculate the hash of the request body content and store the uploaded files. Utilizing io.MultiWriter allows simultaneous writing to multiple destinations, making it a memory-efficient and convenient solution for content hashing in Go. Remember to handle errors appropriately to ensure a robust and reliable file upload process in your application.