Memory Efficient Content Hash Calculation In Go With io.MultiWriter and Streaming
Introduction
In modern web applications, efficiently processing and storing uploaded files is a common requirement. One important aspect is calculating a hash of the file content to ensure data integrity, verify uniqueness, and optimize storage. In this blog post, we will explore how to achieve this in Go using the io.MultiWriter function.
io.MultiWriter
Let’s start by examining the code snippet below, which demonstrates the usage of io.MultiWriter in an HTTP handler that expects file uploads:
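package main

import (
	"crypto/sha256"
	"fmt"
	"io"
	"log"
	"net/http"
	"os"
)

func main() {
	http.HandleFunc("/upload", func(w http.ResponseWriter, r *http.Request) {
		// Stream the upload into a temp file instead of buffering it in memory.
		tf, err := os.CreateTemp(os.TempDir(), "upload-*")
		if err != nil {
			w.WriteHeader(http.StatusInternalServerError)
			return
		}
		// Clean up on every exit path. After a successful rename both calls
		// fail harmlessly and their errors are ignored.
		defer func() {
			tf.Close()
			os.Remove(tf.Name())
		}()

		// Every write to mw goes to both the temp file and the hash.
		hash := sha256.New()
		mw := io.MultiWriter(tf, hash)
		if _, err = io.Copy(mw, r.Body); err != nil {
			w.WriteHeader(http.StatusInternalServerError)
			return
		}

		// close the temp file so that it can be moved via renaming
		if err = tf.Close(); err != nil {
			w.WriteHeader(http.StatusInternalServerError)
			return
		}

		// Name the file after the hex-encoded digest of its content.
		// Renaming can fail if the temp dir and the destination are on
		// different filesystems.
		if err = os.Rename(tf.Name(), fmt.Sprintf("%x", hash.Sum(nil))); err != nil {
			w.WriteHeader(http.StatusInternalServerError)
			return
		}
	})

	// ListenAndServe only returns on failure, so report the error and exit.
	log.Fatal(http.ListenAndServe(":1337", nil))
}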
Here’s a breakdown of what’s happening in the code snippet:
- We define an HTTP handler function for the “/upload” endpoint, which is responsible for handling file uploads.
- Inside the handler function, a temporary file (tf) is created using os.CreateTemp to temporarily store the uploaded file.
- To calculate the hash of the request body content, a sha256 hash digest is created using sha256.New().
- The io.MultiWriter function is used to create a writer that duplicates its writes to all provided writers (our temp file tf and the sha256 hash digest), similar to the Unix tee(1) command.
- The io.MultiWriter (mw) is passed to io.Copy to efficiently copy the request body content to the temporary file and the hash digest.
- After successfully copying the upload request body, the temporary file is closed to ensure all data is flushed and ready to be moved via renaming.
- Finally, the temporary file is renamed using the calculated hash as the new filename. Because the name is derived from the content, identical uploads map to the same filename, and a stored file can later be checked against its name. This step can be swapped out for copying the temp file contents into a longer-term storage destination.
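The tee-like behaviour described above can be tried without an HTTP server. The following is a minimal sketch, not part of the original handler: teeAndHash and demo are hypothetical helper names, and a bytes.Buffer stands in for the temp file.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"strings"
)

// teeAndHash is a hypothetical helper: it streams src into dst while feeding
// the same bytes to a SHA-256 digest, then returns the number of bytes copied
// and the hex-encoded digest.
func teeAndHash(dst io.Writer, src io.Reader) (int64, string, error) {
	h := sha256.New()
	n, err := io.Copy(io.MultiWriter(dst, h), src)
	if err != nil {
		return n, "", err
	}
	return n, hex.EncodeToString(h.Sum(nil)), nil
}

// demo runs teeAndHash against in-memory stand-ins for the request body and
// the temp file, and returns what ended up in the destination.
func demo() (int64, string, string, error) {
	var stored bytes.Buffer // stands in for the temp file
	n, sum, err := teeAndHash(&stored, strings.NewReader("hello world"))
	return n, stored.String(), sum, err
}

func main() {
	n, stored, sum, err := demo()
	if err != nil {
		fmt.Println("copy failed:", err)
		return
	}
	fmt.Printf("copied %d bytes, stored %q, sha256 %s\n", n, stored, sum)
}
```

Since teeAndHash only depends on io.Writer and io.Reader, the same function would work with an *os.File as the destination and r.Body as the source.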
The incremental nature of hashing functions is one of the factors in this code that enables memory efficiency. The hash function (sha256 in this case) does not need to store all of the contents that have been written to it. Instead, it processes the data in chunks as it is received, keeping memory usage low and allowing efficient handling of large files.
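To illustrate the incremental property, here is a small sketch that hashes the same data once in a single call and once in small pieces, then compares the results. The helper names chunkedSum and sameDigest are made up for this example, and chunkSize is assumed to be greater than zero.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

// chunkedSum feeds data to a SHA-256 digest in pieces of at most chunkSize
// bytes, the way io.Copy would deliver a large upload buffer by buffer.
// chunkSize is assumed to be greater than zero.
func chunkedSum(data []byte, chunkSize int) []byte {
	h := sha256.New()
	for len(data) > 0 {
		n := chunkSize
		if n > len(data) {
			n = len(data)
		}
		h.Write(data[:n]) // Write on a hash.Hash never returns an error
		data = data[n:]
	}
	return h.Sum(nil)
}

// sameDigest reports whether chunked hashing and one-shot hashing agree.
func sameDigest(data []byte, chunkSize int) bool {
	oneShot := sha256.Sum256(data)
	return bytes.Equal(chunkedSum(data, chunkSize), oneShot[:])
}

func main() {
	data := []byte("the quick brown fox jumps over the lazy dog")
	fmt.Println("digests match:", sameDigest(data, 4))
}
```

The digest only keeps a small fixed-size internal state between writes, which is why the handler's memory usage does not grow with the size of the upload.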
The expression fmt.Sprintf("%x", hash.Sum(nil)) specifically handles the conversion of the hash value to a hexadecimal string representation:
- hash.Sum(nil) returns the hash value as a byte slice. The Sum method is called on the hash object, which in this case is a hash.Hash backed by the unexported *sha256.digest type.
  - Sum appends the current hash to the slice it is given and returns the result. Passing nil means we want just the hash value, in a newly allocated slice, with nothing in front of it.
  - The Sum method calculates the hash based on the data that has been written to it through io.MultiWriter, which, in this case, is the request body content.
  - The result of hash.Sum(nil) is a byte slice representing the hash value.
- To convert the byte slice to a hexadecimal string representation, the fmt.Sprintf function is used:
  - The %x verb in the format string specifies that the byte slice should be formatted as a hexadecimal string.
  - fmt.Sprintf returns the formatted string representation of the byte slice.
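The behaviour of Sum and the hex conversion can be checked in isolation. The sketch below uses a hypothetical helper, sumForms, and an arbitrary "sha256:" prefix chosen only to show that Sum appends to the slice it receives. It also shows encoding/hex as an equivalent alternative to the %x verb.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// sumForms is a hypothetical helper that hashes content and returns the
// digest formatted with fmt's %x verb, the same digest formatted with
// encoding/hex, and the length of the slice produced by Sum when it is
// given a non-nil prefix.
func sumForms(content string) (viaFmt, viaHex string, prefixedLen int) {
	h := sha256.New()
	h.Write([]byte(content))

	digest := h.Sum(nil) // nil: return only the 32-byte digest
	viaFmt = fmt.Sprintf("%x", digest)
	viaHex = hex.EncodeToString(digest)

	// Sum appends the digest to its argument; it does not hash the argument
	// and does not change the state of h.
	prefixed := h.Sum([]byte("sha256:"))
	return viaFmt, viaHex, len(prefixed)
}

func main() {
	viaFmt, viaHex, n := sumForms("hello")
	fmt.Println(viaFmt == viaHex, len(viaFmt), n)
}
```

Both formatting routes produce the same 64-character lowercase string, so the choice between them is mostly a matter of taste.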
Conclusion
By following this approach, you can efficiently calculate the hash of the request body content and store the uploaded files. Utilizing io.MultiWriter allows simultaneous writing to multiple destinations, making it a memory-efficient and convenient solution for content hashing in Go. Remember to handle errors appropriately to ensure a robust and reliable file upload process in your application.