Every software engineer has at some point in their career been asked to build a file upload. Perhaps it was to let a user upload an avatar, to import a bunch of contacts into a CRM, or to attach images or legal documents to a real estate property.
On the surface, uploading a file is pretty straightforward: you add a form to your application, send the file to the server, and display it somewhere else. Done! But then bug reports start coming in from customers about files they cannot upload, your security officer mentions a harmful file was uploaded, or the infrastructure team notices the server’s disk space is running low.
If you want to avoid situations like these, read on for a few things to consider that will help make your next file upload a success.
Consider abstracting your filesystem
Most programming languages have a low-level API to write files to the server. PHP has file_put_contents. If you’re building an MVP or writing a one-off script then by all means keep it simple and use these!
Once an application starts to scale across multiple servers, however, you should consider a storage service such as S3 or Google Cloud Storage. If your code talks to an abstraction instead of directly to the local disk, you can easily swap out the underlying service. An added bonus is that your code becomes more testable, because tests can run against an in-memory filesystem.
A popular package in PHP that does this is league/flysystem.
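To make the idea concrete, here is a minimal sketch of such an abstraction written by hand. The FileStorage interface, both implementations, and the storeAvatar function are hypothetical names made up for this example; they are not the league/flysystem API, which offers the same kind of separation with far more features.

```php
<?php

declare(strict_types=1);

// Hypothetical abstraction: application code depends on this interface only.
interface FileStorage
{
    public function write(string $path, string $contents): void;
    public function read(string $path): string;
    public function exists(string $path): bool;
}

// Stores files on the local disk below a root directory.
// Assumes $path is generated by the application, not taken from user input.
final class LocalFileStorage implements FileStorage
{
    private string $root;

    public function __construct(string $root)
    {
        $this->root = rtrim($root, '/');
    }

    public function write(string $path, string $contents): void
    {
        $target = $this->root . '/' . ltrim($path, '/');
        if (!is_dir(dirname($target))) {
            mkdir(dirname($target), 0775, true);
        }
        if (file_put_contents($target, $contents) === false) {
            throw new RuntimeException("Could not write {$path}");
        }
    }

    public function read(string $path): string
    {
        $contents = file_get_contents($this->root . '/' . ltrim($path, '/'));
        if ($contents === false) {
            throw new RuntimeException("Could not read {$path}");
        }
        return $contents;
    }

    public function exists(string $path): bool
    {
        return is_file($this->root . '/' . ltrim($path, '/'));
    }
}

// Keeps files in an array; handy for tests because nothing touches the disk.
final class InMemoryFileStorage implements FileStorage
{
    /** @var array<string, string> */
    private array $files = [];

    public function write(string $path, string $contents): void
    {
        $this->files[$path] = $contents;
    }

    public function read(string $path): string
    {
        if (!isset($this->files[$path])) {
            throw new RuntimeException("Could not read {$path}");
        }
        return $this->files[$path];
    }

    public function exists(string $path): bool
    {
        return isset($this->files[$path]);
    }
}

// Hypothetical application code: it does not know where the bytes end up.
function storeAvatar(FileStorage $storage, int $userId, string $contents): string
{
    $path = "avatars/{$userId}.png";
    $storage->write($path, $contents);
    return $path;
}
```

In production you would pass a LocalFileStorage (or an implementation backed by a cloud service) to storeAvatar, and in a test an InMemoryFileStorage. The calling code stays the same either way.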
Consider that users cannot be trusted
Let’s say that you want to provide a way for users to upload an avatar for their profile. For the sake of this example, we only want to allow PNG files to be uploaded.
Your initial thought could be to validate the file based on its extension and only accept it when the name ends with .png. But how would the system react when someone uploads a text file whose extension they changed to .png?
A safer way is to validate it based on the file’s mime-type. So you change your validation rule to only allow files of which the mime-type is image/png. This is a solid approach, but it highly depends on how you determine the mime-type of a file. In PHP any uploaded file is accessible through the $_FILES variable:
var_dump($_FILES['file']);
array:5 [
"name" => "image.png"
"type" => "image/png"
"tmp_name" => "/tmp/phpEpywP6"
"error" => 0
"size" => 14136
]
This includes a type property that supposedly contains the mime-type we’re looking for. But what happens when we try with the text file we renamed to .png?
var_dump($_FILES['file']);
array:5 [
"name" => "hello_world.png"
"type" => "image/png"
"tmp_name" => "/tmp/phpcOkmH0"
"error" => 0
"size" => 11
]
You can see that it also says image/png even though we know that it is a text file. The reason is that the value of this property is the type reported by the browser. Most browsers determine it from the file extension rather than the file contents, and it can also be affected by certain client configurations (for example, a change in the Windows Registry).
To correctly detect the mime-type of a file you should always look at the contents of that file. PHP has an API for this called finfo that will return the mime-type based on certain byte sequences of the file’s content. For our previous example this results in the following:
$finfo = new \finfo(FILEINFO_MIME_TYPE);
var_dump($finfo->file($_FILES['file']['tmp_name']));
// string(10) "text/plain"
Most frameworks (such as Laravel) come with a mime-type validation rule that already makes use of this. If you’re rolling your own, there are various options such as league/mime-type-detection and symfony/mime that build on the finfo API.
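If you do roll your own, a content-based check could look roughly like the sketch below. The function names detectMimeType and isAllowedAvatar are made up for this example, and it assumes PHP’s fileinfo extension is enabled.

```php
<?php

declare(strict_types=1);

// Hypothetical helper: detect the mime-type from the file's contents,
// ignoring its name and whatever type the browser reported.
function detectMimeType(string $path): string
{
    $finfo = new finfo(FILEINFO_MIME_TYPE);
    $mimeType = $finfo->file($path);

    // finfo::file() returns false when the file cannot be inspected.
    if ($mimeType === false) {
        throw new RuntimeException("Could not detect the mime-type of {$path}");
    }

    return $mimeType;
}

// Hypothetical validation rule: accept the file only if its detected
// mime-type is on the allow-list (PNG only by default, as in the example).
function isAllowedAvatar(string $path, array $allowedMimeTypes = ['image/png']): bool
{
    return in_array(detectMimeType($path), $allowedMimeTypes, true);
}
```

In an upload handler you would call isAllowedAvatar($_FILES['file']['tmp_name']) and reject the request when it returns false, without ever looking at $_FILES['file']['type'].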
Consider that users will upload large files
One of the most common bugs encountered with file uploads is a limit imposed by the web server. Nginx has client_max_body_size, which caps the size of the request body (and therefore of any file in it); PHP has post_max_size and upload_max_filesize.
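Before changing any of these settings it helps to know which limit actually applies. The sketch below converts PHP’s shorthand notation (such as 8M) to bytes and picks the stricter of the two PHP settings. The function names are made up for this example, and the Nginx limit is not visible from PHP, so it is not taken into account here.

```php
<?php

declare(strict_types=1);

// Hypothetical helper: convert a php.ini size such as "512", "64K", "8M"
// or "1G" to a number of bytes.
function iniSizeToBytes(string $value): int
{
    $value = trim($value);
    if ($value === '') {
        return 0;
    }

    // The cast keeps the leading digits: (int) "8M" is 8.
    $number = (int) $value;
    $unit = strtolower(substr($value, -1));
    $multipliers = ['k' => 1024, 'm' => 1024 ** 2, 'g' => 1024 ** 3];

    return $number * ($multipliers[$unit] ?? 1);
}

// Hypothetical helper: the largest file PHP will accept given both settings.
// A post_max_size of 0 disables the request body limit.
function effectiveUploadLimit(string $uploadMaxFilesize, string $postMaxSize): int
{
    $fileLimit = iniSizeToBytes($uploadMaxFilesize);
    $postLimit = iniSizeToBytes($postMaxSize);

    return $postLimit === 0 ? $fileLimit : min($fileLimit, $postLimit);
}
```

You could call it as effectiveUploadLimit((string) ini_get('upload_max_filesize'), (string) ini_get('post_max_size')) and show the result next to the upload form. When a file exceeds upload_max_filesize, PHP sets the error key of its $_FILES entry to UPLOAD_ERR_INI_SIZE, which lets you turn the failure into a clear message for the user.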
Typically when you encounter a problem like this, your gut reaction is to raise the limit (until you run into server timeouts or memory problems because it takes too long to process the file). Another solution would be to force users to stay within a certain limit.
But there are situations where you would want users to upload large files (when you’re working with video or audio processing software for instance). These files could be several hundred megabytes in size.
One popular solution for this is to allow chunked uploads. The idea is that your 100MB file is cut into several smaller chunks of 1MB. These chunks are uploaded to the server and once the last chunk is uploaded you re-assemble the individual chunks into a single file and you can start using it.
This approach does come with a lot of accidental complexity. There are open source solutions such as ResumableJS and Tus.io that can help you with this.
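To give a feel for the server side, here is a minimal sketch of the re-assembly step only. It assumes every chunk has already been stored in one directory as chunk_0, chunk_1, and so on; that naming scheme and the assembleChunks function are made up for this example and are not how ResumableJS or Tus.io store their data.

```php
<?php

declare(strict_types=1);

// Hypothetical helper: concatenate chunk_0 .. chunk_{n-1} from $chunkDir
// into a single file at $targetPath.
function assembleChunks(string $chunkDir, int $totalChunks, string $targetPath): void
{
    // Check that every chunk is present first, so that we never leave a
    // half-assembled file behind.
    for ($i = 0; $i < $totalChunks; $i++) {
        if (!is_file("{$chunkDir}/chunk_{$i}")) {
            throw new RuntimeException("Missing chunk {$i}");
        }
    }

    $target = fopen($targetPath, 'wb');
    if ($target === false) {
        throw new RuntimeException("Could not open {$targetPath} for writing");
    }

    try {
        for ($i = 0; $i < $totalChunks; $i++) {
            $chunk = fopen("{$chunkDir}/chunk_{$i}", 'rb');
            if ($chunk === false) {
                throw new RuntimeException("Could not open chunk {$i}");
            }

            // Stream the chunk into the target so that a large upload is
            // never held in memory as a whole.
            stream_copy_to_stream($chunk, $target);
            fclose($chunk);
        }
    } finally {
        fclose($target);
    }
}
```

A real implementation also has to deal with retries, chunks arriving out of order, uploads that are never finished, and cleaning up the chunk directory afterwards, which is where most of that complexity comes from.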
Conclusion
File uploads are one of those things that can easily become complex. If your software relies heavily on them, I would suggest taking these considerations into account early on, because refactoring later comes at a cost.