The IIS semicolon file extension issue prompted me to jot down some rules for implementing file uploads securely. This is particularly hard because there is usually no easy way to validate the content of the file.
The overall goal is to build a set of defensive layers that tightly control both the upload of the file and its later retrieval. The user should only ever interact with the file indirectly, and never access the file system without application control.
1. Create a new file name
Do not use the user-supplied file name as the file name on your local system. Instead, create your own unpredictable file name. A hash (md5/sha1) works well because it is easy to validate (it is just a hex number). You may want to add a serial number or a timestamp to avoid accidental collisions, and a secret to make the file name harder to guess. If you need to keep the original file name, use a look-up table to link the validated user-supplied file name to the server-created name.
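Here is a minimal Python sketch of this rule. It swaps the suggested md5/sha1 for HMAC-SHA256 keyed with a server-side secret, and mixes in a timestamp, a serial number and random bytes. The names `SECRET`, `new_stored_name`, `register_upload` and the allow-list pattern for original names are hypothetical; in practice the secret would come from configuration and the look-up table would live in a database rather than a dict.

```python
import hashlib
import hmac
import itertools
import os
import re
import time

# Hypothetical secret; in a real deployment, load it from configuration.
SECRET = os.urandom(32)

_serial = itertools.count()                  # per-process serial number
_STORED_NAME = re.compile(r"[0-9a-f]{64}")   # hex SHA-256 digest
# Hypothetical allow-list for the original name: kept for display only,
# never used as a path on disk.
_ORIGINAL_NAME = re.compile(r"[A-Za-z0-9._ -]{1,100}")


def new_stored_name(secret: bytes = SECRET) -> str:
    """Return an unpredictable 64-character hex file name."""
    # Timestamp + serial number avoid accidental collisions;
    # random bytes + the secret key make the name hard to guess.
    msg = f"{time.time_ns()}:{next(_serial)}:".encode() + os.urandom(16)
    return hmac.new(secret, msg, hashlib.sha256).hexdigest()


def is_valid_stored_name(name: str) -> bool:
    """Server-created names are easy to validate: exactly 64 hex digits."""
    return _STORED_NAME.fullmatch(name) is not None


def register_upload(original_name: str, table: dict) -> str:
    """Create a server-side name and remember the validated original name."""
    if not _ORIGINAL_NAME.fullmatch(original_name):
        raise ValueError("unacceptable original file name")
    stored = new_stored_name()
    table[stored] = original_name  # look-up table: stored name -> original
    return stored
```

Only the server-created name ever touches the file system; the original name is just a label shown back to users.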
2. Store the file outside of your document root
If your document root is /var/www/html, create a directory such as /var/www/uploads and store uploaded files there. That way, an attacker cannot retrieve a file directly by URL: every request has to go through your application, which allows you to provide fine-grained access control. It also means the file is never parsed by the server's application language module (PHP, for example); instead, your application streams the raw content of the file to the user.
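The sketch below shows retrieval through the application. It assumes stored names are 64-character hex strings (for example SHA-256 hex digests) and uses the /var/www/uploads layout from the example above; the function names are hypothetical, and any per-user access-control check would run before `stream_upload` is called.

```python
import os
import re
import shutil

# Outside the document root (/var/www/html), so never served directly.
UPLOAD_DIR = "/var/www/uploads"
_STORED_NAME = re.compile(r"[0-9a-f]{64}")


def resolve_upload(stored_name: str, upload_dir: str = UPLOAD_DIR) -> str:
    """Map a server-created name to a path that must stay inside upload_dir."""
    if not _STORED_NAME.fullmatch(stored_name):
        raise ValueError("invalid stored file name")
    base = os.path.realpath(upload_dir)
    path = os.path.realpath(os.path.join(base, stored_name))
    # Defense in depth: reject anything (e.g. a symlink) resolving outside base.
    if os.path.commonpath([base, path]) != base:
        raise ValueError("path escapes the upload directory")
    return path


def stream_upload(stored_name: str, out, upload_dir: str = UPLOAD_DIR) -> None:
    """Copy the raw bytes to `out`; nothing here interprets the file as code."""
    with open(resolve_upload(stored_name, upload_dir), "rb") as f:
        shutil.copyfileobj(f, out)
```

Even a file containing PHP source is only copied as bytes here, which is exactly the behavior the rule is after.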
3. Check the file size
You should set a maximum file size in the upload form, but remember: it is only advisory. Make sure to check the file size on the server after the upload has completed. Be particularly careful if you allow the upload of compressed files that are later uncompressed on the server: a small archive can expand to an enormous size, and this scenario is very hard to secure.
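A minimal Python sketch of both checks: count bytes while copying the upload and abort as soon as a limit is exceeded, and bound the output when inflating compressed data so a small "decompression bomb" cannot fill memory or disk. The limit value, the helper names and the use of zlib-format data are assumptions; archive formats such as ZIP would need the same bound applied to each entry.

```python
import zlib


class UploadTooLarge(Exception):
    """Raised when an upload (or its decompressed form) exceeds a limit."""


MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # hypothetical server-side limit


def copy_with_limit(src, dst, max_bytes=MAX_UPLOAD_BYTES, chunk_size=64 * 1024):
    """Copy src to dst in chunks; stop as soon as max_bytes is exceeded."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            return total
        total += len(chunk)
        if total > max_bytes:
            raise UploadTooLarge(f"upload exceeds {max_bytes} bytes")
        dst.write(chunk)


def inflate_with_limit(data: bytes, max_output: int) -> bytes:
    """Decompress zlib data, refusing to produce more than max_output bytes."""
    d = zlib.decompressobj()
    # Ask for one byte more than allowed so "too large" is detectable
    # without ever materializing the full decompressed output.
    out = d.decompress(data, max_output + 1)
    if len(out) > max_output or d.unconsumed_tail:
        raise UploadTooLarge(f"decompressed data exceeds {max_output} bytes")
    return out
```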
4. Extensions are meaningless
The motivation for this post is the ';' issue in IIS. However, even Apache doesn't always behave the way you expect it to. Upload 'something.php.x' to Apache and chances are that the PHP code will be executed. It's a feature! If you stream a file back to the user, what matters is not the extension but the Content-Type header and the file's own header. On Unix, the "file" command is the best way to check the file type, but even it is not foolproof: it only looks at the first few bytes. In PHP, for example, a file may start with a valid GIF header, yet if the PHP engine later sees a "<?php" tag, it will happily execute the embedded script.
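The sketch below checks a few well-known magic numbers (JPEG, PNG, GIF) in place of calling `file`, then searches the whole file for a PHP open tag, since a header check alone misses a GIF/PHP polyglot like the one just described. The signature list and function names are illustrative assumptions. The tag search is crude and may reject legitimate binary files that happen to contain those bytes; the returned type is meant to drive the Content-Type header, not the extension.

```python
# Hypothetical allow-list of leading magic bytes -> MIME type.
SIGNATURES = (
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
)

# Byte sequences that open a PHP block (full tag and short echo tag).
PHP_OPEN_TAGS = (b"<?php", b"<?=")


def sniff_type(data: bytes):
    """Return a MIME type based on the leading magic bytes, or None."""
    for magic, mime in SIGNATURES:
        if data.startswith(magic):
            return mime
    return None


def accept_image(data: bytes) -> str:
    """Validate by content, never by extension; return the Content-Type to use."""
    mime = sniff_type(data)
    if mime is None:
        raise ValueError("not a recognized image type")
    # A valid header is not enough: a GIF/PHP polyglot still starts with GIF89a.
    lowered = data.lower()
    if any(tag in lowered for tag in PHP_OPEN_TAGS):
        raise ValueError("file contains a PHP open tag")
    return mime
```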
5. Try a malware scan
Say the extension is right, and you checked that the file is actually a valid JPEG file according to its header. It could still be a malicious JPEG that exploits one of the many image parser bugs to attack clients downloading the file. As far as I am aware, there is no great defense against this. One possible workaround is to "rebuild" the file: convert the JPEG to a GIF and back to a JPEG. This will likely strip out any malicious content, but the technique could expose your servers to the very same image parser bugs.
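If a scanner such as ClamAV is installed, a thin wrapper can run it on each upload before the file is accepted. This sketch assumes the `clamscan` command-line tool and its documented exit codes (0: no virus found, 1: virus found, 2: error); the wrapper and its names are hypothetical. It fails closed: any unexpected result is an error, never "clean".

```python
import subprocess

# Assumes ClamAV's command-line scanner is installed and on PATH.
CLAMSCAN = ("clamscan", "--no-summary")


def scan_file(path: str, scanner=CLAMSCAN, timeout: int = 120) -> bool:
    """Return True if clean, False if malware was found; raise on any error."""
    result = subprocess.run(
        [*scanner, path], capture_output=True, timeout=timeout
    )
    if result.returncode == 0:  # clamscan: no virus found
        return True
    if result.returncode == 1:  # clamscan: virus(es) found
        return False
    # Anything else (e.g. 2 = error) is treated as a failure, not as "clean".
    raise RuntimeError(f"scanner failed with exit code {result.returncode}")
```

A scan only catches known malware, so it complements rather than replaces the other layers.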
6. Keep tight control of permissions
Any uploaded file will be owned by the web server, but it only needs read/write permission, not execute permission. Once the upload is complete, you could apply additional restrictions (for example, making the file read-only) if this is appropriate. It can also help to remove read permission from the upload directory while keeping execute permission: the server can then still open files whose names it knows, but it cannot enumerate the directory.
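On a POSIX system, the application can set these permissions itself. The sketch below assumes the web server process owns the upload directory and that the directory holds nothing but uploads; the helper names are hypothetical. Because the directory can no longer be listed, the application must already know each file name, which the look-up table from rule 1 provides.

```python
import os
import stat


def write_upload(upload_dir: str, stored_name: str, data: bytes) -> str:
    """Create the file with owner read/write only, then drop it to read-only."""
    path = os.path.join(upload_dir, stored_name)
    # O_EXCL refuses to overwrite an existing file; 0o600 = rw for owner, no x.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    # Once the upload is complete, the server only needs to read it.
    os.chmod(path, stat.S_IRUSR)  # 0o400
    return path


def hide_directory_listing(upload_dir: str) -> None:
    """Owner may create (w) and open by name (x) but not list (no r) entries."""
    os.chmod(upload_dir, stat.S_IWUSR | stat.S_IXUSR)  # 0o300
```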
7. Authenticate file uploads
File uploads, particularly if the files are viewable by others without moderator review, must be authenticated. That way, it is at least possible to track who uploaded an objectionable file.
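A sketch of refusing anonymous uploads and recording who uploaded what. It assumes the application has already authenticated the request and passes in the user's ID; `record_upload`, the JSON-lines log format and the field names are hypothetical.

```python
import json
import time


class NotAuthenticated(Exception):
    """Raised when an upload arrives without an authenticated user."""


def record_upload(user_id, stored_name, original_name, size, log):
    """Refuse anonymous uploads and append an audit record to `log`."""
    if not user_id:
        raise NotAuthenticated("uploads require a logged-in user")
    entry = {
        "time": int(time.time()),
        "user": user_id,            # who uploaded the file
        "stored": stored_name,      # server-created file name
        "original": original_name,  # validated user-supplied name
        "size": size,
    }
    log.write(json.dumps(entry) + "\n")  # one JSON object per line
    return entry
```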
8. Limit the number of uploaded files
Many developers limit the file size, but not all of them limit the number of files uploaded per request. Make sure to apply reasonable limits, but also be prepared for a denial-of-service (DoS) attack that simply uploads a large number of small files. Choose a directory structure that limits the number of files per directory, and choose a file system suited to that workload.
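A minimal sketch of both limits: reject requests that carry too many files, and spread stored files across nested subdirectories derived from the first characters of a hex file name. With two levels of two hex characters, each level holds at most 256 entries, giving 256 × 256 = 65,536 leaf directories. The limit value and helper names are assumptions.

```python
import os

MAX_FILES_PER_REQUEST = 10  # hypothetical per-request limit


def check_file_count(files, limit: int = MAX_FILES_PER_REQUEST) -> None:
    """Reject a request that carries more files than allowed."""
    if len(files) > limit:
        raise ValueError(f"too many files in one request ({len(files)} > {limit})")


def sharded_path(base: str, stored_name: str, levels: int = 2, width: int = 2) -> str:
    """Nest files by name prefix, e.g. 'abcdef...' -> base/ab/cd/abcdef..."""
    if len(stored_name) < levels * width:
        raise ValueError("stored name too short to shard")
    parts = [stored_name[i * width:(i + 1) * width] for i in range(levels)]
    return os.path.join(base, *parts, stored_name)


def ensure_parent(path: str) -> None:
    """Create the shard directories on demand, accessible only to the owner."""
    os.makedirs(os.path.dirname(path), mode=0o700, exist_ok=True)
```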
Conclusion
Let me know if you can think of other issues to consider. The eight rules above are generic; other checks depend on the application. For example, if you deal with XML files, you can validate them against a schema. A text file can be validated against dictionaries. Particular image formats can be analyzed more closely for malicious content. For PDFs, you can strip out JavaScript, which often causes problems, and for HTML you could use libraries like HTML Purifier. A separate upload partition can keep a denial-of-service attack from affecting other parts of the system, and it also allows for additional access control. A human moderator may be advisable if inappropriate content is a problem.
Finally: Remember the #1 rule of good web application security. All users are evil!