Handling invalid URL parameters with nginx
While improving the page indexing of our blog, I realized that Google still had old URLs stored that are no longer valid. The challenge in this case: the URLs in question could only be identified via specific GET parameters.
One of the blog solutions we used in the past relied on URL parameters to load the different pages. Back then, URLs looked like this:
https://blog.bitexpert.de/?s=some_page
Google still had those URLs indexed and complained about duplicate content. Since we use Docusaurus as a static site generator, I could not add a server-side check to return an error for this case. So my only option was to add some logic to nginx to handle those requests.
While testing a few different approaches, this one worked best for this specific use case:
Since Docusaurus handles 404 pages itself, I needed a different HTTP response code for this scenario. I chose "410 Gone" as it felt the most appropriate: it tells crawlers that the resource has been removed intentionally and permanently.
First, we define the error page for the 410 response code:
error_page 410 /410.html;

location = /410.html {
    root /usr/share/nginx/html;
    internal;
}
Now that we have the error page in place, how can we configure nginx to detect the "wrong" GET parameters? Luckily, nginx exposes GET parameters as $arg_<name> variables; the s parameter from the example above is available as $arg_s. Adding a check for the parameter is done like this:
location / {
    root /usr/share/nginx/html;
    index index.html index.htm;

    # Clean up old invalid URLs in Google's cache
    if ($arg_s) {
        return 410;
    }

    try_files $uri $uri/index.html $uri/ =404;
}
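The decision nginx makes here can be illustrated with a short Python sketch. This is purely illustrative and not part of the nginx setup; the function name response_status is made up for the example:

```python
from urllib.parse import urlparse, parse_qs

def response_status(url: str) -> int:
    """Mimic the nginx rule: any request carrying a non-empty 's'
    query parameter gets "410 Gone"; everything else is served
    normally."""
    query = parse_qs(urlparse(url).query)
    # parse_qs drops blank values by default, which matches nginx's
    # if-check treating an empty $arg_s as false
    if "s" in query:
        return 410
    return 200

print(response_status("https://blog.bitexpert.de/?s=some_page"))  # 410
print(response_status("https://blog.bitexpert.de/about"))         # 200
```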
If $arg_s is set, nginx returns a "410 Gone" response. If it is not set, Docusaurus is served as before. Problem solved.
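Putting both pieces together, a minimal server block might look like the following sketch. The listen port and server_name are assumptions for illustration; the root path is taken from the snippets above:

```nginx
server {
    listen 80;
    server_name blog.bitexpert.de;

    root /usr/share/nginx/html;
    index index.html index.htm;

    # Serve a static page for "410 Gone" responses
    error_page 410 /410.html;

    location = /410.html {
        # Only reachable via internal redirects, not directly by clients
        internal;
    }

    location / {
        # Clean up old invalid URLs in Google's cache
        if ($arg_s) {
            return 410;
        }
        try_files $uri $uri/index.html $uri/ =404;
    }
}
```

A quick check with curl -I "https://blog.bitexpert.de/?s=some_page" should then report the 410 status in the response headers.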