XSS (Cross-Site Scripting) Attacks and Prevention

What are XSS vulnerabilities?

XSS (Cross-Site Scripting) vulnerabilities arise when untrusted data gets interpreted as code in a web context. They usually result from:

Generating HTML unsafely (parameterizing without encoding correctly).
Allowing users to edit HTML directly (WYSIWYG editors, for example).
Allowing users to upload HTML/SVG files and serving those back unsafely.
Using JavaScript unsafely (passing untrusted data into executable functions/properties).
Using outdated and vulnerable JavaScript frameworks.

How to prevent XSS vulnerabilities?

Follow these steps:

Generate HTML safely using a templating engine, or use a static JavaScript frontend to avoid HTML generation altogether.
If you display untrusted HTML content on your website, purify it first and contain it in a sandboxed frame.
Serve all downloads with a proper Content-Disposition header to prevent user-supplied HTML/SVG from being rendered in your origin.
Don't pass untrusted data into executable JavaScript functions/properties such as eval, innerHTML or href.
Use well-known components with a good security history and keep them up to date.
Implement a proper CSP (Content Security Policy).

What is untrusted data?

Before we begin, let's quickly touch on this point. For the sake of this article, anything that is not controlled by your web application is untrusted data.

User input is one clear example. But you should also consider any data retrieved from external sources, even your database or API, as potentially dangerous and render it with proper safety measures.

A good rule of thumb is that if it's not a static resource, then it's untrusted data, at least on some level.

Why are XSS vulnerabilities bad?

There is sometimes a misconception that XSS vulnerabilities are low severity bugs. They are not. The power to execute JavaScript code on a website in other people's browsers is equivalent to logging in to the hosting server and changing the HTML files for the affected users.

As such, an XSS attack effectively logs the attacker in as the target user. Worse, the attacker can also trick the user into handing over information (such as their password) or into downloading and executing malware on their workstation.

And it's not like XSS vulnerabilities only affect individual users. Stored XSS affects everyone who visits the infected page, and reflected XSS can often [spread like wildfire](https://en.wikipedia.org/wiki/Samy_(computer_worm)).

1. Avoid XSS by generating HTML safely

A simple example

Here is a PHP script that is vulnerable to XSS:

echo "<p>Search results for: " . $_GET['search'] . "</p>";

It is vulnerable because it generates HTML unsafely. The search parameter is not encoded correctly. An attacker can create a link such as the following, which would execute the attacker's JavaScript code on the website when the target opens it:

https://www.example.com/?search=<script>alert('XSS')</script>

Results in HTML like:

<p>Search results for: <script>alert('XSS')</script></p>

The importance of encoding

So how can you safely display the value <script>alert("XSS")</script> in your HTML? The answer is HTML entity encoding:

 & --> &amp;
 < --> &lt;
 > --> &gt;
 " --> &quot;
 ' --> &#x27;

PHP has a function called htmlspecialchars that performs this operation. So if we change our script a little bit (this is a horrible legacy approach, but it suffices for demonstration), the resulting HTML will be safe.

echo "<p>Search results for: " . htmlspecialchars($_GET['search']) . "</p>";

Creates:

<p>Search results for: &lt;script&gt;alert(&quot;XSS&quot;)&lt;/script&gt;</p>
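To make the mapping concrete, here is a minimal JavaScript sketch of the same encoding. The name escapeHtml is a hypothetical helper of my own, not a built-in, and the sketch covers only the five characters listed above.

```javascript
// Minimal HTML entity encoder for the five characters listed above.
// escapeHtml is a hypothetical helper name, not a built-in function.
const HTML_ENTITIES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#x27;",
};

function escapeHtml(value) {
  // Coerce to a string first so numbers, null, etc. are handled too.
  return String(value).replace(/[&<>"']/g, (ch) => HTML_ENTITIES[ch]);
}

const search = '<script>alert("XSS")</script>';
console.log("<p>Search results for: " + escapeHtml(search) + "</p>");
// <p>Search results for: &lt;script&gt;alert(&quot;XSS&quot;)&lt;/script&gt;</p>
```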

Encoding contexts

HTML entity encoding is suitable only when you want to put something between HTML tags (as element content) or inside quoted HTML attributes. If your variables go inside JavaScript variables or URL addresses, you need another encoding function to avoid XSS.

So make sure you are using the proper encoding for the context.
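As a rough illustration of how the encoding differs per context, here is a JavaScript sketch with one helper for URL parameters and one for inline script values. Both helper names are hypothetical, and the script encoder is an assumption about one reasonable approach (JSON plus extra escapes), not the only correct one.

```javascript
// Hypothetical helpers, one per output context.

// URL context: percent-encode a value used as a query parameter.
function encodeForUrlParam(value) {
  return encodeURIComponent(String(value));
}

// JavaScript context: emit a string literal for an inline <script> block.
// JSON.stringify handles quotes and backslashes; the extra replacements keep
// sequences such as "</script>" from terminating the script element.
function encodeForScript(value) {
  return JSON.stringify(String(value))
    .replace(/</g, "\\u003c")
    .replace(/>/g, "\\u003e")
    .replace(/&/g, "\\u0026")
    .replace(/\u2028/g, "\\u2028")
    .replace(/\u2029/g, "\\u2029");
}

const untrusted = '</script><script>alert("XSS")</script>';
console.log("https://www.example.com/?search=" + encodeForUrlParam(untrusted));
console.log("var search = " + encodeForScript(untrusted) + ";");
```

Note that HTML entity encoding would be the wrong tool in both places: the browser does not decode entities inside a script block.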

Quotes!

Don't forget to quote your HTML attributes and JavaScript variables or no encoding in the world will save you. The following PHP script is vulnerable to XSS because you can enter a value such as: foo onClick=alert(1).

echo '<input type="text" value=' . htmlspecialchars($_GET['search']) . '>';
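The same problem can be reproduced in a few lines of JavaScript. In this sketch, escapeHtml is a hypothetical helper standing in for htmlspecialchars; the payload contains none of the encoded characters, so encoding leaves it untouched and only the quotes make a difference.

```javascript
// Hypothetical stand-in for htmlspecialchars.
function escapeHtml(value) {
  const entities = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#x27;" };
  return String(value).replace(/[&<>"']/g, (ch) => entities[ch]);
}

const payload = "foo onClick=alert(1)";

// Unquoted attribute: the space ends the value and onClick becomes a new attribute.
const unquoted = '<input type="text" value=' + escapeHtml(payload) + ">";

// Quoted attribute: the whole payload stays inside the value.
const quoted = '<input type="text" value="' + escapeHtml(payload) + '">';

console.log(unquoted); // <input type="text" value=foo onClick=alert(1)>
console.log(quoted);   // <input type="text" value="foo onClick=alert(1)">
```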

Don't put untrusted data in executable fields

Also, never put untrusted data, encoded or not, within HTML attributes that execute something. These include onClick, onMouseEnter and friends, but also src and href because someone can put javascript:alert("XSS") for the value, and once again, you have an XSS vulnerability.
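If you really must accept a user-supplied link, one common mitigation is to allow only known-safe URL schemes. The sketch below assumes that only http: and https: links are acceptable; safeHref and the base URL are hypothetical names chosen for illustration.

```javascript
// Assumption: user-supplied links may only use http: or https:.
const ALLOWED_PROTOCOLS = new Set(["http:", "https:"]);

// Returns a normalized URL if the scheme is allowed, otherwise a harmless "#".
function safeHref(untrustedUrl, base = "https://www.example.com/") {
  let parsed;
  try {
    // Parsing with the URL class normalizes case and surrounding whitespace,
    // so " JavaScript:alert(1)" is still recognized as a javascript: URL.
    parsed = new URL(String(untrustedUrl), base);
  } catch {
    return "#";
  }
  return ALLOWED_PROTOCOLS.has(parsed.protocol) ? parsed.href : "#";
}

console.log(safeHref('javascript:alert("XSS")')); // #
console.log(safeHref("/profile")); // https://www.example.com/profile
```

The returned value still has to be encoded for the attribute context when it is written into HTML.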

Scary, isn't it? But don't worry.

There are quite a few things that can go wrong, and OWASP has curated a nice list of them in its Cross-Site Scripting Prevention Cheat Sheet.

However, I wouldn't advise you to focus on that too much. After all, are we thrilled with code like this?

echo "<p>Search results for: " . htmlspecialchars($_GET['search']) . "</p>";

No. That isn't very good. Mixing presentation and code is so '90s.

$ rm legacy.php

Template engines to the rescue

Instead, you should have your controller method somewhere render a template with the data you want to display. In the case of PHP, Twig is a good option. You would have search.html.twig with the following content:

<p>Search results for: {{search}}</p>

And Twig would automatically encode your search parameter due to the template engine's automatic escaping.

https://symfony.com/doc/current/templates.html#output-escaping
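To get a feel for what automatic escaping does, here is a toy version written as a JavaScript tagged template. This is not how Twig is implemented; html and escapeHtml are hypothetical names, and the sketch only handles values placed in element content or quoted attributes.

```javascript
// Hypothetical encoder used by the template tag below.
function escapeHtml(value) {
  const entities = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#x27;" };
  return String(value).replace(/[&<>"']/g, (ch) => entities[ch]);
}

// Tagged template: literal parts pass through, interpolated values are escaped.
function html(strings, ...values) {
  return strings.reduce(
    (out, part, i) => out + part + (i < values.length ? escapeHtml(values[i]) : ""),
    ""
  );
}

const search = '<script>alert("XSS")</script>';
console.log(html`<p>Search results for: ${search}</p>`);
// <p>Search results for: &lt;script&gt;alert(&quot;XSS&quot;)&lt;/script&gt;</p>
```

The point is that the template author never calls the encoder by hand, so there is no call to forget.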

There are good template engines for all programming languages worth their salt. There's Jinja for Python, Thymeleaf for Java, and so on.

Just note that not all template engines are created equal. Some of them have an excellent and standard way to add HTML attributes, Thymeleaf's th:attr being one of them.

Then there are others where you have to be careful to quote your attributes.

And probably none of them will protect you from putting untrusted data into href, src, onclick, etc., so you still have to keep those in mind.

...or just don't generate HTML at all!

Another great way not to deal with XSS when generating HTML is not to generate HTML. You can do this by creating a static HTML/JavaScript frontend and perhaps a backend API. Try https://nextjs.org/, perhaps. It's pretty cool!

2. Avoid XSS by purifying and sandboxing untrusted content

There are scenarios where you might want to render content that you don't fully trust. Maybe you want your users to create HTML in a WYSIWYG editor, or perhaps you want to download an HTML response from a third party and display it to the user.

Whatever the use case, the solution is the same. Purify and sandbox.

Purify

Purifying is the act of removing any dangerous parts from an HTML string. You can do this on the client-side with DOMPurify or on the server-side with several tools such as the OWASP Java HTML sanitizer for Java or Mozilla's Bleach for Python. Just pick a well-esteemed one.
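Why pick an established purifier instead of writing your own? The following JavaScript sketch shows a deliberately naive, hypothetical filter (naiveStripScripts) and two inputs that defeat it. It is an illustration of the pitfall, not something to use.

```javascript
// Hypothetical home-grown filter: delete <script> and </script> tags in one pass.
// Shown only to demonstrate why this approach fails.
function naiveStripScripts(html) {
  return html.replace(/<\/?script>/gi, "");
}

// Removing the inner tags stitches the outer fragments into a working tag.
const nested = "<scr<script>ipt>alert(1)</scr</script>ipt>";
console.log(naiveStripScripts(nested)); // <script>alert(1)</script>

// No script tag at all, so the filter has nothing to remove.
const handler = "<img src=x onerror=alert(1)>";
console.log(naiveStripScripts(handler)); // <img src=x onerror=alert(1)>
```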

Sandbox

Purifying is an excellent first step, but I wouldn't leave my website's security hanging on that alone. Luckily, there is a great control that we can use to display untrusted HTML content. May I present: sandboxed iframes!

https://www.w3schools.com/tags/att_iframe_sandbox.asp

Sandboxed iframes run by default in their own unique origin. That is, if anything goes south in the frame, the frame cannot access your website. Also, sandboxed iframes by default prevent script execution, form submission, popups, and navigation of the top-level page. Very useful for our purposes!

Take a non-sandboxed frame whose content includes a script that calls alert with the message evilness. When the frame loads, you see an alert box with that message.

Add the sandbox attribute to the same frame, and this time the script does not get executed.

Such is the magic of sandboxed frames.
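Here is a small JavaScript sketch of how such a frame could be generated. It assumes the untrusted HTML is embedded through the srcdoc attribute; sandboxedFrame and escapeAttribute are hypothetical helper names.

```javascript
// Encode a value for use inside a double-quoted HTML attribute.
function escapeAttribute(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// An empty sandbox attribute applies every restriction:
// the content gets a unique origin and scripts do not run.
function sandboxedFrame(untrustedHtml) {
  return '<iframe sandbox srcdoc="' + escapeAttribute(untrustedHtml) + '"></iframe>';
}

console.log(sandboxedFrame("<script>alert('evilness')</script>"));
// <iframe sandbox srcdoc="&lt;script&gt;alert('evilness')&lt;/script&gt;"></iframe>
```

Be careful when adding sandbox exceptions: allowing both allow-scripts and allow-same-origin for content served from your own origin lets the frame remove its own sandbox.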

3. Avoid XSS by serving downloads properly

When you allow users to upload files, there is a risk that they upload a malicious HTML, SVG, or similar file to your server. And suppose the file is then downloadable from your domain, and your web server serves it like any other HTML file. An attacker could upload a file with malicious JavaScript content and redirect unwitting users to the page.

To prevent this, serve all content that is not supposed to be rendered directly in a web browser with a proper Content-Disposition header. Like so:

Content-Disposition: attachment; filename="filename.jpg"

By specifying attachment, you tell browsers to show the save file dialog.

https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Disposition
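As a sketch of what that could look like in code, the JavaScript function below builds the response headers for a download. The function name is hypothetical, and the Content-Type and X-Content-Type-Options values are my own additions as extra hardening, not something the header above requires.

```javascript
// Builds response headers for a user-uploaded file that should be downloaded,
// never rendered. downloadHeaders is a hypothetical helper name.
function downloadHeaders(filename) {
  // Replace characters that could break out of the quoted filename
  // or inject an extra header line.
  const safeName = String(filename).replace(/["\\\r\n]/g, "_");
  return {
    // Assumption: a generic binary type, so the browser has nothing to render.
    "Content-Type": "application/octet-stream",
    "Content-Disposition": 'attachment; filename="' + safeName + '"',
    // Assumption: also ask browsers not to guess the content type.
    "X-Content-Type-Options": "nosniff",
  };
}

console.log(downloadHeaders("filename.jpg")["Content-Disposition"]);
// attachment; filename="filename.jpg"
```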

4. Avoid XSS by using JavaScript safely

Not all XSS vulnerabilities arise from unsafe HTML. Sometimes your JavaScript code can have XSS vulnerabilities in it.

To get one, all you have to do is pass untrusted data into a function or property that either executes something or changes the HTML, HREF, or SRC of something.

Here are a couple of examples just so that you get the idea.

Passing untrusted data to jQuery append (which writes HTML).

Passing untrusted data to an href attribute. Not even React applications are safe from XSS if you don't know what you are doing: a user-supplied homepage link whose value is a javascript: URL runs the attacker's code when someone clicks it.

Passing untrusted data to eval. This is an example of passing untrusted data to a function that executes something. A simple calculator that evaluates its inputs will execute code if you enter values like ;alert('xss'); into one of the operands.

There is virtually an infinite list of functions and properties into which you shouldn't pass untrusted data. They include things like innerHTML, outerHTML, setTimeout and so on. And of course, the JavaScript libraries you use will have their own, just like the jQuery example above.

It's better to be safe than sorry, so check the documentation for the function/property before assigning/appending untrusted data into it.
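For the calculator case, the fix is usually to avoid the executable function altogether. Here is a minimal sketch of a calculator without eval; the function names and the set of supported operators are assumptions made for illustration.

```javascript
// Only these operators are supported; everything else is rejected.
const OPERATIONS = {
  "+": (a, b) => a + b,
  "-": (a, b) => a - b,
  "*": (a, b) => a * b,
  "/": (a, b) => a / b,
};

// Convert user input to a number, or throw if it is anything else.
function parseOperand(input) {
  const text = String(input).trim();
  const value = Number(text);
  if (text === "" || !Number.isFinite(value)) {
    throw new TypeError("Operand is not a number: " + text);
  }
  return value;
}

function calculate(left, operator, right) {
  // hasOwnProperty avoids matching inherited keys such as "constructor".
  if (!Object.prototype.hasOwnProperty.call(OPERATIONS, operator)) {
    throw new TypeError("Unsupported operator: " + operator);
  }
  return OPERATIONS[operator](parseOperand(left), parseOperand(right));
}

console.log(calculate("2", "+", "3")); // 5
```

An input such as ;alert('xss'); now fails number parsing and throws instead of being executed.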

5. Avoid XSS by using well-known JavaScript libraries and keeping them up to date

Don't use npm packages with a small number of downloads because they are more likely to have vulnerabilities or even contain purposefully malicious code. Try to use well-known libraries with a decent security record instead.

Even the best of libraries have vulnerabilities now and then, so make sure to keep them up to date. You can use tools such as retire.js and npm audit to scan your web application for vulnerable outdated JavaScript libraries.

6. Avoid XSS by implementing a Content Security Policy

Content Security Policy (CSP) is a fantastic browser security feature that can armor your web application against XSS vulnerabilities. Use it, even if you already follow the other good practices in this article.

https://developer.mozilla.org/en-US/docs/Web/HTTP/CSP

CSP restricts what your web application can do in terms of, e.g., loading resources and executing scripts. Like the iframe sandbox described above, each CSP directive you specify blocks everything by default. Then you can start adding exceptions for the resources that you do need.

Here is a good policy to get started with. It prevents lots of things, but most importantly:

It prevents eval and friends.
It prevents inline JavaScript tags.
It prevents loading JavaScript files from external domains.
It prevents javascript: URLs.

Content-Security-Policy: script-src 'self'; form-action 'self'; object-src 'none';

It would prevent all of the XSS vulnerabilities we have described so far from being meaningfully exploited. The form-action directive even prevents the attacker from inserting a fake form on the page that asks for, e.g., the victim's password and submits it to the attacker's server.
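If you generate the header in code, keeping the policy as data makes later exceptions easier to review. The following JavaScript sketch builds the policy shown above from an object; buildCsp is a hypothetical helper, not part of any framework.

```javascript
// The starter policy from above, expressed as data.
const policy = {
  "script-src": ["'self'"],
  "form-action": ["'self'"],
  "object-src": ["'none'"],
};

// Join each directive with its sources, then join directives with "; ".
function buildCsp(directives) {
  return Object.entries(directives)
    .map(([name, sources]) => name + " " + sources.join(" "))
    .join("; ");
}

console.log("Content-Security-Policy: " + buildCsp(policy));
// Content-Security-Policy: script-src 'self'; form-action 'self'; object-src 'none'
```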

So what's the catch? Your inline scripts won't work either. For CSP to work in this simple form, you will have to refactor your code so that you won't break your own rules.

You don't use inline scripts.
You don't use inline DOM event handlers (onClick, etc.).
You don't use eval, and the scripts/frameworks that you use don't either.

Now you start adding exceptions. Suppose you absolutely must use a JavaScript framework that uses eval. In that case, you will have to specify unsafe-eval, which is not optimal but not the end of the world either. And if you want to load a script externally, add that URL into the script-src directive.

Please, whatever you do, do not specify script-src 'unsafe-inline' because then you will downgrade your CSP to the point where it's almost useless.

If you absolutely must have inline JavaScript tags, use CSP nonces or hashes to allow those specific tags. I won't go into detail about that in this article.

When you have written your policy, you can use Google's CSP evaluator to check it.

Bonus: Avoid XSS by implementing Strict SameSite cookies

One more browser security feature that you can use to harden your application against (reflected) XSS attacks is strict SameSite cookies.

The crux of it is that you set your session cookies with the SameSite=Strict attribute, and web browsers will no longer send the user's session cookie in requests that originate from other websites, even if they are GET requests.

Set-Cookie: SessionId=123; SameSite=Strict

The catch is that links to your application will kind of break. If your user is logged in and clicks a link somewhere that points to your application, the session cookie is not sent with that first request, so the user will appear logged out in the tab/window that opens.

But if that doesn't bother you, do take advantage of strict SameSite!

You should at least use SameSite=Lax, which protects against CSRF (Cross-Site Request Forgery) vulnerabilities.
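A session cookie header could be assembled along these lines. This is a sketch: sessionCookie is a hypothetical helper, and the Path, HttpOnly and Secure attributes are assumptions I added as common companions to SameSite, not requirements of it.

```javascript
// Builds a Set-Cookie header value for the session cookie.
// Defaults to Strict; pass "Lax" if strict breaks incoming links for you.
function sessionCookie(sessionId, sameSite = "Strict") {
  if (!["Strict", "Lax"].includes(sameSite)) {
    throw new RangeError("sameSite must be Strict or Lax");
  }
  return (
    "SessionId=" + encodeURIComponent(String(sessionId)) +
    // Assumption: HttpOnly and Secure are added as extra hardening.
    "; Path=/; HttpOnly; Secure; SameSite=" + sameSite
  );
}

console.log("Set-Cookie: " + sessionCookie("123"));
// Set-Cookie: SessionId=123; Path=/; HttpOnly; Secure; SameSite=Strict
```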

https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Set-Cookie/SameSite

Conclusion

It is quite possible to avoid XSS vulnerabilities by using modern technology and knowing the pitfalls to avoid. Using a CSP as an additional layer of security is very highly recommended.