Hiding JSON-formatted data in the DOM with CSP enabled

Published 27th August 2013 · tagged with CSP, DOM, HTML, JavaScript, PHP, security

If Content Security Policy is enabled for protection against cross-site scripting attacks (i.e. the unsafe-inline option is not set), the use of inline <script>s is not allowed. In that case, how can we pass server-generated data to the front-end without negatively affecting load time and run-time performance?

Introduction

A common way to pass server-generated JSON-formatted data to the client so that it can be used in JavaScript is the following:

<!-- at the bottom of the HTML document, before `</body>` -->
<script>
	window.data = <?php
		// Note: this is potentially unsafe; see escaping instructions below.
		echo json_encode($data);
	?>;
</script>
<script src="process-data.js"></script>

This results in something like:

<!-- at the bottom of the HTML document, before `</body>` -->
<script>
	window.data = {"foo":"bar&baz","baz":42,"qux":"lorem isn't ipsum","waldo":"2<5"};
</script>
<script src="process-data.js"></script>

Note that in this case, any occurrences of </script in the JSON-formatted data should be escaped as <\/script to avoid closing the <script> element prematurely. Also, <!-- must be escaped as \u003C!--. Other than that, no special escaping is necessary. In PHP, this boils down to json_encode($data, JSON_HEX_TAG | JSON_UNESCAPED_SLASHES);.
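The PHP flags above have no built-in JavaScript counterpart, so here is a rough sketch of the same escaping for a back end written in JavaScript (Node.js, for instance). It is an illustration only: the helper name serializeForInlineScript is made up for this example, and it escapes every < and > (similar to what JSON_HEX_TAG does) instead of targeting only </script and <!--.

```javascript
// Hypothetical helper (not part of any library): serialize `data` so that the
// result can be embedded in an inline <script> element.
function serializeForInlineScript(data) {
	return JSON.stringify(data)
		// `<` and `>` can only occur inside JSON strings, so replacing them
		// with `\uXXXX` escapes keeps the output valid JSON and valid JavaScript.
		.replace(/</g, '\\u003C')
		.replace(/>/g, '\\u003E');
}

console.log(serializeForInlineScript({ 'foo': 'bar&baz', 'waldo': '2<5' }));
// → {"foo":"bar&baz","waldo":"2\u003C5"}
```

Since the escapes are part of the JSON syntax itself, the browser turns them back into the original characters when it evaluates the object literal.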

The file process-data.js looks something like this:

(function() {

	var process = function(data) {
		// Do stuff with the data, e.g. render charts, modify the DOM, whatever.
	};

	process(window.data);

}());

This technique is simple to implement and doesn’t have a negative impact on the page’s performance. Only a single HTTP request is needed to load the external JavaScript file that processes the data. The data itself is inlined in the HTML so no additional HTTP requests are needed to download it. Moreover, it gets parsed as a JavaScript object right away by the browser’s JavaScript engine — there’s no need to call JSON.parse() at run-time.

However, if Content Security Policy is enabled for protection against cross-site scripting attacks (i.e. the unsafe-inline option is not set), the use of inline scripts is not allowed. In that case, how can we still pass the data along without negatively affecting load time and run-time performance?

Ajax

A naïve solution would be to load the JSON-formatted data through Ajax on page load. The HTML code then looks like this:

<script src="fetch-and-process-data.js"></script>

The file fetch-and-process-data.js looks something like this:

(function() {

	var getJSON = function(url, callback) {
		// Load the data using XHR, and call `callback(data)` when finished.
		// See https://mathiasbynens.be/notes/xhr-responsetype-json for an example implementation.
	};

	var process = function(data) {
		// Do stuff with the data, e.g. render charts, modify the DOM, whatever.
	};

	// Load the data, then process it.
	getJSON('data.json', process);

}());
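For completeness, the getJSON stub above could be filled in roughly as follows. This is a minimal sketch, assuming a browser that provides XMLHttpRequest with onload support; it parses responseText by hand instead of relying on responseType = 'json', and it deliberately leaves out error handling (failed requests and malformed JSON are simply not reported to the callback).

```javascript
// Sketch of a `getJSON` helper, assuming a browser-provided `XMLHttpRequest`.
function getJSON(url, callback) {
	var xhr = new XMLHttpRequest();
	xhr.open('GET', url, true); // `true` makes the request asynchronous
	xhr.onload = function() {
		// Only hand over the data for successful (2xx) responses.
		if (xhr.status >= 200 && xhr.status < 300) {
			// Parse the response body manually into a JavaScript value.
			callback(JSON.parse(xhr.responseText));
		}
	};
	xhr.send();
}
```

Usage is the same as in the file above: getJSON('data.json', process);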

However, this introduces another HTTP request to load the JSON-formatted data, which negatively impacts load-time performance. This leads to a slower user experience, especially in cases where the HTML document doesn’t really contain any information until the data is loaded (remember Twitter back in 2010?).

What are the alternatives?

A hidden element

We can still inline the JSON-formatted data in the HTML document without using <script> elements (to satisfy CSP) by using a hidden dummy element. Note that the <data> element shouldn’t be used for this purpose, as in this case no human-readable representation of the data is available. Let’s use a <div> element instead:

<div id="important-data" class="hidden-data">
	{"foo":"bar&amp;baz","baz":42,"qux":"lorem isn't ipsum","waldo":"2&lt;5"}
</div>
<script src="read-and-process-data.js"></script>

Note that in this case, the JSON-formatted data needs some extra escaping for security reasons:

Any occurrences of < should be escaped as &lt; (at the HTML level) or \u003C (at the JSON level) to avoid accidentally creating unwanted HTML elements and to avoid closing the <div> element prematurely.
Any occurrences of & should be escaped as &amp; (at the HTML level) or \u0026 (at the JSON level) to avoid issues with ambiguous ampersands or text that looks like HTML entities in the source data.

No other special escaping is necessary. In PHP, this boils down to json_encode($data, JSON_HEX_TAG | JSON_HEX_AMP | JSON_UNESCAPED_SLASHES);.
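If the markup is generated by JavaScript on the server rather than by PHP, the JSON-level variant of these two escapes could be sketched like this. The helper name serializeForHiddenElement is hypothetical, and unlike JSON_HEX_TAG it leaves > alone, since only < and & need escaping here.

```javascript
// Hypothetical helper: serialize `data` for use as the text content of a
// hidden element such as <div class="hidden-data">…</div>.
function serializeForHiddenElement(data) {
	return JSON.stringify(data)
		// Escape at the JSON level, so no HTML-level unescaping is ever needed.
		.replace(/</g, '\\u003C')
		.replace(/&/g, '\\u0026');
}

console.log(serializeForHiddenElement({ 'foo': 'bar&baz', 'waldo': '2<5' }));
// → {"foo":"bar\u0026baz","waldo":"2\u003C5"}
```

Because the escaping happens inside the JSON strings, JSON.parse() restores the original characters on the client.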

To prevent browsers from displaying the raw data, CSS can be used:

.hidden-data {
	display: none;
}

(Note that using the hidden HTML attribute instead wouldn’t be appropriate, since the raw data is never to be displayed.)

The file read-and-process-data.js looks something like this:

(function() {

	var getData = function() {
		// Read the JSON-formatted data from the DOM.
		var element = document.getElementById('important-data');
		var string = element.textContent || element.innerText; // fallback for IE ≤ 8
		var data = JSON.parse(string);
		// Clear the element’s contents now that we have a copy of the data.
		element.innerHTML = '';
		return data;
	};

	var process = function(data) {
		// Do stuff with the data, e.g. render charts, modify the DOM, whatever.
	};

	// Get the data, then process it.
	var data = getData();
	process(data);

}());

This approach can be used even when CSP is enabled. That’s a big plus! However, it comes with a few downsides compared to the inline <script>-based technique:

As mentioned, an extra round of escaping is needed, which increases the size of the JSON-formatted data.
It requires DOM operations (which are generally slow in JavaScript) to read out the data.
To convert the data back to a JavaScript object, JSON.parse() must be called at runtime. For backwards compatibility with older browsers, a JSON polyfill is needed.
It relies on CSS to hide the raw data.

Alternative: an inline <script> element with a non-JavaScript type

To get rid of the CSS dependency, we could use an element that browsers hide by default instead of a <div>, such as an inline <script> element with a non-JavaScript type attribute (so CSP doesn’t block it or trigger any warnings):

<script type="application/json" id="important-data">
	{"foo":"bar&baz","baz":42,"qux":"lorem isn't ipsum","waldo":"2<5"}
</script>
<script src="read-and-process-data.js"></script>

Note that in this case, any occurrences of </script in the JSON-formatted data should be escaped as <\/script to avoid closing the <script> element prematurely. Since script elements with unknown types are not executed by any browser, there should be no risk in leaving <! unescaped — but feel free to escape it as \u003C! just in case. Other than that, no special escaping is necessary. In PHP, this boils down to json_encode($data, JSON_HEX_TAG | JSON_UNESCAPED_SLASHES);.

The read-and-process-data.js script is exactly the same as in the previous example.

A custom data-* attribute

Another solution is to use a custom data-* attribute to hide the JSON-formatted data in the DOM. You could assign this attribute to the <html> or <body> element, but I’d suggest injecting the data before any <script> elements near the closing </body> tag, so that the rest of the content is downloaded and rendered first.

<div id="important-data" class="hidden-data" data-data='{"foo":"bar&amp;baz","baz":42,"qux":"lorem isn&#39;t ipsum","waldo":"2<5"}'></div>
<script src="read-and-process-data.js"></script>

Note that I’ve wrapped the attribute value in single quotes instead of double quotes to avoid having to escape the many double quotes in the JSON-formatted data. Still, the JSON-formatted data needs some extra escaping for security reasons:

Any occurrences of ' should be escaped as &#39; or &#x27; (at the HTML level) or as \u0027 (at the JSON level) to avoid breaking out of the HTML attribute value. (If the attribute value is wrapped in double quotes, escape " as &quot; or \u0022 instead.)
Any occurrences of & should be escaped as &amp; (at the HTML level) or \u0026 (at the JSON level) to avoid issues with ambiguous ampersands or text that looks like HTML entities in the source data.

No other special escaping is necessary. In PHP, this boils down to json_encode($data, JSON_HEX_APOS | JSON_HEX_QUOT | JSON_HEX_AMP | JSON_UNESCAPED_SLASHES);. Note that < and > don’t need escaping in quoted attribute values.
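A JavaScript back end could approximate this as sketched below. The helper name serializeForSingleQuotedAttribute is made up for illustration, and the sketch only covers the single-quoted case used in the example above, so it escapes ' and & but leaves " untouched.

```javascript
// Hypothetical helper: serialize `data` for use inside a single-quoted
// attribute value, e.g. data-data='…'.
function serializeForSingleQuotedAttribute(data) {
	return JSON.stringify(data)
		// `&` could start a character reference; `'` would end the attribute value.
		.replace(/&/g, '\\u0026')
		.replace(/'/g, '\\u0027');
}

console.log(serializeForSingleQuotedAttribute({ 'foo': 'bar&baz', 'qux': 'lorem isn\'t ipsum' }));
// → {"foo":"bar\u0026baz","qux":"lorem isn\u0027t ipsum"}
```

The double quotes that delimit JSON strings are harmless inside a single-quoted attribute, which is why they can stay as they are.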

The file read-and-process-data.js looks something like this:

(function() {

	var getData = function() {
		// Read the JSON-formatted data from the DOM.
		var element = document.getElementById('important-data');
		// Note: in modern browsers, you could use `element.dataset.data` instead
		// of `getAttribute('data-data')`.
		var string = element.getAttribute('data-data');
		var data = JSON.parse(string);
		// Remove the attribute now that we have a copy of the data.
		element.removeAttribute('data-data');
		return data;
	};

	var process = function(data) {
		// Do stuff with the data, e.g. render charts, modify the DOM, whatever.
	};

	// Get the data, then process it.
	var data = getData();
	process(data);

}());

Just like the “hidden element” approach, this technique can also be used even when CSP is enabled. Another advantage: since the data is part of an attribute value, it won’t be visible by default.

However, it still has a few downsides compared to the inline <script>-based technique:

As mentioned, an extra round of escaping is needed, which increases the size of the JSON-formatted data.
It requires DOM operations (which are generally slow in JavaScript) to read out the data.
To convert the data back to a JavaScript object, JSON.parse() must be called at runtime. For backwards compatibility with older browsers, a JSON polyfill is needed.
CSS may be needed to hide the empty element (in case it inherits border styles, for example), although it won’t be necessary in most cases.

Conclusion

Passing server-generated JSON-formatted data to the client-side for use in JavaScript becomes a bit more complex when CSP is enabled, since inline <script>s are not an option in that case.

With security, performance, and flexibility in mind, the best solution is to hide the JSON-formatted data either in a custom data-* attribute on an element near the closing </body> tag, or in an inline <script> element with type="application/json".

Disclaimer: Enabling CSP is not enough to fully protect your site against client-side security vulnerabilities, although it certainly helps.

Comments

David Bruant wrote on 27th August 2013 at 17:29:

Random idea: what about base64-encoding data?

Would that save from escaping effort and the need to know for encyclopaedic knowledge of escape rules? (I feel it would, but miss the aforementioned encyclopaedic knowledge ;]) That would allow to easily switch between inline script/hidden element/attribute without worrying too much about escaping.

Will the gzip size of the base64-encoded data (which inflates the number of bytes by 33%) be roughly the same than the gzip version of the raw/escaped data? (I’m not too worried about the cost of base64 encoding/decoding, should I?)

For the hidden element technique, am I a bad person if I use <script type="application/json">?

To convert the data back to a JavaScript object, JSON.parse() must be called at runtime.

This cost has to be paid with the inline script technique too. It just happens as part of parsing the data as a JavaScript object. And I doubt the cost of a function call matters by comparison to the cost of parsing + allocating.

In the majority of cases I’ve encountered, the data just needs to be read once, so removing the hidden element or attribute can make a lot of sense if the data is big to prevent it from being duplicated between the DOM and JS memory (basically, this idea is just to move the data from the HTML-generated DOM to JS).

Great article as usual Mathias! :-)

Mathias wrote on 27th August 2013 at 18:02:

David: […] base64-encoding data […] would allow to easily switch between inline script/hidden element/attribute without worrying too much about escaping.

Fair point. Neil Matatall (who happens to have written a similar post a while back) suggested the escaping logic can be reduced to “HTML-entity-encode all the things”. I just wanted to be explicit about which escapes are necessary and which are extraneous in the cases presented. Especially the fact that </script> and <!-- must be escaped in the context of a <script> element is often forgotten.

Will the gzip size of the base64-encoded data be roughly the same than the gzip version of the raw/escaped data?

Good question. To test this, I downloaded https://html.spec.whatwg.org/entities.json, base64-encoded it, and then compared the file sizes before and after gzipping.

$ gz entities.json
orig: 145897 bytes
gzip: 20233 bytes (13.87%)

$ gz entities-base64.json
orig: 194533 bytes
gzip: 28436 bytes (14.62%)

In this case, gzip compresses the base64-encoded JSON data slightly less effectively (14.62% of the original size versus 13.87%), and the resulting gzipped file is also considerably larger (28436 bytes versus 20233 bytes).

I’m not too worried about the cost of base64 encoding/decoding, should I?)

This would boil down to using JSON.parse(atob(data)) instead of just JSON.parse(data) to decode the base64-encoded JSON-formatted data. This introduces a dependency on an atob polyfill in older browsers. Also, it would be good to create a jsPerf test to see if the extra round of decoding has a non-negligible performance penalty.

In the majority of cases I’ve encountered, the data just needs to be read once, so removing the hidden element or attribute can make a lot of sense if the data is big to prevent it from being duplicated between the DOM and JS memory (basically, this idea is just to move the data from the HTML-generated DOM to JS).

Excellent point! I’ve tweaked the code examples accordingly.

Nicolas Gallagher wrote on 27th August 2013 at 18:45:

I don’t see why the hidden attribute couldn’t have been used in one of your examples. There’s nothing I could see in the spec to back up your reasoning.

FWIW, we use the script-with-different-type approach for Twitter Cards (Neil Matatall works at Twitter too).

Mathias wrote on 27th August 2013 at 19:00:

Nicolas: Quoting the spec:

When specified on an element, it indicates that the element is not yet, or is no longer, directly relevant to the page’s current state, or that it is being used to declare content to be reused by other parts of the page as opposed to being directly accessed by the user.

My interpretation of the spec is that the hidden attribute is intended for content that, at some point, becomes visible. In this case, the data will never be “directly relevant to the page” — it should always remain hidden, therefore the hidden attribute doesn’t seem like a good choice. But maybe I’m reading too much into it.

FremyCompany wrote on 27th August 2013 at 20:37:

I tend to use an XML Comment inside an HTML tag for this.

However, I was wondering if it wouldn’t be a good idea to use hidden inputs instead, because the input.value property is easy to read and scripts are expected to read it.

Neil Matatall wrote on 28th August 2013 at 20:52:

Um Mathias has pointed out a potential issue. Why this issue has not surfaced yet is escaping me (pun intended). If your <script type="application/json"> goes the HTML escaped route, it would need to be unescaped. http://jsbin.com/AxeR/2/edit

That being said, I’m very anti-contextual-encoding as it isn’t supported directly by all templating languages. But I’m wearing my security hat, not my developer hat. (True story, I used to suggest throwing the data in a span/div rather than a script tag!)

Mathias wrote on 29th August 2013 at 11:59:

Neil:

If your <script type="application/json"> goes the HTML escaped route, it would need to be unescaped.

Exactly. So why bother? Just use textContent to read out the unescaped contents, falling back to innerText if old IE is a concern.

Neil Matatall wrote on 29th August 2013 at 21:41:

Mathias: Yep. I was confused and went along happily mistaking this. This does support the “HTML-entity-encode all the things” which was my original point I guess.

Mike West wrote on 30th August 2013 at 09:39:

Note that one of the major features coming up in CSP 1.1 will be a whitelisting mechanism to enable specific inline blocks to execute. The WG is evaluating nonces (implemented behind a flag in Chrome), and hashes (still working out the details for an initial implementation).

I’d encourage anyone interested in that discussion to take a look at the 1.1 spec: https://dvcs.w3.org/hg/content-security-policy/raw-file/tip/csp-specification.dev.html and hop into the WG mailing list (public-webappsec@w3.org). Feedback is oh so welcome. :)

Lecky wrote on 12th September 2013 at 03:13:

For a custom data-* attribute, there’s no need to use JSON.parse(). Just use HTML5 dataset.

Mathias wrote on 12th September 2013 at 04:45:

Lecky: The example in the post mentions dataset. Unfortunately element.dataset.data is only a replacement for element.getAttribute('data-data'). Since it returns a string value, you’ll still need JSON.parse to turn the serialized string back into an object. Here’s an example: http://jsbin.com/AGAWUxA/1/edit?html,console

Olivier Mengué wrote on 13th September 2013 at 13:59:

Better than base64, use CDATA: you'll just have to replace ]]> with ]]>]]<![CDATA[>. https://en.wikipedia.org/wiki/CDATA

Mathias Bynens
