Sanitization
I wish it were Santa-ization. Who wouldn't like that?
— Anonymous
Definition
Within the context of Bandicoot, sanitization refers to cleaning an HTML string to remove potentially malicious content. Unsanitized HTML is vulnerable to cross-site scripting (XSS), which you need to guard against in your code. Bandicoot does NOT automatically sanitize your HTML; however, it provides a way for you to bring your own sanitizing function to the party.
Why you should care
After receiving incoming HTML, Bandicoot inserts the content into the DOM via React's dangerouslySetInnerHTML. React calls it dangerous for good reason: if the HTML string contains a <script> element or other JavaScript embedded in its elements (for example, an inline event handler such as onerror), that JavaScript can run in the user's browser once the content is deserialized into the DOM. This is a security risk known as cross-site scripting (XSS).
What this means for you
Sanitization
Since Bandicoot does NOT automatically sanitize incoming or outgoing HTML, it is up to you as the developer to make sure that sanitization happens. However, Bandicoot does expose a sanitizeHTML prop on the RichTextEditor component which allows you to provide a sanitization function.
If a sanitization function is provided through the sanitizeHTML prop, Bandicoot will pass incoming HTML strings through this function before inserting the content into the DOM (see note 1). If provided, Bandicoot will also pass outgoing HTML through the sanitizeHTML function after serialization (see note 2). If a sanitizeHTML function is NOT provided, Bandicoot will simply pass through the unsanitized HTML string. In the case of inserting the content into the DOM this would mean using dangerouslySetInnerHTML with unsanitized HTML - NOT RECOMMENDED!
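The pass-through behaviour described above can be modeled with a small sketch. This is not Bandicoot's source code: applySanitizer and rejectAll are hypothetical names used only to illustrate the logic as the text describes it.

```javascript
// Hypothetical model of the behaviour described above; not Bandicoot's actual implementation.
function applySanitizer(sanitizeHTML, html, actionType) {
  // With no sanitizeHTML prop, the HTML string passes through untouched.
  if (typeof sanitizeHTML !== 'function') {
    return html
  }
  // Otherwise the function receives the HTML string and the action type.
  return sanitizeHTML(html, actionType)
}

// A stand-in sanitizer for demonstration only: it rejects everything.
const rejectAll = () => ''

const suspect = '<img src="x" onerror="alert(1)">'
console.log(applySanitizer(undefined, suspect, 'setHTML')) // logs the unsanitized string
console.log(applySanitizer(rejectAll, suspect, 'setHTML')) // logs an empty string
```

In a real editor the stand-in would be replaced by a proper sanitizer; the point is only that omitting the function leaves the suspect string unchanged.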
Ultimately, the choice of sanitization process and library is up to you. If you are looking to do client-side sanitization, many third-party HTML sanitizers already exist. One that works well is sanitize-html.
The sanitizeHTML function
When Bandicoot calls the sanitizeHTML function, it passes two arguments:
suspectHTML: A string of HTML content that potentially needs sanitizing.
actionType: A string indicating what action Bandicoot is about to take.
The actionType options and when they occur are as follows:
setHTML: Occurs right before inserting the initialHTML into the DOM as well as right before inserting content into the DOM through the editorRef.current.setHTML() and editorRef.current.resetEditor() functions.
pasteHTML: Occurs right before passing a paste operation's content to the pasteFn prop. Note that the returned HTML string from the pasteFn function is immediately inserted into the DOM without further sanitization.
insertContentEditableFalseHTML: Occurs right before inserting content into the DOM using the useContentEditableFalse hook's insertContentEditableFalseElement function.
getHTML: Occurs post serialization, right before returning content through the save() or editorRef.current.getHTML() functions.
initialSetLastSavedHTML: Occurs right before initially setting the RichTextEditor's internal lastSavedHTML state variable.
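The pasteHTML entry above notes that whatever pasteFn returns goes into the DOM without another sanitization pass. One way to guard against that is to sanitize inside the paste function itself. The sketch below assumes pasteFn receives an HTML string and returns an HTML string, which is inferred from the description above rather than from a documented signature; createSafePasteFn and both stand-in functions are hypothetical.

```javascript
// Hypothetical wrapper. Assumes pasteFn takes an HTML string and returns an HTML string.
function createSafePasteFn(pasteFn, sanitize) {
  return function safePasteFn(pastedHTML) {
    const transformed = pasteFn(pastedHTML)
    // Sanitize the transformed result, since nothing else will before it reaches the DOM.
    return sanitize(transformed)
  }
}

// Stand-ins for demonstration only; the "sanitizer" merely labels its input.
const wrapInParagraph = (html) => `<p>${html}</p>`
const labelAsSanitized = (html) => `sanitized(${html})`

const safePasteFn = createSafePasteFn(wrapInParagraph, labelAsSanitized)
console.log(safePasteFn('hello')) // sanitized(<p>hello</p>)
```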
Bandicoot expects your sanitizeHTML function to return the sanitized HTML string.
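As a rough sketch of how the action types and the return contract fit together, the factory below builds a sanitizeHTML function that picks a rule set per action and checks that a string comes back. createActionAwareSanitizer, strict, and standard are hypothetical names, and treating unrecognized action types strictly is a design assumption rather than something Bandicoot requires.

```javascript
// The five action types listed above.
const ACTION_TYPES = [
  'setHTML',
  'pasteHTML',
  'insertContentEditableFalseHTML',
  'getHTML',
  'initialSetLastSavedHTML',
]

// Hypothetical factory. `strict` and `standard` are sanitizer functions that you
// supply, for example thin wrappers around a library such as sanitize-html.
function createActionAwareSanitizer({strict, standard}) {
  return function sanitizeHTML(suspectHTML, actionType) {
    // Pasted content, and any action type this code does not recognize,
    // gets the strict rules; everything else gets the standard rules.
    const useStrict = actionType === 'pasteHTML' || !ACTION_TYPES.includes(actionType)
    const sanitizedHTML = useStrict ? strict(suspectHTML) : standard(suspectHTML)
    // Bandicoot expects a string back, so fail loudly on anything else.
    if (typeof sanitizedHTML !== 'string') {
      throw new TypeError('sanitizeHTML must return a string')
    }
    return sanitizedHTML
  }
}

// Stand-in sanitizers that only label their input, for demonstration.
const sanitizeHTML = createActionAwareSanitizer({
  strict: (html) => `strict:${html}`,
  standard: (html) => `standard:${html}`,
})

console.log(sanitizeHTML('<p>hi</p>', 'pasteHTML')) // strict:<p>hi</p>
console.log(sanitizeHTML('<p>hi</p>', 'getHTML')) // standard:<p>hi</p>
```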
Example
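import sanitizeHTML from 'sanitize-html'
import {RichTextEditor, RichTextContainer} from 'bandicoot'

function MyEditor() {
  // Wrap sanitize-html so that Bandicoot's second argument (actionType)
  // is not passed along as sanitize-html's options argument.
  return (
    <RichTextContainer>
      <RichTextEditor sanitizeHTML={(suspectHTML) => sanitizeHTML(suspectHTML)} />
    </RichTextContainer>
  )
}

// Alternatively, write your own function and pass it as the sanitizeHTML prop.
function myOwnSanitizeHtmlFunction(suspectHTML, actionType) {
  // Implement your logic to sanitize 'suspectHTML' into 'sanitizedHTML'.
  // If desired, key off of the 'actionType' string. For example:
  // - if 'pasteHTML', sanitize using a stricter list of allowed tags.
  // - else, sanitize using sanitize-html's default rules.
  const options =
    actionType === 'pasteHTML'
      ? {allowedTags: ['p', 'br', 'b', 'i', 'em', 'strong']}
      : undefined
  const sanitizedHTML = sanitizeHTML(suspectHTML, options)
  return sanitizedHTML
}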
Notes
1) If you use Bandicoot's richTextContext.getContentEditableElement() to interact with the HTML directly, be aware that the content does not pass through the sanitizeHTML function since you are accessing the DOM directly.
2) Content being sent to a server from a client should never be considered sanitized since a bad actor could tamper with client-side code or bypass sanitization completely by sending malicious data directly to a server's endpoints. Server-side incoming sanitization should be the primary line of defense instead of client-side outgoing sanitization.
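To make note 2 concrete, here is a minimal sketch of server-side sanitization on the way in. saveRichText, sanitize, and store are hypothetical names: the sanitizer and the storage call are injected so the sketch does not assume any particular server framework or library.

```javascript
// Hypothetical server-side helper; `sanitize` and `store` are dependencies you inject.
function saveRichText({html, sanitize, store}) {
  if (typeof html !== 'string') {
    throw new TypeError('Expected an HTML string')
  }
  // Sanitize on the server regardless of what the client claims to have done.
  const cleanHTML = sanitize(html)
  store(cleanHTML)
  return cleanHTML
}

// Stand-ins for demonstration only.
const saved = []
const result = saveRichText({
  html: '<p>from the client</p>',
  sanitize: (html) => `sanitized(${html})`,
  store: (html) => saved.push(html),
})
console.log(result) // sanitized(<p>from the client</p>)
```

Only the sanitized value is ever handed to store, so content that skipped or tampered with client-side sanitization is still cleaned before it is persisted.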