-
-
Notifications
You must be signed in to change notification settings - Fork 715
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sanitise HTML in long description of enterprise #12459
Sanitise HTML in long description of enterprise #12459
Conversation
app/services/html_sanitizer.rb
Outdated
# We offer an editor which supports certain tags but you can't insert just any | ||
# HTML, which would be dangerous. | ||
class HtmlSanitizer | ||
def self.sanitize(*args) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we be providing default allowed tags and attributes ? I makes to be consistent across the code base, and if you need something different that the default, you can then needed pass arguments. What do you think ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be good. At the moment, we use different editors and have different tags and attributes. I thought that we should first bring the common sanitiser in for all places and then standardise. But now I'm thinking that it's okay to sanitise with the union of all tags and attributes instead of allowing only the supported tags for each attribute. You are right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, thanks for thinking about the big picture and solving it the safest and most efficient way.
I think it's worth considering adding the sanitisation to the view first, and "testing" out in prod for a couple of weeks, before we make destructive changes though. I don't know what HTML people currently have, and what workarounds they might be employing to format their about pages. Or am I being overly cautious?
app/models/enterprise.rb
Outdated
# Remove any unsupported HTML. | ||
def long_description=(html) | ||
super(HtmlSanitizer.sanitize( | ||
html, tags: %w[h1 h2 h3 h4 p b i u a], attributes: %w[href target], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed that Trix supports some other tags, and actually uses strong
and em
instead of b
and i
. Ugh, and so many br
s:
https://staging.openfoodnetwork.org.au/elmore-compost-and-organics/shop
<p class="text-small ng-binding" data-controller="add-blank-to-link" ng-bind-html="::product.description_html">
<div>
Biochar is a porous charcoal well recognised for being able to hold water, create great building blocks for biology and a store of plant carbon.<br><br><br>
<strong>
bold
</strong>
<br>
<em>italic</em>
<br>
<del>
strikethrough
</del>
<br>
<a href="https://staging.openfoodnetwork.org.au/" target="_blank">
link
</a>
</br>
</br>
</br>
</br>
</br>
</br>
</div>
<h1>
heading
</h1>
<blockquote>
quote
</blockquote>
<pre>code</pre>
<ul>
<li>
bullet list
<ul>
<li>
indented
</li>
</ul>
</li>
</ul>
<ol>
<li>
number list
<ol>
<li>
indented
</li>
</ol>
</li>
</ol>
<div>
<br>
horizontal rule:
<hr>
<br>
Pasted tables get reformatted by Trix I guess:
<br>
<br>
<strong>
CompanyContactCountry
</strong>
<br>
Alfreds Futterkiste | Maria Anders | Germany
<br>
Centro comercial Moctezuma | Francisco Chang | Mexico
<br>
Ernst Handel | Roland Mendel | Austria
<br>
Island Trading | Helen Bennett | UK
<br>
Laughing Bacchus Winecellars | Yoshi Tannamuri | Canada
<br>
Magazzini Alimentari Riuniti | Giovanni Rovelli | Italy
</br>
</br>
</br>
</br>
</br>
</br>
</br>
</br>
</br>
</hr>
</br>
</div>
</p>
I would err on the side of being more permissive (of safe elements) and allow the other tags in case Trix allows pasting of other html, or we use a different editor one day. Is there a comprehensive list we could copy from somewhere? If not, can we add all the above examples?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh, I just realised that the Enterprise description uses a different editor!
So your list here should be enough (though it would be worth checking for br
).
Still, if we extend this to products we'd need to expand the list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a comprehensive list we could copy from somewhere?
Well, when we omit our list of safe tags then Rails comes with a comprehensive list already. We could use that to allow more pasted HTML. But then we have a discrepancy of pasted vs edited. And when it's about security, I usually err on the side of caution and be more restrictive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's the full standard list: https://github.com/flavorjones/loofah/blob/main/lib/loofah/html5/safelist.rb
It would allow certain CSS as well. I don't think it's all safe in our case, it could mess up the layout or maybe more.
app/services/html_sanitizer.rb
Outdated
# We offer an editor which supports certain tags but you can't insert just any | ||
# HTML, which would be dangerous. | ||
class HtmlSanitizer | ||
def self.sanitize(*args) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree 👍
def up | ||
Enterprise.where.not(long_description: [nil, ""]).find_each do |enterprise| | ||
enterprise.update!(long_description: sanitize(long_description)) | ||
end | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is un-reversable of course, so I think it's a bit risky (what if we messed up formatting for lots of enterprises).
Maybe it would be fine, and I don't want to make more effort unnecessarily, but would consider first sanitising the view and wait until it's been released for a couple of weeks before applying the migration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. We should at least test it first. The br
tag is a good example for something I forgot and where I'm not sure it's covered.
which project should this be coded against @mkllnk |
I wanted to give production a few weeks to wait for any reports of broken text. Then we can do a migration PR. |
af88077
to
16d6986
Compare
16d6986
to
53286c2
Compare
Existing descriptions have been sanitised in a migration. New descriptions are sanitised when assigned. That should cover everything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great 👍
Hey @mkllnk, I've verified different types of styles used on enterprise descriptions are displayed as before: After staging this PR - and a hard page refresh: Looking good! |
ℹ️ Please use project Discover Regenerative (Macdoch pt 2): 3. Open Source Tech Evolution to track work on this issue.
What? Why?
We are now sanitising attributes on the model level but this pull request introduces the first database migration to migrate existing data. We were holding off because it's destructive but so far everything has been fine. I'm doing it for just one attribute here but if everything is find, we can probably do all other attributes in one migration in the next PR.
What should we test?
Release notes
Note that this includes potential un-reversable data deletion. Seems like a good time for a minor version increment.
Changelog Category (reviewers may add a label for the release notes):
The title of the pull request will be included in the release notes.
Dependencies
Documentation updates