Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sanitise HTML in long description of enterprise #12459

Merged
merged 2 commits into from
Sep 19, 2024

Conversation

mkllnk
Copy link
Member

@mkllnk mkllnk commented May 10, 2024

ℹ️ Please use project Discover Regenerative (Macdoch pt 2): 3. Open Source Tech Evolution to track work on this issue.

What? Why?

We are now sanitising attributes on the model level but this pull request introduces the first database migration to migrate existing data. We were holding off because it's destructive but so far everything has been fine. I'm doing it for just one attribute here but if everything is find, we can probably do all other attributes in one migration in the next PR.

What should we test?

  • Visit Admin, Enterprise settings, About and edit the long description with the editor.
  • Everything you do with the editor should save and show in the frontend.
  • Existing descriptions should look the same after deploying this PR.

Release notes

Note that this includes potential un-reversable data deletion. Seems like a good time for a minor version increment.

Changelog Category (reviewers may add a label for the release notes):

  • User facing changes
  • API changes (V0, V1, DFC or Webhook)
  • Technical changes only
  • Feature toggled

The title of the pull request will be included in the release notes.

Dependencies

Documentation updates

@mkllnk mkllnk self-assigned this May 10, 2024
@mkllnk mkllnk marked this pull request as ready for review May 10, 2024 05:53
# We offer an editor which supports certain tags but you can't insert just any
# HTML, which would be dangerous.
class HtmlSanitizer
def self.sanitize(*args)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we be providing default allowed tags and attributes ? I makes to be consistent across the code base, and if you need something different that the default, you can then needed pass arguments. What do you think ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree 👍

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be good. At the moment, we use different editors and have different tags and attributes. I thought that we should first bring the common sanitiser in for all places and then standardise. But now I'm thinking that it's okay to sanitise with the union of all tags and attributes instead of allowing only the supported tags for each attribute. You are right.

Copy link
Member

@dacook dacook left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks for thinking about the big picture and solving it the safest and most efficient way.

I think it's worth considering adding the sanitisation to the view first, and "testing" out in prod for a couple of weeks, before we make destructive changes though. I don't know what HTML people currently have, and what workarounds they might be employing to format their about pages. Or am I being overly cautious?

# Remove any unsupported HTML.
def long_description=(html)
super(HtmlSanitizer.sanitize(
html, tags: %w[h1 h2 h3 h4 p b i u a], attributes: %w[href target],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that Trix supports some other tags, and actually uses strong and em instead of b and i. Ugh, and so many brs:

Screenshot 2024-05-13 at 11 18 09 am

https://staging.openfoodnetwork.org.au/elmore-compost-and-organics/shop

<p class="text-small ng-binding" data-controller="add-blank-to-link" ng-bind-html="::product.description_html">
<div>
   Biochar is a porous charcoal well recognised for being able to hold water, create great building blocks for biology and a store of plant carbon.<br><br><br>
   <strong>
   bold
   </strong>
   <br>
   <em>italic</em>
   <br>
   <del>
   strikethrough
   </del>
   <br>
   <a href="https://staging.openfoodnetwork.org.au/" target="_blank">
   link
   </a>
   </br>
   </br>
   </br>
   </br>
   </br>
   </br>
</div>
<h1>
   heading
</h1>
<blockquote>
   quote
</blockquote>
<pre>code</pre>
<ul>
   <li>
      bullet list
      <ul>
         <li>
            indented
         </li>
      </ul>
   </li>
</ul>
<ol>
   <li>
      number list
      <ol>
         <li>
            indented
         </li>
      </ol>
   </li>
</ol>
<div>
   <br>
   horizontal rule:
   <hr>
   <br>
   Pasted tables get reformatted by Trix I guess:
   <br>
   <br>
   <strong>
   CompanyContactCountry
   </strong>
   <br>
   Alfreds Futterkiste | Maria Anders | Germany
   <br>
   Centro comercial Moctezuma | Francisco Chang | Mexico
   <br>
   Ernst Handel | Roland Mendel | Austria
   <br>
   Island Trading | Helen Bennett | UK
   <br>
   Laughing Bacchus Winecellars | Yoshi Tannamuri | Canada
   <br>
   Magazzini Alimentari Riuniti | Giovanni Rovelli | Italy
   </br>
   </br>
   </br>
   </br>
   </br>
   </br>
   </br>
   </br>
   </br>
   </hr>
   </br>
</div>
</p>

I would err on the side of being more permissive (of safe elements) and allow the other tags in case Trix allows pasting of other html, or we use a different editor one day. Is there a comprehensive list we could copy from somewhere? If not, can we add all the above examples?

Copy link
Member

@dacook dacook May 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, I just realised that the Enterprise description uses a different editor!
So your list here should be enough (though it would be worth checking for br).

Still, if we extend this to products we'd need to expand the list.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a comprehensive list we could copy from somewhere?

Well, when we omit our list of safe tags then Rails comes with a comprehensive list already. We could use that to allow more pasted HTML. But then we have a discrepancy of pasted vs edited. And when it's about security, I usually err on the side of caution and be more restrictive.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's the full standard list: https://github.com/flavorjones/loofah/blob/main/lib/loofah/html5/safelist.rb

It would allow certain CSS as well. I don't think it's all safe in our case, it could mess up the layout or maybe more.

# We offer an editor which supports certain tags but you can't insert just any
# HTML, which would be dangerous.
class HtmlSanitizer
def self.sanitize(*args)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree 👍

Comment on lines 7 to 31
def up
Enterprise.where.not(long_description: [nil, ""]).find_each do |enterprise|
enterprise.update!(long_description: sanitize(long_description))
end
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is un-reversable of course, so I think it's a bit risky (what if we messed up formatting for lots of enterprises).
Maybe it would be fine, and I don't want to make more effort unnecessarily, but would consider first sanitising the view and wait until it's been released for a couple of weeks before applying the migration.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. We should at least test it first. The br tag is a good example for something I forgot and where I'm not sure it's covered.

@mkllnk mkllnk marked this pull request as draft May 14, 2024 02:42
@kirstenalarsen
Copy link
Contributor

which project should this be coded against @mkllnk

@dacook
Copy link
Member

dacook commented Jun 13, 2024

@mkllnk , just wondering if you have a plan for the migrations? Shall we include them as a final step before closing #12448 ?

PS I updated the release notes section above to suggest we do a minor version increment.

@mkllnk
Copy link
Member Author

mkllnk commented Jun 13, 2024

I wanted to give production a few weeks to wait for any reports of broken text. Then we can do a migration PR.

@mkllnk mkllnk added the technical changes only These pull requests do not contain user facing changes and are grouped in release notes label Sep 5, 2024
Existing descriptions have been sanitised in a migration. New
descriptions are sanitised when assigned. That should cover everything.
@mkllnk mkllnk marked this pull request as ready for review September 5, 2024 03:09
Copy link
Collaborator

@rioug rioug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Member

@dacook dacook left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great 👍

@filipefurtad0 filipefurtad0 self-assigned this Sep 19, 2024
@filipefurtad0 filipefurtad0 added pr-staged-au staging.openfoodnetwork.org.au pr-staged-uk staging.openfoodnetwork.org.uk and removed pr-staged-au staging.openfoodnetwork.org.au labels Sep 19, 2024
@filipefurtad0
Copy link
Contributor

Hey @mkllnk,

I've verified different types of styles used on enterprise descriptions are displayed as before:

Before this PR
image

After staging this PR - and a hard page refresh:
image

Performed a description edit:
image

Looking good!
Merging.

@filipefurtad0 filipefurtad0 merged commit a5d17b4 into openfoodfoundation:master Sep 19, 2024
55 checks passed
@mkllnk mkllnk deleted the description-html branch September 19, 2024 23:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr-staged-uk staging.openfoodnetwork.org.uk technical changes only These pull requests do not contain user facing changes and are grouped in release notes
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

5 participants