Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Look for corruption in downloaded files #806

Open
1 of 7 tasks
fmarier opened this issue Jan 6, 2024 · 6 comments
Open
1 of 7 tasks

Look for corruption in downloaded files #806

fmarier opened this issue Jan 6, 2024 · 6 comments
Assignees

Comments

@fmarier
Copy link
Member

fmarier commented Jan 6, 2024

https://community.brave.com/t/ads-and-trackers-blockers-breaks-all-websites/524842 reported widespread webcompat problems. This was due to a corrupted cookie list. We should be validating filter lists after downloading them. List of things to check (in priority order):

  • If there is a checksum on an adblock list, verify the checksum at crx build time in a place similar to https://github.com/brave/adblock-resources/blob/master/index.js#L7
  • File size being too small
  • Check the MIME/file type (should be a text file)
  • Generate a warning if the file has invalid rules sent out via Sentry
  • Check for lines which are too long (the whole file should not be in one line for example)
  • Check file diff sizes before publishing (shouldn’t be too large)
  • Check against “known-good” sites before publishing (nothing should be blocked): https://github.com/brave/devops/issues/11117

In all these cases, we should not publish a new version of the list/component and send a Sentry alert like how it's done already in the code: https://github.com/brave/brave-core-crx-packager/blob/master/scripts/generateAdBlockRustDataFiles.js#L101

@ryanbr
Copy link

ryanbr commented Jan 6, 2024

@fmarier
Copy link
Member Author

fmarier commented Jan 6, 2024

Example of data corruption:

singkinderlieder.de##+js(set-cookie, acceptMatomo, true)

became:

singkinderlieder.de-o+js(set-clert, acceptMatomo, true)

@fmarier
Copy link
Member Author

fmarier commented Jan 6, 2024

For added robustness, it may be worth verifying the checksum both when downloading the upstream file / building the component, as well as client-side when loading the list.

@ShivanKaul
Copy link
Collaborator

ShivanKaul commented Jan 6, 2024

From checking S3, it looks like the corrupted file was shipped to clients. To be safe though, we should check the list before loading into brave-core, since the corruption might happen as part of the CRX zipping process.

@ShivanKaul
Copy link
Collaborator

I edited the top-level issue description to have list of validation checks we can do.

@bbondy
Copy link
Member

bbondy commented Jan 10, 2024

I added an item to the top level issue (item 0) "Verify checksums when available at crx build time in a place similar to https://github.com/brave/adblock-resources/blob/master/index.js#L7"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants