Old deploys take up way heaps of space #899

Open
dacook opened this issue Oct 5, 2023 · 5 comments

@dacook
Member

dacook commented Oct 5, 2023

Description

We store the last 5 deploys on a server in the releases-old folder, but this can take up lots of space. (note that the current release is also in this directory).

The majority of the space used is from node_modules, currently about 260 MB per release (so roughly 1.3 GB across 5 releases).
Logs also take up a bit.

Expected Behavior

There shouldn't be too much space used up

Actual Behaviour

# au-prod
du -sh apps/openfoodnetwork/releases-old
5.5G	apps/openfoodnetwork/releases-old

Jul 2024: 6.3 GiB

Severity

Possible Fix

Perhaps node_modules could be linked to a single shared folder (I think assets might be set up like this?).

But I think we can simply store fewer old releases. I don't think we ever really refer to them.

@mkllnk
Member

mkllnk commented Oct 5, 2023

When I go back to old releases then it's mostly for log files to see what happened two weeks ago. So we could think of storing them centrally with logrotate.

Keeping node modules centrally has the advantage of a quicker installation process because we don't need to duplicate all of them. But pruning is a challenge. If we just prune according to the current deploy then we lose the backup of the old releases. Or are the installation packages cached independently?

@dacook
Member Author

dacook commented Oct 5, 2023

Good point about logs, we should consider storing them centrally.
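As a rough sketch of what "storing them centrally" could look like: each release's log directory becomes a symlink to one shared directory, which logrotate could then manage on its own schedule. The helper name `link_shared_log` and the paths in the usage comment are hypothetical, not taken from the deploy scripts.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: point RELEASE_DIR/log at one shared log directory.
# SHARED_LOG should be an absolute path so the symlink resolves from anywhere.
link_shared_log() {
  local release_dir=$1 shared_log=$2
  mkdir -p "$shared_log"
  if [ -d "$release_dir/log" ] && [ ! -L "$release_dir/log" ]; then
    # rmdir only succeeds on an empty directory, so existing logs are never
    # silently discarded; the function fails instead.
    rmdir "$release_dir/log"
  fi
  # -n replaces an existing symlink instead of descending into its target.
  ln -sfn "$shared_log" "$release_dir/log"
}

# Example (paths are assumptions):
#   link_shared_log /home/openfoodnetwork/apps/openfoodnetwork/current \
#                   /home/openfoodnetwork/apps/openfoodnetwork/shared/log
```

With this layout, deleting an old release would no longer delete its logs, so log retention and release retention could be tuned separately.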

For node_modules, I'm not sure if there's a global local cache (that would be a good idea). I had assumed it was copied from the previous release but I now see that's not the case. If it's not cached we should do something about that to speed up deploys.

But in any case, I reckon just reducing the number of old releases is simple and would be ok. How about the last 4 releases (that should cover almost a month of logs)?

@mkllnk
Member

mkllnk commented Oct 5, 2023

> How about the last 4 releases (that should cover almost a month of logs)?

Yes, good idea. We may do more deploys when something goes wrong and then we lose logs but that's a different issue.

> For node_modules, I'm not sure if there's a global local cache

We could also copy the previous release, install and prune if we don't have another cache.

@dacook
Member Author

dacook commented Jul 15, 2024

Silly me, I mustn't have been looking closely when I first wrote this. There are ten things in the releases-old dir, but half are DB dumps. There are only 5 old releases (plus the current one). E.g.:

ofn-admin@prod3:/home/openfoodnetwork/apps/openfoodnetwork/releases-old$ ls -1tr
2024-06-10-231508
2024-06-10-231508.sql.gz
2024-06-18-003347
2024-06-18-003347.sql.gz
2024-06-19-055150
2024-06-19-055150.sql.gz
2024-06-25-051021
2024-06-25-051021.sql.gz
2024-07-02-043842
2024-07-02-043842.sql.gz
2024-07-09-043115  # <-- current release

The logic is here, and takes a bit of explanation:

shell: set -o pipefail && ls -1tr | head -n -10

  1. set -o pipefail: Fail if any command in the pipeline fails, not just the last one.
  2. ls -1tr: List one entry per line, sorted by modification time, oldest first.
  3. head -n -10: Print everything except the last 10 lines, i.e. the entries older than the newest 10.

With 11 entries in the directory, this selects only the single oldest file/dir for deletion:

ofn-admin@prod3:/home/openfoodnetwork/apps/openfoodnetwork/releases-old$ ls -1tr | head -n-10
2024-06-10-231508

This doesn't seem quite right, I think it means we delete the SQL or dir on alternating weeks. So it keeps about 4.5 old releases, not five. We should change that number to 9. But it will only make a small difference on disk usage...
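One way to sidestep counting dumps and directories together would be to count release directories only, then remove each stale directory along with its matching dump. The following is a minimal sketch, not the existing deploy logic: `list_stale_releases` is a hypothetical name, and it assumes GNU `find` and `head` and that release names are timestamps that sort chronologically.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the release directories (not the .sql.gz dumps)
# in DIR that fall outside the newest KEEP releases. Names look like
# YYYY-MM-DD-HHMMSS, so a plain lexical sort is also chronological.
list_stale_releases() {
  local dir=$1 keep=$2
  find "$dir" -mindepth 1 -maxdepth 1 -type d -printf '%f\n' \
    | sort \
    | head -n "-${keep}"
}

# Dry run: print what would be removed, keeping the current release
# plus 5 old ones (6 directories in total).
#   list_stale_releases releases-old 6 | while read -r release; do
#     echo rm -rf "releases-old/$release" "releases-old/$release.sql.gz"
#   done
```

Because each directory is removed together with its dump, the number of releases kept stays a whole number regardless of how many dump files exist.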

@dacook
Member Author

dacook commented Jul 15, 2024

So I think the focus should be on node_modules, but we could also consider the tmp folder. E.g.:

  273.5 MiB [##########] /node_modules
  129.7 MiB [####      ] /.git
  119.5 MiB [####      ] /tmp
  117.8 MiB [####      ] /log
   45.1 MiB [#         ] /public
...
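A breakdown like the one above can be approximated with plain `du` when no interactive disk usage tool is installed. This is a sketch only; `largest_dirs` is a hypothetical helper name, and sizes are reported in KiB rather than MiB.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the N largest immediate subdirectories of DIR
# (hidden ones such as .git included), as "<size in KiB><tab><path>".
largest_dirs() {
  local dir=$1 n=$2
  find "$dir" -mindepth 1 -maxdepth 1 -type d -exec du -sk {} + \
    | sort -rn \
    | awk -v n="$n" 'NR <= n'
}

# Example, run from inside a release directory:
#   largest_dirs . 5
```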

I think re-using /tmp and public would be advantageous, but it needs to be managed properly, which is hard. Better to avoid it (I also considered this for CI).

For node_modules, after a bit of reading, I found that:

  • yarn keeps a global cache storage of all downloaded packages, so it doesn't re-download them every time
    • This doesn't
  • It does copy the packages each time, but there's a setting you can change to hardlink them from the cache instead: https://yarnpkg.com/configuration/yarnrc#nmMode
    • The feature is discussed a lot here.
    • I think it would be worth using this, to save >1gb. But we should test it, to ensure that required packages aren't deleted:
      • when old releases are deleted by a deployment
      • when the yarn cache is cleared (✅ seems to work fine on my dev env)
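The deletion concern above comes down to hard link semantics: a file's data stays on disk for as long as at least one path still links to it, so removing an old release or the cache copy should not break the remaining links. A small sketch for checking this on a server follows; `count_hardlinked_files` is a hypothetical helper, and the yarn commands in the comments assume Yarn 2+ with the node-modules linker.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: count regular files under DIR that have more than one
# hard link, i.e. files whose data is shared with at least one other path.
count_hardlinked_files() {
  find "$1" -type f -links +1 | wc -l | tr -d ' '
}

# Possible way to try the setting (an assumption, to be verified on staging):
#   yarn config set nmMode hardlinks-global
#   yarn install
#   count_hardlinked_files node_modules   # non-zero if hardlinking took effect
```

Running the count before and after deleting an old release would show whether the surviving release still has its files, and whether they are still shared.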
