
Some fixes and improvements #7

Open
davidspek opened this issue Aug 11, 2023 · 2 comments

Comments

@davidspek

Hi LatchBio. First of all, thanks for creating and sharing this repo. While we at Plural were trying to use the DaemonSet installation method from the Sysbox docs, we ran into a number of problems, and the custom AMI you've built was much more stable.

However, when trying to use the AMI from this repo to run KinD, we ran into problems caused by the crio.conf not being set up properly. Note that other Docker containers were able to run within the pod on our cluster, but KinD specifically was not. This led us down a journey to understand each step executed in the Packer build, as well as all the scripts used by the Sysbox DaemonSet deployment script.
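For context, a correctly configured CRI-O registers Sysbox as an additional OCI runtime via a crio.conf drop-in roughly like the following. This is a hedged sketch based on what the upstream Sysbox installer writes, not our exact config; the file path is an assumption:

```toml
# e.g. /etc/crio/crio.conf.d/99-sysbox.conf (path is an assumption)
# Registers sysbox-runc as an OCI runtime that pods can select via runtimeClassName.
[crio.runtime.runtimes.sysbox-runc]
runtime_path = "/usr/bin/sysbox-runc"
runtime_type = "oci"
# Allows pods to request rootless user-namespace mode via annotation,
# which Sysbox requires for system containers like KinD nodes.
allowed_annotations = ["io.kubernetes.cri-o.userns-mode"]
```

If this table is missing or malformed, plain containers may still run under the default runtime while anything that actually needs Sysbox (like KinD) fails, which matches the symptom we saw.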

What we ended up doing is rewriting most of the build script so that it is equivalent to the script from Sysbox, along with comments for the relevant functions, so that upstream changes by Sysbox can more easily be found and maintained. Another reason for doing this is that an incorrect configuration could introduce security vulnerabilities, so we wanted to be absolutely sure CRI-O and Sysbox are configured properly. The biggest change is that we use the Sysbox installer container image to grab the relevant binaries and configs for Sysbox and CRI-O, and install them the same way the upstream installer script does. It took some time to test all the changes, but eventually the images were fully working and we were able to run KinD. We were even able to launch a container, run KinD within that container, then use CAPI to deploy another 6-node Kubernetes cluster with the Docker provider and install Calico on that cluster. It's a bit container-inception, but I hope it's somewhat clear.

Finally, we set up some GitHub Actions that use semantic-release, configure-aws-credentials with OIDC so GitHub can securely authenticate against AWS without static credentials, and of course Packer to automatically build AMIs for multiple Kubernetes versions and system architectures. To set up the OIDC configuration between AWS and GitHub you can use this Terraform module. The action also caches the files we source from the Sysbox installer container, so builds are pretty fast.
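The OIDC step looks roughly like this in the workflow. This is a sketch, not our actual workflow; the job name and role ARN are placeholders:

```yaml
# Grant the job permission to request an OIDC token from GitHub.
permissions:
  id-token: write
  contents: read

jobs:
  build-ami:   # placeholder job name
    runs-on: ubuntu-latest
    steps:
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          # Role created by the Terraform module mentioned above; ARN is a placeholder.
          role-to-assume: arn:aws:iam::123456789012:role/github-packer
          aws-region: us-east-1
      # ...Packer build steps follow, now running with short-lived AWS credentials.
```

The point of OIDC here is that no long-lived AWS access keys are stored as repository secrets; GitHub mints a short-lived token per run and AWS exchanges it for temporary credentials.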

What we will still add shortly is a small configuration that lets Renovate automatically keep the environment variable that sets the Sysbox version in the GitHub Actions YAML up to date (by creating PRs, which can be set to auto-merge). This way there will be very little maintenance overhead for us as new versions of Sysbox are released.
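A Renovate regex manager for this could look something like the following sketch. The variable name `SYSBOX_VERSION` and the regex are assumptions about how the workflow is written; `nestybox/sysbox` is the upstream Sysbox repo:

```json
{
  "regexManagers": [
    {
      "fileMatch": ["^\\.github/workflows/.*\\.ya?ml$"],
      "matchStrings": [
        "SYSBOX_VERSION:\\s*\"?(?<currentValue>v?\\d+\\.\\d+\\.\\d+)\"?"
      ],
      "depNameTemplate": "nestybox/sysbox",
      "datasourceTemplate": "github-releases"
    }
  ]
}
```

Renovate then treats the matched version string as a dependency and opens a PR whenever a new Sysbox release is tagged.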

If any of this sounds interesting to you, have a look at our fork. We'd also be happy to contribute this back upstream in a PR if you like, or feel free to use our changes and implement them here yourselves. Note that our AMIs are public, so you can use them directly (for testing or otherwise). We're pushing them to all the regular AWS regions.

In the future we might also expand our repo to include images for Azure and GCP, which is probably less interesting to you, but I thought I'd mention it.

@maximsmol
Member

Hi! This is awesome to hear. Very glad you got some use out of this repo.

I'm going to open a PR to look at the diff; I'm very curious what we got wrong on our end. I tried to match the installer as closely as possible too, but my understanding of Kubernetes setups and container runtimes is far from perfect. It might also be that the upstream installation script has changed a lot over the past year.

The only issues we have experienced are related to using host devices for FUSE and GPUs through a bind mount, so I will test against your fork to see if that is incidentally fixed as well. If not, we'll probably have pretty low interest in adopting your version, given that everything works OK for us (barring any security implications I spot in the diff).


I'll add a banner at the top of the README linking to your version. All the CI/CD you've set up is certainly enough reason to prefer it over our script. We're less dedicated to actually supporting this as an open-source product :)

@davidspek
Author

I’ve moved some files around to organize them a bit better, so hopefully you can still get something useful from the diff.

In terms of the permission issues, this is also what we were seeing when trying to run KinD, so possibly our images solve this for FUSE host devices and GPUs as well. I’d be very interested to hear how your testing goes on this front. I’d be happy to collaborate with you on solving these issues if you can share the errors or exact problems so we can try to replicate them.

I think two parties using and maintaining something like this is certainly better than one, so if we can solve the issues you mentioned, maybe there’s also room to collaborate further in the future.
