
Bump the genai-workflow group across 1 directory with 12 updates #383

Open

dependabot[bot] wants to merge 1 commit into main from dependabot/pip/workflows/charts/huggingface-llm/genai-workflow-a5c124042f
Conversation


@dependabot dependabot bot commented on behalf of github Sep 12, 2024

Bumps the genai-workflow group with 12 updates in the /workflows/charts/huggingface-llm directory:

| Package | From | To |
| --- | --- | --- |
| accelerate | 0.30.1 | 0.34.2 |
| datasets | 2.19.0 | 3.0.0 |
| einops | 0.7.0 | 0.8.0 |
| evaluate | 0.4.2 | 0.4.3 |
| mkl-include | 2023.2.0 | 2024.2.1 |
| mkl | 2023.2.0 | 2024.2.1 |
| onnxruntime-extensions | 0.10.1 | 0.12.0 |
| onnxruntime | 1.17.3 | 1.19.2 |
| peft | 0.11.1 | 0.12.0 |
| protobuf | 4.24.4 | 5.28.1 |
| psutil | 5.9.5 | 6.0.0 |
| tokenizers | 0.19.1 | 0.20.0 |

Updates accelerate from 0.30.1 to 0.34.2

Release notes

Sourced from accelerate's releases.

v0.34.1 Patchfix

Bug fixes

  • Fixes an issue where processed DataLoaders could no longer be pickled in #3074 thanks to @​byi8220
  • Fixes an issue when using FSDP where default_transformers_cls_names_to_wrap would separate _no_split_modules by characters instead of keeping it as a list of layer names in #3075

Full Changelog: huggingface/accelerate@v0.34.0...v0.34.1

v0.34.0: StatefulDataLoader Support, FP8 Improvements, and PyTorch Updates!

Dependency Changes

  • Updated Safetensors Requirement: The library now requires safetensors version 0.4.3.
  • Added support for Numpy 2.0: The library now fully supports numpy 2.0.0

Core

New Script Behavior Changes

  • Process Group Management: PyTorch now requires users to destroy process groups after training. The accelerate library will handle this automatically with accelerator.end_training(), or you can do it manually using PartialState().destroy_process_group() (see the sketch after this list).
  • MLU Device Support: Added support for saving and loading RNG states on MLU devices by @​huismiling
  • NPU Support: Corrected backend and distributed settings when using transfer_to_npu, ensuring better performance and compatibility.
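Below is a minimal sketch of the new teardown flow, using only the APIs named in the first item above (the training loop itself is elided):

from accelerate import Accelerator, PartialState

accelerator = Accelerator()
# ... training loop ...
accelerator.end_training()  # also destroys the process group automatically
# or, if you manage teardown yourself:
# PartialState().destroy_process_group()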

DataLoader Enhancements

  • Stateful DataLoader: We are excited to announce that early support has been added for the StatefulDataLoader from torchdata, allowing better handling of data loading states. Enable it by passing use_stateful_dataloader=True to the DataLoaderConfiguration (see the sketch after this list); when calling load_state(), the DataLoader will automatically resume from its last step, with no need to re-iterate through already-consumed batches.
  • Decoupled Data Loader Preparation: The prepare_data_loader() function is now independent of the Accelerator, giving you more flexibility over which API level you would like to use.
  • XLA Compatibility: Added support for skipping initial batches when using XLA.
  • Improved State Management: Bug fixes and enhancements for saving/loading DataLoader states, ensuring smoother training sessions.
  • Epoch Setting: Introduced the set_epoch function for MpDeviceLoaderWrapper.
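As a hedged sketch of the stateful-loader opt-in from the first item above (it assumes torchdata is installed; the actual DataLoader and checkpoint handling are elided):

from accelerate import Accelerator
from accelerate.utils import DataLoaderConfiguration

# Ask accelerate to wrap prepared DataLoaders with torchdata's StatefulDataLoader
dataloader_config = DataLoaderConfiguration(use_stateful_dataloader=True)
accelerator = Accelerator(dataloader_config=dataloader_config)
# After accelerator.load_state(...), prepared DataLoaders resume from their last step.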

FP8 Training Improvements

  • Enhanced FP8 Training: Fully Sharded Data Parallelism (FSDP) and DeepSpeed support now work seamlessly with TransformerEngine FP8 training, including better defaults for the quantized FP8 weights.
  • Integration baseline: We've added a new suite of examples and benchmarks to ensure that our TransformerEngine integration works exactly as intended. These scripts run one half using 🤗 Accelerate's integration and the other half with raw TransformerEngine, providing users with a nice example of what we do under the hood with accelerate, and a good sanity check to make sure nothing breaks over time. Find them here
  • Import Fixes: Resolved issues with import checks for TransformerEngine that caused downstream problems.
  • FP8 Docker Images: We've added new docker images for TransformerEngine and accelerate as well. Use docker pull huggingface/accelerate@gpu-fp8-transformerengine to quickly get an environment going.

torchpippy no more, long live torch.distributed.pipelining

  • With the latest PyTorch release, torchpippy is now fully integrated into torch core, and as a result we are exclusively supporting the PyTorch implementation from now on
  • There are breaking changes to examples and behavior that come with this shift. Namely:
    • Tracing of inputs is done with the shape each GPU will see, rather than the size of the total batch. So for 2 GPUs, one should pass in an input of [1, n, n] rather than [2, n, n] as before.
    • We no longer support Encoder/Decoder models. PyTorch tracing for pipelining no longer supports encoder/decoder models, so the t5 example has been removed.
    • Computer vision model support currently does not work: there are some tracing issues regarding ResNets that we are actively looking into.
  • If any of these changes are too disruptive, we recommend pinning your accelerate version. If the encoder/decoder model support is actively blocking your inference using pippy, please open an issue and let us know. We can potentially look into restoring the old torchpippy support if needed.

Fully Sharded Data Parallelism (FSDP)

  • Environment Flexibility: Environment variables are now fully optional for FSDP, simplifying configuration. You can now fully create a FullyShardedDataParallelPlugin yourself manually with no need for environment patching:
from accelerate import FullyShardedDataParallelPlugin
fsdp_plugin = FullyShardedDataParallelPlugin(...)
  • FSDP RAM efficient loading: Added a utility to enable RAM-efficient model loading (by setting the proper environment variable). This is generally needed if you are not using accelerate launch and need to ensure the env variables are set up properly for model loading:
from accelerate.utils import enable_fsdp_ram_efficient_loading, disable_fsdp_ram_efficient_loading
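As a hedged sketch, these toggles might wrap model loading like so when accelerate launch is not used (the checkpoint name is illustrative):

from transformers import AutoModelForCausalLM

enable_fsdp_ram_efficient_loading()  # sets the relevant environment variable
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # illustrative checkpoint
disable_fsdp_ram_efficient_loading()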

... (truncated)

Commits
  • c61f41c Release: v0.34.2
  • beb4378 Release: v0.34.1
  • e13bef2 Allow DataLoaderAdapter subclasses to be pickled by implementing __reduce__...
  • 73a1531 Fix FSDP auto_wrap using characters instead of full str for layers (#3075)
  • 159c0dd Release: v0.34.0
  • 8931e5e Remove skip_first_batches support for StatefulDataloader and fix all the te...
  • a848592 Speed up tests by shaving off subprocess when not needed (#3042)
  • 758d624 add set_epoch for MpDeviceLoaderWrapper (#3053)
  • b07ad2a Fix typo in comment (#3045)
  • 1d09a20 use duck-typing to ensure underlying optimizer supports schedulefree hooks (#...
  • Additional commits viewable in compare view

Updates datasets from 2.19.0 to 3.0.0

Release notes

Sourced from datasets's releases.

3.0.0

Dataset Features

  • Use Polars functions in .map()
    • Allow Polars as valid output type by @​psmyth94 in huggingface/datasets#6762

    • Example:

      >>> import polars as pl
      >>> from datasets import load_dataset
      >>> ds = load_dataset("lhoestq/CudyPokemonAdventures", split="train").with_format("polars")
      >>> cols = [pl.col("content").str.len_bytes().alias("length")]
      >>> ds_with_length = ds.map(lambda df: df.with_columns(cols), batched=True)
      >>> ds_with_length[:5]
      shape: (5, 5)
      ┌─────┬───────────────────────────────────┬───────────────────────────────────┬───────────────────────┬────────┐
      │ idx ┆ title                             ┆ content                           ┆ labels                ┆ length │
      │ --- ┆ ---                               ┆ ---                               ┆ ---                   ┆ ---    │
      │ i64 ┆ str                               ┆ str                               ┆ str                   ┆ u32    │
      ╞═════╪═══════════════════════════════════╪═══════════════════════════════════╪═══════════════════════╪════════╡
      │ 0   ┆ The Joyful Adventure of Bulbasau… ┆ Bulbasaur embarked on a sunny qu… ┆ joyful_adventure      ┆ 180    │
      │ 1   ┆ Pikachu's Quest for Peace         ┆ Pikachu, with his cheeky persona… ┆ peaceful_narrative    ┆ 138    │
      │ 2   ┆ The Tender Tale of Squirtle       ┆ Squirtle took everyone on a memo… ┆ gentle_adventure      ┆ 135    │
      │ 3   ┆ Charizard's Heartwarming Tale     ┆ Charizard found joy in helping o… ┆ heartwarming_story    ┆ 112    │
      │ 4   ┆ Jolteon's Sparkling Journey       ┆ Jolteon, with his zest for life,… ┆ celebratory_narrative ┆ 111    │
      └─────┴───────────────────────────────────┴───────────────────────────────────┴───────────────────────┴────────┘
  • Support NumPy 2

Cache Changes

  • Use huggingface_hub cache by @​lhoestq in huggingface/datasets#7105
    • use the huggingface_hub cache for files downloaded from HF, by default at ~/.cache/huggingface/hub
    • cached datasets (Arrow files) will still be reloaded from the datasets cache, by default at ~/.cache/huggingface/datasets

Breaking changes

General improvements and bug fixes

... (truncated)

Commits

Updates einops from 0.7.0 to 0.8.0

Release notes

Sourced from einops's releases.

v0.8.0: tinygrad, small fixes and updates

TLDR

  • tinygrad backend added
  • resolve warning in py3.11 related to docstring
  • remove graph break for unpack
  • breaking: TF layers were updated to follow new instructions; the new layers are compatible with TF 2.16 and not compatible with old TF (they certainly do not work with TF 2.13)

What's Changed

New Contributors

Full Changelog: arogozhnikov/einops@v0.7.0...v0.8.0

Commits

Updates evaluate from 0.4.2 to 0.4.3

Release notes

Sourced from evaluate's releases.

0.4.3

This release adds support for datasets>=3.0 by removing calls to deprecated code

What's Changed

Full Changelog: huggingface/evaluate@v0.4.2...v0.4.3

Commits

Updates mkl-include from 2023.2.0 to 2024.2.1

Updates mkl from 2023.2.0 to 2024.2.1

Commits

Updates onnxruntime-extensions from 0.10.1 to 0.12.0

Release notes

Sourced from onnxruntime-extensions's releases.

v0.12.0

What's Changed

  • Added C APIs for language, vision and audio processors including new FeatureExtractor for Whisper model
  • Support for Phi-3 Small Tokenizer and new OpenAI tiktoken format for fast loading of BPE tokenizers
  • Added new CUDA custom operators such as MulSigmoid, Transpose2DCast, ReplaceZero, AddSharedInput and MulSharedInput
  • Enhanced Custom Op Lite API on GPU and fused kernels for DORT
  • Bug fixes, including a null bos_token for the Qwen2 tokenizer and a SentencePiece-converted FastTokenizer issue with non-ASCII characters, as well as necessary updates for the MSVC 19.40 and numpy 2.0 releases

New Contributors

Full Changelog: microsoft/onnxruntime-extensions@v0.11.0...v0.12.0

v0.11.0

What's changed

  • Created Java packaging pipeline and published to Maven repository.
  • Added support for conversion of Huggingface FastTokenizer into ONNX custom operator.
  • Unified the SentencePiece tokenizer with other Byte Pair Encoding (BPE) based tokenizers.
  • Fixed Whisper large model pre-processing bug.
  • Enabled eager execution for custom operator and refactored the header file structure.

Contributions

Contributors to ONNX Runtime Extensions include members across teams at Microsoft, along with our community members: @​sayanshaw24 @​wenbingl @​skottmckay @​natke @​hariharans29 @​jslhcl @​snnn @​kazssym @​YUNQIUGUO @​souptc @​yihonglyu

Commits
  • cb47d2c Update nuget extraction path for iOS xcframework (#792)
  • b27fbbe Update macosx framework packaging to follow apple guidelines (#776) (#789)
  • c7a2d45 Update build-package-for-windows.yml (#784)
  • 3ce1e9f Upgrade ESRP signing task from v2 to v5 (#780)
  • e113ed3 removed OpenAIAudioToText from config (#777)
  • c9c11b4 Fix the windows API missing issue and Linux shared library size issue for Jav...
  • c3145b8 add the decoder_prompt_id for whisper tokenizer (#775)
  • 620050f reimplement resize cpu kernel for image processing (#768)
  • d79299e increase timeout (#773)
  • 735041e increase timeout (#772)
  • Additional commits viewable in compare view

Updates onnxruntime from 1.17.3 to 1.19.2

Release notes

Sourced from onnxruntime's releases.

ONNX Runtime v1.19.2

Announcements

  • ORT 1.19.2 is a small patch release, fixing some broken workflows and introducing bug fixes.

Build System & Packages

  • Fixed the signing of native DLLs.
  • Disabled absl symbolize in Windows Release build to avoid dependency on dbghelp.dll.

Training

  • Restored support for CUDA compute capability 7.0 and 7.5 with CUDA 12, and 6.0 and 6.1 with CUDA 11.
  • Several fixes for training CI pipelines.

Mobile

  • Fixed ArgMaxOpBuilder::AddToModelBuilderImpl() nullptr Node access for CoreML EP.

Generative AI

  • Added CUDA kernel for Phi3 MoE.
  • Added smooth softmax support in CUDA and CPU kernels for the GroupQueryAttention operator.
  • Fixed number of splits calculations in GroupQueryAttention CUDA operator.
  • Enabled causal support in the MultiHeadAttention CUDA operator.

Contributors

@​prathikr, @​mszhanyi, @​edgchen1, @​tianleiwu, @​wangyems, @​aciddelgado, @​mindest, @​snnn, @​baijumeswani, @​MaanavD

Thanks to everyone who helped ship this release smoothly!

Full Changelog: microsoft/onnxruntime@v1.19.0...v1.19.2

ONNX Runtime v1.19.0

Announcements

  • Note that the wrong commit was initially tagged with v1.19.0. The final commit has since been correctly tagged: microsoft/onnxruntime@26250ae. This shouldn't affect much, but sorry for the inconvenience!

Build System & Packages

  • Numpy support for 2.x has been added
  • Qualcomm SDK has been upgraded to 2.25
  • ONNX has been upgraded from 1.16 → 1.16.1
  • Default GPU packages use CUDA 12.x and cuDNN 9.x (previously CUDA 11.x/cuDNN 8.x). CUDA 11.x/cuDNN 8.x packages are moved to the aiinfra VS feed.
  • TensorRT 10.2 support added
  • Introduced Java CUDA 12 packages on Maven.
  • Discontinued support for Xamarin. (Xamarin reached EOL on May 1, 2024)
  • Discontinued support for macOS 11, increasing the minimum supported macOS version to 12. (macOS 11 reached EOL in September 2023)
  • Discontinued support for iOS 12, increasing the minimum supported iOS version to 13.

Core

Performance

  • Added QDQ support for INT4 quantization in CPU and CUDA Execution Providers
  • Implemented FlashAttention on CPU to improve performance for GenAI prompt cases

... (truncated)

Commits

Updates peft from 0.11.1 to 0.12.0

Release notes

Sourced from peft's releases.

v0.12.0: New methods OLoRA, X-LoRA, FourierFT, HRA, and much more

Highlights


New methods

OLoRA

@​tokenizer-decode added support for a new LoRA initialization strategy called OLoRA (#1828). With this initialization option, the LoRA weights are initialized to be orthonormal, which promises to improve training convergence. Similar to PiSSA, this can also be applied to models quantized with bitsandbytes. Check out the accompanying OLoRA examples.
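A minimal, hedged sketch of opting into this initialization via LoraConfig's init_lora_weights option (the base checkpoint is illustrative):

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")  # illustrative model
config = LoraConfig(init_lora_weights="olora", task_type="CAUSAL_LM")
model = get_peft_model(base, config)  # LoRA weights start out orthonormal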

X-LoRA

@​EricLBuehler added the X-LoRA method to PEFT (#1491). This is a mixture of experts approach that combines the strength of multiple pre-trained LoRA adapters. Documentation has yet to be added but check out the X-LoRA tests for how to use it.

FourierFT

@​Phoveran, @​zqgao22, @​Chaos96, and @​DSAILatHKUST added discrete Fourier transform fine-tuning to PEFT (#1838). This method promises to match LoRA in terms of performance while reducing the number of parameters even further. Check out the included FourierFT notebook.

HRA

@​DaShenZi721 added support for Householder Reflection Adaptation (#1864). This method bridges the gap between low rank adapters like LoRA on the one hand and orthogonal fine-tuning techniques such as OFT and BOFT on the other. As such, it is interesting for both LLMs and image generation models. Check out the HRA example on how to perform DreamBooth fine-tuning.

Enhancements

  • IA³ now supports merging of multiple adapters via the add_weighted_adapter method thanks to @​alexrs (#1701).
  • Call peft_model.get_layer_status() and peft_model.get_model_status() to get an overview of the layer/model status of the PEFT model. This can be especially helpful when dealing with multiple adapters or for debugging purposes. More information can be found in the docs (#1743).
  • DoRA now supports FSDP training, including with bitsandbytes quantization, aka QDoRA (#1806).
  • VeRA has been extended by @​dkopi to support targeting layers with different weight shapes (#1817).
  • @​kallewoof added the possibility for ephemeral GPU offloading. For now, this is only implemented for loading DoRA models, which can be sped up considerably for big models at the cost of a bit of extra VRAM (#1857).
  • Experimental: It is now possible to tell PEFT to use your custom LoRA layers through dynamic dispatching. Use this, for instance, to add LoRA layers for thus far unsupported layer types without the need to first create a PR on PEFT (but contributions are still welcome!) (#1875).

Examples

Changes

Casting of the adapter dtype

Important: If the base model is loaded in float16 (fp16) or bfloat16 (bf16), PEFT now autocasts adapter weights to float32 (fp32) instead of using the dtype of the base model (#1706). This requires more memory than previously but stabilizes training, so it's the more sensible default. To prevent this, pass autocast_adapter_dtype=False when calling get_peft_model, PeftModel.from_pretrained, or PeftModel.load_adapter.
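A brief sketch of opting out (base_model stands for a model already loaded in fp16 or bf16):

from peft import LoraConfig, get_peft_model

config = LoraConfig(task_type="CAUSAL_LM")
# Keep adapter weights in the base model's dtype instead of upcasting to fp32
model = get_peft_model(base_model, config, autocast_adapter_dtype=False)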

Adapter device placement

The logic of device placement when loading multiple adapters on the same model has been changed (#1742). Previously, PEFT would move all adapters to the device of the base model. Now, only the newly loaded/created adapter is moved to the base model's device. This allows users to have more fine-grained control over the adapter devices, e.g. allowing them to offload unused adapters to CPU more easily.

PiSSA

... (truncated)

Commits
  • e6cd24c Release v0.12.0 (#1946)
  • 05f57e9 PiSSA, OLoRA: Delete initial adapter after conversion instead of the active a...
  • 2ce83e0 FIX Decrease memory overhead of merging (#1944)
  • ebcd079 [WIP] ENH Add support for Qwen2 (#1906)
  • ba75bb1 FIX: More VeRA tests, fix tests, more checks (#1900)
  • 6472061 FIX Prefix tuning Grouped-Query Attention (#1901)
  • e02b938 FIX PiSSA & OLoRA with rank/alpha pattern, rslora (#1930)
  • 5268495 FEAT Add HRA: Householder Reflection Adaptation (#1864)
  • 2aaf9ce ENH Sync LoRA tp_layer methods with vanilla LoRA (#1919)
  • a019f86 FIX sft script print_trainable_parameters attr lookup (#1928)
  • Additional commits viewable in compare view

Updates protobuf from 4.24.4 to 5.28.1

Commits
  • 10ef3f7 Updating version.json and repo version numbers to: 28.1
  • d70f077 Merge pull request #18191 from protocolbuffers/cp-ruby-upb
  • 60e585c Update staleness
  • 70b77de Fix a potential Ruby-upb use of uninitialized memory.
  • 5b4b3af Merge pull request #18188 from acozzette/28-fix
  • 8ea3bb1 Fix compiler error with StrongReferenceToType()
  • 9deedf0 upb: fix uninitialized upb_MessageValue buffer bugs (#18160)
  • 3454ed8 Merge pull request #18013 from protocolbuffers/28.x-202408281753
  • 976ab41 Updating version.json and repo version numbers to: 28.1-dev
  • 439c42c Updating version.json and repo version numbers to: 28.0
  • Additional commits viewable in compare view

Updates psutil from 5.9.5 to 6.0.0

Changelog

Sourced from psutil's changelog.

6.0.0

2024-06-18

Enhancements

  • 2109: maxfile and maxpath fields were removed from the namedtuple returned by disk_partitions(). Reason: on network filesystems (NFS) retrieving them can potentially take a very long time to complete.
  • 2366, [Windows]: log a debug message when using slower process APIs.
  • 2375, [macOS]: provide arm64 wheels. (patch by Matthieu Darbois)
  • 2396: process_iter() no longer pre-emptively checks whether PIDs have been reused. This makes process_iter() around 20x faster.
  • 2396: a new psutil.process_iter.cache_clear() API can be used to clear the process_iter() internal cache (see the sketch after this list).
  • 2401: Support building with free-threaded CPython 3.13. (patch by Sam Gross)
  • 2407: Process.connections() was renamed to Process.net_connections(). The old name is still available, but it's deprecated (triggers a DeprecationWarning) and will be removed in the future.
  • 2425, [Linux]: provide aarch64 wheels. (patch by Matthieu Darbois / Ben Raz)

Bug fixes

  • 2250, [NetBSD]: Process.cmdline() sometimes fails with EBUSY. This usually happens for long cmdlines with lots of arguments. In this case retry getting the cmdline for up to 50 times, and return an empty list as a last resort.
  • 2254, [Linux]: offline CPUs raise NotImplementedError in cpu_freq(). (patch by Shade Gladden)
  • 2272: Add pickle support to psutil Exceptions.
  • 2359, [Windows], [CRITICAL]: pid_exists() disagrees with Process on whether a pid exists when ERROR_ACCESS_DENIED.
  • 2360, [macOS]: can't compile on macOS < 10.13. (patch by Ryan Schmidt)
  • 2362, [macOS]: can't compile on macOS 10.11. (patch by Ryan Schmidt)
  • 2365, [macOS]: can't compile on macOS < 10.9. (patch by Ryan Schmidt)
  • 2395, [OpenBSD]: pid_exists() erroneously returns True if the argument is a thread ID (TID) instead of a PID (process ID).
  • 2412, [macOS]: can't compile on macOS 10.4 PowerPC due to missing MNT_ constants.

Porting notes

Version 6.0.0 introduces some changes which affect backward compatibility:

  • 2109: the namedtuple returned by disk_partitions() no longer has maxfile and maxpath fields.
  • 2396: process_iter() no longer pre-emptively checks whether PIDs have been reused. If you want to check for PID reuse you are supposed to call Process.is_running() against the yielded Process instances. That will also automatically remove reused PIDs from the process_iter() internal cache.

... (truncated)

Commits
  • 3d5522a release
  • 5b30ef4 Add aarch64 manylinux wheels (#2425)
  • 1d092e7 test subprocesses: sleep() with an interval of 0.1 to make the test process m...
  • 5f80c12 Fix #2412, [macOS]: can't compile on macOS 10.4 PowerPC due to missing MNT_...
  • 89b6096 process_iter(): use another global var to keep track of reused PIDs
  • 9421bf8 openbsd: skip test if cmdline() returns [] due to EBUSY
  • 4b1a054 Fix #2250 / NetBSD / cmdline: retry on EBUSY. (#2421)
  • 20be5ae ruff: enable and fix 'unused variable' rule
  • 5530985 chore(ci): update actions (#2417)
  • 1c7cb0a Don't build with limited API for 3.13 free-threaded build (#2402)
  • Additional commits viewable in compare view

Updates tokenizers from 0.19.1 to 0.20.0

Release notes

Sourced from tokenizers's releases.

Release v0.20.0: faster encode, better python support

Release v0.20.0

This release is focused on performance and user experience.

Performance:

First off, we did a bit of benchmarking, and found some room for improvement! With a few minor changes (mostly #1587), here is what we get on Llama3 running on a g6 instance on AWS (https://github.com/huggingface/tokenizers/blob/main/bindings/python/benches/test_tiktoken.py):

Python API

We shipped better deserialization errors in general, and support for __str__ and __repr__ for all the objects. This allows for much easier debugging; see this:

>>> from tokenizers import Tokenizer;
>>> tokenizer = Tokenizer.from_pretrained("bert-base-uncased");
>>> print(tokenizer)
Tokenizer(version="1.0", truncation=None, padding=None, added_tokens=[{"id":0, "content":"[PAD]", "single_word":False, "lstrip":False, "rstrip":False, ...}, {"id":100, "content":"[UNK]", "single_word":False, "lstrip":False, "rstrip":False, ...}, {"id":101, "content":"[CLS]", "single_word":False, "lstrip":False, "rstrip":False, ...}, {"id":102, "content":"[SEP]", "single_word":False, "lstrip":False, "rstrip":False, ...}, {"id":103, "content":"[MASK]", "single_word":False, "lstrip":False, "rstrip":False, ...}], normalizer=BertNormalizer(clean_text=True, handle_chinese_chars=True, strip_accents=None, lowercase=True), pre_tokenizer=BertPreTokenizer(), post_processor=TemplateProcessing(single=[SpecialToken(id="[CLS]", type_id=0), Sequence(id=A, type_id=0), SpecialToken(id="[SEP]", type_id=0)], pair=[SpecialToken(id="[CLS]", type_id=0), Sequence(id=A, type_id=0), SpecialToken(id="[SEP]", type_id=0), Sequence(id=B, type_id=1), SpecialToken(id="[SEP]", type_id=1)], special_tokens={"[CLS]":SpecialToken(id="[CLS]", ids=[101], tokens=["[CLS]"]), "[SEP]":SpecialToken(id="[SEP]", ids=[102], tokens=["[SEP]"])}), decoder=WordPiece(prefix="##", cleanup=True), model=WordPiece(unk_token="[UNK]", continuing_subword_prefix="##", max_input_chars_per_word=100, vocab={"[PAD]":0, "[unused0]":1, "[unused1]":2, "[unused2]":3, "[unused3]":4, ...}))
>>> tokenizer
Tokenizer(version="1.0", truncation=None, padding=None, added_tokens=[{"id":0, "content":"[PAD]", "single_word":False, "lstrip":False, "rstrip":False, "normalized":False, "special":True}, {"id":100, "content":"[UNK]", "single_word":False, "lstrip":False, "rstrip":False, "normalized":False, "special":True}, {"id":101, "content":"[CLS]", "single_word":False, "lstrip":False, "rstrip":False, "normalized":False, "special":True}, {"id":102, "content":"[SEP]", "single_word":False, "lstrip":False, "rstrip":False, "normalized":False, "special":True}, {"id":103, "content":"[MASK]", "single_word":False, "lstrip":False, "rstrip":False, "normalized":False, "special":True}], normalizer=BertNormalizer(clean_text=True, handle_chinese_chars=True, strip_accents=None, lowercase=True), pre_tokenizer=BertPreTokenizer(), post_processor=TemplateProcessing(single=[SpecialToken(id="[CLS]", type_id=0), Sequence(id=A, type_id=0), SpecialToken(id="[SEP]", type_id=0)], pair=[SpecialToken(id="[CLS]", type_id=0), Sequence(id=A, type_id=0), SpecialToken(id="[SEP]", type_id=0), Sequence(id=B, type_id=1), SpecialToken(id="[SEP]", type_id=1)], special_tokens={"[CLS]":SpecialToken(id="[CLS]", ids=[101], tokens=["[CLS]"]), "[SEP]":SpecialToken(id="[SEP]", ids=[102], tokens=["[SEP]"])}), decoder=WordPiece(prefix="##", cleanup=True), model=WordPiece(unk_token="[UNK]", continuing_subword_prefix="##", max_input_chars_per_word=100, vocab={"[PAD]":0, "[unused0]":1, "[unused1]":2, ...}))

The pre_tokenizer.Sequence and normalizer.Sequence are also more accessible now:

from tokenizers import normalizers
norm = normalizers.Sequence([normalizers.Strip(), normalizers.BertNormalizer()])
norm[0]                    # index into the sequence: the Strip normalizer
norm[1].lowercase = False  # tweak the BertNormalizer's attributes in place

What's Changed

@dependabot dependabot[bot] added the dependencies (Pull requests that update a dependency file) and python (Pull requests that update Python code) labels on Sep 12, 2024

github-actions bot commented Sep 12, 2024

Dependency Review

The following issues were found:
  • ✅ 0 vulnerable package(s)
  • ✅ 0 package(s) with incompatible licenses
  • ✅ 0 package(s) with invalid SPDX license definitions
  • ⚠️ 3 package(s) with unknown licenses.
See the Details below.

License Issues

workflows/charts/huggingface-llm/requirements.txt

| Package | Version | License | Issue Type |
| --- | --- | --- | --- |
| mkl-include | 2024.2.1 | Null | Unknown License |
| mkl | 2024.2.1 | Null | Unknown License |
| protobuf | 5.28.1 | Null | Unknown License |

OpenSSF Scorecard

Scorecard details
| Package | Version | Score |
| --- | --- | --- |
| pip/accelerate | 0.34.2 | 🟢 6.3 |
| pip/datasets | 3.0.0 | 🟢 5.9 |
| pip/einops | 0.8.0 | 🟢 4.5 |
| pip/evaluate | 0.4.3 | 🟢 5.9 |
| pip/mkl | 2024.2.1 | Unknown |
| pip/mkl-include | 2024.2.1 | Unknown |
| pip/onnxruntime | 1.19.2 | 🟢 6.8 |
| pip/onnxruntime-extensions | 0.12.0 | 🟢 6.1 |
| pip/peft | 0.12.0 | Unknown |
| pip/protobuf | 5.28.1 | 🟢 6.8 |
| pip/psutil | 6.0.0 | 🟢 5.8 |
| pip/tokenizers | 0.20.0 | 🟢 5 |

Details for pip/accelerate 0.34.2 (🟢 6.3):

| Check | Score | Reason |
| --- | --- | --- |
| Code-Review | 🟢 9 | Found 29/30 approved changesets -- score normalized to 9 |
| Maintained | 🟢 10 | 30 commit(s) and 17 issue activity found in the last 90 days -- score normalized to 10 |
| CII-Best-Practices | ⚠️ 0 | no effort to earn an OpenSSF best practices badge detected |
| License | 🟢 10 | license file detected |
| Signed-Releases | ⚠️ -1 | no releases found |
| Branch-Protection | ⚠️ -1 | internal error: error during branchesHandler.setup: internal error: githubv4.Query: Resource not accessible by integration |
| Dangerous-Workflow | 🟢 10 | no dangerous workflow patterns detected |
| Binary-Artifacts | 🟢 10 | no binaries found in the repo |
| Token-Permissions | ⚠️ 0 | detected GitHub workflow tokens with excessive permissions |
| Security-Policy | ⚠️ 0 | security policy file not detected |
| Fuzzing | ⚠️ 0 | project is not fuzzed |
| Vulnerabilities | 🟢 10 | 0 existing vulnerabilities detected |
| Packaging | 🟢 10 | packaging workflow detected |
| Pinned-Dependencies | ⚠️ 0 | dependency not pinned by hash detected -- score normalized to 0 |
| SAST | 🟢 4 | SAST tool is not run on all commits -- score normalized to 4 |

Details for pip/datasets 3.0.0 (🟢 5.9):

| Check | Score | Reason |
| --- | --- | --- |
| Code-Review | 🟢 4 | Found 13/30 approved changesets -- score normalized to 4 |
| Maintained | 🟢 10 | 30 commit(s) and 8 issue activity found in the last 90 days -- score normalized to 10 |
| CII-Best-Practices | ⚠️ 0 | no effort to earn an OpenSSF best practices badge detected |
| License | 🟢 10 | license file detected |
| Signed-Releases | ⚠️ -1 | no releases found |
| Branch-Protection | ⚠️ -1 | internal error: error during branchesHandler.setup: internal error: githubv4.Query: Resource not accessible by integration |
| Dangerous-Workflow | 🟢 10 | no dangerous workflow patterns detected |
| Packaging | ⚠️ -1 | packaging workflow not detected |
| Security-Policy | 🟢 10 | security policy file detected |
| Binary-Artifacts | 🟢 10 | no binaries found in the repo |
| Token-Permissions | ⚠️ 0 | detected GitHub workflow tokens with excessive permissions |
| Pinned-Dependencies | ⚠️ 0 | dependency not pinned by hash detected -- score normalized to 0 |
| Vulnerabilities | 🟢 10 | 0 existing vulnerabilities detected |
| Fuzzing | ⚠️ 0 | project is not fuzzed |
| SAST | ⚠️ 0 | SAST tool is not run on all commits -- score normalized to 0 |

Details for pip/einops 0.8.0 (🟢 4.5):

| Check | Score | Reason |
| --- | --- | --- |
| Maintained | 🟢 6 | 5 commit(s) and 3 issue activity found in the last 90 days -- score normalized to 6 |
| Code-Review | ⚠️ 1 | Found 4/24 approved changesets -- score normalized to 1 |
| CII-Best-Practices | ⚠️ 0 | no effort to earn an OpenSSF best practices badge detected |
| License | 🟢 10 | license file detected |
| Signed-Releases | ⚠️ -1 | no releases found |
| Branch-Protection | ⚠️ -1 | internal error: error during branchesHandler.setup: internal error: githubv4.Query: Resource not accessible by integration |
| Dangerous-Workflow | 🟢 10 | no dangerous workflow patterns detected |
| Packaging | ⚠️ -1 | packaging workflow not detected |
| Binary-Artifacts | 🟢 10 | no binaries found in the repo |
| Token-Permissions | ⚠️ 0 | detected GitHub workflow tokens with excessive permissions |
| Pinned-Dependencies | ⚠️ 0 | dependency not pinned by hash detected -- score normalized to 0 |
| Security-Policy | ⚠️ 0 | security policy file not detected |
| Fuzzing | ⚠️ 0 | project is not fuzzed |
| Vulnerabilities | 🟢 10 | 0 existing vulnerabilities detected |
| SAST | ⚠️ 0 | SAST tool is not run on all commits -- score normalized to 0 |

Details for pip/evaluate 0.4.3 (🟢 5.9):

| Check | Score | Reason |
| --- | --- | --- |
| Code-Review | 🟢 9 | Found 29/30 approved changesets -- score normalized to 9 |
| Maintained | 🟢 10 | 5 commit(s) and 8 issue activity found in the last 90 days -- score normalized to 10 |
| CII-Best-Practices | ⚠️ 0 | no effort to earn an OpenSSF best practices badge detected |
| License | 🟢 10 | license file detected |
| Signed-Releases | ⚠️ -1 | no releases found |
| Branch-Protection | ⚠️ -1 | internal error: error during branchesHandler.setup: internal error: githubv4.Query: Resource not accessible by integration |
| Packaging | ⚠️ -1 | packaging workflow not detected |
| Dangerous-Workflow | 🟢 10 | no dangerous workflow patterns detected |
| Binary-Artifacts | 🟢 10 | no binaries found in the repo |
| Token-Permissions | ⚠️ 0 | detected GitHub workflow tokens with excessive permissions |
| Pinned-Dependencies | ⚠️ 0 | dependency not pinned by hash detected -- score normalized to 0 |
| Security-Policy | ⚠️ 0 | security policy file not detected |
| Fuzzing | ⚠️ 0 | project is not fuzzed |
| Vulnerabilities | 🟢 9 | 1 existing vulnerabilities detected |
| SAST | 🟢 4 | SAST tool is not run on all commits -- score normalized to 4 |

Details for pip/onnxruntime 1.19.2 (🟢 6.8):

| Check | Score | Reason |
| --- | --- | --- |
| Code-Review | 🟢 10 | all last 30 commits are reviewed through GitHub |
| Maintained | 🟢 10 | 30 commit(s) out of 30 and 8 issue activity out of 30 found in the last 90 days -- score normalized to 10 |
| CII-Best-Practices | ⚠️ 0 | no badge detected |
| Vulnerabilities | 🟢 10 | no vulnerabilities detected |
| Signed-Releases | ⚠️ 0 | 0 out of 5 artifacts are signed or have provenance |
| Branch-Protection | 🟢 8 | branch protection is not maximal on development and all release branches |
| Security-Policy | 🟢 10 | security policy file detected |
| Dangerous-Workflow | 🟢 10 | no dangerous workflow patterns detected |
| Packaging | ⚠️ -1 | no published package detected |
| License | 🟢 10 | license file detected |
| Token-Permissions | ⚠️ 0 | non read-only tokens detected in GitHub workflows |
| Dependency-Update-Tool | 🟢 10 | update tool detected |
| Binary-Artifacts | 🟢 10 | no binaries found in the repo |
| Fuzzing | ⚠️ 0 | project is not fuzzed |
| Pinned-Dependencies | ⚠️ 0 | dependency not pinned by hash detected -- score normalized to 0 |

Details for pip/onnxruntime-extensions 0.12.0 (🟢 6.1):

| Check | Score | Reason |
| --- | --- | --- |
| Code-Review | 🟢 9 | Found 27/30 approved changesets -- score normalized to 9 |
| Maintained | 🟢 10 | 30 commit(s) and 6 issue activity found in the last 90 days -- score normalized to 10 |
| CII-Best-Practices | ⚠️ 0 | no effort to earn an OpenSSF best practices badge detected |
| License | 🟢 10 | license file detected |
| Signed-Releases | ⚠️ -1 | no releases found |
| Branch-Protection | ⚠️ -1 | internal error: error during branchesHandler.setup: internal error: githubv4.Query: Resource not accessible by integration |
| Packaging | ⚠️ -1 | packaging workflow not detected |
| Security-Policy | 🟢 10 | security policy file detected |
| Dangerous-Workflow | 🟢 10 | no dangerous workflow patterns detected |
| Token-Permissions | ⚠️ 0 | detected GitHub workflow tokens with excessive permissions |
| SAST | ⚠️ 0 | SAST tool is not run on all commits -- score normalized to 0 |
| Fuzzing | ⚠️ 0 | project is not fuzzed |
| Vulnerabilities | 🟢 10 | 0 existing vulnerabilities detected |
| Binary-Artifacts | 🟢 7 | binaries present in source code |
| Pinned-Dependencies | ⚠️ 0 | dependency not pinned by hash detected -- score normalized to 0 |

Details for pip/protobuf 5.28.1 (🟢 6.8):

| Check | Score | Reason |
| --- | --- | --- |
| Binary-Artifacts | 🟢 10 | no binaries found in the repo |
| Branch-Protection | ⚠️ -1 | internal error: error during branchesHandler.setup: internal error: githubv4.Query: Resource not accessible by integration |
| CI-Tests | 🟢 10 | 25 out of 25 merged PRs checked by a CI test -- score normalized to 10 |
| CII-Best-Practices | ⚠️ 0 | no effort to earn an OpenSSF best practices badge detected |
| Code-Review | ⚠️ 0 | found 28 unreviewed changesets out of 30 -- score normalized to 0 |
| Contributors | 🟢 10 | 12 different organizations found -- score normalized to 10 |
| Dangerous-Workflow | 🟢 10 | no dangerous workflow patterns detected |
| Dependency-Update-Tool | 🟢 10 | update tool detected |
| Fuzzing | 🟢 10 | project is fuzzed |
| License | 🟢 9 | license file detected |
| Maintained | 🟢 10 | 30 commit(s) out of 30 and 15 issue activity out of 30 found in the last 90 days -- score normalized to 10 |
| Packaging | ⚠️ -1 | no published package detected |
| Pinned-Dependencies | ⚠️ 0 | dependency not pinned by hash detected -- score normalized to 0 |
| SAST | ⚠️ 0 | SAST tool is not run on all commits -- score normalized to 0 |
| Security-Policy | 🟢 10 | security policy file detected |
| Signed-Releases | ⚠️ 0 | 0 out of 5 artifacts are signed or have provenance |
| Token-Permissions | 🟢 10 | GitHub workflow tokens follow principle of least privilege |
| Vulnerabilities | 🟢 7 | 3 existing vulnerabilities detected |

Details for pip/psutil 6.0.0 (🟢 5.8):

| Check | Score | Reason |
| --- | --- | --- |
| Maintained | 🟢 10 | 6 commit(s) and 7 issue activity found in the last 90 days -- score normalized to 10 |
| Code-Review | ⚠️ 2 | Found 8/30 approved changesets -- score normalized to 2 |
| CII-Best-Practices | ⚠️ 0 | no effort to earn an OpenSSF best practices badge detected |
| License | 🟢 10 | license file detected |
| Signed-Releases | ⚠️ -1 | no releases found |
| Security-Policy | 🟢 10 | security policy file detected |
| Token-Permissions | ⚠️ 0 | detected GitHub workflow tokens with excessive permissions |
| Dangerous-Workflow | 🟢 10 | no dangerous workflow patterns detected |
| Packaging | ⚠️ -1 | packaging workflow not detected |
| Branch-Protection | ⚠️ 0 | branch protection not enabled on development/release branches |
| Binary-Artifacts | 🟢 10 | no binaries found in the repo |
| Pinned-Dependencies | ⚠️ 0 | dependency not pinned by hash detected -- score normalized to 0 |
| Fuzzing | 🟢 10 | project is fuzzed |
| Vulnerabilities | 🟢 10 | 0 existing vulnerabilities detected |
| SAST | ⚠️ 0 | SAST tool is not run on all commits -- score normalized to 0 |

Details for pip/tokenizers 0.20.0 (🟢 5):

| Check | Score | Reason |
| --- | --- | --- |
| Code-Review | 🟢 8 | Found 24/27 approved changesets -- score normalized to 8 |
| Maintained | 🟢 10 | 30 commit(s) and 19 issue activity found in the last 90 days -- score normalized to 10 |
| CII-Best-Practices | ⚠️ 0 | no effort to earn an OpenSSF best practices badge detected |
| License | 🟢 10 | license file detected |
| Signed-Releases | ⚠️ -1 | no releases found |
| Branch-Protection | ⚠️ -1 | internal error: error during branchesHandler.setup: internal error: githubv4.Query: Resource not accessible by integration |
| Binary-Artifacts | 🟢 10 | no binaries found in the repo |
| Dangerous-Workflow | 🟢 10 | no dangerous workflow patterns detected |
| Token-Permissions | ⚠️ 0 | detected GitHub workflow tokens with excessive permissions |
| Security-Policy | ⚠️ 0 | security policy file not detected |
| Fuzzing | ⚠️ 0 | project is not fuzzed |
| Pinned-Dependencies | ⚠️ 0 | dependency not pinned by hash detected -- score normalized to 0 |
| Packaging | 🟢 10 | packaging workflow detected |
| SAST | ⚠️ 0 | SAST tool is not run on all commits -- score normalized to 0 |
| Vulnerabilities | ⚠️ 0 | 12 existing vulnerabilities detected |

Scanned Manifest Files

workflows/charts/huggingface-llm/requirements.txt

Bumps the genai-workflow group with 12 updates in the /workflows/charts/huggingface-llm directory:

| Package | From | To |
| --- | --- | --- |
| [accelerate](https://github.com/huggingface/accelerate) | `0.30.1` | `0.34.2` |
| [datasets](https://github.com/huggingface/datasets) | `2.19.0` | `3.0.0` |
| [einops](https://github.com/arogozhnikov/einops) | `0.7.0` | `0.8.0` |
| [evaluate](https://github.com/huggingface/evaluate) | `0.4.2` | `0.4.3` |
| [mkl-include](https://www.intel.com/content/www/us/en/developer/tools/oneapi/onemkl.html) | `2023.2.0` | `2024.2.1` |
| [mkl](https://github.com/oneapi-src/oneMKL) | `2023.2.0` | `2024.2.1` |
| [onnxruntime-extensions](https://github.com/microsoft/onnxruntime-extensions) | `0.10.1` | `0.12.0` |
| [onnxruntime](https://github.com/microsoft/onnxruntime) | `1.17.3` | `1.19.2` |
| [peft](https://github.com/huggingface/peft) | `0.11.1` | `0.12.0` |
| [protobuf](https://github.com/protocolbuffers/protobuf) | `4.24.4` | `5.28.1` |
| [psutil](https://github.com/giampaolo/psutil) | `5.9.5` | `6.0.0` |
| [tokenizers](https://github.com/huggingface/tokenizers) | `0.19.1` | `0.20.0` |



Updates `accelerate` from 0.30.1 to 0.34.2
- [Release notes](https://github.com/huggingface/accelerate/releases)
- [Commits](huggingface/accelerate@v0.30.1...v0.34.2)

Updates `datasets` from 2.19.0 to 3.0.0
- [Release notes](https://github.com/huggingface/datasets/releases)
- [Commits](huggingface/datasets@2.19.0...3.0.0)

Updates `einops` from 0.7.0 to 0.8.0
- [Release notes](https://github.com/arogozhnikov/einops/releases)
- [Commits](arogozhnikov/einops@v0.7.0...v0.8.0)

Updates `evaluate` from 0.4.2 to 0.4.3
- [Release notes](https://github.com/huggingface/evaluate/releases)
- [Commits](huggingface/evaluate@v0.4.2...v0.4.3)

Updates `mkl-include` from 2023.2.0 to 2024.2.1

Updates `mkl` from 2023.2.0 to 2024.2.1
- [Release notes](https://github.com/oneapi-src/oneMKL/releases)
- [Commits](https://github.com/oneapi-src/oneMKL/commits)

Updates `onnxruntime-extensions` from 0.10.1 to 0.12.0
- [Release notes](https://github.com/microsoft/onnxruntime-extensions/releases)
- [Commits](microsoft/onnxruntime-extensions@v0.10.1...v0.12.0)

Updates `onnxruntime` from 1.17.3 to 1.19.2
- [Release notes](https://github.com/microsoft/onnxruntime/releases)
- [Changelog](https://github.com/microsoft/onnxruntime/blob/main/docs/ReleaseManagement.md)
- [Commits](microsoft/onnxruntime@v1.17.3...v1.19.2)

Updates `peft` from 0.11.1 to 0.12.0
- [Release notes](https://github.com/huggingface/peft/releases)
- [Commits](huggingface/peft@v0.11.1...v0.12.0)

Updates `protobuf` from 4.24.4 to 5.28.1
- [Release notes](https://github.com/protocolbuffers/protobuf/releases)
- [Changelog](https://github.com/protocolbuffers/protobuf/blob/main/protobuf_release.bzl)
- [Commits](protocolbuffers/protobuf@v4.24.4...v5.28.1)

Updates `psutil` from 5.9.5 to 6.0.0
- [Changelog](https://github.com/giampaolo/psutil/blob/master/HISTORY.rst)
- [Commits](giampaolo/psutil@release-5.9.5...release-6.0.0)

Updates `tokenizers` from 0.19.1 to 0.20.0
- [Release notes](https://github.com/huggingface/tokenizers/releases)
- [Changelog](https://github.com/huggingface/tokenizers/blob/main/RELEASE.md)
- [Commits](huggingface/tokenizers@v0.19.1...v0.20.0)

---
updated-dependencies:
- dependency-name: accelerate
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: genai-workflow
- dependency-name: datasets
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: genai-workflow
- dependency-name: einops
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: genai-workflow
- dependency-name: evaluate
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: genai-workflow
- dependency-name: mkl-include
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: genai-workflow
- dependency-name: mkl
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: genai-workflow
- dependency-name: onnxruntime-extensions
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: genai-workflow
- dependency-name: onnxruntime
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: genai-workflow
- dependency-name: peft
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: genai-workflow
- dependency-name: protobuf
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: genai-workflow
- dependency-name: psutil
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: genai-workflow
- dependency-name: tokenizers
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: genai-workflow
...

Signed-off-by: dependabot[bot] <[email protected]>
@dependabot dependabot[bot] force-pushed the dependabot/pip/workflows/charts/huggingface-llm/genai-workflow-a5c124042f branch from 6e57d7e to 50be199 on September 16, 2024 at 13:23