How do I increase inference speed for a given model? #4887

tsvisab · 2024-05-17T15:11:20Z

For example, I'm using Llama 3 70B. How do I increase inference speed when I know the expected throughput will be some N requests per minute?
What if that rate is low? Is there a way to speed up a single request?
And what if part of the prompt is the same across consecutive requests?
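
To make the last case concrete, here is a minimal sketch of what "partially the same" could mean: a fixed instruction block followed by a varying question. The prompts and the helper names (`shared_prefix`, `shared_fraction`) are made up for illustration and are not part of any library. The comparison is done on characters for simplicity; a serving stack would presumably compare tokens instead.

```python
import os


def shared_prefix(prompts):
    """Return the longest character prefix common to every prompt."""
    return os.path.commonprefix(list(prompts))


def shared_fraction(prompts):
    """Fraction of all prompt characters that fall inside the shared prefix."""
    prompts = list(prompts)
    total = sum(len(p) for p in prompts)
    if total == 0:
        return 0.0
    return len(shared_prefix(prompts)) * len(prompts) / total


# Hypothetical prompts: same instruction block, different question each time.
SYSTEM = "You are a helpful assistant. Answer briefly.\n"
prompts = [SYSTEM + "What is 2 + 2?", SYSTEM + "Name a prime number."]

if __name__ == "__main__":
    print(repr(shared_prefix(prompts)))
    print(round(shared_fraction(prompts), 2))
```

In my case the shared part is the bulk of each prompt, so I'd expect most of the prompt processing to be repeated work unless something reuses it.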
