The Elephant in the Room: Gemini LLM Latency Is the Real Bottleneck

Over the last 24 hours, I instrumented our entire Resume-vs-JD analysis pipeline with millisecond-level precision.
The results were… eye-opening.
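
For context, here's roughly how per-stage numbers like these can be captured: a minimal sketch of a timing wrapper in Python. The stage names and the `timings` dict are illustrative, not our actual pipeline code.

```python
import time
from contextlib import contextmanager

timings: dict[str, float] = {}

@contextmanager
def timed(stage: str):
    """Record wall-clock time for a single pipeline stage, in milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = (time.perf_counter() - start) * 1000.0

# Hypothetical usage around each stage:
# with timed("redis_cache_lookup"):
#     cached = cache.get(jd_hash)
# with timed("gemini_final_analysis"):
#     analysis = model.generate_content(prompt)
```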

Here’s the truth:
Redis cache lookup: 3 ms

JD embedding: 3 ms

Vector search: 1,200 ms (1.2 s)

Knowledge embedding: 748 ms

Knowledge ingestion: 188 ms

Cache write: 2 ms
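
Adding those up, the entire non-LLM path comes to only about 2.1 seconds. A quick back-of-the-envelope check (stage names illustrative):

```python
non_llm_ms = {
    "redis_cache_lookup": 3,
    "jd_embedding": 3,
    "vector_search": 1200,      # 1.2 s in ms
    "knowledge_embedding": 748,
    "knowledge_ingestion": 188,
    "cache_write": 2,
}
total_ms = sum(non_llm_ms.values())
print(f"Non-LLM total: {total_ms} ms")  # 2144 ms ≈ 2.1 s
```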

Everything is fast.
Everything is optimized.
Everything is efficient.

Except one thing.

Gemini LLM: 48–58 seconds of pure latency
Across multiple runs:

Knowledge generation: 18–22 seconds

Final analysis: 30–36 seconds

Total Gemini LLM time: 48–58 seconds

That’s roughly 96% of the entire pipeline runtime.
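
To sanity-check that figure: with roughly 2.1 seconds of non-LLM work, the Gemini share of total runtime works out like this (a back-of-the-envelope sketch using the totals above):

```python
non_llm_s = 2.144                  # sum of the fast stages, in seconds
for llm_s in (48, 58):             # measured Gemini range
    share = llm_s / (llm_s + non_llm_s)
    print(f"{llm_s} s of LLM time -> {share:.1%} of the pipeline")
# 48 s of LLM time -> 95.7% of the pipeline
# 58 s of LLM time -> 96.4% of the pipeline
```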
