Niraj Adhikary • 3 months ago
The Elephant in the Room: Gemini LLM Latency Is the Real Bottleneck
Over the last 24 hours, I instrumented our entire Resume-vs-JD analysis pipeline with millisecond-level precision.
The results were… eye-opening.
Here’s the truth:
Redis cache lookup: 3 ms
JD embedding: 3 ms
Vector search: 1.2 seconds
Knowledge embedding: 748 ms
Knowledge ingestion: 188 ms
Cache write: 2 ms
Everything is fast.
Everything is optimized.
Everything is efficient.
Except one thing.
Gemini LLM: 48–60 seconds of pure latency
Across multiple runs:
Knowledge generation: 18–22 seconds
Final analysis: 30–36 seconds
Total Gemini LLM time: 50–60 seconds
The non-Gemini stages listed above add up to barely two seconds combined, so Gemini alone accounts for 94–96% of the entire pipeline runtime.
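For anyone curious how these numbers were captured: below is a minimal sketch of the kind of per-stage timing harness I mean, assuming a Python pipeline. The stage names mirror the list above, and the time.sleep calls are only stand-ins for the real cache, embedding, vector-search, and Gemini calls, which aren't shown here.

```python
"""Minimal sketch of a per-stage timing harness (assumes a Python pipeline).

The sleeps below are placeholders; in the real pipeline each `timed(...)`
block wraps the actual cache / embedding / vector-search / Gemini call.
"""
import time
from contextlib import contextmanager

timings = {}  # stage name -> elapsed milliseconds

@contextmanager
def timed(stage):
    # Wall-clock timing with sub-millisecond resolution via perf_counter.
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = (time.perf_counter() - start) * 1000.0

# Stand-ins for the real pipeline stages (replace the sleeps with real calls).
with timed("redis_cache_lookup"):
    time.sleep(0.003)
with timed("jd_embedding"):
    time.sleep(0.003)
with timed("vector_search"):
    time.sleep(0.05)   # shortened stand-in for demo purposes
with timed("gemini_llm_total"):
    time.sleep(0.2)    # stand-in for the 50-60 s Gemini calls

total = sum(timings.values())
for stage, ms in timings.items():
    print(f"{stage:>20}: {ms:8.1f} ms  ({ms / total:5.1%} of total)")
```

The point of logging each stage's share of the total, not just its raw duration, is exactly what the numbers above show: micro-optimizing anything outside the Gemini calls can't move the overall runtime by more than a few percent.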