Postdoc in Large Language Model Inferencing

A full-time research & academia role at KTH Royal Institute of Technology, based in Stockholm, Sweden.

Full-time. Posted 13 May 2026.

Position closed.

The deadline (25 May 2026) has passed.

About the role

Two- to three-year postdoctoral position at KTH doing systems-side research on large language model inference. Topics include serving throughput, GPU memory efficiency, KV-cache management, speculative decoding, and distributed inference across multiple accelerators. Application deadline: 25 May 2026.

Responsibilities

Design and prototype systems for efficient LLM inference at scale.
Build open-source artefacts on top of vLLM, TensorRT-LLM, or similar engines.
Publish in OSDI, SOSP, MLSys, ASPLOS, or similar systems venues.
Collaborate with industrial partners on production-ready inference deployments.

Requirements

Doctoral degree in computer science, electrical engineering, or a related field, defended by the start date.
Strong systems background: GPU programming (CUDA), distributed systems, or compiler design.
Hands-on familiarity with at least one large-model serving stack.
Fluent English.

Nice to have

First-author publications in systems venues.
Open-source contributions to vLLM, TensorRT-LLM, DeepSpeed, or similar.
Experience working on multi-GPU or multi-node inference.

How to apply

Search for “Large Language Model Inferencing” on KTH’s Varbi recruitment portal. Submit a CV, transcripts, a research statement, and the “Five most meritorious scientific articles” document before 25 May 2026.
