via Indeed · June 10, 2026 · 3 days ago

GPU Engineer

Kog
Paris · Full-time · Remote

About Kog
-------------

Kog builds the fastest LLM inference engine on standard datacenter GPUs. Our Kog Inference Engine generates 3,000 output tokens per second per request on a single 8× AMD MI300X node and 2,100 on an 8× NVIDIA H200 node (FP16, batch size 1, no speculative decoding).
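
At batch size 1 without speculative decoding, each decode pass emits exactly one output token for the request. The throughput figures above therefore invert directly into a per-token latency budget. Here is a minimal sketch of that arithmetic. The function name is illustrative and not part of any Kog API.

```python
def per_token_latency_us(tokens_per_second: float) -> float:
    """Wall-clock budget per decode pass, in microseconds.

    Assumes batch size 1 and no speculative decoding, so one decode
    pass produces exactly one output token for the request.
    """
    return 1e6 / tokens_per_second


# Throughput figures quoted above (FP16, batch size 1).
for node, tps in [("8x AMD MI300X", 3000), ("8x NVIDIA H200", 2100)]:
    print(f"{node}: {per_token_latency_us(tps):.1f} us per token")
# prints:
# 8x AMD MI300X: 333.3 us per token
# 8x NVIDIA H200: 476.2 us per token
```

The entire decode pass, across all layers and all eight GPUs, has to fit inside that budget of roughly a third to half a millisecond.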

The hot path is a monokernel implemented with handwritten CUDA (with PTX inline assembly) on NVIDIA, and HIP (with CDNA ISA inline assembly) on AMD.

We optimize at the low level with engine/kernel/model co-design, using reverse engineering to understand and exploit the details of how the GPU hardware works at the micro level.

We are a team of 11 people, including 10 engineers and 4 PhDs.

Test it at playground.kog.ai. Read the technical details on the Kog Labs blog.

What you will work on
-------------------------

You will perform experiments to understand GPU internals, find creative solutions to accelerate critical computational sections used in LLM inference, and write optimized GPU kernels accordingly. Then test, profile, and optimize again.

  • Contribute to our monokernel pipeline, the single persistent GPU program that covers the full decode pass from QKV projection to LM head sampling, across AMD and NVIDIA architectures.

  • Work on low-level GPU optimization, including impossibly fast grid synchronizations and inter-GPU collectives, as well as GEMM and attention kernels optimized for specific batch sizes and context lengths.

  • Build profiling infrastructure inside a monokernel, including custom instrumentation, device-timestamp frameworks, and per-stage analysis that translates machine behavior into concrete engineering decisions.

  • Scale the stack to third-party MoE models such as DeepSeek v4 and Qwen 3 to push generation speed on the models that matter in production today.

  • Contribute to building AI agents that will perform GPU Engineering research and kernel optimization autonomously, calibrated to hardware target and workload, starting from the inference foundations we are building now.
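
As an illustration of the device-timestamp, per-stage analysis mentioned above, the sketch below aggregates timestamp records into per-stage statistics on the host. Several details are hypothetical and do not describe Kog's actual framework:

- the record layout (`block`, `stage`, `t_start`, `t_end`, in device timer ticks);
- the stage names;
- the tick rate.

A monokernel would write such records from device code into a buffer and copy them back after the run.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class StageRecord:
    # Hypothetical record written by one thread block for one pipeline stage.
    block: int
    stage: str
    t_start: int  # device timestamp, in timer ticks
    t_end: int


def per_stage_summary(records, ticks_per_us: float):
    """Return {stage: (mean_us, max_us, span_us)} aggregated across blocks.

    span_us runs from the earliest start to the latest end of that stage.
    A max or span well above the mean can point at straggling blocks or
    time spent waiting on synchronization.
    """
    by_stage = defaultdict(list)
    for r in records:
        by_stage[r.stage].append(r)
    summary = {}
    for stage, rs in by_stage.items():
        durations = [(r.t_end - r.t_start) / ticks_per_us for r in rs]
        span = (max(r.t_end for r in rs) - min(r.t_start for r in rs)) / ticks_per_us
        summary[stage] = (mean(durations), max(durations), span)
    return summary


# Hypothetical records from two blocks; 100 ticks per microsecond.
records = [
    StageRecord(0, "qkv_proj", 0, 100),
    StageRecord(1, "qkv_proj", 0, 140),
    StageRecord(0, "attention", 150, 300),
    StageRecord(1, "attention", 160, 290),
]
for stage, (m, mx, sp) in per_stage_summary(records, ticks_per_us=100.0).items():
    print(f"{stage}: mean={m:.2f} us, max={mx:.2f} us, span={sp:.2f} us")
```
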

What we look for
--------------------

You have written GPU kernels where performance was the central constraint. Showing the code is a requirement to move forward in the process.

PyTorch custom ops are an acceptable starting point if the kernels show a genuine understanding of the hardware below the framework level.

Stronger signals include inline PTX or CDNA ISA in public repositories, experience with latency-sensitive execution paths, an understanding of why memory bandwidth utilization (MBU) matters more than model FLOPs utilization (MFU) at batch size 1, and a background in inference engine components.
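
A back-of-the-envelope roofline check shows why memory bandwidth utilization (MBU) dominates model FLOPs utilization (MFU) at batch size 1. Take an FP16 linear layer with a `d_out × d_in` weight matrix. Every weight is read once per decode step but used for only one multiply-add per token in the batch. Arithmetic intensity is therefore roughly `batch` FLOPs per byte.

The sketch below makes a few assumptions:

- the peak bandwidth and FLOP rate are illustrative placeholders, not vendor specifications;
- the 8192 × 8192 layer size is hypothetical;
- the model ignores caches, KV-cache reads, and kernel overheads.

```python
def linear_layer_roofline(d_in: int, d_out: int, batch: int,
                          peak_bytes_per_s: float, peak_flops_per_s: float,
                          bytes_per_elem: int = 2):
    """Simple roofline bounds for one linear layer (FP16 by default)."""
    flops = 2 * batch * d_in * d_out  # one multiply-add per weight per token
    # Weights are read once; input and output activations scale with batch.
    bytes_moved = bytes_per_elem * (d_in * d_out + batch * (d_in + d_out))
    t_mem = bytes_moved / peak_bytes_per_s
    t_compute = flops / peak_flops_per_s
    return {
        "intensity_flops_per_byte": flops / bytes_moved,
        "t_mem_us": t_mem * 1e6,
        "t_compute_us": t_compute * 1e6,
        "memory_bound": t_mem > t_compute,
    }


# Illustrative placeholder peaks: 5 TB/s and 1 PFLOP/s (ridge point 200 FLOP/B).
PEAK_BW = 5e12
PEAK_FLOPS = 1e15

for batch in (1, 1024):
    r = linear_layer_roofline(8192, 8192, batch, PEAK_BW, PEAK_FLOPS)
    # Best-case MFU if the layer ran exactly at its roofline bound.
    best_mfu = r["t_compute_us"] / max(r["t_mem_us"], r["t_compute_us"])
    print(f"batch={batch}: {r['intensity_flops_per_byte']:.1f} FLOP/B, "
          f"memory_bound={r['memory_bound']}, best-case MFU={best_mfu:.1%}")
```

With these placeholder peaks, the batch-1 layer sits near 1 FLOP/byte against a ridge point of 200. Even at 100% MBU its MFU would be only about 0.5%, so at batch size 1 the useful target is bytes per second, not FLOPs.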

A top engineering school or a PhD with concrete GPU work counts, even without industry experience.

What we offer
-----------------

  • Direct access to AMD and NVIDIA datacenter GPUs from day one

  • A team where creativity and technical judgment carry weight and where the people closest to the problem shape the key decisions

  • Problems that sit on the critical path of model execution speed and that directly influence what the system can become

  • A remote-friendly working model, though you'll spend at least 50% of your time in our Paris office

  • Compensation aligned with top technical profiles in the Paris GPU Engineering market, including equity
