November 2025 - Senior Python Engineer - Distributed Systems & AI Infrastructure
Description
We're working on a Layer-1 blockchain that merges high-performance SVM compatibility with an innovative proof-of-work mechanism, providing large-scale verified inference. The core idea is to deeply integrate AI inference and blockchain, creating a decentralized AI economy. Instead of a traditional proof-of-stake system, this uses an incentive mechanism similar to Bitcoin's, offering predictable profits to nodes that perform inference, fine-tuning, or training.

The challenge? Making AI computations provably correct and auditable on-chain, at scale, with sub-1% verification overhead on 600B+ parameter models.

This is low-level ML infrastructure work: you'll be writing custom CUDA kernels, extending PyTorch with C++ operators, and modifying inference engines like VLLM to handle workloads they weren't designed for. You need to be comfortable reading CUDA code, optimizing GPU memory layouts, and debugging performance issues at the kernel level. This isn't about calling APIs or training models; it's about making the runtime itself faster and more reliable.
Contact
careers@codilas.com
Salary range
12,000 – 16,000 EUR (Brutto I)
Your Role and Contributions
- Optimize Verified Inference Algorithms. You'll be modifying VLLM internals and writing custom CUDA kernels to make verified AI inference actually work at scale. This means profiling GPU memory access patterns, optimizing batch processing logic, and squeezing out every bit of performance while maintaining the verification guarantees that make the whole system trustworthy.
- Enhance VLLM for Better LoRA Support. Make modifications to VLLM's C++/CUDA backend so it plays nicer with LoRAs (Low-Rank Adaptations). This involves understanding the inference engine's memory management and kernel dispatch well enough to add features that the upstream project doesn't prioritize but are critical for production use.
- Build and Maintain Infrastructure UI. Handle some light frontend work to keep the interface with the chain, agent frameworks, and Python backend running smoothly. Nothing too fancy—just enough to make the system usable and maintainable.
- Benchmark and Stabilize the System. Help build end-to-end benchmarking infrastructure and maintain system stability. This is about making sure the network can handle real workloads without falling over, and catching problems before they become production issues.
- Develop System Jobs for Network Nodes. Write and maintain Python jobs that run on every node when there's spare capacity. These jobs generate synthetic data used to improve the network's base model (currently GLM 4.6), which is a pretty interesting problem in distributed computing.
- Enable Distributed Inference. Build configs and tooling that enable sharded (distributed) inference across multiple nodes. The goal is to make it possible to run massive models by splitting the work intelligently across the network.
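To make the sharded-inference idea in the last bullet concrete, here is a minimal, self-contained sketch (plain Python, no real GPUs or networking) of column-parallel matrix multiplication, the basic trick behind splitting one layer's work across nodes: each "node" holds a column shard of the weight matrix, computes its slice of the output, and the slices are concatenated. All function names here are illustrative, not part of any actual codebase.

```python
# Toy column-parallel (tensor-parallel) linear layer. In production the
# per-shard loop runs in parallel across GPUs/nodes; here it runs serially
# just to show that sharding preserves the result.

def matmul(x, W):
    """x: vector of length d_in; W: d_in x d_out matrix -> vector of length d_out."""
    d_out = len(W[0])
    return [sum(x[i] * W[i][j] for i in range(len(W))) for j in range(d_out)]

def shard_columns(W, n_shards):
    """Split W's output columns into n_shards contiguous chunks (one per node)."""
    d_out = len(W[0])
    step = (d_out + n_shards - 1) // n_shards
    return [[row[s:s + step] for row in W] for s in range(0, d_out, step)]

def sharded_matmul(x, shards):
    """Each shard computes a partial output; concatenation recovers the full result."""
    out = []
    for Ws in shards:  # conceptually: one iteration per node, run in parallel
        out.extend(matmul(x, Ws))
    return out
```

The same input vector is broadcast to every shard and no cross-node reduction is needed for column parallelism; row-parallel sharding would instead require summing partial outputs, which is where collective communication costs show up.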
Required Skills & Qualifications
- 5+ years of production ML infrastructure experience—not just training models, but building the systems that make them run fast
- Real experience with LLM inference optimization (VLLM, TensorRT, SGLang, or similar). You should know what a KV cache is and why it matters
- Solid C/C++ skills and hands-on CUDA experience. You need to be able to write and debug GPU kernels, not just call PyTorch functions
- Deep understanding of PyTorch internals—how autograd works, how to write custom operators, how memory is managed on GPU
- Track record of optimizing inference latency and throughput in production systems. Bonus if you have dealt with quantization, pruning, or custom accelerators
- Comfortable working in messy, real-world codebases where the documentation is sparse and you have to figure things out by reading source code
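Since the list above calls out the KV cache, here is a toy, pure-Python sketch of why it matters for autoregressive decoding: with a cache, each token's keys and values are computed once and appended, while the naive path rebuilds them from scratch every step (quadratic redundant work). The identity "projections" and 1-D vectors are deliberate simplifications for illustration, not how a real model computes K and V.

```python
import math

def attend(q, ks, vs):
    """Scaled dot-product attention for one query over a list of keys/values."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in ks]
    m = max(scores)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    out = [0.0] * d
    for w, v in zip(weights, vs):
        for j in range(d):
            out[j] += w * v[j]
    return out

def decode_with_cache(tokens):
    """Incremental decoding: K/V for each token are computed once and cached."""
    k_cache, v_cache, outputs = [], [], []
    for t in tokens:
        k_cache.append(t)  # real models would append W_k @ hidden here
        v_cache.append(t)  # identity projections keep the toy simple
        outputs.append(attend(t, k_cache, v_cache))
    return outputs

def decode_recompute(tokens):
    """Naive decoding: rebuild all K/V from scratch at every step."""
    outputs = []
    for i in range(len(tokens)):
        prefix = tokens[: i + 1]
        ks = [t for t in prefix]  # redundant work, repeated each step
        vs = [t for t in prefix]
        outputs.append(attend(tokens[i], ks, vs))
    return outputs
```

Both paths produce identical outputs; the cache only removes redundant computation. At real scale the cache's GPU memory footprint becomes the constraint, which is why paged/blocked KV-cache layouts (as in VLLM) exist.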
Nice-to-Have
- Experience with TensorRT, ONNX Runtime, or other inference optimization frameworks
- Background in distributed training (Horovod, DeepSpeed, FSDP) or distributed inference
- Contributions to open-source ML infrastructure projects (VLLM, TensorRT-LLM, etc.)
- Experience with ML accelerators (TPU, custom ASICs, or embedded GPU platforms like Jetson)
- Familiarity with blockchain or distributed systems concepts
Benefits & Compensation
- Token equity compensation based on experience - with token launch planned for 2026 (much faster liquidity than typical 5-10 year startup exits)
- Work on a project backed by top-tier VCs (a16z, Delphi) that is already at testnet stage with proven core technology
- Remote-first with flexible hours
- Opportunity to work on cutting-edge problems at the intersection of AI and blockchain
Selection process
- Submit CV: share your CV and relevant work samples.
- Tech Screen: conversational technical assessment (verbal multiple choice, not whiteboarding).
- Paired Coding: collaborative debugging session on a realistic codebase problem.
- Cultural Fit: a 30-60 minute conversation with the founder about working style and expectations.
- Offer: formal offer with equity compensation details.
What do we offer?
Flexible Working Hours
Manage your own schedule to maintain a healthy work-life balance.
Global Exposure
Opportunities to travel and immerse yourself in diverse cultures, expanding your perspectives.
Hybrid Work Environment
Work remotely, onsite, or mix it up—whatever works best for you.
Top-Tier Equipment
We provide state-of-the-art tools and resources to help you excel.
Continuous Learning
Access to educational resources, training, and professional development opportunities.
Conference Support
We sponsor attendance at industry-leading events, helping you stay on top of industry trends.
Performance Bonuses
Receive mid-year and year-end bonuses based on productivity and contributions.