A pre‑run validation suite that surfaces connectivity and configuration issues before trials are launched. It includes a validation harness that checks whether RoCEv2‑capable SmartNICs meet the performance requirements for Keysight scenarios.
The preflight feature runs automated checks (connectivity, bandwidth, latency, time sync) to reduce wasted runs and simplify troubleshooting. It also automates NIC profiling and verifies capabilities such as sustained bandwidth, the number of supported queue pairs (QPs), and RDMA behavior to ensure testbed reliability.
What you will work on:
- Connectivity checks (reachability, ports, credentials)
- Bandwidth/latency probes and thresholds
- Time synchronization validation and drift reporting
- Test drivers to exercise NIC RDMA/RoCEv2 capabilities
- Workloads to measure sustained bandwidth and QP limits
- Capture and summary of NIC firmware versions and settings
- Pass/fail gating and human‑readable reports
- Environment summary and guidance for remediation
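To give a feel for the work, here is a minimal Python sketch of how the checks listed above could feed a pass/fail gate and a human‑readable report. It is an illustration only, not the project's actual design: the names `CheckResult`, `check_tcp_port`, `check_min`, `check_max`, and `render_report`, the thresholds, and the sample measurements are all hypothetical.

```python
from __future__ import annotations

import socket
from dataclasses import dataclass


@dataclass
class CheckResult:
    """Outcome of one preflight check (hypothetical structure)."""
    name: str
    passed: bool
    detail: str
    remediation: str = ""


def check_tcp_port(host: str, port: int, timeout_s: float = 2.0) -> CheckResult:
    """Reachability check: try to open a TCP connection to host:port."""
    name = f"tcp {host}:{port}"
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return CheckResult(name, True, "reachable")
    except OSError as exc:
        return CheckResult(name, False, f"unreachable ({exc})",
                           "verify the host is up and the port is open")


def check_min(name: str, measured: float, minimum: float, unit: str) -> CheckResult:
    """Pass when a measured value (e.g. bandwidth) meets a minimum threshold."""
    detail = f"{measured} {unit} (minimum {minimum} {unit})"
    hint = "" if measured >= minimum else f"investigate why {name} is below threshold"
    return CheckResult(name, measured >= minimum, detail, hint)


def check_max(name: str, measured: float, maximum: float, unit: str) -> CheckResult:
    """Pass when a measured value (e.g. latency, clock drift) stays under a maximum."""
    detail = f"{measured} {unit} (maximum {maximum} {unit})"
    hint = "" if measured <= maximum else f"investigate why {name} exceeds threshold"
    return CheckResult(name, measured <= maximum, detail, hint)


def render_report(results: list[CheckResult]) -> tuple[bool, str]:
    """Return the overall gate decision and a plain-text report."""
    lines = []
    for r in results:
        status = "PASS" if r.passed else "FAIL"
        lines.append(f"[{status}] {r.name}: {r.detail}")
        if not r.passed and r.remediation:
            lines.append(f"       fix: {r.remediation}")
    n_passed = sum(r.passed for r in results)
    overall = n_passed == len(results)
    verdict = "PASS" if overall else "FAIL"
    lines.append(f"Preflight: {verdict} ({n_passed}/{len(results)} checks passed)")
    return overall, "\n".join(lines)


if __name__ == "__main__":
    # Placeholder measurements; a real harness would obtain these from probes.
    results = [
        check_min("bandwidth", measured=92.5, minimum=90.0, unit="Gbit/s"),
        check_max("clock drift", measured=250.0, maximum=100.0, unit="us"),
    ]
    ok, report = render_report(results)
    print(report)
```

In a sketch like this, a connectivity result such as `check_tcp_port("node-a", 22)` (the host name is a placeholder) would simply be appended to the same results list, and the boolean returned by `render_report` could decide whether a trial is allowed to start.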
What you will gain:
- Hands‑on Distributed Systems – Work with multi‑node orchestration and coordination
- AI/ML Infrastructure – Understand how production AI clusters are designed and operated
- Modern Python & Automation – Build production‑grade tools with Python, Docker, FastAPI
- DevOps & CI/CD – Use GitLab and automated pipelines for continuous integration and delivery
- Performance Analysis – Learn how networking performance is measured and improved
- Real‑world Impact – Ship features used by Keysight teams in day‑to‑day workflows
Skills required: collective communication libraries, RoCEv2, Python, SSH/SCP, distributed systems, AI/ML frameworks, automation, REST APIs, Linux, Bash scripting