See all the jobs at Caixa Mágica Software here:
Other | Full-time | Partially remote
Overview:
A multinational conglomerate holding company with a special interest in areas such as Smart & Autonomous Cars, Mobility & Connected Services, Smart City & Energy Systems, Aerospace & Defence Innovation, and Venture Building & Business Scaling Operations. The Group is headquartered in Brussels, with a presence in Japan, Greece, Cyprus & Portugal.
What will you do?
- Administration of the HPC cluster for Computer Aided Engineering (CAE) and the render cluster
- Maintenance of in-house shell scripts
- Investigation of failed computations, problem determination, incident resolution, system support, and coordination with vendors
- L1/L2 support on the HPC cluster for the customer
- Maintain applications running on the cluster
- Manage network aspects (DNS, DHCP, internet access, …) with the Network Team
- Perform daily monitoring, and ensure cluster high availability
- Manage patching and upgrade of the managed environment
- Monitor regular backups and ensure cluster high availability
- Build long-term, centralized environment management
- Collaborate with other technical teams when required
Provide support when necessary for the customer’s project:
- HPC Cluster migration to AWS Cloud
Secondary Tasks:
Support the customer when needed on the following (out of maintenance scope):
- Maintain other servers, such as: ECU compiler server; Terrace server - Data synchronization support
As backup for other team members:
- Administration of the Linux-based GPU HPC cluster for Artificial Intelligence (AI), the VRED rendering cluster, and the HPC cluster for Computer Aided Engineering (CAE)
- Manage patching of Linux systems, including offline systems
- Installation and configuration of hardware, OS and software + tuning for all R&D Linux workstations
- Support artificial intelligence engineers in setting up development environments on the GPU HPC cluster
- Support the setup of a driving simulator based on a real-time OS
- Ensure the Linux environment matches company security standards
What are we looking for?
- Linux OS and Server knowledge
- Cluster management
- Infrastructure administration
- Virtualization knowledge
- Understanding and operation of storage solutions
- CAE application knowledge
Keywords that are important in HPC systems:
- Workload manager: Slurm, PBS
- Parallel file system: Lustre, Ceph, BeeGFS
- HPC management tools: Bright or NVIDIA, xCAT
- AI keywords: GPU, Docker, Python
- OS: RHEL, Ubuntu, Rocky Linux
What can you expect from us?
- A permanent job contract for a long-term project;
- Tech equipment + SIM Card + personal smartphone;
- Health and Life Insurance;
- Social events and team buildings;
- A commitment to letting you grow with us and be rewarded accordingly;
- A dynamic and young team that will always be there to support you;
- Training in the latest technologies;
- Coffee, fruit, snacks and a warm welcome whenever you stop by the office.