HPC Cluster Engineer II at Caixa Mágica Software

See all the jobs at Caixa Mágica Software here: http://caixamagica.recruiterbox.com/jobs

HPC Cluster Engineer II

Lisboa, Portugal | Other | Full-time | Partially remote

Overview:

A multinational conglomerate holding company with a special interest in areas such as Smart & Autonomous Cars, Mobility & Connected Services, Smart City & Energy Systems, Aerospace & Defence Innovation, and Venture Building & Business Scaling Operations. The Group is headquartered in Brussels, with a presence in Japan, Greece, Cyprus and Portugal.

What will you do?

Administer the HPC cluster for Computer Aided Engineering (CAE) and the render cluster
Maintain in-house shell scripts
Investigate failed computations: problem determination, incident resolution, system support and coordination with vendors
Provide L1/L2 support on the HPC cluster for the customer
Maintain applications running on the cluster
Manage network aspects (DNS, DHCP, internet access, …) with the Network Team
Perform daily monitoring and ensure high availability of the cluster
Manage patching and upgrades of the managed environment
Monitor regular backups
Build long-term, centralized environment management
Collaborate with other technical teams when required

Provide support when necessary for the customer’s project:

HPC Cluster migration to AWS Cloud

Secondary Tasks:

Support the customer when needed on the following (out of maintenance scope):

Maintain other servers, such as: ECU compiler server; Terrace server - data synchronization support

As backup for other team members:

Administration of the Linux-based GPU HPC cluster for Artificial Intelligence (AI), the VRED rendering cluster and the HPC cluster for Computer Aided Engineering (CAE)
Manage patching of Linux systems, including offline systems
Installation, configuration and tuning of hardware, OS and software for all R&D Linux workstations
Support artificial intelligence engineers in setting up development environments on the GPU HPC cluster
Support the setup of a driving simulator based on a real-time OS
Ensure Linux environments match company security standards

What are we looking for?

Linux OS and Server knowledge
Cluster management
Infrastructure administration
Virtualization knowledge
Understanding and operation of storage solutions
CAE application knowledge

Keywords that are important in HPC systems:

Workload manager: Slurm, PBS
Parallel file system: Lustre, Ceph, BeeGFS
HPC management tools: Bright or NVIDIA, xCAT
AI keywords: GPU, Docker, Python
OS: RHEL, Ubuntu, Rocky Linux

What can you expect from us?

A permanent job contract for a long-term project;
Tech equipment + SIM card + personal smartphone;
Health and life insurance;
Social events and team-building activities;
A commitment to letting you grow with us and be rewarded accordingly;
A dynamic and young team that will always be there to support you;
Training in the latest technologies;
Coffee, fruit, snacks and a warm welcome when you stop by the office.
