Data Science Team / Digital Transformation
Vall d’Hebron Research Institute (VHIR) is at the forefront of developing AI and data infrastructure solutions to advance healthcare research and diagnostics for rare diseases. We are seeking a seasoned Infrastructure for Data Science Engineer to design, implement, and maintain a robust data infrastructure, including a central data center that supports AI-driven and deep learning capabilities. This role is crucial to ensuring secure data integration and processing on a scalable infrastructure that supports innovative AI and data science initiatives.
Education and qualifications:
Required:
- Bachelor’s or Master’s degree in Computer Science, Information Technology, Data Engineering, Data Science, Telecommunications, Bioinformatics, Artificial Intelligence, or a related field.
- Additional training in cloud infrastructure, data center technologies, or systems architecture.
- Fluency in Catalan, Spanish, and English (business level).
Desired:
- Certification in data center management or cloud platforms (e.g., AWS, Azure, Google Cloud).
- Training in healthcare data standards like OMOP or OpenEHR.
- Certification in AI or data science-related fields.
Experience and knowledge:
Required:
- At least 4 years of experience in data infrastructure design, implementation, and management, preferably in healthcare or life sciences.
- Proven experience in data center implementation, including the design, procurement, and deployment of hardware for data storage, computing, networking, security, and redundancy.
- Strong knowledge of data storage solutions (e.g., SQL/NoSQL databases, data lakes, data warehouses), ETL processes, and large-scale data management.
- Expertise in programming languages such as Python, Java, or Scala, with a focus on data manipulation and automation in big data environments.
- Familiarity with data security and compliance frameworks, especially GDPR.
- Experience in supporting AI and deep learning capabilities, and in implementing both on-premises and cloud-based solutions.
Desired:
- Familiarity with data security and compliance frameworks as they apply to healthcare data handling, especially GDPR.
- Experience with federated learning environments or distributed systems, supporting AI and deep learning capabilities.
- Experience with big data tools and frameworks such as Apache Hadoop (HDFS), Spark, Kafka, Cassandra, Storm, or HBase, and with cloud-based solutions like AWS Glue, Google Cloud Dataflow, or Azure Data Factory.
- Experience with Amazon S3 (Simple Storage Service), Google BigQuery, Apache Cassandra, and/or MongoDB.
- Familiarity with data pipeline architectures, including Lambda and Kappa architectures, and the ability to implement batch and real-time data processing techniques.
- Experience with cloud-native data warehouses like Snowflake, Google BigQuery, Microsoft Azure SQL, or Amazon Redshift.
- Experience with ETL tools like Apache NiFi, Talend, Informatica, Microsoft SSIS, or AWS Glue.
- Proficiency in scripting and automation for infrastructure management using languages such as Bash or PowerShell.
- Knowledge of containerization and orchestration tools (e.g., Docker, Kubernetes).
- Experience with high-performance computing (HPC) environments and data pipeline frameworks like Apache Airflow.
- Experience in integrating machine learning models into production environments.
- Strong project management skills in handling large-scale IT infrastructure projects.
Main responsibilities and duties:
- Lead the design and implementation of a state-of-the-art central data center to support AI-driven analytics, deep learning, and distributed computing capabilities.
- Oversee the procurement, deployment, and maintenance of hardware and software infrastructure for data storage, computing, networking, and high-performance processing.
- Develop and optimize scalable data pipelines for large-scale healthcare datasets, ensuring seamless data flow from ingestion to analysis.
- Implement and manage big data environments with a focus on performance optimization, fault tolerance, and secure data handling.
- Collaborate with cross-functional teams to integrate AI algorithms into clinical workflows while ensuring compliance with data privacy, security standards, and relevant regulations (e.g., GDPR).
- Lead the adoption of advanced technologies for federated learning and distributed AI frameworks to support the organization's data science initiatives.
- Drive the deployment and automation of containerized applications using tools like Docker and Kubernetes, ensuring seamless integration into the data infrastructure.
- Mentor junior engineers, fostering a culture of continuous improvement, innovation, and knowledge sharing within the data infrastructure team.
- Implement both on-premises and cloud-based solutions for data storage, processing, and federated learning, integrating them seamlessly with the central data center.
- Monitor and continuously optimize the performance of data infrastructure to maintain high availability and scalability.
- Provide technical leadership in evaluating and adopting new tools and technologies to enhance data processing, computational capabilities, and AI-driven innovation.
Labour conditions:
- Full-time position: 40 hours/week.
- Starting date: January 2025.
- Gross annual salary: Remuneration will depend on experience and skills. Salary ranges are consistent with our Collective Agreement pay scale.
- Contract: Open-ended contract linked to the project. This offer is contingent upon securing financing from the designated funder.
What can we offer?
- The opportunity to join Vall d’Hebron Research Institute (VHIR), a public sector institution that promotes and develops biomedical research, innovation, and teaching at Vall d’Hebron University Hospital (HUVH), the largest hospital in Barcelona and the largest in the Catalan Institute of Health (ICS).
- A scientific environment of excellence, highly dynamic, where high-end biomedical projects are continuously developed.
- Continuous learning and a wide range of responsibilities within a stimulating work environment.
- Individual training opportunities.
- Flexible working hours.
- 23 days of holidays + 9 personal days.
- Flexible Remuneration Program (including meal vouchers, health insurance, transportation, and more).
- Corporate Benefits: a platform offering significant discounts on travel, culture, technology, gastronomy, sports, and more.
- Healthy Offering: a variety of wellbeing-focused activities to choose from.
Deadline to apply: 01-11-2024
VHIR embraces Equality and Diversity. As reflected in our values, we work toward ensuring inclusion and equal opportunity in recruitment, hiring, training, and management for all staff within the organization, regardless of gender, civil status, family status, sexual orientation, gender identity and expression, religion, age, functional diversity, or ethnicity.