From Training Clusters to Genomics Pipelines

Whether your workloads are bottlenecked by storage throughput, locked into a single cloud, or weighed down by infrastructure costs, flexFS removes the constraint. See how teams across industries are using it today.

AI/ML Pipelines

Feed GPUs faster without bigger servers

Deep learning workloads are often bottlenecked by data throughput, not compute. FlexFS streams training datasets directly from object storage to every GPU node in parallel, eliminating the shared NFS server that chokes distributed training jobs.

Compatible with: PyTorch DataLoader, Hugging Face Datasets, DeepSpeed, Horovod

Key Benefits

  • Stream massive training datasets (ImageNet, WebDataset shards) with sequential reads at scale — every mount client reads directly from object storage
  • Write model checkpoints sequentially for fault tolerance without coordinating through expensive central servers
  • Share data across multi-GPU, multi-node distributed training clusters with a single mount namespace
  • Scale training data throughput by adding mount clients, not bigger servers — aggregate bandwidth grows linearly
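
As a minimal sketch of what this looks like in practice: a standard PyTorch DataLoader reading pre-serialized shards from a flexFS mount. The mount point /mnt/flexfs and the shard layout are illustrative assumptions; any POSIX path your flexFS client exposes works the same way.

```python
import os
import torch
from torch.utils.data import Dataset, DataLoader

class ShardDataset(Dataset):
    """Map-style dataset over pre-serialized tensor shards on the mount."""
    def __init__(self, root):
        self.paths = sorted(
            os.path.join(root, f) for f in os.listdir(root) if f.endswith(".pt")
        )

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        # An ordinary file read; flexFS turns it into a direct object-storage
        # fetch from this node, with no central NFS server in the path.
        return torch.load(self.paths[idx])

loader = DataLoader(
    ShardDataset("/mnt/flexfs/datasets/imagenet-shards"),  # hypothetical path
    batch_size=None,   # shards are already batched
    num_workers=8,     # parallel workers -> parallel object-storage streams
    prefetch_factor=4,
)

for batch in loader:
    ...  # feed the training step; every GPU node runs its own loader
```
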
Life Sciences & Bioinformatics

Genomics at cloud scale without cloud filesystem costs

Genomics pipelines process enormous files: FASTQ, BAM/CRAM, and VCF files routinely reach 5 to 100 GiB each. Traditional cloud filesystems charge for provisioned throughput whether your pipeline is running or idle. FlexFS gives you throughput when you need it, and billing stops when the pipeline does.

Compatible with: GATK, Cromwell, Nextflow, Snakemake, PLINK, samtools, bcftools

Key Benefits

  • Run GATK, Cromwell, Nextflow, and Snakemake pipelines directly against object storage with zero code changes
  • Cache reference genomes locally for repeated access patterns — subsequent reads hit cache instead of object storage
  • Process large genomic files (FASTQ, BAM/CRAM, VCF) without downloading them first or managing local copies
  • Real-world result: PLINK on UK Biobank data runs faster and at lower cost than it does on EFS or FSx
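
To illustrate the zero-code-change claim, here is a sketch of an existing samtools step pointed at a flexFS mount instead of a locally staged copy. The mount path and file name are assumptions; in a real pipeline, the only thing that changes is the input path.

```python
import subprocess

# Hypothetical multi-GiB BAM living in object storage, exposed via the mount.
BAM = "/mnt/flexfs/cohort/sample_001.bam"

# samtools reads the file through ordinary POSIX calls; flexFS fetches the
# byte ranges the tool touches, so nothing is downloaded or staged locally.
result = subprocess.run(
    ["samtools", "flagstat", BAM],
    check=True, capture_output=True, text=True,
)
print(result.stdout)
```
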
HPC & Scientific Computing

Linear throughput scaling for I/O-bound clusters

High-performance computing clusters are often throttled by expensive central file servers that cannot keep pace with hundreds of analysis nodes. FlexFS removes the bottleneck entirely — each compute node reads and writes directly to object storage, so aggregate throughput scales linearly with your cluster.

Compatible with: MPI, OpenMP, Slurm, PBS, custom C/Fortran/Python analysis codes

Key Benefits

  • Aggregate I/O throughput scales linearly with the number of compute nodes — no expensive central server bottleneck
  • Direct client-to-storage access means every node gets its own bandwidth to object storage
  • Full POSIX compliance ensures existing HPC tools, MPI jobs, and analysis scripts work unchanged
  • Exabyte-scale capacity backed by object storage — no filesystem resizing or capacity planning
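
A sketch of the per-node I/O pattern, using mpi4py: each rank writes its own result file under the shared namespace, so write bandwidth aggregates across nodes instead of funneling through a central server. The mount path and file layout are assumptions for illustration.

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Each rank opens its own file under the shared mount; the write goes from
# this node straight to object storage, not through a central file server.
with open(f"/mnt/flexfs/results/rank_{rank:05d}.out", "w") as f:
    f.write(f"rank {rank} of {comm.Get_size()}: analysis output\n")

comm.Barrier()  # once all ranks return, the shared namespace holds every file
```
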
Multi-Region Deployments

Centralized data, distributed compute, zero transfer costs

When your compute spans multiple cloud regions but your data lives in one, cross-region transfer costs add up fast. FlexFS Enterprise proxy groups act as regional caches, serving data locally after the first read and routing each client to the lowest-latency proxy automatically.

Compatible with: Any multi-region deployment — Terraform, CloudFormation, cross-region Kubernetes clusters

Key Benefits

  • Run compute in multiple regions while keeping data centralized in a single object storage bucket
  • Enterprise proxy groups deployed per-region act as intelligent caches — data is fetched once, served many times
  • RTT-based routing automatically directs each client to the lowest-latency proxy in its region
  • Eliminate cross-region data transfer costs while maintaining a single authoritative data source
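
One illustrative way to observe the proxy-cache behavior described above: time a first (cold) read against a repeat (warm) read of the same file from a client in a remote region. The path is an assumption, and on a real client the second read may also be served by the local page cache, so treat this as a rough demonstration rather than a benchmark.

```python
import time

PATH = "/mnt/flexfs/shared/model-weights.bin"  # hypothetical shared object

def timed_read(path):
    """Read the whole file and return (bytes read, elapsed seconds)."""
    start = time.perf_counter()
    with open(path, "rb") as f:
        data = f.read()
    return len(data), time.perf_counter() - start

# First read in this region: fetched from the origin bucket, fills the proxy cache.
size, cold = timed_read(PATH)
# Repeat read: served in-region (proxy cache, possibly local page cache too).
_, warm = timed_read(PATH)
print(f"{size} bytes: cold read {cold:.2f}s, warm read {warm:.2f}s")
```
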
Hybrid Cloud

Local speed, cloud durability, no re-architecture

Moving to the cloud does not have to be all-or-nothing. FlexFS Enterprise proxy groups deployed on-premises give your existing compute infrastructure local-speed access to cloud-stored data, with writeback mode ensuring writes are durable in the cloud without sacrificing performance.

Compatible with: On-premises HPC clusters, legacy analysis pipelines, any POSIX-compatible application

Key Benefits

  • Access cloud-stored data from on-premises compute at local cache speeds — no application changes needed
  • Enterprise proxy groups deployed on-prem with writeback mode buffer writes locally and sync to the cloud
  • Local caching provides low-latency reads while cloud object storage provides unlimited, durable capacity
  • Bridge on-prem infrastructure to cloud storage incrementally — no forklift migrations required
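
A sketch of the unchanged on-prem write path: the application writes and fsyncs exactly as it does against local disk, and under writeback mode the on-prem proxy group absorbs the write locally and syncs it to the cloud bucket in the background. Paths here are assumptions, and how quickly durability reaches the cloud depends on your writeback configuration.

```python
import os

OUT = "/mnt/flexfs/pipeline/batch_0042/results.csv"  # hypothetical output path
os.makedirs(os.path.dirname(OUT), exist_ok=True)

with open(OUT, "w") as f:
    f.write("sample,score\nS001,0.97\n")
    f.flush()
    # Under writeback mode the proxy group buffers this write locally and
    # syncs it to the cloud bucket in the background, so the pipeline is
    # not gated on cross-WAN latency.
    os.fsync(f.fileno())
```
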
Kubernetes Workloads

Cloud-native storage for containerized pipelines

Kubernetes workloads need persistent, shared storage that scales with the cluster. FlexFS provides a CSI volume driver with Helm chart deployment, giving pods direct access to object storage through standard PersistentVolumeClaims — no sidecar containers or custom SDKs.

Compatible with: Helm, kubectl, Kubernetes CSI, Argo Workflows, Kubeflow, Airflow on K8s

Key Benefits

  • Deploy with a single Helm chart — the CSI volume driver integrates natively with Kubernetes storage primitives
  • Static provisioning available in both Community and Enterprise editions for pre-configured volumes
  • Dynamic provisioning with StorageClass (Enterprise) creates volumes on demand as pods request them
  • Pods access object storage through standard PVC mounts — no application-level SDK integration needed
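
A minimal sketch of requesting a flexFS-backed volume through the standard Kubernetes API, using the official kubernetes Python client and dynamic provisioning (Enterprise). The StorageClass name "flexfs" and the namespace are assumptions; use whatever names your Helm chart installed.

```python
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() when running in a pod

# A standard PVC; the StorageClass name "flexfs" is a hypothetical example.
pvc = client.V1PersistentVolumeClaim(
    metadata=client.V1ObjectMeta(name="training-data"),
    spec=client.V1PersistentVolumeClaimSpec(
        access_modes=["ReadWriteMany"],  # shared across pods
        storage_class_name="flexfs",     # assumed class from the Helm install
        resources=client.V1ResourceRequirements(
            requests={"storage": "1Ti"}
        ),
    ),
)

client.CoreV1Api().create_namespaced_persistent_volume_claim(
    namespace="default", body=pvc
)
# Pods then mount the claim like any other PVC; reads and writes inside the
# container go through the CSI driver to object storage.
```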

Ready to Accelerate Your Workloads?

Start with the free Community Edition or contact us to discuss Enterprise deployment for your use case.