[03/02, 4:44 pm] Meta AI: Yup, PBS is a job scheduler. Cron is similar but more for scheduling repetitive tasks on a single machine, whereas PBS is designed for distributed computing environments, handling job queues and resource allocation across multiple nodes 🤖. Think of cron for routine tasks, PBS for heavy lifting across clusters.
[03/02, 5:01 pm] 道草: @Meta AI i see..i suppose at is closer to pbs than cron.. with its job queues...
[03/02, 5:01 pm] Meta AI: 👍 at is indeed closer to PBS in that it handles job queues, but it's more like a one-off job submission (e.g., "run this task at 3 PM"). PBS handles more complex job management and resource allocation. Good comparison, though!
[03/02, 5:08 pm] 道草: @Meta AI is there some kind of clustered or distributed cron where the job will run on any available node?
[03/02, 5:08 pm] Meta AI: You're looking for something like a "distributed cron"! 😄 Yeah, there are tools like:
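- Dkron, a distributed cron service
- Cronie in clustering mode (crond -c), though that is closer to failover between hosts sharing a spool directory than true dispatching
- JobScheduler
- Apache Mesos with a framework like Chronos
- Kubernetes CronJobs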
These can run jobs on available nodes in a cluster. Want me to dig deeper into any of these?
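To give a feel for the "run on any available node" part, here's a toy sketch of the placement step. It isn't how any of the tools above actually work internally; the `Node` class, the node names, and the "most free CPUs wins" rule are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_cpus: float
    healthy: bool = True


def pick_node(nodes, cpus_needed):
    """Return the healthy node with the most free CPUs that fits the job, or None."""
    candidates = [n for n in nodes if n.healthy and n.free_cpus >= cpus_needed]
    if not candidates:
        return None
    return max(candidates, key=lambda n: n.free_cpus)


def dispatch(nodes, job_name, cpus_needed):
    """Place one due job on a node and reserve its CPUs; return the node name or None."""
    node = pick_node(nodes, cpus_needed)
    if node is None:
        return None  # nothing fits; a real scheduler would queue or retry
    node.free_cpus -= cpus_needed
    return node.name


# Hypothetical three-node cluster; node-c is down.
nodes = [Node("node-a", 1.0), Node("node-b", 4.0), Node("node-c", 8.0, healthy=False)]
print(dispatch(nodes, "nightly-backup", 2.0))  # node-b
```

Plain cron skips this step entirely because there's only ever one machine to choose from.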
[03/02, 5:23 pm] 道草: @Meta AI what would the relationship between something like pbs and apache mesos be like?
[03/02, 5:23 pm] Meta AI: PBS and Apache Mesos serve similar purposes but at different levels. Think of PBS as a specialized scheduler for HPC (High-Performance Computing) workloads, focusing on job scheduling and resource management for compute-intensive tasks.
Apache Mesos, on the other hand, is a more general cluster manager that abstracts resources across a cluster, allowing frameworks like Marathon, Chronos, or even Hadoop to run on top of it.
In principle you could run a PBS-style batch scheduler on top of Mesos, with Mesos handing out resources across nodes and the batch scheduler placing jobs within whatever it has been given. It's not a common setup in practice, though; most HPC sites run PBS directly on the hardware 🤖. Does that make sense?
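The split between "Mesos hands out resources" and "a framework decides what to run on them" can be sketched as a toy offer cycle. This is only a simplified model of Mesos-style resource offers, not the real Mesos API; `Offer`, `Task`, and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    agent: str
    cpus: float
    mem_mb: int


@dataclass
class Task:
    name: str
    cpus: float
    mem_mb: int


def framework_accepts(offer, pending):
    """Framework side: pick the pending tasks that fit inside this one offer."""
    launched = []
    cpus, mem = offer.cpus, offer.mem_mb
    for task in pending:
        if task.cpus <= cpus and task.mem_mb <= mem:
            launched.append(task)
            cpus -= task.cpus
            mem -= task.mem_mb
    return launched


def run_offer_cycle(offers, pending):
    """Cluster-manager side: show each offer to the framework and record placements."""
    placements = {}
    for offer in offers:
        launched = framework_accepts(offer, pending)
        names = {t.name for t in launched}
        for task in launched:
            placements[task.name] = offer.agent
        pending = [t for t in pending if t.name not in names]
    return placements, pending


offers = [Offer("agent-1", 1.0, 1024), Offer("agent-2", 4.0, 8192)]
pending = [Task("web", 0.5, 512), Task("batch", 2.0, 4096), Task("cache", 0.5, 512)]
placements, leftover = run_offer_cycle(offers, pending)
print(placements)  # {'web': 'agent-1', 'cache': 'agent-1', 'batch': 'agent-2'}
print(leftover)    # []
```

The cluster manager never looks at what a task is; it only knows how much it offered and what came back.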
[03/02, 5:25 pm] 道草: @Meta AI what would an example pbs job be like vs an example mesos job?
[03/02, 5:25 pm] Meta AI: Here's a rough example:
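PBS Job:
```bash
#!/bin/bash
#PBS -N my_simulation
# 4 nodes x 8 processes per node = 32 MPI ranks
#PBS -l nodes=4:ppn=8
# Hard limit: the job is killed if it runs past 2 hours
#PBS -l walltime=02:00:00
cd /path/to/sim
mpirun -np 32 ./simulate
```
This PBS job runs a simulation on 4 nodes with 8 cores each (32 MPI ranks), for up to 2 hours.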
Mesos Job (via Marathon):
```json
{
  "id": "my-web-app",
  "cmd": "node server.js",
  "cpus": 0.5,
  "mem": 512,
  "instances": 3
}
```
This Mesos job (via Marathon) runs a web app on 3 instances, each with 0.5 CPU and 512 MB RAM.
See the difference? PBS is more focused on HPC workloads with explicit node allocation, while Mesos is more about managing resources for various types of apps 😊.
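To make the PBS numbers concrete: the `-np 32` passed to mpirun has to match nodes × ppn, and walltime caps how many core-hours the job can burn. Here's a small sketch of that arithmetic. The helper names are mine, and it only handles the simple `nodes=N:ppn=M` form used in the example.

```python
def parse_nodes_spec(spec):
    """Parse a 'nodes=N:ppn=M' resource string into (nodes, ppn)."""
    fields = dict(part.split("=", 1) for part in spec.split(":"))
    return int(fields["nodes"]), int(fields.get("ppn", 1))


def parse_walltime(text):
    """Convert an 'HH:MM:SS' walltime into seconds."""
    hours, minutes, seconds = (int(x) for x in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


nodes, ppn = parse_nodes_spec("nodes=4:ppn=8")
total_ranks = nodes * ppn  # what mpirun -np should be
max_core_hours = total_ranks * parse_walltime("02:00:00") / 3600

print(total_ranks)     # 32
print(max_core_hours)  # 64.0
```

The Marathon definition has no counterpart to that second number, since nothing in it says when the app should stop.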
[03/02, 5:26 pm] 道草: @Meta AI the main difference i see is pbs has got walltime while the mesos "job" seems to be a permanent service actually
[03/02, 5:26 pm] Meta AI: You're spot on 🔍! The PBS job is a finite task with a defined start and end (walltime), whereas the Mesos example is a long-running service (a web app).
If you wanted a one-off task on Mesos, you'd likely use something like Chronos or a Mesos framework designed for batch jobs. Great observation! 😄
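If it helps, walltime boils down to "run this, and kill it if it's still going after N seconds". Here's a minimal sketch of that idea using Python's `subprocess` timeout. It's a stand-in for what a batch scheduler does on the compute node, not how PBS actually implements it, and `run_with_walltime` is a made-up name.

```python
import subprocess
import sys


def run_with_walltime(cmd, walltime_s):
    """Run cmd; if it exceeds walltime_s seconds it is killed and 'killed' is returned."""
    try:
        result = subprocess.run(cmd, timeout=walltime_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired
        return "killed"
    return "completed" if result.returncode == 0 else "failed"


quick = [sys.executable, "-c", "print('done')"]
sleeper = [sys.executable, "-c", "import time; time.sleep(5)"]

print(run_with_walltime(quick, 10.0))   # completed
print(run_with_walltime(sleeper, 0.5))  # killed
```

A service like the Marathon web app is the opposite case: there's no deadline, and the orchestrator restarts it if it ever exits.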
[03/02, 5:27 pm] 道草: @Meta AI then mesos sounds a bit like k8s..
[03/02, 5:27 pm] Meta AI: 👍 Mesos and Kubernetes (k8s) do share similarities in managing resources and orchestrating containers across clusters. Both abstract the underlying infrastructure, but they have different design centers: Mesos is a general-purpose resource manager that frameworks plug into, while k8s is specifically built for container orchestration. They can even complement each other: k8s has been run as a framework on top of Mesos.
[03/02, 5:29 pm] 道草: @Meta AI how about hashicorp nomad? is that a scheduler too? do nomad jobs have walltime? or are they services like k8s pods?
[03/02, 5:29 pm] Meta AI: HashiCorp Nomad is indeed a scheduler, and it's more similar to Kubernetes in that it orchestrates both services and batch jobs.
Nomad jobs can be either services (long-running) or batch jobs (finite tasks). For a batch job you set type = "batch" and can tune retries with a restart block, but as far as I know it doesn't have a direct equivalent of PBS' walltime. Nomad's kill_timeout only sets how long a task gets to shut down after the stop signal before it is force-killed, so to cap runtime people usually wrap the command in something like timeout 🤖.
Nomad's pretty flexible, works with various drivers (Docker, exec, etc.), and integrates well with other HashiCorp tools. You using Nomad for something specific?
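Since walltime came up: one workaround is to wrap the batch command in coreutils `timeout`. Below is a rough sketch that builds a Nomad-style JSON job payload doing that. Treat the field names as my best understanding of Nomad's JSON job format and check them against the Nomad docs before relying on them; the job ID, datacenter, binary paths, and the `nomad_batch_job` helper are all placeholders.

```python
import json


def nomad_batch_job(job_id, command, args, walltime_s, grace_s=30):
    """Build a batch job payload whose task is capped by coreutils `timeout`."""
    return {
        "Job": {
            "ID": job_id,
            "Name": job_id,
            "Type": "batch",            # finite task, not a long-running service
            "Datacenters": ["dc1"],     # placeholder datacenter name
            "TaskGroups": [{
                "Name": "main",
                "Tasks": [{
                    "Name": "run",
                    "Driver": "exec",
                    "Config": {
                        # timeout stops the command after walltime_s seconds
                        "command": "/usr/bin/timeout",  # assumed path
                        "args": [str(walltime_s), command, *args],
                    },
                    # Grace period before a force-kill, in nanoseconds
                    "KillTimeout": grace_s * 1_000_000_000,
                }],
            }],
        }
    }


payload = nomad_batch_job("my-simulation", "/path/to/sim/simulate", [], walltime_s=7200)
print(json.dumps(payload, indent=2))
```

The limit here lives inside the task rather than in the scheduler, so Nomad can't use it for planning the way PBS uses walltime.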
[03/02, 5:31 pm] 道草: @Meta AI no im just wondering why are there folks that still use pbs/slurm when everything seems to revolve around k8s nowadays
[03/02, 5:31 pm] Meta AI: PBS and Slurm are still widely used in HPC (High-Performance Computing) and research communities, where they excel at managing complex, compute-intensive workloads with specific resource requirements (e.g., MPI jobs on InfiniBand clusters).
Kubernetes, while great for containerized apps and cloud-native workloads, isn't always the best fit for traditional HPC workloads. Some reasons folks stick with PBS/Slurm:
- Legacy workflows: Existing HPC setups often have optimized PBS/Slurm integrations and workflows.
- Performance: PBS/Slurm are purpose-built for HPC, offering fine-grained resource control.
- Specialized hardware: HPC often involves specialized hardware (GPUs, FPGAs), which PBS/Slurm handle well.
That said, some orgs do run HPC workloads on Kubernetes, and there's growing interest in hybrid approaches 🤖. Does that help clarify the landscape?
[03/02, 5:32 pm] 道草: @Meta AI can k8s not run mpi jobs?
[03/02, 5:32 pm] Meta AI: Kubernetes can run MPI jobs, but it's not always straightforward 😅. Some challenges include:
- Networking: MPI needs low-latency, high-bandwidth interconnects (e.g., InfiniBand), which standard k8s pod networking doesn't expose by default; you usually need extra device plugins or host networking.
- Resource allocation: MPI jobs need all their ranks started together on specific nodes (gang scheduling), which the default k8s scheduler doesn't guarantee.
Tools like Kubeflow's MPI Operator or the Volcano batch scheduler help bridge this gap, enabling MPI jobs on k8s. It's doable, but might need extra config and tweaking compared to native PBS/Slurm setups 🤖.
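Part of what those tools automate is bookkeeping an HPC scheduler does for free: telling mpirun which hosts exist and how many ranks fit on each. Here's a small sketch that writes an Open MPI style hostfile for a set of worker pods. The pod hostnames and the `build_hostfile` helper are invented for illustration, and a real operator would also have to wire up SSH or an equivalent launch mechanism.

```python
def build_hostfile(worker_hosts, slots_per_worker):
    """Return Open MPI hostfile text: one 'host slots=N' line per worker."""
    lines = [f"{host} slots={slots_per_worker}" for host in worker_hosts]
    return "\n".join(lines) + "\n"


workers = [f"mpi-worker-{i}" for i in range(4)]  # hypothetical pod hostnames
slots = 8
hostfile = build_hostfile(workers, slots)
total_ranks = len(workers) * slots

print(hostfile, end="")
print(f"mpirun -np {total_ranks} --hostfile hostfile ./simulate")
```

With 4 workers and 8 slots each, this reproduces the 32-rank layout from the PBS example, just assembled by hand instead of by the scheduler.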