[03/02, 5:40 pm] 道草: @⁨Meta AI⁩ what does pbs_rsub do?
[03/02, 5:40 pm] Meta AI: pbs_rsub is used to submit a job to a PBS (Portable Batch System) server that will run on a subset of nodes within a larger allocation. Think of it like a "job within a job" – useful for workflows needing dynamic sub-allocations 🤖. You using it for something specific?
[03/02, 5:41 pm] 道草: @⁨Meta AI⁩ how can i rsub a job that take 1 cpu and 1 gpu each on 2 nodes, which have 56 cores and 8 gpus physically?
[03/02, 5:41 pm] Meta AI: For pbs_rsub, you'd specify resources like this:

bash
pbs_rsub -l select=2:ncpus=1:ngpus=1

This asks for 2 nodes, each with 1 CPU and 1 GPU. PBS will try to fit this within your existing allocation.

Given your nodes have 56 cores and 8 GPUs, this should work if your PBS setup recognizes those resources. You might want to check node names and availability with pbsnodes before submitting 🤖.
[03/02, 5:43 pm] 道草: @⁨Meta AI⁩ other users will still be able to submit jobs that run on the other 55 cpus and 7 gpus?
[03/02, 5:43 pm] Meta AI: Yup, other users can still submit jobs on the remaining resources. pbs_rsub only carves out the requested resources (1 CPU + 1 GPU per node in this case), leaving the rest (55 CPUs + 7 GPUs per node) available for other jobs, assuming your PBS setup is configured for concurrent job execution 🤖.
[03/02, 5:44 pm] 道草: @⁨Meta AI⁩ it seems pbs_rsub has start and end time. what happens when there are existing jobs on the nodes when my pbs_rsub job start time starts?
[03/02, 5:44 pm] Meta AI: When pbs_rsub's start time hits, PBS will:

  1. Preempt or wait for existing jobs to finish if needed.
  2. Allocate the requested resources (1 CPU + 1 GPU per node in your case).

If existing jobs are running, PBS might:

Your pbs_rsub job gets resources when they're available or at start time, whichever comes later 🤖. Check with your admin on specifics for your cluster.
[03/02, 5:46 pm] 道草: @⁨Meta AI⁩ can pbs_rsub job wait until the previous running job completes?
[03/02, 5:46 pm] Meta AI: pbs_rsub can have a dependency on previous jobs, but it's a bit indirect. You'd typically use pbs_rsub with -W depend=after:job_id to make it wait for a specific job to finish.

If you want pbs_rsub to wait for all existing jobs on the nodes, you'd likely need to script it (e.g., check job status before submitting). PBS doesn't natively block pbs_rsub on arbitrary running jobs 🤖. You want an example of scripting this?