From bc55eef89c59503f8e849404e4fa8931ba4cea84 Mon Sep 17 00:00:00 2001
From: Mike Lake <Mike.Lake@uts.edu.au>
Date: Mon, 25 Oct 2021 16:47:59 +1100
Subject: [PATCH] Added example check_utilisation output

---
 check_utilisation_example.html | 168 +++++++++++++++++++++++++++++++++
 1 file changed, 168 insertions(+)
 create mode 100644 check_utilisation_example.html

diff --git a/check_utilisation_example.html b/check_utilisation_example.html
new file mode 100644
index 0000000..6b65a7e
--- /dev/null
+++ b/check_utilisation_example.html
@@ -0,0 +1,168 @@
+
+<p>Hi</p>
+
+<p>The HPC is occasionally very busy, and it is better for all users if we try to improve the 
+throughput of jobs. Some jobs request more CPU cores (ncpus) than they 
+are capable of using. If you ask for 8 cores and only use 1, the other 7 cores lie idle. 
+Those cores could have been used by other researchers. 
+For example, a simple Python program is single threaded and can only ever use one core.</p> 
+
+<p>In the table below you will see your job(s). Look at the CPU and TIME "Utilisation" columns. 
+For each job those values should be close to 100%. Think of them like your high school reports :-)
+A description of these fields can be found under the table.</p>
+
+<p>Before you start a job, please consider how many cores (ncpus) your job can really utilise. 
+While it is running use "<code>qstat -f job_id</code>", and after it has finished use "<code>qstat -fx job_id</code>", 
+to see whether your job used the cores that you requested. The same can be done for memory and walltime. 
+Do not ask for more than your job requires.</p>
+
+<p>If you have any questions just email me and I'll try to assist.</p>
+<p>Running Jobs</p>
+<table border=1 cellpadding=4>
+<tr>
+<th>Job ID</th>
+<th>Job Owner</th>
+<th>Job Name</th>
+<th>Select Statement</th>
+<th>ncpus</th>
+<th>cpu%</th>
+<th>cputime</th>
+<th>walltime</th>
+<th>CPU<br>Utilisation</th>
+<th>TIME<br>Utilisation</th>
+<th>Comment</th>
+</tr>
+<tr>
+  <td>313295</td>
+  <td>u1234</td>
+  <td>PF_NEM_2019_pbs.pbs</td>
+  <td>1:ncpus=1:mpiprocs=1</td>
+  <td>1</td>
+  <td>98</td>
+  <td>    766.1</td>
+  <td>     768.6</td>
+  <td>     98.0%</td>
+  <td>      99.7%</td>
+  <td><span style="color:green;">Good</span></td>
+</tr>
+<tr>
+  <td>313296</td>
+  <td>u1234</td>
+  <td>PF_NEM_2019_pbs.pbs</td>
+  <td>1:ncpus=1:mpiprocs=1</td>
+  <td>1</td>
+  <td>98</td>
+  <td>    766.0</td>
+  <td>     768.6</td>
+  <td>     98.0%</td>
+  <td>      99.7%</td>
+  <td><span style="color:green;">Good</span></td>
+</tr>
+<tr>
+  <td>313300</td>
+  <td>u1234</td>
+  <td>PF_NEM_2019_pbs.pbs</td>
+  <td>1:ncpus=1:mpiprocs=1</td>
+  <td>1</td>
+  <td>98</td>
+  <td>    766.0</td>
+  <td>     768.6</td>
+  <td>     98.0%</td>
+  <td>      99.7%</td>
+  <td><span style="color:green;">Good</span></td>
+</tr>
+<tr>
+  <td>313302</td>
+  <td>u1234</td>
+  <td>PF_NEM_2019_pbs.pbs</td>
+  <td>1:ncpus=1:mpiprocs=1</td>
+  <td>1</td>
+  <td>99</td>
+  <td>    765.9</td>
+  <td>     768.6</td>
+  <td>     99.0%</td>
+  <td>      99.7%</td>
+  <td><span style="color:green;">Good</span></td>
+</tr>
+<tr>
+  <td>313303</td>
+  <td>u1234</td>
+  <td>PF_NEM_2019_pbs.pbs</td>
+  <td>1:ncpus=1:mpiprocs=1</td>
+  <td>1</td>
+  <td>99</td>
+  <td>    766.1</td>
+  <td>     768.6</td>
+  <td>     99.0%</td>
+  <td>      99.7%</td>
+  <td><span style="color:green;">Good</span></td>
+</tr>
+<tr>
+  <td>450182</td>
+  <td>u2468</td>
+  <td>STDIN</td>
+  <td>1:ncpus=10:mem=100gb</td>
+  <td>10</td>
+  <td>328</td>
+  <td>      0.5</td>
+  <td>       6.6</td>
+  <td>     32.8%</td>
+  <td>       0.7%</td>
+  <td><span style="color:red;">CHECK !</span></td>
+</tr>
+<tr>
+  <td>450473</td>
+  <td>u2468</td>
+  <td>test</td>
+  <td>1:mem=150gb:ncpus=15</td>
+  <td>15</td>
+  <td>176</td>
+  <td>      1.5</td>
+  <td>       0.9</td>
+  <td>     11.7%</td>
+  <td>      11.3%</td>
+  <td><span style="color:red;">CHECK !</span></td>
+</tr>
+<tr>
+  <td>450184</td>
+  <td>u1359</td>
+  <td>STDIN</td>
+  <td>1:ncpus=10:mem=100gb</td>
+  <td>10</td>
+  <td>32</td>
+  <td>      0.1</td>
+  <td>       6.1</td>
+  <td>      3.2%</td>
+  <td>       0.1%</td>
+  <td><span style="color:red;">CHECK !</span></td>
+</tr>
+<tr>
+  <td>449969</td>
+  <td>u1359</td>
+  <td>EGONAV-RL-Collision-Back</td>
+  <td>1:mem=150gb:ncpus=8:</td>
+  <td>8</td>
+  <td>345</td>
+  <td>    134.7</td>
+  <td>      42.8</td>
+  <td>     43.1%</td>
+  <td>      39.3%</td>
+  <td><span style="color:red;">CHECK !</span></td>
+</tr>
+</table>
+<p>HPC Utilisation Report created on 2021-10-25 at 04:38 PM by the program <code>check_utilisation.py</code>.</p>
+
+<p>What is "cpu%" ? <br>
+The PBS scheduler polls all jobs every few minutes and calculates an integer
+value called "cpupercent" at each polling cycle. This is a moving weighted average
+of CPU usage over the cycle, expressed as a percentage of one CPU.
+For example, a value of 50 means that during the period the job used 50
+percent of one CPU, while a value of 300 means that it used
+an average of three CPUs. You can find a job's cpupercent with the <code>qstat</code> command.
+</p>
+
+<p>What is "CPU Utilisation %" ? <br>
+This is a value I have calculated: cpupercent divided by the ncpus requested.<br>
+If you ask for 1 core and use it fully then this will be close to 100%. <br>
+If you ask for 3 cores and use all of them then this will be 300%/3 = 100% again. <br>
+If you ask for 3 cores and use only 1 then it will be about 33%. That is not a pass mark :-)
+</p>
-- 
GitLab