<p>Hi</p>
<p>The HPC is occasionally very busy, and it is better for all users if we try to improve the
throughput of jobs. Sometimes jobs request more CPU cores (ncpus) than they are capable of using.
If you ask for 8 cores and only use 1, the other 7 cores lie idle;
those cores could have been used by other researchers.
As an example, a simple Python program is single-threaded and can only ever use one core.</p>
<p>In the table below you will see your job(s). Look at the CPU and TIME "Utilisation" columns:
for each job those values should be close to 100%. Think of them like your high school report marks :-)
A description of these fields can be found under the table.</p>
<p>If you are going to start a job then please consider how many cores (ncpus) your job can really utilise.
During your run use "<code>qstat -f job_id</code>" and after the run "<code>qstat -fx job_id</code>"
to see if your job used the cores that you requested. The same can be done for memory and walltime.
Do not ask for more than your job requires.</p>
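<p>If you would like to script that check, below is a minimal sketch in Python. It simply wraps the <code>qstat -fx job_id</code> command above and prints the requested (<code>Resource_List</code>) and consumed (<code>resources_used</code>) attributes; the default job id is only an example and a recent PBS Pro <code>qstat</code> on your PATH is assumed.</p>
<pre><code>#!/usr/bin/env python3
# Minimal sketch: compare what a PBS job asked for with what it actually used.
# Assumes a PBS Pro "qstat" on your PATH; the default job id below is only an example.
import subprocess
import sys

job_id = sys.argv[1] if len(sys.argv) == 2 else "313295"

# "qstat -fx" also shows finished jobs; use "qstat -f" while the job is still running.
out = subprocess.run(["qstat", "-fx", job_id],
                     capture_output=True, text=True, check=True).stdout

# qstat -f wraps long attribute values onto continuation lines that start with a tab;
# join those back together before filtering.
text = out.replace("\n\t", "")
for line in text.splitlines():
    line = line.strip()
    if line.startswith(("Resource_List.", "resources_used.")):
        print(line)
</code></pre>
<p>Compare the <code>resources_used</code> values (for example <code>resources_used.cpupercent</code>) with what you requested in <code>Resource_List.ncpus</code>, <code>Resource_List.mem</code> and <code>Resource_List.walltime</code>.</p>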
<p>If you have any questions just email me and I'll try to assist.</p>
<p>Running Jobs</p><table border=1 cellpadding=4>
<tr>
<th>Job ID</th>
<th>Job Owner</th>
<th>Job Name</th>
<th>Select Statement</th>
<th>ncpus</th>
<th>cpu%</th>
<th>cputime</th>
<th>walltime</th>
<th>CPU<br>Utilisation</th>
<th>TIME<br>Utilisation</th>
<th>Comment</th>
</tr>
<tr>
<td>313295</td>
<td>u1234</td>
<td>PF_NEM_2019_pbs.pbs</td>
<td>1:ncpus=1:mpiprocs=1</td>
<td>1</td>
<td>98</td>
<td> 766.1</td>
<td> 768.6</td>
<td> 98.0%</td>
<td> 99.7%</td>
<td><span style="color:green;">Good</span></td>
</tr>
<tr>
<td>313296</td>
<td>u1234</td>
<td>PF_NEM_2019_pbs.pbs</td>
<td>1:ncpus=1:mpiprocs=1</td>
<td>1</td>
<td>98</td>
<td> 766.0</td>
<td> 768.6</td>
<td> 98.0%</td>
<td> 99.7%</td>
<td><span style="color:green;">Good</span></td>
</tr>
<tr>
<td>313300</td>
<td>u1234</td>
<td>PF_NEM_2019_pbs.pbs</td>
<td>1:ncpus=1:mpiprocs=1</td>
<td>1</td>
<td>98</td>
<td> 766.0</td>
<td> 768.6</td>
<td> 98.0%</td>
<td> 99.7%</td>
<td><span style="color:green;">Good</span></td>
</tr>
<tr>
<td>313302</td>
<td>u1234</td>
<td>PF_NEM_2019_pbs.pbs</td>
<td>1:ncpus=1:mpiprocs=1</td>
<td>1</td>
<td>99</td>
<td> 765.9</td>
<td> 768.6</td>
<td> 99.0%</td>
<td> 99.7%</td>
<td><span style="color:green;">Good</span></td>
</tr>
<tr>
<td>313303</td>
<td>u1234</td>
<td>PF_NEM_2019_pbs.pbs</td>
<td>1:ncpus=1:mpiprocs=1</td>
<td>1</td>
<td>99</td>
<td> 766.1</td>
<td> 768.6</td>
<td> 99.0%</td>
<td> 99.7%</td>
<td><span style="color:green;">Good</span></td>
</tr>
<tr>
<td>450182</td>
<td>u2468</td>
<td>STDIN</td>
<td>1:ncpus=10:mem=100gb</td>
<td>10</td>
<td>328</td>
<td> 0.5</td>
<td> 6.6</td>
<td> 32.8%</td>
<td> 0.7%</td>
<td><span style="color:red;">CHECK !</span></td>
</tr>
<tr>
<td>450473</td>
<td>u2468</td>
<td>test</td>
<td>1:mem=150gb:ncpus=15</td>
<td>15</td>
<td>176</td>
<td> 1.5</td>
<td> 0.9</td>
<td> 11.7%</td>
<td> 11.3%</td>
<td><span style="color:red;">CHECK !</span></td>
</tr>
<tr>
<td>450184</td>
<td>u1359</td>
<td>STDIN</td>
<td>1:ncpus=10:mem=100gb</td>
<td>10</td>
<td>32</td>
<td> 0.1</td>
<td> 6.1</td>
<td> 3.2%</td>
<td> 0.1%</td>
<td><span style="color:red;">CHECK !</span></td>
</tr>
<tr>
<td>449969</td>
<td>u1359</td>
<td>EGONAV-RL-Collision-Back</td>
<td>1:mem=150gb:ncpus=8:</td>
<td>8</td>
<td>345</td>
<td> 134.7</td>
<td> 42.8</td>
<td> 43.1%</td>
<td> 39.3%</td>
<td><span style="color:red;">CHECK !</span></td>
</tr>
</table>
<p>HPC Utilisation Report created on 2021-10-25 at 04:38 PM by the program <code>check_utilisation.py</code></p>
<p>What is "cpu%" ? <br>
The PBS scheduler polls all jobs every few minutes and calculates an integer
value called "cpupercent" at each polling cycle. This is a moving weighted average
of CPU usage for the cycle, given as the average percentage usage of one CPU.
For example, a value of 50 means that during a certain period, the job used 50
percent of one CPU. A value of 300 means that during the period, the job used
an average of three CPUs. You can see the cpupercent a job has used with the <code>qstat -f</code> command shown above.
</p>
<p>What is "CPU Utilisation %" ? <br>
This is what I have calculated. It's the cpupercent / ncpus requested.<br>
If you ask for 1 core and use it fully then this will be close to 100%. <br>
If you ask for 3 cores and use all of those then this will be 300%/3 = 100% again. <br>
If you ask for 3 cores and use 1 core it will be about 33%. You do not get a pass mark :-)
</p>
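<p>To make the arithmetic concrete, here is a small sketch that reproduces the two "Utilisation" columns from the table values. The CPU formula is the one stated above (cpupercent divided by the requested ncpus); the TIME formula (cputime relative to walltime &times; ncpus) is not spelled out above, it is simply what is consistent with the table rows, so treat it as an assumption.</p>
<pre><code># Minimal sketch of the two "Utilisation" columns in the table above.
# CPU  Utilisation = cpupercent / ncpus                     (as described above)
# TIME Utilisation = 100 * cputime / (walltime * ncpus)     (inferred from the table rows)

def cpu_utilisation(cpupercent, ncpus):
    """Share of the requested cores the job kept busy, in percent."""
    return cpupercent / ncpus

def time_utilisation(cputime, walltime, ncpus):
    """CPU time consumed as a share of the CPU time reserved, in percent."""
    return 100.0 * cputime / (walltime * ncpus)

# Job 313295 from the table: ncpus=1, cpupercent=98, cputime=766.1, walltime=768.6
print(cpu_utilisation(98, 1))                        # 98.0  -> "Good"
print(round(time_utilisation(766.1, 768.6, 1), 1))   # 99.7  -> "Good"

# Job 450182 from the table: ncpus=10, cpupercent=328
print(cpu_utilisation(328, 10))                      # 32.8  -> "CHECK !"
</code></pre>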