qd Explanation
This page is an explanation of the output of the qd program.
It discusses what information is available in the queue.dat file, where
qd gets additional information, and special notes about each field of
the qd printout.
Here is a sample of the output.
For brevity, the data for indices 9, 0, 2, 3, 4, and 5 have been omitted:
qd released 20 May 2005 (fr 030); qd info 20 May 2005 (rph)
Queue version 5.01
Current index: 7
Index 8: finished 49.00 pts (0.353 pt/hr) 2.94 X min speed
server: 171.64.122.117:8080; project: 724, "p724_Abeta21-43-amberGS"
Folding: run 1, clone 19, generation 4; benchmark 797; misc: 500, 300
issue: Mon Aug 9 23:36:42 2004; begin: Mon Aug 9 23:41:10 2004
end: Sun Aug 15 18:19:58 2004; due: Thu Aug 26 23:41:10 2004 (17 days)
core URL: http://www.stanford.edu/~pande/Linux/x86/Core_78.fah (V1.68)
CPU: 1,0 x86; OS: 4,0 Linux
assignment info (le): Mon Aug 9 23:36:38 2004; AC1213B0
CS: 171.67.89.100; upload failures: 1; P limit: 5241856
user: rph_iv; team: 0; ID: 3539220A7A46065D; mach ID: 1
work/wudata_08.dat file size: 1290111; WU type: Folding@Home
...
Index 1: finished 6.07 pts (0.212 pt/hr)
server: 171.64.122.125:80; project: 799, "SH3ligGH2/pdb1gbq.1.spa"
Genome: unit 36476, 7554, 656; benchmark 739; misc: 500, 200
issue: Fri Aug 27 13:06:35 2004; begin: Fri Aug 27 13:06:47 2004
end: Sat Aug 28 17:45:24 2004
core URL: http://www.stanford.edu/~pande/Linux/x86/Core_ca.fah
CPU: 1,0 x86; OS: 4,0 Linux
user: rph_iv; team: 0; ID: 3539220A7A46065D; mach ID: 1
work/wudata_01.dat file size: 63721
...
Index 6: finished 30.10 pts (0.273 pt/hr) 3.04 X min speed
server: 171.64.122.119:80; project: 682, "p682_TZ2_NAT_VISC0MD"
Folding: run 14, clone 48, generation 7; benchmark 768; misc: 500, 200
issue: Sat Sep 18 01:47:03 2004; begin: Sat Sep 18 01:47:11 2004
end: Wed Sep 22 16:14:20 2004; expire: Sat Oct 2 01:47:11 2004 (14 days)
core URL: http://www.stanford.edu/~pande/Linux/x86/Core_65.fah (V2.53)
CPU: 1,0 x86; OS: 4,0 Linux
assignment info (le): Sat Sep 18 01:47:02 2004; AC125A56
DL: Sat Sep 18 01:47:09 2004; upload failures: 1; P limit: 5241856
user: rph_iv; team: 0; ID: 3539220A7A46065D; mach ID: 1
work/wudata_06.dat file size: 100545; WU type: Folding@Home
Index 7: folding now 51.40 pts (0.341 pt/hr) 5.73 X min speed; 11% complete
server: 171.64.122.111:8080; project: 334, "p334_unf_305"
Folding: run 1, clone 88, generation 13; benchmark 770; misc: 500, 300
issue: Wed Sep 22 16:16:20 2004; begin: Wed Sep 22 16:16:56 2004
expect: Tue Sep 28 22:59:03 2004; due: Thu Oct 28 15:16:56 2004 (36 days)
core URL: http://www.stanford.edu/~pande/Linux/x86/Core_78.fah (V1.68)
CPU: 1,0 x86; OS: 4,0 Linux
user: rph_iv; team: 0; ID: 3539220A7A46065D; mach ID: 1
work/wudata_07.dat file size: 197433; WU type: Folding@Home
Average download rate 5.267 KB/s (u=4); upload rate 3.403 KB/s (u=4)
Performance fraction 0.819036 (u=4)
Average pph: 0.329, ppd: 7.89, ppw: 55.2, ppy: 2663
An explanation of each field follows, with the data for that field
shown again in front of its explanation.
Most of the fields are taken from the index 8 entry, with references to
the other indices when appropriate.
Not all data is present for every work unit; when a piece of data is
unavailable, the field that would present it is usually absent from the
printout.
There are also some fields which may not seem to be useful.
They are printed simply because they are present in queue.dat, and qd
prints everything there that can be interpreted.
As qd prints information for each entry, it will try to find
additional information about that unit by looking at other files.
It will look at the core log file work/logfile_**.txt for the core
version number and the project name, and, if it's the current unit, for
its progress.
If it can't find the name of the current unit that way, it will look in
work/current.xyz.
If it can't find the progress, it will look in work/wuinfo_**.dat.
And if it still can't find the name, or it doesn't have information
about the unit's expiration time, it will look for those things in
emprotz.dat or emprotx.dat.
If any of these files can't be opened or read successfully, qd won't
consider it a serious error.
It simply won't have the data it might otherwise be able to print.
If qd is run with the -p flag, it won't look at these other files.
Similarly, the -q flag, which implies qd is looking at an isolated
queue.dat file outside the context of a working folding directory, will
suppress looking at these other files too.
The -c flag produces a more compact format, in which about half the
lines of each queue entry are omitted.
This format might be preferable when using qd simply for looking up
work unit history.
There is also a one-line format produced by the -l or -L flags, which
is suitable for saving in a sorted log file, possibly combining the
folding history from many clients.
There are eight types of date fields which can be printed for work
units.
These dates are normally printed in local time, according to the way
the operating system is configured.
However, if the -t or -z flags are used, they will be printed in the
specified time zone.
When output from qd is used for reporting problems, it will be most
useful if the -z flag is used so the times are printed in UTC.
After the output field descriptions, there follows a shorter section
listing all the status codes qd and the client use for queue entries.
Output fields
qd released 20 May 2005 (fr 030)
The program is released as necessary to keep up with changes and new
features.
Often that is several times a week, as new project numbers are
activated and their scores have to be added to qd's tables.
If there is a significant change, or a bug fix, a "functional revision"
level built into the program is also increased.
The message above shows the program's functional revision level at
"030".
The latest functional revision level available is given in the
qdinfo.dat file as an "fr" field.
Users who update only qdinfo.dat to stay current will be notified with
a warning message here if qd notices an "fr" level greater than its own
functional level.
qd info 20 May 2005 (rph)
If qdinfo.dat is available, it will be read to update the built-in
point table.
The text of a "da" entry will be printed here.
The official distributed qdinfo.dat has the tag "(rph)" at the end.
People maintaining their own versions of the file, for whatever reason,
should use a different tag.
The date the table information was compiled is recorded in qdinfo.dat
as a hex value in a "pg" entry.
If such a value is present and it is earlier than the date of the
built-in tables in qd, then a warning message will be printed here, and
the table entries in qdinfo.dat will not be allowed to overwrite the
built-in values.
Queue version 5.01
The client stores its own queue revision level in queue.dat any time it
creates a new queue.dat or updates an old one to a newer format.
The format does not change every time a new client comes out, so the
version number shown here may be perfectly up to date but still not
show the latest client revision level.
That would not be an error.
If the queue.dat file being interpreted is not from the same type of
system that qd is being run on (strictly speaking, a system different
enough that the file requires format conversion), the system type will
also be shown on this line.
It will be given in parentheses, as "(Linux)", "(Windows)", or "(Mac)".
Current index: 7
When a work unit is downloaded, it is assigned to one of the ten queue
entries.
The data in queue.dat at that index is used by the client to keep track
of that unit's status.
It is especially needed if the client is stopped and restarted, or if
finished work doesn't upload on the first attempt.
This index number is also used in file names in the work subdirectory.
In this example, since the current index is 7, the work file names have
the string "_07" in them.
The indices are assigned sequentially, starting at 1 and cycling back
to 0 after 9.
If work is still active at the next sequential index (most likely
because it has had a lot of trouble uploading), that index will be
skipped over.
An active index won't be reused unless there are no inactive ones left
at all.
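The index-selection rule just described can be sketched as follows.
This is a simplified model of the behavior, not the client's actual
code; `active` is a hypothetical set of indices still holding live
work:

```python
def next_index(current, active):
    """Pick the queue index the client would assign to the next unit.

    Indices cycle 1..9 and then 0; an index still holding active work
    is skipped, and reused only if every index is active.
    """
    candidates = [(current + 1 + k) % 10 for k in range(10)]
    for idx in candidates:
        if idx not in active:
            return idx
    return candidates[0]  # everything active: fall back to sequential reuse

# With the current index at 7 and index 8 still stuck uploading,
# the next unit lands at index 9.
print(next_index(7, {8}))  # 9
```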
Index 8:
When qd formats its output, it starts with the oldest entry and
proceeds forward until the last printed data is for the current index.
finished
The printed status is an interpretation of an integer status code for
each entry, sometimes influenced by other data fields in that entry.
The names given to these status codes by qd are a little different from
the names printed by the client with its -queueinfo flag.
If qd is started with the -h flag, it will print out an explanation of
its status code names.
See below for a summary.
49.00 pts
This is the point value for the unit, as nearly as qd can determine it.
The value isn't stored anywhere at all by the Folding@home client, so
qd looks it up in its internal point table, after augmentation by data
in the info file qdinfo.dat or even from emprotz.dat or emprotx.dat.
The valuation of Genome units is calculated from the protein chain
length, according to the published formula, suitably scaled depending
on whether the team number recorded with this unit is for a Folding or
Genome team.
The score will be unavailable for Genome units if work/logfile_**.txt
isn't present for that index, since the chain length can only be found
in the log, which is often deleted automatically when the unit uploads.
Since the Genome project has now ended, it is likely that very few
Genome scores will be calculable.
If the table lists several values for units of the relevant project, qd
will select the value in effect, as far as it knows, when the unit was
issued.
The table is not authoritative, so if the value was changed at a time
close to when the unit was issued, the score shown here might not
correspond to what the work server actually awards.
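The "value in effect when the unit was issued" lookup can be
illustrated like this. The real table format used by qd and qdinfo.dat
is different; `points_history` here is a hypothetical list of
(effective-from, points) pairs sorted by date:

```python
from datetime import datetime

def value_at(points_history, issue_time):
    """Return the most recent point value whose effective date is not
    after the unit's issue time, or None if none applies."""
    value = None
    for effective_from, points in points_history:
        if effective_from <= issue_time:
            value = points
    return value

# Hypothetical history: the project was revalued on 1 Sep 2004.
history = [(datetime(2004, 1, 1), 52.0), (datetime(2004, 9, 1), 49.0)]
print(value_at(history, datetime(2004, 8, 9)))   # 52.0, issued before the change
print(value_at(history, datetime(2004, 10, 1)))  # 49.0, issued after it
```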
(0.353 pt/hr)
If points and progress can both be determined for this work unit, then the
score rate can be calculated from the beginning time and ending time, or
for the current unit, the modification time of the core log file.
In this sense, completed units are taken to have made 100% progress.
Points are determined as described in the paragraph above.
Progress is considered valid only if the total elapsed time is between fifteen
minutes and fifty days, or for the current unit, at least thirty seconds.
See the note below explaining how the fraction complete, and thus also the
score rate, may be inaccurate before the unit is 2% done.
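Using the index 8 data from the sample, the score rate works out as
described: points divided by elapsed hours, with completed units
counting as 100% done (a reconstruction of the calculation from the
text, not qd's exact code):

```python
from datetime import datetime

def points_per_hour(points, begin, end, fraction_done=1.0):
    """Score rate: points earned so far per elapsed hour. For the
    current unit, `end` would be the modification time of the core log
    file and `fraction_done` its progress."""
    hours = (end - begin).total_seconds() / 3600.0
    return points * fraction_done / hours

begin = datetime(2004, 8, 9, 23, 41, 10)
end = datetime(2004, 8, 15, 18, 19, 58)
print(round(points_per_hour(49.00, begin, end), 3))  # 0.353, as in the sample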
2.94 X min speed
If progress and expiration time can be determined for this work unit, then
the rate of progress can be calculated as a factor times the minimum speed
necessary to barely make the deadline.
If this number is less than 1.00, it means that if the unit continues to
process at the effective rate it has gone since it was downloaded, it
won't finish in time.
If the machine has been running FAH exclusively 24/7, then it is hopelessly
too slow.
The current unit might as well be deleted, and the client should be
reconfigured to process only deadlineless units.
Before the V5.00 client, that was done by selecting a preference for Genome
units.
See the note below explaining how the fraction complete, and thus also the
min speed factor, may be inaccurate before the unit is 2% done.
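The factor can be reconstructed from the sample's index 8 entry: the
effective rate of progress divided by the minimum rate needed to just
reach the deadline (again a reading of the description, not qd's exact
code):

```python
from datetime import datetime

def min_speed_factor(fraction_done, begin, end, due):
    """How many times faster than barely-making-the-deadline the unit
    is progressing. Below 1.00 means it won't finish in time at the
    effective rate it has shown so far."""
    elapsed = (end - begin).total_seconds()
    allowed = (due - begin).total_seconds()
    return fraction_done * allowed / elapsed

begin = datetime(2004, 8, 9, 23, 41, 10)
end = datetime(2004, 8, 15, 18, 19, 58)
due = datetime(2004, 8, 26, 23, 41, 10)
print(round(min_speed_factor(1.0, begin, end, due), 2))  # 2.94, as in the sample
```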
11% complete
[see index 7]
The current unit will show the fraction complete, if it can be determined.
NOTE that with regard to these last three items, until the unit is several
percent finished, the displayed values may not be very accurate.
Especially with Gromacs units, where there are checkpoints and log entries
made between frames, qd can't establish the
calculation rate until the unit is at least 2% done.
This will cause the percent complete to show only 0% or 1%, and the point/hour
rate and min speed numbers to be significantly lower than their proper
values.
Further, with Genome units, there is quite a bit of time spent in
initialization as the rotamer library is calculated.
At the beginning of the unit, this time all appears to be taken by the first
tenth of the first sequence, which can make the rate of progress initially
appear to be as low as one half of its ultimate value.
This also affects, of course, the expected completion time for the unit.
server: 171.64.122.117:8080
This is the IP and port address of the data server from which the work unit
was downloaded.
It is also the address to which the unit must eventually be uploaded if it
is still active.
project: 724, "p724_Abeta21-43-amberGS"
This is the number and name of the project which the work unit is part
of.
The project name is not stored in queue.dat, and old core log files
often get deleted by the client, so it is common for the project name
not to be known except for the current unit.
If the project name can be found only in emprotx.dat, then it will be
printed here, but in single quotation marks.
In this case the name is, strictly speaking, only a guess: it is simply
the first name qd found in emprotx.dat which starts with "P" or "p"
followed by a string of digits matching the project number.
On rare occasions the project names change, and there is no guarantee
that the name selected here is the name of the actual finished unit.
This issue will soon be moot, since emprotx.dat is being replaced by
emprotz.dat, which doesn't contain the name string at all.
Folding: run 1, clone 19, generation 4
If the unit is not using Core_ca (the Genome core), these numbers,
along with the project number, uniquely identify the work unit.
If it is necessary to notify Stanford of the work unit ID, those four
numbers (project, run, clone, generation) should always be given.
Genome: unit 36476, 7554, 656
[see index 1]
If the unit is using Core_ca (the Genome core), these numbers are
printed, because they are stored in queue.dat, but as far as we know
they cannot be used to uniquely identify the work unit.
The sequence name and initial seed value are needed to identify a
Genome unit, and they aren't stored in queue.dat.
benchmark 797
When the client starts, it calculates this benchmark number.
It is a measure of floating point processing power, but it is also strongly
affected by any other programs which happen to be running on the machine at
the time.
It's normal for the benchmark number to vary over a wide range.
misc: 500, 300
These numbers are present in queue.dat, but it isn't known for sure
what they are.
Most likely they are related to the revision levels of the client which
are suitable for this work unit.
issue: Mon Aug 9 23:36:42 2004
This time stamp is placed on the work unit by the work server when it assigns
the unit.
It is an epoch 1970 value, based on the server's clock.
The servers lately have been accurately synchronized with NTP.
If a unit, for any reason, gets lost so it isn't either completely finished
or ready for upload when the next unit is requested, the server will often
notice the previous assignment and send the same unit again, with the
original issue time stamp.
begin: Mon Aug 9 23:41:10 2004
This time is recorded by the client when it finishes downloading a new unit
to process.
It is an epoch 2000 value, based on the local machine's clock.
If the clock is set accurately and the work unit is not a duplicate, the
"begin" time will be only a few seconds after the "issue" time, the
difference being simply the time it took to receive the unit over the Internet.
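The two epochs can be handled like this, assuming (as is conventional)
that the epoch-2000 origin is 2000-01-01 00:00:00; qd performs this
conversion internally:

```python
from datetime import datetime, timedelta, timezone

EPOCH_2000 = datetime(2000, 1, 1, tzinfo=timezone.utc)

def from_epoch_2000(seconds):
    """Decode a raw epoch-2000 stamp ("begin", "end", "due") to UTC."""
    return EPOCH_2000 + timedelta(seconds=seconds)

def from_epoch_1970(seconds):
    """Decode an "issue" stamp, which is ordinary Unix time."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

# The two epochs differ by 946,684,800 seconds (30 years including
# 7 leap days), so raw values from the two schemes are never comparable
# without conversion.
print(from_epoch_2000(0) - from_epoch_1970(0))
```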
end: Sun Aug 15 18:19:58 2004
This time is recorded by the client when the core returns with an indication
that the processing has finished and the results file should be uploaded to
the server.
It is an epoch 2000 value, based on the local machine's clock, just like
the "begin" time.
expect: Tue Sep 28 22:59:03 2004
[see index 7]
This time, printed only for the current unit, is not stored in
queue.dat, but instead is a projection made by qd if the progress can
be determined.
This date shows when the unit will finish if processing continues at
exactly the speed it has run so far since the unit began.
See the note above under "fraction complete" explaining how the rate of
calculation may not be accurately determined before the unit is 2%
done.
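The projection is straightforward extrapolation. This is a sketch of
the description above; `now` stands in for the modification time of the
core log file:

```python
from datetime import datetime, timedelta

def expected_finish(begin, now, fraction_done):
    """When the unit will finish if it keeps the average speed it has
    shown since it began."""
    return begin + (now - begin) / fraction_done

# Hypothetical numbers: 17 hours in at 11% done projects a finish
# roughly 6.4 days after the unit began.
begin = datetime(2004, 9, 22, 16, 16, 56)
print(expected_finish(begin, begin + timedelta(hours=17), 0.11))
```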
due: Thu Aug 26 23:41:10 2004 (17 days)
This time is calculated by the client when it downloads a unit.
It is determined by adding the expiration period to the "begin" time,
if this expiration data is sent as part of the work unit.
The interval, which qd shows in parentheses after the date, is rounded
to the nearest day, or to the nearest hour if it is less than two days.
Genome units had no deadlines, nor do a few Folding units designated for
machines which are either slow or only infrequently connected to the
Internet.
All other Folding units do, but not all work units have the expiration period
fields filled in.
The stored time is an epoch 2000 value, based on the local machine's clock.
There are actually two sorts of deadline associated with a work unit.
The one reported here, which is the only one known to the client, is what the
Stanford project summary web page calls the "Final Deadline", after which time
the client may delete the unit and no credit will be given even if it is
completed and returned. The other deadline, shown on the project summary as
"Preferred", indicates when the unit will be considered late enough that
the server might send another copy of it to someone else, which means that
a unit returned after that time might have little scientific value.
NOTE that if the machine clock runs, or is set, forward to a time past the
"due" date, the client might think the unit is past due, and delete it.
expire: Sat Oct 2 01:47:11 2004 (14 days)
[see index 6]
This time is equivalent to the "due" date, but it is reported as
"expire" when the client has not calculated it itself.
Normally this means the work unit didn't contain any expiration data,
so the client couldn't calculate it, but qd was able to find out how
much time is allowed by reading the emprotx.dat or emprotz.dat file.
This distinction might be important, since often the deadline stored in
the EM III data file is the shorter "Preferred" one, but in any case
the unit will never be in danger of deletion, since the client doesn't
know any deadline at all.
core URL: http://www.stanford.edu/~pande/Linux/x86/Core_78.fah (V1.68)
This full URL is actually constructed by qd (and the client) from the
directory URL and the core number, both of which are stored in
queue.dat.
It can be used to verify that the server knows the correct client OS
type, and also to determine which core is required for this work unit.
At present, Core_65 is Tinker (Folding), Core_78 is Gromacs (Folding),
Core_79 is double-precision SSE2 Gromacs (Folding), Core_82 is PMD
Amber (Folding), Core_96 is QMD (Folding), and Core_ca used to be SPG
(Genome).
The core version which was most recently run on this unit is shown in
parentheses after the URL, if it can be determined.
It is parsed by qd from the core log file, which is deleted if the unit
uploads on the very first try, so on some systems which have permanent
connections to the Internet, the core version will only rarely be shown
at indices other than the current one.
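The URL construction can be sketched like this. The assumption here is
that the core number is stored as an integer and rendered as two
lowercase hexadecimal digits, which matches the sample's Core_78 and
Core_ca; qd's exact formatting may differ:

```python
def core_url(directory_url, core_number):
    """Rebuild the full core URL from the two pieces stored in
    queue.dat: the directory URL and the core number."""
    return f"{directory_url}Core_{core_number:02x}.fah"

print(core_url("http://www.stanford.edu/~pande/Linux/x86/", 0x78))
# http://www.stanford.edu/~pande/Linux/x86/Core_78.fah
```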
CPU: 1,0 x86; OS: 4,0 Linux
The CPU and OS types are detected by the client and sent to the servers
when requesting assignments.
They can influence what sort of work will be assigned, and which core
URL is given to the client.
Each of these types is reported as a pair of numbers, the first of
which is a basic type defined by Cosm, and the second of which is a
subtype determined by the client.
The most detail is available for Windows systems running on Intel CPUs.
The numbers are sometimes stored in little-endian order in queue.dat,
and sometimes in big-endian order, with no clearly consistent pattern.
If qd recognizes the pair of numbers, it prints a string which
interprets them.
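Because the byte order is not consistent, a reader of the file has to
try both interpretations of each raw word; how qd actually
disambiguates them is internal to qd, but the decoding itself looks
like this:

```python
import struct

def read_u32_both_orders(raw):
    """Decode 4 raw bytes as both a little-endian and a big-endian
    unsigned 32-bit integer; the caller then decides which reading
    makes sense."""
    (le,) = struct.unpack("<I", raw)
    (be,) = struct.unpack(">I", raw)
    return le, be

print(read_u32_both_orders(b"\x01\x00\x00\x00"))  # (1, 16777216)
```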
assignment info (le): Mon Aug 9 23:36:38 2004; AC1213B0
This time stamp and data word are sent to the client by the assignment server
along with the IP address of the work server and other information.
The data word is a checksum derived from the IP address and the time
stamp.
The client sends all three words to the work server as validation of the
assignment.
These values can be stored in
queue.dat in either
little-endian or big-endian byte order.
It is unknown why it can be either way, but the notation in parentheses,
either "(le)" or "(be)" indicates how it actually is stored in this entry.
CS: 171.67.89.100
This is the collection server IP address, which is not always present.
If it is given, it identifies an alternate server to which the results can
be returned if there are too many errors attempting to upload to the actual
work server.
If work is returned to the collection server, credit for it will occasionally
be delayed in the Stanford statistics, but this is better than having the
work deleted by the client if its deadline expires before the server can be
reached successfully.
upload failures: 1
This field is never printed for the current unit, and for anything
other than a unit pending upload it is printed only if the number of
failures is nonzero.
The client uses this field to decide when to try sending results to the
collection server instead of to the actual work server.
P limit: 5241856
This is a packet size limit, used to determine how much memory should be
allocated when returning results.
Its default size seems to be about five megabytes, but if "large units" are
selected, the client will set the value to ten times that size.
If the actual results file is bigger than the packet limit, the client will
be unable to return it to the server.
DL: Sat Sep 18 01:47:09 2004
[see index 6]
This field is usually absent if a collection server address is given for this
unit.
It seems to be a date roughly corresponding to the end of the download, from
the work server's perspective.
It is an epoch 2000 value, most likely based on the server's clock.
user: rph_iv
This is the UserName as configured when the unit started.
The UserName actually sent to the server when the unit is uploaded is what
is current at that time.
team: 0
This is the TeamNumber as configured when the unit started.
The TeamNumber actually sent to the server when the unit is uploaded is
what is current at that time.
ID: 3539220A7A46065D
This ID is used by the server to identify the user's machine, and to validate
the work unit when it is returned.
The number stored in queue.dat, and printed here, is really the sum of
the assigned UserID and the configured MachineID, as it was when the
unit started.
The ID actually sent to the server when the unit is uploaded is what is
current at that time on the machine which does the upload.
Ideally, it should match the ID to which the unit is assigned, to within the
allowed configurable range of the MachineID, or the uploaded unit might be
rejected.
Recent experiments with transferring work among machines have shown the
servers usually to be more permissive, but relying on this would probably
be a bad idea.
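The sample entry illustrates the arithmetic: with mach ID 1, the
printed ID 3539220A7A46065D implies an assigned UserID exactly one
less. The UserID below is derived from the sample by subtraction, not
independently known:

```python
user_id = 0x3539220A7A46065C  # hypothetical: the sample ID minus the MachineID
machine_id = 1
stored_id = user_id + machine_id
print(f"{stored_id:016X}")  # 3539220A7A46065D, as printed for this entry
```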
mach ID: 1
This is the MachineID as configured when the unit started.
When the MachineID and UserID are added together and sent to the server with an
uploaded unit, the client uses the MachineID which is current at that time.
work/wudata_08.dat file size: 1290111
This is used as one of the checks for a proper work unit download.
WU type: Folding@Home
This actually isn't a work unit type at all, but simply a text message which
the data server can insert in a downloaded unit.
It usually just says "Folding@Home", and sometimes nothing at all.
Average download rate 5.267 KB/s (u=4); upload rate 3.403 KB/s (u=4)
These rates are measured by the client as a benchmark of system network
performance.
They are a sliding window average weighted over the recorded number of
units.
This number is capped by the client at four, so the average tends to track
more recent performance.
It doesn't mean that only the last four units are averaged, but rather that
the stored value is weighted four times greater than the value from a new
unit, so in fact it determines the characteristic rate at which the effects
of older units decay.
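One reading of that description is an update rule like the following (a
reconstruction from the text, not the client's actual code):

```python
def update_average(old_avg, units, new_value):
    """Fold a new unit's measurement into the stored average, weighting
    the stored value by the recorded unit count, which is capped at 4.
    Once the cap is reached, each new unit contributes 1/5 of the
    result, so the influence of older units decays geometrically."""
    u = min(units, 4)
    new_avg = (old_avg * u + new_value) / (u + 1)
    return new_avg, min(u + 1, 4)

# With u already at the cap, a 6.0 KB/s unit moves a stored 5.267 KB/s
# average one fifth of the way toward 6.0.
print(update_average(5.267, 4, 6.0)[0])
```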
Performance fraction 0.819036 (u=4)
This value is calculated by the client as a benchmark of the system's
ability to complete work units quickly.
It is a sliding window average of the fraction of the deadline time remaining
when a unit is completed, weighted over the recorded number of units.
The number of units is capped at four, exactly as it is for the network
download and upload rates.
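For a single unit, the quantity being averaged would look like this (a
sketch of the description; the per-unit value is then fed through the
same capped weighted averaging used for the network rates):

```python
from datetime import datetime

def deadline_fraction_remaining(begin, end, due):
    """Fraction of the deadline period still left when the unit
    finished; this per-unit value is what gets averaged."""
    total = (due - begin).total_seconds()
    used = (end - begin).total_seconds()
    return 1.0 - used / total

# The sample's index 8 unit used about a third of its 17-day deadline,
# leaving roughly 0.66 of the period remaining.
begin = datetime(2004, 8, 9, 23, 41, 10)
end = datetime(2004, 8, 15, 18, 19, 58)
due = datetime(2004, 8, 26, 23, 41, 10)
print(round(deadline_fraction_remaining(begin, end, due), 3))
```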
Average pph: 0.329, ppd: 7.89, ppw: 55.2, ppy: 2663
This line appears at the bottom of the report as a projection made by
qd of how many points this machine should score over intervals of an
hour, day, week, and year, if it continues to earn points at the rate
it has earned them while calculating the units reported in the
preceding printout.
It is a rough indicator of how effective this machine is at calculating
Folding@home work units.
The final "ppy" figure corresponds to the "Annual score benchmark"
printed by qd prior to functional revision "018".
Status codes
Here is a list of all the possible status codes that can be printed for a
queue entry.
When qd was first written, the client had no publicly-available names,
so qd made up names according to the status of the work unit, from the
perspective of the user.
The following list is available if qd is started with the -h flag:
The status code for each queue entry may be interpreted as follows:
(0) empty
The queue entry has never been used, or has been completely cleared.
(0) deleted
The unit was explicitly deleted.
(0) finished
The unit has been uploaded. The queue entry is just history.
(0) garbage
The queue entry is available, but its history is unintelligible.
(1) folding now
The unit is in progress. Presumably the core is running.
(1) queued for processing
The unit has been downloaded but processing hasn't begun yet.
(2) ready for upload
The core has finished the unit, but it is still in the queue.
(3) DANGER will be lost if client is restarted!
Bug before V3b5, neglected to post status (1).
(3) abandoned
Bug before V3b5, neglected to post status (1), and client was restarted.
(4) fetching from server
Client presently contacting the server, or something failed in download.
If this state persists past the current unit, the queue entry will be
unusable, but otherwise things will go on as usual.
(?) UNKNOWN STATUS = ??
Something other than 0 to 4.
The numbers in parentheses are the actual value used for that status
code.
Code (0) is called four different names depending on other data found
in the queue entry.
Several months after all this was written and working as described, the
guys at Stanford added the -queueinfo option to the client, and they
gave their own names to these same codes, describing, from the
perspective of the client, what it needs to do about each queue entry.
Status code (3) really never happens any more, but here is what the
client calls the others.
(0) empty
The queue index is available for reuse by a new unit.
(1) active
The unit files are actively processing, and the core should be running.
(1) ready
The unit files are ready for processing but are queued, and the core
should be started to process them when earlier units are finished.
(2) finished
The core has finished the unit, and it needs to be uploaded.
(4) fetching
The index is allocated to a unit being fetched from a server.
The "fetching" status was not handled correctly until client version
5.0, which will free the index if it finds that status in the queue.
If an earlier client was restarted and discovered the "fetching" status
at one of the queue indices, it simply skipped over that index forever
afterward, never reclaiming it for further use.
It's unfortunate that the word "finished" is used with differing
meanings.
In the output from qd, "finished" means the unit is really all done,
and the information still stored is just history.
But when it's the client that says "finished", it means only that the
processing has finished; the work has not been sent back yet.