qgen

This page explains how to use the qgen program. Please read it, even though it is long. What you need to do isn't really very difficult, and will take only a few minutes once you understand it, but if you do things wrong, you could interfere with your currently running client and lose work instead of recovering it.

What does qgen do?

This program provides a means of recovering work when the Folding@home queue.dat file has been lost or damaged, or it shows completed results as having been deleted or uploaded when in fact they have not reached Stanford. It examines the present contents of the work subdirectory, and with help from several other files, it will construct a new queue.dat file for sending in or continuing processing of orphaned work units. Along with queue.dat it also produces a modified client.cfg file, and on Linux, machinedependent.dat, so uploaded work will be credited to the appropriate UserName and TeamNumber.

Unless the original queue.dat file has been completely lost, and the machine it was originally on is the same machine being used for the recovery, qgen should be run in a new directory specifically created for the purpose. It will set up the client configuration with MachineID=8 for the recovery.

Why would I want to run qgen?

The most common reasons are as follows:

You have transported completed wuresults_xx.dat files from a machine not connected to the Internet, but for some reason you were not able to copy their associated queue.dat file too.
You ran a client in "nonet" mode through all ten queued units, and then it went back and cleared the first queue slot before it stopped, thus leaving that first results file disconnected.
Your client got a lot of download errors and cycled all the way through the queue after finishing a unit. Sometimes this leaves results unable to return.
You restored a backup of the folding directory which overwrote queue.dat, but left later results still in the work subdirectory.
You want to transfer new or completed units from a PC to a Mac or the other way around, and need a way to convert the queue.dat file format.
Someone else unwilling or unable to run qgen has sent you some completed results, and you have agreed to try to help return them.
Your client got an error probably due to a transient hardware malfunction, then deleted its current work without notifying the Stanford server and downloaded a new unit. You think you can recover all the deleted work files but you have no backup of queue.dat.
Because of previous recoveries either by qgen or from backups, you have more clients on your machine than you want, and you would like to consolidate their queues.
You loaded up a V4 client after running V5 and it destroyed everything.
You were trying to be too clever doing something else and somehow blew away your queue.dat.

There are also several situations when running qgen is not the best thing to do:

You have discovered a wuresults_xx.dat file left over in your work directory, but in fact it was uploaded and simply not deleted by the client. This happens sometimes. It's best to check your stats, and check your client log to see if you got the "Thank you" message. There is nothing to be gained by sending in work more than once, and it is just a lot of trouble.
The client was shut down just as the core was finishing, and when it restarted the core, it got an error which made it delete the unit and fetch a new one. In this case the results might be orphaned, but it will be much easier to reattach them using qfix. Proceed with qgen only if qfix can't do the job, and you still want to go ahead after seeing what qfix says is wrong.
The same as the previous case, except that the core restarted the work from the beginning. In this case, if there is no results file, there is nothing to recover. If it exists, and you are sure it wasn't truncated by the core being killed before it shut itself down, then you can either move the results file elsewhere and use qgen, or else delete the current unit and use qfix to reattach it. The client won't delete the results if you give it the -delete N flag.
Some sort of error on your machine caused the core to end the unit early, and the client then sent the partial results back to Stanford. Even if you can recover the files and complete the computation, the results won't be expected by the server, and they won't be accepted.
The client is running normally, but it isn't getting the assignments you think it should. Leave it alone. Fussing with the queue won't make things any better, and it might make them worse.

What doesn't qgen do?

It doesn't fix problems which aren't related to the queue statuses of existing work units. It doesn't repair any work files; in fact it doesn't modify anything not explicitly mentioned here. It doesn't allow anyone to cheat, beyond doing a few questionable things with credit designations which could just as well be done with the client itself. It won't reattach a unit which has expired. It won't allow changing the credit designation after a unit has already successfully uploaded.

What information does qgen need?

The units to be recovered must be preloaded into the work subdirectory. Any wuresults_xx.dat or wudata_xx.dat files found there will be considered for requeuing. For each of them, qgen will need to find the server IP address the unit must be returned to. This information will normally come from a copy of the Stanford psummaryC.html page, although it can also come from a sample old queue.dat file from the machine the units were originally run on.

For the credit designations, qgen normally gets the UserName and TeamNumber from the old queue.dat. It also tries to pick up the CPUID (UserID plus MachineID) from the same file. The configuration options needed to run the client on the machine doing the recovery are taken from a sample client.cfg.

If the old queue.dat or psummaryC.html is not available, missing information can be supplied via a short text file qgen.txt.

What could go wrong?

Aside from the normal sorts of system and operator issues, there can be problems with the data to be recovered. If the work has expired, or the IP address it should be returned to can't be found, it won't be requeued. If the credit designation information isn't available, qgen will exit. If the recovery client is earlier than Version 5, it will trash the recovered queue, which is in V5 format. If the client isn't started with the proper flags, it might try to fetch a new unit to the recovery directory, which will most likely not be convenient, and possibly will lead to trouble later, since there are a few obscure data fields in queue.dat which qgen is unable to fill in. If in doubt, configure the client to ask before connecting to the Internet.

If there is already a client on the machine using MachineID=8, there will be a conflict. Most likely it won't produce ill effects unless the recovery client tries to fetch new work, but there will at least be a warning message.
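A quick way to spot such a conflict ahead of time is to look at the client.cfg of each folding directory on the machine. The sketch below is only an illustration: it assumes client.cfg contains a line of the form "machineid=<number>", and the directory names in the example are made up.

```python
import re

# Assumption: client.cfg holds a line of the form "machineid=<number>".
MACHINE_ID_RE = re.compile(r"^\s*machineid\s*=\s*(\d+)\s*$",
                           re.IGNORECASE | re.MULTILINE)


def machine_id(cfg_text):
    """Return the MachineID found in the text of a client.cfg, or None."""
    match = MACHINE_ID_RE.search(cfg_text)
    return int(match.group(1)) if match else None


def conflicting_dirs(cfg_texts, recovery_id=8):
    """List the directories whose client.cfg already uses recovery_id.

    cfg_texts maps a directory name to the text of its client.cfg.
    """
    return sorted(name for name, text in cfg_texts.items()
                  if machine_id(text) == recovery_id)


if __name__ == "__main__":
    # Hypothetical directories and settings, for illustration only.
    configs = {
        "fah1": "[settings]\nusername=rph_iv\nteam=0\nmachineid=1\n",
        "fah2": "[settings]\nusername=rph_iv\nteam=0\nmachineid=8\n",
    }
    print(conflicting_dirs(configs))
```

Any directory it lists is already using the MachineID which qgen will assign to the recovery client.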

If the recovery is being done under Windows, and the recovery client is not started with the -local flag, the client might run in the wrong directory and trash the files of the currently running (real) client.

If the recovery is being done after the client got an error and deleted the work, and it notified the server of the failure, then requeuing the work will be a waste of time. The results, even if calculated to completion, will not be expected by the server, and it will not accept them.

If the work files have been recovered from the Windows trash bin, they might contain nothing but zeros. Such a unit cannot, of course, be requeued.
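It is easy to check for this before going any further. The following is a minimal sketch, not part of qgen; the function names are invented for the example, and an empty file is reported as blank too, on the assumption that it is equally useless.

```python
from pathlib import Path


def is_all_zero(path, chunk_size=65536):
    """Return True if the file is empty or contains only zero bytes."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return True   # reached the end without seeing a nonzero byte
            if any(chunk):
                return False  # found at least one nonzero byte


def blank_work_files(work_dir="work"):
    """Names of the .dat work files in work_dir which are nothing but zeros."""
    return sorted(p.name for p in Path(work_dir).glob("wu*_*.dat")
                  if is_all_zero(p))
```

Calling blank_work_files("work") in the recovery directory lists the files which are not worth requeuing.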

After the recovered files have been processed and/or returned, that directory should not be used to process new units, because the environment is not quite the way the client would set it up itself. This is especially true on Linux, where the UserID in machinedependent.dat has been fudged to make the CPUID exactly match the client the units were assigned to when the MachineID was whatever it was on that machine. See the paragraph below under "What Else" if using qgen to consolidate folding directories.

When a Windows client is being used for recovery, the CPUID will not, in general, match the CPUID to which the units were assigned. This has not, in the past, been a problem, but there is no guarantee that the servers at Stanford will not become stricter about this.

After the units have been returned to the servers by the recovery client, the Stanford statistics will, if the recovery was done under Linux, show that CPU as a Linux machine until the next work unit is returned by it. If the recovery is done under Windows, because of the non-matching CPUID, the statistics will show an extra (Windows) processor for that UserName.

Will qgen let me send in results more than once?

Yes, it will, but it would accomplish nothing. It would be the same as if you backed up a normal folding directory just before it returned results, and then restored and restarted it repeatedly. It will just bother the servers needlessly, and you won't get credit for any completed work more than once.

How do I run qgen?

The program should be run in a directory separate from the main folding directory, into which some of the following items have been placed. Only the first four are absolutely and always required:

The executable binary of qgen itself. See below.
A V5 console client. Get it from Stanford, or copy it from the active folding directory on the machine. Under Linux, a symbolic link is fine. The cores are not needed unless you are recovering wudata_xx.dat and other core work files for a unit only partially processed.
A client.cfg file, suitable for the machine on which qgen is being run, renamed to be called client.old. Get it from the active folding directory on the machine. Under Linux, a symbolic link is fine.
A work subdirectory, containing whatever wuresults_xx.dat or wudata_xx.dat files are to be requeued. If partially processed units are being recovered, also put the rest of the core work files into the work directory, or else the unit will be restarted from the beginning. If multiple results files are being recovered at once, they don't have to be saved with names corresponding to their original index numbers; for example, if you have two files both called wuresults_01.dat, you can rename one of them wuresults_02.dat and it will be just fine.
A sample queue.dat file, if needed and available, renamed to be called queue.old. Get it, if possible, from the machine on which the results were calculated. The UserName, TeamNumber, CPUID, possibly some server IP and port addresses, and expiration times will be extracted from it.
A second queue.dat file, if desired (usually when consolidating folding directories), renamed to be called queue.ol2, from which additional server IP and port addresses, and expiration times will be extracted.
A recent copy of the Stanford "psummary" web page, if needed, from which server IP addresses and expiration times will be read. Get it with your web browser and save the file psummary.html or psummaryC.html in this directory. There is no need to save all the images and other references which would be needed by the browser to display the page.
A plain text file qgen.txt, if needed, which can augment or override any automatically-gleaned info. Create it with a text editor if it is needed. It can contain any of these records that are needed, in any order:
```
name <UserName>
team <TeamNumber>
cpuid <UserID+MachineID>
proj <project number> <server IP[:port]>
initial <index number>
 
```
There can be as many "proj" records as needed, and they can specify the port number. If it is omitted, port 8080 will be used unless the project number appears in queue.old with a different port number. [Note that the angle brackets in the format above are used only to indicate a field to be filled in. Neither they nor the square brackets around the optional port number should appear in the actual file.]
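If qgen.txt has to be produced for several recoveries, it can be generated rather than typed. Here is a small sketch which writes the records in the format shown above. The function name is invented for the example, and the server address used is a documentation placeholder, not a real Stanford server.

```python
def render_qgen_txt(name=None, team=None, cpuid=None, projects=None, initial=None):
    """Build the text of a qgen.txt file from whichever values are supplied.

    projects maps a project number to "IP" or "IP:port".
    """
    lines = []
    if name is not None:
        lines.append(f"name {name}")
    if team is not None:
        lines.append(f"team {team}")
    if cpuid is not None:
        lines.append(f"cpuid {cpuid}")
    for project, server in (projects or {}).items():
        lines.append(f"proj {project} {server}")  # one record per project
    if initial is not None:
        lines.append(f"initial {initial}")
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    # 192.0.2.10 is a placeholder address; substitute the real server IP.
    text = render_qgen_txt(name="rph_iv", team=0,
                           projects={926: "192.0.2.10:8080"}, initial=3)
    with open("qgen.txt", "w") as f:
        f.write(text)
```

Records which are not supplied are simply left out, so qgen falls back on whatever it finds in queue.old and the "psummary" page.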

If you have queue.old and the "psummary" page, you ordinarily should not need qgen.txt.
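Before running qgen, it can be worth checking that the directory really contains the required items. The sketch below does only that. It assumes the qgen binary carries one of the names from the download list, and since the name of the client executable varies, it is passed in as a parameter.

```python
from pathlib import Path

# Assumed binary names, taken from the download list on this page.
QGEN_NAMES = ("qgen", "qgen.exe", "mac_qgen")


def check_recovery_dir(directory, client_name):
    """Return (problems, notes) for a prospective qgen recovery directory."""
    d = Path(directory)
    problems, notes = [], []

    # The four items which are always required.
    if not any((d / name).exists() for name in QGEN_NAMES):
        problems.append("qgen binary not found")
    if not (d / client_name).exists():
        problems.append(f"client {client_name!r} not found")
    if not (d / "client.old").exists():
        problems.append("client.old not found")
    work = d / "work"
    units = []
    if work.is_dir():
        units = sorted(p.name for p in work.iterdir()
                       if p.name.startswith(("wuresults_", "wudata_"))
                       and p.suffix == ".dat")
    if not units:
        problems.append("no wuresults_xx.dat or wudata_xx.dat files in work/")

    # The optional items: warn only if qgen.txt might be needed.
    has_queue_old = (d / "queue.old").exists()
    has_psummary = any((d / n).exists()
                       for n in ("psummary.html", "psummaryC.html"))
    if not (has_queue_old and has_psummary) and not (d / "qgen.txt").exists():
        notes.append("queue.old or psummary page missing; qgen.txt may be needed")
    return problems, notes
```

An empty problems list means the four required items are present. Anything in notes is only a hint, since qgen itself decides what information is actually missing.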

When qgen is run, it will produce new client.cfg and queue.dat files, and for Linux a machinedependent.dat file. It will set the MachineID to 8. It would be a good idea at this point to run qd to check things. If there are errors, you can fix them and rerun qgen as many times as necessary.

When it all looks OK, it should then be possible to run the client with the -local -verbosity 9 -send all flags. If there were incomplete wudata_xx.dat files also requeued, the -oneunit flag is also advised (without -send all) so the MachineID=8 client won't fetch new units. It is up to the user to decide when to run the client in this mode, as it will, of course, share time with the currently running (real) client on the machine.

This should work on any system type. On Windows, it will get the UserID from the registry, which will make it look like a different machine to Stanford if it is not in fact the machine on which the units were orphaned. On Linux, the -local flag is unnecessary.
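The choice of flags described above can be summarized in a few lines. This is just a restatement of the advice as a sketch; the function name is invented, and it does not start the client.

```python
def recovery_client_flags(has_incomplete_units, on_linux):
    """Return the list of client flags suggested for a qgen recovery run."""
    flags = []
    if not on_linux:
        flags.append("-local")        # unnecessary on Linux
    flags += ["-verbosity", "9"]
    if has_incomplete_units:
        flags.append("-oneunit")      # finish the requeued unit, fetch nothing new
    else:
        flags += ["-send", "all"]     # only completed results to return
    return flags


if __name__ == "__main__":
    print(" ".join(recovery_client_flags(has_incomplete_units=False, on_linux=False)))
```

For a Windows recovery with only finished results, this gives the same -local -verbosity 9 -send all combination mentioned above.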

Where do I get qgen?

Three precompiled binaries of qgen are available here. Depending on the download method, non-Windows users may have to change the permissions of the downloaded file to make it executable.

Linux Version   qgen (14K)
Windows Version   qgen.exe (15K)
Mac OS X CLI Version   mac_qgen.gz (9K)

What Else?

Here's what the output of qgen looks like.

qgen v1.1

Found the following units to requeue:
  index 3: + (finished) proj 724, run 21, clone 45, gen 104
  index 4: + (finished) proj 926, run 44, clone 56, gen 2
  index 5: + (incomplete) proj 1140, run 59, clone 80, gen 6
  index 7: + (finished) proj 1308, run 682, clone 0, gen 0
  index 8: + (incomplete) proj 0, run 0, clone 0, gen 0
  index 9: - can't read file "work/wuresults_09.dat"

Designation:
  UserName:    rph_iv
  TeamNumber:  0
  CPUID:       3539220A7A46065D

Constructing files for the folding environment and new queue:
  index 3: + OK for upload; proj 724, run 21, clone 45, gen 104
  index 4: - not queued; can't find server IP address for project 926
  index 5: + OK for processing; proj 1140, run 59, clone 80, gen 6
  index 7: - not queued; expired Fri Nov 26 10:50:04 2004
  index 8: - not queued; project number (and probably entire file) is zero

Units queued for processing: 1
Units queued for upload: 1
Errors: 4

done

Of course, this example has an absurd number of errors. They have been included here simply to show the sorts of messages which are possible. If you get errors, you can always make appropriate repairs and rerun qgen. It will happily overwrite its own queue.dat file, although as soon as the client has changed anything in it, qgen will refuse to run.

As mentioned above, it is possible to use qgen to consolidate two folding directories. This isn't 100% recommended, since it is always possible that there is something hidden in the queue.dat format which qgen doesn't construct quite right, and which could somehow confuse the client later on. Nothing of the sort is even suspected at this time; in fact, qgen takes considerable pains to build the file just right, but still, it isn't really the client's file. The Stanford license agreement states: "You may not alter the software or associated data files." The new queue.dat, strictly speaking, isn't "altered" — it's built from scratch — but continuing to run with it beyond the expedient of recovering data which would otherwise be lost is arguably in violation. Use your own judgment.

When consolidating two folding directories, it will be simplest to think of it as merging a second directory into the first one. Here are the necessary steps:

If you are running Linux, make a backup copy of machinedependent.dat in the first folding directory.
Stop both clients (or all clients if there are more than two).
In the first folding directory, rename client.cfg to be client.old.
In the first folding directory, rename queue.dat to be queue.old.
Copy the second directory's queue.dat into the first directory, renaming it to be queue.ol2.
Copy the second directory's work subdirectory into the first directory's work subdirectory. Only copy the files beginning "wu". Note that the files all have their queue index number encoded in the file name; for example, the results file for index five would be called wuresults_05.dat. If copying any of these files would conflict with and overwrite an active work file in the first folding directory, the index number can be changed in the name; that is, the "_05" could be changed, for example, to "_07" to avoid a conflict. That would cause qgen to queue that work at index 7, even though it originally came from index 5 in the second directory. If you are copying a group of work files for a unit in progress, be sure to rename them all to the same new unit number. If it's easier to rename the files in the first directory, that will work too, and it might be more convenient if you ultimately want to have the work queued in a particular order.
If you will have more than one unit queued for further processing, even after you rename the files so they will be queued in the right order, you might need to tell the client which queue index it should start at. You can do that with an "initial" entry in qgen.txt. Without such an entry, the current index will be set to index 9.
Run qgen. You won't need the "psummary" page.
Check everything with qd.
If you are combining more than two folding directories, go back to the step where queue.dat is renamed to queue.old, with the "second directory" now being the next one to be merged in.
Copy client.old back to client.cfg. This will set your MachineID back to its proper value.
If you are running Linux, restore your backup copy of machinedependent.dat in the first folding directory. This will set your UserID back to its proper value.
Restart the first client as it would normally be run. Abandon the second client. Once you are sure everything is working as it should, you can clean up the second folding directory.
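The renaming of work files described in the steps above is the part most easily gotten wrong by hand, so here is a sketch which only plans the new names without touching any files. It assumes every work file is named like the wuresults_05.dat example (a "wu" prefix, an underscore, a two-digit index and an extension) and that the queue has ten slots numbered 0 through 9. Files which don't fit that pattern are left out of the plan.

```python
import re

# Assumed naming pattern, based on the wuresults_05.dat example.
WORK_FILE = re.compile(r"^(wu[^_]*_)(\d\d)(\..*)$")


def plan_merge(first_names, second_names, slots=10):
    """Map each work file name in the second directory to the name it should
    have in the first directory, moving whole units to free indexes."""
    def index_of(name):
        match = WORK_FILE.match(name)
        return int(match.group(2)) if match else None

    used = {index_of(n) for n in first_names} - {None}
    incoming = {index_of(n) for n in second_names} - {None}

    remap = {i: i for i in incoming - used}   # no conflict: keep the index
    used |= set(remap)
    for old in sorted(incoming - set(remap)):  # conflicts: move to a free index
        free = [i for i in range(slots) if i not in used]
        if not free:
            raise ValueError("no free queue index left")
        remap[old] = free[0]
        used.add(free[0])

    plan = {}
    for name in second_names:
        match = WORK_FILE.match(name)
        if match:  # all files of one unit get the same new index
            new_index = remap[int(match.group(2))]
            plan[name] = f"{match.group(1)}{new_index:02d}{match.group(3)}"
    return plan
```

The resulting dictionary can be reviewed by eye and then applied with an ordinary file copy, one entry at a time. If the units should end up in a particular order, it may be simpler to choose the indexes by hand instead.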