qgen Explanation
This page is an explanation of how to use the qgen
program.
Please read it, even though it is long.
What you need to do isn't really very difficult, and will take only a few
minutes once you understand it, but if you do things wrong, you could
interfere with your currently running client and lose work instead of
recovering it.
What does qgen do?
This program provides a means of recovering work when the Folding@home
queue.dat file has been lost or damaged, or it shows
completed results as having been deleted or uploaded when in fact they have
not reached Stanford.
It examines the present contents of the
work
subdirectory and, with help from several other files, constructs
a new
queue.dat file for sending in orphaned work units or
continuing to process them.
Along with
queue.dat it also produces a modified
client.cfg file, and on Linux,
machinedependent.dat, so uploaded work will be
credited to the appropriate UserName and TeamNumber.
Unless the original
queue.dat file has been completely
lost, and the machine it was originally on is the same machine being used for
the recovery,
qgen should be run in a new directory
specifically created for the purpose.
It will set up the client configuration with MachineID=8 for the recovery.
Why would I want to run qgen?
The most common reasons are as follows:
- You have transported completed wuresults_xx.dat
files from a machine not connected to the Internet, but for some reason you
were not able to copy their associated queue.dat
file too.
- You ran a client in "nonet" mode through all ten queued units, and
then it went back and cleared the first queue slot before it stopped,
thus leaving that first results file disconnected.
- Your client got a lot of download errors and cycled all the way through
the queue after finishing a unit.
Sometimes this leaves results unable to return.
- You restored a backup of the folding directory which overwrote
queue.dat, but left later results still in the
work subdirectory.
- You want to transfer new or completed units from a PC to a Mac or the other
way around, and need a way to convert the queue.dat
file format.
- Someone else unwilling or unable to run qgen has
sent you some completed results, and you have agreed to try to help return
them.
- Your client got an error probably due to a transient hardware malfunction,
then deleted its current work without notifying the Stanford server and
downloaded a new unit.
You think you can recover all the deleted work files but you have no backup of
queue.dat.
- Because of previous recoveries either by qgen or
from backups, you have more clients on your machine than you want, and you
would like to consolidate their queues.
- You loaded up a V4 client after running V5 and it destroyed everything.
- You were trying to be too clever doing something else and somehow blew
away your queue.dat.
There are also several situations when running
qgen is
not the best thing to do:
- You have discovered a wuresults_xx.dat file left
over in your work directory, but in fact it was
uploaded but just not deleted by the client.
This happens sometimes.
It's best to check your stats, and check your client log to see if you got
the "Thank you" message.
There is nothing to be gained by sending in work more than once, and it is just
a lot of trouble.
- The client was shut down just as the core was finishing, and when it
restarted the core, it got an error which made it delete the unit and fetch
a new one.
In this case the results might be orphaned, but it will be much easier to
reattach them using qfix.
Proceed with qgen only if qfix
can't do the job, and you still want to go ahead after seeing what
qfix says is wrong.
- The same as the previous case, except that the core restarted the work
from the beginning.
In this case, if there is no results file, there is nothing to recover, but if
it exists, and you are sure it didn't get truncated if the core was killed
before it shut itself down, then you can either move the results file
elsewhere and use qgen or else delete the current unit
and use qfix to reattach it.
The client won't delete the results if you give it the
-delete N
flag.
- Some sort of error on your machine caused the core to end the unit early,
and the client then sent the partial results back to Stanford.
Even if you can recover the files and complete the computation, the results
won't be expected by the server, and they won't be accepted.
- The client is running normally, but it isn't getting the assignments
you think it should.
Leave it alone.
Fussing with the queue won't make things any better, and it might make them
worse.
What doesn't qgen do?
It doesn't fix problems which aren't related to the queue statuses of existing
work units.
It doesn't repair any work files; in fact it doesn't modify anything not
explicitly mentioned here.
It doesn't allow anyone to cheat, beyond doing a few questionable things with
credit designations which could just as well be done with the client
itself.
It won't reattach a unit which has expired.
It won't allow changing the credit designation after a unit has already
successfully uploaded.
What information does qgen need?
The units to be recovered must be preloaded into the
work subdirectory.
Any
wuresults_xx.dat or
wudata_xx.dat files found there will be considered
for requeuing.
For each of them,
qgen will need to find the server
IP address the unit must be returned to.
This information will normally come from a copy of the Stanford
psummaryC.html page, although it can also come from
a sample old
queue.dat file from the machine the
units were originally run on.
For the credit designations,
qgen normally gets the
UserName and TeamNumber from the old
queue.dat.
It also tries to pick up the CPUID (UserID plus MachineID) from the old
queue.dat, and configuration options needed to run the
client on the machine doing the recovery are found in a sample
client.cfg.
If the old
queue.dat or
psummaryC.html is not available, missing information
can be supplied via a short text file
qgen.txt.
What could go wrong?
Aside from the normal sorts of system and operator issues, there can be
problems with the data to be recovered.
If the work has expired, or the IP address it should be returned to can't be
found, it won't be requeued.
If the credit designation information isn't available,
qgen will exit.
If the recovery client is earlier than Version 5, it will trash the recovered
queue, which is in V5 format.
If the client isn't started with the proper flags, it might try to fetch a
new unit to the recovery directory, which will most likely not be convenient,
and possibly will lead to trouble later, since there are a few obscure
data fields in
queue.dat which
qgen is unable to fill in.
If in doubt, configure the client to ask before connecting to the Internet.
If there is already a client on the machine using MachineID=8, there will be
a conflict.
Most likely it won't produce ill effects unless the recovery client tries to
fetch new work, but there will at least be a warning message.
If the recovery is being done under Windows, and the recovery client is not
started with the
-local
flag, the client might run in the
wrong directory and trash the files of the currently running (real) client.
If the recovery is being done after the client got an error and deleted the
work, and it notified the server of the failure, then requeuing the work
will be a waste of time.
The results, even if calculated to completion, will not be expected by the
server, and it will not accept them.
If the work files have been recovered from the Windows trash bin, they might
contain nothing but zeros.
Such a unit cannot, of course, be requeued.
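Before bothering with a recovery, you can check whether any recovered work files are zero-filled. What follows is only a minimal sketch in Python, not part of qgen; the function names are made up for illustration, and an empty file is treated the same as a zeroed one.

```python
from pathlib import Path


def is_all_zeros(path, chunk_size=65536):
    """Return True if the file is empty or holds nothing but zero bytes."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return True       # reached end of file without a non-zero byte
            if any(chunk):        # any() is True as soon as one byte is non-zero
                return False


def find_zeroed_work_files(work_dir="work"):
    """List the names of wu* files in work_dir that contain only zeros."""
    return sorted(
        p.name
        for p in Path(work_dir).glob("wu*")
        if p.is_file() and is_all_zeros(p)
    )
```

Any name returned by find_zeroed_work_files() is a file there is no point in requeuing.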
After the recovered files have been processed and/or returned, that directory
should not be used to process new units, because the environment is not quite
the way the client would set it up itself.
This is especially true on Linux, where the UserID in
machinedependent.dat has been fudged to make the CPUID
exactly match the client the units were assigned to when the MachineID was
whatever it was on that machine.
See the paragraph below under "What Else" if using
qgen
to consolidate folding directories.
When a Windows client is being used for recovery, the CPUID will not, in
general, match the CPUID to which the units were assigned.
This has not, in the past, been a problem, but there is no guarantee that
the servers at Stanford will not become stricter about this.
After the units have been returned to the servers by the recovery client,
the Stanford statistics will, if the recovery was done under Linux, show
that CPU as a Linux machine until the next work unit is returned by it.
If the recovery is done under Windows, because of the non-matching CPUID,
the statistics will show an extra (Windows) processor for that UserName.
Will qgen let me send in results more than once?
Yes, it will, but it would accomplish nothing.
It would be the same as if you backed up a normal folding directory just
before it returned results, and then restored and restarted it
repeatedly.
It will just bother the servers needlessly, and you won't get credit for
any completed work more than once.
How do I run qgen?
The program should be run in a directory separate from the main folding
directory, into which some of the following items have been placed.
Only the first four are absolutely and always required:
- The executable binary of qgen itself.
See below.
- A V5 console client.
Get it from Stanford, or copy it from the active folding directory on the
machine.
Under Linux, a symbolic link is fine.
The cores are not needed unless you are recovering
wudata_xx.dat and other core work files for a unit
only partially processed.
- A client.cfg file, suitable for the machine on which
qgen is being run, renamed to be called
client.old.
Get it from the active folding directory on the machine.
Under Linux, a symbolic link is fine.
- A work subdirectory, containing whatever
wuresults_xx.dat or
wudata_xx.dat files are to be requeued.
If partially processed units are being recovered, also put the rest of the
core work files into the work directory, or else the
unit will be restarted from the beginning.
If multiple results files are being recovered at once, they don't have to
be saved with names corresponding to their original index numbers; for example,
if you have two files both called wuresults_01.dat,
you can rename one of them wuresults_02.dat and it
will be just fine.
- A sample queue.dat file, if needed and available,
renamed to be called queue.old.
Get it, if possible, from the machine on which the results were
calculated.
The UserName, TeamNumber, CPUID, possibly some server IP and port addresses,
and expiration times will be extracted from it.
- A second queue.dat file, if desired (usually when
consolidating folding directories), renamed to be called
queue.ol2, from which additional server IP and port
addresses, and expiration times will be extracted.
- A recent copy of the Stanford "psummary" web page, if needed, from
which server IP addresses and expiration times will be read.
Get it with your web browser and save the file
psummary.html or
psummaryC.html in this directory.
There is no need to save all the images and other references which would be
needed by the browser to display the page.
- A plain text file qgen.txt, if needed, which can
augment or override any automatically-gleaned info.
Create it with a text editor if it is needed.
It can contain any of these records that are needed, in any order:
name <UserName>
team <TeamNumber>
cpuid <UserID+MachineID>
proj <project number> <server IP[:port]>
initial <index number>
There can be as many "proj" records as needed, and they can specify the
port number.
If the port is omitted, port 8080 will be used unless the project number appears
in queue.old with a different port number.
[Note that the angle brackets in the format above are used only to indicate
a field to be filled in.
Neither they nor the square brackets around the optional port number should
appear in the actual file.]
If you have queue.old and the "psummary" page, you
ordinarily should not need qgen.txt.
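If you do need qgen.txt, it can be written by hand, but here is a minimal Python sketch that assembles the records in the format listed above. The function name is made up for illustration; the UserName, TeamNumber and project number are borrowed from the sample output further down this page, and the server address 192.0.2.10 is only a placeholder, not a real Stanford server.

```python
def build_qgen_txt(name=None, team=None, cpuid=None, projects=None, initial=None):
    """Return qgen.txt contents; only the records you supply are emitted."""
    lines = []
    if name is not None:
        lines.append(f"name {name}")
    if team is not None:
        lines.append(f"team {team}")
    if cpuid is not None:
        lines.append(f"cpuid {cpuid}")      # passed through unchanged
    for project, server in (projects or []):
        # server is "IP" or "IP:port"; no brackets appear in the real file
        lines.append(f"proj {project} {server}")
    if initial is not None:
        lines.append(f"initial {initial}")
    return "".join(line + "\n" for line in lines)


# Example values only: the address below is a documentation placeholder.
text = build_qgen_txt(
    name="rph_iv",
    team=0,
    projects=[(926, "192.0.2.10:8080")],
)
print(text, end="")
```

Write the returned text to a file called qgen.txt in the recovery directory. The records may appear in any order, so the order used here is just a convenience.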
When
qgen is run, it will produce new
client.cfg and
queue.dat
files, and for Linux a
machinedependent.dat file.
It will set the MachineID to 8.
It would be a good idea at this point to run
qd to
check things.
If there are errors, you can fix them and rerun
qgen
as many times as necessary.
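Since qgen can be rerun freely, it may save a round trip to check the recovery directory for the items listed above before each run. This is only a rough Python sketch with a made-up function name. It does not look for the qgen binary or the client, because their file names vary by system, and it assumes the work files use a two-digit index as in wuresults_01.dat.

```python
import re
from pathlib import Path

# Assumption: requeueable files are named wuresults_NN.dat or wudata_NN.dat.
WORK_FILE = re.compile(r"^wu(results|data)_\d\d\.dat$")

OPTIONAL = ["queue.old", "queue.ol2", "psummary.html", "psummaryC.html", "qgen.txt"]


def check_recovery_dir(directory="."):
    """Return (problems, optional_files_present) for a recovery directory."""
    root = Path(directory)
    problems = []
    if not (root / "client.old").is_file():
        problems.append("client.old is missing")
    work = root / "work"
    if not work.is_dir():
        problems.append("work subdirectory is missing")
    elif not any(WORK_FILE.match(p.name) for p in work.iterdir()):
        problems.append("work contains no wuresults_xx.dat or wudata_xx.dat files")
    present = [n for n in OPTIONAL if (root / n).is_file()]
    return problems, present
```

An empty problems list only means the files are where qgen expects to find them; it says nothing about whether their contents are usable.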
When it all looks OK, it should then be possible to run the client with the
-local -verbosity 9 -send all
flags.
If there were incomplete wudata_xx.dat files also requeued, the -oneunit
flag is also advised (without -send all) so the MachineID=8 client won't
fetch new units.
It is up to the user to decide when to run the client in this mode, as it
will, of course, share time with the currently running (real) client
on the machine.
This should work on any system type.
On Windows, it will get the UserID from the registry, which will make it
look like a different machine to Stanford if it is not in fact the machine
on which the units were orphaned.
On Linux, the
-local
flag is unnecessary.
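The flag choices described above can be summarized in a few lines of Python. This is a sketch only: the function name is made up, it builds nothing but the argument list, and it leaves out -local on Linux simply because this page says it is unnecessary there.

```python
def recovery_client_flags(has_incomplete_units, on_linux=False):
    """Return the client flags suggested on this page for a recovery run."""
    flags = [] if on_linux else ["-local"]   # -local keeps Windows in this directory
    flags += ["-verbosity", "9"]
    if has_incomplete_units:
        flags.append("-oneunit")             # finish the unit, then stop; no -send all
    else:
        flags += ["-send", "all"]            # only finished results to upload
    return flags


print(" ".join(recovery_client_flags(has_incomplete_units=False)))
# prints: -local -verbosity 9 -send all
```

To actually start the client you would prepend the path to your own client executable, which is not named here because it differs from one installation to another.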
Where do I get qgen?
Three precompiled binaries of
qgen are available
here.
Depending on the download method, non-Windows users may have to change the
permissions of the downloaded file to make it executable.
What Else?
Here's what the output of
qgen looks like.
qgen v1.1
Found the following units to requeue:
index 3: + (finished) proj 724, run 21, clone 45, gen 104
index 4: + (finished) proj 926, run 44, clone 56, gen 2
index 5: + (incomplete) proj 1140, run 59, clone 80, gen 6
index 7: + (finished) proj 1308, run 682, clone 0, gen 0
index 8: + (incomplete) proj 0, run 0, clone 0, gen 0
index 9: - can't read file "work/wuresults_09.dat"
Designation:
UserName: rph_iv
TeamNumber: 0
CPUID: 3539220A7A46065D
Constructing files for the folding environment and new queue:
index 3: + OK for upload; proj 724, run 21, clone 45, gen 104
index 4: - not queued; can't find server IP address for project 926
index 5: + OK for processing; proj 1140, run 59, clone 80, gen 6
index 7: - not queued; expired Fri Nov 26 10:50:04 2004
index 8: - not queued; project number (and probably entire file) is zero
Units queued for processing: 1
Units queued for upload: 1
Errors: 4
done
Of course, this example has an absurd number of errors.
They have been included here simply to show the sorts of messages which are
possible.
If you get errors, you can always make appropriate repairs and rerun
qgen.
It will happily overwrite its own
queue.dat file,
although as soon as the client has changed anything in it,
qgen will refuse to run.
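If you have saved the text that qgen printed, a few lines of Python can pull out the units it rejected so you know what to repair before the next run. This is a sketch that assumes the "index N: + ..." and "index N: - ..." line format shown in the v1.1 sample above; other versions may print something different, and the function name is made up.

```python
import re

# Matches lines such as: index 7: - not queued; expired Fri Nov 26 10:50:04 2004
INDEX_LINE = re.compile(r"^\s*index\s+(\d+):\s+([+-])\s+(.*)$")


def rejected_units(qgen_output):
    """Return (index, message) for every line flagged with a minus sign."""
    rejected = []
    for line in qgen_output.splitlines():
        match = INDEX_LINE.match(line)
        if match and match.group(2) == "-":
            rejected.append((int(match.group(1)), match.group(3)))
    return rejected


sample = """index 3: + OK for upload; proj 724, run 21, clone 45, gen 104
index 7: - not queued; expired Fri Nov 26 10:50:04 2004"""
for index, message in rejected_units(sample):
    print(index, message)
# prints: 7 not queued; expired Fri Nov 26 10:50:04 2004
```

Run against the full sample above, this finds four flagged lines, which agrees with the "Errors: 4" summary.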
As mentioned above, it is possible to use
qgen to
consolidate two folding directories.
This isn't 100% recommended, since it is always possible that there is
something hidden in the
queue.dat format which
qgen doesn't construct quite right, and which could
somehow confuse the client later on.
Nothing of the sort is even suspected at this time; in fact,
qgen takes considerable pains to build the file just
right, but still, it isn't really the client's file.
The Stanford license agreement states:
"You may not alter the software or
associated data files."
The new
queue.dat, strictly speaking, isn't "altered"
— it's built from scratch — but continuing to run with it beyond
the expedient of recovering data which would otherwise be lost is arguably in
violation.
Use your own judgment.
When consolidating two folding directories, it will be simplest to think of it
as merging a second directory into the first one.
Here are the necessary steps:
- If you are running Linux, make a backup copy of
machinedependent.dat in the first folding directory.
- Stop both clients (or all clients if there are more than two).
- In the first folding directory, rename client.cfg
to be client.old.
- In the first folding directory, rename queue.dat
to be queue.old.
- Copy the second directory's queue.dat into the first
directory, renaming it to be queue.ol2.
- Copy the second directory's work subdirectory into
the first directory's work subdirectory.
Only copy the files beginning "wu".
Note that the files all have their queue index number encoded in the file
name; for example, the results file for index five would be called
wuresults_05.dat.
If copying any of these files would conflict with and overwrite an active
work file in the first folding directory, the index number can be changed
in the name; that is, the "_05" could be changed, for
example, to "_07" to avoid a conflict.
That would cause qgen to queue that work at index 7,
even though it originally came from index 5 in the second directory.
If you are copying a group of work files for a unit in progress, be sure to
rename them all to the same new unit number.
If it's easier to rename the files in the first directory, that will work
too, and it might be more convenient if you ultimately want to have the
work queued in a particular order.
- If you will have more than one unit queued for further processing, even
after you rename the files so they will be queued in the right order, you
might need to tell the client which queue index it should start at.
You can do that with an "initial" entry in
qgen.txt.
Without such an entry, the current index will be set to index 9.
- Run qgen.
You won't need the "psummary" page.
- Check everything with qd.
- If you are combining more than two folding directories, go back to the
fourth step (renaming queue.dat to queue.old in the first directory),
with the "second directory" now being the next to be merged in.
- Copy client.old back to
client.cfg.
This will set your MachineID back to its proper value.
- If you are running Linux, restore your backup copy of
machinedependent.dat in the first folding
directory. 
This will set your UserID back to its proper value.
- Restart the first client as it would normally be run.
Abandon the second client.
Once you are sure everything is working as it should, you can clean up the
second folding directory.
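Renaming a group of work files to a new index by hand, as described in the steps above, is easy to get wrong. Here is a minimal Python sketch that plans the renames first and refuses to overwrite anything. The function names are made up, and the pattern assumes every work file is named like wuresults_05.dat, that is, "wu", some letters, an underscore, a two-digit index and an extension. Check that against your own work directory before relying on it.

```python
import re
from pathlib import Path

# Assumption: work files look like wu<letters>_NN.<extension>.
WU_NAME = re.compile(r"^(wu[A-Za-z]+)_(\d\d)(\..+)$")


def plan_reindex(work_dir, old_index, new_index):
    """Return (source, target) paths moving every wu*_OLD.* file to _NEW."""
    work = Path(work_dir)
    pairs = []
    for path in sorted(work.iterdir()):
        match = WU_NAME.match(path.name)
        if match and int(match.group(2)) == old_index:
            target = work / f"{match.group(1)}_{new_index:02d}{match.group(3)}"
            if target.exists():
                # Never overwrite an active work file.
                raise FileExistsError(f"{target} already exists")
            pairs.append((path, target))
    return pairs


def apply_reindex(pairs):
    """Carry out the renames produced by plan_reindex()."""
    for source, target in pairs:
        source.rename(target)
```

For example, apply_reindex(plan_reindex("work", 5, 7)) would move the whole group from index 5 to index 7. Do this only while the clients are stopped.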