====== Troubleshooting ======
//**Q:** The jobs don't seem to be progressing, how do I tell if jobs are being processed?//
**A:** Use 'ps' to see which jobs are currently running: ps -u fossy -f. Also use top: top -u fossy. If jobs are scheduled but none of the processes are consuming CPU cycles, the scheduler might be hung. However, if any jobs are running and consuming CPU cycles, then probably nothing is hung; some jobs just take a while. (Processing large files, like ISO images, can take hours.)
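One way to tell a slow job from a hung one is to compare cumulative CPU time across two snapshots taken a few seconds apart. The following is a minimal sketch and not part of FOSSology: the helper name "active_pids" and the snapshot file names are hypothetical, and it assumes a ps that accepts "-o pid=,time=" as POSIX specifies.

```shell
#!/bin/sh
# active_pids BEFORE AFTER (hypothetical helper): each file holds
# "PID TIME" lines from ps. Print the PIDs whose cumulative CPU time
# changed between the two snapshots.
active_pids() {
    awk 'NR == FNR { seen[$1] = $2; next }
         ($1 in seen) && seen[$1] != $2 { print $1 }' "$1" "$2"
}

# Example use on a FOSSology host (not run here; file names are placeholders):
#   ps -u fossy -o pid=,time= > /tmp/before
#   sleep 10
#   ps -u fossy -o pid=,time= > /tmp/after
#   active_pids /tmp/before /tmp/after
```

If the helper prints nothing while jobs are scheduled, no fossy process used a measurable amount of CPU during the interval. Because ps reports CPU time in whole seconds, a short interval can miss lightly loaded processes.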
Use 'psql' to check the database:
SELECT * FROM scheduler_status WHERE agent_status='RUNNING';
At minimum, the scheduler itself should be listed as running. Look at the last update time. The scheduler updates this table every 10 seconds. If the time is more than 10 seconds old (allow up to 20 seconds as a margin), then the scheduler is likely hung. This can happen if there was a connectivity issue with the database.
Assuming the scheduler_status table is updating, check the log table for errors, most recent first:
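The staleness rule above can be expressed as a small check. This is only a sketch: the helper name "scheduler_stale" is hypothetical, and because this page does not name the column that holds the last update time, the helper takes both timestamps as Unix epoch seconds rather than querying the table itself.

```shell
#!/bin/sh
# scheduler_stale LAST NOW [LIMIT] (hypothetical helper).
# LAST and NOW are Unix epoch seconds. LIMIT defaults to 20 seconds,
# the generous margin suggested in the text. Prints "stale" or "fresh".
scheduler_stale() {
    last=$1
    now=$2
    limit=${3:-20}
    if [ $(( now - last )) -gt "$limit" ]; then
        echo "stale"
    else
        echo "fresh"
    fi
}

# Example use, once the last update time has been converted to epoch
# seconds and stored in $last (assumes a date that supports +%s):
#   scheduler_stale "$last" "$(date +%s)"
```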
SELECT * FROM log ORDER BY log_pk desc;
If the number of jobs in the show jobs screen does not decrease over time, then either the scheduler is no longer running or the applications have lost contact with the database.
You can enable scheduler logging by creating a file "/etc/default/fossscheduler". The contents should specify a log file:
SCHEDULEROPT="-d -L /tmp/fossscheduler.log"
Then restart the scheduler:
sudo /etc/init.d/fossscheduler stop
Use "ps -ef | grep scheduler" to confirm that the scheduler stopped. If it is still running, use "sudo kill -9" with its process ID. Then start it again:
sudo /etc/init.d/fossscheduler start
tail -f /tmp/fossscheduler.log
The log file identifies every running job. If a job fails, the scheduler will log exactly what was running and the parameters that were passed to the scheduler.
//**Q:** What are some common failure causes?//
**A:** There are a few things that can quickly cause failures:
* **Database restart**. If connectivity to the database drops for any reason, the scheduler and all agents will fail.
* **Network connectivity**. When using FOSSology across remote hosts, a drop in network connectivity can lead to process failures.
* **Bad URLs**. The #1 cause of wget_agent failures is bad URLs. Invalid URLs and bad proxy settings both cause problems. Running wget with the provided URL as user fossy quickly shows the problem.
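The URL check described in the last item can be scripted. The following is a sketch under stated assumptions: the helper name "check_url" and the "WGET" override variable are hypothetical, and it relies on GNU wget's "--spider" option, which probes a URL without saving the file.

```shell
#!/bin/sh
# check_url URL (hypothetical helper): probe URL with wget without
# downloading it, then report success or failure. The WGET variable
# can replace the wget command, which is useful for dry runs.
check_url() {
    if ${WGET:-wget} --spider --quiet "$1"; then
        echo "OK: $1"
    else
        echo "FAILED: $1"
    fi
}

# Example use (not run here). Run it as user fossy so that the same
# environment and proxy settings apply as for wget_agent:
#   check_url "http://example.com/package.tar.gz"
```

On failure, rerunning wget by hand without "--quiet" shows the underlying error message.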
//**Q:** How do I reset a failed job?//
**A:** If a job failed due to a connectivity or network problem (and the problem has been resolved), then you can reset it with the following SQL command. Note that the command resets every failed job in the queue, not just one:
UPDATE jobqueue SET jq_starttime=NULL,jq_endtime=NULL,jq_end_bits=0 WHERE jq_end_bits=2;
When jobs fail, the jq_end_bits field is set to "2".
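Before resetting, it can help to see how many jobs are affected. The sketch below prints SQL that counts the failed jobs and then applies the same UPDATE inside one transaction. The helper name "reset_failed_sql" is hypothetical, and the database name in the usage comment is an assumption; only the table and column names come from this page.

```shell
#!/bin/sh
# reset_failed_sql (hypothetical helper): print SQL that reports the
# number of failed jobs (jq_end_bits=2) and then resets them, all in
# a single transaction.
reset_failed_sql() {
    cat <<'EOF'
BEGIN;
SELECT count(*) AS failed_jobs FROM jobqueue WHERE jq_end_bits=2;
UPDATE jobqueue SET jq_starttime=NULL,jq_endtime=NULL,jq_end_bits=0 WHERE jq_end_bits=2;
COMMIT;
EOF
}

# Example use (not run here; the database name "fossology" is an assumption):
#   reset_failed_sql | psql fossology
```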
//**Q:** Why do some jobs take so long to finish?//
**A:** Some jobs contain thousands of licenses and thousands of files. It takes time to read and analyze all of them and store the analysis in the database. By examining the show jobs output for your job, you can determine how far along the various agents are. For example, if the license agent output shows 105/2000, it has processed 105 of the 2000 licenses it found, so there is a lot of work left to do for this job. Depending on many factors, a job can take anywhere from 10 minutes to over a day to process.
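To make the 105/2000 example concrete, the arithmetic can be sketched as a small helper. The name "progress" is hypothetical, and the percentage is truncated to a whole number.

```shell
#!/bin/sh
# progress DONE TOTAL (hypothetical helper): print the whole-number
# percentage completed and the number of items still to process.
progress() {
    done_count=$1
    total=$2
    if [ "$total" -le 0 ]; then
        echo "unknown"
        return 1
    fi
    echo "$(( done_count * 100 / total ))% done, $(( total - done_count )) remaining"
}

progress 105 2000   # prints "5% done, 1895 remaining"
```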
===== Factors Influencing Job Performance =====
* If much of the code has already been analyzed due to previous uploads of the same code, the job will run much faster.
* The amount of memory available. Use as much memory as possible on all servers.
* The type of server(s) in use. Performance is much improved by the use of separate machines for the agents to run on.
* Moving the database to its own machine also improves performance.
* External disk arrays can also help with database and agent performance.