
fossology-scheduler

The fossology scheduler daemon processes the fossology job input queue and manages the agents that run the queued jobs.

The scheduler keeps track of three things: jobs, agents, and running tasks. It records its status in the scheduler_status table. The scheduler assigns each job to one or more agents and ensures that not too many jobs run at the same time. The number of agents, their hosts, and the agent parameters are all specified in the scheduler's config file (typically /usr/local/etc/fossology/Scheduler.conf).

The logic works as follows:

  • Read in a job from the jobqueue (or from stdin if -I is used).
  • Check if there is an available agent already running. If so, then use it to process the job.
  • If there is no available agent running, check if there is room to spawn a new agent. If so, spawn the new agent and give it the job.
  • If there is no room, see if some other kind of agent can be killed to make room. (If an agent is sitting unused, then kill it.) Then spawn the right kind of agent and give it the job.
  • Otherwise, hold onto the job until it can be assigned to an agent.
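
The steps above can be sketched in a few lines of Python. This is only an illustration of the decision order, not the scheduler's actual code: the Agent and Host classes, the max_agents field, and the assign function are hypothetical names invented for this sketch.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Agent:
    kind: str            # agent type, e.g. "unpack" (hypothetical model)
    busy: bool = False


@dataclass
class Host:
    max_agents: int                              # how many agents may run at once
    agents: list = field(default_factory=list)   # agents currently running


def assign(host, job_kind):
    """Return the agent that takes the job, or None if the job must wait."""
    # Step 1: reuse an idle agent of the right kind if one is already running.
    for agent in host.agents:
        if agent.kind == job_kind and not agent.busy:
            agent.busy = True
            return agent
    # Step 2: if there is no room, kill an idle agent of some other kind.
    if len(host.agents) >= host.max_agents:
        idle = [a for a in host.agents if not a.busy]
        if not idle:
            return None                          # everything is busy: hold the job
        host.agents.remove(idle[0])
    # Step 3: spawn the right kind of agent and give it the job.
    agent = Agent(job_kind, busy=True)
    host.agents.append(agent)
    return agent
```

A caller would keep any job for which assign returns None and try again once an agent reports that it is free.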

For a detailed, developer-oriented discussion of fossology-scheduler, see the scheduler page.

The scheduler must run as user 'fossy'. If it is started as root, then it will immediately change itself to run as user 'fossy' in group 'fossy'.
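
For readers unfamiliar with this pattern, here is a minimal Python sketch of how a process started as root can switch to an unprivileged user and group. It is an illustration only and is not taken from the scheduler's source; the function name drop_privileges is hypothetical, and the order of calls (supplementary groups, then group, then user) is the usual convention rather than something this page documents.

```python
import grp
import os
import pwd


def drop_privileges(user="fossy", group="fossy"):
    """Switch to user/group when running as root; return True if a switch happened."""
    if os.geteuid() != 0:
        return False                       # not root: nothing to drop
    uid = pwd.getpwnam(user).pw_uid        # raises KeyError if the user is missing
    gid = grp.getgrnam(group).gr_gid       # raises KeyError if the group is missing
    os.setgroups([gid])                    # drop root's supplementary groups first
    os.setgid(gid)                         # change group while still root
    os.setuid(uid)                         # finally give up root itself
    return True
```

The group must be changed before the user, because once the process is no longer root it is not allowed to change its group.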

SYNOPSIS

/usr/local/fossology/agents/fossology-scheduler [options] [setup.conf] < 'type command'
Common usage:
/usr/local/fossology/agents/fossology-scheduler -d -L /tmp/foss-scheduler.log

DESCRIPTION

This is the fossology job scheduler. When an upload is analyzed by fossology, the scheduler takes the upload and schedules a number of agents to operate on it. The most basic agents do things like unpack the archive and store it in the repository and the database. Other agents perform tasks like license analysis and metadata analysis.

The scheduler uses a configuration file to schedule the agents. This allows for flexible scheduling depending on the machine resources available. Usually the agent configuration file is created when fossology is installed. If setup.conf is not specified, then /usr/local/share/fossology/agents/foss-scheduler.conf is used.

Advanced users or programmers adding agents to the system may need to regenerate this file or edit it directly. A custom file can also be used.

The setup.conf file defines each kind of known agent and how to run it. The list of jobs to run comes from the database's jobqueue table. Alternatively (for debugging), -I can be used to supply the jobs to run on stdin.

Configuring the Scheduler

The scheduler uses a configuration file to specify the number of processes per host and the command for each agent. A configuration file creator script, mkschedconf, is available to aid in the creation of this file. For example, here is a snippet of a config file where all the agents run on localhost:

%Host localhost 2 1
agent=wget host=localhost | /usr/local/fossology/agents/wget_agent
agent=unpack host=localhost | /usr/local/fossology/agents/engine-shell unpack '/usr/local/fossology/agents/ununpack -d /home/repository//ununpack/%{U} -qRCQx'
agent=filter_license host=localhost | /usr/local/fossology/agents/Filter_License
agent=filter_license host=localhost | /usr/local/fossology/agents/Filter_License
agent=license host=localhost | /usr/local/fossology/agents/bsam-engine -L 20 -A 0 -B 60 -G 10 -M 2 -E -T license -O n -- - /usr/local/share/fossology/agents/License.bsam
agent=mimetype host=localhost | /usr/local/fossology/agents/mimetype
agent=mimetype host=localhost | /usr/local/fossology/agents/mimetype
agent=specagent host=localhost | /usr/local/fossology/agents/specagent
agent=filter_clean host=localhost | /usr/local/fossology/agents/filter_clean -s
agent=pkgmetagetta host=localhost | /usr/local/fossology/agents/pkgmetagetta
agent=pkgmetagetta host=localhost | /usr/local/fossology/agents/pkgmetagetta

Here is a snippet where agents will be spawned both on the localhost and across multiple machines via ssh:

agent=pkgmetagetta host=buckbeak.ostt | /usr/bin/ssh buckbeak.ostt "/usr/local/lib/fossology/agents/pkgmetagetta"
agent=selftest host=buckbeak.ostt | /usr/bin/ssh buckbeak.ostt "/usr/local/lib/fossology/agents/selftest -s"
agent=adj2nest host=buckbeak.ostt | /usr/bin/ssh buckbeak.ostt "/usr/local/lib/fossology/agents/adj2nest"
agent=fo_notify host=sirius.ostt | /usr/local/lib/fossology/agents/engine-shell fo_notify '/usr/local/bin/fo_notify %{*}'
agent=fosscp_agent host=sirius.ostt | /usr/bin/ssh buckbeak.ostt "/usr/local/lib/fossology/agents/engine-shell fosscp_agent '/usr/local/bin/cp2foss %{*}'"
agent=fossjobstat host=localhost | /usr/local/lib/fossology/agents/engine-shell fossjobstat '/usr/local/bin/fossjobstat %{*}'

The format of the configuration file is as follows:

  • Lines beginning with a “#” are comments.
  • Lines beginning with a “%” are settings.
  • %Verbose specifies the verbose level (same as using “-v” on the command line). %Verbose 2 is like “-vv”.
  • %Host lists a host name, the number of agents that can run at a time, and the number of urgent (additional) agents that can run. Currently “urgent” is implemented but not used and not tested.
  • All other lines define agents. The agent definitions use two parts: attributes | command.
  • There is one line per agent. If you want to permit three unpack agents on the same host, then you need to repeat the exact same line three times.
  • Attributes:
    • agent=name. This comes from the agent table and specifies the type of agent.
    • host=name. “name” is an identifier used to group agent lines. Typically, it is set to the name of the host to run the agent on. The actual system the agent will run on is specified in the command.
  • A vertical bar (|) separates the attribute list from the command.
  • The command will be used by system() to run the agent.
  • Each command is also passed an environment variable “$THREAD_UNIQUE”. This specifies the unique thread number for the process. NOTE: It is unique among the currently running processes, but if the child dies then the value will likely be reused. In some situations, this is better than $PID or $PPID for managing any temporary files.
  • Some commands may appear to contain macro expansion variables, like %{U} or %{*}. However, these are not processed by the scheduler. They are processed by the agent. For example, the agent called “engine-shell” is used to run ugly-hack agents from shell scripts.
  • Commands can be shells around agent processes. For example, “engine-shell” is an agent-aware wrapper for shell scripts. Similarly, you can use “ssh” (specify the full path!) to run a command on a remote host. Remember: The command that is executed is independent of the attribute string “host=”.
  • With the -I option, stdin lists the jobs to run. (For debugging)
  • stdout comes from threads, non-interleaved and only when the thread ends.
  • stderr comes from threads, interleaved and immediate.
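
The format rules above can be illustrated with a small Python sketch that reads a configuration file and counts how many agent lines exist per host label and agent type. It is not the scheduler's own parser: the function names parse_config and slots are hypothetical, and splitting at the first vertical bar only (so that a command may itself contain a bar) is an assumption of this sketch.

```python
from collections import Counter


def parse_config(text):
    """Split scheduler config text into settings and agent definitions."""
    settings, agents = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                               # blank line or comment
        if line.startswith("%"):
            name, *args = line[1:].split()
            settings.append((name, args))          # e.g. ("Host", ["localhost", "2", "1"])
            continue
        # Agent line: attributes | command. Assumption: only the first bar separates.
        attr_text, sep, command = line.partition("|")
        if not sep:
            raise ValueError(f"agent line has no '|': {line!r}")
        attrs = dict(item.split("=", 1) for item in attr_text.split())
        agents.append({"agent": attrs["agent"], "host": attrs["host"],
                       "command": command.strip()})
    return settings, agents


def slots(agents):
    """Count agent lines per (host label, agent type): one line is one agent."""
    return Counter((a["host"], a["agent"]) for a in agents)


sample = """\
# lines copied from the snippets above
%Host localhost 2 1
agent=mimetype host=localhost | /usr/local/fossology/agents/mimetype
agent=mimetype host=localhost | /usr/local/fossology/agents/mimetype
agent=selftest host=buckbeak.ostt | /usr/bin/ssh buckbeak.ostt "/usr/local/lib/fossology/agents/selftest -s"
"""
settings, agents = parse_config(sample)
print(settings)
print(slots(agents))
```

Note that the sketch keeps host= as a plain label and never inspects the command, which mirrors the rule that the command executed is independent of the “host=” attribute.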

Usage

Usage: ./fossology-scheduler [options] [setup.conf] < 'type command'

  • -d Run as a daemon! Still generates stdout and stderr
  • -i Initialize the database, then exit.
  • -h Print this usage message
  • -H Ignore hosts for host-specific agent requests
  • -I Use stdin and queue (default: use queue only). This is only for debugging the scheduler.
  • -k Kill the running schedulers. All other options are ignored.
  • -v Verbose (-v -v = more verbose)
  • -L log Send stdout and stderr to the file log
  • -q Run Quietly. Default shows all agent status (FREE, RUNNING, …) changes.
  • -R Reset the job queue in case something was hung.
  • -t Test every agent to see if it runs, then quit.
  • -T Test every agent to see if it runs, then continue if no problems.
  1. setup.conf: defines each engine – one 'type command' per line
  2. If setup.conf is not specified the installed conf file (/usr/local/etc/fossology/Scheduler.conf for package installs) is used.
  3. stdin lists type+data, one per line.
  4. stdout comes from threads, non-interleaved and only when the thread ends.
  5. stderr comes from threads, interleaved and immediate.
  6. Each stdin line is matched to a free engine of the same type.
  7. If no engine is free, then it will pause until one is available.
  8. Normally, partially processed jobqueue items are cleaned up automatically once the scheduler detects abandoned jobs in the queue. However, this detection may take 10-20 minutes. Use -R to reset the queue immediately.
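
The difference between notes 4 and 5 can be demonstrated with a short Python sketch: each task collects its stdout in a private buffer and writes it in one piece when it ends, while stderr is written at once and may mix with other tasks. This only models the behaviour described above with Python threads; the names run_task and run_all are hypothetical, and the real scheduler manages agent processes rather than Python threads.

```python
import io
import sys
import threading

_stdout_lock = threading.Lock()


def run_task(name, lines, out=sys.stdout, err=sys.stderr):
    """Hold a task's stdout until it ends; pass stderr through immediately."""
    buffer = io.StringIO()
    for text in lines:
        buffer.write(f"{name}: {text}\n")          # held back until the task ends
        err.write(f"{name}: working on {text}\n")  # immediate, may interleave
    with _stdout_lock:                             # emit the whole buffer in one piece
        out.write(buffer.getvalue())


def run_all(tasks, out=sys.stdout, err=sys.stderr):
    """Run every task in its own thread and wait for all of them."""
    threads = [threading.Thread(target=run_task, args=(name, lines, out, err))
               for name, lines in tasks.items()]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
```

Calling run_all({"unpack": ["a", "b"], "license": ["a", "b"]}) prints each task's stdout lines as one contiguous group, whichever task finishes first.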

Examples

The standard way to start and stop fossology-scheduler is:

sudo /etc/init.d/fossology start
sudo /etc/init.d/fossology stop

To validate the scheduler configuration file and verify that all agents will run:

sudo /usr/lib/fossology/foss-scheduler -t -L stdout
 
foss-scheduler.txt · Last modified: 2010/04/12 15:50 by laser

Copyright (C) 2007-2009 Hewlett-Packard Development Company, L.P.
FOSSology Project documentation is licensed under the GNU Free Documentation License Version 1.2