FOSSology Project Logo FOSSology
Advancing open source analysis and development
 

Agents

The entire FOSSology system is a combination of agents that run in series:

  1. The unpack agent extracts files to analyze.
  2. The filter_license agent creates the bsam cache files.
  3. The license analysis agent runs the bsam algorithm on the cache files. This generates the list of detected licenses.
  4. The filter_clean agent purges the bsam cache files that are no longer needed.
  5. The engine-shell agent is a generic agent that can turn any command-line program into an agent. This is currently used by the unpack agent.
  6. The mimetype agent associates a mime-type with every file.
  7. The pkgmetagetta agent extracts meta data from files.
  8. The specagent agent is a special subset of pkgmetagetta and designed for extracting information from RPM ”.spec” files.
  9. The sqlagent agent performs generic SQL requests. This is useful if the purpose of the agent is to perform some SQL actions.
  10. The delagent agent deletes uploads, folders, or license information. It can also be used from the command-line.

Here is a brief summary of each of the 'core' agents and their functioning:

Unpack

The unpack agent extracts files from containers. A container is any kind of file that stores other files. For example, a ZIP file contains an archive of different files. Other types of containers include tar, ar, ISO, and rpm files. The extraction is performed by the Universal Unpacker (ununpack) program.

  Universal Unpacker, version 0.9.6, compiled Feb 7 2007 14:16:32
  Usage: ununpack [options] file [file [file...]] 
  Extracts each file.
  If filename specifies a directory, then extracts everything in it.

  Unpack Options:
  -C     :: force continue when unpack tool fails.
  -d dir :: specify alternate extraction directory.
            Default is the same directory as file.
  -m #   :: number of CPUs to use (default: 1).
  -P     :: prune files: remove links, >1 hard links, zero files, etc.
  -R     :: recursively unpack (same as '-r -1')
  -r #   :: recurse to a specified depth (0=none/default, -1=infinite)
  -X     :: remove recursive sources after unpacking.
  -x     :: remove ALL unpacked files when done (clean up).

  I/O Options:
  -L out :: Generate a log of files extracted to out.
  -F     :: Using files from the repository.
  -Q     :: Using osrb queue system. (Includes -F)
            Each source name should come from the repository.
            First 'gold' is checked, then 'files'.
            If -L is used, unpacked files are placed in 'files'.
  -T rep :: Set gold repository name to 'rep' (for testing)
  -t rep :: Set files repository name to 'rep' (for testing)
  -q     :: quiet (generate no output).
  -v     :: verbose (-vv = more verbose).

  Currently identifies and processes:
  * Unordered List Item
  * Compressed files: .Z .gz .bz .bz2 upx
  * Archives files: tar cpio zip jar ar rar cab
  * Data files: pdf
  * Installer files: rpm deb
  * File images: iso9660(plain/Joliet/Rock Ridge) FAT(12/16/32) ext2/ext3 NTFS
  * Boot partitions: x86, vmlinuz

Besides unpacking files, the ununpack program extracts meta data from containers. For example, ZIP and RPM files include meta data that is not unpacked during file extraction. The meta data is extracted to a ”.meta” file. For example, happy.zip will create happy.zip.meta.

Implementation Notes

The unpack agent uses the command-line universal unpacker (ununpack) tool to extract files.

The unpack agent is implemented using the Engine-Shell agent wrapper around the ununpack executable. The rational: Agents are expected to always be running. This is to cut down on spawning time. However, ununpack spawns lots of processes to unpack files. (For example, to unpack a tar file, tar is used in a system call. The same for unzip, rpm, ar, etc.) Since there is already a massive delay from spawning times, using Engine-Shell to spawn one ununpack process does not add a significant delay.

The unpack agent is implemented as a host-specific multi-SQL (MSQ) query.

  • The unpack agent takes two arguments: the ufile ID and pfile name (sha1.md5.len) to process.
    • The pfile name specifies the file to unpack. If the pfile exists in the gold repository, then it is used. Otherwise, the files repository is checked. Otherwise, the unpack fails since the file could not be found.
    • The ufile ID is used when inserting the first entry into the database. This allows unpacked files to be chained in a ufile relationship tree.
  • Although the MSQ query only returns one record, a host-specific MSQ query is used. This is because of the host-specific requirement. Since the source file may be an ISO and very large, we don't want to transfer it using NFS in order to unpack it. Thus, it runs on a specific host. There will still be a significant amount of NFS transfers (i.e., every unpacked file is placed in the repository; for two repository hosts, that is an average of 50% NFS transfers) however, the first file is usually the largest so the amount of the NFS transfers is less.

The unpack agent creates “artifact” files. There are three types of artifacts: artifact.meta, artifact.dir, and artifact.unpacked.

  • Meta data extracted by ununpack does not actually have a filename. When using ununpack as a command-line tool, it just appends ”.meta” to the filename. However, when inserting data into the database, it uses the name “artifact.meta” and links it to the container file.
  • Files are unpacked into directories. The ununpack program creates a ”.dir” directory for each container. (E.g., happy.zip is extracted into the directory happy.zip.dir.) This directory name is also an artifact and is added to the database with the name “artifact.dir”.
  • Compressed files do not actually store their uncompressed filenames. For example, `zcat happy.gz` extracts the file contents but not the filename. (And for anyone who things “happy.gz” should extract to “happy”, you need to remember that not every gzipped file ends with ”.gz”. What if the compressed file is called “happy” and not “happy.gz”?) Packed files, where a single file contains the packed contents of a single file (e.g., .gz, .Z, .bz2, and upx) extract to a file named ”.unpacked”. So happy.gz extracts to happy.gz.unpacked. Unpacked files are added to the database using the name “artifact.unpacked”.

In general, anything that does not have an explicit filename is an “artifact”.

filter license

The bsam analysis tool uses pre-processed license files. These are tokenized versions of the original data files. The filter_license tool converts a real file into a tokenized bsam file.

For the filter_license agent:

  • The agent runs as a host-specific MSQ query. This is because there will be many files to process and we can reduce NFS costs by running on the same host as the data file.
  • The MSQ query generates a list of pfile IDs and pfile names (sha1.md5.len) to process.
  • The source comes from the files repository.
  • The tokenized bsam file uses the same sha1.md5.len name as the source, but is placed in the license repository.
  • The agent_lic_cache table is populated with the pfile ID. The agent_lic_cache.processed is set to false, and agent_lic_cache.inrepository is set to true.

Notes on the format of the bSAM cache file

Tue Jan 31 09:34:32 MST 2006
  
New bSAM binary data format:
All data entries are in the same format:
  type (2 bytes)
  size (2 bytes, unsigned) -- size of the data (can be zero meaning "no data")
  data (size matches size)

Basic Type categories:
Type & 0xFFF0 == 0x0000 :: File oriented
Type & 0xFFF0 == 0x0100 :: Function oriented
Type & 0xF000 == 0xF000 :: Comment

All strings should be null terminated!

The different types:
0000 EOF -- size and data are not required
0001 File name -- data is string
0002 File checksum -- data the checksum (may be a string)
0003 File license -- data string
0004 File type -- data string (e.g., "C", "Java", "Class", "Obj")
	File type is used to ensure that comparisons are only done
	between same type of data
0010 File unique value -- data string

0101 Function name -- data is string
0103 Function license -- see 03 File license
0104 Function type -- see 04 File type
0108 Function tokens -- data contains tokens, 2 bytes each!
0110 Function unique value -- data string
0118 Function tokens OR list -- 2 byte tokens, at least one of which must
	be in the comparison token list.  (None are in the comparison?
	Then skip the comparison!)
0128 Function tokens AND list -- 2 byte tokens, all must be in the comparison
	token list.
0129 Function "important" tokens (to be used for choosing between similar matches)
0131 Byte offset to start in untokenized file (length is always 4)
0132 Byte offset to end in untokenized file (length is always 4)
	Both contain 4 bytes.
	0131 and 0132 values should be reset to "undefined" each time
	0101 is seen.
0138 Byte offsets for tokens.
	Each token in the 0108 tag will be represented by 1 byte here.
	This byte is the number of bytes to skip between tokens.
	This way, matches can be calculated down to specific locations
	in the file rather than general ranges specified by 0131 - 0132.
	There is one extra byte here so the end of the last token is known.
	E.g., if the match starts at token #7, then:
	  for(i=0; i<7; i++) RealOffset += TokenValue0138[i].
0140 Single-sentence licenses (text, tokenized into space separated)
01FF End of Function (ok to start processing) -- size is always zero.

F001 File Comment
F101 Function Comment
FFFF General Comment

All unknown types are skipped (treated as comments).

=====================================================================
Wed Feb  1 12:48:49 MST 2006

With the new data format, I don't need to separate .o, .java, .c, and .class
files.  I can store them all in one directory.
BUT: Different file formats need different bsam parameters.
(The number of similar tokens varies based on the language.)
For this reason, I am still keeping them separate.

The license analysis Agent

This agent runs the command-line bsam program for comparing bsam cache files. The bsam program compares two cache files against each other and identifies similarities between them. For license analysis, the first file is the cache file created by filter_license. The second file is /usr/local/share/fossology/agents/License.bsam. This file is a bsam cache containing all of the tokenized known-licenses.

Scheduler Type

The bsam-engine runs as a host-specific MSQ agent. The rational:

  • License cache files may be large, so transferring them over NFS could be time consuming.
  • There are usually many files that need to be analyzed, so an MSQ agent is very desirable (rather than queueing up every file individually).

Command-Line Parameters

The License Agent uses a stand-alone program called “bsam-engine”. A sample scheduler configuration file entry:

agent=license host=fawkes | /usr/bin/ssh fossy@fawkes "/usr/local/fossology/agents/bsam-engine
         -L 20 -A 0 -B 60 -G 10 -M 2 -E -T license -O n -- -
         /usr/local/share/fossology/agents/License.bsam"

This entry specifies the following parameters to bsam-engine:

  • -L 20 :: The minimum sequence length is 20 tokens. Shorter matches are ignored. The value of 20 was chosen because you cannot have a good license in under 20 words (with punctuation).
  • -A 0 :: The percent of the file being analyzed (the pfile from the database) that matches the license can be as small as 0% (any match is good).
  • -B 60 :: The percent of the license that matches the pfile must be at least 60%.
  • -G 10 :: Although the minimum sequence length is 20 tokens, each matched token can be as far as 10 tokens apart (gap of 10). The value of 10 was chosen to get past company names with punctuation. (E.g., “Hewlett-Packard Company, Inc.” is 7 tokens)
  • -M 2 :: The minimum sequential is two tokens. If a token is by itself, then it is not a match. This works with -G to specify how gaps are handled.
  • -E :: This performs an exhaustive search. As each best-match is found, the remaining portions of the pfile are recursed independently. This is slow, but it finds all licenses (including multiple licenses per file). (This is also what eats up most of the processing performance. If you want to know why some files take a long time to process, it is due to -E.)
  • -T license :: Sets the repository name to use for comparisons. This specifies the license repository.
  • -O n :: This says to generate non-text output and store results in the database. For testing, use ”-O t” (text) or ”-O N” (non-text to stdout rather than the database).
  • – :: This ends the hyphen commands for getopt.
  • - :: The first file will come from stdin.
  • /usr/local/sahre/fossology/agents/License.bsam :: The second file is the static license cache.

Input Values

The stdin stream from the scheduler can contain any of four values:

  • A=file :: Set the first filename (A file) to the string. The string from the jobqueue's MSQ is a pfile name: sha1.md5.len.
  • Akey=Num :: Set the pfile_pk associated with the file.
  • B=file :: Similar to A=file, this changes the value of the second filename (B file). If this is not set, then bsam-engine continues to use the License.bsam file.
  • Bkey=Num :: This is the counterpart of Akey.

For license analysis, only A and Akey are specified. The B and Bkey are provided as support for future agents.

Each line can contain an A, Akey, B, and Bkey values. After the line is read, the license values are generated.

Output Values

When using ”-O n”, licenses are stored in the agent_lic_meta table. The agent_lic_cache table is updated so the processed column is marked as true.

The processed column is required since a file may be processed and not generate any licenses.

Performance

The bSAM engine is fast for what it does, but it is very slow compared to other agents.

  • Each license scan is CPU intensive. This usually consumes 100% of one CPU.
  • Very large text segments can result in very large data allocations. bSAM can be a memory hog.
    • While individual tokens take 2 bytes each, they are arranged in an AxB matrix of 2-byte integers. Comparing 90 tokens against 100 tokens = 90x100x 2 bytes = 18,000 bytes. This may not seem like much, but some license sections can contain thousands of tokens. GPLv2, for example, contains 2,250 tokens. Comparing GPLv2 against GPLv2 consumes 10,125,000 bytes
    • The maximum token size for a license section is 65,500 bytes yielding a maximum matrix of 8,580,500,000 bytes (however, the largest allocated seen so far is only 500 Megs).
    • For speed, bsam does not free() memory (it only reallocs to make more room as needed). The free() is only performed when all comparisons between A and B are completed.
    • For speed, bSAM uses mmap() to load the files for comparison. This also consumes a significant amount of mapped memory when the files are large.
  • When files have many licenses, any network delay to the database becomes a considerable latency.

Filter clean

After bsam processes the license files, the tokenized bsam cache files are no longer needed. The filter_clean agent removes the unnecessary token files.

Implementation Details

For the filter_license agent:

  • The agent runs as a host-specific MSQ query. Although files are not used for read/write, this cuts down on NFS costs from delete requests.
  • The MSQ query generates a list of pfile IDs and pfile names (sha1.md5.len) to process.
  • If the pfile name exists in the license repository, then it is deleted.
  • The agent_lic_cache table is updated: the “agent_lic_cache.inrepository” column for the pfile ID is set to false.

Usage

The program filter_clean can perform many tasks:

  Usage: filter_clean [options] [projects]
  For each processed file, remove the cache.
  Options
  -i :: Initialize the DB, then exit.
  -L :: List project IDs.
  -s :: operate via the scheduler.  Stdin contains each record to process.
  -S :: operate via the scheduler.  Stdin contains each project ID.
  -v :: Verbose (-vv for more verbose)
  -T :: TEST -- do not update the DB or delete any files (just pretend)
  You can also list one or more project IDs for processing.
  Project ID of '-1' will process all projects.

When used as an agent, filter_clean can either take a specific record to process (-s), or a project ID (-S). If the input is a project ID, then every record associated with the project is cleaned.

The filter_clean program contains many debugging options:

  • Rather than using ”-s” (or ”-S”) to identify the project ID, you can use no parameters and list project IDs on the command-line. This will remove the cache files for these projects and update the agent_lic_cache table. The project ID ”-1” will remove all cache files.
  • The ”-L” option lists all project IDs. This is useful if you want to use -N and do not remember the project ID number. The output lists the project ID and description (or source if the description is blank). For example:
$ filter_clean -L | tail
6796810 :: pmccabe-2.1-2.i386.rpm
6796835 :: 15.238.5.206:jboss-4.0.2-src.tar.gz
6798336 :: 15.238.5.206:jadls158.zip
6864012 :: 15.238.5.206:hvdistrib.zip
6865408 :: http://devoss/~nealkrawetz/HPSIM-Linux_C.05.00.02.00.bin
6865409 :: stress the artifact part of the GUI
6865420 :: test continer/pfile reuse algos
6865428 :: 15.238.5.206:jakarta-tomcat-5.5.9-src.tar.gz
6868810 :: Firefox 1.0.7 test submitted by Neal
6870424 :: ?description
  • The ”-T” option is a safety test. It will perform all of the DB checks but will not delete files or update the database.

Engine-Shell

The engine-shell is a generic agent wrapper for command-line applications. Although spawning new applications for each database record is not efficient, it is frequently convenient. The engine-shell is designed for rapid prototyping and for long-term use by infrequently spawned agents.

For example, the unpack agent runs the command-line program “ununpack”. Since this is only called once per upload, there is no significant impact from spawning the ununpack program as needed. Thus, there is no need for an ununpack-specific agent; ununpack can be started by using engine-shell instead.

Developers should consider not using engine-shell if the application is spawned thousands of times in rapid succession since the spawn times will create large processing delays.

The engine-shell converts all database parameters to shell environment variables and handles communications with the scheduler.

Usage

Usage: /usr/local/fossology/agents/engine-shell agent_name command < args
The agent_name is a string assigned to the engine-shell. It can be used as a parameter to the command.

The command-line itself takes a series of parameters that are expanded each time the process is spawned:

%{%}  = percent sign
%{P}  = PID (process ID) of the engine-shell!
%{PP} = PPID (parent process ID) of the engine-shell!
%{U}  = Unique string assigned by the engine-shell!
%{A}  = Agent name assigned to the engine-shell.
%{1}  = the first arg from scheduler  (there is no %{0})
%{2}  = 2nd arg from scheduler
%{1000} = 1000th arg from the scheduler (no real limit)
%{*}  = all args from the scheduler

For example:

  agent=unpack host=localhost | /usr/local/fossology/agents/engine-shell unpack
     '/usr/local/fossology/agents/ununpack -d /srv/fossology/repository/ununpack/%{U} -qRCQx'

In this case, the directory for ununpack (-d) expands to ”/srv/fossology/repository/ununpack/12ab45” where 12ab45 is a unique string. The unique string is guaranteed to be unique among all currently running scheduler processes, however it could have been used by a previous process and may be used by a future process when this process completes. It is unique as long as this process is running.

The %{U}, %{P}, and %{PP} parameters are intended to be used by command-lines that require a unique identifier. For example, if a temporary file is needed, then these parameters can form the temporary file name.

Normally the scheduler sends “field=value” pairs to agents for processing. The engine-shell converts these to environment variables before spawning the command. For example, if the scheduler sends “a=12345 b=yes” then the environment variable ARG_a becomes “12345” and ARG_b becomes “yes”. The name of the variable comes from the field (“field=” becomes “ARG_field”) and the variable is assigned the value.

To make this very clear: if the scheduler needs to call engine-shell 500 times with different parameters, then there will be 500 iterations of (1) set the environment variables and (2) spawn the command line to process the request.

Return Codes

If the command returns a 0 return code, then engine-shell assumes the command succeeded. A non-zero return code indicates a command-failure.

The command-line should not generate any unnecessary output. All output it logged by the scheduler as debug statements.

Mimetype Agent

Every file has a mimetype. Some may be “text/plain” or “image/jpeg”, and others may be specific to programs like “application/pdf”. The mimetype agent is passed a file to analyze and sets the mime-type value in the pfile table. (It also populates the database “mimetype” table as needed.)

This agent creates the “official” mimetype used by the database.

Mimetypes are determined in many ways.

  1. The unpack agent sets mimetypes for containers. Since it could unpack the file, the mimetype must be accurate.
  2. If there is no mimetype set, then it uses magic(5) to identify the mimetype. Magic(5) is usually right, but not always right. In particular, a binary file may falsely match a known magic type.
  3. Magic(5) has two default values for when a file is not matched: plain/text for text files, and application/octet-stream for binary files. In these cases, the file extention (from the ufile database table) is compared with the known suffixes in /etc/mime.types. If there is a match, then the matched mimetype is used. This could also lead to errors. For example, if a text file ends with ”.c”, it will be labeled as “text/x-csrc” even if it contains C headers. (This is common in the kernel source code – many ”.c” files are used by #include statements.) Similarly, a text file that happens to end with ”.png” will be called “image/png” instead of “plain/text”.
  4. If magic(5) returned a default mime type and the file ends with ”.spec”, then it is assigned the mimetype “application/x-rpm-spec” (for use with the specagent system).
  5. If the file extension does not exist or is unknown, then the first 100 bytes are checked for non-printable characters. This results in a default value of “text/plain”, “application/octet-stream”, or “application/x-empty” for a zero-length file.

Because magic(5), /etc/mime.types, and even the ”.spec suffix” match may have false mime associations, the output from the mimetype agent is likely correct, but may contain some errors. Applications that process files based on the mimetype must be careful and check that the file is really the specified mimetype. Programs should not crash if the file contents do not match the mimetype.

PkgMetaGetta Agent

Many file formats, such as DEB and RPM, contain meta data. While the unpack agent extracts meta data to “artifact.meta” files, this is not always complete or useful for applications. The pkgmetagetta agent (pronounced “Package Meta Getta”) extracts meta information from files and stores the information as attributes in the database attrib table.

The extraction is performed using “libextractor”. This library identifies hundreds of different meta types from dozens of different file formats. The extracted information is extremely useful for license analysis. For example, an RPM may have a “license” meta header that says “GPL”, while the license analysis agent may identify GPLv2, MIT, OSL, and many other licenses within the package.

While libextractor is useful, there are some limitations:

  • Mimetype. The libextractor library uses different mimetype labels than those found in magic(5) and /etc/mime.types. For example, libextractor may find a file is “text/x-c”, magic(5) may say “text/plain”, and /etc/mime.types may identify “text/x-csrc”. As a result, the mimetype found by the mimetype agent is considered authoritative and linked to the database pfile entry, while the value found by the pkgmetagetta agent is associated to the pfile through an attribute.
  • Reuse. The libextractor library does not distinguish purpose. For example, the “Packager” attribute from an RPM file lists the user who did the packaging, while the “Packager” from some other file format may indicate the program that created the package.
  • Duplication. If a single file contains multiple instances of the same header with different values, then the attribute table will least each instance. However, there is no sense of ordering in the attribute table; applications must not assume that the first one returned is the right one to use.
  • User error. Some mimetypes appear to be assigned arbitrary values. For example, I have seen some RPM files that contain a “Copyright” attribute with a real copyright statement, while other RPM files simply say “GPL” for the copyright. Since “GPL” is a license and not a copyright, this is a meta header with a non-sense value.

Because of these limitations, applications that process based on meta data must be aware that not every meta header may exist for a file, multiple headers may exist, and the header's value may be inaccurate.

SpecAgent

RPM files come in two forms. There are the actual, packed RPM (file.rpm) files, and there is the source form, before the RPM file is packed. While pkgmetagetta can process meta information from a packed RPM file, it cannot extract information from an unpacked/source RPM directory. The specagent is designed for this situation.

Given a file of mimetype “application/x-rpm-src” (as determined by the mimetype agent), the specagent extracts the meta data and adds it as attributes to the associated pfile.

SQLAgent

The SQLAgent takes a single-line containing an SQL query and runs the SQL. The single line may be multiple SQL statements, separated by a ”;”.

Usage: sqlagent [options]
  -i        :: initialize the database, then exit.
  -a arg    :: Expect SQL in parameter 'arg='.
  no file   :: process data from the scheduler.

The ”-i” option initializes the database for use with the agent. (All agents have this option.)

The scheduler can run agents in two modes: any-host and MSQ. (See the scheduler documentation for details.) With the “any host” configuration, each line of input to the SQLAgent is an SQL comment to perform. No output is generated by the agent. However, if the SQL command fails, then the agent identifies the failure.

In the MSQ mode, use ”-a” to specify the SQL column that contains the SQL query. For example, “sqlagent -a go” can be used with the input line:

a="anything" b="ignored" go="select * from ..."

DelAgent

The DelAgent is used to delete uploads, folders, or license information from the database and repository.

Usage: delagent [options]
  List or delete uploads.
  Options
  -i   :: Initialize the DB, then exit.
  -u   :: List uploads IDs.
  -U # :: Delete upload ID.
  -l   :: List uploads IDs. (same as -u, but goes with -L)
  -L # :: Delete ALL licenses associated with upload ID.
  -f   :: List folder IDs.
  -F # :: Delete folder ID and all uploads under this folder.
          Folder '1' is the default folder.  '-F 1' will delete
          every upload and folder in the navigation tree.
  -s   :: Run from the scheduler.
  -T   :: TEST -- do not update the DB or delete any files (just pretend)
  -v   :: Verbose (-vv for more verbose)

The DelAgent can be used from the command-line or from the scheduler.

Command-Line Usage

The basic command-line uses lowercase options to list items that can be processed, and capitals to actually perform the processing. For example, to list the available folders, use ”-f”. This will generate output that matches the UI's left-hand tree. Each folder is enumerated. For example:

# Folders
     63 :: AUrls (Parent folder AUrls)
        65 :: a-c
           -- :: Contains: Bash
        -- :: Contains: ubuntu
        64 :: v-z
           -- :: Contains: WebSuck
        66 :: WebSuck
           67 :: v-z
              -- :: Contains: WebSuck
     75 :: baz (Parent folder baz)
        -- :: Contains: foo
     77 :: Bobg (BobG test folder)
        -- :: Contains: rats-2.1.tar.gz

Each number indicates a folder ID. The indentation matches the tree layout, and uploads contained in the folders are also displayed. Using '-F', you can delete a specific folder. This will remove the folder, all subfolders, and all uploads contained in the folders.

Similar to -f and -F, '-u” lists all available uploads with their IDs. These IDs can be used with -U to delete a specific upload.

The '-l' options actually works like -u, listing all uploads. However, -L (followed by an upload ID) will only delete license information associated with the upload. This is useful for resetting a license analysis.

The DelAgent is intelligent about file reuse. For example, -F and -U delete projects. However, files (technically pfiles) in those projects may also be used by other projects. When using -F and -U, pfile information (and associated license and repository data) are not deleted unless the pfile is no longer used by any uploads.

Finally, some notes:

  • There is a special folder number: folder “1” is the top of the folder tree. This folder cannot be deleted. Using “delagent -F 1” will recursively delete EVERY upload and folder, but will leave the top folder of the tree. (Only do this if you want to delete everything.)
  • If you use ”-T”, then no deletions occur. This is just used for testing the process.
  • While deleting entries from the database is relatively fast, removing files from the repository can be time-consuming (due to I/O delays). Also, the bigger the upload, the longer it will take. Be patient; don't expect an ISO to delete instantly.
  • If a folder or upload becomes detached from the tree for any reason, the ”-f” option will identify it under the heading “Unlinked folders” or “Unlinked uploads”.

Scheduler Usage

When using the ”-s” command-line option, the DelAgent will assume it is running from the scheduler. This is an “any host” agent since it primarily accesses the database. However, it requires write-access to the entire repository (for deleting unused repository entries).

The input from the scheduler requires three arguments: action, target, and id. The action can either be “LIST” (for debugging – similar to the lowercase command-line options), or “DELETE”. The target is either “UPLOAD”, “LICENSE”, or “FOLDER”. The ID specifies the item to delete (it is optional and ignored for the “LIST” action).

For example, to delete folder 3, the value of the jobqueue jq_args should be: DELETE FOLDER 3 This is equivalent to the command-line “delagent -F 3”.

 
agents.txt · Last modified: 2008/06/08 16:43 by danger

Copyright (C) 2007-2009 Hewlett-Packard Development Company, L.P.
FOSSology Project documentation is licensed under the GNU Free Documentation License Version 1.2
Recent changes RSS feed Valid XHTML 1.0 Valid CSS3 Driven by DokuWiki