FOSSology Project Logo FOSSology
Advancing open source analysis and development
 

Useful SQL

Count number of files in an upload:

The ufile_mode condition keeps artifacts out of the results.

select count(*) from uploadtree 
  where upload_fk=$Ufile_pk 
        and (ufile_mode&x'3C000000'::int)=0;

Find files by name

SELECT * FROM uploadtree 
   INNER JOIN pfile ON pfile_fk = pfile_pk
    AND ufile_name LIKE 'name';

Find containers by name

SELECT * FROM uploadtree 
   INNER JOIN pfile ON pfile_fk = pfile_pk
    AND ((uploadtree.ufile_mode & (1<<29)) != 0)
    AND ufile_name LIKE 'name'

select uploadtree rec and ALL its parents

select UT2.* from uploadtree as UT1, uploadtree as UT2
  WHERE UT1.lft BETWEEN UT2.lft 
        and UT2.rgt 
        and UT1.upload_fk=UT2.upload_fk 
        and UT1.uploadtree_pk=$Uploadtree_pk; 

select uploadtree rec and ALL its children

select UT1.* from uploadtree as UT1, uploadtree as UT2 
  WHERE UT1.lft BETWEEN UT2.lft and UT2.rgt 
        and UT1.upload_fk=UT2.upload_fk 
        and UT2.uploadtree_pk=$Uploadtree_pk; 

It's ~3x faster (1.3 secs vs 3.8 secs on 1.7M rec iso) to do this in two steps and avoid the self join:

  SELECT lft,rgt,upload_fk FROM uploadtree 
    WHERE uploadtree_pk = $Uploadtree_pk;
  SELECT * from uploadtree 
    WHERE uplodtree.lft BETWEEN $Lft and $Rgt 
          and upload_fk=$Upload_fk;

Get unique pfiles in the subtree

  SELECT distinct(pfile_fk) from uploadtree 
     where uplodtree.lft BETWEEN $Lft and $Rgt 
           and upload_fk=$Upload_fk

Get unique pfiles in an upload

  SELECT distinct(pfile_fk) from uploadtree 
     where upload_fk=$Upload_pk

Find latest enabled agent by name

This REQUIRES knowing the agent name:

SELECT agent_pk from agent 
  where agent_name='license' 
  order by agent_rev desc limit 1;

Latest currently enabled agent that completed an analysis on this upload

This REQUIRES knowing the agent name:

  SELECT agent_pk from agent_runstatus, agent
    where agent_fk=agent_pk
    and agent_name='$Agent_name' 
    and ars_complete='true'
    and agent_enabled='true'
    order by agent_rev desc limit 1;

Get all licenses for an uploadtree subtree, and count (i.e. LICENSE HISTOGRAM)

needed by: LicenseGetAll() Input: uploadtree_pk Returns: list of license name, count This is usually used with the latest currently enabled agent that completed an analysis on this upload.

Find the subtree lft,rgt boundaries and upload_fk:

  SELECT lft,rgt,upload_fk FROM uploadtree 
    WHERE uploadtree_pk = $Uploadtree_pk;

Get the license(s) for every file (unique pfile) and license count for this subtree and produced by this agent:

  SELECT distinct(rf_shortname), count(rf_shortname) 
    from license_file_ref,
    (SELECT distinct(pfile_fk) as PF from uploadtree 
       where upload_fk=$Upload_fk 
             and uploadtree.lft BETWEEN $Lft and $Rgt) as SS
    where PF=pfile_fk and agent_fk=$Agent_fk 
     group by rf_shortname;

This took 13.5 secs on sirius looking at FC9 (~2M files). So enable the report cache with the 2 sec rule (cache any reports over 2 sec).

Find results of all license analyses for a single uploadfile_pk

Needed by: View License (http://fossology.org/~bobg/1.2reqs/viewlic.html) Input: uploadtree_pk Output: list of fl_pk's

Get pfile_pk from the uploadtree_pk:

SELECT pfile_fk from uploadtree where uploadtree_pk=$Uploadtree_pk;

Get all license recs for this pfile:

SELECT * from license_file where pfile_fk=$Pfile_fk

Best to use the faster (sub ms):

SELECT license_file.* from license_file, uploadtree 
  WHERE uploadtree.pfile_fk=license_file.pfile_fk
    and uploadtree_pk=$Uploadtree_pk

Find what licenses have changed between two license analyses

Input: upload_pk, agent_fk1, agent_fk2 Output: List of uploadfile's and their license_ref data

SELECT LR1.fl_pk, LR1.rf_pk, LR1.rf_shortname 
  from license_file_ref as LR1, license_file_ref as LR2,
  (SELECT distinct(pfile_fk) as PF from uploadtree 
     where upload_fk=$Upload_pk) as SS
  WHERE PF=LR1.pfile_fk 
    and PF = LR2.pfile_fk
    and (LR1.agent_fk=$Agent_fk1 or LR2.agent_fk=$Agent_fk2)
    and LR1.rf_shortname != LR2.rf_shortname 

This is slow on the Fedora 9 src iso (took 54 seconds) so enable the report cache with the 2 sec rule.

Populate license_file with data from agent_lic_meta (for testing)

INSERT into license_file (fl_start_byte, fl_end_byte, rf_fk, agent_fk, pfile_fk) 
   select tok_pfile_start, tok_pfile_end, lic_fk, 1, pfile_fk from agent_lic_meta;

Populate license_ref with data from agent_lic_raw (for testing)

Note, cannot import lic_text since it is null except for phrase.

INSERT into license_ref (rf_shortname, rf_fullname, rf_text)
   select lic_name, lic_name_full, 'not loaded yet' from agent_lic_raw;

Find all the pfiles in an upload that need to be reanalyzed (license analysis)

An upload submitted for license analysis may involve pfiles that have been previously analyzed. These pfiles do not need to be reanalyzed UNLESS the agent_pk has changed.

Input: agent_fk, upload_fk Output: pfile columns This (or a variant) would be used in jq_args for the license analyzers.

SELECT pfile_pk,
       pfile_sha1 || '.' || pfile_md5 || '.' || pfile_size AS pfilename,
       pfile_size as size
  from 
  (SELECT distinct(pfile_fk) as PF from uploadtree where upload_fk=$Upload_pk) as SS
  left outer join license_file on (PF=pfile_fk and agent_fk=$Agent_fk)
  inner join pfile on (PF=pfile_pk)
  where fl_pk is null

Count file types

List and count all the mimetypes of all the files in the repository(database).

SELECT distinct(pfile_mimetypefk), count(*), mimetype_name 
  from pfile, mimetype 
  where mimetype_pk=pfile_mimetypefk 
  group by pfile_mimetypefk, mimetype_name;

Find Uploads that contain a particular file

SELECT distinct(upload_fk), ufile_name, upload_filename 
  from uploadtree, upload 
  where ufile_name='$FileName' and upload_pk=upload_fk;

General notes

  1. Implement some of the sql above as plsql functions so that they are available in all languages?
  2. Remove code that uses pfile_liccount since this field is maintained but not used.
  3. Scheduler
    1. write agent_runstatus(agent_pk, upload_pk) Other fields will default
    2. If agent completes successfully
        update ars_complete, ars_ts
      else
        update ars_status with error (if possible)
  • 1.2 task notes

  •  
    1.2_notes.txt · Last modified: 2009/08/31 14:33 by bobg

    Copyright (C) 2007-2009 Hewlett-Packard Development Company, L.P.
    FOSSology Project documentation is licensed under the GNU Free Documentation License Version 1.2
    Recent changes RSS feed Valid XHTML 1.0 Valid CSS3 Driven by DokuWiki