
Interpreting the License Analysis Report

This section describes how to interpret the License Analysis Report. It includes a description of basic elements of license analysis, such as license hierarchy, license templates, license phrases, license terms, and canonical names.

For information about adding a license, refer to How to Add a License Template or License Phrase and Re-Analyze Licenses.

License Hierarchy Description

License analysis is performed by comparing an unknown file (which contains zero or more license sections) against a set of license templates. The comparison algorithm used by the license agent looks for groups of similar words in a similar order. The algorithm does not mind if individual words are replaced. For example, “This is the Gnu public license and you can share it” matches “This is the Neal public license and you can share it”. Changing a single word in a license usually does not change its meaning.
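The idea of tolerating replaced words while respecting word order can be illustrated with a token-level longest common subsequence (LCS). This is a stand-in chosen for illustration, not necessarily the license agent's actual algorithm, and the function names `tokenize`, `lcs_length`, and `similarity` are hypothetical.

```python
import re

def tokenize(text: str) -> list[str]:
    # Lowercase and keep only word tokens; punctuation is dropped for simplicity.
    return re.findall(r"[a-z0-9]+", text.lower())

def lcs_length(a: list[str], b: list[str]) -> int:
    # Dynamic-programming LCS over tokens: order matters, gaps are allowed.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def similarity(section: str, template: str) -> float:
    """Fraction of the section's tokens that appear, in order, in the template."""
    s, t = tokenize(section), tokenize(template)
    return lcs_length(s, t) / len(s) if s else 0.0

a = "This is the Gnu public license and you can share it"
b = "This is the Neal public license and you can share it"
print(round(similarity(b, a), 3))  # 10 of 11 tokens match
```

Swapping “Gnu” for “Neal” costs only one token out of eleven, so the two sentences still score as a strong match.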

Individual word changes (or small groups of words) are very common. (It appears that plagiarism does not apply to license text.) While some open source projects use “standard” licenses, such as GPL or BSD, other projects create their own licenses by merging in parts from different licenses. For example, a license may contain the three requirements found in the BSD license along with the warranty disclaimer from GPL and the distribution requirements from the MIT license. There are also many cases where a well-known license is simply renamed. One of the most common is the use of the LGPL license, where “GNU Lesser General Public License” is renamed after a company or project. Similarly, many projects take the GNU Library General Public License (GLGPL) and replace “library” with “program” or an application's name. None of this changes the license requirements or the template that it matches; it only changes the license name and the percentage of the match.

The worst-case scenarios happen when projects take a non-GPL license and simply replace the license name with “GPL”. The question then becomes: did the author mean “GPL”, or did they want their own license rules? Fortunately, this is an issue for the lawyers to resolve. The license analyzer makes no legal interpretation of the semantic meaning of the license. It only matches text against license templates and identifies the percentage of the match.

In the user interface, you can select a project and click the license tab. This shows a histogram of the discovered licenses: each license type is listed along with the number of files that contain it. For example:

Count  License
1299   Apache Software License 2.0 reference
13     Intel-OSL
10     Phrase
6      Apache Software License 2.0
3      BSD UCRegents 2
2      RSA MD5
1      MIT (oldstyle)
1      Apache Software License 1.1

Each of the licenses has a distinct name and identifies a distinct license. However, “Phrase” is a catch-all category. Licenses that are unknown to the analysis system are usually identified by common phrases, such as “is distributed under…”. Phrases that are potentially associated with licenses are listed in the Phrase category.
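Because the histogram counts files rather than individual matches, a file that matches the same license several times should be counted once. The sketch below assumes the analyzer yields hypothetical `(file_path, license_name)` records; the record format, file names, and the function `license_histogram` are illustrative, not the tool's actual data model.

```python
from collections import Counter

def license_histogram(matches: list[tuple[str, str]]) -> list[tuple[int, str]]:
    """Count files per license from (file_path, license_name) match records."""
    unique_pairs = set(matches)  # collapse repeated matches within one file
    counts = Counter(lic for _path, lic in unique_pairs)
    # Sort by descending count, then by name for a stable display order.
    return sorted(((n, name) for name, n in counts.items()),
                  key=lambda item: (-item[0], item[1]))

records = [
    ("pcre.c", "Intel-OSL"),
    ("pcre.c", "Intel-OSL"),   # second match in the same file: counted once
    ("study.c", "Intel-OSL"),
    ("md5.c", "RSA MD5"),
    ("README", "Phrase"),
]
for count, name in license_histogram(records):
    print(count, name)
```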

You can click on each of the license types and see a list of files that contain the license.

License: Intel-OSL
1. 98% pcretest.c
2. 97% COPYING
3. 97% LICENCE
4. 97% pcre.hw
5. 97% pcre.in
6. 97% pcregrep.c
7. 95% internal.h
8. 95% ucptypetable.c
9. 94% pcre.c
10. 93% study.c
11. 91% maketables.c
12. 91% printint.c
13. 89% dftables.c

The files are ordered by the percentage of match. In the example, the file “COPYING” has a section of text that includes a 97% match with a section of the Intel-OSL license. By clicking on the file name, you can see the actual text of the file with the matching license text highlighted.

At the top of the file contents is an index table that lists, for each license found in the file, a link to the instance (click “view”), a link to the actual license (click “ref”), and a color; each identified license is color coded. Items without a “ref” are Phrases that were identified as possible license text.

The matched text within the document is highlighted in the color given by the license key. Words that are not part of the match are not highlighted. In this example, the attribution of the license has been changed to say “University of Cambridge” and the owner's name has been replaced with “COPYRIGHT OWNER”. Apart from these specific changes, the license text matches the Intel-OSL license.

Templates

The license templates are categorized into families with similar text. It is important to note that “similar text” does not mean “similar purpose”. Your company may consider one license to be “bad” but a license with similar text to be “good”. Any interpretation is left up to you; the groupings are strictly by similar text. For example, the generic Academic Free License (AFL) has text similar to the Open Software License (OSL). Based on the word usage, OSL was probably derived from AFL (or vice versa). This creates the hierarchy “AFL/AFL/” and “AFL/OSL/”. Under AFL/AFL/ are different versions of the AFL license: 1.1, 1.2, 2.0, etc. Similarly, the BSD license comes in two flavors: the old and the new BSD license. BSD/BSD.new/ contains the actual BSD “new” license (BSD/BSD.new/BSD_new) as well as derivatives such as the Apache, Cryptix, and SSLeay licenses. Each license is different, but they share enough text to be treated as derivatives of a common license.

The names for the license text attempt to describe the license. For example, GPL-based licenses contain “GPL” in the template name or family, and the Free Software Foundation's license family is denoted by “FSF”. However, some licenses do not have formal names. For the templates, these have been named after their general purpose, such as “Free/Free Use No Change” and “Free/Beerware” (Beerware is a real license, but it is categorized under the general-purpose “Free” family). To re-emphasize: the naming is relatively arbitrary and should not be interpreted as legal advice.

When there are multiple ways to present a license, the different variants are numbered. For example, “Corporate/Sun/Sun Microsystems variant 1” and “Corporate/Sun/Sun Microsystems variant 2” include different text that means the same thing.

Licenses are not always included in their entirety. Files frequently contain references to the license rather than the actual license. License references are included in the templates, such as “AFL/OSL/Open Software License 1.0 reference”. In some cases there are several common reference forms, so these are numbered, for example “GPL/v2/GPLv2 reference 2” and “GPL/v2/GPLv2 reference 3”.

Besides references, licenses may include shortened versions that summarize the license. For example, “Adobe/Adobe short” is a variation of “Adobe/Adobe”. Similarly, licenses may contain sections such as supplements and appendices.

Finally, a license template must be at least 20 “tokens” long, where a token is a word delimited by spaces, punctuation, hyphens, underscores, and so on. If you wish to add a license shorter than 20 tokens, define it as a phrase (discussed below).
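The 20-token rule can be checked before adding new text. The sketch below tokenizes the way this paragraph describes, treating spaces, punctuation, hyphens, and underscores as separators; the agent's actual tokenizer may treat punctuation differently, and `tokenize`, `classify_new_text`, and `MIN_TEMPLATE_TOKENS` are hypothetical names.

```python
import re

MIN_TEMPLATE_TOKENS = 20  # minimum template length stated in this section

def tokenize(text: str) -> list[str]:
    # Anything that is not a letter or digit acts as a separator,
    # including hyphens and underscores.
    return re.findall(r"[A-Za-z0-9]+", text)

def classify_new_text(text: str) -> str:
    """Suggest whether new license text should be added as a template or a phrase."""
    return "template" if len(tokenize(text)) >= MIN_TEMPLATE_TOKENS else "phrase"

print(tokenize("free-to_use, as-is"))           # ['free', 'to', 'use', 'as', 'is']
print(classify_new_text("This is free, enjoy."))  # phrase
```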

License Phrases

When possible, text is matched against license templates. However, not every license matches a template. This can happen when a new license (one not in the list of templates) is encountered. Similarly, a license may be a single sentence, such as “This is free, enjoy.” Such sentences usually carry legal meaning even if no formal license is associated with the file.

The license analyzer (technically, the filter_license agent) identifies potential license phrases. Any sentence containing these phrases and not found in a license template is flagged as a potential license phrase:

As an aside, there is an intentional spelling error “licenced” in this mix because it appears in far too many files. (Nobody ever said that engineers could spell.)

While matches against license templates are very accurate (very few false positives and very few false negatives), license phrase matching is less accurate. For example, the sentence “mimic existing proprietary applications for instance” is most likely ordinary source code text, while “license from proprietary software” is probably important for a legal interpretation. (And “Throw away proprietary and site licenses” could be either source code text or legally important.) Each of these examples comes from a real code analysis.

Due to the lower accuracy level for phrases, it is important to review each case.
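A minimal sketch of phrase flagging shows why false positives arise: any sentence that contains a trigger phrase and was not already covered by a template match gets flagged. The trigger list below is a hypothetical sample assembled from words mentioned in this section, not the analyzer's real phrase list, and the naive sentence splitter and function names are likewise assumptions.

```python
import re

# Hypothetical trigger phrases for illustration only.
TRIGGER_PHRASES = ["distributed under", "licence", "licenced", "license", "proprietary"]

def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def flag_phrases(text: str, template_matched: set[str]) -> list[str]:
    """Return sentences containing a trigger phrase that no template already matched."""
    flagged = []
    for sentence in split_sentences(text):
        if sentence in template_matched:
            continue  # already explained by a license template
        lowered = sentence.lower()
        if any(phrase in lowered for phrase in TRIGGER_PHRASES):
            flagged.append(sentence)
    return flagged

text = ("This file is distributed under the BSD license. "
        "Parse the input file. "
        "Throw away proprietary and site licenses.")
for sentence in flag_phrases(text, set()):
    print(sentence)
```

The last sentence is flagged even though it may just be a code comment, which is exactly the kind of case that needs human review.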

License Terms and Canonical Names

Templates and phrases are good for identifying “license-ish” text. However, they are not ideal for assigning names to the identified license text. For example, there are copies of the MPL license in which every instance of “MPL” has been replaced with “OSL”. Although most of the text matches the MPL template, the license itself is OSL.

Terms are used to assign a name to the identified license text. Each term is a common name for a license. For example, the GNU General Public License is commonly called “GPL”, “Gnu GPL”, “Gnu PL”, “Gnu Public License”, and “General Public License”. Each of these is a common alias for the same license. Each group of terms is associated with a single canonical name. In this example, the canonical name is “GPL”; if any of the GPL terms appears in the license text, the text is called “GPL”.
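The term-to-canonical-name relationship is essentially a lookup table. The sketch below uses only the GPL aliases listed above and matches terms on whole-word boundaries; the table, the normalization, and `canonical_names` are illustrative assumptions. A real term list would also need to handle overlapping aliases (for instance, “General Public License” also occurs inside the LGPL's full name).

```python
import re

# Alias table built from the GPL example in this section.
TERMS_TO_CANONICAL = {
    "gpl": "GPL",
    "gnu gpl": "GPL",
    "gnu pl": "GPL",
    "gnu public license": "GPL",
    "general public license": "GPL",
}

def canonical_names(text: str) -> set[str]:
    """Return the canonical names whose terms appear in the text."""
    # Normalize to lowercase words separated by single spaces.
    normalized = " ".join(re.findall(r"[a-z0-9]+", text.lower()))
    found = set()
    for term, canonical in TERMS_TO_CANONICAL.items():
        # Pad with spaces so a term only matches on whole-word boundaries.
        if f" {term} " in f" {normalized} ":
            found.add(canonical)
    return found

print(canonical_names("Licensed under the GNU General Public License"))  # {'GPL'}
print(canonical_names("Copyright Sun Microsystems, Inc."))              # set()
```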

The basic test for identifying terms: Does the term, by itself and out of any context, describe a license? Based on this, “Sun” is not a license term (it is a company and a nearby star). However, “Sun Public License” is a license term, as is “SunPL”.

Canonical names can also be associated with templates. For example, if the license text matches an “Academic Free License” template, it can be associated with the canonical name “AFL”.
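One way to attach canonical names to templates is a prefix table keyed on template display names. The sketch below also chooses between a term-derived name and a template-derived name; the rule that terms win over the template is an ASSUMPTION inferred from the MPL-renamed-to-OSL example earlier in this section, not a documented precedence. The table and function names are hypothetical.

```python
from typing import Optional

# Hypothetical prefix table: template display name prefix -> canonical name.
TEMPLATE_PREFIX_TO_CANONICAL = {
    "Academic Free License": "AFL",
    "Open Software License": "OSL",
    "MPL": "MPL",
}

def template_canonical(template_name: str) -> Optional[str]:
    """Return the canonical name for a template display name, if known."""
    for prefix, canonical in TEMPLATE_PREFIX_TO_CANONICAL.items():
        if template_name.startswith(prefix):
            return canonical
    return None

def choose_canonical(template_name: str, term_names: set[str]) -> Optional[str]:
    # ASSUMPTION: names found via license terms override the template's name,
    # so MPL text renamed to "OSL" is reported as OSL.
    if term_names:
        return ", ".join(sorted(term_names))  # e.g. dual-licensed text
    return template_canonical(template_name)

print(choose_canonical("MPL 1.1", {"OSL"}))  # OSL
print(choose_canonical("MPL 1.1", set()))    # MPL
```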

Combining Templates, Phrases, Terms, and Canonical Names

Canonical names are used for displaying the names associated with identified license text. The appropriate canonical name is determined as follows:

Percent of Match

Licenses are matched based on the percentage of similar tokens. Tokens are simply words or punctuation. For example, consider a file with a potential license section that contains 500 tokens. If 400 of those tokens match a section of a license template that contains 2000 tokens, the section matches 400/500 tokens, or 80%; the template's length does not enter into the percentage. Since 20% of the text does not match, the difference could indicate a new license clause, alternate wording, or simply a replaced term.
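The arithmetic from this example can be written out directly. The reported percentage is relative to the file's section, not to the template; the second function computes how much of the template was covered, which is why a high match percentage does not imply the whole template was present. Both function names are hypothetical.

```python
def percent_match(matched_tokens: int, section_tokens: int) -> float:
    """Share of the file's candidate section that matched the template."""
    if section_tokens <= 0:
        raise ValueError("section_tokens must be positive")
    return 100.0 * matched_tokens / section_tokens

def template_coverage(matched_tokens: int, template_tokens: int) -> float:
    """Share of the template covered by the section (not the reported figure)."""
    if template_tokens <= 0:
        raise ValueError("template_tokens must be positive")
    return 100.0 * matched_tokens / template_tokens

# The example from the text: 400 of 500 section tokens match a 2000-token template.
print(percent_match(400, 500))       # 80.0
print(template_coverage(400, 2000))  # 20.0
```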

When viewing the license under the UI, the matched tokens are highlighted. Any word (or character) not highlighted was not part of the match. The highlighting allows users to quickly determine what was changed. It could be as simple as spelling out “General Public License” instead of “GPL”, or it could be the inclusion of the word “not” (a small, but very critical word for legal interpretation).
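Finding the unhighlighted words amounts to listing the section tokens that fall outside every matching run. The sketch below uses Python's `difflib.SequenceMatcher` as a stand-in matcher (not the agent's own algorithm) and splits on whitespace for simplicity; `unmatched_tokens` is a hypothetical name.

```python
import difflib

def unmatched_tokens(section: str, template: str) -> list[str]:
    """Return section tokens outside every matching block, i.e. the words
    that would be left unhighlighted in this sketch."""
    s, t = section.split(), template.split()
    matcher = difflib.SequenceMatcher(a=s, b=t, autojunk=False)
    matched = set()
    for block in matcher.get_matching_blocks():
        matched.update(range(block.a, block.a + block.size))
    return [tok for i, tok in enumerate(s) if i not in matched]

# A single inserted "not" is the only unmatched word, yet it reverses the meaning.
print(unmatched_tokens("you may not redistribute this software",
                       "you may redistribute this software"))  # ['not']
```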

A “100%” match indicates that the entire potential section matched something in the template, but does not necessarily mean that the entire template matched the section. For example, a license section may have a 100% match with BSD/BSD.new/BSD_new, but only match the warranty clause.

License Templates

License templates are arranged in directories that denote similar text. The organization is based strictly on text similarity and not on semantics. Each template has a unique name; the user interface displays only the name, not the hierarchical path.
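Since the directory path encodes the text family and the last path component is the displayed name, a template path can be split and grouped mechanically. This sketch treats each entry in the list that follows as a slash-separated path; the helper names are hypothetical.

```python
from collections import defaultdict

def split_template_path(path: str) -> tuple[str, str]:
    """Split 'Family/Subfamily/Name' into ('Family/Subfamily', 'Name')."""
    family, _, name = path.rpartition("/")
    return family, name

def group_by_top_family(paths: list[str]) -> dict[str, list[str]]:
    """Group display names by the top-level family directory."""
    groups: dict[str, list[str]] = defaultdict(list)
    for path in paths:
        top = path.split("/", 1)[0]
        groups[top].append(split_template_path(path)[1])
    return dict(groups)

print(split_template_path("BSD/BSD.new/Apache/Apache Software License 2.0"))
print(group_by_top_family([
    "AFL/AFL/Academic Free License 1.1",
    "AFL/OSL/Open Software License 1.0",
    "CDDL/CDDL 1.0",
]))
```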

The current list of license templates is as follows:

Adaptive/Adaptive 1.0
Adaptive/Adaptive 1.0 Appendix A
Adobe/Adobe
Adobe/Adobe short
AFL/AFL/Academic Free License 1.1
AFL/AFL/Academic Free License 1.2
AFL/AFL/Academic Free License 2.0
AFL/AFL/Academic Free License 2.1
AFL/AFL/Academic Free License 3.0
AFL/OSL/Open Software License 1.0
AFL/OSL/Open Software License 1.0 reference
AFL/OSL/Open Software License 1.1
AFL/OSL/Open Software License 2.0
AFL/OSL/Open Software License 2.1
AFL/OSL/Open Software License 3.0
APSL/Apple Public Source License 1.0
APSL/Apple Public Source License 1.1
APSL/Apple Public Source License 1.2
APSL/Apple Public Source License 2.0
Artistic/Artistic 1.0
Artistic/Artistic 1.0 short
Artistic/Artistic 2.0
Artistic/Artistic 2.0beta4
BSD/BSD.new/Apache/Apache Software License 1.0
BSD/BSD.new/Apache/Apache Software License 1.1
BSD/BSD.new/Apache/Apache Software License 2.0
BSD/BSD.new/Apache/Apache Software License 2.0 reference
BSD/BSD.new/BSD new
BSD/BSD.new/BSD new short
BSD/BSD.new/Cryptix
BSD/BSD.new/Entessa Public License
BSD/BSD.new/Maia Mailguard License
BSD/BSD.new/Naumen Public License
BSD/BSD.new/OpenPBS
BSD/BSD.new/Phorum
BSD/BSD.new/PHP/PHP 3.0
BSD/BSD.new/SSLeay
BSD/BSD.new/Vovida Software License 1.0
BSD/BSD.new/Zend
BSD/BSD.old/Attribution Assurance License
BSD/BSD.old/BSD As-Is clause
BSD/BSD.old/BSD Harvard
BSD/BSD.old/BSD NRL
BSD/BSD.old/BSD old
BSD/BSD.old/BSD UCRegents
BSD/BSD.old/BSD UCRegents 2
BSD/BSD.old/BSD zlib
BSD/BSD.old/FreeBSD
BSD/BSD.old/Intel-OSL
BSD/BSD.old/OpenLDAP
BSD/BSD.old/OpenSSL
BSD/BSD.old/Sleepycat
BSD/BSD.old/Sleepycat short
BSD/BSD.old/Zope/Zope 1.0
BSD/BSD.old/Zope/Zope 2.0
CDDL/CDDL 1.0
Corporate/Apple/Apple Common Documentation License 1.0
Corporate/Apple/Apple Squeak
Corporate/CA/TOSL/Computer Associates Trusted Open Source License 1.1
Corporate/HP/Hewlett-Packard
Corporate/HP/HP-UX Java
Corporate/HP/HP-UX JRE
Corporate/IBM/IBM JRE
Corporate/IBM/IBM reciprocal
Corporate/Logica/Logica Open Source License Version 1.0
Corporate/Lucent/Lucent Public License 1.0
Corporate/Lucent/Lucent Public License 1.02
Corporate/Microsoft/Microsoft EULA
Corporate/Microsoft/Microsoft EULA 2003
Corporate/Microsoft/Microsoft EULA Software
Corporate/Motorola
Corporate/NCD/Network Computing Devices 1993
Corporate/NetComponents/NetComponents
Corporate/Nokia/Nokia Open Source License 1.0a
Corporate/Nvidia
Corporate/RSA/RSA MD5
Corporate/SGI/SGI CID 1.0
Corporate/SGI/SGI GPX 1.0
Corporate/Skype
Corporate/Sun/Bigelow&Holmes
Corporate/Sun/Sun Microsystems Binary Code License
Corporate/Sun/Sun Microsystems Binary Code License supplement
Corporate/Sun/Sun Microsystems Free with Copyright 1
Corporate/Sun/Sun Microsystems Free with Copyright 2
Corporate/Sun/Sun Microsystems Sun Public License
Corporate/Sun/Sun Microsystems variant 1
Corporate/Sun/Sun Microsystems variant 2
Corporate/Sun/Sun Solaris Source Code License Foundation Release
CPL/Common Public License 1.0
CPL/IBM/IBM_PL/IBM Public License 1.0
Creative_Commons/Creative Commons GPL
Creative_Commons/Creative Commons LGPL
Creative_Commons/Creative Commons Public Domain
Creative_Commons/Creative Commons Public License
Edu/CMU/Carnegie Mellon University 1998
Edu/CMU/Carnegie Mellon University 2000
Edu/CWI (Center for Mathematics and Computer Science, Netherlands)
Edu/Educational Community License
Edu/University of Utah Public License
Edu/Univ of Cambridge
Edu/Univ of Edinburgh
Edu/Univ of Notre Dame
Eiffel/Eiffel Forum License 1
Eiffel/Eiffel Forum License 2
FreeArtLicense/Free Art License 1.2
Free/Beerware
Free/Fair License
Free/Free clause
Free/Free clause variant 2
Free/Free clause variant 3
Free/Free use no change clause
Free/FreeWithCopyright/Free with copyright clause variant 1
Free/FreeWithCopyright/Free with copyright clause variant 10
Free/FreeWithCopyright/Free with copyright clause variant 3
Free/FreeWithCopyright/Free with copyright clause variant 4
Free/FreeWithCopyright/Free with copyright clause variant 5
Free/FreeWithCopyright/Free with copyright clause variant 8
Free/FreeWithCopyright/Free with copyright clause variant 9
Free/FreeWithCopyright/UC Regents free with copyright clause
Free/FreeWithCopyright/Unidex
Free/FreeWithCopyright/variant.11
Free/Free with files clause
FreeType/FreeType
FreeType/FreeType reference
Free/WTFPL
FSF/FSF
FSF/FSF variant 1
FSF/FSF variant 2
FSF/FSF variant 3
FSF/FSF variant 4
Gov/CeCILL-B_V1-en
Gov/CeCILL-B_V1-fr
Gov/CeCILL-C_V1-en
Gov/CeCILL-C_V1-fr
Gov/CeCILL_V1.1-US
Gov/CeCILL_V1-fr
Gov/CeCILL_V2-en
Gov/CeCILL_V2-fr
Gov/Government clause
Gov/MITRE Collaborative Virtual Workspace License
Gov/NASA Open Source 1.3
Gov/Standard ML of New Jersey
GPL/Affero/Affero GPL
GPL/CopyLeft reference
GPL/Dual MPL GPL
GPL/Exception/GPL exception clause 1
GPL/Exception/GPL exception clause 2
GPL/GFDL/GNU Free Documentation License 1.1 reference 1
GPL/GFDL/GNU Free Documentation License 1.1 reference 2
GPL/GFDL/GNU Free Documentation License 1.2
GPL/GFDL/GNU Free Documentation License 1.2 reference
GPL/GPL for Computer Programs of the Public Administration
GPL/GPL from FSF reference
GPL/GPL reference
GPL/LGPL/LGPL 2.0
GPL/LGPL/LGPL 2.0 reference
GPL/LGPL/LGPL 2.0 with exceptions
GPL/LGPL/LGPL 2.1
GPL/LGPL/LGPL 2.1 reference
GPL/LGPL/LGPL 3.0
GPL/LGPL/LGPL gettext library variant
GPL/LGPL/LGPL GNU C Library variant
GPL/LGPL/LGPL wxWindows Library Licence 3.0 variant
GPL/v1/GPLv1
GPL/v1/GPLv1 reference
GPL/v2/eCos
GPL/v2/Free with copyright clause
GPL/v2/GPL from FSF reference 1
GPL/v2/GPL from FSF reference 2
GPL/v2/GPLv2
GPL/v2/GPLv2 Java Index Serialization Package variant
GPL/v2/GPLv2 reference
GPL/v2/GPLv2 reference 2
GPL/v2/GPLv2 reference 3
GPL/v2/GPLv2 reference 4
GPL/v2/McKornik Jr. Public License
GPL/v2/RealNetworks/RealNetworks Community Source Licensing
GPL/v2/RealNetworks/RealNetworks Public Source License 1.0
GPL/v2/RealNetworks/RealNetworks Public Source License 1.0 reference
GPL/v2/Sybase Open Watcom Public License 1.0
GPL/v3/GPLv3
GPL/v3/GPLv3 reference 1
GPL/v3/GPLv3 reference 2
GPL/W3C/World Wide Web Consortium 2001
GPL/W3C/World Wide Web Consortium 2002
Historical/Historical free with copyright clause
Historical/Historical Permission Notice and Disclaimer
ICU/ICU 1.8.1
ICU/ICU 1.8.1 variant
IETF/IETF
IETF/IETF variant
MiscOSS/Aladdin Free Public License
MiscOSS/Bitstream
MiscOSS/BitTorrent
MiscOSS/BitTorrent reference
MiscOSS/Catharon Open Source License
MiscOSS/C_Migemo License
MiscOSS/Condor
MiscOSS/Copy clause
MiscOSS/EU DataGrid Software License
MiscOSS/Frameworx Open License 1.0
MiscOSS/Giftware
MiscOSS/Glide
MiscOSS/gnuplot
MiscOSS/Hacktivismo Enhanced-Source Software License Agreement
MiscOSS/IJG
MiscOSS/iMatix
MiscOSS/Internet Software Consortium
MiscOSS/Jabber Open Source License 1.0
MiscOSS/Jahia Community Source License
MiscOSS/LaTeX Project Public License 1.3a
MiscOSS/mecab-ipadic
MiscOSS/Motosoto Open Source License
MiscOSS/MSNTP License
MiscOSS/Nethack General Public License
MiscOSS/OpenContent License
MiscOSS/Open Motif Public End User License
MiscOSS/Pine License
MiscOSS/qmail License
MiscOSS/Q Public License 1.0
MiscOSS/Ruby
MiscOSS/Scilab License
MiscOSS/TCL
MiscOSS/Vim
MiscOSS/zlib/InfoZip
MiscOSS/zlib/zLib
MIT/Imlib2
MIT/JasPer
MIT/MIT Bigelow&Holmes Luxi font variant
MIT/MIT CMU style
MIT/MIT Free with copyright clause
MIT/MIT HP-DEC variant
MIT/MIT MLton variant
MIT/MIT (modern)
MIT/MIT (modern) with sublicense
MIT/MIT New Jersey variant
MIT/MIT (oldstyle)
MIT/MIT (oldstyle) no ads clause
MIT/MIT (oldstyle) with disclaimer 1
MIT/MIT (oldstyle) with disclaimer 2
MIT/MIT (oldstyle) with disclaimer 3
MIT/MIT Unicode variant
MIT/NCSA
MIT/X11
MIT/X.Net License
MPL/CUA Office Public License 1.0
MPL/Dual MPL MIT
MPL/Interbase
MPL/MPL 1.0
MPL/MPL 1.1
MPL/MPL 1.1 reference
MPL/MPL contributor clause with dual license
MPL/Netizen Open Source License
MPL/NPL 1.1
MPL/NPL 1.1 reference
MPL/NPL contributor clause with dual license
MPL/Ricoh Source Code Public License
MPL/SISSL/SISSL 1.1
MPL/SISSL/SISSL 1.1 reference 1
MPL/SISSL/SISSL 1.1 reference 2
OCLC/OCLC Research Public License 2.0
OpenGroup/Open Group
OpenGroup/Open Group Test Suite License
OpenPublicationLicense/Open Publication License 1.0
OpenPublicationLicense/Open Publication License reference
Python/PSF/Python Software Foundation 2.1.1
Python/PSF/Python Software Foundation 2.2
Python/Python BeOpen
Python/Python CNRI
Python/Python CWI
Python/Python InfoSeek variant
RedHat/Red Hat EULA
RedHat/Red Hat reference