FOSSology Backup and Restore Scope

Implement and document backup and restore

Fully implement and document how to back up and restore a running FOSSology system, including the database, the repository, and any necessary system configuration specific to FOSSology.

Two solutions will be provided for backing up and restoring the repository:

  1. Back up and restore the entire repository
  2. Back up and restore only the gold files repository

Backup and restore entire repository solution

Provide the user with instructions for backing up and restoring the entire repository.

Backup solution

1. Stop the Scheduler before the backup (and verify that all the agents have stopped)
2. Back up the PostgreSQL database
3. Back up the entire repository data to a backup server using rsync, including the gold, files, and license directories
4. Start the Scheduler after all backups have finished
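
The four backup steps above could be scripted roughly as follows. This is only a sketch: the init script location, the repository path, the backup host, and the dump file name are all assumptions, and run, dump_database, and backup_full are hypothetical helper names. With DRY_RUN=1 (the default here) the steps are printed instead of executed.

```shell
#!/bin/sh
# Sketch of the full backup procedure. Every path, the backup host, and
# the init script location are assumptions, not FOSSology defaults.
SCHEDULER_INIT="${SCHEDULER_INIT:-/etc/init.d/fossology}"   # assumed init script
REPO_DIR="${REPO_DIR:-/srv/fossology/repository}"           # assumed to contain gold, files, license
BACKUP_DEST="${BACKUP_DEST:-backuphost:/backup/fossology}"  # hypothetical rsync target
DUMP_FILE="${DUMP_FILE:-/var/backups/fossology-db.sql.gz}"  # hypothetical dump file
DRY_RUN="${DRY_RUN:-1}"                                     # 1 = only print the steps

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Dump every database in the cluster and compress the result.
# Note: the pipeline's exit status is gzip's, so a pg_dumpall failure
# is not detected here and the dump should be checked separately.
dump_database() {
    pg_dumpall | gzip > "$1"
}

backup_full() {
    status=0
    run "$SCHEDULER_INIT" stop || return 1                  # 1. stop the Scheduler
    # Agent process names vary by install, so verifying that all agents
    # have exited is left as a manual check in this sketch.
    run dump_database "$DUMP_FILE" || status=1              # 2. PostgreSQL
    run rsync -a "$REPO_DIR/" "$BACKUP_DEST/" || status=1   # 3. gold, files, license
    run "$SCHEDULER_INIT" start || status=1                 # 4. start again even if a step failed
    return "$status"
}
```

Calling backup_full prints the plan; setting DRY_RUN=0 before the call would execute it. The Scheduler is started again even when the dump or the rsync fails, so that a failed backup does not leave the system stopped.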

Restore solution

1. Restore the PostgreSQL database
2. Restore the entire repository
3. Restart the Scheduler
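
The restore steps above might look like this in script form. Again a sketch under assumptions: the paths, the backup host, and the init script (including its restart action) are not confirmed FOSSology defaults, and run, load_database, and restore_full are hypothetical names. It assumes the dump was produced with pg_dumpall and compressed with gzip.

```shell
#!/bin/sh
# Sketch of the full restore procedure. All paths, the backup host, and
# the init script location are assumptions, not FOSSology defaults.
SCHEDULER_INIT="${SCHEDULER_INIT:-/etc/init.d/fossology}"   # assumed init script
REPO_DIR="${REPO_DIR:-/srv/fossology/repository}"           # assumed repository root
BACKUP_SRC="${BACKUP_SRC:-backuphost:/backup/fossology}"    # hypothetical rsync source
DUMP_FILE="${DUMP_FILE:-/var/backups/fossology-db.sql.gz}"  # hypothetical dump file
DRY_RUN="${DRY_RUN:-1}"                                     # 1 = only print the steps

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# pg_dumpall writes a plain SQL script, so psql can replay it.
# Assumes this runs as a PostgreSQL superuser (often the postgres account).
load_database() {
    gunzip -c "$1" | psql postgres
}

restore_full() {
    run load_database "$DUMP_FILE" || return 1              # 1. PostgreSQL
    run rsync -a "$BACKUP_SRC/" "$REPO_DIR/" || return 1    # 2. entire repository
    run "$SCHEDULER_INIT" restart                           # 3. restart the Scheduler
}
```

The Scheduler is restarted only after both the database and the repository have been restored, so that it does not start against a half-restored system.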

Notes:

If the user has enough disk space, we recommend this backup solution. The instructions should also list the approximate time that backup and restore will take, so that the user can weigh the tradeoff.

Backup and restore only Gold files solution

We also provide a solution that backs up and restores only the Gold files and the database. This is a good solution for users who don’t have enough disk space or don’t want to back up the entire repository.

Backup solution

1. Stop the Scheduler before the backup (and verify that all the agents have stopped)
2. Back up the PostgreSQL database
3. Back up only the repository gold and license directories
4. Start the Scheduler after all backups have finished
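
Step 3 is the only step that differs from the full-repository procedure. A sketch of that step is shown below. It assumes that REPO_DIR directly contains the gold and license directories (a multi-host repository layout may need an extra path component), and that the destination host and path are hypothetical. The names run and backup_gold_and_license are illustrative.

```shell
#!/bin/sh
# Sketch of the gold-files-only repository copy. REPO_DIR is assumed to
# directly contain gold/ and license/; BACKUP_DEST is a hypothetical target.
REPO_DIR="${REPO_DIR:-/srv/fossology/repository}"
BACKUP_DEST="${BACKUP_DEST:-backuphost:/backup/fossology}"
DRY_RUN="${DRY_RUN:-1}"                                     # 1 = only print the command

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

backup_gold_and_license() {
    # No trailing slash on the sources: rsync then recreates the
    # gold/ and license/ directories underneath the destination.
    run rsync -a "$REPO_DIR/gold" "$REPO_DIR/license" "$BACKUP_DEST/"
}
```

Because the files directory is skipped, the unpacked files are not in the backup and would have to be regenerated from the gold files after a restore.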

Restore solution

1. Restore the PostgreSQL database
2. Restore only the repository gold and license directories, and do not unpack during the restore process
3. Restart the Scheduler
4. Provide a user interface for users who want to re-unpack the gold files

Code changes to implement the gold-files-only backup solution

Reduce the size of the repository

Determine which unpacked files should be saved and which should not. Unpacked files that are currently in the repository but should not be saved are to be removed. Implement this as part of the backup scope.

Backup and restore the necessary system configuration

Open question: Which configuration files need to be backed up?

Backing up the system configuration files is out of scope; only a note will be added to the backup and restore procedures document. The FOSSology configuration files that need to be backed up are Scheduler.conf, Host.conf, Db.conf, and RepPath.conf.

**Question**: What are the detailed backup requirements for the configuration files, and when should they be backed up (at the same time as the database and repository backup)?

Answer bobg Aug-3-09: If these files are lost, the user should recover them from their normal system backup. If they are lost due to a system failure, they have bigger problems than restoring fossology.

Build any tools to support backup

Design, review, build, test, and document any tools, agents, or plugins that are necessary to enable the backup and restore process documented in #1.

bobg Aug-3-09:

  1. stop scheduler
  2. pg_dumpall and save backup
  3. backup repository
  4. start scheduler
Does it need to be more complex than this?

Deploy backup strategy

Implement the proposed gold-files-only backup and restore strategy on the two running FOSSology production systems (the external and internal systems).

Question: How are the production systems configured and deployed? Do we need to understand the infrastructure of the production systems better?

I drew a picture of the external FOSSology production system deployment. Please review the diagram and add comments.

Test disaster recovery

Question: What is the relationship between an agent and its storage? Is the storage on the agent's local disk or on a network file system?

When an agent is lost, how should the FOSSology cluster react?

Note: This brings up some very important questions:

  1. How do you do multi-system backups? It will be different from single system backups because each agent has unique data stored on its local storage that must be backed up.
  2. If an agent machine fails, how does the running FOSSology system respond? (i.e., notify the user? Try to work around the failure?)
  3. If you restore an agent machine that has failed, how can you verify that its local storage is consistent with the rest of the repository?
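
One possible way to approach question 3, offered only as a sketch and not as an existing FOSSology feature: record a checksum manifest of an agent's local storage at backup time, then compare the restored storage against it. The function names make_manifest and verify_against_manifest are hypothetical. cksum is used because it is part of POSIX; a stronger hash tool could be substituted.

```shell
#!/bin/sh
# Sketch: detect differences between a directory tree and a manifest
# recorded earlier. Not part of FOSSology; all names are illustrative.

# Write one "checksum size path" line per file under the directory, sorted
# so that two manifests of identical trees are byte-for-byte equal.
make_manifest() {
    ( cd "$1" && find . -type f -exec cksum {} + | LC_ALL=C sort )
}

# Succeed only when the directory still matches the saved manifest.
verify_against_manifest() {
    make_manifest "$1" | diff - "$2" > /dev/null
}
```

A manifest would be written with make_manifest at backup time and stored with the backup. After a restore, verify_against_manifest returns a non-zero status if any file was added, removed, or changed. This only checks the agent's storage against its own backup; whether it is consistent with the database is a separate question.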

Backup and Restore multi-system repository

Usual multi-system deployment method: distributed agents and a distributed repository

Old notes

The following is an old conversation about backing up an internal machine.

  1. Currently, there is a 4-day rotating archive of the db on rfo. Should it be longer?
  2. Start “off-site” backups on rfo (similar to what is currently done with fossbazaar & fossology); the archives are stored locally on rfo. Advice is needed (from Matt?) on how to do proper backups.
  3. Define & test the recovery process.
    • database recovery (Mary)
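
For reference, a rotating archive like the one mentioned in item 1 is often implemented by pruning dumps past a retention period. The sketch below is not the script used on rfo; the backup directory, the pgdump-*.sql.gz naming pattern, and the KEEP_DAYS setting are assumptions, and prune_old_dumps is a hypothetical name.

```shell
#!/bin/sh
# Sketch of a rotation step: delete compressed dumps past the retention
# period. Directory, file pattern, and KEEP_DAYS are assumptions.
BACKUP_DIR="${BACKUP_DIR:-/var/backups/fossology}"
KEEP_DAYS="${KEEP_DAYS:-4}"

prune_old_dumps() {
    # find counts age in whole 24-hour periods, so "-mtime +4" matches
    # files that are at least 5 days old.
    find "$BACKUP_DIR" -maxdepth 1 -type f -name 'pgdump-*.sql.gz' \
        -mtime +"$KEEP_DAYS" -exec rm -f {} +
}
```

Lengthening the rotation would then only mean raising KEEP_DAYS, at the cost of more disk space for the retained dumps.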

Additional notes from 11/13 meeting:

  1. Disaster recovery: the server is in the midst of a job (which jobs?) and the plug is pulled; can we recover?
  2. Flat out lose everything: the whole data center blows up, no more disks, no more nothing
  3. Blow away just the database
  4. Lose an agent and its storage. Then what happens?

Notes on documenting the procedure to back up the FOSSology metadata, from the 2008.05.6 IRC discussion

<danger> BTW, do we have an easy way to let users back up their FOSSology database?

<danger> or is that documented anywhere?

<taggart> danger: no, I raised this issue 6+ months ago

<bobg> danger: that's in the postgres docs

<taggart> danger: since we need it ourselves, we're not doing backups of fossology yet

<danger> bobg: I know Postgres has a way to let you back things up, but it would probably be a good idea to summarize the fossology specifics

<danger> bobg: and somewhere down the road (a long ways) create a “Back up my FOSSology data” menu item

<danger> taggart: ack.

<danger> taggart: we now have a way to capture this :)

<danger> sorry for the delay

<taggart> I had proposed having fossology automatically dump state to the filesystem, so that a normal filesystem backup could grab those snapshots

<taggart> by that I mean the fossology postgresql db

<taggart> I think the repo and golden area, etc should be fine with a normal filesystem backup

Update 7/17 danger

Postgres backups are now in force on our repos. This is accomplished with a simple

pg_dumpall | gzip > backup_filename

put into a cron job on the repository system.
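
One caveat with the bare pipeline: in a plain POSIX shell the pipeline's exit status is gzip's, so a failed pg_dumpall can go unnoticed and leave a truncated archive. The sketch below shows one way a cron wrapper might guard against that by dumping to a temporary file first. It is not the script actually in use; BACKUP_DIR, the file naming, and the DUMP_CMD variable (present only so the dump step can be swapped out for testing) are assumptions, and dump_db is a hypothetical name.

```shell
#!/bin/sh
# Cron wrapper sketch. BACKUP_DIR and DUMP_CMD are hypothetical knobs.
BACKUP_DIR="${BACKUP_DIR:-/var/backups/fossology}"
DUMP_CMD="${DUMP_CMD:-pg_dumpall}"

# Write a dated, compressed dump; leave nothing behind if the dump fails.
dump_db() {
    out="$BACKUP_DIR/pgdump-$(date +%Y%m%d).sql"
    mkdir -p "$BACKUP_DIR" || return 1
    # DUMP_CMD is deliberately unquoted so it can carry arguments.
    if $DUMP_CMD > "$out.tmp"; then
        # Only a complete dump gets its final name and is compressed.
        mv "$out.tmp" "$out" && gzip -f "$out"
    else
        rm -f "$out.tmp"
        return 1
    fi
}
```

Saved as a script that ends with a call to dump_db, this could be run from a crontab entry for a user allowed to dump the cluster. The tradeoff is that the uncompressed dump briefly occupies disk space before gzip runs.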

For gold files: any large system has a lot of gold files. It would be advantageous to store only the source URL for all gold files that have one set, and to back up the physical gold files only for those that have no source URL.

Note that this has not been implemented yet. It would require querying the database to find out where the gold files came from.