Opened 18 months ago

Last modified 17 months ago

#104 new defect

Problem with CRIN1 and CRIN2 backups

Reported by: chris Owned by: chris
Priority: critical Milestone: Maintenance
Component: backups Version:
Keywords: Cc:
Estimated Number of Hours: 0 Add Hours to Ticket: 0
Billable?: yes Total Hours: 1.55

Description

Errors from the s3ql backup scripts:

Backend reports that file system is still mounted elsewhere. Either
the file system has not been unmounted cleanly or the data has not yet
propagated through the backend. In the later case, waiting for a while
should fix the problem, in the former case you should try to run fsck
on the computer where the file system has been mounted most recently.
You may also continue and use whatever metadata is available in the
backend. However, in that case YOU MAY LOOSE ALL DATA THAT HAS BEEN
UPLOADED OR MODIFIED SINCE THE LAST SUCCESSFULL METADATA UPLOAD.
Moreover, files and directories that you have deleted since then MAY
REAPPEAR WITH SOME OF THEIR CONTENT LOST.
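
The first thing to check is whether the filesystem really is still mounted somewhere on this host, or whether the backend is just serving stale data. Something along these lines would confirm it (this assumes the backups run from this server, and that s3ql mounts show up with the fuse.s3ql filesystem type):

# check for any s3ql filesystems currently mounted on this machine
mount -t fuse.s3ql
grep s3ql /proc/mounts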

Change History (8)

comment:1 Changed 18 months ago by chris

  • Summary changed from Problem with CRIN1 and CEIN2 backups to Problem with CRIN1 and CRIN2 backups

comment:2 Changed 18 months ago by chris

  • Add Hours to Ticket changed from 0 to 0.25
  • Total Hours set to 0.25

Manually running the filesystem check without --batch:

fsck.s3ql --backend-options=dumb-copy s3c://s.qstack.advania.com:443/crin1

Starting fsck of s3c://s.qstack.advania.com:443/crin1/
Ignoring locally cached metadata (outdated).
Backend reports that file system is still mounted elsewhere. Either
the file system has not been unmounted cleanly or the data has not yet
propagated through the backend. In the later case, waiting for a while
should fix the problem, in the former case you should try to run fsck
on the computer where the file system has been mounted most recently.
You may also continue and use whatever metadata is available in the
backend. However, in that case YOU MAY LOOSE ALL DATA THAT HAS BEEN
UPLOADED OR MODIFIED SINCE THE LAST SUCCESSFULL METADATA UPLOAD.
Moreover, files and directories that you have deleted since then MAY
REAPPEAR WITH SOME OF THEIR CONTENT LOST.
Enter "continue, I know what I am doing" to use the outdated data anyway:
continue, I know what I am doing
WARNING: Found outdated cache directory (/root/.s3ql/s3c:=2F=2Fs.qstack.advania.com:443=2Fcrin1=2F-cache), renaming to .bak0
WARNING: You should delete this directory once you are sure that everything is in order.
Downloading and decompressing metadata...
Reading metadata...
..objects..
..blocks..
..inodes..
..inode_blocks..
..symlink_targets..
..names..
..contents..

Going to leave it running and come back to this later...

comment:3 Changed 18 months ago by chris

  • Add Hours to Ticket changed from 0 to 0.15
  • Total Hours changed from 0.25 to 0.4
..ext_attributes..
Creating temporary extra indices...
Checking lost+found...
Checking cached objects...
Checking names (refcounts)...
Checking contents (names)...
Checking contents (inodes)...
Checking contents (parent inodes)...
Checking objects (reference counts)...
Checking objects (backend)...
..processed 934000 objects so far..WARNING: Deleted spurious object 1164082
WARNING: Deleted spurious object 1164083
...
WARNING: Deleted spurious object 1164609

Checking objects (sizes)...
Checking blocks (referenced objects)...
Checking blocks (refcounts)...
Checking blocks (checksums)...
Checking inode-block mapping (blocks)...
Checking inode-block mapping (inodes)...
Checking inodes (refcounts)...
Checking inodes (sizes)...
Checking extended attributes (names)...
Checking extended attributes (inodes)...
Checking symlinks (inodes)...
Checking directory reachability...
Checking unix conventions...
Checking referential integrity...
Dropping temporary indices...
Dumping metadata...
..objects..
..blocks..
..inodes..
..inode_blocks..
..symlink_targets..
..names..
..contents..
..ext_attributes..
Compressing and uploading metadata...
Wrote 165 MiB of compressed metadata.
Cycling metadata backups...
Backing up old metadata...
Encountered ConnectionTimedOut (send/recv timeout exceeded), retrying Backend.copy (attempt 3)...
Encountered ConnectionTimedOut (send/recv timeout exceeded), retrying Backend.copy (attempt 4)...
WARNING: Encountered ConnectionTimedOut (send/recv timeout exceeded), retrying Backend.copy (attempt 5)...
Encountered ConnectionTimedOut (send/recv timeout exceeded), retrying Backend.copy (attempt 3)...
...

comment:4 Changed 18 months ago by chris

  • Add Hours to Ticket changed from 0 to 0.15
  • Total Hours changed from 0.4 to 0.55

This is still ongoing...

WARNING: Encountered ConnectionTimedOut (send/recv timeout exceeded), retrying Backend.copy (attempt 22)...
WARNING: Encountered ConnectionTimedOut (send/recv timeout exceeded), retrying Backend.copy (attempt 23)...
WARNING: Encountered ConnectionTimedOut (send/recv timeout exceeded), retrying Backend.copy (attempt 24)...
WARNING: Encountered ConnectionTimedOut (send/recv timeout exceeded), retrying Backend.copy (attempt 25)...
WARNING: Encountered ConnectionTimedOut (send/recv timeout exceeded), retrying Backend.copy (attempt 26)...
WARNING: Encountered ConnectionTimedOut (send/recv timeout exceeded), retrying Backend.copy (attempt 27)...

I have stopped it, started a screen session and run it again inside that, so it can be left running for longer (roughly as sketched below).
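
For reference, something along these lines (the session name is arbitrary):

screen -S crin1-fsck    # start a named screen session
fsck.s3ql --backend-options=dumb-copy s3c://s.qstack.advania.com:443/crin1
# detach with Ctrl-a d; reattach later with:
screen -r crin1-fsck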

comment:5 Changed 18 months ago by chris

  • Add Hours to Ticket changed from 0 to 0.25
  • Total Hours changed from 0.55 to 0.8

The Crin1 backup is OK now, but this problem with the Crin2 one is going to have to be raised on the s3ql list:

echo "continue, I know what I am doing" | fsck.s3ql --backend-options=dumb-copy --force --debug s3c://s.qstack.advania.com:443/crin2
Starting fsck of s3c://s.qstack.advania.com:443/crin2/
Ignoring locally cached metadata (outdated).
Backend reports that file system is still mounted elsewhere. Either
the file system has not been unmounted cleanly or the data has not yet
propagated through the backend. In the later case, waiting for a while
should fix the problem, in the former case you should try to run fsck
on the computer where the file system has been mounted most recently.
You may also continue and use whatever metadata is available in the
backend. However, in that case YOU MAY LOOSE ALL DATA THAT HAS BEEN
UPLOADED OR MODIFIED SINCE THE LAST SUCCESSFULL METADATA UPLOAD.
Moreover, files and directories that you have deleted since then MAY
REAPPEAR WITH SOME OF THEIR CONTENT LOST.
Enter "continue, I know what I am doing" to use the outdated data anyway:
Downloading and decompressing metadata...
WARNING: Object closed prematurely, can't check MD5, and have to reset connection
ERROR: Uncaught top-level exception:
Traceback (most recent call last):
  File "/usr/bin/fsck.s3ql", line 11, in <module>
    load_entry_point('s3ql==2.21', 'console_scripts', 'fsck.s3ql')()
  File "/usr/lib/s3ql/s3ql/fsck.py", line 1257, in main
    db = download_metadata(backend, cachepath + '.db')
  File "/usr/lib/s3ql/s3ql/metadata.py", line 300, in download_metadata
    backend.perform_read(do_read, name)
  File "/usr/lib/s3ql/s3ql/backends/common.py", line 108, in wrapped
    return method(*a, **kw)
  File "/usr/lib/s3ql/s3ql/backends/common.py", line 319, in perform_read
    res = fn(fh)
  File "/usr/lib/s3ql/s3ql/metadata.py", line 297, in do_read
    stream_read_bz2(fh, tmpfh)
  File "/usr/lib/s3ql/s3ql/metadata.py", line 285, in stream_read_bz2
    buf = decompressor.decompress(buf)
OSError: Invalid data stream
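
The OSError is coming out of the bz2 decompressor in stream_read_bz2, so the compressed metadata dump itself will not decompress. Purely as a sanity check, if a copy of the raw metadata object could be pulled down to a local file (the filename below is made up, and this only makes sense if the object is not additionally encrypted or wrapped by the backend), bzip2 itself can test the stream:

# -t tests the compressed stream without writing any output
bzip2 -tvv metadata-copy.bz2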
Last edited 18 months ago by chris

comment:6 Changed 18 months ago by chris

  • Add Hours to Ticket changed from 0 to 0.25
  • Total Hours changed from 0.8 to 1.05

comment:7 Changed 18 months ago by chris

  • Add Hours to Ticket changed from 0 to 0.25
  • Total Hours changed from 1.05 to 1.3

On the list I was asked to create an issue for this, so I have done that.

Last edited 18 months ago by chris

comment:8 Changed 17 months ago by chris

  • Add Hours to Ticket changed from 0 to 0.25
  • Total Hours changed from 1.3 to 1.55

Looking at how much data there is to backup, on Crin1:

7.6M    ./etc
696M    ./home
1.5G    ./root

And in /var:

8.7G    ./backups
965M    ./www

And on Crin2:

60M     ./root
7.9M    ./etc
399M    ./home

And in var:

21G     ./www
8.4M    ./backups

So that is roughly 12G for Crin1 and 22G for Crin2; perhaps 35G or so would be needed in total.
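
For reference, figures like the above can be reproduced with du; the exact paths are an assumption based on the directories listed:

# per-directory summaries plus a grand total, in human-readable units
du -csh /etc /home /root /var/backups /var/www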
