Opened 3 years ago

Closed 3 years ago

#35 closed defect (fixed)

Problem with the S3QL backups for Crin1

Reported by: chris Owned by: chris
Priority: major Milestone: Maintenance
Component: backups Version:
Keywords: Cc:
Estimated Number of Hours: 0 Add Hours to Ticket: 0
Billable?: yes Total Hours: 1.94

Description

Just tried mounting all the backups on Crin4 and there was a problem with the Crin1 backup:

umnt-s3ql crin1
/media/s3ql/crin1 is not on an S3QL file system
Can not check mounted file system.

mnt-s3ql crin1
Can not check mounted file system.
Filesystem check of s3c://s.qstack.advania.com:443/crin1 failed
Using 2 upload threads (memory limited).
Autodetected 65498 file descriptors available for cache entries
Using cached metadata.
File system damaged or not unmounted cleanly, run fsck!
Mounting s3c://s.qstack.advania.com:443/crin1 on /media/s3ql/crin1 failed

mnt-s3ql crin1 --force
Can not check mounted file system.
Filesystem check of s3c://s.qstack.advania.com:443/crin1 failed
Using 2 upload threads (memory limited).
Autodetected 65498 file descriptors available for cache entries
Using cached metadata.
File system damaged or not unmounted cleanly, run fsck!
Mounting s3c://s.qstack.advania.com:443/crin1 on /media/s3ql/crin1 failed

Change History (3)

comment:1 Changed 3 years ago by chris

  • Add Hours to Ticket changed from 0 to 0.94
  • Total Hours set to 0.94

Try to verify the filesystem:

s3ql_verify s3c://s.qstack.advania.com:443/crin1
Using cached metadata.
File system damaged or not unmounted cleanly, run fsck!

Try that:

fsck.s3ql s3c://s.qstack.advania.com:443/crin1
Can not check mounted file system.

But it is not mounted:

df -h
df: ‘/tmp/s3ql_backup_1236’: Transport endpoint is not connected
Filesystem                             Size  Used Avail Use% Mounted on
udev                                    10M     0   10M   0% /dev
tmpfs                                   99M   13M   87M  13% /run
/dev/dm-0                               15G  2.0G   13G  14% /
tmpfs                                  247M     0  247M   0% /dev/shm
tmpfs                                  5.0M     0  5.0M   0% /run/lock
tmpfs                                  247M     0  247M   0% /sys/fs/cgroup
/dev/sda1                              236M   56M  168M  25% /boot
crin1:/                                121G   25G   91G  21% /media/crin1
crin4:/                                 15G  3.1G   11G  22% /media/crin4
crin2:/                                121G   23G   92G  21% /media/crin2
s3c://s.qstack.advania.com:443/crin2/  1.0T   31G  994G   3% /media/s3ql/crin2
s3c://s.qstack.advania.com:443/crin4/  1.0T  696M  1.0T   1% /media/s3ql/crin4

However it was still mounted at /tmp/s3ql_backup_1236 (the stale entry in the df output above), so:

umount.s3ql /tmp/s3ql_backup_1236
File system appears to have crashed.

umount.s3ql --debug /tmp/s3ql_backup_1236
2015-09-07 11:11:09.375 5671 MainThread root.excepthook: File system appears to have crashed.
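
A crashed S3QL mountpoint like this stays listed in /proc/mounts but any access to it fails with "Transport endpoint is not connected", so something along these lines would find and lazily detach stale mountpoints (a rough sketch only, assuming the mounts show up with type fuse.s3ql):

# Sketch: find S3QL FUSE mounts whose control connection has gone away and detach them
awk '$3 == "fuse.s3ql" {print $2}' /proc/mounts | while read -r mp; do
    if ! stat "$mp" >/dev/null 2>&1; then
        echo "stale mountpoint: $mp"
        fusermount -u -z "$mp"   # lazy unmount; fsck.s3ql is still needed afterwards
    fi
done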

Looking at the email logs, this backup last ran without an error on Thu, 13 Aug 2015 01:48:39 +0000. Very sorry not to have noticed the problem sooner. This is the log entry from when the problem started:

sent 117 bytes  received 12 bytes  258.00 bytes/sec                                  
total size is 3,472  speedup is 26.91                                
Uncaught top-level exception:                                             
Traceback (most recent call last):                                      
  File "/usr/bin/umount.s3ql", line 9, in <module>                     
    load_entry_point('s3ql==2.13', 'console_scripts', 'umount.s3ql')() 
  File "/usr/lib/s3ql/s3ql/umount.py", line 184, in main              
    blocking_umount(options.mountpoint)                                                                                                 
  File "/usr/lib/s3ql/s3ql/umount.py", line 123, in blocking_umount                                                                     
    pid = pickle.loads(llfuse.getxattr(ctrlfile, 's3ql_pid?'))                                                                          
  File "fuse_api.pxi", line 206, in llfuse.capi.getxattr (src/llfuse/capi_linux.c:21046)                                                
ConnectionAbortedError: [Errno 103] Software caused connection abort: '/tmp/s3ql_backup_1236/.__s3ql__ctrl__'          

I'm not sure what to do, so I have emailed the s3ql list about this.

comment:2 Changed 3 years ago by chris

  • Add Hours to Ticket changed from 0 to 0.25
  • Total Hours changed from 0.94 to 1.19

On the list it was suggested to check ~/.s3ql/mount.log, and this is the corresponding entry:

2015-08-13 01:40:51.051 1246:Thread-3 root.excepthook: Uncaught top-level exception:
Traceback (most recent call last):
  File "/usr/lib/s3ql/s3ql/mount.py", line 64, in run_with_except_hook
    run_old(*args, **kw)
  File "/usr/lib/python3.4/threading.py", line 868, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/s3ql/s3ql/block_cache.py", line 404, in _upload_loop
    self._do_upload(*tmp)
  File "/usr/lib/s3ql/s3ql/block_cache.py", line 431, in _do_upload
    % obj_id).get_obj_size()
  File "/usr/lib/s3ql/s3ql/backends/common.py", line 52, in wrapped
    return method(*a, **kw)
  File "/usr/lib/s3ql/s3ql/backends/common.py", line 271, in perform_write
    return fn(fh)
  File "/usr/lib/s3ql/s3ql/backends/comprenc.py", line 547, in __exit__
    self.close()
  File "/usr/lib/s3ql/s3ql/backends/comprenc.py", line 541, in close
    self.fh.close()
  File "/usr/lib/s3ql/s3ql/backends/comprenc.py", line 706, in close
    self.fh.close()
  File "/usr/lib/s3ql/s3ql/backends/common.py", line 52, in wrapped
    return method(*a, **kw)
  File "/usr/lib/s3ql/s3ql/backends/s3c.py", line 906, in close
    headers=self.headers, body=self.fh)
  File "/usr/lib/s3ql/s3ql/backends/s3c.py", line 505, in _do_request
    self._parse_error_response(resp)
  File "/usr/lib/s3ql/s3ql/backends/s3c.py", line 539, in _parse_error_response
    raise get_S3Error(tree.findtext('Code'), tree.findtext('Message'), resp.headers)
s3ql.backends.s3c.S3Error: ServiceUnavailable: Please reduce your request rate.
2015-08-13 01:48:39.010 1246:MainThread s3ql.mount.unmount: Unmounting file system...
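
So the immediate cause of the crash was the backend throttling the upload threads during the unmount flush. One way to reduce the request rate from this host would be to pin the mount to a single upload thread, assuming the --threads option is available in this version (a sketch only, not something done on this ticket; the mount output above shows it autodetecting 2 threads):

# Sketch: limit concurrent PUTs to the backend by forcing one upload thread
mount.s3ql --threads 1 s3c://s.qstack.advania.com:443/crin1 /media/s3ql/crin1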

comment:3 Changed 3 years ago by chris

  • Add Hours to Ticket changed from 0 to 0.75
  • Resolution set to fixed
  • Status changed from new to closed
  • Total Hours changed from 1.19 to 1.94

Unmounting via fusermount:

fusermount -u /tmp/s3ql_backup_1236

Filesystem check:

fsck.s3ql s3c://s.qstack.advania.com:443/crin1
Starting fsck of s3c://s.qstack.advania.com:443/crin1/
Using cached metadata.
Remote metadata is outdated.
Checking DB integrity...
Creating temporary extra indices...
Checking lost+found...
Checking cached objects...
Checking names (refcounts)...
Checking contents (names)...
Checking contents (inodes)...
Checking contents (parent inodes)...
Checking objects (reference counts)...
Checking objects (backend)...
..processed 64000 objects so far..
Checking objects (sizes)...
Checking blocks (referenced objects)...
Checking blocks (refcounts)...
Checking blocks (checksums)...
Checking inode-block mapping (blocks)...
Checking inode-block mapping (inodes)...
Checking inodes (refcounts)...
Checking inodes (sizes)...
Checking extended attributes (names)...
Checking extended attributes (inodes)...
Checking symlinks (inodes)...
Checking directory reachability...
Checking unix conventions...
Checking referential integrity...
Dropping temporary indices...
Dumping metadata...
..objects..
..blocks..
..inodes..
..inode_blocks..
..symlink_targets..
..names..
..contents..
..ext_attributes..
Compressing and uploading metadata...
Wrote 10.5 MiB of compressed metadata.
Cycling metadata backups...
Backing up old metadata...
Unexpected server reply: expected XML, got:
200 OK
x-amz-meta-006: 'data': b'XXXX
x-amz-meta-007: XXXX',
Content-Length: 0
x-amz-meta-005: 'signature': b'XXXX',
x-amz-meta-002: 'format_version': 2,
x-amz-meta-003: 'nonce': b'XXXX',
x-amz-meta-000: 'object_id': 's3ql_metadata_bak_10',
x-amz-meta-001: 'compression': 'None',
x-amz-id-2: XXXX
x-amz-meta-md5: XXXX
x-amz-meta-004: 'encryption': 'AES_v2',
Last-Modified: Wed, 09 Sep 2015 10:57:10 GMT
ETag: "XXX"
x-amz-request-id: XXX 
x-amz-meta-format: raw2
Content-Type: text/html; charset="UTF-8"
X-Trans-Id: XXX 
Date: Wed, 09 Sep 2015 10:57:10 +0000
Connection: keep-alive


Uncaught top-level exception:
Traceback (most recent call last):
  File "/usr/bin/fsck.s3ql", line 9, in <module>
    load_entry_point('s3ql==2.13', 'console_scripts', 'fsck.s3ql')()
  File "/usr/lib/s3ql/s3ql/fsck.py", line 1309, in main
    cycle_metadata(backend)
  File "/usr/lib/s3ql/s3ql/metadata.py", line 121, in cycle_metadata
    cycle_fn("s3ql_metadata_bak_%d" % i, "s3ql_metadata_bak_%d" % (i + 1))
  File "/usr/lib/s3ql/s3ql/backends/comprenc.py", line 291, in copy
    self._copy_or_rename(src, dest, rename=False, metadata=metadata)
  File "/usr/lib/s3ql/s3ql/backends/comprenc.py", line 325, in _copy_or_rename
    self.backend.copy(src, dest, metadata=meta_raw)
  File "/usr/lib/s3ql/s3ql/backends/common.py", line 52, in wrapped
    return method(*a, **kw)
  File "/usr/lib/s3ql/s3ql/backends/s3c.py", line 404, in copy
    root = self._parse_xml_response(resp, body)
  File "/usr/lib/s3ql/s3ql/backends/s3c.py", line 554, in _parse_xml_response
    raise RuntimeError('Unexpected server response')
RuntimeError: Unexpected server response

Running it with --backend-options="dumb-copy", however, worked:

fsck.s3ql --backend-options="dumb-copy" s3c://s.qstack.advania.com:443/crin1
Starting fsck of s3c://s.qstack.advania.com:443/crin1/
Using cached metadata.
Remote metadata is outdated.
Checking DB integrity...
Creating temporary extra indices...
Checking lost+found...
Checking cached objects...
Checking names (refcounts)...
Checking contents (names)...
Checking contents (inodes)...
Checking contents (parent inodes)...
Checking objects (reference counts)...
Checking objects (backend)...
..processed 65000 objects so far..
Checking objects (sizes)...
Checking blocks (referenced objects)...
Checking blocks (refcounts)...
Checking blocks (checksums)...
Checking inode-block mapping (blocks)...
Checking inode-block mapping (inodes)...
Checking inodes (refcounts)...
Checking inodes (sizes)...
Checking extended attributes (names)...
Checking extended attributes (inodes)...
Checking symlinks (inodes)...
Checking directory reachability...
Checking unix conventions...
Checking referential integrity...
Dropping temporary indices...
Dumping metadata...
..objects..
..blocks..
..inodes..
..inode_blocks..
..symlink_targets..
..names..
..contents..
..ext_attributes..
Compressing and uploading metadata...
Wrote 10.5 MiB of compressed metadata.
Cycling metadata backups...
Backing up old metadata...
Cleaning up local metadata...
Completed fsck of s3c://s.qstack.advania.com:443/crin1/

So --backend-options="dumb-copy" has been added to the scripts and the backup job is running in screen. I have created a follow-up ticket, ticket:39, to address this issue generally, and have opened an S3QL ticket as advised, https://bitbucket.org/nikratio/s3ql/issues/161/serviceunavailable-please-reduce-your, so I'm closing this ticket.
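
For the record, dumb-copy makes the s3c backend treat a plain HTTP 200 reply to a COPY request as success rather than insisting on an XML body, which is what the Advania endpoint returned above. Roughly how it is now wired into the wrapper (a sketch; the actual mnt-s3ql script isn't pasted here, so the layout and variable names are approximate):

# Sketch of the relevant mnt-s3ql wrapper lines; argument handling assumed
NAME="$1"
STORAGE_URL="s3c://s.qstack.advania.com:443/${NAME}"
MOUNTPOINT="/media/s3ql/${NAME}"

fsck.s3ql  --backend-options="dumb-copy" "$STORAGE_URL" || exit 1
mount.s3ql --backend-options="dumb-copy" "$STORAGE_URL" "$MOUNTPOINT"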
