https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203864
Andriy Gapon <***@FreeBSD.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |***@FreeBSD.org,
                   |                            |***@FreeBSD.org
             Status|New                         |Open
--- Comment #13 from Andriy Gapon <***@FreeBSD.org> ---
I think that I've been able to reproduce this problem or, at least, something
that looks very much like it. I did the standard procstat debugging and I
noticed something that did not appear in any of the previous reports:
6 100572 zfskern txg_thread_enter mi_switch+0x167
  sleepq_switch+0xe7 sleepq_wait+0x43 _sx_xlock_hard+0x49d _sx_xlock+0xc5
  zvol_rename_minors+0x104 dsl_dataset_rename_snapshot_sync_impl+0x308
  dsl_dataset_rename_snapshot_sync+0xc1 dsl_sync_task_sync+0xef
  dsl_pool_sync+0x45b spa_sync+0x7c7 txg_sync_thread+0x383 fork_exit+0x84
  fork_trampoline+0xe
1226 100746 zfs - mi_switch+0x167
  sleepq_switch+0xe7 sleepq_wait+0x43 _cv_wait+0x1e4 txg_wait_synced+0x13b
  dsl_sync_task+0x205 dsl_dataset_user_release_impl+0x1cf
  dsl_dataset_user_release_onexit+0x86 zfs_onexit_destroy+0x56 zfsdev_close+0x88
  devfs_destroy_cdevpriv+0x8b devfs_close_f+0x65 _fdrop+0x1a closef+0x200
  closefp+0xa3 amd64_syscall+0x2db Xfast_syscall+0xfb
1228 100579 zfs - mi_switch+0x167
  sleepq_switch+0xe7 sleepq_wait+0x43 _cv_wait+0x1e4 txg_wait_synced+0x13b
  dsl_sync_task+0x205 dsl_dataset_rename_snapshot+0x3a zfs_ioc_rename+0x157
  zfsdev_ioctl+0x635 devfs_ioctl_f+0x156 kern_ioctl+0x246 sys_ioctl+0x171
  amd64_syscall+0x2db Xfast_syscall+0xfb
Thread 100746 is the culprit. zfsdev_close() holds spa_namespace_lock and then
calls dsl_sync_task() -> txg_wait_synced(). Meanwhile, the sync thread (100572)
is stuck waiting for spa_namespace_lock in a call to zvol_rename_minors(), so
the txg is never synced and the lock is never released.
In my opinion, the sync thread must never try to take spa_namespace_lock.
The problem seems to have been introduced quite a while ago, in base r219317.
Some later commits, such as base r272474, followed the same pattern. The problem
is certainly FreeBSD-specific, as illumos handles ZVOL names in a very different
manner.
Also, the problem is rather deep-rooted, and at the moment I do not see any easy
way to fix it without breaking ZVOL name tracking.
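To make the circular wait explicit, here is a small sketch that models the wait-for relationships read off the traces above. The thread IDs and lock names come from this report; the two tables and the find_cycle() helper are hypothetical and exist only for illustration, they are not part of ZFS or any FreeBSD tool.

```python
# Hypothetical model of the wait-for relationships seen in the traces.
# Thread IDs and names are from the report; the structures are illustrative.

# resource -> thread that must act before the resource becomes available
PROVIDED_BY = {
    "spa_namespace_lock": 100746,  # held exclusively in zfsdev_close()
    "txg synced": 100572,          # only the sync thread can complete a txg
}

# thread -> resource it is currently sleeping on
WAITS_FOR = {
    100746: "txg synced",          # txg_wait_synced() via dsl_sync_task()
    100572: "spa_namespace_lock",  # _sx_xlock() in zvol_rename_minors()
    100579: "txg synced",          # zfs rename, blocked behind the sync thread
}


def find_cycle(start):
    """Follow wait-for edges from start; return the cycle reached, or None."""
    path = []
    tid = start
    while tid is not None and tid not in path:
        path.append(tid)
        resource = WAITS_FOR.get(tid)
        tid = PROVIDED_BY.get(resource)
    if tid is None:
        return None  # the chain ended at a thread that is not blocked
    return path[path.index(tid):]
```

Calling find_cycle(100746) walks from the zfs process to the sync thread and back again, which is the deadlock. Starting from 100579 reaches the same two threads, but 100579 itself is not in the cycle: it is only a victim waiting behind the stuck sync thread.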
P.S.
A bit of information from ddb:
db> p spa_namespace_lock
ffffffff822b1ee0
db> show lock 0xffffffff822b1ee0
class: sx
name: spa_namespace_lock
state: XLOCK: 0xfffff8001da60500 (tid 100746, pid 1226, "zfs")
waiters: exclusive
db> thread 100746
[ thread pid 1226 tid 100746 ]
sched_switch+0x48a: movl %gs:0x34,%eax
db> bt
Tracing pid 1226 tid 100746 td 0xfffff8001da60500
sched_switch() at sched_switch+0x48a/frame 0xfffffe004def4590
mi_switch() at mi_switch+0x167/frame 0xfffffe004def45c0
sleepq_switch() at sleepq_switch+0xe7/frame 0xfffffe004def4600
sleepq_wait() at sleepq_wait+0x43/frame 0xfffffe004def4630
_cv_wait() at _cv_wait+0x1e4/frame 0xfffffe004def4690
txg_wait_synced() at txg_wait_synced+0x13b/frame 0xfffffe004def46d0
dsl_sync_task() at dsl_sync_task+0x205/frame 0xfffffe004def4790
dsl_dataset_user_release_impl() at dsl_dataset_user_release_impl+0x1cf/frame 0xfffffe004def4910
dsl_dataset_user_release_onexit() at dsl_dataset_user_release_onexit+0x86/frame 0xfffffe004def4940
zfs_onexit_destroy() at zfs_onexit_destroy+0x56/frame 0xfffffe004def4970
zfsdev_close() at zfsdev_close+0x88/frame 0xfffffe004def4990
devfs_destroy_cdevpriv() at devfs_destroy_cdevpriv+0x8b/frame 0xfffffe004def49b0
devfs_close_f() at devfs_close_f+0x65/frame 0xfffffe004def49e0
_fdrop() at _fdrop+0x1a/frame 0xfffffe004def4a00
closef() at closef+0x200/frame 0xfffffe004def4a90
closefp() at closefp+0xa3/frame 0xfffffe004def4ae0
amd64_syscall() at amd64_syscall+0x2db/frame 0xfffffe004def4bf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe004def4bf0
--- syscall (6, FreeBSD ELF64, sys_close), rip = 0x8013f996a, rsp = 0x7fffffffd438, rbp = 0x7fffffffd450 ---