Discussion:
[Bug 203864] ZFS deadlock between zfs send, zfs rename and ctrl-C
(too old to reply)
b***@freebsd.org
2015-10-19 07:25:03 UTC
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203864

Garrett Cooper,425-314-3911 <***@FreeBSD.org> changed:

What |Removed |Added
----------------------------------------------------------------------------
Assignee|freebsd-***@FreeBSD.org |freebsd-***@FreeBSD.org
CC| |***@FreeBSD.org

--- Comment #2 from Garrett Cooper,425-314-3911 <***@FreeBSD.org> ---
uname -a?

FWIW, I ran into ZFS deadlocks with r288943 when running tests/sys/acl . When I
upgrade to r289441, this went away.

I was running into apparent deadlocks with the spa namespace lock though, not
tx->tx_sync_done_cv.

mav's modifications to zfs send/recv might need to be fixed.
--
You are receiving this mail because:
You are the assignee for the bug.
b***@freebsd.org
2015-10-19 07:29:37 UTC
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203864

--- Comment #3 from Peter Wemm <***@FreeBSD.org> ---
It's later than the revision you indicated:

FreeBSD halo.ysv.freebsd.org 11.0-CURRENT FreeBSD 11.0-CURRENT #0 r289488: Sun
Oct 18 06:47:35 UTC 2015
***@build-11.freebsd.org:/usr/obj/usr/src/sys/CLUSTER11 amd64
--
You are receiving this mail because:
You are the assignee for the bug.
b***@freebsd.org
2015-10-19 07:47:34 UTC
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203864

Andriy Gapon <***@FreeBSD.org> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@FreeBSD.org

--- Comment #4 from Andriy Gapon <***@FreeBSD.org> ---
https://wiki.freebsd.org/AvgZfsDeadlockDebug
--
You are receiving this mail because:
You are the assignee for the bug.
b***@freebsd.org
2015-10-19 07:43:26 UTC
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203864

Kubilay Kocak <***@FreeBSD.org> changed:

What |Removed |Added
----------------------------------------------------------------------------
Keywords| |dogfood, needs-qa
CC| |***@FreeBSD.org
--
You are receiving this mail because:
You are the assignee for the bug.
b***@freebsd.org
2016-02-21 12:45:21 UTC
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203864

***@qsp.nl changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@qsp.nl

--- Comment #5 from ***@qsp.nl ---
Created attachment 167247
--> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=167247&action=edit
procstat -kk -a output

output from procstat -kk -a at the moment of the deadlock
--
You are receiving this mail because:
You are the assignee for the bug.
b***@freebsd.org
2016-02-21 12:47:25 UTC
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203864

--- Comment #6 from ***@qsp.nl ---
We've run into this exact same issue using the zrep script. It's easily
reproducable. The deadlock occurs when a "zfs rename" is executed while there
is still a "zfs send" process running.

In zrep, this occurs when a zfs receive on the remote hosts fails and zrep
immediately attempts to rename a snapshot. However, the zfs send process is
still running (cleaning up?) and this causes an immediate deadlock.

Any zfs process run after that immediately blocks in state
"zpa_namespace_lock". IO continues to work, but fails at some point in the
future. I've not diagnosed the exact trigger that causes IO to seize.

The only resolution is to power cycle the system.

I've just attached the output to a procstat -kk -a from yesterday when I ran
into this.

Running on 10.3-PRERELEASE FreeBSD 10.3-PRERELEASE #1 r294572
--
You are receiving this mail because:
You are the assignee for the bug.
b***@freebsd.org
2016-02-21 15:00:13 UTC
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203864

Andriy Gapon <***@FreeBSD.org> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@FreeBSD.org

--- Comment #7 from Andriy Gapon <***@FreeBSD.org> ---
(In reply to roel from comment #6)
It would be interesting to run kgdb and examine spa_namespace_lock. Forcing a
crash dump for post-mortem examination is probably more useful because it would
be easier to answer any possible follow-up questions.
--
You are receiving this mail because:
You are the assignee for the bug.
b***@freebsd.org
2016-06-30 23:56:43 UTC
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203864

Cassiano Peixoto <***@gmail.com> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@gmail.com

--- Comment #8 from Cassiano Peixoto <***@gmail.com> ---
(In reply to roel from comment #6)
Hi guys,

Any news about this issue? I've same issue when running zrep script as roel
said. Roel, did you find any workaround for this?

Thanks.
--
You are receiving this mail because:
You are the assignee for the bug.
b***@freebsd.org
2016-07-01 11:39:14 UTC
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203864

--- Comment #9 from ***@qsp.nl ---
Yes, I just introduced a short sleep in the script as a workaround.
--
You are receiving this mail because:
You are the assignee for the bug.
b***@freebsd.org
2016-07-01 15:40:09 UTC
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203864

--- Comment #10 from Cassiano Peixoto <***@gmail.com> ---
(In reply to roel from comment #9)
Hi Roel,

Can you send me the patch to my email address? i'd like to add it. I think
would be a good idea to submit a patch to zrep port.

The worst thing with this issue is: when it happens i have to force server
reboot and when it comes back i lost all my data since the beginning of issue.
It's unbelievable, i know, but it's happening.
--
You are receiving this mail because:
You are the assignee for the bug.
b***@freebsd.org
2016-07-01 16:07:40 UTC
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203864

--- Comment #11 from ***@qsp.nl ---
That's not that hard:

*** /home/roel/zrep Wed Jan 13 22:12:27 2016
--- zrep Sun Feb 21 13:24:35 2016
***************
*** 1160,1164 ****
if [[ $? -ne 0 ]] ; then
zfs rename ${newsnap} ${newsnap}_unsent
zrep_errquit Problem doing sync for $newsnap. renamed to
${newsnap}_unsent
fi
---
if [[ $? -ne 0 ]] ; then
+ echo "Error on send.. waiting 60 seconds to prevent deadlock"
+ sleep 60
zfs rename ${newsnap} ${newsnap}_unsent
zrep_errquit Problem doing sync for $newsnap. renamed to
${newsnap}_unsent
fi
--
You are receiving this mail because:
You are the assignee for the bug.
b***@freebsd.org
2016-07-04 12:46:01 UTC
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203864

--- Comment #12 from Cassiano Peixoto <***@gmail.com> ---
(In reply to roel from comment #11)
Hi Roel,

Thanks to send me. Do you know some way to simulate this issue? i tried to just
make my second server down and run zrep, but didn't work. Thanks.
--
You are receiving this mail because:
You are the assignee for the bug.
b***@freebsd.org
2016-07-08 14:03:48 UTC
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203864

Andriy Gapon <***@FreeBSD.org> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@FreeBSD.org,
| |***@FreeBSD.org
Status|New |Open

--- Comment #13 from Andriy Gapon <***@FreeBSD.org> ---
I think that I've been able to reproduce this problem or, at least, something
that looks very much like it. I did the standard procstat debugging and I
noticed something that did not appear in any of the previous reports:

6 100572 zfskern txg_thread_enter mi_switch+0x167
sleepq_switch+0xe7 sleepq_wait+0x43 _sx_xlock_hard+0x49d _sx_xlock+0xc5
zvol_rename_minors+0x104 dsl_dataset_rename_snapshot_sync_impl+0x308
dsl_dataset_rename_snapshot_sync+0xc1 dsl_sync_task_sync+0xef
dsl_pool_sync+0x45b spa_sync+0x7c7 txg_sync_thread+0x383 fork_exit+0x84
fork_trampoline+0xe

1226 100746 zfs - mi_switch+0x167
sleepq_switch+0xe7 sleepq_wait+0x43 _cv_wait+0x1e4 txg_wait_synced+0x13b
dsl_sync_task+0x205 dsl_dataset_user_release_impl+0x1cf
dsl_dataset_user_release_onexit+0x86 zfs_onexit_destroy+0x56 zfsdev_close+0x88
devfs_destroy_cdevpriv+0x8b devfs_close_f+0x65 _fdrop+0x1a closef+0x200
closefp+0xa3 amd64_syscall+0x2db Xfast_syscall+0xfb

1228 100579 zfs - mi_switch+0x167
sleepq_switch+0xe7 sleepq_wait+0x43 _cv_wait+0x1e4 txg_wait_synced+0x13b
dsl_sync_task+0x205 dsl_dataset_rename_snapshot+0x3a zfs_ioc_rename+0x157
zfsdev_ioctl+0x635 devfs_ioctl_f+0x156 kern_ioctl+0x246 sys_ioctl+0x171
amd64_syscall+0x2db Xfast_syscall+0xfb

Thread 100746 is it. zfsdev_close() holds spa_namespace_lock and then calls
dsl_sync_task() -> txg_wait_synced(). On the other hand the sync thread
(100572) gets stuck on spa_namespace_lock in a call to zvol_rename_minors().

My opinion is that the sync thread must never try to take spa_namespace_lock.
The problem seems to be introduced quite a while ago in base r219317. Some
later commits like base r272474 also followed the same pattern. The problem is
certainly FreeBSD-specific as illumos handles ZVOL names in a very different
manner.

Also, the problem is rather deep-rooted and at the moment I do not see any easy
way to fix without breaking ZVOL name tracking.

P.S.
A bit of information from ddb:
db> p spa_namespace_lock
ffffffff822b1ee0
db> show lock 0xffffffff822b1ee0
class: sx
name: spa_namespace_lock
state: XLOCK: 0xfffff8001da60500 (tid 100746, pid 1226, "zfs")
waiters: exclusive
db> thread 100746
[ thread pid 1226 tid 100746 ]
sched_switch+0x48a: movl %gs:0x34,%eax
db> bt
Tracing pid 1226 tid 100746 td 0xfffff8001da60500
sched_switch() at sched_switch+0x48a/frame 0xfffffe004def4590
mi_switch() at mi_switch+0x167/frame 0xfffffe004def45c0
sleepq_switch() at sleepq_switch+0xe7/frame 0xfffffe004def4600
sleepq_wait() at sleepq_wait+0x43/frame 0xfffffe004def4630
_cv_wait() at _cv_wait+0x1e4/frame 0xfffffe004def4690
txg_wait_synced() at txg_wait_synced+0x13b/frame 0xfffffe004def46d0
dsl_sync_task() at dsl_sync_task+0x205/frame 0xfffffe004def4790
dsl_dataset_user_release_impl() at dsl_dataset_user_release_impl+0x1cf/frame
0xfffffe004def4910
dsl_dataset_user_release_onexit() at dsl_dataset_user_release_onexit+0x86/frame
0xfffffe004def4940
zfs_onexit_destroy() at zfs_onexit_destroy+0x56/frame 0xfffffe004def4970
zfsdev_close() at zfsdev_close+0x88/frame 0xfffffe004def4990
devfs_destroy_cdevpriv() at devfs_destroy_cdevpriv+0x8b/frame
0xfffffe004def49b0
devfs_close_f() at devfs_close_f+0x65/frame 0xfffffe004def49e0
_fdrop() at _fdrop+0x1a/frame 0xfffffe004def4a00
closef() at closef+0x200/frame 0xfffffe004def4a90
closefp() at closefp+0xa3/frame 0xfffffe004def4ae0
amd64_syscall() at amd64_syscall+0x2db/frame 0xfffffe004def4bf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe004def4bf0
--- syscall (6, FreeBSD ELF64, sys_close), rip = 0x8013f996a, rsp =
0x7fffffffd438, rbp = 0x7fffffffd450 ---
--
You are receiving this mail because:
You are the assignee for the bug.
b***@freebsd.org
2016-07-08 14:32:32 UTC
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203864

--- Comment #14 from Steven Hartland <***@FreeBSD.org> ---
You're conclusion seems valid there Andriy but as you say there doesn't seem to
be an easy way to fix it.
--
You are receiving this mail because:
You are the assignee for the bug.
b***@freebsd.org
2016-07-09 05:58:42 UTC
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203864

--- Comment #15 from Andriy Gapon <***@FreeBSD.org> ---
(In reply to Steven Hartland from comment #14)
One possible approach is to move all the "minors" manipulations to ZFS ioctl
handlers, the only place where they were called originally, out of the sync
thread.
This sounds rather straightforward, but could turn out to be a bit laborious.
--
You are receiving this mail because:
You are the assignee for the bug.
b***@freebsd.org
2016-07-09 06:02:44 UTC
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203864

--- Comment #16 from Andriy Gapon <***@FreeBSD.org> ---
(In reply to Steven Hartland from comment #14)
Another possibility is to apply all zvol changes asynchronously. Either by
queuing them or by having a dedicated thread scanning for them. The former
should be almost trivial to implement.
--
You are receiving this mail because:
You are the assignee for the bug.
b***@freebsd.org
2016-07-09 11:08:52 UTC
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203864

--- Comment #17 from Andriy Gapon <***@FreeBSD.org> ---
Another observation is that the current code seems to be broken in a couple of
places as well.

1. dsl_dataset_rename_snapshot_sync_impl() uses ddrsa_fsname to construct the
old and new snapshot names, but this would be wrong in the case of a recursive
snpashot rename (zfs rename -r ***@snap1 ***@snap2) as ddrsa_fsname always points
to the name of the top level filesystem.

2. dsl_dataset_promote_sync(): as far as I can see, oldname is never populated,
so the code can do arbitrary renames.
--
You are receiving this mail because:
You are the assignee for the bug.
K. Macy
2016-07-10 04:51:50 UTC
Permalink
Post by b***@freebsd.org
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203864
Another observation is that the current code seems to be broken in a
couple of
places as well.
1. dsl_dataset_rename_snapshot_sync_impl() uses ddrsa_fsname to construct
the
old and new snapshot names, but this would be wrong in the case of a
recursive
points
to the name of the top level filesystem.
2. dsl_dataset_promote_sync(): as far as I can see, oldname is never
populated,
so the code can do arbitrary renames.
Have you tried enabling opensolaris witness?
Post by b***@freebsd.org
--
You are the assignee for the bug.
_______________________________________________
https://lists.freebsd.org/mailman/listinfo/freebsd-fs
<javascript:;>"
b***@freebsd.org
2016-07-10 17:10:58 UTC
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203864

--- Comment #18 from Andriy Gapon <***@FreeBSD.org> ---
And another deadlock of the same nature, but with an even more obvious LOR:

5087 100846 zfs - mi_switch+0x167
sleepq_switch+0xe7 sleepq_wait+0x43 _cv_wait+0x1e4 rrw_enter_read_impl+0x9b
dsl_pool_hold+0x77 dmu_objset_own+0x31 zvol_create_minor+0x16f
zvol_create_minors+0x9d zfs_ioc_create+0x3df zfsdev_ioctl+0x65c
devfs_ioctl_f+0x156 kern_ioctl+0x246 sys_ioctl+0x171 amd64_syscall+0x2db
Xfast_syscall+0xfb

6 100572 zfskern
txg_thread_enter mi_switch+0x167 sleepq_switch+0xe7 sleepq_wait+0x43
_sx_xlock_hard+0x49d _sx_xlock+0xc5 zvol_rename_minors+0x104
dsl_dir_rename_sync+0x406 dsl_sync_task_sync+0xef dsl_pool_sync+0x45b
spa_sync+0x7c7 txg_sync_thread+0x383 fork_exit+0x84 fork_trampoline+0xe


Thread 100846 took spa_namespace_lock in zvol_create_minor() and wants to get
dp_config_rwlock (as a reader) in dsl_pool_hold().
Thread 100572 took as dp_config_rwlock (as a writer) in dsl_sync_task_sync()
and wants to take spa_namespace_lock in zvol_rename_minors().
--
You are receiving this mail because:
You are the assignee for the bug.
b***@freebsd.org
2016-07-10 17:14:21 UTC
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203864

--- Comment #19 from Andriy Gapon <***@FreeBSD.org> ---
The deadlock from comment #13 can happen even if there are no zvols in a pool
at all, the deadlock from comment #18 happens when a zvol is created.
--
You are receiving this mail because:
You are the assignee for the bug.
b***@freebsd.org
2016-07-11 11:44:15 UTC
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203864

--- Comment #20 from Andriy Gapon <***@FreeBSD.org> ---
I've coded the suggestion from comment #15 here
https://reviews.freebsd.org/D7179
The change should also fix bugs described in comment #17.
--
You are receiving this mail because:
You are the assignee for the bug.
b***@freebsd.org
2016-07-14 07:34:02 UTC
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203864

Laurie Odgers <***@gmail.com> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@gmail.com

--- Comment #21 from Laurie Odgers <***@gmail.com> ---
Hi All
We are seeing the same problem using NPM 6.2.2. This has been a problem for
some time, but seems to randomly pop up. We have run for months without this
error happening, however today it seems to happen repeatedly and relatively
reproducable.
Procstat trace below.


# procstat -kk 12590
PID TID COMM TDNAME KSTACK
12590 101356 node - mi_switch+0xe1
sleepq_catch_signals+0xab sleepq_wait_sig+0xf _sleep+0x27d kern_kevent_fp+0x399
kern_kevent+0x9f sys_kevent+0x12a amd64_syscall+0x40f Xfast_syscall+0xfb
12590 101358 node - mi_switch+0xe1
sleepq_catch_signals+0xab sleepq_wait_sig+0xf _sleep+0x27d umtxq_sleep+0x125
do_sem_wait+0x4db __umtx_op_sem_wait+0x73 amd64_syscall+0x40f
Xfast_syscall+0xfb
12590 101359 node V8 WorkerThread mi_switch+0xe1
sleepq_catch_signals+0xab sleepq_wait_sig+0xf _sleep+0x27d umtxq_sleep+0x125
do_sem_wait+0x4db __umtx_op_sem_wait+0x73 amd64_syscall+0x40f
Xfast_syscall+0xfb
12590 101360 node V8 WorkerThread mi_switch+0xe1
sleepq_catch_signals+0xab sleepq_wait_sig+0xf _sleep+0x27d umtxq_sleep+0x125
do_sem_wait+0x4db __umtx_op_sem_wait+0x73 amd64_syscall+0x40f
Xfast_syscall+0xfb
12590 101361 node V8 WorkerThread mi_switch+0xe1
sleepq_catch_signals+0xab sleepq_wait_sig+0xf _sleep+0x27d umtxq_sleep+0x125
do_sem_wait+0x4db __umtx_op_sem_wait+0x73 amd64_syscall+0x40f
Xfast_syscall+0xfb
12590 101362 node V8 WorkerThread mi_switch+0xe1
sleepq_catch_signals+0xab sleepq_wait_sig+0xf _sleep+0x27d umtxq_sleep+0x125
do_sem_wait+0x4db __umtx_op_sem_wait+0x73 amd64_syscall+0x40f
Xfast_syscall+0xfb
12590 101363 node - mi_switch+0xe1 sleepq_wait+0x3a
sleeplk+0x15d __lockmgr_args+0x91a vop_stdlock+0x3c VOP_LOCK1_APV+0xab
_vn_lock+0x43 vget+0x73 cache_lookup+0x5d5 vfs_cache_lookup+0xac
VOP_LOOKUP_APV+0xa1 lookup+0x5a1 namei+0x4d4 kern_renameat+0x1b7
amd64_syscall+0x40f Xfast_syscall+0xfb
12590 101364 node - mi_switch+0xe1 sleepq_wait+0x3a
sleeplk+0x15d __lockmgr_args+0xca0 vop_stdlock+0x3c VOP_LOCK1_APV+0xab
_vn_lock+0x43 vputx+0x21f
--
You are receiving this mail because:
You are the assignee for the bug.
b***@freebsd.org
2016-07-14 07:39:32 UTC
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203864

--- Comment #22 from Andriy Gapon <***@FreeBSD.org> ---
(In reply to Laurie Odgers from comment #21)
I do not think that you are seeing the same problem. The data that you provided
is far too incomplete to tell, but it is more likely that you are running into
bug #209158. Or something else altogether.
--
You are receiving this mail because:
You are the assignee for the bug.
b***@freebsd.org
2016-07-14 23:09:45 UTC
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203864

--- Comment #23 from Laurie Odgers <***@gmail.com> ---
(In reply to Andriy Gapon from comment #22)

Yes apologies - I had both threads open and accidentally posted in this one.
--
You are receiving this mail because:
You are the assignee for the bug.
Loading...