Age | Commit message (Collapse) | Author |
|
When tcp_sendmsg() allocates a fresh and empty skb, it puts it at the
tail of the write queue using tcp_add_write_queue_tail()
Then it attempts to copy user data into this fresh skb.
If the copy fails, we undo the work and remove the fresh skb.
Unfortunately, this undo lacks the change done to tp->highest_sack and
we can leave a dangling pointer (to a freed skb)
Later, tcp_xmit_retransmit_queue() can dereference this pointer and
access freed memory. For regular kernels where memory is not unmapped,
this might cause SACK bugs because tcp_highest_sack_seq() is buggy,
returning garbage instead of tp->snd_nxt, but with various debug
features like CONFIG_DEBUG_PAGEALLOC, this can crash the kernel.
This bug was found by Marco Grassi thanks to syzkaller.
Fixes: 6859d49475d4 ("[TCP]: Abstract tp->highest_sack accessing & point to next skb")
Reported-by: Marco Grassi <marco.gra@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit bb1fceca22492109be12640d49f5ea5a544c6bb4)
Change-Id: Iad69bade5840fea19186c8e4ecbf9b3e5f49013b
|
|
Based on ANDROID-31252388 code snippet.
The global variable fwu is accessible in the fwu_sysfs_image_size_store
and fwu_sysfs_store_image functions without any locks. This results in
a potential race condition leading to a heap overflow.
Add locks to prevent the potential race condition.
Depends-On: Ibdbaf5da1400a9a81e128ff79dec3edba654c86d
Change-Id: I0947ab423d90991afdae8ffba43ea4356606acb9
|
|
Plugging a Logitech DJ receiver with KASAN activated raises a bunch of
out-of-bound readings.
The fields are allocated up to MAX_USAGE, meaning that potentially, we do
not have enough fields to fit the incoming values.
Add checks and silence KASAN.
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
(cherry picked from commit 50220dead1650609206efe91f0cc116132d59b3f)
Change-Id: I2fef9c29fb389631ff8822aa212d77ad22cf9568
|
|
In the fwu_sysfs_store_image function, there is no validation
of the count variable leading to a potential heap overflow.
Add additional bounds checks to prevent the potential heap overflow.
This commit combines snippets ANDROID-30799828 and
ANDROID-30937462
Change-Id: I14d60ce39ecc724a8fee1b8373da940d788b18cd
|
|
This is altered version of patch
43761473c254b45883a64441dd0bc85a42f3645c delivered by Google
in code snippet ANDROID-30956807_kernel3.10
There is a double fetch problem in audit_log_single_execve_arg()
where we first check the execve(2) argumnets for any "bad" characters
which would require hex encoding and then re-fetch the arguments for
logging in the audit record[1]. Of course this leaves a window
of opportunity for an unsavory application to munge with the data.
This patch reworks things by only fetching the argument data once[2]
into a buffer where it is scanned and logged into the audit records(s).
In addition to fixing the double fetch, this patch improves on
the original code in a few other ways: better handling of large
arguments which require encoding, stricter record length checking,
and some performance improvements (completely unverified,
but we got rid of some strlen() calls, that's got to be a good thing).
As part of the development of this patch, I've also created a basic
regression test for the audit-testsuite, the test can be tracked
on GitHub at the following link:
* https://github.com/linux-audit/audit-testsuite/issues/25
[1] If you pay careful attention, there is actually a triple fetch
problem due to a strnlen_user() call at the top of the function.
[2] This is a tiny white lie, we do make a call to strnlen_user()
prior to fetching the argument data. I don't like it, but due
to the way the audit record is structured we really have no choice
unless we copy the entire argument at once (which would require
a rather wasteful allocation). The good news is that with this patch
the kernel no longer relies on this strnlen_user() value
for anything beyond recording it in the log, we also update it with
a trustworthy value whenever possible.
Change-Id: I22d505104ba44d4333a3800100dd42df26e3ef0b
Reported-by: Pengfei Wang <wpengfeinudt@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Paul Moore <paul@paul-moore.com>
|
|
snd_usb_add_audio_stream() call
The original commit is also squashed with commit mentioned below
create_fixed_stream_quirk(), snd_usb_parse_audio_interface() and
create_uaxx_quirk() functions allocate the audioformat object by themselves
and free it upon error before returning. However, once the object is linked
to a stream, it's freed again in snd_usb_audio_pcm_free(), thus it'll be
double-freed, eventually resulting in a memory corruption.
This patch fixes these failures in the error paths by unlinking the audioformat
object before freeing it.
Based on a patch by Takashi Iwai <tiwai@suse.de>
[Note for stable backports:
this patch requires the commit 902eb7fd1e4a ('ALSA: usb-audio: Minor
code cleanup in create_fixed_stream_quirk()')]
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1283358
Reported-by: Ralf Spenneberg <ralf@spenneberg.net>
Cc: <stable@vger.kernel.org> # see the note above
Signed-off-by: Vladis Dronov <vdronov@redhat.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
(cherry picked from commit 836b34a935abc91e13e63053d0a83b24dfb5ea78)
Conflicts:
sound/usb/stream.c
ALSA: usb-audio: Minor code cleanup in create_fixed_stream_quirk()
Just a minor code cleanup: unify the error paths.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
(cherry picked from commit 902eb7fd1e4af3ac69b9b30f8373f118c92b9729)
Conflicts:
sound/usb/quirks.c
Change-Id: Ia9b3c72333eeeb6479bcfc464f007df6c12e1626
|
|
I got a KASAN report of use-after-free:
==================================================================
BUG: KASAN: use-after-free in klist_iter_exit+0x61/0x70 at addr ffff8800b6581508
Read of size 8 by task trinity-c1/315
=============================================================================
BUG kmalloc-32 (Not tainted): kasan: bad access detected
-----------------------------------------------------------------------------
Disabling lock debugging due to kernel taint
INFO: Allocated in disk_seqf_start+0x66/0x110 age=144 cpu=1 pid=315
___slab_alloc+0x4f1/0x520
__slab_alloc.isra.58+0x56/0x80
kmem_cache_alloc_trace+0x260/0x2a0
disk_seqf_start+0x66/0x110
traverse+0x176/0x860
seq_read+0x7e3/0x11a0
proc_reg_read+0xbc/0x180
do_loop_readv_writev+0x134/0x210
do_readv_writev+0x565/0x660
vfs_readv+0x67/0xa0
do_preadv+0x126/0x170
SyS_preadv+0xc/0x10
do_syscall_64+0x1a1/0x460
return_from_SYSCALL_64+0x0/0x6a
INFO: Freed in disk_seqf_stop+0x42/0x50 age=160 cpu=1 pid=315
__slab_free+0x17a/0x2c0
kfree+0x20a/0x220
disk_seqf_stop+0x42/0x50
traverse+0x3b5/0x860
seq_read+0x7e3/0x11a0
proc_reg_read+0xbc/0x180
do_loop_readv_writev+0x134/0x210
do_readv_writev+0x565/0x660
vfs_readv+0x67/0xa0
do_preadv+0x126/0x170
SyS_preadv+0xc/0x10
do_syscall_64+0x1a1/0x460
return_from_SYSCALL_64+0x0/0x6a
CPU: 1 PID: 315 Comm: trinity-c1 Tainted: G B 4.7.0+ #62
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
ffffea0002d96000 ffff880119b9f918 ffffffff81d6ce81 ffff88011a804480
ffff8800b6581500 ffff880119b9f948 ffffffff8146c7bd ffff88011a804480
ffffea0002d96000 ffff8800b6581500 fffffffffffffff4 ffff880119b9f970
Call Trace:
[<ffffffff81d6ce81>] dump_stack+0x65/0x84
[<ffffffff8146c7bd>] print_trailer+0x10d/0x1a0
[<ffffffff814704ff>] object_err+0x2f/0x40
[<ffffffff814754d1>] kasan_report_error+0x221/0x520
[<ffffffff8147590e>] __asan_report_load8_noabort+0x3e/0x40
[<ffffffff83888161>] klist_iter_exit+0x61/0x70
[<ffffffff82404389>] class_dev_iter_exit+0x9/0x10
[<ffffffff81d2e8ea>] disk_seqf_stop+0x3a/0x50
[<ffffffff8151f812>] seq_read+0x4b2/0x11a0
[<ffffffff815f8fdc>] proc_reg_read+0xbc/0x180
[<ffffffff814b24e4>] do_loop_readv_writev+0x134/0x210
[<ffffffff814b4c45>] do_readv_writev+0x565/0x660
[<ffffffff814b8a17>] vfs_readv+0x67/0xa0
[<ffffffff814b8de6>] do_preadv+0x126/0x170
[<ffffffff814b92ec>] SyS_preadv+0xc/0x10
This problem can occur in the following situation:
open()
- pread()
- .seq_start()
- iter = kmalloc() // succeeds
- seqf->private = iter
- .seq_stop()
- kfree(seqf->private)
- pread()
- .seq_start()
- iter = kmalloc() // fails
- .seq_stop()
- class_dev_iter_exit(seqf->private) // boom! old pointer
As the comment in disk_seqf_stop() says, stop is called even if start
failed, so we need to reinitialise the private pointer to NULL when seq
iteration stops.
An alternative would be to set the private pointer to NULL when the
kmalloc() in disk_seqf_start() fails.
Cc: stable@vger.kernel.org
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 77da160530dd1dc94f6ae15a981f24e5f0021e84)
Change-Id: I7cecc6cdb05d7a6c2ec6b365a3b0e5e6c7b39d21
|
|
In sg_common_write(), we free the block request and return -ENODEV if
the device is detached in the middle of the SG_IO ioctl().
Unfortunately, sg_finish_rem_req() also tries to free srp->rq, so we
end up freeing rq->cmd in the already free rq object, and then free
the object itself out from under the current user.
This ends up corrupting random memory via the list_head on the rq
object. The most common crash trace I saw is this:
------------[ cut here ]------------
kernel BUG at block/blk-core.c:1420!
Call Trace:
[<ffffffff81281eab>] blk_put_request+0x5b/0x80
[<ffffffffa0069e5b>] sg_finish_rem_req+0x6b/0x120 [sg]
[<ffffffffa006bcb9>] sg_common_write.isra.14+0x459/0x5a0 [sg]
[<ffffffff8125b328>] ? selinux_file_alloc_security+0x48/0x70
[<ffffffffa006bf95>] sg_new_write.isra.17+0x195/0x2d0 [sg]
[<ffffffffa006cef4>] sg_ioctl+0x644/0xdb0 [sg]
[<ffffffff81170f80>] do_vfs_ioctl+0x90/0x520
[<ffffffff81258967>] ? file_has_perm+0x97/0xb0
[<ffffffff811714a1>] SyS_ioctl+0x91/0xb0
[<ffffffff81602afb>] tracesys+0xdd/0xe2
RIP [<ffffffff81281e04>] __blk_put_request+0x154/0x1a0
The solution is straightforward: just set srp->rq to NULL in the
failure branch so that sg_finish_rem_req() doesn't attempt to re-free
it.
Additionally, since sg_rq_end_io() will never be called on the object
when this happens, we need to free memory backing ->cmd if it isn't
embedded in the object itself.
KASAN was extremely helpful in finding the root cause of this bug.
Signed-off-by: Calvin Owens <calvinowens@fb.com>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit f3951a3709ff50990bf3e188c27d346792103432)
Change-Id: I0854f7e73bd45a6e96404f2b0b9c5c1914704d4c
|
|
If struct xc2028_config is passed without a firmware name,
the following trouble may happen:
[11009.907205] xc2028 5-0061: type set to XCeive xc2028/xc3028 tuner
[11009.907491] ==================================================================
[11009.907750] BUG: KASAN: use-after-free in strcmp+0x96/0xb0 at addr ffff8803bd78ab40
[11009.907992] Read of size 1 by task modprobe/28992
[11009.907994] =============================================================================
[11009.907997] BUG kmalloc-16 (Tainted: G W ): kasan: bad access detected
[11009.907999] -----------------------------------------------------------------------------
[11009.908008] INFO: Allocated in xhci_urb_enqueue+0x214/0x14c0 [xhci_hcd] age=0 cpu=3 pid=28992
[11009.908012] ___slab_alloc+0x581/0x5b0
[11009.908014] __slab_alloc+0x51/0x90
[11009.908017] __kmalloc+0x27b/0x350
[11009.908022] xhci_urb_enqueue+0x214/0x14c0 [xhci_hcd]
[11009.908026] usb_hcd_submit_urb+0x1e8/0x1c60
[11009.908029] usb_submit_urb+0xb0e/0x1200
[11009.908032] usb_serial_generic_write_start+0xb6/0x4c0
[11009.908035] usb_serial_generic_write+0x92/0xc0
[11009.908039] usb_console_write+0x38a/0x560
[11009.908045] call_console_drivers.constprop.14+0x1ee/0x2c0
[11009.908051] console_unlock+0x40d/0x900
[11009.908056] vprintk_emit+0x4b4/0x830
[11009.908061] vprintk_default+0x1f/0x30
[11009.908064] printk+0x99/0xb5
[11009.908067] kasan_report_error+0x10a/0x550
[11009.908070] __asan_report_load1_noabort+0x43/0x50
[11009.908074] INFO: Freed in xc2028_set_config+0x90/0x630 [tuner_xc2028] age=1 cpu=3 pid=28992
[11009.908077] __slab_free+0x2ec/0x460
[11009.908080] kfree+0x266/0x280
[11009.908083] xc2028_set_config+0x90/0x630 [tuner_xc2028]
[11009.908086] xc2028_attach+0x310/0x8a0 [tuner_xc2028]
[11009.908090] em28xx_attach_xc3028.constprop.7+0x1f9/0x30d [em28xx_dvb]
[11009.908094] em28xx_dvb_init.part.3+0x8e4/0x5cf4 [em28xx_dvb]
[11009.908098] em28xx_dvb_init+0x81/0x8a [em28xx_dvb]
[11009.908101] em28xx_register_extension+0xd9/0x190 [em28xx]
[11009.908105] em28xx_dvb_register+0x10/0x1000 [em28xx_dvb]
[11009.908108] do_one_initcall+0x141/0x300
[11009.908111] do_init_module+0x1d0/0x5ad
[11009.908114] load_module+0x6666/0x9ba0
[11009.908117] SyS_finit_module+0x108/0x130
[11009.908120] entry_SYSCALL_64_fastpath+0x16/0x76
[11009.908123] INFO: Slab 0xffffea000ef5e280 objects=25 used=25 fp=0x (null) flags=0x2ffff8000004080
[11009.908126] INFO: Object 0xffff8803bd78ab40 @offset=2880 fp=0x0000000000000001
[11009.908130] Bytes b4 ffff8803bd78ab30: 01 00 00 00 2a 07 00 00 9d 28 00 00 01 00 00 00 ....*....(......
[11009.908133] Object ffff8803bd78ab40: 01 00 00 00 00 00 00 00 b0 1d c3 6a 00 88 ff ff ...........j....
[11009.908137] CPU: 3 PID: 28992 Comm: modprobe Tainted: G B W 4.5.0-rc1+ #43
[11009.908140] Hardware name: /NUC5i7RYB, BIOS RYBDWi35.86A.0350.2015.0812.1722 08/12/2015
[11009.908142] ffff8803bd78a000 ffff8802c273f1b8 ffffffff81932007 ffff8803c6407a80
[11009.908148] ffff8802c273f1e8 ffffffff81556759 ffff8803c6407a80 ffffea000ef5e280
[11009.908153] ffff8803bd78ab40 dffffc0000000000 ffff8802c273f210 ffffffff8155ccb4
[11009.908158] Call Trace:
[11009.908162] [<ffffffff81932007>] dump_stack+0x4b/0x64
[11009.908165] [<ffffffff81556759>] print_trailer+0xf9/0x150
[11009.908168] [<ffffffff8155ccb4>] object_err+0x34/0x40
[11009.908171] [<ffffffff8155f260>] kasan_report_error+0x230/0x550
[11009.908175] [<ffffffff81237d71>] ? trace_hardirqs_off_caller+0x21/0x290
[11009.908179] [<ffffffff8155e926>] ? kasan_unpoison_shadow+0x36/0x50
[11009.908182] [<ffffffff8155f5c3>] __asan_report_load1_noabort+0x43/0x50
[11009.908185] [<ffffffff8155ea00>] ? __asan_register_globals+0x50/0xa0
[11009.908189] [<ffffffff8194cea6>] ? strcmp+0x96/0xb0
[11009.908192] [<ffffffff8194cea6>] strcmp+0x96/0xb0
[11009.908196] [<ffffffffa13ba4ac>] xc2028_set_config+0x15c/0x630 [tuner_xc2028]
[11009.908200] [<ffffffffa13bac90>] xc2028_attach+0x310/0x8a0 [tuner_xc2028]
[11009.908203] [<ffffffff8155ea78>] ? memset+0x28/0x30
[11009.908206] [<ffffffffa13ba980>] ? xc2028_set_config+0x630/0x630 [tuner_xc2028]
[11009.908211] [<ffffffffa157a59a>] em28xx_attach_xc3028.constprop.7+0x1f9/0x30d [em28xx_dvb]
[11009.908215] [<ffffffffa157aa2a>] ? em28xx_dvb_init.part.3+0x37c/0x5cf4 [em28xx_dvb]
[11009.908219] [<ffffffffa157a3a1>] ? hauppauge_hvr930c_init+0x487/0x487 [em28xx_dvb]
[11009.908222] [<ffffffffa01795ac>] ? lgdt330x_attach+0x1cc/0x370 [lgdt330x]
[11009.908226] [<ffffffffa01793e0>] ? i2c_read_demod_bytes.isra.2+0x210/0x210 [lgdt330x]
[11009.908230] [<ffffffff812e87d0>] ? ref_module.part.15+0x10/0x10
[11009.908233] [<ffffffff812e56e0>] ? module_assert_mutex_or_preempt+0x80/0x80
[11009.908238] [<ffffffffa157af92>] em28xx_dvb_init.part.3+0x8e4/0x5cf4 [em28xx_dvb]
[11009.908242] [<ffffffffa157a6ae>] ? em28xx_attach_xc3028.constprop.7+0x30d/0x30d [em28xx_dvb]
[11009.908245] [<ffffffff8195222d>] ? string+0x14d/0x1f0
[11009.908249] [<ffffffff8195381f>] ? symbol_string+0xff/0x1a0
[11009.908253] [<ffffffff81953720>] ? uuid_string+0x6f0/0x6f0
[11009.908257] [<ffffffff811a775e>] ? __kernel_text_address+0x7e/0xa0
[11009.908260] [<ffffffff8104b02f>] ? print_context_stack+0x7f/0xf0
[11009.908264] [<ffffffff812e9846>] ? __module_address+0xb6/0x360
[11009.908268] [<ffffffff8137fdc9>] ? is_ftrace_trampoline+0x99/0xe0
[11009.908271] [<ffffffff811a775e>] ? __kernel_text_address+0x7e/0xa0
[11009.908275] [<ffffffff81240a70>] ? debug_check_no_locks_freed+0x290/0x290
[11009.908278] [<ffffffff8104a24b>] ? dump_trace+0x11b/0x300
[11009.908282] [<ffffffffa13e8143>] ? em28xx_register_extension+0x23/0x190 [em28xx]
[11009.908285] [<ffffffff81237d71>] ? trace_hardirqs_off_caller+0x21/0x290
[11009.908289] [<ffffffff8123ff56>] ? trace_hardirqs_on_caller+0x16/0x590
[11009.908292] [<ffffffff812404dd>] ? trace_hardirqs_on+0xd/0x10
[11009.908296] [<ffffffffa13e8143>] ? em28xx_register_extension+0x23/0x190 [em28xx]
[11009.908299] [<ffffffff822dcbb0>] ? mutex_trylock+0x400/0x400
[11009.908302] [<ffffffff810021a1>] ? do_one_initcall+0x131/0x300
[11009.908306] [<ffffffff81296dc7>] ? call_rcu_sched+0x17/0x20
[11009.908309] [<ffffffff8159e708>] ? put_object+0x48/0x70
[11009.908314] [<ffffffffa1579f11>] em28xx_dvb_init+0x81/0x8a [em28xx_dvb]
[11009.908317] [<ffffffffa13e81f9>] em28xx_register_extension+0xd9/0x190 [em28xx]
[11009.908320] [<ffffffffa0150000>] ? 0xffffffffa0150000
[11009.908324] [<ffffffffa0150010>] em28xx_dvb_register+0x10/0x1000 [em28xx_dvb]
[11009.908327] [<ffffffff810021b1>] do_one_initcall+0x141/0x300
[11009.908330] [<ffffffff81002070>] ? try_to_run_init_process+0x40/0x40
[11009.908333] [<ffffffff8123ff56>] ? trace_hardirqs_on_caller+0x16/0x590
[11009.908337] [<ffffffff8155e926>] ? kasan_unpoison_shadow+0x36/0x50
[11009.908340] [<ffffffff8155e926>] ? kasan_unpoison_shadow+0x36/0x50
[11009.908343] [<ffffffff8155e926>] ? kasan_unpoison_shadow+0x36/0x50
[11009.908346] [<ffffffff8155ea37>] ? __asan_register_globals+0x87/0xa0
[11009.908350] [<ffffffff8144da7b>] do_init_module+0x1d0/0x5ad
[11009.908353] [<ffffffff812f2626>] load_module+0x6666/0x9ba0
[11009.908356] [<ffffffff812e9c90>] ? symbol_put_addr+0x50/0x50
[11009.908361] [<ffffffffa1580037>] ? em28xx_dvb_init.part.3+0x5989/0x5cf4 [em28xx_dvb]
[11009.908366] [<ffffffff812ebfc0>] ? module_frob_arch_sections+0x20/0x20
[11009.908369] [<ffffffff815bc940>] ? open_exec+0x50/0x50
[11009.908374] [<ffffffff811671bb>] ? ns_capable+0x5b/0xd0
[11009.908377] [<ffffffff812f5e58>] SyS_finit_module+0x108/0x130
[11009.908379] [<ffffffff812f5d50>] ? SyS_init_module+0x1f0/0x1f0
[11009.908383] [<ffffffff81004044>] ? lockdep_sys_exit_thunk+0x12/0x14
[11009.908394] [<ffffffff822e6936>] entry_SYSCALL_64_fastpath+0x16/0x76
[11009.908396] Memory state around the buggy address:
[11009.908398] ffff8803bd78aa00: 00 00 fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[11009.908401] ffff8803bd78aa80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[11009.908403] >ffff8803bd78ab00: fc fc fc fc fc fc fc fc 00 00 fc fc fc fc fc fc
[11009.908405] ^
[11009.908407] ffff8803bd78ab80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[11009.908409] ffff8803bd78ac00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[11009.908411] ==================================================================
In order to avoid it, let's set the cached value of the firmware
name to NULL after freeing it. While here, return an error if
the memory allocation fails.
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
(cherry picked from commit 8dfbcc4351a0b6d2f2d77f367552f48ffefafe18)
Change-Id: I60ed306e70c0afd4d75b35e89332f3cfe3b70a6d
|
|
The unix_dgram_sendmsg routine use the following test
if (unlikely(unix_peer(other) != sk && unix_recvq_full(other))) {
to determine if sk and other are in an n:1 association (either
established via connect or by using sendto to send messages to an
unrelated socket identified by address). This isn't correct as the
specified address could have been bound to the sending socket itself or
because this socket could have been connected to itself by the time of
the unix_peer_get but disconnected before the unix_state_lock(other). In
both cases, the if-block would be entered despite other == sk which
might either block the sender unintentionally or lead to trying to unlock
the same spin lock twice for a non-blocking send. Add a other != sk
check to guard against this.
Fixes: 7d267278a9ec ("unix: avoid use-after-free in ep_remove_wait_queue")
Reported-By: Philipp Hahn <pmhahn@pmhahn.de>
Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Tested-by: Philipp Hahn <pmhahn@pmhahn.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit a5527dda344fff0514b7989ef7a755729769daa1)
Change-Id: I5a5ff28d66816982d9422778449fa84d85cb29a5
|
|
commit 19be0eaffa3ac7d8eb6784ad9bdbc7d67ed8e619 upstream.
This is an ancient bug that was actually attempted to be fixed once
(badly) by me eleven years ago in commit 4ceb5db9757a ("Fix
get_user_pages() race for write access") but that was then undone due to
problems on s390 by commit f33ea7f404e5 ("fix get_user_pages bug").
In the meantime, the s390 situation has long been fixed, and we can now
fix it by checking the pte_dirty() bit properly (and do it better). The
s390 dirty bit was implemented in abf09bed3cce ("s390/mm: implement
software dirty bits") which made it into v3.9. Earlier kernels will
have to look at the page state itself.
Also, the VM has become more scalable, and what used a purely
theoretical race back then has become easier to trigger.
To fix it, we introduce a new internal FOLL_COW flag to mark the "yes,
we already did a COW" rather than play racy games with FOLL_WRITE that
is very fundamental, and then use the pte dirty flag to validate that
the FOLL_COW flag is still valid.
Reported-and-tested-by: Phil "not Paul" Oester <kernel@linuxace.com>
Acked-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Michal Hocko <mhocko@suse.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[wt: s/gup.c/memory.c; s/follow_page_pte/follow_page_mask;
s/faultin_page/__get_user_page]
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit 9691eac5593ff1e2f82391ad327f21d90322aec1)
Change-Id: I4c449c7d5b07c158a3cd95622a13c30a7c9518ea
|
|
There's a race on CPU unplug where we free the swevent hash array
while it can still have events on. This will result in a
use-after-free which is BAD.
Simply do not free the hash array on unplug. This leaves the thing
around and no use-after-free takes place.
When the last swevent dies, we do a for_each_possible_cpu() iteration
anyway to clean these up, at which time we'll free it, so no leakage
will occur.
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Tested-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 12ca6ad2e3a896256f086497a7c7406a547ee373)
Conflicts:
kernel/events/core.c
Change-Id: Iebdba2ccf214b37c1851d881fce10fac3c07e0cf
|
|
Commit changed to affect fs/ioprio.c instead of fs/ioprio.c
get_task_ioprio() accesses the task->io_context without holding the task
lock and thus can race with exit_io_context(), leading to a
use-after-free. The reproducer below hits this within a few seconds on
my 4-core QEMU VM:
int main(int argc, char **argv)
{
pid_t pid, child;
long nproc, i;
/* ioprio_set(IOPRIO_WHO_PROCESS, 0, IOPRIO_PRIO_VALUE(IOPRIO_CLASS_IDLE, 0)); */
syscall(SYS_ioprio_set, 1, 0, 0x6000);
nproc = sysconf(_SC_NPROCESSORS_ONLN);
for (i = 0; i < nproc; i++) {
pid = fork();
assert(pid != -1);
if (pid == 0) {
for (;;) {
pid = fork();
assert(pid != -1);
if (pid == 0) {
_exit(0);
} else {
child = wait(NULL);
assert(child == pid);
}
}
}
pid = fork();
assert(pid != -1);
if (pid == 0) {
for (;;) {
/* ioprio_get(IOPRIO_WHO_PGRP, 0); */
syscall(SYS_ioprio_get, 2, 0);
}
}
}
for (;;) {
/* ioprio_get(IOPRIO_WHO_PGRP, 0); */
syscall(SYS_ioprio_get, 2, 0);
}
return 0;
}
This gets us KASAN dumps like this:
[ 35.526914] ==================================================================
[ 35.530009] BUG: KASAN: out-of-bounds in get_task_ioprio+0x7b/0x90 at addr ffff880066f34e6c
[ 35.530009] Read of size 2 by task ioprio-gpf/363
[ 35.530009] =============================================================================
[ 35.530009] BUG blkdev_ioc (Not tainted): kasan: bad access detected
[ 35.530009] -----------------------------------------------------------------------------
[ 35.530009] Disabling lock debugging due to kernel taint
[ 35.530009] INFO: Allocated in create_task_io_context+0x2b/0x370 age=0 cpu=0 pid=360
[ 35.530009] ___slab_alloc+0x55d/0x5a0
[ 35.530009] __slab_alloc.isra.20+0x2b/0x40
[ 35.530009] kmem_cache_alloc_node+0x84/0x200
[ 35.530009] create_task_io_context+0x2b/0x370
[ 35.530009] get_task_io_context+0x92/0xb0
[ 35.530009] copy_process.part.8+0x5029/0x5660
[ 35.530009] _do_fork+0x155/0x7e0
[ 35.530009] SyS_clone+0x19/0x20
[ 35.530009] do_syscall_64+0x195/0x3a0
[ 35.530009] return_from_SYSCALL_64+0x0/0x6a
[ 35.530009] INFO: Freed in put_io_context+0xe7/0x120 age=0 cpu=0 pid=1060
[ 35.530009] __slab_free+0x27b/0x3d0
[ 35.530009] kmem_cache_free+0x1fb/0x220
[ 35.530009] put_io_context+0xe7/0x120
[ 35.530009] put_io_context_active+0x238/0x380
[ 35.530009] exit_io_context+0x66/0x80
[ 35.530009] do_exit+0x158e/0x2b90
[ 35.530009] do_group_exit+0xe5/0x2b0
[ 35.530009] SyS_exit_group+0x1d/0x20
[ 35.530009] entry_SYSCALL_64_fastpath+0x1a/0xa4
[ 35.530009] INFO: Slab 0xffffea00019bcd00 objects=20 used=4 fp=0xffff880066f34ff0 flags=0x1fffe0000004080
[ 35.530009] INFO: Object 0xffff880066f34e58 @offset=3672 fp=0x0000000000000001
[ 35.530009] ==================================================================
Fix it by grabbing the task lock while we poke at the io_context.
Cc: stable@vger.kernel.org
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Conflicts:
block/ioprio.c
Change-Id: Ia7848e1a757c5824b8aee73801521db30d3a917d
|
|
The format specifier %p can leak kernel addresses
while not valuing the kptr_restrict system settings.
Use %pK instead of %p, which also evaluates whether
kptr_restrict is set.
Change-Id: I8854ebe4416de11b59a322bb3006e023dc618d23
|
|
If /proc/<PID>/environ gets read before the envp[] array is fully set up
in create_{aout,elf,elf_fdpic,flat}_tables(), we might end up trying to
read more bytes than are actually written, as env_start will already be
set but env_end will still be zero, making the range calculation
underflow, allowing to read beyond the end of what has been written.
Fix this as it is done for /proc/<PID>/cmdline by testing env_end for
zero. It is, apparently, intentionally set last in create_*_tables().
This bug was found by the PaX size_overflow plugin that detected the
arithmetic underflow of 'this_len = env_end - (env_start + src)' when
env_end is still zero.
The expected consequence is that userland trying to access
/proc/<PID>/environ of a not yet fully set up process may get
inconsistent data as we're in the middle of copying in the environment
variables.
Fixes: https://forums.grsecurity.net/viewtopic.php?f=3&t=4363
Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=116461
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Emese Revfy <re.emese@gmail.com>
Cc: Pax Team <pageexec@freemail.hu>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Mateusz Guzik <mguzik@redhat.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 8148a73c9901a8794a50f950083c00ccf97d43b3)
Change-Id: I59361ee8b6294499d9caa0f9556b4f9b7ba6954b
|
|
Line discipline drivers may mistakenly misuse ldisc-related fields
when initializing. For example, a failure to initialize tty->receive_room
in the N_GIGASET_M101 line discipline was recently found and fixed [1].
Now, the N_X25 line discipline has been discovered accessing the previous
line discipline's already-freed private data [2].
Harden the ldisc interface against misuse by initializing revelant
tty fields before instancing the new line discipline.
[1]
commit fd98e9419d8d622a4de91f76b306af6aa627aa9c
Author: Tilman Schmidt <tilman@imap.cc>
Date: Tue Jul 14 00:37:13 2015 +0200
isdn/gigaset: reset tty->receive_room when attaching ser_gigaset
[2] Report from Sasha Levin <sasha.levin@oracle.com>
[ 634.336761] ==================================================================
[ 634.338226] BUG: KASAN: use-after-free in x25_asy_open_tty+0x13d/0x490 at addr ffff8800a743efd0
[ 634.339558] Read of size 4 by task syzkaller_execu/8981
[ 634.340359] =============================================================================
[ 634.341598] BUG kmalloc-512 (Not tainted): kasan: bad access detected
...
[ 634.405018] Call Trace:
[ 634.405277] dump_stack (lib/dump_stack.c:52)
[ 634.405775] print_trailer (mm/slub.c:655)
[ 634.406361] object_err (mm/slub.c:662)
[ 634.406824] kasan_report_error (mm/kasan/report.c:138 mm/kasan/report.c:236)
[ 634.409581] __asan_report_load4_noabort (mm/kasan/report.c:279)
[ 634.411355] x25_asy_open_tty (drivers/net/wan/x25_asy.c:559 (discriminator 1))
[ 634.413997] tty_ldisc_open.isra.2 (drivers/tty/tty_ldisc.c:447)
[ 634.414549] tty_set_ldisc (drivers/tty/tty_ldisc.c:567)
[ 634.415057] tty_ioctl (drivers/tty/tty_io.c:2646 drivers/tty/tty_io.c:2879)
[ 634.423524] do_vfs_ioctl (fs/ioctl.c:43 fs/ioctl.c:607)
[ 634.427491] SyS_ioctl (fs/ioctl.c:622 fs/ioctl.c:613)
[ 634.427945] entry_SYSCALL_64_fastpath (arch/x86/entry/entry_64.S:188)
Cc: Tilman Schmidt <tilman@imap.cc>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit dd42bf1197144ede075a9d4793123f7689e164bc)
Conflicts:
drivers/tty/tty_ldisc.c
Change-Id: I69dc16a203e0c3cec273b176384e0e573a19fedf
|
|
Yue Cao claims that current host rate limiting of challenge ACKS
(RFC 5961) could leak enough information to allow a patient attacker
to hijack TCP sessions. He will soon provide details in an academic
paper.
This patch increases the default limit from 100 to 1000, and adds
some randomization so that the attacker can no longer hijack
sessions without spending a considerable amount of probes.
Based on initial analysis and patch from Linus.
Note that we also have per socket rate limiting, so it is tempting
to remove the host limit in the future.
v2: randomize the count of challenge acks per second, not the period.
Fixes: 282f23c6ee34 ("tcp: implement RFC 5961 3.2")
Reported-by: Yue Cao <ycao009@ucr.edu>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 75ff39ccc1bd5d3c455b6822ab09e533c551f758)
Conflicts:
net/ipv4/tcp_input.c
Depends-on: I7c8deab8bdf216c6220ccc8af7c6aa25ce2ffea3
Depends-on: I7a2c219e6f40b5c59872d2de24590f822a37081a
Depends-on: I23887a1a8e71010aa82d9ce4517fdbde5e76faec
Change-Id: Ibdbaf5da1400a9a81e128ff79dec3edba654c86d
|
|
Many functions have open coded a function that returns a random
number in range [0,N-1]. Under the assumption that we have a PRNG
such as taus113 with being well distributed in [0, ~0U] space,
we can implement such a function as uword t = (n*m')>>32, where
m' is a random number obtained from PRNG, n the right open interval
border and t our resulting random number, with n,m',t in u32 universe.
Lets go with Joe and simply call it prandom_u32_max(), although
technically we have an right open interval endpoint, but that we
have documented. Other users can further be migrated to the new
prandom_u32_max() function later on; for now, we need to make sure
to migrate reciprocal_divide() users for the reciprocal_divide()
follow-up fixup since their function signatures are going to change.
Joint work with Hannes Frederic Sowa.
Cc: Jakub Zawadzki <darkjames-ws@darkjames.pl>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit f337db64af059c9a94278a8b0ab97d87259ff62f)
Conflicts:
net/packet/af_packet.c
Change-Id: I23887a1a8e71010aa82d9ce4517fdbde5e76faec
|
|
Feedback has shown that WRITE_ONCE(x, val) is easier to use than
ASSIGN_ONCE(val,x).
There are no in-tree users yet, so lets change it for 3.19.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
(cherry picked from commit 43239cbe79fc369f5d2160bd7f69e28b5c50a58c)
Depends-on: I7c8deab8bdf216c6220ccc8af7c6aa25ce2ffea3
Change-Id: I7a2c219e6f40b5c59872d2de24590f822a37081a
|
|
ACCESS_ONCE does not work reliably on non-scalar types. For
example gcc 4.6 and 4.7 might remove the volatile tag for such
accesses during the SRA (scalar replacement of aggregates) step
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145)
Let's provide READ_ONCE/ASSIGN_ONCE that will do all accesses via
scalar types as suggested by Linus Torvalds. Accesses larger than
the machines word size cannot be guaranteed to be atomic. These
macros will use memcpy and emit a build warning.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
(cherry picked from commit 230fa253df6352af12ad0a16128760b5cb3f92df)
Change-Id: I7c8deab8bdf216c6220ccc8af7c6aa25ce2ffea3
|
|
This fixes CVE-2016-0758.
In the ASN.1 decoder, when the length field of an ASN.1 value is extracted,
it isn't validated against the remaining amount of data before being added
to the cursor. With a sufficiently large size indicated, the check:
datalen - dp < 2
may then fail due to integer overflow.
Fix this by checking the length indicated against the amount of remaining
data in both places a definite length is determined.
Whilst we're at it, make the following changes:
(1) Check the maximum size of extended length does not exceed the capacity
of the variable it's being stored in (len) rather than the type that
variable is assumed to be (size_t).
(2) Compare the EOC tag to the symbolic constant ASN1_EOC rather than the
integer 0.
(3) To reduce confusion, move the initialisation of len outside of:
for (len = 0; n > 0; n--) {
since it doesn't have anything to do with the loop counter n.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Acked-by: Peter Jones <pjones@redhat.com>
Change-Id: I19059db89efc2e1ec4ee34a1b1a277c3859d8082
|
|
The syzkaller fuzzer hit the following use-after-free:
Call Trace:
[<ffffffff8175ea0e>] __asan_report_load8_noabort+0x3e/0x40 mm/kasan/report.c:295
[<ffffffff851cc31a>] __sys_recvmmsg+0x6fa/0x7f0 net/socket.c:2261
[< inline >] SYSC_recvmmsg net/socket.c:2281
[<ffffffff851cc57f>] SyS_recvmmsg+0x16f/0x180 net/socket.c:2270
[<ffffffff86332bb6>] entry_SYSCALL_64_fastpath+0x16/0x7a
arch/x86/entry/entry_64.S:185
And, as Dmitry rightly assessed, that is because we can drop the
reference and then touch it when the underlying recvmsg calls return
some packets and then hit an error, which will make recvmmsg to set
sock->sk->sk_err, oops, fix it.
Reported-and-Tested-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Fixes: a2e2725541fa ("net: Introduce recvmmsg socket syscall")
http://lkml.kernel.org/r/20160122211644.GC2470@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 34b88a68f26a75e4fded796f1a49c40f82234b7d)
Change-Id: I8708e282656277b49b08a43f59a15d71a9477704
|
|
There was a possible stack overflow vulnerability in
the synaptics_rmi4_i2c_write function because the stack array
size is from user space.
Allocate heap memory for the temporary buffer instead of stack
memory to prevent the stack overflow vulnerability.
Based on code snippet for A-30141991
Change-Id: Iacaeef562b9401d278a53fc10ad61f858ac36766
|
|
The usage of weak references instead of strong references
in Binder can potentially lead to a use-after-free vulnerability
in system_server.
Disallow weak references in cases where strong references are needed.
Based on code snippet for ANDROID-30445380
Change-Id: Ifea2e55167b8f6131ca4a601d558558f79919eba
|
|
The perf core implicitly rejects events spanning multiple HW PMUs, as in
these cases the event->ctx will differ. However this validation is
performed after pmu::event_init() is called in perf_init_event(), and
thus pmu::event_init() may be called with a group leader from a
different HW PMU.
The ARM64 PMU driver does not take this fact into account, and when
validating groups assumes that it can call to_arm_pmu(event->pmu) for
any HW event. When the event in question is from another HW PMU this is
wrong, and results in dereferencing garbage.
This patch updates the ARM64 PMU driver to first test for and reject
events from other PMUs, moving the to_arm_pmu and related logic after
this test. Fixes a crash triggered by perf_fuzzer on Linux-4.0-rc2, with
a CCI PMU present:
Bad mode in Synchronous Abort handler detected, code 0x86000006 -- IABT (current EL)
CPU: 0 PID: 1371 Comm: perf_fuzzer Not tainted 3.19.0+ #249
Hardware name: V2F-1XV7 Cortex-A53x2 SMM (DT)
task: ffffffc07c73a280 ti: ffffffc07b0a0000 task.ti: ffffffc07b0a0000
PC is at 0x0
LR is at validate_event+0x90/0xa8
pc : [<0000000000000000>] lr : [<ffffffc000090228>] pstate: 00000145
sp : ffffffc07b0a3ba0
[< (null)>] (null)
[<ffffffc0000907d8>] armpmu_event_init+0x174/0x3cc
[<ffffffc00015d870>] perf_try_init_event+0x34/0x70
[<ffffffc000164094>] perf_init_event+0xe0/0x10c
[<ffffffc000164348>] perf_event_alloc+0x288/0x358
[<ffffffc000164c5c>] SyS_perf_event_open+0x464/0x98c
Code: bad PC value
Also cleans up the code to use the arm_pmu only when we know
that we are dealing with an arm pmu event.
Cc: Will Deacon <will.deacon@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Peter Ziljstra (Intel) <peterz@infradead.org>
Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
(cherry picked from commit 8fff105e13041e49b82f92eef034f363a6b1c071)
Change-Id: If41a827b9c47018578a40db75f8bbda6bf255a8b
|
|
Based on ANDROID-27532522 code snippet
Change-Id: I4deb603687f5866ad8dad47cc9caa31a4f2bd1fe
|
|
In the rfcomm_sock_bind function, the addr struct
copied back to user space contains uninitialized
fields potentially leading to an information leak.
Initialize all fields of the addr struct to prevent
the potential information leak.
Change-Id: I282eba00659f436868151171157e67c7d5f56ae4
|
|
The interaction between the kernel /dev/binder and the usermode
Parcel.cpp means that when a Binder object is passed
as BINDER_TYPE_BINDER or BINDER_TYPE_WEAK_BINDER, a pointer
to that object (in the server process) is leaked to the client
process as the cookie value. This leads to a leak of a heap
address in many of the privileged Binder services, including
system_server.
Zero out the Binder pointer and cookie before sending it to
the client process.
Change-Id: I743be45d68dc2cd2dc1d78564a2e6fad5bf79eca
|
|
The format specifier %p can leak kernel addresses
while not valuing the kptr_restrict system settings.
Use %pK instead of %p, which also evaluates whether
kptr_restrict is set.
Based on code snippet ANDROID-30143283
Change-Id: I5f5ea1cee831366d8ae8dd18985a86997cec4d4e
|
|
There was no validation of the codec variable
passed to the snd_soc_read and snd_soc_write functions.
Add a check for null function pointers in the dummy
sound driver.
Commit based on ANDROID-28838221 snippet.
Change-Id: I6dba002e0123ad36795b5f2537f81a3e9c020b0a
|
|
Prevent possible stack overflow vlurnerability
by moving temporary buffer from stack to heap.
The buffer size originates from user space which
means that any buffer size could be requested
which could lead to an attack using stack overflow.
Change-Id: I10cec5a48c518656eb72481817900c2ddcc5c1a0
|
|
Rainer Weikusat <rweikusat@mobileactivedefense.com> writes:
An AF_UNIX datagram socket being the client in an n:1 association with
some server socket is only allowed to send messages to the server if the
receive queue of this socket contains at most sk_max_ack_backlog
datagrams. This implies that prospective writers might be forced to go
to sleep despite none of the message presently enqueued on the server
receive queue were sent by them. In order to ensure that these will be
woken up once space becomes again available, the present unix_dgram_poll
routine does a second sock_poll_wait call with the peer_wait wait queue
of the server socket as queue argument (unix_dgram_recvmsg does a wake
up on this queue after a datagram was received). This is inherently
problematic because the server socket is only guaranteed to remain alive
for as long as the client still holds a reference to it. In case the
connection is dissolved via connect or by the dead peer detection logic
in unix_dgram_sendmsg, the server socket may be freed despite "the
polling mechanism" (in particular, epoll) still has a pointer to the
corresponding peer_wait queue. There's no way to forcibly deregister a
wait queue with epoll.
Based on an idea by Jason Baron, the patch below changes the code such
that a wait_queue_t belonging to the client socket is enqueued on the
peer_wait queue of the server whenever the peer receive queue full
condition is detected by either a sendmsg or a poll. A wake up on the
peer queue is then relayed to the ordinary wait queue of the client
socket via wake function. The connection to the peer wait queue is again
dissolved if either a wake up is about to be relayed or the client
socket reconnects or a dead peer is detected or the client socket is
itself closed. This enables removing the second sock_poll_wait from
unix_dgram_poll, thus avoiding the use-after-free, while still ensuring
that no blocked writer sleeps forever.
Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Fixes: ec0d215f9420 ("af_unix: fix 'poll for write'/connected DGRAM sockets")
Reviewed-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
net/unix/af_unix.c
Change-Id: Ie0d0baf67a5b1325b2413b7e7e2911a826fa19bf
|
|
This prevents stacking filesystems (ecryptfs and overlayfs) from using
procfs as lower filesystem. There is too much magic going on inside
procfs, and there is no good reason to stack stuff on top of procfs.
(For example, procfs does access checks in VFS open handlers, and
ecryptfs by design calls open handlers from a kernel thread that doesn't
drop privileges or so.)
Change-Id: Ie4048466ab74c996c4285c53c9fc41857d28e090
Signed-off-by: Jann Horn <jannh@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Let channels hold a reference on their network namespace.
Some channel types, like ppp_async and ppp_synctty, can have their
userspace controller running in a different namespace. Therefore they
can't rely on them to preclude their netns from being removed from
under them.
==================================================================
BUG: KASAN: use-after-free in ppp_unregister_channel+0x372/0x3a0 at
addr ffff880064e217e0
Read of size 8 by task syz-executor/11581
=============================================================================
BUG net_namespace (Not tainted): kasan: bad access detected
-----------------------------------------------------------------------------
Disabling lock debugging due to kernel taint
INFO: Allocated in copy_net_ns+0x6b/0x1a0 age=92569 cpu=3 pid=6906
[< none >] ___slab_alloc+0x4c7/0x500 kernel/mm/slub.c:2440
[< none >] __slab_alloc+0x4c/0x90 kernel/mm/slub.c:2469
[< inline >] slab_alloc_node kernel/mm/slub.c:2532
[< inline >] slab_alloc kernel/mm/slub.c:2574
[< none >] kmem_cache_alloc+0x23a/0x2b0 kernel/mm/slub.c:2579
[< inline >] kmem_cache_zalloc kernel/include/linux/slab.h:597
[< inline >] net_alloc kernel/net/core/net_namespace.c:325
[< none >] copy_net_ns+0x6b/0x1a0 kernel/net/core/net_namespace.c:360
[< none >] create_new_namespaces+0x2f6/0x610 kernel/kernel/nsproxy.c:95
[< none >] copy_namespaces+0x297/0x320 kernel/kernel/nsproxy.c:150
[< none >] copy_process.part.35+0x1bf4/0x5760 kernel/kernel/fork.c:1451
[< inline >] copy_process kernel/kernel/fork.c:1274
[< none >] _do_fork+0x1bc/0xcb0 kernel/kernel/fork.c:1723
[< inline >] SYSC_clone kernel/kernel/fork.c:1832
[< none >] SyS_clone+0x37/0x50 kernel/kernel/fork.c:1826
[< none >] entry_SYSCALL_64_fastpath+0x16/0x7a kernel/arch/x86/entry/entry_64.S:185
INFO: Freed in net_drop_ns+0x67/0x80 age=575 cpu=2 pid=2631
[< none >] __slab_free+0x1fc/0x320 kernel/mm/slub.c:2650
[< inline >] slab_free kernel/mm/slub.c:2805
[< none >] kmem_cache_free+0x2a0/0x330 kernel/mm/slub.c:2814
[< inline >] net_free kernel/net/core/net_namespace.c:341
[< none >] net_drop_ns+0x67/0x80 kernel/net/core/net_namespace.c:348
[< none >] cleanup_net+0x4e5/0x600 kernel/net/core/net_namespace.c:448
[< none >] process_one_work+0x794/0x1440 kernel/kernel/workqueue.c:2036
[< none >] worker_thread+0xdb/0xfc0 kernel/kernel/workqueue.c:2170
[< none >] kthread+0x23f/0x2d0 kernel/drivers/block/aoe/aoecmd.c:1303
[< none >] ret_from_fork+0x3f/0x70 kernel/arch/x86/entry/entry_64.S:468
INFO: Slab 0xffffea0001938800 objects=3 used=0 fp=0xffff880064e20000
flags=0x5fffc0000004080
INFO: Object 0xffff880064e20000 @offset=0 fp=0xffff880064e24200
CPU: 1 PID: 11581 Comm: syz-executor Tainted: G B 4.4.0+
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
00000000ffffffff ffff8800662c7790 ffffffff8292049d ffff88003e36a300
ffff880064e20000 ffff880064e20000 ffff8800662c77c0 ffffffff816f2054
ffff88003e36a300 ffffea0001938800 ffff880064e20000 0000000000000000
Call Trace:
[< inline >] __dump_stack kernel/lib/dump_stack.c:15
[<ffffffff8292049d>] dump_stack+0x6f/0xa2 kernel/lib/dump_stack.c:50
[<ffffffff816f2054>] print_trailer+0xf4/0x150 kernel/mm/slub.c:654
[<ffffffff816f875f>] object_err+0x2f/0x40 kernel/mm/slub.c:661
[< inline >] print_address_description kernel/mm/kasan/report.c:138
[<ffffffff816fb0c5>] kasan_report_error+0x215/0x530 kernel/mm/kasan/report.c:236
[< inline >] kasan_report kernel/mm/kasan/report.c:259
[<ffffffff816fb4de>] __asan_report_load8_noabort+0x3e/0x40 kernel/mm/kasan/report.c:280
[< inline >] ? ppp_pernet kernel/include/linux/compiler.h:218
[<ffffffff83ad71b2>] ? ppp_unregister_channel+0x372/0x3a0 kernel/drivers/net/ppp/ppp_generic.c:2392
[< inline >] ppp_pernet kernel/include/linux/compiler.h:218
[<ffffffff83ad71b2>] ppp_unregister_channel+0x372/0x3a0 kernel/drivers/net/ppp/ppp_generic.c:2392
[< inline >] ? ppp_pernet kernel/drivers/net/ppp/ppp_generic.c:293
[<ffffffff83ad6f26>] ? ppp_unregister_channel+0xe6/0x3a0 kernel/drivers/net/ppp/ppp_generic.c:2392
[<ffffffff83ae18f3>] ppp_asynctty_close+0xa3/0x130 kernel/drivers/net/ppp/ppp_async.c:241
[<ffffffff83ae1850>] ? async_lcp_peek+0x5b0/0x5b0 kernel/drivers/net/ppp/ppp_async.c:1000
[<ffffffff82c33239>] tty_ldisc_close.isra.1+0x99/0xe0 kernel/drivers/tty/tty_ldisc.c:478
[<ffffffff82c332c0>] tty_ldisc_kill+0x40/0x170 kernel/drivers/tty/tty_ldisc.c:744
[<ffffffff82c34943>] tty_ldisc_release+0x1b3/0x260 kernel/drivers/tty/tty_ldisc.c:772
[<ffffffff82c1ef21>] tty_release+0xac1/0x13e0 kernel/drivers/tty/tty_io.c:1901
[<ffffffff82c1e460>] ? release_tty+0x320/0x320 kernel/drivers/tty/tty_io.c:1688
[<ffffffff8174de36>] __fput+0x236/0x780 kernel/fs/file_table.c:208
[<ffffffff8174e405>] ____fput+0x15/0x20 kernel/fs/file_table.c:244
[<ffffffff813595ab>] task_work_run+0x16b/0x200 kernel/kernel/task_work.c:115
[< inline >] exit_task_work kernel/include/linux/task_work.h:21
[<ffffffff81307105>] do_exit+0x8b5/0x2c60 kernel/kernel/exit.c:750
[<ffffffff813fdd20>] ? debug_check_no_locks_freed+0x290/0x290 kernel/kernel/locking/lockdep.c:4123
[<ffffffff81306850>] ? mm_update_next_owner+0x6f0/0x6f0 kernel/kernel/exit.c:357
[<ffffffff813215e6>] ? __dequeue_signal+0x136/0x470 kernel/kernel/signal.c:550
[<ffffffff8132067b>] ? recalc_sigpending_tsk+0x13b/0x180 kernel/kernel/signal.c:145
[<ffffffff81309628>] do_group_exit+0x108/0x330 kernel/kernel/exit.c:880
[<ffffffff8132b9d4>] get_signal+0x5e4/0x14f0 kernel/kernel/signal.c:2307
[< inline >] ? kretprobe_table_lock kernel/kernel/kprobes.c:1113
[<ffffffff8151d355>] ? kprobe_flush_task+0xb5/0x450 kernel/kernel/kprobes.c:1158
[<ffffffff8115f7d3>] do_signal+0x83/0x1c90 kernel/arch/x86/kernel/signal.c:712
[<ffffffff8151d2a0>] ? recycle_rp_inst+0x310/0x310 kernel/include/linux/list.h:655
[<ffffffff8115f750>] ? setup_sigcontext+0x780/0x780 kernel/arch/x86/kernel/signal.c:165
[<ffffffff81380864>] ? finish_task_switch+0x424/0x5f0 kernel/kernel/sched/core.c:2692
[< inline >] ? finish_lock_switch kernel/kernel/sched/sched.h:1099
[<ffffffff81380560>] ? finish_task_switch+0x120/0x5f0 kernel/kernel/sched/core.c:2678
[< inline >] ? context_switch kernel/kernel/sched/core.c:2807
[<ffffffff85d794e9>] ? __schedule+0x919/0x1bd0 kernel/kernel/sched/core.c:3283
[<ffffffff81003901>] exit_to_usermode_loop+0xf1/0x1a0 kernel/arch/x86/entry/common.c:247
[< inline >] prepare_exit_to_usermode kernel/arch/x86/entry/common.c:282
[<ffffffff810062ef>] syscall_return_slowpath+0x19f/0x210 kernel/arch/x86/entry/common.c:344
[<ffffffff85d88022>] int_ret_from_sys_call+0x25/0x9f kernel/arch/x86/entry/entry_64.S:281
Memory state around the buggy address:
ffff880064e21680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff880064e21700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff880064e21780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff880064e21800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff880064e21880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
Change-Id: Ieed6ccbb1ba91acf99f3ce151909153937245b31
Fixes: 273ec51dd7ce ("net: ppp_generic - introduce net-namespace functionality v2")
Reported-by: Baozeng Ding <sploving1@gmail.com>
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
A local route may have a lower hop_limit set than global routes do.
RFC 3756, Section 4.2.7, "Parameter Spoofing"
> 1. The attacker includes a Current Hop Limit of one or another small
> number which the attacker knows will cause legitimate packets to
> be dropped before they reach their destination.
> As an example, one possible approach to mitigate this threat is to
> ignore very small hop limits. The nodes could implement a
> configurable minimum hop limit, and ignore attempts to set it below
> said limit.
Change-Id: I4a2b6f3364720d607155415e84c1978b400305cf
Signed-off-by: D.S. Ljungmark <ljungmark@modio.se>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add a simple read-only counter to super_block that indicates how deep this
is in the stack of filesystems. Previously ecryptfs was the only stackable
filesystem and it explicitly disallowed multiple layers of itself.
Overlayfs, however, can be stacked recursively and also may be stacked
on top of ecryptfs or vice versa.
To limit the kernel stack usage we must limit the depth of the
filesystem stack. Initially the limit is set to 2.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
conflicts:
fs/ecryptfs/main.c
include/linux/fs.h
Change-Id: I83a60e4dfec13b2c5a367200c89de0f5b33e426a
|
|
usbnet_link_change will call schedule_work and should be
avoided if bind is failing. Otherwise we will end up with
scheduled work referring to a netdev which has gone away.
Instead of making the call conditional, we can just defer
it to usbnet_probe, using the driver_info flag made for
this purpose.
Fixes: 8a34b0ae8778 ("usbnet: cdc_ncm: apply usbnet_link_change")
Reported-by: Andrey Konovalov <andreyknvl@gmail.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/usb/cdc_ncm.c
Change-Id: Ibf4a6b5069073b6e97931fc7ad52980833545433
|
|
If the ASN.1 decoder is asked to parse a sequence of objects, non-optional
matches get skipped if there's no more data to be had rather than a
data-overrun error being reported.
This is due to the code segment that decides whether to skip optional
matches (ie. matches that could get ignored because an element is marked
OPTIONAL in the grammar) due to a lack of data also skips non-optional
elements if the data pointer has reached the end of the buffer.
This can be tested with the data decoder for the new RSA akcipher algorithm
that takes three non-optional integers. Currently, it skips the last
integer if there is insufficient data.
Without the fix, #defining DEBUG in asn1_decoder.c will show something
like:
next_op: pc=0/13 dp=0/270 C=0 J=0
- match? 30 30 00
- TAG: 30 266 CONS
next_op: pc=2/13 dp=4/270 C=1 J=0
- match? 02 02 00
- TAG: 02 257
- LEAF: 257
next_op: pc=5/13 dp=265/270 C=1 J=0
- match? 02 02 00
- TAG: 02 3
- LEAF: 3
next_op: pc=8/13 dp=270/270 C=1 J=0
next_op: pc=11/13 dp=270/270 C=1 J=0
- end cons t=4 dp=270 l=270/270
The next_op line for pc=8/13 should be followed by a match line.
This is not exploitable for X.509 certificates by means of shortening the
message and fixing up the ASN.1 CONS tags because:
(1) The relevant records being built up are cleared before use.
(2) If the message is shortened sufficiently to remove the public key, the
ASN.1 parse of the RSA key will fail quickly due to a lack of data.
(3) Extracted signature data is either turned into MPIs (which cope with a
0 length) or is simpler integers specifying algoritms and suchlike
(which can validly be 0); and
(4) The AKID and SKID extensions are optional and their removal is handled
without risking passing a NULL to asymmetric_key_generate_id().
(5) If the certificate is truncated sufficiently to remove the subject,
issuer or serialNumber then the ASN.1 decoder will fail with a 'Cons
stack underflow' return.
This is not exploitable for PKCS#7 messages by means of removal of elements
from such a message from the tail end of a sequence:
(1) Any shortened X.509 certs embedded in the PKCS#7 message are survivable
as detailed above.
(2) The message digest content isn't used if it shows a NULL pointer,
similarly, the authattrs aren't used if that shows a NULL pointer.
(3) A missing signature results in a NULL MPI - which the MPI routines deal
with.
(4) If data is NULL, it is expected that the message has detached content and
that is handled appropriately.
(5) If the serialNumber is excised, the unconditional action associated
with it will pick up the containing SEQUENCE instead, so no NULL
pointer will be seen here.
If both the issuer and the serialNumber are excised, the ASN.1 decode
will fail with an 'Unexpected tag' return.
In either case, there's no way to get to asymmetric_key_generate_id()
with a NULL pointer.
(6) Other fields are decoded to simple integers. Shortening the message
to omit an algorithm ID field will cause checks on this to fail early
in the verification process.
This can also be tested by snipping objects off of the end of the ASN.1 stream
such that mandatory tags are removed - or even from the end of internal
SEQUENCEs. If any mandatory tag is missing, the error EBADMSG *should* be
produced. Without this patch ERANGE or ENOPKG might be produced or the parse
may apparently succeed, perhaps with ENOKEY or EKEYREJECTED being produced
later, depending on what gets snipped.
Just snipping off the final BIT_STRING or OCTET_STRING from either sample
should be a start since both are mandatory and neither will cause an EBADMSG
without the patches
Reported-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Marcel Holtmann <marcel@holtmann.org>
Reviewed-by: David Woodhouse <David.Woodhouse@intel.com>
Conflicts:
lib/asn1_decoder.c
Change-Id: I8d67111416c196bfc0f74c6ab9102cfbfcaab6f6
|
|
Until now, hitting this BUG_ON caused a recursive oops (because oops
handling involves do_exit(), which calls into the scheduler, which in
turn raises an oops), which caused stuff below the stack to be
overwritten until a panic happened (e.g. via an oops in interrupt
context, caused by the overwritten CPU index in the thread_info).
Just panic directly.
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Conflicts:
kernel/sched/core.c
Change-Id: Ia3f6826c60c4f5a7e5fcb21ae44dc6aefb1d9e0c
|
|
We should check that e->target_offset is sane before
mark_source_chains gets called since it will fetch the target entry
for loop detection.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Conflicts:
net/ipv4/netfilter/arp_tables.c
net/ipv4/netfilter/ip_tables.c
net/ipv6/netfilter/ip6_tables.c
Change-Id: I19d7aa8db59cf84da27671a12f22f718134b12e2
|
|
Otherwise this function may read data beyond the ruleset blob.
Change-Id: Ieff619233e66932f4a1f174584876e6579e429b5
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Prevent race conditions by disabling falloc
FALLOC_FL_PUNCH_HOLE. The race conditions allowed
local users to cause denial of service (disc corruption).
Change-Id: I74779ce7c9301704e8657cd9ea799d5d2d8b6a9c
|
|
We have two problems in UDP stack related to bogus checksums :
1) We return -EAGAIN to application even if receive queue is not empty.
This breaks applications using edge trigger epoll()
2) Under UDP flood, we can loop forever without yielding to other
processes, potentially hanging the host, especially on non SMP.
This patch is an attempt to make things better.
We might in the future add extra support for rt applications
wanting to better control time spent doing a recv() in a hostile
environment. For example we could validate checksums before queuing
packets in socket receive queue.
Change-Id: I1be5315a659125ea2861f21e4b3a1e519f4c0dff
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In case bind() works, but a later error forces bailing
in probe() in error cases work and a timer may be scheduled.
They must be killed. This fixes an error case related to
the double free reported in
http://www.spinics.net/lists/netdev/msg367669.html
and needs to go on top of Linus' fix to cdc-ncm.
Change-Id: Id9d190576cffeccc7d9be1425ce8a49538e8fdfc
Signed-off-by: Oliver Neukum <ONeukum@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This prevents users from triggering a stack overflow through a recursive
invocation of pagefault handling that involves mapping procfs files into
virtual memory.
Change-Id: I80be90c505f48b8f3f98f2613a11865cd14ca150
Signed-off-by: Jann Horn <jannh@google.com>
Acked-by: Tyler Hicks <tyhicks@canonical.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Check bounds of batch_params.nchan variable in
wls_parse_batching_cmd function to prevent kernel memory
corruption when the variable is later used in
dhd_dev_pno_set_for_batch.
Change-Id: I74485ae23b7a96ad02a59723255bb4d16aa38302
|
|
When a key is being garbage collected, it's key->user would get put before
the ->destroy() callback is called, where the key is removed from it's
respective tracking structures.
This leaves a key hanging in a semi-invalid state which leaves a window open
for a different task to try an access key->user. An example is
find_keyring_by_name() which would dereference key->user for a key that is
in the process of being garbage collected (where key->user was freed but
->destroy() wasn't called yet - so it's still present in the linked list).
This would cause either a panic, or corrupt memory.
Fixes CVE-2014-9529.
Change-Id: I9d31b879f4d5f570951d20a90046d6ab2210e3ec
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
There are two issues with the current implementation for replacing user
controls. The first is that the code does not check if the control is actually a
user control and neither does it check if the control is owned by the process
that tries to remove it. That allows userspace applications to remove arbitrary
controls, which can cause a user after free if a for example a driver does not
expect a control to be removed from under its feed.
The second issue is that on one hand when a control is replaced the
user_ctl_count limit is not checked and on the other hand the user_ctl_count is
increased (even though the number of user controls does not change). This allows
userspace, once the user_ctl_count limit as been reached, to repeatedly replace
a control until user_ctl_count overflows. Once that happens new controls can be
added effectively bypassing the user_ctl_count limit.
Both issues can be fixed by instead of open-coding the removal of the control
that is to be replaced to use snd_ctl_remove_user_ctl(). This function does
proper permission checks as well as decrements user_ctl_count after the control
has been removed.
Note that by using snd_ctl_remove_user_ctl() the check which returns -EBUSY at
beginning of the function if the control already exists is removed. This is not
a problem though since the check is quite useless, because the lock that is
protecting the control list is released between the check and before adding the
new control to the list, which means that it is possible that a different
control with the same settings is added to the list after the check. Luckily
there is another check that is done while holding the lock in snd_ctl_add(), so
we'll rely on that to make sure that the same control is not added twice.
Change-Id: I38df4387f672d012f2c10d83adf9b7b04cf948f8
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Acked-by: Jaroslav Kysela <perex@perex.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
Not caching dst_entries which cause redirects could be exploited by hosts
on the same subnet, causing a severe DoS attack. This effect aggravated
since commit f88649721268999 ("ipv4: fix dst race in sk_dst_get()").
Lookups causing redirects will be allocated with DST_NOCACHE set which
will force dst_release to free them via RCU. Unfortunately waiting for
RCU grace period just takes too long, we can end up with >1M dst_entries
waiting to be released and the system will run OOM. rcuos threads cannot
catch up under high softirq load.
Attaching the flag to emit a redirect later on to the specific skb allows
us to cache those dst_entries thus reducing the pressure on allocation
and deallocation.
This issue was discovered by Marcelo Leitner.
Change-Id: I37150d0cddd5a48fa495f49a9a81b2bb9850cc89
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Marcelo Leitner <mleitner@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Ben Hawkes says:
In the mark_source_chains function (net/ipv4/netfilter/ip_tables.c) it
is possible for a user-supplied ipt_entry structure to have a large
next_offset field. This field is not bounds checked prior to writing a
counter value at the supplied offset.
Problem is that mark_source_chains should not have been called --
the rule doesn't have a next entry, so its supposed to return
an absolute verdict of either ACCEPT or DROP.
However, the function conditional() doesn't work as the name implies.
It only checks that the rule is using wildcard address matching.
However, an unconditional rule must also not be using any matches
(no -m args).
The underflow validator only checked the addresses, therefore
passing the 'unconditional absolute verdict' test, while
mark_source_chains also tested for presence of matches, and thus
proceeeded to the next (not-existent) rule.
Unify this so that all the callers have same idea of 'unconditional rule'.
Change-Id: Id633ad546f5d874ad3afd6745e12952f731ba35e
Reported-by: Ben Hawkes <hawkes@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|