Bug of the month: Cache flow problem crashes Samsung phone apps

It’s not been a good summer for Samsung. It shipped Galaxy Note 7 smartphones with exploding batteries, sparking a global recall.

And the whizzy Exynos 8890 processor, which powers the Note 7 and the Galaxy S7 and S7 Edge, is tripping up apps with seemingly bizarre crashes – from null pointers to illegal instruction exceptions, all triggering randomly. It’s an issue that has stumped engineers for months.

And now they’ve finally cracked the case.

Apps built with Mono, a software development toolkit, were crashing randomly on Samsung’s latest Android handsets with illegal instruction errors despite the code being good. The GameCube-Wii emulator Dolphin, and the PSP emulator PPSSPP, were also falling over on the phones.

The faults involved code compiled just in time (JIT) on demand: Mono uses a JIT compiler to convert an app’s portable bytecode to native ARM instructions on handheld devices. Dolphin and PPSSPP do something similar to run a game’s PowerPC or MIPS executable on the underlying CPU. Any software with self-modifying code was at risk of bombing out on the Exynos 8890, it seemed.

On the ARM architecture, due to the separate instruction and data caches, JIT engines have to clear the processor’s instruction cache to ensure that any freshly generated instructions are loaded and run.
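As a rough illustration of that step, here is a minimal sketch of what a JIT engine might do after emitting code, assuming a GCC or Clang toolchain. The `install_code` helper is a hypothetical name, and the sketch leaves out the work of making the destination memory executable; the only real API it relies on is the compilers' `__builtin___clear_cache` builtin.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy freshly generated instructions into place,
 * then ask the toolchain to synchronise the instruction cache over
 * exactly the range that was written. */
static void install_code(unsigned char *dest, const unsigned char *code,
                         size_t len)
{
        assert(dest != NULL && code != NULL);

        memcpy(dest, code, len);

        /* Without this, a core with separate I- and D-caches may still
         * fetch whatever instructions were cached for this range. */
        __builtin___clear_cache((char *)dest, (char *)dest + len);
}
```

On architectures with coherent instruction caches, such as x86, the builtin typically compiles to nothing; on ARM it expands to the kind of flush loop discussed below.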

Mono’s engineers noticed that, when flushing 128-byte blocks from the I-cache, only 64 bytes were being cleared, allowing the processor core to run stale and incorrect code and crash the running application.

The Exynos 8890 system-on-chip has eight cores: four Cortex-A53s designed by ARM, and four Samsung-designed M1 cores. These are arranged in ARM’s big.LITTLE style: four brawny cores – the M1s – for when a lot of processing power is suddenly needed, and four lighter cores – the A53s – for normal work. Threads in apps move from core to core, big or LITTLE, depending on the amount of work that needs to be done.

The A53 has a 64-byte instruction cache line width, meaning the cache is flushed and replaced in 64-byte blocks. The M1, on the other hand, has a 128-byte instruction cache line. This is problematic.

Ooops marks the spot … Samsung’s slide to chip designers last month

Apps built using GCC – as is the case with Mono, at least – use a function that looks like the following pseudocode to flush a core’s instruction cache:

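void __clear_cache (char *address, size_t size)
{
        /* Line size is read once, on whichever core runs the first call, then reused */
        static int cache_line_size = 0;
        if (!cache_line_size)
                cache_line_size = get_current_cpu_cache_line_size ();

        /* Flush one cache line per step, striding by the remembered line size */
        for (size_t i = 0; i < size; i += cache_line_size)
                flush_cache_line (address + i);
}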

The first time __clear_cache is used by an application, it reads the CPU core’s cache line width directly from the processor and stores it in cache_line_size. Then when flushing the cache, it loops through the memory it has to clear, telling the processor to dump the instruction cache, one cache line at a time.

So if the app starts on an A53, it’ll expect to clear the instruction cache in 64-byte blocks, and loop in 64-byte increments. If it starts on an M1, it’ll use 128-byte blocks.

Now, if an app that was running on an M1 is moved to an A53, it will expect to clear out the instruction cache in 128-byte blocks. In reality, the smaller core will only clear out the first 64 bytes, and __clear_cache will skip the rest to the next 128-byte block. That leaves stale code in the cache, which will confuse and crash the program.
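The arithmetic of that mismatch can be checked with a small simulation. This is only a sketch under stated assumptions: the 512-byte region, the `stale` bookkeeping array, and the `flush_line` and `stale_after_flush` names are all made up for illustration, and no real cache is touched.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define REGION 512  /* hypothetical size of the freshly written JIT code */

/* Simulated flush: the core invalidates only its own cache line
 * containing `offset`, which is `core_line` bytes wide. */
static void flush_line(unsigned char *stale, size_t offset, size_t core_line)
{
        size_t start = offset - (offset % core_line);
        for (size_t i = start; i < start + core_line && i < REGION; i++)
                stale[i] = 0;
}

/* Walk the region using the stride the flush loop remembered, on a core
 * whose real line width is `core_line`. Returns how many bytes stay stale. */
static size_t stale_after_flush(size_t stride, size_t core_line)
{
        unsigned char stale[REGION];
        size_t left = 0;

        assert(stride > 0 && core_line > 0);
        memset(stale, 1, sizeof stale);  /* everything starts out stale */

        for (size_t off = 0; off < REGION; off += stride)
                flush_line(stale, off, core_line);

        for (size_t i = 0; i < REGION; i++)
                left += stale[i];
        return left;
}
```

Under these assumptions, stale_after_flush(128, 64) leaves 256 of the 512 bytes stale, which is the M1-then-A53 case. The reverse case, stale_after_flush(64, 128), leaves none: striding too finely is wasteful but safe.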

Mono, Dolphin and PPSSPP have patched their code to try to avoid the problem.

Software built by the LLVM compiler and Google’s V8 JavaScript engine does not suffer as badly as GCC’s generated code because it requests the CPU’s I-cache width immediately before each increment. Mono does something similar now: it tries to work out the smallest instruction cache width in the device and only uses that.
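The smallest-width idea might look something like the sketch below. It is not Mono's actual patch: the `sim_` functions are hypothetical stand-ins for reading the current core's line width and flushing one line, so the logic can be exercised on any machine.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real CPU operations. */
static size_t sim_core_line = 128;   /* line width of the core we are "on" */
static size_t sim_flush_calls = 0;   /* how many lines were flushed */

static size_t sim_current_line_size(void) { return sim_core_line; }
static void sim_flush_cache_line(char *line) { (void)line; sim_flush_calls++; }

static size_t smallest_line_seen = 0;  /* 0 means not measured yet */

/* Re-read the line width on every call and stride by the smallest value
 * seen so far, so a later move to a wider core cannot widen the stride. */
static void clear_cache_min_stride(char *address, size_t size)
{
        size_t now = sim_current_line_size();

        assert(now > 0);
        if (smallest_line_seen == 0 || now < smallest_line_seen)
                smallest_line_seen = now;

        for (size_t i = 0; i < size; i += smallest_line_seen)
                sim_flush_cache_line(address + i);
}
```

Once a call has run on a 64-byte core, every later flush strides by 64 bytes even if the thread is back on a 128-byte core. Until that happens, the sketch is still exposed to a mid-loop migration.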

Unfortunately, it’s tricky to completely solve from userspace because fetching the I-cache width from the CPU and flushing the next line is not atomic: a thread could be rescheduled onto a core with a different cache line width during the loop in a way that would cause memory to be skipped. Mono’s approach is at least resilient to this.

One proper way out is to patch the operating system kernel so that reading the CPU’s I-cache line width always returns the smallest size for the hardware, thus leaving no byte behind when swabbing out dead instructions.
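To make the kernel-side idea concrete, here is a sketch of the calculation such a patch would need. On ARMv8 the line width is reported through the CTR_EL0 register, whose low four bits (IminLine) hold the log2 of the smallest instruction cache line, counted in 4-byte words. The `per_core_ctr` array and both function names are hypothetical, and this is not taken from any real kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode CTR_EL0.IminLine (bits [3:0]) into a line width in bytes. */
static size_t icache_line_bytes(uint64_t ctr)
{
        return (size_t)4 << (ctr & 0xF);
}

/* Hypothetical kernel-side helper: given each core's raw CTR_EL0 value,
 * find the smallest I-cache line width anywhere on the chip. This is the
 * value userspace would then be told regardless of the current core. */
static size_t system_min_icache_line(const uint64_t *per_core_ctr,
                                     size_t cores)
{
        size_t smallest;

        assert(per_core_ctr != NULL && cores > 0);
        smallest = icache_line_bytes(per_core_ctr[0]);

        for (size_t c = 1; c < cores; c++) {
                size_t line = icache_line_bytes(per_core_ctr[c]);
                if (line < smallest)
                        smallest = line;
        }
        return smallest;
}
```

For a chip laid out like the Exynos 8890, with four cores reporting IminLine 5 (128 bytes) and four reporting IminLine 4 (64 bytes), the helper settles on 64 bytes.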

On the one hand, this isn’t strictly Samsung’s fault. The technical reference manual for the Cortex-A15, an early big.LITTLE core, notes:

The Cortex-A15 processor L1 caches contain 64-byte lines. Other processors, however, can feature caches that support cache line lengths different than those of the Cortex-A15 processor.

That implies that software engineers should be aware that the cache line width can vary across a device. However, the Cortex-A53, as used in Samsung’s Exynos 8890, makes no mention of other cores having different widths in its tech manual. It’s convention in the ARM system-on-chip world to keep the I-cache lines the same widths within a package to avoid all of the above headaches: ARM certainly does in its Cortex-A designs (well, OK, it didn’t with the A7 and A15, but thereafter.)

So, one could conclude that Samsung should have known better – or at least given us a little more warning. ®
