• Re: [PATCH v5 1/4] usb: dbc: early driver for xhci debug capability

    From Lu Baolu@110:300/11 to All on Thu Jan 26 04:30:02 2017
    Hi Ingo,

    On 01/25/2017 05:23 PM, Ingo Molnar wrote:
    * Lu Baolu <baolu.lu@linux.intel.com> wrote:

    Hiding essentially an early udelay() implementation in an early-printk driver is
    ugly and counterproductive.
    Sure. How about below change?

    diff --git a/drivers/usb/early/xhci-dbc.c b/drivers/usb/early/xhci-dbc.c
    index d3f0c84..940989e 100644
    --- a/drivers/usb/early/xhci-dbc.c
    +++ b/drivers/usb/early/xhci-dbc.c
    @@ -587,6 +587,35 @@ static int xdbc_bulk_transfer(void *data, int size, bool read)
    return size;
    }

    +static void __init xdbc_udelay_calibration(void)
    +{
    + unsigned long lpj = 0;
    + unsigned int tsc_khz, cpu_khz;
    +
    + if (!boot_cpu_has(X86_FEATURE_TSC))
    + goto calibration_out;
    +
    + cpu_khz = x86_platform.calibrate_cpu();
    + tsc_khz = x86_platform.calibrate_tsc();
    +
    + if (tsc_khz == 0)
    + tsc_khz = cpu_khz;
    + else if (abs(cpu_khz - tsc_khz) * 10 > tsc_khz)
    + cpu_khz = tsc_khz;
    +
    + if (!tsc_khz)
    + goto calibration_out;
    +
    + lpj = tsc_khz * 1000;
    + do_div(lpj, HZ);
    +
    +calibration_out:
    + if (!lpj)
    + lpj = 1 << 22;
    +
    + loops_per_jiffy = lpj;
    +}
    +
    static int __init xdbc_early_setup(void)
    {
    int ret;
    @@ -686,6 +715,8 @@ int __init early_xdbc_parse_parameter(char *s)
    }
    xdbc.xdbc_reg = (struct xdbc_regs __iomem *)(xdbc.xhci_base + offset);

    + xdbc_udelay_calibration();
    +
    return 0;
    }
    Yeah - so could we do this in a more generic fashion, not in the early-printk

    driver but in core x86 code?

    How about below changes?

    diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
    index 4cfba94..aab7faa 100644
    - --- a/arch/x86/kernel/setup.c
    +++ b/arch/x86/kernel/setup.c
    @@ -835,6 +835,26 @@ dump_kernel_offset(struct notifier_block *self, unsigned long v, void *p)
    return 0;
    }

    +static void __init simple_udelay_calibration(void)
    +{
    + unsigned int tsc_khz, cpu_khz;
    + unsigned long lpj;
    +
    + if (!boot_cpu_has(X86_FEATURE_TSC))
    + return;
    +
    + cpu_khz = x86_platform.calibrate_cpu();
    + tsc_khz = x86_platform.calibrate_tsc();
    +
    + tsc_khz = tsc_khz ? : cpu_khz;
    + if (!tsc_khz)
    + return;
    +
    + lpj = tsc_khz * 1000;
    + do_div(lpj, HZ);
    + loops_per_jiffy = lpj;
    +}
    +
    /*
    * Determine if we were loaded by an EFI loader. If so, then we have also been
    * passed the efi memmap, systab, etc., so we should use these data structures @@ -983,6 +1003,8 @@ void __init setup_arch(char **cmdline_p)
    */
    x86_configure_nx();

    + simple_udelay_calibration();
    +
    parse_early_param();


    If it's okay for you, do you want it in a separated patch or part of patch 2/4?

    Best regards,
    Lu Baolu

    --- MBSE BBS v1.0.6.13 (GNU/Linux-x86_64
    * Origin: linux.* mail to news gateway (110:300/11@linuxnet)
  • From Lu Baolu@110:300/11 to All on Thu Jan 26 04:40:01 2017
    Hi,

    On 01/26/2017 12:16 AM, Peter Zijlstra wrote:
    On Wed, Jan 25, 2017 at 11:51:34PM +0800, Lu Baolu wrote:

    What is timeout and why?
    Put it in simple:

    The driver sets the RUN bit in control register and polls READY
    bit in status register for the successful USB device enumeration.
    As the USB device enumeration might fail and the READY bit will
    never be set, the driver must have a timeout logic to avoid
    endless loop.

    More details:

    The operational model is that driver sets up all necessary registers
    and data structures, and then starts the debug engine by setting
    the RUN/STOP bit in the control register.

    The debug engine then brings up itself as a ready-for-enumeration
    USB device. The USB link between host and device starts link training
    and then host will detect the connected device. The hub driver in
    host will then starts the USB device enumeration processes (as defined
    in USB spec). If everything goes smoothly, the device gets enumerated
    and host can talk with the debug device.

    After that, xdbc firmware will set the READY bit in status register. And
    the driver can go ahead with data transfer over USB.
    I have vague memories from a prior discussion where you said this READY
    state can be lost at any time (cable unplug or whatnot) and at that
    point the driver should re-start the setup, right?

    Yes. So the documentation requires users not to unplug the usb
    cable during debugging. This rule applies to other debug methods
    as well.


    If there is an error other than !ready, I would
    expect the hardware to inform you of this through another status bit,
    no?
    Yeah, this might be another choice of hardware design. But it's not a
    topic for this driver.
    So is there really no way to way to distinguish between "I did setup and
    am waiting for READY", "I did setup, am waiting for READY, but things
    got hosed" and "I was READY, things be hosed" ?

    I suppose the first and last can be distinguished by remembering if you
    ever saw READY, but the first and second are the interesting case I
    think.

    So why can't you poll indefinitely for either ready or error?

    Even if the hardware has both ready and error status bits, it's still
    nice to have a time out watch dog. Buggy hardware or firmware
    might not set any of these bits. Polling indefinitely might result in
    a endless loop.
    Loosing output, esp. without indication, is very _very_ annoying when
    you're debugging things. Its just about on par with a stuck system, at
    least then you know something bad happened.

    Fair enough.

    USB connection is stable enough, unless the user unplugs the
    USB cable during debugging.

    Best regards,
    Lu Baolu

    --- MBSE BBS v1.0.6.13 (GNU/Linux-x86_64
    * Origin: linux.* mail to news gateway (110:300/11@linuxnet)
  • From Ingo Molnar@110:300/11 to All on Thu Jan 26 08:20:01 2017

    * Lu Baolu <baolu.lu@linux.intel.com> wrote:

    Fair enough.

    USB connection is stable enough, unless the user unplugs the
    USB cable during debugging.

    What does the hardware do in this case? The XHCI registers are in the host hardware, so they won't disappear, right? Is there some cable connection status

    bit we can extract without interrupts?

    I.e. if there's any polling component then it would be reasonable to add an error
    component: poll the status and if it goes 'disconnected' then disable early-printk
    altogether in this case and trigger an emergency printk() so that there's chance
    that the user notices [if the system does not misbehave otherwise].

    I.e. try to be as robust and informative as lockdep - yet don't lock up the host
    kernel: lockdep too is called from very deep internals, there are various conditions where it sees corrupt data structures (i.e. a 'disconnect' - a system
    environment outside the normal bounds of operation), yet of the kernel and over

    the last 10+ years of lockdep's existence we had very, very few cases of lockdep
    itself locking up and behaving unpredictably.

    Thanks,

    Ingo

    --- MBSE BBS v1.0.6.13 (GNU/Linux-x86_64
    * Origin: linux.* mail to news gateway (110:300/11@linuxnet)
  • From Ingo Molnar@110:300/11 to All on Thu Jan 26 08:30:01 2017

    * Lu Baolu <baolu.lu@linux.intel.com> wrote:


    Hi,

    On 01/25/2017 10:38 PM, Peter Zijlstra wrote:
    On Wed, Jan 25, 2017 at 08:27:38PM +0800, Lu Baolu wrote:
    In my driver, udelay() is mostly used to handle time out.

    Xdbc hides most USB things in its firmware. Early printk driver only needs >> to setup the registers/data structures and wait until link ready or time
    out.
    Without udelay(), I have no means to convert the polling times into
    waiting
    time.
    What is timeout and why?

    Put it in simple:

    The driver sets the RUN bit in control register and polls READY
    bit in status register for the successful USB device enumeration.
    As the USB device enumeration might fail and the READY bit will
    never be set, the driver must have a timeout logic to avoid
    endless loop.

    Is there any error status available in the host registers anywhere that tells us
    that enumeration did not succeed?

    Thanks,

    Ingo

    --- MBSE BBS v1.0.6.13 (GNU/Linux-x86_64
    * Origin: linux.* mail to news gateway (110:300/11@linuxnet)
  • From Lu Baolu@110:300/11 to All on Thu Jan 26 08:50:02 2017
    Hi Ingo,

    On 01/26/2017 03:19 PM, Ingo Molnar wrote:
    * Lu Baolu <baolu.lu@linux.intel.com> wrote:

    Fair enough.

    USB connection is stable enough, unless the user unplugs the
    USB cable during debugging.
    What does the hardware do in this case? The XHCI registers are in the host hardware, so they won't disappear, right? Is there some cable connection
    status
    bit we can extract without interrupts?

    Yes, there are register bits for us to know the cable status. I will go
    through the spec again and give you more accurate answer later.

    I'm sorry. I will be off during the next 7 days for Chinese New Year
    holiday. My email access will be very limited during this time. I will
    revisit this thread after I am back from holiday.

    Sorry for the inconvenience.

    Best regards,
    Lu Baolu

    I.e. if there's any polling component then it would be reasonable to add an
    error
    component: poll the status and if it goes 'disconnected' then disable
    early-printk
    altogether in this case and trigger an emergency printk() so that there's
    chance
    that the user notices [if the system does not misbehave otherwise].

    I.e. try to be as robust and informative as lockdep - yet don't lock up the
    host
    kernel: lockdep too is called from very deep internals, there are various conditions where it sees corrupt data structures (i.e. a 'disconnect' - a
    system
    environment outside the normal bounds of operation), yet of the kernel and
    over
    the last 10+ years of lockdep's existence we had very, very few cases of
    lockdep
    itself locking up and behaving unpredictably.

    Thanks,

    Ingo


    --- MBSE BBS v1.0.6.13 (GNU/Linux-x86_64
    * Origin: linux.* mail to news gateway (110:300/11@linuxnet)
  • From Ingo Molnar@110:300/11 to All on Thu Jan 26 09:20:01 2017

    * Lu Baolu <baolu.lu@linux.intel.com> wrote:

    Hi Ingo,

    On 01/26/2017 03:19 PM, Ingo Molnar wrote:
    * Lu Baolu <baolu.lu@linux.intel.com> wrote:

    Fair enough.

    USB connection is stable enough, unless the user unplugs the
    USB cable during debugging.
    What does the hardware do in this case? The XHCI registers are in the host hardware, so they won't disappear, right? Is there some cable connection
    status
    bit we can extract without interrupts?

    Yes, there are register bits for us to know the cable status. I will go through the spec again and give you more accurate answer later.

    Ok, that's good news - so we don't really have to time out and we don't have to

    rely on the user holding the phone right either.

    I'm sorry. I will be off during the next 7 days for Chinese New Year
    holiday. My email access will be very limited during this time. I will revisit this thread after I am back from holiday.

    Sorry for the inconvenience.

    No problem, have fun!

    Thanks,

    Ingo

    --- MBSE BBS v1.0.6.13 (GNU/Linux-x86_64
    * Origin: linux.* mail to news gateway (110:300/11@linuxnet)
  • From Peter Zijlstra@110:300/11 to All on Thu Jan 26 11:30:03 2017
    On Thu, Jan 26, 2017 at 08:19:37AM +0100, Ingo Molnar wrote:

    * Lu Baolu <baolu.lu@linux.intel.com> wrote:

    Fair enough.

    USB connection is stable enough, unless the user unplugs the
    USB cable during debugging.

    What does the hardware do in this case? The XHCI registers are in the host hardware, so they won't disappear, right? Is there some cable connection
    status
    bit we can extract without interrupts?

    I.e. if there's any polling component then it would be reasonable to add an
    error
    component: poll the status and if it goes 'disconnected' then disable
    early-printk
    altogether in this case and trigger an emergency printk() so that there's
    chance
    that the user notices [if the system does not misbehave otherwise].

    That'll be fun when printk() == early_printk() :-)

    I myself wouldn't mind the system getting stuck until the link is re-established. My own damn fault for taking that cable out etc.

    --- MBSE BBS v1.0.6.13 (GNU/Linux-x86_64
    * Origin: linux.* mail to news gateway (110:300/11@linuxnet)
  • From Ingo Molnar@110:300/11 to All on Thu Jan 26 17:10:01 2017

    * Peter Zijlstra <peterz@infradead.org> wrote:

    On Thu, Jan 26, 2017 at 08:19:37AM +0100, Ingo Molnar wrote:

    * Lu Baolu <baolu.lu@linux.intel.com> wrote:

    Fair enough.

    USB connection is stable enough, unless the user unplugs the
    USB cable during debugging.

    What does the hardware do in this case? The XHCI registers are in the host hardware, so they won't disappear, right? Is there some cable connection
    status
    bit we can extract without interrupts?

    I.e. if there's any polling component then it would be reasonable to add an
    error
    component: poll the status and if it goes 'disconnected' then disable
    early-printk
    altogether in this case and trigger an emergency printk() so that there's
    chance
    that the user notices [if the system does not misbehave otherwise].

    That'll be fun when printk() == early_printk() :-)

    My suggestion would be to just print into the printk buffer directly in this case,
    without console output - the developer will notice it in 'dmesg'.

    I myself wouldn't mind the system getting stuck until the link is re-established. My own damn fault for taking that cable out etc.

    That's fine too, although beyond the obvious "yanked the cable without realizing
    it" case there are corner cases where usability is increased massively if the kernel is more proactive about error conditions: for example there are sub-standard USB cables and there are too long USB pathways from overloaded USB

    hubs which can result in intermittent behavior, etc.

    A clear diagnostic message in 'dmesg' that the USB host controller is unhappy about the USB-debug dongle device is a _lot_ more useful when troubleshooting such
    problems than the occasional weird, non-deterministic hang...

    Thanks,

    Ingo

    --- MBSE BBS v1.0.6.13 (GNU/Linux-x86_64
    * Origin: linux.* mail to news gateway (110:300/11@linuxnet)
  • From Peter Zijlstra@110:300/11 to All on Thu Jan 26 18:50:01 2017
    On Thu, Jan 26, 2017 at 05:01:05PM +0100, Ingo Molnar wrote:

    I.e. if there's any polling component then it would be reasonable to add
    an error
    component: poll the status and if it goes 'disconnected' then disable
    early-printk
    altogether in this case and trigger an emergency printk() so that there's
    chance
    that the user notices [if the system does not misbehave otherwise].

    That'll be fun when printk() == early_printk() :-)

    My suggestion would be to just print into the printk buffer directly in this
    case,
    without console output - the developer will notice it in 'dmesg'.

    When you map printk() onto early_printk() dmesg will be empty, there
    will be nothing there, and therefore no reason what so ever to look
    there. I certainly don't ever look there.

    Note that the printk buffer itself is a major part of why printk sucks
    donkey balls. Not to mention that you really cannot have an
    early_printk() implementation that depends on printk().

    I myself wouldn't mind the system getting stuck until the link is re-established. My own damn fault for taking that cable out etc.

    That's fine too, although beyond the obvious "yanked the cable without
    realizing
    it" case there are corner cases where usability is increased massively if the

    kernel is more proactive about error conditions: for example there are sub-standard USB cables and there are too long USB pathways from overloaded
    USB
    hubs which can result in intermittent behavior, etc.

    A clear diagnostic message in 'dmesg' that the USB host controller is unhappy

    about the USB-debug dongle device is a _lot_ more useful when troubleshooting
    such
    problems than the occasional weird, non-deterministic hang...

    Sure, I'm just not sure what or where makes sense.

    If your serial cable is bad you notice because you don't receive the
    right amount of characters and or stuff gets mangled. You chuck the
    cable and get a new one.

    I think the most important part is re-establishing the link when the
    cable gets re-inserted. Maybe we should just drop all characters written
    when there's no link and leave it at that, same as serial.

    --- MBSE BBS v1.0.6.13 (GNU/Linux-x86_64
    * Origin: linux.* mail to news gateway (110:300/11@linuxnet)
  • From Ingo Molnar@110:300/11 to All on Fri Jan 27 08:00:01 2017

    * Peter Zijlstra <peterz@infradead.org> wrote:

    On Thu, Jan 26, 2017 at 05:01:05PM +0100, Ingo Molnar wrote:

    I.e. if there's any polling component then it would be reasonable to
    add an error
    component: poll the status and if it goes 'disconnected' then disable
    early-printk
    altogether in this case and trigger an emergency printk() so that
    there's chance
    that the user notices [if the system does not misbehave otherwise].

    That'll be fun when printk() == early_printk() :-)

    My suggestion would be to just print into the printk buffer directly in
    this case,
    without console output - the developer will notice it in 'dmesg'.

    When you map printk() onto early_printk() dmesg will be empty, there
    will be nothing there, and therefore no reason what so ever to look
    there.

    Unless you want a third layer of a console driver putting the debug message into
    dmesg isn't all that bad of a solution.

    Let's admit it: something like USB that involves external pieces of hardware _does_ have failure modes, and troubleshooting messages instead of indefinite hangs are obviously more robust.

    I certainly don't ever look there.

    You'll have to teach yourself that if the box boots up fine but there are no messages whatsoever from the early-printk console that you'll need to look at dmesg output or the syslog for more clues.

    This should not be a common occurrance in any case - but when it happens it's very
    useful to have diagnostic messages. I don't think this is a controversial point
    in
    any fashion.

    Note that the printk buffer itself is a major part of why printk sucks donkey

    balls. Not to mention that you really cannot have an early_printk() implementation that depends on printk().

    There are several easy solutions to do that, my favorite would be to put it into
    the printk buffer totally unlocked. When your early-printk is active it's unused
    and in the end it's a known data structure after all:

    /*
    * Just zap whatever's in the printk buffer and put your emergency message into

    * it, prominently. No locking, no worries - don't generate emergency messages
    * while printk is active and syslogd is running - this facility is a poor man's
    * fallback printk() when early-printk has taken over all kernel logging:
    */
    void printk_emergency_puts(const char *str)
    {
    struct printk_log *msg, *msg_end;

    msg = log_buf;

    memset(msg, 0, sizeof(*msg));
    msg.text_len = strlen(str);

    msg_end = (void *)msg + sizeof(*msg) + msg->text_len;

    /* Zero ->len denotes end of log buffer: */
    memset(msg_end, 0, sizeof(*msg_end));

    snprintf(ptr, str);
    }

    ....
    printk_emergency_puts"earlyprintk emergency: Hardware timed out, shutting
    down. Fix your debug cable?\n");
    ....

    (Or so - totally untested, some details might be wrong.)

    But yes, I agree with your wider point, I just looked at kernel/printk/printk.c

    and puked. Why did we merge that crappy piece of binary logging code, when we already have two other binary logging facilities in the kernel already, both of

    them better and cleaner than this?? Why did we mess up our nicely readable, simple, reliable ASCII log buffer printk code? :-(

    I myself wouldn't mind the system getting stuck until the link is re-established. My own damn fault for taking that cable out etc.

    That's fine too, although beyond the obvious "yanked the cable without realizing it" case there are corner cases where usability is increased massively if the kernel is more proactive about error conditions: for
    example
    there are sub-standard USB cables and there are too long USB pathways from overloaded USB hubs which can result in intermittent behavior, etc.

    A clear diagnostic message in 'dmesg' that the USB host controller is
    unhappy
    about the USB-debug dongle device is a _lot_ more useful when
    troubleshooting
    such problems than the occasional weird, non-deterministic hang...

    Sure, I'm just not sure what or where makes sense.

    If your serial cable is bad you notice because you don't receive the right amount of characters and or stuff gets mangled. You chuck the cable and get a

    new one.

    I think the most important part is re-establishing the link when the cable
    gets
    re-inserted. Maybe we should just drop all characters written when there's no

    link and leave it at that, same as serial.

    That would be fine with me too - but even in this case there should be a stat counter somewhere (in /proc or /debug) that counts the number of characters dropped. Maybe that file could also display an emergency string - avoiding the interaction with the printk buffer.

    We can do better than passive-aggressive logging behavior...

    Thanks,

    Ingo

    --- MBSE BBS v1.0.6.13 (GNU/Linux-x86_64
    * Origin: linux.* mail to news gateway (110:300/11@linuxnet)