Date   

Re: B@D run on Renesas board - issue

Trung. Huynh <trung.huynh.uw@...>
 

Hi Daniel, Robert,

We got into test section already, but have no idea why it attempts to execute the thing non-exist at all:
# /lava-237/bin/lava-test-runner /lava-237/0
as you can see our error log in attachment.
I also take example log over here https://gitlab.com/snippets/1679749
to see that it can create a directory by mkdir /lava-1

Could you please tell us what does it make scene? How can we fix it?
Thank you.

P/S: healthcheck script same as Binh-san attached to you before

Best Regards,
Trung

-----Original Message-----
From: Daniel Sangorrin [mailto:daniel.sangorrin@...]
Sent: Monday, November 6, 2017 9:19 AM
To: 'Robert Marshall' <robert.marshall@...>; Binh Thanh. Nguyen <binh.nguyen.uw@...>
Cc: O365-Toru Oishi <toru.oishi.zj@...>; 'Chris Paterson' <Chris.Paterson2@...>; Trung. Huynh <trung.huynh.uw@...>; cip-dev@...
Subject: RE: B@D run on Renesas board - issue

Hello Binh-san,

-----Original Message-----
From: Robert Marshall [mailto:robert.marshall@...]
Sent: Saturday, November 04, 2017 1:05 AM
To: Binh Thanh. Nguyen
Cc: Daniel Sangorrin; O365-Toru Oishi; Chris Paterson; Trung. Huynh;
cip-dev@...
Subject: Re: B@D run on Renesas board - issue

"Binh Thanh. Nguyen" <binh.nguyen.uw@...> writes:

Hello Daniel, Robert,

We are facing an issue when trying to run healthcheck using B@D on Renesas board.
I would like to attach the log and the healthcheck script (we
modified the healthcheck from Daniel)

The boot action was passed, but after that, look like LAVA cannot
send command to Board, only send "#" (it supposed to be "uname").
I wonder if you met same issue before?
And if possible, please give us any hints you may have for debugging this issue!

Best regards,
Binh Nguyen
Thanh

Sorry for the delay in responding!

That test is missing a deploy section (all commented out) are you
trying to make the test too minimal? LAVA tends to swallow output if
it is what is expected.

Robert
As Robert said, please do not comment out the deploy section. LAVA does not just deploy the binaries as they are, it also copies some scripts (LAVA Test shell) into the ramdisk (or network filesystem).
# LAVA does not use scp (or a serial protocol) to copy them into the target

For deploying you need a PDU (remote power switch). If you don't have one you need to buy one and then add the corresponding commands.
# Note: I use an IP Power 9258 remote PDU.
# Note: do not think you can get around without a PDU*

Thanks,
Daniel

*There are ways to do it but they have limitations https://validation.linaro.org/static/docs/v2/dispatcher-design.html#primary-connection


CIP Testing Project next steps: some ideas

Agustin Benito Bethencourt <agustin.benito@...>
 

Hi,

(this is a long mail)

in order to understand the ideas about the future of the CIP Testing project, let me enumerate the basic requirements that led CIP to pursue the current strategy, now under revision.

++ Fully de-centralise service architecture

A decision to go for a transparent & fully de-centralised service architecture as testing service architecture for CIP based on LAVA + KernelCI was based in the following arguments:

- Conclusion from evaluating other projects: kernelci.org , LTSI, AGL, openSUSE, openStack.
- Back then, AGL picture around testing was unclear.
- FUEGO back then had 3 different repositories/forks.
- Cost and current manpower availability.
- After evaluating costs, it was easy to come to the conclusion that CIP could not create its own (semi)centralised service. This was the main driver to design a fully distributed service architecture for CIP testing at that point.
- Priorities within CIP.
- Back then the priority was to test the CIP kernel.
- Adaptability to CIP nature.
- The CIP Testing team concluded that LAVA+KernelCI could be adapted to our use case and service architecture.
- Dependencies.
- Back then we agreed that we could only maintain the kernel for a very long time if we took ownership of the kernel maintenance activities. The same principle applies to the testing service, environment, tests...

I summarised the above arguments in the following text that some of you have read before:

Sharing results is far from enough. CIP needs to ensure transparently that any engineer is:
...using the same tests...
...to test the same CIP system...
...on the same boards...
...with the same tool set...
...under the same environment...
...producing the same reports...
...comparable through canonical logs.

++ New picture

Over a year later there are some new elements to consider

1. AGL is making a significant investment in a semi-centralised testing service, based on LAVA+KernelCI. They are committed to maintain it.

2. As part of that work, several containers (KernelCI) to be maintained by AGL will be available soon, which would open the door for CIP to adopt a container based solution instead of a VM based one, reducing the effort to meet our use case.

Part of the work Codethink did in the provisioning and deployment front could then be delegated to AGL, reducing the effort to the configuration side only.

3. As part of the AGL work, KernelCI (data visualization) is being improved. The expectation is that these improvements end upstream, in kernelci.org

4.There are plans to integrate LAVA and FUEGO in the future according to FUEGO maintainer, Tim Bird.

5. LAVA development seem to go through a more stable phase. Dramatic changes in core areas has not been the norm in the last few months, once the transition from V1 to V2 was completed.

6. kernelci.org today includes older kernels like 4.4, 3.2...

7. KernelCI maintainer is no longer a Linaro employee.

8. Toshiba, a CIP Member, is investing directly in FUEGO.

++ Conversations at ELCE 2017

* CIP and AGL CIAT held a meeting to stablish areas of collaboration. The main conclusion was that CIP can adopt the containers based architecture in B@D, relying on AGL on maintaining each container (KernelCI).

On the other hand, CIP could focus more on the reporting side and also on creating tests. These two areas should be shared with AGL and upstream to kernelci.org

I see this agreement as a natural step forward. Action 2 of CIP Testing strategy already defined those two work areas as ToDo. It would be about raising their priority.

By relying on AGL on the provisioning and deployment side (containers) it is expected that the current workload on B@D could be reduced in the mid term. There is an initial technical work to be done that would take effort, in order to release B@D v2.0 as containers.

Keeping a similar effort than in the past, the amount of effort available for Action 2 would then be greater than Codethink could put during the past year.

++ Collaborating with kernelci.org

There are several collaboration points with kernelci.org:

a. Upstream any work done by CIP on the testing creation front.
b. Upstream any work done by CIP on the reporting side.
c. Sharing CIP results from B@D through kernelci.org
d. Include the CIP kernel tree in kernelci.org Some consideration about this will be done later.
e. Fully participate in kernelci.org

Let me add some considerations for each point.

a. Upstream any work done by CIP on the testing creation front.

I do not see in theory any significant issue in this front as long as the tests we create refers to upstream kernel code. If tests refers to code that is not upstream, I see some challenges in using them upstream.

b. Upstream any work done by CIP on the reporting side

The question CIP will need to answer is if the reports CIP kernel maintainers need are generic enough that will be also useful for other old kernel maintainers and kernel developers.

In general the answer should be yes, they should be useful, so it would surprise me if most of our reports does not end up upstream. Obviously we would need to comply with some technical decisions implemented in kernelci.org.

c. Sharing CIP results through kernelci.org

kernelci.org has a specific service architecture. B@D does not match that architecture since it was designed to test kernel locally.

I am not very optimistic about being able to use only the visualization part of the kernelci.org service since B@D uses its own LAVA, not Linaro one.

CIP would need to raise this use case upstream and evaluate the answer from Linaro and the kernelci.org project.

d. Include the CIP kernel tree in kernelci.org

I see no problem in including Ben's tree in kernelci.org. The 4.4 LTS tree is already there.

The problem comes with the fact that CIP/Ben H. would be expected of taking care of the results analysis associated with that tree. That is what good Open Source citizens would do, assume the workload of their requests. I doubt we can assume that workload anytime soon. In any case we only care about specific boards.

Managing this use case would require some discussion with the kernelci.org community. I would explore it.

e. Fully participate in kernelci.org

With the above in mind, any organization, company or individual can participate in kernelci.org. CIP can too. But different requirements lead to adaptations which cannot be done and maintained without control over the service.

Linaro and the CIP project (Linux Foundation) do not necessarily have the same requirements today (or in 15 years). No matter if CIP fully participates upstream or not, my opinion is that the same strategy than in CIP Core applies here.

CIP needs control over its own testing service.

++ Current options

1. If kernelci.org is the answer, I propose to explore an agreement with Linaro. That would provide CIP some level of control.

Participating directly upstream in the current setup of the project, where it is Linaro the organization providing the service and the main investment without such agreement has risks. CIP should become upstream.

2. If the idea is joining forces with AGL, then an agreement with them should be the way to go.

I see two approaches:
i The one discussed during ELCE 2017, described above, where both organization collaborate but each one has its own strategy. Both of them upstream as much as possible.
ii One in which AGL provides the service to CIP, becoming AGL CIAT service consumers, contributing where we can and make sense to us.

3. Set up our own (semi)centralised testing service.

I think the same rationale CIP applied over a year ago is valid today. Setting up our own service is not an option unless the current level of investment in the CIP Testing project significantly increases, via monetary investment or/and contributions from Member.

Now we have a clear cost example in AGL. We just need to ask to find out how much we are talking about for the set up. In a couple of years we will know more about its maintenance.

++ Final remarks

I am glad that the technical and strategic decisions we took over a year ago has not closed CIP any door in the current scenario. Back then the situation was more blur than it is today.

Based on my experience, a (semi)centralised testing service is very expensive. I wonder how viable it is to have several LF initiatives setting up and maintaining independent embedded focused testing services. I hope the situation in a year is significantly better than today in this regard across initiatives.

When AGL decided to go for LAVA + KernelCI for its testing service, as CIP did, several new options opened for us. They need to be carefully considered. We are in a good position to share costs and risks with them.

As we have done in the kernel maintenance front, the closer we stay to upstream in the testing front, the less work we will do and the greater support we will get. kernelci.org is upstream in this regard. We can contribute in key areas, like tests, with very little cost and great benefit.

As in any Open Source project, before relying completely on upstream, an evaluation of the ownership and set up (ground rules) of that upstream project need to be carefully considered to understand, and ultimately assume, the potential risks inherited at the organization, legal... levels. This is true also for kernelci.org

I believe that with the current level of investment, CIP needs to share part of the workload related with the tooling in order to make meaningful contributions in the testing front, which is at the very end what we all want.

The CIP investment will need to grow in order to raise the quality threshold of what we are shipping. We all agree that testing is a central part of the CIP activity. It is not just a technical commitment. It is also about credibility.

Based on the new facts, if as a result of the re-thinking of the current testing strategy, B@D stop being a central part of the picture, that is fine for Codethink. My company is not afraid of killing their own babies when better options appear. What over a year ago made sense might not next year. We've been there several times before. The tragedy would be carrying on an iron ball chained to our foot.

I would explore in depth options 2.i and 2.ii, in that order, as initial steps.

Excuses for these two long mails. There will be no more :-)

Best Regards
--
Agustin Benito Bethencourt
Principal Consultant - FOSS at Codethink
agustin.benito@...


CIP Testing Project: recap

Agustin Benito Bethencourt <agustin.benito@...>
 

Hi,

(this i a long mail)

As the below summary shows with more or less accuracy, the steps taken on the CIP Testing Project front has been:

1 Discuss the testing strategy. (completed)

2 Identify as initial need the availability of a testing framework that would allow us to test at least, the CIP kernel. Identify other steps to be taken. (completed)

3 Define the testing service architecture. (completed)

4 Work on the test environment according to the above requirements and strategy. (completed - ongoing). Action 1.

5 Release the test environment. (completed - ongoing).
Action 1: https://wiki.linuxfoundation.org/civilinfrastructureplatform/ciptesting#action-1board-at-desk-single-developer-b-d

6 Basic kernel testing (in progress).
Action 2: https://wiki.linuxfoundation.org/civilinfrastructureplatform/ciptesting#action-2cip-kernel-testing

7 Extend the number of technologies used to deploy B@D.
Action 3: https://wiki.linuxfoundation.org/civilinfrastructureplatform/ciptesting#action-3extend-the-number-of-technologies-used-to-deploy-b-d

8 From kernel testing to system testing.
Action 4: https://wiki.linuxfoundation.org/civilinfrastructureplatform/ciptesting#action-4from-kernel-testing-to-system-testing

9 Define kernel/system testing as a semi-distributed service within CIP.
Action 5: https://wiki.linuxfoundation.org/civilinfrastructureplatform/ciptesting#action-5define-kernelsystem-testing-as-a-semi-distributed-service-within-cip

So point 6 (action 2) is where the CIP Testing project is today.

At ELCE 2017, based in several news and trends, and since the current service contract with Codethink was coming to an end, we all agreed that it was time to re-evaluate the current plan. As mentioned during today's TSC meeting, I will send a second e-mail with my views.

+ Historical view of the CIP Testing Project

A few months ago I did the following description that I now update and send to this list. The goal was back then, and now, to provide a historical overview of where are we in the CIP Testing Project. To do so, I have used the slides[1] from the different talks CIP has delivered, which are public.

++ July 2016 LinuxCon Japan[2]

* Slides 16 and 17 define the goals and plans. No specific mention to testing or how the kernel maintenance will be carried on
* Speakers: Yoshitake Kobayashi, Toshiba, Hiroshi Mine, Hitachi, Ltd. and Jan Kiszka, Siemens AG.

++ Aug 2016 LinuxCon North America[3]

* Slide 14 defines the Validation (testing) goals, defining as initial step of the testing project to set up the test infrastructure.
* Slide 17 describes the testing project steps in more detail. What later was called B@D is already mentioned, based on the goals previously set in slide 14.
* Jan from Siemens delivered the talk.

++ Oct 6th LinuxCon EU[4]

* No new content on the testing front was introduced in this event but the slides 14 and 17 of the deck used in August are described here again: slide 14 and 17.
* Talk delivered by Urs Gleim, Siemens AG, Corporate Technology & Yoshitake Kobayashi, Toshiba

++ Oct 2016 ELCE[5]

* Slide 30. The testing strategy figures as a milestone.
* Slide 23 replicates the strategy for kernel validation, which establishes that setting up the infrastructure is the first goal.
* Slide 24 develops those goals.
* Slide 25 shows the current steps. It even shows screenshots of the, by then, already functional solution.
* Slide 29 reflects the discussion points. You can see how the discussions point on the direction of future extensions of the already work in progress solution (KernelCI, with other solutions).
* The talk is delivered by Urs from Siemens and Yoshi from Toshiba.

++ Feb 2017 OSLS[6]

* Slide 18 describes the work in progress on the testing front
* Slide 20 defines the next steps to be taken in the CIP testing project.
* The talk was delivered by Noriaki Fukuyasu (Linux Foundation) and Agustin Benito Bethencourt (Codethink)

++ Feb 2017 ELC[7]

* Slide 17 defines the initial steps in the strategy towards validating the base layer (not the kernel alone). Again, it defines to create a test framework as initial step, like in the case of the kernel a year earlier.
* Slide 25 defines a more specific goals for CIP testing than before.
* Slide 26 provides an overview of the four key actions that the CIP testing project has defined.
* Slide 27 describes the status of the "test framework definition work" and the next steps.
* Slide 36 describes a little better some of the next steps planned related with the CIP testing project.

++ May 2017 OSSJ 2017[8]

* Slide 26 replicates the specific goals of the CIP testing project when it comes to the kernel.
* Slide 27 should be compared to slide 26 of the previous deck. It shows the progress on the CIP testing front.
* Slide 28 Reflects the release of CIP kernel testing framework called B@D.
* Slide 29 describes the next steps on the CIP kernel testing front.
* Slide 37 enumerates the key news from the project, being one of the the release of the kernel testing framework for the CIP kernel.
* Slide 39 describes the next steps to be taken.

++ October 2017 ELCE [9]

* Slide 34: enumerates the key actions of the CIP Testing Project plan.
* Slide 35: highlights of B@D v1.0 release.
* Slide 36: next steps of the CIP Testing project.
* The talk was delivered by Yoshi (Toshiba) and Urs (Siemens)

[1] https://wiki.linuxfoundation.org/civilinfrastructureplatform/cipconferences
[2] http://events.linuxfoundation.org/sites/events/files/slides/Intriducing_the_CIP-LCJ2016.pdf
[3] http://events.linuxfoundation.org/sites/events/files/slides/CIP-LinuxConNA.pdf
[4] https://wiki.linuxfoundation.org/_media/civilinfrastructureplatform/2016-10-06_cip-linuxconeu-r06.pdf
[5] https://wiki.linuxfoundation.org/_media/civilinfrastructureplatform/2016-10-06_cip-linuxconeu-r06.pdf
[6] https://wiki.linuxfoundation.org/_media/cip/osls2017_cip_v1_0.pdf
[7] http://events.linuxfoundation.jp/sites/events/files/slides/2017-02-22_CIP-ELC-r7.pdf
[8] http://events.linuxfoundation.org/sites/events/files/slides/2017-06-02_CIP-OSSJ-final_1.pdf
[9] http://events.linuxfoundation.org/sites/events/files/slides/2017-10-24_CIP-ELCE%20-%20v10.3.pdf

Best Regards
--
Agustin Benito Bethencourt
Principal Consultant - FOSS at Codethink
agustin.benito@...


CIP kernel maintenance talk at ELCE 2017 in our wiki

Agustin Benito Bethencourt <agustin.benito@...>
 

Hi,

I just uploaded the links to the slides and video of the talk delivered by Ben H. and myself at ELCE 2017.

Check them out: https://wiki.linuxfoundation.org/civilinfrastructureplatform/cipconferences#presentations

Best Regards
--
Agustin Benito Bethencourt
Principal Consultant - FOSS at Codethink
agustin.benito@...


Re: RFC: OpenEmbedded layer for CIP project

Chris Paterson
 

Hello Sean,

From: cip-dev-bounces@... [mailto:cip-dev-
bounces@...] On Behalf Of Hudson, Sean
Sent: 19 September 2017 16:46

Hey all,

I'm still new to the project and working my way through the email archives.
However, I've started creating a meta-cip layer to capture metadata for
building the CIP project with OpenEmbedded. This will allow it to build with
the Yocto Project, as well. I'll create a project on gitlab soon to start collecting
my changes.

Any comments, concerns or thoughts?
Following on from our conversations at ELC-E I wanted to reboot this thread.

Have you had a chance to look into this topic again? Did you create the gitlab project in the end?

What were you aiming to include in your meta-cip layer?

Kind regards, Chris


[PATCH 10/10] watchdog: core: add option to avoid early handling of watchdog

Maxim Yu, Osipov
 

From: Sebastian Reichel <sebastian.reichel@...>

commit 2501b015313fe2fa40ed11fa4dd1748e09b7c773 upstream.

On some systems its desirable to have watchdog reboot the system
when it does not come up fast enough. This adds a kernel parameter
to disable the auto-update of watchdog before userspace takes over
and a kernel option to set the default. The info messages were
added to shorten error searching on misconfigured systems.

Signed-off-by: Sebastian Reichel <sebastian.reichel@...>
Reviewed-by: Guenter Roeck <linux@...>
Signed-off-by: Guenter Roeck <linux@...>
Signed-off-by: Wim Van Sebroeck <wim@...>
[mosipov@... backported to 4.4.y]
Signed-off-by: Maxim Yu. Osipov <mosipov@...>
---
drivers/watchdog/Kconfig | 11 +++++++++++
drivers/watchdog/watchdog_dev.c | 21 +++++++++++++++++----
2 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/drivers/watchdog/Kconfig b/drivers/watchdog/Kconfig
index 1c427beffadd..88743cd0a7e6 100644
--- a/drivers/watchdog/Kconfig
+++ b/drivers/watchdog/Kconfig
@@ -46,6 +46,17 @@ config WATCHDOG_NOWAYOUT
get killed. If you say Y here, the watchdog cannot be stopped once
it has been started.

+config WATCHDOG_HANDLE_BOOT_ENABLED
+ bool "Update boot-enabled watchdog until userspace takes over"
+ default y
+ help
+ The default watchdog behaviour (which you get if you say Y here) is
+ to ping watchdog devices that were enabled before the driver has
+ been loaded until control is taken over from userspace using the
+ /dev/watchdog file. If you say N here, the kernel will not update
+ the watchdog on its own. Thus if your userspace does not start fast
+ enough your device will reboot.
+
#
# General Watchdog drivers
#
diff --git a/drivers/watchdog/watchdog_dev.c b/drivers/watchdog/watchdog_dev.c
index 1edc3d78636b..8ca87ae467f2 100644
--- a/drivers/watchdog/watchdog_dev.c
+++ b/drivers/watchdog/watchdog_dev.c
@@ -53,6 +53,9 @@ static struct watchdog_device *old_wdd;

static struct workqueue_struct *watchdog_wq;

+static bool handle_boot_enabled =
+ IS_ENABLED(CONFIG_WATCHDOG_HANDLE_BOOT_ENABLED);
+
static inline bool watchdog_need_worker(struct watchdog_device *wdd)
{
/* All variables in milli-seconds */
@@ -683,10 +686,15 @@ int watchdog_dev_register(struct watchdog_device *wdd)
* and schedule an immediate ping.
*/
if (watchdog_hw_running(wdd)) {
- __module_get(wdd->ops->owner);
- if (wdd->ops->ref)
- wdd->ops->ref(wdd);
- queue_delayed_work(watchdog_wq, &wdd->work, 0);
+ if (handle_boot_enabled) {
+ __module_get(wdd->ops->owner);
+ if (wdd->ops->ref)
+ wdd->ops->ref(wdd);
+ queue_delayed_work(watchdog_wq, &wdd->work, 0);
+ } else {
+ pr_info("watchdog%d running and kernel based pre-userspace handler disabled\n",
+ wdd->id);
+ }
}

return 0;
@@ -756,3 +764,8 @@ void __exit watchdog_dev_exit(void)
unregister_chrdev_region(watchdog_devt, MAX_DOGS);
destroy_workqueue(watchdog_wq);
}
+
+module_param(handle_boot_enabled, bool, 0444);
+MODULE_PARM_DESC(handle_boot_enabled,
+ "Watchdog core auto-updates boot enabled watchdogs before userspace takes over (default="
+ __MODULE_STRING(IS_ENABLED(CONFIG_WATCHDOG_HANDLE_BOOT_ENABLED)) ")");
--
2.11.0


[PATCH 09/10] watchdog: core: Clear WDOG_HW_RUNNING before calling the stop function

Maxim Yu, Osipov
 

From: Guenter Roeck <linux@...>

commit 3c10bbde10fe4dca52726e246cefa6b0a1dfbd3e upstream.

WDOG_HW_RUNNING indicates that the hardware watchdog is running while the
watchdog device is closed. The flag may be set by the driver when it is
instantiated to indicate that the watchdog is running, and that the
watchdog core needs to send heartbeat requests to the driver until the
watchdog device is opened.

When the watchdog device is closed, the flag can be used by the driver's
stop function to indicate to the watchdog core that it was unable to stop
the watchdog, and that the watchdog core needs to send heartbeat requests.
This only works if the flag is actually cleared when the watchdog is
stopped. To avoid having to clear the flag in each driver's stop function,
clear it in the watchdog core before calling the stop function.

Reported-by: Rasmus Villemoes <rasmus.villemoes@...>
Fixes: ee142889e32f ("watchdog: Introduce WDOG_HW_RUNNING flag")
Signed-off-by: Guenter Roeck <linux@...>
Signed-off-by: Wim Van Sebroeck <wim@...>
[mosipov@... backported to 4.4.y]
Signed-off-by: Maxim Yu. Osipov <mosipov@...>
---
drivers/watchdog/watchdog_dev.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/watchdog/watchdog_dev.c b/drivers/watchdog/watchdog_dev.c
index 263d7fa0ef31..1edc3d78636b 100644
--- a/drivers/watchdog/watchdog_dev.c
+++ b/drivers/watchdog/watchdog_dev.c
@@ -241,10 +241,12 @@ static int watchdog_stop(struct watchdog_device *wdd)
goto out_stop;
}

- if (wdd->ops->stop)
+ if (wdd->ops->stop) {
+ clear_bit(WDOG_HW_RUNNING, &wdd->status);
err = wdd->ops->stop(wdd);
- else
+ } else {
set_bit(WDOG_HW_RUNNING, &wdd->status);
+ }

if (err == 0) {
clear_bit(WDOG_ACTIVE, &wdd->status);
--
2.11.0


[PATCH 08/10] watchdog: core: Fix error handling of watchdog_dev_init()

Maxim Yu, Osipov
 

From: Wei Yongjun <yongjun_wei@...>

commit 138913cb632be12344982e54ccd12f6f15971bf upstream.

Fix the error handling paths of watchdog_dev_init().

Signed-off-by: Wei Yongjun <yongjun_wei@...>
Reviewed-by: Guenter Roeck <linux@...>
Signed-off-by: Guenter Roeck <linux@...>
Signed-off-by: Wim Van Sebroeck <wim@...>
[mosipov@... backported to 4.4.y]
Signed-off-by: Maxim Yu. Osipov <mosipov@...>
---
drivers/watchdog/watchdog_dev.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/watchdog/watchdog_dev.c b/drivers/watchdog/watchdog_dev.c
index 2a204bb56705..263d7fa0ef31 100644
--- a/drivers/watchdog/watchdog_dev.c
+++ b/drivers/watchdog/watchdog_dev.c
@@ -731,8 +731,15 @@ int __init watchdog_dev_init(void)
}

err = alloc_chrdev_region(&watchdog_devt, 0, MAX_DOGS, "watchdog");
- if (err < 0)
+ if (err < 0) {
pr_err("watchdog: unable to allocate char dev region\n");
+ goto err_alloc;
+ }
+
+ return 0;
+
+err_alloc:
+ destroy_workqueue(watchdog_wq);
return err;
}

--
2.11.0


[PATCH 07/10] watchdog: change watchdog_need_worker logic

Maxim Yu, Osipov
 

From: Rasmus Villemoes <rasmus.villemoes@...>

commit 3fbfe9264756d3fd99a9210345016c94ec4ada73 upstream.

If the driver indicates that the watchdog is running, the framework
should feed it until userspace opens the device, regardless of whether
the driver has set max_hw_heartbeat_ms.

This patch only affects the case where wdd->max_hw_heartbeat_ms is
zero, wdd->timeout is non-zero, the watchdog is not active and the
hardware device is running (*):

- If wdd->timeout is zero, watchdog_need_worker() returns false both
before and after this patch, and watchdog_next_keepalive() is not
called.

- If watchdog_active(wdd), the return value from watchdog_need_worker
is also the same as before (namely, hm && t > hm). Hence in that case,
watchdog_next_keepalive() is only called if hm == max_hw_heartbeat_ms
is non-zero, so the change to min_not_zero there is a no-op.

- If the watchdog is not active and the device is not running, we
return false from watchdog_need_worker just as before.

That leaves the watchdog_hw_running(wdd) && !watchdog_active(wdd) &&
wdd->timeout case. Again, it's easy to see that if
wdd->max_hw_heartbeat_ms is non-zero, we return true from
watchdog_need_worker with and without this patch, and the logic in
watchdog_next_keepalive is unchanged. Finally, if
wdd->max_hw_heartbeat_ms is 0, we used to end up in the
cancel_delayed_work branch, whereas with this patch we end up
scheduling a ping timeout_ms/2 from now.

(*) This should imply that no current kernel drivers are affected,
since the only drivers which explicitly set WDOG_HW_RUNNING are
imx2_wdt.c and dw_wdt.c, both of which also provide a non-zero value
for max_hw_heartbeat_ms. The watchdog core also sets WDOG_HW_RUNNING,
but only when the driver doesn't provide ->stop, in which case it
must, according to Documentation/watchdog/watchdog-kernel-api.txt, set
max_hw_heartbeat_ms.

Signed-off-by: Rasmus Villemoes <rasmus.villemoes@...>
Reviewed-by: Guenter Roeck <linux@...>
Signed-off-by: Guenter Roeck <linux@...>
Signed-off-by: Wim Van Sebroeck <wim@...>
[mosipov@... backported to 4.4.y]
Signed-off-by: Maxim Yu. Osipov <mosipov@...>
---
drivers/watchdog/watchdog_dev.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/watchdog/watchdog_dev.c b/drivers/watchdog/watchdog_dev.c
index e98e4294ec92..2a204bb56705 100644
--- a/drivers/watchdog/watchdog_dev.c
+++ b/drivers/watchdog/watchdog_dev.c
@@ -67,9 +67,13 @@ static inline bool watchdog_need_worker(struct watchdog_device *wdd)
* thus is aware that the framework supports generating heartbeat
* requests.
* - Userspace requests a longer timeout than the hardware can handle.
+ *
+ * Alternatively, if userspace has not opened the watchdog
+ * device, we take care of feeding the watchdog if it is
+ * running.
*/
- return hm && ((watchdog_active(wdd) && t > hm) ||
- (t && !watchdog_active(wdd) && watchdog_hw_running(wdd)));
+ return (hm && watchdog_active(wdd) && t > hm) ||
+ (t && !watchdog_active(wdd) && watchdog_hw_running(wdd));
}

static long watchdog_next_keepalive(struct watchdog_device *wdd)
@@ -81,7 +85,7 @@ static long watchdog_next_keepalive(struct watchdog_device *wdd)
unsigned int hw_heartbeat_ms;

virt_timeout = wdd->last_keepalive + msecs_to_jiffies(timeout_ms);
- hw_heartbeat_ms = min(timeout_ms, wdd->max_hw_heartbeat_ms);
+ hw_heartbeat_ms = min_not_zero(timeout_ms, wdd->max_hw_heartbeat_ms);
keepalive_interval = msecs_to_jiffies(hw_heartbeat_ms / 2);

if (!watchdog_active(wdd))
--
2.11.0


[PATCH 06/10] watchdog: skip min and max timeout validity check when max_hw_heartbeat_ms is defined

Maxim Yu, Osipov
 

From: Pratyush Anand <panand@...>

commit 1894cad9bf2c10359b2b7a0c00e564698f712751 upstream.

When max_hw_heartbeat_ms has a none zero value, max_timeout is not used.
So it's value can be 0. In such case if a driver uses min_timeout
functionality, then check will always fail.

This patch fixes above issue.

Signed-off-by: Pratyush Anand <panand@...>
Signed-off-by: Fu Wei <fu.wei@...>
Reviewed-by: Guenter Roeck <linux@...>
Signed-off-by: Guenter Roeck <linux@...>
Signed-off-by: Wim Van Sebroeck <wim@...>
[mosipov@... backported to 4.4.y]
Signed-off-by: Maxim Yu. Osipov <mosipov@...>
---
drivers/watchdog/watchdog_core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/watchdog/watchdog_core.c b/drivers/watchdog/watchdog_core.c
index 41f1854d026b..5de14c51bf95 100644
--- a/drivers/watchdog/watchdog_core.c
+++ b/drivers/watchdog/watchdog_core.c
@@ -88,7 +88,7 @@ static void watchdog_check_min_max_timeout(struct watchdog_device *wdd)
* Check that we have valid min and max timeout values, if
* not reset them both to 0 (=not used or unknown)
*/
- if (wdd->min_timeout > wdd->max_timeout) {
+ if (!wdd->max_hw_heartbeat_ms && wdd->min_timeout > wdd->max_timeout) {
pr_info("Invalid min and max timeout values, resetting to 0!\n");
wdd->min_timeout = 0;
wdd->max_timeout = 0;
--
2.11.0


[PATCH 05/10] watchdog: core: Fix circular locking dependency

Maxim Yu, Osipov
 

From: Guenter Roeck <linux@...>

commit e1f30282a1d3d0c75d5a08e47c6ac1563065be52 upstream.

lockdep reports the following circular locking dependency.

======================================================
INFO: possible circular locking dependency detected ]
4.6.0-rc3-00191-gfabf418 #162 Not tainted
-------------------------------------------------------
systemd/1 is trying to acquire lock:
((&(&wd_data->work)->work)){+.+...}, at: [<80141650>] flush_work+0x0/0x280

but task is already holding lock:

(&wd_data->lock){+.+...}, at: [<804acfa8>] watchdog_release+0x18/0x190

which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:

-> #1 (&wd_data->lock){+.+...}:
-------[<80662310>] mutex_lock_nested+0x64/0x4a8
-------[<804aca4c>] watchdog_ping_work+0x18/0x4c
-------[<80143128>] process_one_work+0x1ac/0x500
-------[<801434b4>] worker_thread+0x38/0x554
-------[<80149510>] kthread+0xf4/0x108
-------[<80107c10>] ret_from_fork+0x14/0x24
-> #0 ((&(&wd_data->work)->work)){+.+...}:
-------[<8017c4e8>] lock_acquire+0x70/0x90
-------[<8014169c>] flush_work+0x4c/0x280
-------[<801440f8>] __cancel_work_timer+0x9c/0x1e0
-------[<804acfcc>] watchdog_release+0x3c/0x190
-------[<8022c5e8>] __fput+0x80/0x1c8
-------[<80147b28>] task_work_run+0x94/0xc8
-------[<8010b998>] do_work_pending+0x8c/0xb4
-------[<80107ba8>] slow_work_pending+0xc/0x20
other info that might help us debug this:
Possible unsafe locking scenario:

CPU0 CPU1
---- ----
lock(&wd_data->lock);
lock((&(&wd_data->work)->work));
lock(&wd_data->lock);
lock((&(&wd_data->work)->work));

*** DEADLOCK ***

1 lock held by systemd/1:

stack backtrace:
CPU: 2 PID: 1 Comm: systemd Not tainted 4.6.0-rc3-00191-gfabf418 #162
Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[<8010f5e4>] (unwind_backtrace) from [<8010c038>] (show_stack+0x10/0x14)
[<8010c038>] (show_stack) from [<8039d7fc>] (dump_stack+0xa8/0xd4)
[<8039d7fc>] (dump_stack) from [<80177ee0>] (print_circular_bug+0x214/0x334)
[<80177ee0>] (print_circular_bug) from [<80179230>] (check_prevs_add+0x4dc/0x8e8)
[<80179230>] (check_prevs_add) from [<8017b3d8>] (__lock_acquire+0xc6c/0x14ec)
[<8017b3d8>] (__lock_acquire) from [<8017c4e8>] (lock_acquire+0x70/0x90)
[<8017c4e8>] (lock_acquire) from [<8014169c>] (flush_work+0x4c/0x280)
[<8014169c>] (flush_work) from [<801440f8>] (__cancel_work_timer+0x9c/0x1e0)
[<801440f8>] (__cancel_work_timer) from [<804acfcc>] (watchdog_release+0x3c/0x190)
[<804acfcc>] (watchdog_release) from [<8022c5e8>] (__fput+0x80/0x1c8)
[<8022c5e8>] (__fput) from [<80147b28>] (task_work_run+0x94/0xc8)
[<80147b28>] (task_work_run) from [<8010b998>] (do_work_pending+0x8c/0xb4)
[<8010b998>] (do_work_pending) from [<80107ba8>] (slow_work_pending+0xc/0x20)

Turns out the call to cancel_delayed_work_sync() in watchdog_release()
is not necessary and can be dropped. If the worker is no longer necessary,
the subsequent call to watchdog_update_worker() will cancel it. If it is
already running, it won't do anything, since the worker function checks
if it needs to ping the watchdog or not.

Reported-by: Clemens Gruber <clemens.gruber@...>
Tested-by: Clemens Gruber <clemens.gruber@...>
Fixes: 11d7aba9ceb7 ("watchdog: imx2: Convert to use infrastructure triggered keepalives")
Signed-off-by: Guenter Roeck <linux@...>
Signed-off-by: Wim Van Sebroeck <wim@...>
Cc: stable <stable@...>
[mosipov@... backported to 4.4.y]
Signed-off-by: Maxim Yu. Osipov <mosipov@...>
---
drivers/watchdog/watchdog_dev.c | 1 -
1 file changed, 1 deletion(-)

diff --git a/drivers/watchdog/watchdog_dev.c b/drivers/watchdog/watchdog_dev.c
index 7cefe9aad123..e98e4294ec92 100644
--- a/drivers/watchdog/watchdog_dev.c
+++ b/drivers/watchdog/watchdog_dev.c
@@ -596,7 +596,6 @@ static int watchdog_release(struct inode *inode, struct file *file)
if (!watchdog_hw_running(wdd))
module_put(wdd->ops->owner);

- cancel_delayed_work_sync(&wdd->work);
watchdog_update_worker(wdd);

/* make sure that /dev/watchdog can be re-opened */
--
2.11.0


[PATCH 04/10] watchdog: imx2: Convert to use infrastructure triggered keepalives

Maxim Yu, Osipov
 

From: Guenter Roeck <linux@...>

commit 11d7aba9ceb726d86aaaca3eb5f7d79de38989c5 upstream.

The watchdog infrastructure now supports handling watchdog keepalive
if the watchdog is running while the watchdog device is closed.
Convert the driver to use this infrastructure.

Signed-off-by: Guenter Roeck <linux@...>
Signed-off-by: Wim Van Sebroeck <wim@...>
[mosipov@... backported to 4.4.y]
Signed-off-by: Maxim Yu. Osipov <mosipov@...>
---
drivers/watchdog/imx2_wdt.c | 74 ++++++++-------------------------------------
1 file changed, 13 insertions(+), 61 deletions(-)

diff --git a/drivers/watchdog/imx2_wdt.c b/drivers/watchdog/imx2_wdt.c
index 29ef719a6a3c..378e7c5dfc50 100644
--- a/drivers/watchdog/imx2_wdt.c
+++ b/drivers/watchdog/imx2_wdt.c
@@ -25,7 +25,6 @@
#include <linux/delay.h>
#include <linux/init.h>
#include <linux/io.h>
-#include <linux/jiffies.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/moduleparam.h>
@@ -34,7 +33,6 @@
#include <linux/platform_device.h>
#include <linux/reboot.h>
#include <linux/regmap.h>
-#include <linux/timer.h>
#include <linux/watchdog.h>

#define DRIVER_NAME "imx2-wdt"
@@ -62,7 +60,6 @@
struct imx2_wdt_device {
struct clk *clk;
struct regmap *regmap;
- struct timer_list timer; /* Pings the watchdog when closed */
struct watchdog_device wdog;
struct notifier_block restart_handler;
};
@@ -151,16 +148,6 @@ static int imx2_wdt_ping(struct watchdog_device *wdog)
return 0;
}

-static void imx2_wdt_timer_ping(unsigned long arg)
-{
- struct watchdog_device *wdog = (struct watchdog_device *)arg;
- struct imx2_wdt_device *wdev = watchdog_get_drvdata(wdog);
-
- /* ping it every wdog->timeout / 2 seconds to prevent reboot */
- imx2_wdt_ping(wdog);
- mod_timer(&wdev->timer, jiffies + wdog->timeout * HZ / 2);
-}
-
static int imx2_wdt_set_timeout(struct watchdog_device *wdog,
unsigned int new_timeout)
{
@@ -177,40 +164,19 @@ static int imx2_wdt_start(struct watchdog_device *wdog)
{
struct imx2_wdt_device *wdev = watchdog_get_drvdata(wdog);

- if (imx2_wdt_is_running(wdev)) {
- /* delete the timer that pings the watchdog after close */
- del_timer_sync(&wdev->timer);
+ if (imx2_wdt_is_running(wdev))
imx2_wdt_set_timeout(wdog, wdog->timeout);
- } else
+ else
imx2_wdt_setup(wdog);

- return imx2_wdt_ping(wdog);
-}
-
-static int imx2_wdt_stop(struct watchdog_device *wdog)
-{
- /*
- * We don't need a clk_disable, it cannot be disabled once started.
- * We use a timer to ping the watchdog while /dev/watchdog is closed
- */
- imx2_wdt_timer_ping((unsigned long)wdog);
- return 0;
-}
-
-static inline void imx2_wdt_ping_if_active(struct watchdog_device *wdog)
-{
- struct imx2_wdt_device *wdev = watchdog_get_drvdata(wdog);
+ set_bit(WDOG_HW_RUNNING, &wdog->status);

- if (imx2_wdt_is_running(wdev)) {
- imx2_wdt_set_timeout(wdog, wdog->timeout);
- imx2_wdt_timer_ping((unsigned long)wdog);
- }
+ return imx2_wdt_ping(wdog);
}

static const struct watchdog_ops imx2_wdt_ops = {
.owner = THIS_MODULE,
.start = imx2_wdt_start,
- .stop = imx2_wdt_stop,
.ping = imx2_wdt_ping,
.set_timeout = imx2_wdt_set_timeout,
};
@@ -257,7 +223,7 @@ static int __init imx2_wdt_probe(struct platform_device *pdev)
wdog->info = &imx2_wdt_info;
wdog->ops = &imx2_wdt_ops;
wdog->min_timeout = 1;
- wdog->max_timeout = IMX2_WDT_MAX_TIME;
+ wdog->max_hw_heartbeat_ms = IMX2_WDT_MAX_TIME * 1000;
wdog->parent = &pdev->dev;

ret = clk_prepare_enable(wdev->clk);
@@ -277,9 +243,10 @@ static int __init imx2_wdt_probe(struct platform_device *pdev)
watchdog_set_nowayout(wdog, nowayout);
watchdog_init_timeout(wdog, timeout, &pdev->dev);

- setup_timer(&wdev->timer, imx2_wdt_timer_ping, (unsigned long)wdog);
-
- imx2_wdt_ping_if_active(wdog);
+ if (imx2_wdt_is_running(wdev)) {
+ imx2_wdt_set_timeout(wdog, wdog->timeout);
+ set_bit(WDOG_HW_RUNNING, &wdog->status);
+ }

/*
* Disable the watchdog power down counter at boot. Otherwise the power
@@ -320,7 +287,6 @@ static int __exit imx2_wdt_remove(struct platform_device *pdev)
watchdog_unregister_device(wdog);

if (imx2_wdt_is_running(wdev)) {
- del_timer_sync(&wdev->timer);
imx2_wdt_ping(wdog);
dev_crit(&pdev->dev, "Device removed: Expect reboot!\n");
}
@@ -334,10 +300,9 @@ static void imx2_wdt_shutdown(struct platform_device *pdev)

if (imx2_wdt_is_running(wdev)) {
/*
- * We are running, we need to delete the timer but will
- * give max timeout before reboot will take place
+ * We are running, configure max timeout before reboot
+ * will take place.
*/
- del_timer_sync(&wdev->timer);
imx2_wdt_set_timeout(wdog, IMX2_WDT_MAX_TIME);
imx2_wdt_ping(wdog);
dev_crit(&pdev->dev, "Device shutdown: Expect reboot!\n");
@@ -355,10 +320,6 @@ static int imx2_wdt_suspend(struct device *dev)
if (imx2_wdt_is_running(wdev)) {
imx2_wdt_set_timeout(wdog, IMX2_WDT_MAX_TIME);
imx2_wdt_ping(wdog);
-
- /* The watchdog is not active */
- if (!watchdog_active(wdog))
- del_timer_sync(&wdev->timer);
}

clk_disable_unprepare(wdev->clk);
@@ -384,19 +345,10 @@ static int imx2_wdt_resume(struct device *dev)
* watchdog again.
*/
imx2_wdt_setup(wdog);
+ }
+ if (imx2_wdt_is_running(wdev)) {
imx2_wdt_set_timeout(wdog, wdog->timeout);
imx2_wdt_ping(wdog);
- } else if (imx2_wdt_is_running(wdev)) {
- /* Resuming from non-deep sleep state. */
- imx2_wdt_set_timeout(wdog, wdog->timeout);
- imx2_wdt_ping(wdog);
- /*
- * But the watchdog is not active, then start
- * the timer again.
- */
- if (!watchdog_active(wdog))
- mod_timer(&wdev->timer,
- jiffies + wdog->timeout * HZ / 2);
}

return 0;
--
2.11.0


[PATCH 03/10] watchdog: Make stop function optional

Maxim Yu, Osipov
 

From: Guenter Roeck <linux@...>

commit d0684c8a9354953efdea214b437445c00743cf49 upstream.

Not all hardware watchdogs can be stopped. The driver for
such watchdogs would typically only set the WATCHDOG_HW_RUNNING
flag in its stop function. Make the stop function optional and set
WATCHDOG_HW_RUNNING in the watchdog core if it is not provided.

Signed-off-by: Guenter Roeck <linux@...>
Signed-off-by: Wim Van Sebroeck <wim@...>
[mosipov@... backported to 4.4.y]
Signed-off-by: Maxim Yu. Osipov <mosipov@...>
---
Documentation/watchdog/watchdog-kernel-api.txt | 20 ++++++++++++--------
drivers/watchdog/watchdog_core.c | 2 +-
drivers/watchdog/watchdog_dev.c | 6 +++++-
3 files changed, 18 insertions(+), 10 deletions(-)

diff --git a/Documentation/watchdog/watchdog-kernel-api.txt b/Documentation/watchdog/watchdog-kernel-api.txt
index b12789745f86..a376f281f5a7 100644
--- a/Documentation/watchdog/watchdog-kernel-api.txt
+++ b/Documentation/watchdog/watchdog-kernel-api.txt
@@ -85,7 +85,8 @@ It contains following fields:
If set, the infrastructure will send heartbeats to the watchdog driver
if 'timeout' is larger than max_hw_heartbeat_ms, unless WDOG_ACTIVE
is set and userspace failed to send a heartbeat for at least 'timeout'
- seconds.
+ seconds. max_hw_heartbeat_ms must be set if a driver does not implement
+ the stop function.
* bootstatus: status of the device after booting (reported with watchdog
WDIOF_* status bits).
* driver_data: a pointer to the drivers private data of a watchdog device.
@@ -141,17 +142,20 @@ are:
device.
The routine needs a pointer to the watchdog timer device structure as a
parameter. It returns zero on success or a negative errno code for failure.
-* stop: with this routine the watchdog timer device is being stopped.
- The routine needs a pointer to the watchdog timer device structure as a
- parameter. It returns zero on success or a negative errno code for failure.
- Some watchdog timer hardware can only be started and not be stopped.
- If a watchdog can not be stopped, the watchdog driver must set the
- WDOG_HW_RUNNING flag in its stop function to inform the watchdog core that
- the watchdog is still running.

Not all watchdog timer hardware supports the same functionality. That's why
all other routines/operations are optional. They only need to be provided if
they are supported. These optional routines/operations are:
+* stop: with this routine the watchdog timer device is being stopped.
+ The routine needs a pointer to the watchdog timer device structure as a
+ parameter. It returns zero on success or a negative errno code for failure.
+ Some watchdog timer hardware can only be started and not be stopped. A
+ driver supporting such hardware does not have to implement the stop routine.
+ If a driver has no stop function, the watchdog core will set WDOG_HW_RUNNING
+ and start calling the driver's keepalive pings function after the watchdog
+ device is closed.
+ If a watchdog driver does not implement the stop function, it must set
+ max_hw_heartbeat_ms.
* ping: this is the routine that sends a keepalive ping to the watchdog timer
hardware.
The routine needs a pointer to the watchdog timer device structure as a
diff --git a/drivers/watchdog/watchdog_core.c b/drivers/watchdog/watchdog_core.c
index 873f13972cf4..41f1854d026b 100644
--- a/drivers/watchdog/watchdog_core.c
+++ b/drivers/watchdog/watchdog_core.c
@@ -145,7 +145,7 @@ static int __watchdog_register_device(struct watchdog_device *wdd)
return -EINVAL;

/* Mandatory operations need to be supported */
- if (wdd->ops->start == NULL || wdd->ops->stop == NULL)
+ if (!wdd->ops->start || (!wdd->ops->stop && !wdd->max_hw_heartbeat_ms))
return -EINVAL;

watchdog_check_min_max_timeout(wdd);
diff --git a/drivers/watchdog/watchdog_dev.c b/drivers/watchdog/watchdog_dev.c
index f38d4abf5ede..7cefe9aad123 100644
--- a/drivers/watchdog/watchdog_dev.c
+++ b/drivers/watchdog/watchdog_dev.c
@@ -237,7 +237,11 @@ static int watchdog_stop(struct watchdog_device *wdd)
goto out_stop;
}

- err = wdd->ops->stop(wdd);
+ if (wdd->ops->stop)
+ err = wdd->ops->stop(wdd);
+ else
+ set_bit(WDOG_HW_RUNNING, &wdd->status);
+
if (err == 0) {
clear_bit(WDOG_ACTIVE, &wdd->status);
watchdog_update_worker(wdd);
--
2.11.0


[PATCH 02/10] watchdog: Introduce WDOG_HW_RUNNING flag

Maxim Yu, Osipov
 

From: Guenter Roeck <linux@...>

commit ee142889e32f564f9b5e57b68b06693ec5473074 upstream.

The WDOG_HW_RUNNING flag is expected to be set by watchdog drivers if
the hardware watchdog is running. If the flag is set, the watchdog
subsystem will ping the watchdog even if the watchdog device is closed.

The watchdog driver stop function is now optional and may be omitted
if the watchdog can not be stopped. If stopping the watchdog is not
possible but the driver implements a stop function, it is responsible
to set the WDOG_HW_RUNNING flag in its stop function.

Signed-off-by: Guenter Roeck <linux@...>
Signed-off-by: Wim Van Sebroeck <wim@...>
[mosipov@... backported to 4.4.y]
Signed-off-by: Maxim Yu. Osipov <mosipov@...>
---
Documentation/watchdog/watchdog-kernel-api.txt | 22 ++++++++----
drivers/watchdog/watchdog_dev.c | 47 ++++++++++++++++++++------
include/linux/watchdog.h | 10 ++++++
3 files changed, 62 insertions(+), 17 deletions(-)

diff --git a/Documentation/watchdog/watchdog-kernel-api.txt b/Documentation/watchdog/watchdog-kernel-api.txt
index 9887fa6d8f68..b12789745f86 100644
--- a/Documentation/watchdog/watchdog-kernel-api.txt
+++ b/Documentation/watchdog/watchdog-kernel-api.txt
@@ -144,10 +144,10 @@ are:
* stop: with this routine the watchdog timer device is being stopped.
The routine needs a pointer to the watchdog timer device structure as a
parameter. It returns zero on success or a negative errno code for failure.
- Some watchdog timer hardware can only be started and not be stopped. The
- driver supporting this hardware needs to make sure that a start and stop
- routine is being provided. This can be done by using a timer in the driver
- that regularly sends a keepalive ping to the watchdog timer hardware.
+ Some watchdog timer hardware can only be started and not be stopped.
+ If a watchdog can not be stopped, the watchdog driver must set the
+ WDOG_HW_RUNNING flag in its stop function to inform the watchdog core that
+ the watchdog is still running.

Not all watchdog timer hardware supports the same functionality. That's why
all other routines/operations are optional. They only need to be provided if
@@ -191,9 +191,8 @@ they are supported. These optional routines/operations are:
The status bits should (preferably) be set with the set_bit and clear_bit alike
bit-operations. The status bits that are defined are:
* WDOG_ACTIVE: this status bit indicates whether or not a watchdog timer device
- is active or not. When the watchdog is active after booting, then you should
- set this status bit (Note: when you register the watchdog timer device with
- this bit set, then opening /dev/watchdog will skip the start operation)
+ is active or not from user perspective. User space is expected to send
+ heartbeat requests to the driver while this flag is set.
* WDOG_DEV_OPEN: this status bit shows whether or not the watchdog device
was opened via /dev/watchdog.
(This bit should only be used by the WatchDog Timer Driver Core).
@@ -202,6 +201,15 @@ bit-operations. The status bits that are defined are:
(This bit should only be used by the WatchDog Timer Driver Core).
* WDOG_NO_WAY_OUT: this bit stores the nowayout setting for the watchdog.
If this bit is set then the watchdog timer will not be able to stop.
+* WDOG_HW_RUNNING: Set by the watchdog driver if the hardware watchdog is
+ running. The bit must be set if the watchdog timer hardware can not be
+ stopped. The bit may also be set if the watchdog timer is running after
+ booting, before the watchdog device is opened. If set, the watchdog
+ infrastructure will send keepalives to the watchdog hardware while
+ WDOG_ACTIVE is not set.
+ Note: when you register the watchdog timer device with this bit set,
+ then opening /dev/watchdog will skip the start operation but send a keepalive
+ request instead.
* WDOG_UNREGISTERED: this bit gets set by the WatchDog Timer Driver Core
after calling watchdog_unregister_device, and then checked before calling
any watchdog_ops, so that you can be sure that no operations (other then
diff --git a/drivers/watchdog/watchdog_dev.c b/drivers/watchdog/watchdog_dev.c
index df43c586e53f..f38d4abf5ede 100644
--- a/drivers/watchdog/watchdog_dev.c
+++ b/drivers/watchdog/watchdog_dev.c
@@ -68,7 +68,8 @@ static inline bool watchdog_need_worker(struct watchdog_device *wdd)
* requests.
* - Userspace requests a longer timeout than the hardware can handle.
*/
- return watchdog_active(wdd) && hm && t > hm;
+ return hm && ((watchdog_active(wdd) && t > hm) ||
+ (t && !watchdog_active(wdd) && watchdog_hw_running(wdd)));
}

static long watchdog_next_keepalive(struct watchdog_device *wdd)
@@ -83,6 +84,9 @@ static long watchdog_next_keepalive(struct watchdog_device *wdd)
hw_heartbeat_ms = min(timeout_ms, wdd->max_hw_heartbeat_ms);
keepalive_interval = msecs_to_jiffies(hw_heartbeat_ms / 2);

+ if (!watchdog_active(wdd))
+ return keepalive_interval;
+
/*
* To ensure that the watchdog times out wdd->timeout seconds
* after the most recent ping from userspace, the last
@@ -139,7 +143,7 @@ static int watchdog_ping(struct watchdog_device *wdd)
goto out_ping;
}

- if (!watchdog_active(wdd))
+ if (!watchdog_active(wdd) && !watchdog_hw_running(wdd))
goto out_ping;

wdd->last_keepalive = jiffies;
@@ -158,7 +162,7 @@ static void watchdog_ping_work(struct work_struct *work)
work);

mutex_lock(&wdd->lock);
- if (wdd && watchdog_active(wdd))
+ if (wdd && (watchdog_active(wdd) || watchdog_hw_running(wdd)))
__watchdog_ping(wdd);
mutex_unlock(&wdd->lock);
}
@@ -188,7 +192,10 @@ static int watchdog_start(struct watchdog_device *wdd)
goto out_start;

started_at = jiffies;
- err = wdd->ops->start(wdd);
+ if (watchdog_hw_running(wdd) && wdd->ops->ping)
+ err = wdd->ops->ping(wdd);
+ else
+ err = wdd->ops->start(wdd);
if (err == 0) {
set_bit(WDOG_ACTIVE, &wdd->status);
wdd->last_keepalive = started_at;
@@ -233,7 +240,7 @@ static int watchdog_stop(struct watchdog_device *wdd)
err = wdd->ops->stop(wdd);
if (err == 0) {
clear_bit(WDOG_ACTIVE, &wdd->status);
- cancel_delayed_work(&wdd->work);
+ watchdog_update_worker(wdd);
}

out_stop:
@@ -519,7 +526,7 @@ static int watchdog_open(struct inode *inode, struct file *file)
* If the /dev/watchdog device is open, we don't want the module
* to be unloaded.
*/
- if (!try_module_get(wdd->ops->owner))
+ if (!watchdog_hw_running(wdd) && !try_module_get(wdd->ops->owner))
goto out;

err = watchdog_start(wdd);
@@ -528,7 +535,7 @@ static int watchdog_open(struct inode *inode, struct file *file)

file->private_data = wdd;

- if (wdd->ops->ref)
+ if (!watchdog_hw_running(wdd) && wdd->ops->ref)
wdd->ops->ref(wdd);

/* dev/watchdog is a virtual (and thus non-seekable) filesystem */
@@ -577,15 +584,22 @@ static int watchdog_release(struct inode *inode, struct file *file)
}

/* Allow the owner module to be unloaded again */
- module_put(wdd->ops->owner);
+ /*
+ * Allow the owner module to be unloaded again unless the watchdog
+ * is still running. If the watchdog is still running, it can not
+ * be stopped, and its driver must not be unloaded.
+ */
+ if (!watchdog_hw_running(wdd))
+ module_put(wdd->ops->owner);

cancel_delayed_work_sync(&wdd->work);
+ watchdog_update_worker(wdd);

/* make sure that /dev/watchdog can be re-opened */
clear_bit(WDOG_DEV_OPEN, &wdd->status);

/* Note wdd may be gone after this, do not use after this! */
- if (wdd->ops->unref)
+ if (!watchdog_hw_running(wdd) && wdd->ops->unref)
wdd->ops->unref(wdd);

return 0;
@@ -652,8 +666,21 @@ int watchdog_dev_register(struct watchdog_device *wdd)
misc_deregister(&watchdog_miscdev);
old_wdd = NULL;
}
+ return err;
}
- return err;
+
+ /*
+ * If the watchdog is running, prevent its driver from being unloaded,
+ * and schedule an immediate ping.
+ */
+ if (watchdog_hw_running(wdd)) {
+ __module_get(wdd->ops->owner);
+ if (wdd->ops->ref)
+ wdd->ops->ref(wdd);
+ queue_delayed_work(watchdog_wq, &wdd->work, 0);
+ }
+
+ return 0;
}

/*
diff --git a/include/linux/watchdog.h b/include/linux/watchdog.h
index 26aba9b17ac3..cbe31d5b352c 100644
--- a/include/linux/watchdog.h
+++ b/include/linux/watchdog.h
@@ -106,6 +106,7 @@ struct watchdog_device {
#define WDOG_ALLOW_RELEASE 2 /* Did we receive the magic char ? */
#define WDOG_NO_WAY_OUT 3 /* Is 'nowayout' feature set ? */
#define WDOG_UNREGISTERED 4 /* Has the device been unregistered */
+#define WDOG_HW_RUNNING 5 /* True if HW watchdog running */
struct list_head deferred;
};

@@ -118,6 +119,15 @@ static inline bool watchdog_active(struct watchdog_device *wdd)
return test_bit(WDOG_ACTIVE, &wdd->status);
}

+/*
+ * Use the following function to check whether or not the hardware watchdog
+ * is running
+ */
+static inline bool watchdog_hw_running(struct watchdog_device *wdd)
+{
+ return test_bit(WDOG_HW_RUNNING, &wdd->status);
+}
+
/* Use the following function to set the nowayout feature */
static inline void watchdog_set_nowayout(struct watchdog_device *wdd, bool nowayout)
{
--
2.11.0


[PATCH 01/10] watchdog: Introduce hardware maximum heartbeat in watchdog core

Maxim Yu, Osipov
 

From: Guenter Roeck <linux@...>

commit 664a39236e718f9f03fa73fc01006da9ced04efc upstream.

Introduce an optional hardware maximum heartbeat in the watchdog core.
The hardware maximum heartbeat can be lower than the maximum timeout.

Drivers can set the maximum hardware heartbeat value in the watchdog data
structure. If the configured timeout exceeds the maximum hardware heartbeat,
the watchdog core enables a timer function to assist sending keepalive
requests to the watchdog driver.

Signed-off-by: Guenter Roeck <linux@...>
Signed-off-by: Wim Van Sebroeck <wim@...>
[mosipov@... backported to 4.4.y]
Signed-off-by: Maxim Yu. Osipov <mosipov@...>
---
Documentation/watchdog/watchdog-kernel-api.txt | 19 +++-
drivers/watchdog/watchdog_dev.c | 122 +++++++++++++++++++++++--
include/linux/watchdog.h | 30 ++++--
3 files changed, 154 insertions(+), 17 deletions(-)

diff --git a/Documentation/watchdog/watchdog-kernel-api.txt b/Documentation/watchdog/watchdog-kernel-api.txt
index d8b0d3367706..9887fa6d8f68 100644
--- a/Documentation/watchdog/watchdog-kernel-api.txt
+++ b/Documentation/watchdog/watchdog-kernel-api.txt
@@ -53,6 +53,7 @@ struct watchdog_device {
unsigned int timeout;
unsigned int min_timeout;
unsigned int max_timeout;
+ unsigned int max_hw_heartbeat_ms;
void *driver_data;
struct mutex lock;
unsigned long status;
@@ -73,8 +74,18 @@ It contains following fields:
additional information about the watchdog timer itself. (Like it's unique name)
* ops: a pointer to the list of watchdog operations that the watchdog supports.
* timeout: the watchdog timer's timeout value (in seconds).
+ This is the time after which the system will reboot if user space does
+ not send a heartbeat request if WDOG_ACTIVE is set.
* min_timeout: the watchdog timer's minimum timeout value (in seconds).
-* max_timeout: the watchdog timer's maximum timeout value (in seconds).
+ If set, the minimum configurable value for 'timeout'.
+* max_timeout: the watchdog timer's maximum timeout value (in seconds),
+ as seen from userspace. If set, the maximum configurable value for
+ 'timeout'. Not used if max_hw_heartbeat_ms is non-zero.
+* max_hw_heartbeat_ms: Maximum hardware heartbeat, in milli-seconds.
+ If set, the infrastructure will send heartbeats to the watchdog driver
+ if 'timeout' is larger than max_hw_heartbeat_ms, unless WDOG_ACTIVE
+ is set and userspace failed to send a heartbeat for at least 'timeout'
+ seconds.
* bootstatus: status of the device after booting (reported with watchdog
WDIOF_* status bits).
* driver_data: a pointer to the drivers private data of a watchdog device.
@@ -160,7 +171,11 @@ they are supported. These optional routines/operations are:
and -EIO for "could not write value to the watchdog". On success this
routine should set the timeout value of the watchdog_device to the
achieved timeout value (which may be different from the requested one
- because the watchdog does not necessarily has a 1 second resolution).
+ because the watchdog does not necessarily have a 1 second resolution).
+ Drivers implementing max_hw_heartbeat_ms set the hardware watchdog heartbeat
+ to the minimum of timeout and max_hw_heartbeat_ms. Those drivers set the
+ timeout value of the watchdog_device either to the requested timeout value
+ (if it is larger than max_hw_heartbeat_ms), or to the achieved timeout value.
(Note: the WDIOF_SETTIMEOUT needs to be set in the options field of the
watchdog's info structure).
* get_timeleft: this routines returns the time that's left before a reset.
diff --git a/drivers/watchdog/watchdog_dev.c b/drivers/watchdog/watchdog_dev.c
index 56a649e66eb2..df43c586e53f 100644
--- a/drivers/watchdog/watchdog_dev.c
+++ b/drivers/watchdog/watchdog_dev.c
@@ -35,9 +35,11 @@
#include <linux/module.h> /* For module stuff/... */
#include <linux/types.h> /* For standard types (like size_t) */
#include <linux/errno.h> /* For the -ENODEV/... values */
+#include <linux/jiffies.h> /* For timeout functions */
#include <linux/kernel.h> /* For printk/panic/... */
#include <linux/fs.h> /* For file operations */
#include <linux/watchdog.h> /* For watchdog specific items */
+#include <linux/workqueue.h> /* For workqueue */
#include <linux/miscdevice.h> /* For handling misc devices */
#include <linux/init.h> /* For __init/__exit/... */
#include <linux/uaccess.h> /* For copy_to_user/put_user/... */
@@ -49,6 +51,73 @@ static dev_t watchdog_devt;
/* the watchdog device behind /dev/watchdog */
static struct watchdog_device *old_wdd;

+static struct workqueue_struct *watchdog_wq;
+
+static inline bool watchdog_need_worker(struct watchdog_device *wdd)
+{
+ /* All variables in milli-seconds */
+ unsigned int hm = wdd->max_hw_heartbeat_ms;
+ unsigned int t = wdd->timeout * 1000;
+
+ /*
+ * A worker to generate heartbeat requests is needed if all of the
+ * following conditions are true.
+ * - Userspace activated the watchdog.
+ * - The driver provided a value for the maximum hardware timeout, and
+ * thus is aware that the framework supports generating heartbeat
+ * requests.
+ * - Userspace requests a longer timeout than the hardware can handle.
+ */
+ return watchdog_active(wdd) && hm && t > hm;
+}
+
+static long watchdog_next_keepalive(struct watchdog_device *wdd)
+{
+ unsigned int timeout_ms = wdd->timeout * 1000;
+ unsigned long keepalive_interval;
+ unsigned long last_heartbeat;
+ unsigned long virt_timeout;
+ unsigned int hw_heartbeat_ms;
+
+ virt_timeout = wdd->last_keepalive + msecs_to_jiffies(timeout_ms);
+ hw_heartbeat_ms = min(timeout_ms, wdd->max_hw_heartbeat_ms);
+ keepalive_interval = msecs_to_jiffies(hw_heartbeat_ms / 2);
+
+ /*
+ * To ensure that the watchdog times out wdd->timeout seconds
+ * after the most recent ping from userspace, the last
+ * worker ping has to come in hw_heartbeat_ms before this timeout.
+ */
+ last_heartbeat = virt_timeout - msecs_to_jiffies(hw_heartbeat_ms);
+ return min_t(long, last_heartbeat - jiffies, keepalive_interval);
+}
+
+static inline void watchdog_update_worker(struct watchdog_device *wdd)
+{
+ if (watchdog_need_worker(wdd)) {
+ long t = watchdog_next_keepalive(wdd);
+
+ if (t > 0)
+ mod_delayed_work(watchdog_wq, &wdd->work, t);
+ } else {
+ cancel_delayed_work(&wdd->work);
+ }
+}
+
+static int __watchdog_ping(struct watchdog_device *wdd)
+{
+ int err;
+
+ if (wdd->ops->ping)
+ err = wdd->ops->ping(wdd); /* ping the watchdog */
+ else
+ err = wdd->ops->start(wdd); /* restart watchdog */
+
+ watchdog_update_worker(wdd);
+
+ return err;
+}
+
/*
* watchdog_ping: ping the watchdog.
* @wdd: the watchdog device to ping
@@ -73,16 +142,27 @@ static int watchdog_ping(struct watchdog_device *wdd)
if (!watchdog_active(wdd))
goto out_ping;

- if (wdd->ops->ping)
- err = wdd->ops->ping(wdd); /* ping the watchdog */
- else
- err = wdd->ops->start(wdd); /* restart watchdog */
+ wdd->last_keepalive = jiffies;
+ err = __watchdog_ping(wdd);

out_ping:
mutex_unlock(&wdd->lock);
return err;
}

+static void watchdog_ping_work(struct work_struct *work)
+{
+ struct watchdog_device *wdd;
+
+ wdd = container_of(to_delayed_work(work), struct watchdog_device,
+ work);
+
+ mutex_lock(&wdd->lock);
+ if (wdd && watchdog_active(wdd))
+ __watchdog_ping(wdd);
+ mutex_unlock(&wdd->lock);
+}
+
/*
* watchdog_start: wrapper to start the watchdog.
* @wdd: the watchdog device to start
@@ -95,6 +175,7 @@ out_ping:
static int watchdog_start(struct watchdog_device *wdd)
{
int err = 0;
+ unsigned long started_at;

mutex_lock(&wdd->lock);

@@ -106,9 +187,13 @@ static int watchdog_start(struct watchdog_device *wdd)
if (watchdog_active(wdd))
goto out_start;

+ started_at = jiffies;
err = wdd->ops->start(wdd);
- if (err == 0)
+ if (err == 0) {
set_bit(WDOG_ACTIVE, &wdd->status);
+ wdd->last_keepalive = started_at;
+ watchdog_update_worker(wdd);
+ }

out_start:
mutex_unlock(&wdd->lock);
@@ -146,8 +231,10 @@ static int watchdog_stop(struct watchdog_device *wdd)
}

err = wdd->ops->stop(wdd);
- if (err == 0)
+ if (err == 0) {
clear_bit(WDOG_ACTIVE, &wdd->status);
+ cancel_delayed_work(&wdd->work);
+ }

out_stop:
mutex_unlock(&wdd->lock);
@@ -211,6 +298,8 @@ static int watchdog_set_timeout(struct watchdog_device *wdd,

err = wdd->ops->set_timeout(wdd, timeout);

+ watchdog_update_worker(wdd);
+
out_timeout:
mutex_unlock(&wdd->lock);
return err;
@@ -490,6 +579,8 @@ static int watchdog_release(struct inode *inode, struct file *file)
/* Allow the owner module to be unloaded again */
module_put(wdd->ops->owner);

+ cancel_delayed_work_sync(&wdd->work);
+
/* make sure that /dev/watchdog can be re-opened */
clear_bit(WDOG_DEV_OPEN, &wdd->status);

@@ -527,6 +618,11 @@ int watchdog_dev_register(struct watchdog_device *wdd)
{
int err, devno;

+ if (!watchdog_wq)
+ return -ENODEV;
+
+ INIT_DELAYED_WORK(&wdd->work, watchdog_ping_work);
+
if (wdd->id == 0) {
old_wdd = wdd;
watchdog_miscdev.parent = wdd->parent;
@@ -573,6 +669,8 @@ int watchdog_dev_unregister(struct watchdog_device *wdd)
set_bit(WDOG_UNREGISTERED, &wdd->status);
mutex_unlock(&wdd->lock);

+ cancel_delayed_work_sync(&wdd->work);
+
cdev_del(&wdd->cdev);
if (wdd->id == 0) {
misc_deregister(&watchdog_miscdev);
@@ -589,7 +687,16 @@ int watchdog_dev_unregister(struct watchdog_device *wdd)

int __init watchdog_dev_init(void)
{
- int err = alloc_chrdev_region(&watchdog_devt, 0, MAX_DOGS, "watchdog");
+ int err;
+
+ watchdog_wq = alloc_workqueue("watchdogd",
+ WQ_HIGHPRI | WQ_MEM_RECLAIM, 0);
+ if (!watchdog_wq) {
+ pr_err("Failed to create watchdog workqueue\n");
+ return -ENOMEM;
+ }
+
+ err = alloc_chrdev_region(&watchdog_devt, 0, MAX_DOGS, "watchdog");
if (err < 0)
pr_err("watchdog: unable to allocate char dev region\n");
return err;
@@ -604,4 +711,5 @@ int __init watchdog_dev_init(void)
void __exit watchdog_dev_exit(void)
{
unregister_chrdev_region(watchdog_devt, MAX_DOGS);
+ destroy_workqueue(watchdog_wq);
}
diff --git a/include/linux/watchdog.h b/include/linux/watchdog.h
index 027b1f43f12d..26aba9b17ac3 100644
--- a/include/linux/watchdog.h
+++ b/include/linux/watchdog.h
@@ -10,8 +10,9 @@


#include <linux/bitops.h>
-#include <linux/device.h>
#include <linux/cdev.h>
+#include <linux/device.h>
+#include <linux/kernel.h>
#include <uapi/linux/watchdog.h>

struct watchdog_ops;
@@ -61,12 +62,17 @@ struct watchdog_ops {
* @bootstatus: Status of the watchdog device at boot.
* @timeout: The watchdog devices timeout value (in seconds).
* @min_timeout:The watchdog devices minimum timeout value (in seconds).
- * @max_timeout:The watchdog devices maximum timeout value (in seconds).
+ * @max_timeout:The watchdog devices maximum timeout value (in seconds)
+ * as configurable from user space. Only relevant if
+ * max_hw_heartbeat_ms is not provided.
+ * @max_hw_heartbeat_ms:
+ * Hardware limit for maximum timeout, in milli-seconds.
+ * Replaces max_timeout if specified.
* @driver-data:Pointer to the drivers private data.
* @lock: Lock for watchdog core internal use only.
* @status: Field that contains the devices internal status bits.
- * @deferred: entry in wtd_deferred_reg_list which is used to
- * register early initialized watchdogs.
+ * @deferred: Entry in wtd_deferred_reg_list which is used to
+ * register early initialized watchdogs.
*
* The watchdog_device structure contains all information about a
* watchdog timer device.
@@ -88,8 +94,11 @@ struct watchdog_device {
unsigned int timeout;
unsigned int min_timeout;
unsigned int max_timeout;
+ unsigned int max_hw_heartbeat_ms;
void *driver_data;
struct mutex lock;
+ unsigned long last_keepalive;
+ struct delayed_work work;
unsigned long status;
/* Bit numbers for status flags */
#define WDOG_ACTIVE 0 /* Is the watchdog running/active */
@@ -121,13 +130,18 @@ static inline bool watchdog_timeout_invalid(struct watchdog_device *wdd, unsigne
{
/*
* The timeout is invalid if
+ * - the requested value is larger than UINT_MAX / 1000
+ * (since internal calculations are done in milli-seconds),
+ * or
* - the requested value is smaller than the configured minimum timeout,
* or
- * - a maximum timeout is configured, and the requested value is larger
- * than the maximum timeout.
+ * - a maximum hardware timeout is not configured, a maximum timeout
+ * is configured, and the requested value is larger than the
+ * configured maximum timeout.
*/
- return t < wdd->min_timeout ||
- (wdd->max_timeout && t > wdd->max_timeout);
+ return t > UINT_MAX / 1000 || t < wdd->min_timeout ||
+ (!wdd->max_hw_heartbeat_ms && wdd->max_timeout &&
+ t > wdd->max_timeout);
}

/* Use the following functions to manipulate watchdog driver specific data */
--
2.11.0


[PATCH 00/10] Backport of watchdog core triggered keepalive infrastructure v2

Maxim Yu, Osipov
 

Hello,

This is updated series after Ben's review.

* On 10/25/17 12:43, Ben Hutchings wrote:
On Wed, 2017-10-04 at 16:40 +0200, Maxim Yu. Osipov wrote:
From: Guenter Roeck <linux@...>

Backport from kernel.org, upstream commit ee142889e32f

The WDOG_HW_RUNNING flag is expected to be set by watchdog drivers if
the hardware watchdog is running. If the flag is set, the watchdog
subsystem will ping the watchdog even if the watchdog device is
closed.

The watchdog driver stop function is now optional and may be omitted
if the watchdog can not be stopped. If stopping the watchdog is not
possible but the driver implements a stop function, it is responsible
to set the WDOG_HW_RUNNING flag in its stop function.
[...]
--- a/drivers/watchdog/watchdog_dev.c
+++ b/drivers/watchdog/watchdog_dev.c
[...]
@@ -651,9 +665,24 @@ int watchdog_dev_register(struct watchdog_device
*wdd)
if (wdd->id == 0) {
misc_deregister(&watchdog_miscdev);
old_wdd = NULL;
+ if (wdd->ops->unref)
+ wdd->ops->unref(wdd);
[...]

This doesn't look right. This is decrementing a reference counter
because the old_wdd reference is going away, but we never increment a
reference counter when old_wdd is set.

The upstream commit doesn't make any change here. There is a
kref_put() here in the upstream version but that was added as part of
the watchdog_device/watchdog_core_data split.

Ben.
* On 10/25/17 12:46, Ben Hutchings wrote:
Please use one of these formats for upstream references in the commit
messages:

commit 0123456789012345678901234567890123456789 upstream.

[ Upstream commit 0123456789012345678901234567890123456789 ]

Ben.
<<<<<<<<<<<<<


This series contains

* backport of patches of watchdog core infrastructure
supporting handling watchdog keepalive.

* imx21_wdt converted to use infrastructure triggered keepalives.

* backported support of WATCHDOG_HANDLE_BOOT_ENABLED option

On some systems it's desirable to have watchdog reboot the system
when it does not come up fast enough. This adds a kernel parameter
to disable the auto-update of watchdog before userspace takes over
and a kernel option to set the default.

One of the important use cases is safe system update.
In that case, the bootloader enables the watchdog, and Linux should only
take it over at the point the whole Linux system is sure to be able to
have a fully working system again, including userspace services that
could take the next update. If we start the trigger the watchdog too
early, like it was so far, we risk to enter a system state where the
kernel works but not all other services we need.

i.MX6 based board was used as a test platform.

We suppose that watchdog is enabled in bootloader and set to 120 seconds
(maximum timeout supported by i.MX watchdog timer)

After applying these patches, built-in i.MX watchdog (imx21_wdt) into
kernel and perform the following test cases:

1) option WATCHDOG_HANDLE_BOOT_ENABLED is turned off in kernel configuration
w/o userspace application triggering the watchdog.

Make sure that no userspace application is triggering watchdog.

After around 120 seconds (timeout set in bootloader) board will reboot.

2) option WATCHDOG_HANDLE_BOOT_ENABLED is turned off in kernel configuration
but userspace application is triggering watchdog more frequently that watchdog's
timeout (by default set to 60 seconds for imx21_wdt).

Watchdog will not reboot the board until the application re-arms
the watchdog.

3) option WATCHDOG_HANDLE_BOOT_ENABLED is turned on in kernel configuration
w/o userspace application triggering the watchdog.

Make sure that no userspace application is triggering watchdog.
Board will not reboot (watchdog keepalive is supported by watchdog core
infrastructure, not by driver itself)

Kind regards,
Maxim.

Guenter Roeck (6):
watchdog: Introduce hardware maximum heartbeat in watchdog core
watchdog: Introduce WDOG_HW_RUNNING flag
watchdog: Make stop function optional
watchdog: imx2: Convert to use infrastructure triggered keepalives
watchdog: core: Fix circular locking dependency
watchdog: core: Clear WDOG_HW_RUNNING before calling the stop function

Pratyush Anand (1):
watchdog: skip min and max timeout validity check when
max_hw_heartbeat_ms is defined

Rasmus Villemoes (1):
watchdog: change watchdog_need_worker logic

Sebastian Reichel (1):
watchdog: core: add option to avoid early handling of watchdog

Wei Yongjun (1):
watchdog: core: Fix error handling of watchdog_dev_init()

Documentation/watchdog/watchdog-kernel-api.txt | 51 +++++--
drivers/watchdog/Kconfig | 11 ++
drivers/watchdog/imx2_wdt.c | 74 ++--------
drivers/watchdog/watchdog_core.c | 4 +-
drivers/watchdog/watchdog_dev.c | 196 +++++++++++++++++++++++--
include/linux/watchdog.h | 40 ++++-
6 files changed, 277 insertions(+), 99 deletions(-)

--
2.11.0


Re: Status of cip-core

Binh Thanh. Nguyen <binh.nguyen.uw@...>
 

Hi Chris, Daniel,

Subject: RE: [cip-dev] Status of cip-core

Hello Daniel,

Apologies for the slow response. Lots of ramblings below, happy to set
up a call if it's easier to go through the details.

From: Daniel Sangorrin [mailto:daniel.sangorrin@...]
Sent: 24 October 2017 06:17

Hello Chris,

-----Original Message-----
From: Chris Paterson [mailto:Chris.Paterson2@...]
Sent: Tuesday, October 24, 2017 3:23 AM
To: Daniel Sangorrin; cip-dev@...; 'Jan Kiszka'
Subject: RE: [cip-dev] Status of cip-core

Hello Daniel,

Thank you for the update.

- modules.tgz is not generated for iwg20m (for bbb it is): I
have to investigate why but I think it is not a big problem
since they are included in the file system images. Probably the
reason is just that shmobile_defconfig has modules disabled.
Correct :)
I have noticed that the shmobile_defconfig from the official
linux-cip repo is quite different from the one at
https://github.com/renesas-
rz/renesas-cip.

Let me explain...

The renesas-rz github website is used to host the complete Yocto BSP
package for the RZ/G iWave platforms. This is used as a basis for the
"Renesas RZ/G Linux Platform" which was launched in October.

This BSP is based on the CIP Kernel, but as you can see has a lot of
patches applied on top. The main reason for these additional patches
is that the iwg20m platform is not yet fully supported upstream.

If using the above BSP for your testing, I'd recommend that you stick
with the
v4.4.55-cip3 branch. This has been fully verified and should be working.

Whilst Renesas are continuously working on the renesas-rz github
repository, in tandem we are also upstreaming support to the mainline
Kernel for the iwg20m platform. Periodically once a new Kernel release
has been made we are backporting support to the official CIP Kernel.

The latest CIP Kernel (v4.4.92-cip11) includes support for clk,
pinctrl, GPIO, Ethernet and eMMC on the iwg20m platform.



Previously I tested the official kernel using a ramdisk and it was
successful so I didn't notice the difference. But today I've tried
to boot the official kernel with the SD Card file system and it didn't work.
Currently the official CIP Kernel does not support SD on the iwg20m.
Support upstream will be included in Kernel v4.15. We plan to backport
support once
v4.15 is released, but can backport when v4.15-rc1 is out if there is
an urgent need.


One problem was that EXT3/4 wasn't enabled.
Yes. Historically shmobile_defconfig doesn't have ext3/4 enabled
upstream :( I guess one option would be to enable this on the CIP Kernel?

I decided to use renesas-cip's
shmobile_defconfig to build the official linux-cip kernel. However,
I got a build error because
https://gitlab.com/cip-project/linux-cip/blob/linux-4.4.y-
cip/drivers/usb/host/xhci-rcar.c
references a firmware called "r8a779x_usb3_v1.dlmem" which wasn't
available on the official kernel. I could solve that either by
copying the firmware from renesas-cip or by commenting out the
corresponding configuration flags.
There may well be other differences between the config used in the
renesas- rz BSP compared to the upstream/CIP version. It might be best
to stick with the version in the CIP Kernel and just enable EXT3/4 as
required. This is what we use to test when backporting to the CIP Kernel.


I managed to build the official kernel but still I couldn't get it
to boot from the SD Card.
Again, due to lack of SD support in the official Kernel (see
r8a7743-iwg20d- q7.dts).

# By the way, I noticed that the mmc numbering also changed
(mmcblk0p2 was working on renesas-cip kernel as the 2nd partition of
the SD Card, but for the official kernel that was a partition on the
eMMC and the kernel could not recognize mmcblk2p2).

I wonder if this problem comes from the device tree. On the
renesas-cip repository there was a r8a7743-iwgm.dts file that worked
fine, but on the official repository the one with the closest name
was
r8a7743-iwg20d-q7.dts.
This one worked fine when I boot from a ramdisk but not when I boot
from the SD Card.

Should I use renesas-cip kernel again?
As you have no need for the entire BSP (you only need the Kernel), I'd
recommend using the official CIP Kernel. For now this would mean that
you'd need to either use NFS or eMMC for your RFS. If using eMMC
you'll need to enable EXT3/4 on top of shmobile_defconfig.

This has the added benefit that CIP Core will use and test the
official CIP Kernel, rather than Renesas' out of tree BSP version.

I noticed that the new versions are been merged. I was using
renesas-cip's
v4.4.69-cip4 but I can see a newer branch called v4.4.83-cip8
v4.4.55-cip3 is the only version properly tested. I think the other
versions are just rebases.

Binh-san, could you confirm the current status of the newer branches
on renesas-rz?
v4.4.55-cip3 was fully tested, including UT,IT,ST.
v4.4.69-cip4 , v4.4.75-cip6 also worked with just some simple tests.

Best regards,
Binh Nguyen




Kind regards, Chris


Thanks,
Daniel


Re: Kselftest use-cases - Shuah Khan

Daniel Sangorrin <daniel.sangorrin@...>
 

Hi Ben,
# I added the fuego mailing list to Cc

Thanks for the notes!

-----Original Message-----
From: cip-dev-bounces@... [mailto:cip-dev-bounces@...] On Behalf Of Ben Hutchings
Sent: Wednesday, November 08, 2017 2:44 AM
To: cip-dev@...
Subject: [cip-dev] Kselftest use-cases - Shuah Khan

## Kselftest use-cases - Shuah Khan

[Description](https://osseu17.sched.com/event/CnFp/)

kselftest is the kernel developer regression test suite. It is
written by kernel developers and users. It is used by developers,
users, automated test infrastructure (kernel-ci.org, 0-day test
robot).
It also works on Fuego now!
# Thanks Tim for integrating my patches.

How to run tests:

* `make --silent kselftest` - run all default targets
(`TARGETS` in `tools/testing/selftests/Makefile`).
* `make --silent TARGETS=timers kselftest` - run all
non-destructive tests in `tools/testing/selftests/timers`
* `make --silent -C tools/testing/selftests/timers run_tests`
- same
In Fuego we use a different approach. First we cross-compile and install
the tests on a temporary folder. At this stage a script called "run_kselftest.sh"
is generated. Then, we deploy the binaries and the script to the target where
the "run_kselftest.sh" script is invoked.

The good part of this approach is that Fuego does not require the target board
to have a toolchain installed, kernel source on the target, etc.

Set `O=dir` for an out-of-tree build. (But cureently this
may require a `.config` file in the source directory.)

Set `quicktest=1` to exclude time-consuming tests.

kselftest outputs a summary of results (since 4.14) following
TAP (Test Anything Protocol).
Actually I think that this was proposed by Tim.
There is a TAP plugin for Jenkins that can be used for parsing the results in Fuego.
However, currently "run_kselftest.sh" doesn't seem to use the TAP format. humm..
Maybe this is on the TODO list upstream, I need to investigate it further.

The output of individual tests can be found in `/tmp` (currently),
but it should be changed to allow specifying directory.

It is possible to run latest selftests on older kernels, but there
will be some failures due to missing features. Ideally missing
dependencies result in a "skip" result but some maintainers aren't
happy to support that. One reason is that if a feature is broken so
badly it isn't detected, tests may be skipped rather than failed.
In Fuego there is now full control for specifying which test cases
are allowed to fail and which not. I will enable that functionality
on Fuego's integration scripts.

Some tests apparently check for dependencies in a kernel config file.
(It wasn't clear to me where they look for it.)
On some kselftest folders there is a "config" file that specifies the kernel
configuration options that need to be enabled (or disabled). From what I see
there is not a general script to check that they are configured on
the target kernel.

Fuego does support checking kernel configuration before running the test (using
information from /proc/config.gz or /boot/config-`uname -r`). Maybe it would a good
idea to add support on Fuego for checking individual kselftest's config files

Thank,
Daniel

Tips and hints:

* Use the `--silent` option to suppress make output
* Some tests need to run as root
* Beware that some tests are disruptive

More information:

* [Documentation/dev-tools/kselftest.rst](https://www.kernel.org/doc/html/latest/dev-tools/kselftest.html)
* [Blog entries](https://blogs.s-osg.org/author/shuahkh/)


--
Ben Hutchings
Software Developer, Codethink Ltd.

_______________________________________________
cip-dev mailing list
cip-dev@...
https://lists.cip-project.org/mailman/listinfo/cip-dev


Re: Status of cip-core

Chris Paterson
 

Hello Daniel,

Apologies for the slow response. Lots of ramblings below, happy to set up a call if it's easier to go through the details.

From: Daniel Sangorrin [mailto:daniel.sangorrin@...]
Sent: 24 October 2017 06:17

Hello Chris,

-----Original Message-----
From: Chris Paterson [mailto:Chris.Paterson2@...]
Sent: Tuesday, October 24, 2017 3:23 AM
To: Daniel Sangorrin; cip-dev@...; 'Jan Kiszka'
Subject: RE: [cip-dev] Status of cip-core

Hello Daniel,

Thank you for the update.

- modules.tgz is not generated for iwg20m (for bbb it is): I have to
investigate why but I think it is not a big problem since they are
included in the file system images. Probably the reason is just that
shmobile_defconfig has modules disabled.
Correct :)
I have noticed that the shmobile_defconfig from the official linux-cip repo is
quite different from the one at https://github.com/renesas-rz/renesas-cip.
Let me explain...

The renesas-rz github website is used to host the complete Yocto BSP package for the RZ/G iWave platforms. This is used as a basis for the "Renesas RZ/G Linux Platform" which was launched in October.

This BSP is based on the CIP Kernel, but as you can see has a lot of patches applied on top. The main reason for these additional patches is that the iwg20m platform is not yet fully supported upstream.

If using the above BSP for your testing, I'd recommend that you stick with the v4.4.55-cip3 branch. This has been fully verified and should be working.

Whilst Renesas are continuously working on the renesas-rz github repository, in tandem we are also upstreaming support to the mainline Kernel for the iwg20m platform. Periodically once a new Kernel release has been made we are backporting support to the official CIP Kernel.

The latest CIP Kernel (v4.4.92-cip11) includes support for clk, pinctrl, GPIO, Ethernet and eMMC on the iwg20m platform.



Previously I tested the official kernel using a ramdisk and it was successful so
I didn't notice the difference. But today I've tried to boot the official kernel
with the SD Card file system and it didn't work.
Currently the official CIP Kernel does not support SD on the iwg20m. Support upstream will be included in Kernel v4.15. We plan to backport support once v4.15 is released, but can backport when v4.15-rc1 is out if there is an urgent need.


One problem was that EXT3/4 wasn't enabled.
Yes. Historically shmobile_defconfig doesn't have ext3/4 enabled upstream :( I guess one option would be to enable this on the CIP Kernel?

I decided to use renesas-cip's
shmobile_defconfig to build the official linux-cip kernel. However, I got a
build error because https://gitlab.com/cip-project/linux-cip/blob/linux-4.4.y-
cip/drivers/usb/host/xhci-rcar.c
references a firmware called "r8a779x_usb3_v1.dlmem" which wasn't
available on the official kernel. I could solve that either by copying the
firmware from renesas-cip or by commenting out the corresponding
configuration flags.
There may well be other differences between the config used in the renesas-rz BSP compared to the upstream/CIP version. It might be best to stick with the version in the CIP Kernel and just enable EXT3/4 as required. This is what we use to test when backporting to the CIP Kernel.


I managed to build the official kernel but still I couldn't get it to boot from the
SD Card.
Again, due to lack of SD support in the official Kernel (see r8a7743-iwg20d-q7.dts).

# By the way, I noticed that the mmc numbering also changed (mmcblk0p2
was working on renesas-cip kernel as the 2nd partition of the SD Card, but for
the official kernel that was a partition on the eMMC and the kernel could not
recognize mmcblk2p2).

I wonder if this problem comes from the device tree. On the renesas-cip
repository there was a r8a7743-iwgm.dts file that worked fine, but on the
official repository the one with the closest name was r8a7743-iwg20d-q7.dts.
This one worked fine when I boot from a ramdisk but not when I boot from
the SD Card.

Should I use renesas-cip kernel again?
As you have no need for the entire BSP (you only need the Kernel), I'd recommend using the official CIP Kernel. For now this would mean that you'd need to either use NFS or eMMC for your RFS. If using eMMC you'll need to enable EXT3/4 on top of shmobile_defconfig.

This has the added benefit that CIP Core will use and test the official CIP Kernel, rather than Renesas' out of tree BSP version.

I noticed that the new versions are been merged. I was using renesas-cip's
v4.4.69-cip4 but I can see a newer branch called v4.4.83-cip8
v4.4.55-cip3 is the only version properly tested. I think the other versions are just rebases.

Binh-san, could you confirm the current status of the newer branches on renesas-rz?


Kind regards, Chris


Thanks,
Daniel


Re: Interesting talks at OSSE/Kernel Summit

Chris Paterson
 

From: cip-dev-bounces@... [mailto:cip-dev-
bounces@...] On Behalf Of Jan Kiszka
Sent: 08 November 2017 07:33

On 2017-11-07 18:40, Ben Hutchings wrote:
I attended several talks at OSSE and Kernel Summit in Prague that
might be interesting to CIP members. These weren't recorded but I
made some notes on them. I'll send my notes on each talk as a reply
to this message.

Ben.
Thanks for the valuable summaries, Ben!
+1

9441 - 9460 of 10158