[ovs-dev] [PATCH] datapath-windows: Solved BSOD when uninstalling the driver (race condition)

Sorin Vinturis svinturis at cloudbasesolutions.com
Fri Apr 3 08:54:52 UTC 2015


Eitan,

I see what you mean. You are right. I will provide a v2 patch to address your point.

Thanks,
Sorin

-----Original Message-----
From: Eitan Eliahu [mailto:eliahue at vmware.com] 
Sent: Friday, 3 April, 2015 02:11
To: Sorin Vinturis; dev at openvswitch.org
Subject: RE: [PATCH] datapath-windows: Solved BSOD when uninstalling the driver (race condition)


Hi Sorin,
Per your example: suppose there is another IRP which is currently executing and almost completed. This thread will call release and free the memory pointed by switchContext. As you said the switchContext of the first thread maintains its value. Then, reference to - switchContext ->refCount will generate a memory violation.

Also, I think gOvsSwitchContext need to be memory synchronized with a memory barrier.
Thanks,
Eitan


-----Original Message-----
From: Sorin Vinturis [mailto:svinturis at cloudbasesolutions.com] 
Sent: Thursday, April 02, 2015 3:12 PM
To: Eitan Eliahu; dev at openvswitch.org
Subject: RE: [PATCH] datapath-windows: Solved BSOD when uninstalling the driver (race condition)

Hi Eitan,

I will try to explain with an example.

Considering that switchContext != NULL and switchContext->refCount == 1, we have the following scenario:

On thread 1, a new packet is received and OvsAcquireSwitchContext() is called. The first condition checks the switchContext against NULL and it is FALSE. The thread 1 execution is stalled and the context is switched to thread 2.

On thread 2, filter detach is called which later calls OvsReleaseSwitchContext(). The first condition checks to see if switchContext->refCount == 1. The condition is TRUE and switchContext->refCount is set to 0 (zero). Then the gOvsSwitchContext is released and set to NULL. Thread 1 resumes execution.

On thread 1, the execution is resumed and the second condition is checked. The second condition checks to see if switchContext->refCount == 0. The condition is TRUE and the function returns FALSE, whithout increasing the switchContext->refCount. Here, please observe that switchContext is the argument passed to the OvsAcquireSwitchContext() function and it points to the address of gOvsSwitchContext. If the latter is released and set to NULL, the former retains its value. Thus, the value at &switchContext->refCount is valid and zero. Subsequent calls of OvsAcquireSwitchContext() will return FALSE, because the switchContext is NULL.

I hope the above explanation answers your question.

Thanks,
Sorin

-----Original Message-----
From: Eitan Eliahu [mailto:eliahue at vmware.com] 
Sent: Friday, 3 April, 2015 00:21
To: Sorin Vinturis; dev at openvswitch.org
Subject: RE: [PATCH] datapath-windows: Solved BSOD when uninstalling the driver (race condition)



Hi Sorin,

Can you please explain how the case where  FilterDetach is called on a different core concurrently with the Release and between switchContext is checked for NULL and the swicthContext->refCount is referenced ? (where it is marked ---->>) It seems that FilterDetach would set gOvsSwitchContext to NULL even if there are outstanding IRPs and the switch context has not been released as of yet.

+    do {
+        if (NULL == switchContext) {
+            break;
+        }
---->>
+        if (0 == InterlockedCompareExchange(
+            (LONG volatile *)&switchContext->refCount, 0, 0)) {
+            break;
+        }

Can you inline the acquisition and release functions?
Thanks!
Eitan

-----Original Message-----
From: dev [mailto:dev-bounces at openvswitch.org] On Behalf Of Sorin Vinturis
Sent: Friday, March 20, 2015 10:28 AM
To: dev at openvswitch.org
Subject: [ovs-dev] [PATCH] datapath-windows: Solved BSOD when uninstalling the driver (race condition)

The BSOD occurred because the FilterAttach routine released the switch context, while there were IRPs in processing.

The solution was to add a reference count in the switch context to prevent premature deallocation of the switch context structure.

Signed-off-by: Sorin Vinturis <svinturis at cloudbasesolutions.com>
Reported-by: Alin Gabriel Serdean <aserdean at cloudbasesolutions.com>
Reported-at: https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_openvswitch_ovs-2Dissues_issues_58&d=AwIGaQ&c=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEs&r=CWsgHUxi6ExLXY798tmo3LJ4e3geGYp56lkcH-5cLCY&m=bKJV7k8Y9bG2Z1Hczw1uCkihZsQUivQ4nMAjrAWy20I&s=JgFv3Pq9CjJRPHRfSRgXB4ibNGyMlSl8a6kKOfNQ2_k&e=
---
 datapath-windows/ovsext/Datapath.c | 10 +++++----
 datapath-windows/ovsext/Switch.c   | 44 ++++++++++++++++++++++++++++++++++++++
 datapath-windows/ovsext/Switch.h   | 12 ++++++++++-
 3 files changed, 61 insertions(+), 5 deletions(-)

diff --git a/datapath-windows/ovsext/Datapath.c b/datapath-windows/ovsext/Datapath.c
index 8eb13f1..8b561a3 100644
--- a/datapath-windows/ovsext/Datapath.c
+++ b/datapath-windows/ovsext/Datapath.c
@@ -699,10 +699,9 @@ OvsDeviceControl(PDEVICE_OBJECT deviceObject,
     outputBufferLen = irpSp->Parameters.DeviceIoControl.OutputBufferLength;
     inputBuffer = irp->AssociatedIrp.SystemBuffer;
 
-    /* Check if the extension is enabled. */
-    if (NULL == gOvsSwitchContext) {
-        status = STATUS_DEVICE_NOT_READY;
-        goto done;
+    if (!OvsAcquireSwitchContext(gOvsSwitchContext)) {
+        status = STATUS_NOT_FOUND;
+        goto exit;
     }
 
     /* Concurrent netlink operations are not supported. */ @@ -874,6 +873,9 @@ OvsDeviceControl(PDEVICE_OBJECT deviceObject,
     status = InvokeNetlinkCmdHandler(&usrParamsCtx, nlFamilyOps, &replyLen);
 
 done:
+    OvsReleaseSwitchContext(gOvsSwitchContext);
+
+exit:
     KeMemoryBarrier();
     instance->inUse = 0;
 
diff --git a/datapath-windows/ovsext/Switch.c b/datapath-windows/ovsext/Switch.c
index a228d8e..8366944 100644
--- a/datapath-windows/ovsext/Switch.c
+++ b/datapath-windows/ovsext/Switch.c
@@ -420,6 +420,9 @@ OvsInitSwitchContext(POVS_SWITCH_CONTEXT switchContext)
     switchContext->isActivateFailed = FALSE;
     switchContext->dpNo = OVS_DP_NUMBER;
     ovsTimeIncrementPerTick = KeQueryTimeIncrement() / 10000;
+
+    switchContext->refCount = 1;
+
     OVS_LOG_TRACE("Exit: Succesfully initialized switchContext: %p",
                   switchContext);
     return NDIS_STATUS_SUCCESS;
@@ -428,6 +431,12 @@ OvsInitSwitchContext(POVS_SWITCH_CONTEXT switchContext)  static VOID  OvsUninitSwitchContext(POVS_SWITCH_CONTEXT switchContext)  {
+    OvsReleaseSwitchContext(switchContext);
+}
+
+VOID
+OvsDeleteSwitchContext(POVS_SWITCH_CONTEXT switchContext) {
     OVS_LOG_TRACE("Enter: Delete switchContext:%p", switchContext);
 
     /* We need to do cleanup for tunnel port here. */ @@ -447,9 +456,44 @@ OvsUninitSwitchContext(POVS_SWITCH_CONTEXT switchContext)
     switchContext->pidHashArray = NULL;
     OvsDeleteFlowTable(&switchContext->datapath);
     OvsCleanupBufferPool(switchContext);
+    switchContext->refCount = 0;
     OVS_LOG_TRACE("Exit: Delete switchContext: %p", switchContext);  }
 
+VOID
+OvsReleaseSwitchContext(POVS_SWITCH_CONTEXT switchContext) {
+    if (1 == InterlockedCompareExchange(
+        (LONG volatile *)&switchContext->refCount, 0, 1)) {
+        OvsDeleteSwitchContext(switchContext);
+    }
+    else {
+        InterlockedDecrement((LONG volatile *)&switchContext->refCount);
+    }
+}
+
+BOOLEAN
+OvsAcquireSwitchContext(POVS_SWITCH_CONTEXT switchContext) {
+    BOOLEAN ret = FALSE;
+
+    do {
+        if (NULL == switchContext) {
+            break;
+        }
+
+        if (0 == InterlockedCompareExchange(
+            (LONG volatile *)&switchContext->refCount, 0, 0)) {
+            break;
+        }
+
+        InterlockedIncrement((LONG volatile *)&switchContext->refCount);
+        ret = TRUE;
+    } while (!ret);
+
+    return ret;
+}
+
 /*
  * --------------------------------------------------------------------------
  *  This function activates the switch by initializing it with all the runtime diff --git a/datapath-windows/ovsext/Switch.h b/datapath-windows/ovsext/Switch.h
index 7960072..25db636 100644
--- a/datapath-windows/ovsext/Switch.h
+++ b/datapath-windows/ovsext/Switch.h
@@ -179,6 +179,12 @@ typedef struct _OVS_SWITCH_CONTEXT
     volatile LONG pendingOidCount;
 
     OVS_NBL_POOL            ovsPool;
+
+    /*
+     * Reference count used to prevent premature deallocation of the switch
+     * context.
+     */
+    INT32                   refCount;
 } OVS_SWITCH_CONTEXT, *POVS_SWITCH_CONTEXT;
 
 
@@ -202,7 +208,6 @@ OvsAcquireDatapathWrite(OVS_DATAPATH *datapath,
                            dispatch ? NDIS_RWL_AT_DISPATCH_LEVEL : 0);  }
 
-
 static __inline VOID
 OvsReleaseDatapath(OVS_DATAPATH *datapath,
                    LOCK_STATE_EX *lockState) @@ -211,6 +216,11 @@ OvsReleaseDatapath(OVS_DATAPATH *datapath,
     NdisReleaseRWLock(datapath->lock, lockState);  }
 
+BOOLEAN
+OvsAcquireSwitchContext(POVS_SWITCH_CONTEXT switchContext);
+
+VOID
+OvsReleaseSwitchContext(POVS_SWITCH_CONTEXT switchContext);
 
 PVOID OvsGetExternalVport();
 
--
1.9.0.msysgit.0
_______________________________________________
dev mailing list
dev at openvswitch.org
https://urldefense.proofpoint.com/v2/url?u=http-3A__openvswitch.org_mailman_listinfo_dev&d=AwIGaQ&c=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEs&r=CWsgHUxi6ExLXY798tmo3LJ4e3geGYp56lkcH-5cLCY&m=bKJV7k8Y9bG2Z1Hczw1uCkihZsQUivQ4nMAjrAWy20I&s=kQtENH_VEx8FY3DCnOS32-4BoIo4cicBvy-57X1DFfY&e= 


More information about the dev mailing list