Sign Language Feature Extraction and Hand Normalization Questions

deathslice246 2018-04-01 16:35:24 UTC #1

At the moment I'm trying to extract features, such as the distance from each fingertip to the center of the hand and the distance between adjacent fingertips, so that I can recognize ASL gestures. I'm not sure if this is the best way of going about it, so I would like some help with this.

Here is the code that attempts to do that:

/*
 * featureVectorList[0] = PalmToThumb Distance
 * featureVectorList[1] = PalmToIndex Distance
 * featureVectorList[2] = PalmToMiddle Distance
 * featureVectorList[3] = PalmToRing Distance
 * featureVectorList[4] = PalmToPinky Distance
 */
private void calculatePalmToFingerDistances(List<double> featureVectorList, Frame frame)
{
    foreach (Hand hand in frame.Hands)
    {
        Matrix handTransform = hand.Basis;
        handTransform.origin = hand.PalmPosition;
        handTransform = handTransform.RigidInverse();

        foreach (Finger finger in hand.Fingers)
        {
            Vector transformedTipPosition = handTransform.TransformPoint(finger.TipPosition);
            Vector transformedPalmPosition = handTransform.TransformPoint(hand.PalmPosition);

            featureVectorList.Add(transformedTipPosition.DistanceTo(transformedPalmPosition));
        }
    }
}

/* 
 * featureVectorList[5] = PinkyToRing Distance
 * featureVectorList[6] = RingToMiddle Distance
 * featureVectorList[7] = MiddleToIndex Distance
 * featureVectorList[8] = IndexToThumb Distance
 */

private void calculateAdjacentFingerDistances(List<double> featureVectorList, Frame frame)
{
    foreach (Hand hand in frame.Hands)
    {
        Matrix handTransform = hand.Basis;
        handTransform.origin = hand.PalmPosition;
        handTransform = handTransform.RigidInverse();

        for (int i = hand.Fingers.Count - 1; i > 0; i--)
        {
            // The inverse hand transform already accounts for the palm origin,
            // so pass the raw tip positions instead of subtracting PalmPosition first.
            Vector transformedCurrTipPosition = handTransform.TransformPoint(hand.Fingers[i].TipPosition);
            Vector transformedPrevTipPosition = handTransform.TransformPoint(hand.Fingers[i - 1].TipPosition);

            featureVectorList.Add(transformedCurrTipPosition.DistanceTo(transformedPrevTipPosition));
        }
    }

I don't fully understand what a basis matrix is (or an inverse matrix, for that matter), or whether I need to transform each position from one coordinate system to another in my case. If someone could briefly explain what a basis matrix is and how it can be used in this situation, that would be great.
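For reference, the basis here is the set of three perpendicular unit axes that describe how the hand is oriented. Combined with the palm position as origin, its inverse converts a world-space point into hand-relative coordinates. Below is a minimal sketch of that idea, assuming System.Numerics types as a stand-in for the Leap `Vector` and `Matrix` classes; the `HandFrame` and `BasisDemo` names are hypothetical and not part of the Leap API. It also illustrates that a rigid transform (rotation plus translation) leaves the distance between two points unchanged, so the transform should only matter for features that depend on position or direction, not for pure distances.

```csharp
using System;
using System.Numerics;

// Hypothetical stand-in for a hand's basis: three orthonormal axes plus an origin.
public struct HandFrame
{
    public Vector3 XAxis, YAxis, ZAxis; // hand axes expressed in world coordinates
    public Vector3 Origin;              // palm position in world coordinates

    // World -> hand-local: subtract the origin, then project onto each axis.
    // For an orthonormal basis this is the rigid inverse transform.
    public Vector3 ToLocal(Vector3 worldPoint)
    {
        Vector3 d = worldPoint - Origin;
        return new Vector3(Vector3.Dot(d, XAxis), Vector3.Dot(d, YAxis), Vector3.Dot(d, ZAxis));
    }
}

public static class BasisDemo
{
    // Returns { distance in world coordinates, distance in hand-local coordinates }.
    public static float[] Distances(HandFrame frame, Vector3 a, Vector3 b)
    {
        float world = Vector3.Distance(a, b);
        float local = Vector3.Distance(frame.ToLocal(a), frame.ToLocal(b));
        return new[] { world, local };
    }
}
```

Both entries returned by `Distances` should agree up to floating-point error, whatever the hand's orientation or position.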

My next question: I also want to make the distances invariant to the size of the user's hand (basically, normalize the distances). I thought of two ways to achieve this. The first is to use this formula on each feature

So, for example, if I have a list of 10 feature vectors, each with all these attributes (PalmToThumb, PalmToIndex, etc.), I would find the min and max of each feature and use that formula above to normalize them all.
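The per-feature min and max procedure described above can be sketched as follows, assuming the formula in question is the usual min-max scaling, (x - min) / (max - min). The `MinMaxSketch` class is a hypothetical helper, and mapping a constant feature to 0 is an arbitrary choice made here to avoid dividing by zero.

```csharp
using System;
using System.Collections.Generic;

public static class MinMaxSketch
{
    // samples[s][f] is feature f of sample s.
    // Returns a scaled copy in which every feature lies in [0, 1].
    public static double[][] Normalize(IReadOnlyList<double[]> samples)
    {
        int featureCount = samples[0].Length;
        var min = new double[featureCount];
        var max = new double[featureCount];
        for (int f = 0; f < featureCount; f++)
        {
            min[f] = double.PositiveInfinity;
            max[f] = double.NegativeInfinity;
        }

        // First pass: find the min and max of each feature across all samples.
        foreach (double[] sample in samples)
        {
            for (int f = 0; f < featureCount; f++)
            {
                min[f] = Math.Min(min[f], sample[f]);
                max[f] = Math.Max(max[f], sample[f]);
            }
        }

        // Second pass: apply (x - min) / (max - min) feature by feature.
        var result = new double[samples.Count][];
        for (int s = 0; s < samples.Count; s++)
        {
            result[s] = new double[featureCount];
            for (int f = 0; f < featureCount; f++)
            {
                double range = max[f] - min[f];
                result[s][f] = range > 0 ? (samples[s][f] - min[f]) / range : 0.0;
            }
        }
        return result;
    }
}
```

Note that this rescales each feature by the range seen across the whole data set, which is not the same thing as compensating for the size of one particular hand.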

My second idea is to divide each distance by the hand's sphere radius. Which one would be more appropriate for my case?
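Dividing by a per-hand reference length could be sketched as below. The `ScaleByReference` helper is hypothetical, and which reference length to pass in is an assumption worth validating: according to the Leap documentation, `hand.SphereRadius` shrinks as the fingers curl, so it changes with the pose, whereas a measure such as `hand.PalmWidth` may be a steadier proxy for hand size.

```csharp
using System;

public static class HandScaleSketch
{
    // Divides every distance by a per-hand reference length.
    // The reference must be in the same units as the distances (millimeters in Leap tracking data).
    public static double[] ScaleByReference(double[] distances, double referenceLength)
    {
        if (referenceLength <= 0)
            throw new ArgumentOutOfRangeException(nameof(referenceLength));

        var scaled = new double[distances.Length];
        for (int i = 0; i < distances.Length; i++)
        {
            scaled[i] = distances[i] / referenceLength;
        }
        return scaled;
    }
}
```

Because the reference is measured per hand and per frame, this kind of scaling does not depend on the rest of the data set.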

I'm also standardizing all of my features by subtracting the mean from each data point and dividing the result by the standard deviation, but again I'm not sure whether that is necessary.
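The standardization step described above, applied to the values of a single feature, might look like the sketch below. The `StandardizeSketch` class is hypothetical, and using the population standard deviation (dividing by N rather than N - 1) is an assumption.

```csharp
using System;

public static class StandardizeSketch
{
    // Z-score standardization of one feature: (x - mean) / standard deviation.
    public static double[] ZScore(double[] values)
    {
        double mean = 0;
        foreach (double v in values) mean += v;
        mean /= values.Length;

        double variance = 0;
        foreach (double v in values) variance += (v - mean) * (v - mean);
        variance /= values.Length; // population variance (assumption)

        double std = Math.Sqrt(variance);
        var result = new double[values.Length];
        for (int i = 0; i < values.Length; i++)
        {
            // A constant feature has zero spread; map it to 0 rather than divide by zero.
            result[i] = std > 0 ? (values[i] - mean) / std : 0.0;
        }
        return result;
    }
}
```

Z-scores are unchanged by any earlier positive linear rescaling of a feature, so standardizing after min-max scaling gives the same values as standardizing alone.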

One last thing to note: I'm using Unity 2017 with Leap Motion V2.3.1 tracking. Many thanks for your assistance.