
Pose Estimation Using POSIT

Yufeng Zhu, University of Southern California
Deepak Subramanian, University of Southern California
Anu Raj Srivastava, University of Southern California

Figure 1: (left) Wii Remote; (center) Kinect for Xbox 360; (right) PlayStation Move

Abstract
Pose estimation has recently become an active research topic in several fields, especially computer vision. Many researchers have worked on capturing real-world objects efficiently and extracting useful pose information robustly. This term we started a new project that uses the POSIT algorithm, first proposed by Daniel DeMenthon of the University of Maryland, to solve the single-camera pose estimation problem. We made slight changes to keep the system stable and its output numerically correct. Finally, we built several interactive applications that take our pose estimation system as input to demonstrate how it can be applied in everyday life.

Keywords: computer vision, pose estimation, blob tracking, POSIT, matrix orthonormalization

e-mail: yufengzh@usc.edu, dsubrama@usc.edu, arsrivas@usc.edu

Related Work

In the history of video games, the seventh generation of consoles is the current one, made up of the consoles released since late 2005 by Nintendo, Microsoft, and Sony. Each new console introduced a technological breakthrough. The Xbox 360 offered HD upscaling to 1080p, the PlayStation 3 offered full 1080p high-definition graphics and Blu-ray Disc technology, and the Wii focused on integrating controllers with movement sensors as well as joysticks (Figure 1, left). Recently, Sony joined Nintendo in the motion market by releasing the PlayStation Move (Figure 1, right), which features motion-sensing gaming similar to that of Nintendo's Wii. Microsoft has also joined Sony and Nintendo with its new Kinect (Figure 1, center). Unlike the other two systems (PlayStation 3 and Wii), Kinect uses no controller of any sort and makes the user the controller. Our research goal is to try another technique that is stable and, above all, cheaper and easier to build, so that more people can try this new kind of input control.

Objectives

In this project, we aim to set up a robust, real-time system that captures the pose of real-world objects using a single infrared camera. Furthermore, we will create several applications to demonstrate how the system works.

3 Software and Device

3.1 Software Used

OpenCV library, Intel Math Kernel library, OpenGL library, Visual Studio C++ 2008, Visual Fortran Compiler 11

3.2 Device Used

8x10 checkerboard, infrared camera, infrared reflective markers

Figure 2: Device Used

4 Method

4.1 Description of System

The input of our system is a series of frames captured from a single calibrated infrared camera. Each frame then goes through the following steps:

1. Binary Filter
2. Blob Tracking
3. Mapping
4. POSIT
5. Matrix Orthonormalization
6. Applied Pose Information and Rendering

4.1.1 Binary Filter

A binary filter performs what image processing calls thresholding, the simplest method of image segmentation. Applied to a grayscale image, it produces a binary image. During filtering, each pixel is marked as an object pixel if its value is greater than some threshold value and as a background pixel otherwise. Typically, an object pixel is given a value of 1 and a background pixel a value of 0. Finally, the binary image is created by coloring each pixel white or black accordingly.
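The thresholding rule above can be sketched in a few lines. This is a minimal illustration rather than the project's code: the function name `binary_filter`, the list-of-lists image layout, and the threshold of 128 are all assumptions made for the example.

```python
def binary_filter(gray, threshold):
    """Return a binary image: 1 for object pixels, 0 for background pixels."""
    # A pixel is an object pixel only if it is strictly brighter than the threshold.
    return [[1 if pixel > threshold else 0 for pixel in row] for row in gray]


# Hypothetical 3x3 grayscale frame with values in 0..255.
frame = [
    [12, 40, 200],
    [35, 220, 250],
    [10, 15, 30],
]
binary = binary_filter(frame, 128)  # 128 is an assumed threshold
```

With an infrared camera and reflective markers, the markers are much brighter than the rest of the scene, so a single global threshold of this kind is usually a reasonable starting point.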
4.1.2 Blob Tracking

Blob tracking can be divided into two modules: blob detection and tracking. In computer vision, blob detection refers to visual modules that detect points or regions in the image that are either brighter or darker than their surroundings. The tracking module then follows blobs (collections of pixels) from one frame to the next. An object tracking solution often needs to identify not only that an object moved but also which object is which when comparing the current frame with the previous one. The blob tracking module therefore labels each blob with a specific id, and that id is attached to the same or a similar blob in the next frame. What defines a blob as being similar in two images depends on how the blob tracking module is configured.
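The two modules can be sketched as follows. This is only an illustration under stated assumptions: detection is done with a 4-connected flood fill over a binary image, and "similar" is taken to mean "nearest centroid within `max_distance` pixels", matched greedily. The names `detect_blobs`, `track_blobs`, and `max_distance` are hypothetical, and the similarity rule the authors configured is not given in the text.

```python
from collections import deque
from math import hypot


def detect_blobs(binary):
    """Return the centroid (x, y) of every 4-connected region of 1-pixels."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    centroids = []
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] != 1 or seen[r][c]:
                continue
            # Flood fill the region that contains pixel (r, c).
            queue, pixels = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and binary[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            cx = sum(x for _, x in pixels) / len(pixels)
            cy = sum(y for y, _ in pixels) / len(pixels)
            centroids.append((cx, cy))
    return centroids


def track_blobs(previous, centroids, max_distance, next_id):
    """Give each centroid the id of the nearest unmatched previous blob.

    previous maps id -> centroid from the last frame. A centroid with no
    previous blob within max_distance receives a fresh id.
    Returns (tracked, next_id).
    """
    tracked, unused = {}, dict(previous)
    for centroid in centroids:
        best_id, best_dist = None, max_distance
        for blob_id, old in unused.items():
            dist = hypot(centroid[0] - old[0], centroid[1] - old[1])
            if dist <= best_dist:
                best_id, best_dist = blob_id, dist
        if best_id is None:
            best_id, next_id = next_id, next_id + 1
        else:
            del unused[best_id]
        tracked[best_id] = centroid
    return tracked, next_id


# Two hypothetical consecutive binary frames with two markers each.
frame_a = [[1, 1, 0, 0, 0], [1, 1, 0, 0, 0], [0, 0, 0, 0, 1], [0, 0, 0, 0, 1]]
frame_b = [[0, 1, 1, 0, 0], [0, 1, 1, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 1, 1]]
tracks_a, next_id = track_blobs({}, detect_blobs(frame_a), 2.0, 0)
tracks_b, next_id = track_blobs(tracks_a, detect_blobs(frame_b), 2.0, next_id)
```

Greedy nearest-centroid matching is the simplest choice and can swap ids when two blobs pass close to each other; a tracker that also compares blob size or predicted motion would be more robust.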
4.1.4 POSIT

POSIT is a method for finding the pose of an object from a single image. It assumes that the user can detect and match in the image four or more noncoplanar feature points of the object and knows their relative geometry on the object. The method combines two algorithms. The first, POS (Pose from Orthography and Scaling), approximates the perspective projection with a scaled orthographic projection and finds the rotation matrix and the translation vector of the object by solving a linear system. The second, POSIT (POS with Iterations), uses the approximate pose found by POS in its iteration loop to compute better scaled orthographic projections of the feature points, then applies POS to these projections instead of the original image projections. POSIT converges to accurate pose measurements in a few iterations.
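The POS step and the POSIT loop described here can be sketched in plain Python. This is a simplified illustration of the published algorithm, not the code used in the project. It assumes that `model[0]` is the reference point of the object, that image coordinates are measured from the principal point in pixels, that `focal` is the focal length in pixels, and that the correspondences between model and image points are already known. All function names are hypothetical.

```python
from math import cos, sin, radians, sqrt


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def unit(v):
    norm = sqrt(dot(v, v))
    return [x / norm for x in v]


def pseudo_inverse(A):
    """Return (A^T A)^-1 A^T for an N x 3 matrix A of rank 3 (a 3 x N matrix)."""
    S = [[sum(row[i] * row[j] for row in A) for j in range(3)] for i in range(3)]
    # Inverse of the 3x3 matrix S from cross products of its rows.
    c0, c1, c2 = cross(S[1], S[2]), cross(S[2], S[0]), cross(S[0], S[1])
    det = dot(S[0], c0)
    inv = [[c0[i] / det, c1[i] / det, c2[i] / det] for i in range(3)]
    return [[dot(inv[i], row) for row in A] for i in range(3)]


def posit(model, image, focal, iterations=20):
    """Return (R, T): rows of R are the camera axes in the object frame,
    T is the position of model[0] in the camera frame."""
    A = [[p[k] - model[0][k] for k in range(3)] for p in model[1:]]
    B = pseudo_inverse(A)
    x0, y0 = image[0]
    eps = [0.0] * len(A)  # eps = 0 makes the first pass plain POS
    for _ in range(iterations):
        # Scaled orthographic projections implied by the current eps.
        xs = [x * (1 + e) - x0 for (x, _), e in zip(image[1:], eps)]
        ys = [y * (1 + e) - y0 for (_, y), e in zip(image[1:], eps)]
        I = [dot(row, xs) for row in B]
        J = [dot(row, ys) for row in B]
        scale = (sqrt(dot(I, I)) + sqrt(dot(J, J))) / 2
        i, j = unit(I), unit(J)
        k = cross(i, j)
        z0 = focal / scale
        # Refine eps from the new pose estimate for the next pass.
        eps = [dot(a, k) / z0 for a in A]
    return [i, j, k], [x0 / scale, y0 / scale, z0]


# Synthetic check: project a known pose, then recover it.
angle = radians(30)
true_R = [[cos(angle), 0.0, sin(angle)],
          [0.0, 1.0, 0.0],
          [-sin(angle), 0.0, cos(angle)]]
true_T = [0.2, -0.1, 20.0]
model = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]
focal = 800.0
image = []
for p in model:
    cam = [dot(true_R[r], p) + true_T[r] for r in range(3)]
    image.append((focal * cam[0] / cam[2], focal * cam[1] / cam[2]))
R, T = posit(model, image, focal, iterations=30)
```

With noisy image points the recovered rows of `R` are in general neither exactly unit length nor exactly perpendicular, because `i` and `j` are estimated independently.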

Figure 3: Scaled Orthographic Projection Model


4.1.3 Mapping

For the mapping step, we tried two ways to establish the correspondence. Because the original POSIT algorithm cannot build the correspondence between feature points in the two spaces, namely model space and image space, the user has to provide the mapping definition before feeding the input parameters to POSIT. This is not a practical solution, especially as objects grow more complex. At first, we added a remapping module to the per-frame processing before the POSIT step. It is stable and robust when self-occlusion occurs, but it still cannot handle the mapping automatically. We then turned to SoftPOSIT, an improved version of the POSIT algorithm that is also provided by Daniel DeMenthon. We tried mixed compilation of a Fortran library version, provided by a Caltech PhD student, with a C++ calling program, and we also converted the Fortran library into a C++ version. Neither worked properly when combined with our system, but we believe the cause is only the input parameter format. Further work will focus on this combination.

4.1.5 Matrix Orthonormalization

Because the previous step applies an iterative numerical method, numerical error is inevitable. Its most evident effect is that the rotation matrix output by POSIT is no longer a pure rotation. The column vectors of a pure rotation matrix (or its row vectors, depending on the coordinate system convention in use) must be perpendicular to each other and of unit length, but numerical error may tilt or stretch the vectors slightly. To force the rotation matrix to satisfy the pure rotation constraint, we introduced a matrix orthonormalization step after POSIT. The method we applied in this step is the Gram-Schmidt process.
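A minimal sketch of Gram-Schmidt orthonormalization for a 3x3 matrix is shown below. It works on row vectors, which is an assumption about the convention, and uses the numerically safer "modified" form that subtracts projections from the running vector. The function name and the sample matrix are made up for the example.

```python
from math import sqrt


def gram_schmidt(matrix):
    """Return a matrix whose rows are an orthonormal version of the input rows."""
    basis = []
    for row in matrix:
        v = list(row)
        # Remove the components along the rows that were already accepted.
        for b in basis:
            proj = sum(x * y for x, y in zip(v, b))
            v = [vi - proj * bi for vi, bi in zip(v, b)]
        # Rescale what is left to unit length.
        norm = sqrt(sum(x * x for x in v))
        basis.append([x / norm for x in v])
    return basis


# Hypothetical slightly tilted and stretched rotation matrix.
noisy = [
    [1.0, 0.02, 0.0],
    [0.01, 0.98, 0.0],
    [0.0, 0.03, 1.01],
]
rotation = gram_schmidt(noisy)
```

The result depends on row order: the first row keeps its direction and the later rows absorb the whole correction, so the row that is trusted most should come first.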
4.1.6 Applied Pose Information and Rendering

Finally, we applied the corrected pose information to a virtual object and rendered it with the OpenGL pipeline to check whether the system works properly.

Moreover, the system is compatible with other interactive applications as long as the user provides a mapping from pose definitions to input definitions. We provide an application control interface based on the Windows message mechanism, so users can treat our pose estimation system as an alternative input method. More and more interactive input methods are being created nowadays, for instance the Wii Remote and Kinect for Xbox 360. Our system is also a good choice for interactive control.

Figure 4: Rendered Result Comparison: (a) is without Orthonormalization; (b) is after Orthonormalization

4.2 Description of Device

In our project, the devices used are quite simple. Even the infrared camera can be made from a simple web camera by adding an optical filter to it.

Conclusion

We have built a robust and stable system that handles pose estimation accurately. The system can be applied to interactive applications with a simple control object. We also provide a stable and feasible SoftPOSIT library. Due to time limitations, the combination with SoftPOSIT is still in progress. Further work needs to focus on integrating the SoftPOSIT module, especially on working out the proper input parameter format. As a more ambitious extension, we could replace the infrared camera with an optical camera, which would remove the need for the reflective markers. Several papers have already discussed this issue and will be a good help for further improvement. The mechanism used by Microsoft's Kinect is also a good candidate for implementation.

Milestones

October 02: blob tracking module finished
October 08: camera calibration module finished
October 08: POSIT module finished
October 14: start of module combination
October 14: blob tracking module combined
October 14: camera calibration module combined
October 14: POSIT module combined
October 15: fixed bugs in the system and obtained the pose information
October 15: OpenGL rendering module finished
November 22: SoftPOSIT module finished
December 03: SoftPOSIT module combined
December 06: interactive application finished
December 07: orthonormalization module combined

Acknowledgements

This is a research project conducted by the USC ICT group. We would like to thank David Martin and Chien-Yen Chang for their great help throughout the semester.

References
David, P., DeMenthon, D., Duraiswami, R., and Samet, H. 2004. Softposit: Simultaneous pose and correspondence determination. International Journal of Computer Vision 59, 259-284. 10.1023/B:VISI.0000025800.10423.1f.
DeMenthon, D. F., and Davis, L. S. 1995. Model-based object pose in 25 lines of code. International Journal of Computer Vision 15, 123-141. 10.1007/BF01450852.
Gold, S., and Rangarajan, A. 1996. A graduated assignment algorithm for graph matching. IEEE Transactions on Pattern Analysis and Machine Intelligence 18, 377-388.
Gold, S., Rangarajan, A., Ping Lu, C., and Mjolsness, E. 1997. New algorithms for 2d and 3d point matching: Pose estimation and correspondence. Pattern Recognition 31, 957-964.
Park, I., Germann, M., Breitenstein, M., and Pfister, H. 2010. Fast and automatic object pose estimation for range images on the gpu. Machine Vision and Applications 21, 749-766. 10.1007/s00138-009-0209-8.

Video Link

http://vimeo.com/17685762

Achievements

Finally, we built a real-time system that solves the pose estimation problem using a single infrared camera. Whenever you use a new camera, you need to calibrate it before starting our system. The program stores the camera's configuration in the file system, so each camera only needs to be calibrated once. However, to achieve more accurate results, you can recalibrate it whenever you use the system. After pose estimation starts, the camera generates a video stream, that is, a series of frames, as the input of our system. The stream goes through the processes listed in Section 4.1. Finally, we can control the virtual object accurately using our system.
