Wednesday, October 29, 2008

Time-domain Saliency Map in Video (idea)

We use multiscale image analysis to compute saliency maps (and to do most other jobs), mainly because it reflects the appearance of objects at different scale levels. So in video, we should be able to use multiscale analysis in the time domain (multiple time spans) to capture the motion of objects.

Salient features in an image are statistically distinguishable from the background (Rosenholtz, 1999; Torralba, 2003). So salient movements in a video should likewise be distinguishable from the background, in motion feature space.

Applications off the top of my head: distinguishing a ship from the sea, finding something in flowing water.
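A minimal sketch of what I mean, assuming grayscale frames and plain frame differencing as the motion feature (the function name and the spans are made up for illustration):

import numpy as np

def temporal_saliency(frames, spans=(1, 4, 16)):
    """Score motion saliency at multiple temporal scales.

    frames: (T, H, W) array of grayscale frames, float.
    spans:  time spans (in frames) over which to measure change.
    Returns a (H, W) saliency map for the last frame.
    """
    frames = np.asarray(frames, dtype=np.float64)
    T = frames.shape[0]
    saliency = np.zeros(frames.shape[1:])
    for s in spans:
        if s >= T:
            continue
        # Motion feature at this scale: change over a span of s frames.
        diff = np.abs(frames[-1] - frames[-1 - s])
        # Pixels whose motion deviates from the global (background)
        # statistics are salient at this scale.
        z = (diff - diff.mean()) / (diff.std() + 1e-8)
        saliency += np.maximum(z, 0)
    return saliency / len(spans)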

Tuesday, October 21, 2008

Multi-View HDR research (finished)

EC520 course project
Berkin and I are going to take a little journey in HDRI technology.
New ideas: content-aware HDRI
Experiment Setup:

Preliminary result:
Result using the 'Memorial' image sequence. Using our method, it only takes seconds in Matlab to fully compute the final HDR image. The actual dynamic range of the resulting image is very high, so we have to apply a linear transformation for display. In the final color image you can even see blue sky through the windows.
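Our actual implementation is in Matlab; below is a rough numpy sketch of the standard weighted-merge step, assuming an approximately linear camera response (a real pipeline would first recover the response curve, e.g. Debevec & Malik):

import numpy as np

def merge_hdr(images, exposure_times):
    """Merge an exposure stack into a radiance map (rough sketch).

    images: list of float arrays in [0, 1], same shape.
    exposure_times: exposure time of each image, in seconds.
    """
    num = np.zeros_like(images[0])
    den = np.zeros_like(images[0])
    for img, t in zip(images, exposure_times):
        # Hat weighting: trust mid-range pixels, not the clipped ones.
        w = 1.0 - np.abs(2.0 * img - 1.0)
        num += w * img / t
        den += w
    radiance = num / np.maximum(den, 1e-8)
    # Dynamic range is far beyond 8 bits, so rescale linearly for display.
    return (radiance - radiance.min()) / (radiance.max() - radiance.min())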
Other results:
Analysis process of the image; the final image is shown in the last of the six.
report:
B. Abanoz and M. Wang, "A review of high dynamic range imaging on static scenes," Tech. Rep. 2008-04, Boston University, Dept. of Electr. and Comp. Eng., Dec. 2008 (EC520 course project), [abstract] [PDF: 2,020KB].

Motion analysis Using Ultra-high Temporal Resolution Camera (Proposal)

The proposed system can detect high-speed-moving objects in heavily occluded environments (like forests), where most current motion detection techniques fail. I propose a novel motion estimation scheme that solves these problems and enables parallel implementation.

Some ideas towards this proposal:
1. Current motion estimation is based on two consecutive images in a video stream, at about 30 fps. This is not how we humans detect motion. Although our eyes' temporal resolution is about 50 fps, it is during that period of exposure to the scene (1/50 s) that we perceive motion information, not after.
2. Current methods cannot handle occlusion well unless they use fancy but complicated tricks.
3. One interesting paper: CVPR 2008, "Motion from Blur".
4. Such cameras are available on the market at a reasonable price: the Casio EX-FH20 and EX-F1, with an astonishing 1200 fps!
5. The computation can be carried out in parallel, so we can use a GPU to do it.

Preliminary Experiment Result:
Gone are complex feature extraction and classic optical-flow methods; a simple image-shift-and-sum gets the job done!
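Here is one way to read 'image-shift and sum' as code; this is an illustrative numpy sketch, not the exact experiment (integer per-frame pixel shifts are assumed):

import numpy as np

def shift_and_sum(frames, velocities):
    """Find the velocity that best 'stabilizes' a high-fps burst.

    frames: (T, H, W) array from an ultra-high-speed camera.
    velocities: iterable of candidate integer (dy, dx) shifts per frame.
    For the correct velocity, shifting each frame back and summing
    yields the sharpest composite; occluders average out.
    """
    best, best_v = -np.inf, None
    T = len(frames)
    for dy, dx in velocities:
        acc = np.zeros_like(frames[0], dtype=np.float64)
        for t, f in enumerate(frames):
            acc += np.roll(f, (-t * dy, -t * dx), axis=(0, 1))
        acc /= T
        # Sharpness proxy: variance of the composite image.
        score = acc.var()
        if score > best:
            best, best_v = score, (dy, dx)
    return best_v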

Multi-level Semantic Segmentation and Self-grouping in Scene Understanding (On-going)

The basic idea is: for each pixel in the image, find its neighbors in feature space (color, MRF, SIFT, ...), where the feature space is arranged in a way that reflects structural 'importance'. Take ocean and grass with color and Markov random field models, for example: water pixels (or grass pixels) will group together as neighbors because they share similar color (or pattern). Each group of pixels in the different feature spaces jointly unveils its underlying properties.
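As a toy version of the grouping step, k-means over per-pixel color features already puts water with water and grass with grass; this sketch (function name is mine) ignores the MRF/SIFT features:

import numpy as np
from sklearn.cluster import KMeans

def group_pixels(image, n_groups=5):
    """Group pixels by nearest neighbors in a simple color feature space.

    image: (H, W, 3) float array. Real features would also include
    texture (MRF statistics, SIFT, ...), but color alone already
    demonstrates the self-grouping behavior.
    """
    h, w, _ = image.shape
    feats = image.reshape(-1, 3)
    labels = KMeans(n_clusters=n_groups, n_init=10).fit_predict(feats)
    return labels.reshape(h, w)  # each group is a candidate semantic region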

Nov 19 2008:

Sadly, I found out today in Antonio's course 6.870 that this idea has already been tested by D. Hoiem in "Geometric Context from a Single Image", ICCV 2005 (http://www.cs.uiuc.edu/homes/dhoiem/projects/context/index.html). On the other hand, their work shows that my idea works! It is exciting to see this method generate good segmentations, as expected. But a more principled way to organize those 'over-segmented patches' is still needed, instead of the rather ad hoc approach in Hoiem's work.

Image-based mate matching (Proposal)

This project tells you how to find a suitable male or female whom you find pretty, in a systematic manner, using online resources (for CV geeks only :)).
1. Download all the images from Flickr that have a geo-tag and the following tags: woman, girl, single, home, ... (done)
2. Select some images of girls you think are beautiful (positive samples) and not so beautiful (negative samples).
3. Use any face detection algorithm (I will use OpenCV for convenience) to detect all the faces in the database you just downloaded (see the sketch after this list).
4. Extract SIFT, GIST, or other features from the faces you just got from the database. You can weight each feature's contribution according to its performance.
5. Train an SVM or another linear classifier on the 'positive' and 'negative' face samples.
6. Use the trained SVM to find all the positive faces in the database.
7. Now you have all the females/males who are single and whom you find pretty, and you know where they live.
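A sketch of steps 3-6 using OpenCV's Haar cascade and a linear SVM; for brevity the features here are raw resized face pixels instead of SIFT/GIST:

import cv2
import numpy as np
from sklearn.svm import SVC

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def face_features(path, size=(32, 32)):
    # Step 3: detect faces; step 4 (simplified): resized pixels as features.
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return [cv2.resize(gray[y:y + h, x:x + w], size).ravel()
            for (x, y, w, h) in faces]

def train_and_search(positive_paths, negative_paths, database_paths):
    X, y = [], []
    for label, paths in ((1, positive_paths), (0, negative_paths)):
        for p in paths:
            for f in face_features(p):
                X.append(f)
                y.append(label)
    clf = SVC(kernel="linear").fit(np.array(X), np.array(y))  # step 5
    # Step 6: keep database images containing at least one 'positive' face.
    hits = []
    for p in database_paths:
        feats = face_features(p)
        if feats and clf.predict(np.array(feats)).any():
            hits.append(p)
    return hits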

Spatial image filtering

ICA+FFT=>Spatial image filtering
result:
In the image shown on the right, there are obviously two kinds of textons: plus and minus signs. This filter takes these images and returns the independent components in the frequency domain, with which we can do spatial filtering.
Further results, pending...
Conclusion: the Fourier transform preserves the statistical independence of images well. Using this technique we can 'filter out' different shapes in images without ANY classification algorithm. This is done by Fourier analysis of the images and picking the independent frequency components.
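My rough reconstruction of the pipeline as code, assuming we learn the ICs from magnitude spectra of texton patches (sklearn's FastICA stands in for whatever ICA implementation you like):

import numpy as np
from sklearn.decomposition import FastICA

def ica_fft_filters(patches, n_components=2):
    """Learn frequency-domain filters that separate textons (sketch).

    patches: (N, h, w) array of image patches, each containing a mix
    of the texton shapes (e.g. plus and minus signs).
    Returns n_components frequency-domain masks.
    """
    N, h, w = patches.shape
    # Work on magnitude spectra so the components live in frequency space.
    spectra = np.abs(np.fft.fft2(patches)).reshape(N, -1)
    ica = FastICA(n_components=n_components, max_iter=1000)
    ica.fit(spectra)
    # Each independent component is the frequency signature of one texton;
    # normalize it into a [0, 1] mask usable for spatial filtering.
    masks = np.abs(ica.components_).reshape(n_components, h, w)
    return masks / masks.max(axis=(1, 2), keepdims=True)

def apply_mask(image_patch, mask):
    """Filter a patch by one component: FFT, mask, inverse FFT."""
    return np.real(np.fft.ifft2(np.fft.fft2(image_patch) * mask))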

SeamCarving

A simple implementation of Seam Carving with a gradient cost function. I masked the letter 'I' in the IBM logo, and the algorithm removed it without affecting other parts of the image.
Reference: "Seam Carving for Content-Aware Image Resizing" by Shai Avidan and Ariel Shamir

Trajectory-based event detection, Bachelor Thesis, with Prof. Zhang Rui:

Abnormal event detection, based on trajectories. Actions that have never or rarely been observed are abnormal events. The green lines shown in the pictures above are 'normal' events, which my surveillance system has previously seen many times.
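One way to make 'rarely observed = abnormal' concrete is to fit an HMM to the normal trajectories and flag low-likelihood ones; this sketch uses hmmlearn, with a made-up threshold:

import numpy as np
from hmmlearn import hmm

def train_normal_model(trajectories, n_states=4):
    """Fit an HMM to previously observed ('normal') trajectories.

    trajectories: list of (T_i, 2) arrays of (x, y) points.
    """
    X = np.vstack(trajectories)
    lengths = [len(t) for t in trajectories]
    model = hmm.GaussianHMM(n_components=n_states)
    model.fit(X, lengths)
    return model

def is_abnormal(model, trajectory, threshold=-10.0):
    """Flag a trajectory whose per-point log-likelihood is too low,
    i.e. an action the system has never or rarely seen."""
    return model.score(trajectory) / len(trajectory) < threshold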

Trajectory extraction

This is a trajectory extraction program on FilterTest.

It uses (1) robust motion estimation techniques, (2) feature-based tracking, and (3) particle filtering. The trajectories are then grouped using an HMM-based clustering approach.
Blue points represent interest points on background objects, and the pink trajectories represent the moving foreground objects. Note: the moving shadow on the wall was not assigned any trajectories.
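The feature-based tracking part, sketched with OpenCV's KLT tracker (goodFeaturesToTrack + calcOpticalFlowPyrLK); the robust motion estimation and particle filtering stages are omitted:

import cv2
import numpy as np

def extract_trajectories(frames, max_corners=200):
    """Track feature points through a sequence of grayscale uint8 frames.

    Returns one list of (x, y) points per tracked feature.
    """
    p0 = cv2.goodFeaturesToTrack(frames[0], maxCorners=max_corners,
                                 qualityLevel=0.01, minDistance=7)
    tracks = [[tuple(pt.ravel())] for pt in p0]
    alive = list(range(len(tracks)))  # indices of still-tracked features
    prev = frames[0]
    for frame in frames[1:]:
        p1, status, _ = cv2.calcOpticalFlowPyrLK(prev, frame, p0, None)
        next_alive, next_pts = [], []
        for i, (pt, ok) in enumerate(zip(p1, status.ravel())):
            if ok:  # keep only points the tracker could follow
                tracks[alive[i]].append(tuple(pt.ravel()))
                next_alive.append(alive[i])
                next_pts.append(pt)
        alive, p0, prev = next_alive, np.array(next_pts), frame
        if len(p0) == 0:
            break
    return tracks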

Binocular Vision

This is a multi-camera system used to test various CV techniques that address multi-viewpoint issues.

A simple real-time depth-map generator filter was tested. It requires at least two cameras to work. It currently yields low-resolution results due to computational constraints.
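The depth-map step, sketched with OpenCV's block matcher on a rectified pair (parameters are illustrative; a low disparity range and resolution keep it fast):

import cv2

# Block matching over 64 disparity levels with a 15x15 window.
stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)

def depth_map(left_gray, right_gray):
    """left_gray/right_gray: rectified uint8 images from the two cameras."""
    disparity = stereo.compute(left_gray, right_gray)
    # Depth is inversely proportional to disparity: Z = f * B / d,
    # with focal length f and camera baseline B from calibration.
    return disparity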

Motion area detection using 'grass fire' model

My program finds objects/areas that need more attention. Attention is drawn by detecting saliency in color, motion, and/or texture. In this case, I use motion areas as the 'important areas' that my system should pay more attention to. Motion detection uses the Horn-Schunck optical flow algorithm, and the parts are connected using the grass-fire model.
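The grass-fire part in miniature: start a 'fire' at each unburned motion pixel and let it spread to the 4-neighbors until the region is fully burned. A sketch, assuming a binary motion mask from thresholded optical-flow magnitude:

import numpy as np
from collections import deque

def grass_fire_label(motion_mask):
    """Connect motion pixels into regions with the grass-fire model.

    motion_mask: (H, W) bool array of 'moving' pixels.
    Returns a label image and the number of motion areas found.
    """
    h, w = motion_mask.shape
    labels = np.zeros((h, w), dtype=int)
    current = 0
    for sy in range(h):
        for sx in range(w):
            if motion_mask[sy, sx] and labels[sy, sx] == 0:
                current += 1  # start a new fire
                fire = deque([(sy, sx)])
                labels[sy, sx] = current
                while fire:  # spread until the region is burned out
                    y, x = fire.popleft()
                    for ny, nx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1)):
                        if (0 <= ny < h and 0 <= nx < w and
                                motion_mask[ny, nx] and labels[ny, nx] == 0):
                            labels[ny, nx] = current
                            fire.append((ny, nx))
    return labels, current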

Multi-3D Track can track object features

An immediate application: a 3D free-space version of MultiTouch. The images shown above are from my small finger-tracking application. The pink lines are trajectories linking the previous 20 finger/palm positions up to the current position; 'older' positions get darker as new positions come in, and the width of the lines represents the relative scale.
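The trail-drawing scheme, sketched with OpenCV (the color and the scale-to-width mapping are placeholders):

import cv2
from collections import deque

history = deque(maxlen=20)  # last 20 finger/palm positions (x, y, scale)

def draw_trail(frame, position):
    """Draw a trajectory that fades with age; line width ~ relative scale."""
    history.append(position)
    pts = list(history)
    n = len(pts)
    for i in range(1, n):
        (x0, y0, _), (x1, y1, s1) = pts[i - 1], pts[i]
        fade = i / n  # older segments are darker (fade closer to 0)
        color = (int(203 * fade), int(105 * fade), int(255 * fade))  # pink
        cv2.line(frame, (int(x0), int(y0)), (int(x1), int(y1)),
                 color, thickness=max(1, int(s1)))
    return frame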

Fourier Analysis for finding different patterns in a real-time system

My tool finds different patterns in a real-time system using Fourier analysis. The left image is from an AV camera, and my program finds the black dot in that video (top right).


This tool can be used to extract information from a highly textured background. The two images above show letter extraction.
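A guess at how such an extraction can work in the frequency domain: a regular texture concentrates its energy in a few strong spectral peaks, so zeroing those peaks leaves the aperiodic content (the letters). A numpy sketch with a made-up threshold:

import numpy as np

def suppress_texture(image, keep_fraction=0.98):
    """Remove a regular texture by zeroing the strongest Fourier peaks.

    image: (H, W) grayscale float array.
    keep_fraction: fraction of spectrum magnitudes left untouched.
    """
    F = np.fft.fft2(image)
    mag = np.abs(F)
    thresh = np.quantile(mag, keep_fraction)
    F[mag > thresh] = 0  # knock out the texture's dominant frequencies
    return np.real(np.fft.ifft2(F))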

Previous Projects I

The image above is one example of drusen detection (applied to retina images). The oval white blobs are drusen, varying in size, shape, and location (sometimes they overlap). The pink boxes are detected blobs, and the size of each box indicates the scale of the blob.
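The post does not spell out the detector; a standard way to find bright blobs of varying size is multiscale Laplacian-of-Gaussian, sketched here with scipy:

import numpy as np
from scipy.ndimage import gaussian_laplace

def detect_blobs(image, sigmas=(2, 4, 8, 16)):
    """Multiscale LoG blob detection for bright oval blobs like drusen.

    Returns (y, x, sigma) for the strongest response at each scale;
    sigma gives the scale of the blob (the box size in the figure).
    """
    hits = []
    for s in sigmas:
        # -LoG responds strongly to bright blobs of radius ~ sqrt(2)*s;
        # the s**2 factor normalizes responses across scales.
        resp = -gaussian_laplace(image.astype(float), sigma=s) * s**2
        y, x = np.unravel_index(np.argmax(resp), resp.shape)
        hits.append((y, x, s))
    return hits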

Image retrieval A (a screen capture of my computer vision program): it finds the input object in the view of a PC camera in real time; for an IBM logo, about 10 fps.
My program finds input objects in a local database. It can find all the images that contain the same object as the input. The output is a list ranked by similarity, and the most similar image is shown.

Vision-Based-Control Robot

Our 'Terrain-Walker', a vision-based-control robot. The control processing unit is my computer vision program. The robot transmits video to the base station and receives orders in return.

Computer Graphics

iXland, a first/third-person shooting game (Computer Graphics, Xbox 360 project). For more information about this project, see
http://mhot64.googlepages.com/
Course info: http://www.ece.unm.edu/course/ece412/final_proj.html