Saturday, September 3, 2011

New Image Swirl Launched!

Link: http://image-swirl.googlelabs.com/html/zoom.html?q=eiffel%20tower

This summer, teaming up with two other Google interns, Xin Yan and David Vetrano, I used my 20% time to create a new image swirl exploration tool. Inspired by Google Maps' intuitive and smooth UI, we aimed to build a similar UI for exploring large sets of images. We organize the images so that similar ones sit close together on the 'image map'.

Here is what we have done:
1. Fetch the images returned by image search for a query word.
2. Build a large 'image map', so that similar images are located close to each other (see the sketch after this list).
3. Design a UI: 'pan' to show different parts of the map and 'zoom' to show different granularities of similarity.
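
For a flavor of step 2, here is a minimal sketch, under my own simplifying assumptions: each image is already reduced to a feature vector, and multidimensional scaling stands in for whatever embedding the real system uses at much larger scale.

```python
# Sketch of the 'image map' step: embed images in 2D so that distances on
# the map roughly preserve distances in feature space. Assumes precomputed
# per-image feature vectors (hypothetical, for illustration only).
import numpy as np
from sklearn.manifold import MDS

def build_image_map(features):
    """features: (n_images, n_dims) array. Returns (n_images, 2) coords."""
    diff = features[:, None, :] - features[None, :, :]
    dists = np.sqrt((diff ** 2).sum(-1))            # pairwise feature distances
    mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
    return mds.fit_transform(dists)                 # 2D map coordinates

coords = build_image_map(np.random.rand(50, 128))   # toy features
```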

Since similar images are packed tightly together, we think this would work well on small displays. We further speculate that pinch-to-zoom is a perfect match for this UI.

Check it out and let me know what you think!

Monday, February 14, 2011

My very first paper!

This is my first peer-reviewed paper, published in CVPR 2011.
"Image Saliency: From Local to Global Context" (PDF)

Thursday, January 6, 2011

Realtime video streaming

This is a fun project of mine. The prototype can stream whatever is on one computer's screen to another computer, given its IP address.
On the server side, the video is encoded in H.264 and transmitted to the client over the network; the client decodes the video and displays it.
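
For flavor, here is a minimal sketch of the same server/client idea, with two big assumptions: JPEG frames over a raw TCP socket stand in for my H.264 pipeline, and the mss library does the screen grab.

```python
# Sketch of a screen-streaming loop: server grabs the screen, JPEG-encodes
# each frame, and sends it length-prefixed; client decodes and displays.
# Requires: pip install mss opencv-python numpy
import socket
import struct
import numpy as np
import cv2
import mss

def recv_exact(sock, n):
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("stream closed")
        buf += chunk
    return buf

def serve(port=9999):
    srv = socket.socket()
    srv.bind(("0.0.0.0", port))
    srv.listen(1)
    conn, _ = srv.accept()
    with mss.mss() as sct:
        while True:
            shot = np.array(sct.grab(sct.monitors[1]))        # BGRA screen grab
            ok, jpg = cv2.imencode(".jpg", shot[:, :, :3])    # compress frame
            data = jpg.tobytes()
            conn.sendall(struct.pack(">I", len(data)) + data) # length-prefixed

def view(host, port=9999):
    sock = socket.create_connection((host, port))
    while True:
        size = struct.unpack(">I", recv_exact(sock, 4))[0]
        frame = cv2.imdecode(np.frombuffer(recv_exact(sock, size), np.uint8),
                             cv2.IMREAD_COLOR)
        cv2.imshow("stream", frame)
        if cv2.waitKey(1) == 27:                              # Esc quits
            break
```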
My goal is to reduce the latency as much as I can. Currently the latency is not bad at 512x512 and 30 fps with a normal amount of motion. See video here:
Please stay tuned.
Update: I added a remote-control function that transmits mouse and keyboard events, like clicks and key presses, back to the streaming end. So a person on the receiving end can enjoy the full functionality of any software installed on the streaming end.
See video here:


Wednesday, November 24, 2010

Safe Projector

Are you tired of the annoyingly intense light from the projector when you give a presentation? Maybe you should try my Safe Projector(TM). This projector automatically keeps intense light out of your face during a presentation by detecting where you are.

Using a Microsoft Kinect, I can easily pick out the area occupied by the presenter and dim the light inside that area. It's my first application of the mighty Kinect! Please stay tuned!
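
A minimal sketch of the masking idea, assuming the Kinect depth frame already arrives as a numpy array aligned with the projector image (the alignment/calibration step is omitted):

```python
# Dim slide pixels wherever something (the presenter) is closer to the
# Kinect than a depth threshold. Threshold and dim factor are assumptions.
import numpy as np
import cv2

def safe_slide(slide, depth_mm, presenter_max_mm=2000, dim=0.05):
    """slide: projector image; depth_mm: Kinect depth map (0 = no reading)."""
    mask = (depth_mm > 0) & (depth_mm < presenter_max_mm)
    # Grow the mask a little so light doesn't leak around the silhouette.
    mask = cv2.dilate(mask.astype(np.uint8), np.ones((15, 15), np.uint8))
    out = slide.copy()
    sel = mask.astype(bool)
    out[sel] = (out[sel] * dim).astype(slide.dtype)
    return out
```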

Here are the results:
Normal projector: it throws light onto the screen as well as the presenter


Safe Projector: it only throws light onto the screen


HAPPY THANKSGIVING EVERYONE!!

Thursday, May 27, 2010

Joining Google

I will join Google's Computer Vision Research Group this summer, expecting to work on some really cool large-scale projects. It is my first adventure on the West Coast, and I am really, really looking forward to it. Please stay tuned...

Saturday, May 1, 2010

Google Video Challenge

This is a presentation of the term project I did with Yuecheng Shao. Click play to start and click on the image to view the next slide. Select full screen for HD. Enjoy!

Wednesday, March 17, 2010

Laser Pointer as Mouse

As a demo for our VGC club, I programmed this fun plug-in for my vision platform: a simple laser-pointer mouse. It uses a laser as input and lets the user interact with a large projected image. Video is here!
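
For the curious, here is a rough sketch of the idea in Python/OpenCV (assumptions: a camera watching the projection, a red laser dot, and pyautogui moving the real cursor; the camera-to-screen homography is omitted for brevity):

```python
# Track the bright red laser dot in the camera image and map its centroid
# to screen coordinates. The HSV thresholds are assumptions to tune.
import cv2
import numpy as np
import pyautogui

cap = cv2.VideoCapture(0)
screen_w, screen_h = pyautogui.size()
while True:
    ok, frame = cap.read()
    if not ok:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, (0, 100, 220), (10, 255, 255))  # bright red dot
    ys, xs = np.nonzero(mask)
    if len(xs) > 0:
        cx, cy = xs.mean(), ys.mean()          # dot centroid in camera pixels
        h, w = frame.shape[:2]
        pyautogui.moveTo(cx / w * screen_w, cy / h * screen_h)
    if cv2.waitKey(1) == 27:
        break
```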

Structured Light

Structured light is a technique used to acquire an accurate depth map of a scene. It uses a projector to project coded light onto the scene and then 'decodes' the signal captured by the camera. Previously I called it a CDMA projector, because each channel can receive only its own coded light.
No dark room is needed, the system is relatively robust to 'channel noise' (ambient light), and it can be made real-time because the refresh rates of both the projector and the camera are very high: at least 240 Hz for the projector (DLP) and more than 500 Hz for the camera. The corrected code is shown above; the dot in the center is the signal for that channel. The two pictures on the left are two consecutive frames of the scene, i.e. 2 bits of the Gray code in CDMA, for all the channels.
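
A minimal sketch of the Gray-code core of such a system (assumptions: you can project each stripe pattern and capture a synchronized frame; the project/capture plumbing is left out):

```python
# Generate binary-reflected Gray-code stripe patterns and decode captured
# frames back into per-pixel projector column indices.
import numpy as np

def gray_code_patterns(width, n_bits):
    """One stripe pattern per bit: pattern[b][x] is bit b of gray(x)."""
    cols = np.arange(width)
    gray = cols ^ (cols >> 1)                   # binary-reflected Gray code
    return [((gray >> b) & 1).astype(np.uint8) for b in range(n_bits)]

def decode(captured, thresh):
    """captured: list of camera frames, one per projected bit pattern.
    Returns the projector column index seen at each camera pixel."""
    bits = [(frame > thresh).astype(np.uint32) for frame in captured]
    gray = np.zeros_like(bits[0])
    for b, bit in enumerate(bits):
        gray |= bit << b
    # Convert Gray code back to a plain binary column index.
    binary = gray.copy()
    shift = gray >> 1
    while shift.any():
        binary ^= shift
        shift >>= 1
    return binary
```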

The structured light system: my computer, the projector, and the camera.

Friday, February 12, 2010

Google Image Swirl ... Matlab version!!

Today I implemented what is called Google Image Swirl on the Google Labs webpage. I developed this without even knowing that Google had a similar application. After checking out the techniques they use in a paper from Google, it turns out to be the EXACT same method! But anyway, it is still worth logging my experiment.
I programmed a Matlab version (well, not pure Matlab, the main function is in C, and I will move it to CUDA later) of an image search engine, using SIFT features and network analysis techniques similar to what Google uses.
A simple test result is shown above: the engine successfully ranks images such that images from the same scene are grouped together. The code is currently efficient enough to tackle bigger image databases. If anyone is interested, contact me. Google Image Swirl: http://image-swirl.googlelabs.com/
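
Here is a rough sketch of the matching step in Python/OpenCV, an assumption on my part (my actual code is Matlab/C): count SIFT ratio-test matches between every pair of images to get a similarity matrix for the later graph analysis.

```python
# Build a pairwise similarity matrix from SIFT matches (Lowe's ratio test).
# The resulting matrix can feed clustering / graph analysis for the 'swirl'.
import cv2
import numpy as np

def pairwise_similarity(images):
    sift = cv2.SIFT_create()
    feats = [sift.detectAndCompute(img, None)[1] for img in images]
    matcher = cv2.BFMatcher()
    n = len(images)
    sim = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            if feats[i] is None or feats[j] is None:
                continue
            matches = matcher.knnMatch(feats[i], feats[j], k=2)
            good = sum(1 for pair in matches
                       if len(pair) == 2
                       and pair[0].distance < 0.75 * pair[1].distance)
            sim[i, j] = sim[j, i] = good
    return sim
```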

Update: the CUDA version with a Matlab interface is here!! A 200-image 'swirl' takes merely 30 seconds on my GTX 295. I am only using one chip of the 295, so it can improve further. 200 images means 40,000 comparisons. That's about a 100x speedup over my C implementation with a Matlab interface. Yay!

Thursday, January 7, 2010

Multi Layer Bayesian Network for Image Fill-in

In this term project, we proposed a new semi-supervised technique for image fill-in. A hierarchical Bayesian model is learned from natural images and a user-specified image database, e.g. a face database if the image to be filled in is a face.
Each layer is composed of a set of constellation models built on the previous layer. At the first layer, for example, Gabor-like image patches are learned; higher layers are composed of several Gabors.
At each layer, MAP is used as the patch detector, and this decision guides the fill-in process from top to bottom.

Such a fill-in method can also be further developed to do super-resolution, occlusion detection, 3D from a single image, etc.

Similar related work by others: "Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations", H. Lee et al., ICML 2009.

Saturday, December 12, 2009

Party is ON

I proudly introduce our VGC Club, with Prof. Janusz Konrad as co-founder.
The Video Geek Club is a research group and forum composed of undergrads, grads, and professors who are interested in exploiting supercomputers in the following (but not restricted to these) fields: computer vision, image processing, machine learning, computer graphics, etc.
As the first member and coordinator, I organized the first activity: an interactive demo show at the ECE Christmas party.

Please visit our website: http://blogs.bu.edu/vgc/

Friday, April 24, 2009

SeamCarving---Revised

No connectivity constraint is used in this extended seam carving. Instead, a cost function constrains the shift between adjacent rows, with a parameter alpha controlling the weight of this cost and thus controlling connectivity indirectly.
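
Here is a minimal sketch of that DP, under my reading of the scheme: one pixel is removed per row, a seam need not be connected, and a jump of |j - k| columns between rows costs alpha * |j - k|.

```python
# DP over per-row removal columns: dp[j] = energy + min over previous
# columns k of (dp_prev[k] + alpha * |j - k|). Naive O(H * W^2) version.
import numpy as np

def unconnected_seam(energy, alpha):
    h, w = energy.shape
    dp = energy[0].astype(float).copy()
    back = np.zeros((h, w), dtype=int)
    cols = np.arange(w)
    for i in range(1, h):
        # trans[j, k] = cost of coming from column k to column j.
        trans = dp[None, :] + alpha * np.abs(cols[:, None] - cols[None, :])
        back[i] = trans.argmin(axis=1)
        dp = energy[i] + trans.min(axis=1)
    # Backtrack the minimum-cost (possibly disconnected) seam.
    seam = [int(dp.argmin())]
    for i in range(h - 1, 0, -1):
        seam.append(int(back[i][seam[-1]]))
    return seam[::-1]                       # one column index per row
```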
results:
The first two and the fourth use the algorithm mentioned above: the first one uses the shear cost with respect to the previously carved image, the second uses the shear cost with respect to the original image, and the fourth uses a big alpha value. All three are compared with the third one, i.e. the scaled original image; 75% of the original width is kept. Notice that Lena is less narrowed (face, hat, and shoulder) with the carving method. If alpha is very small (first two images), the connectivity is high, and vice versa. So in the last image above (very low connectivity), artifacts are obvious, and for small alpha, seams tend to be connected.

The right image shows all the seams carved out of the original image. The black line shown in the left image is the first seam carved out. It does not satisfy the connectivity constraint, but it can give superior results because it achieves the optimal minimum cost during DP. The downside of this algorithm is its computational complexity; but it is highly parallelizable and can benefit from high-performance GPU computing.


Saturday, March 7, 2009

Reconstruction of non-uniform sampling motion field

Suppose we only have motion information at a few points in the motion field. How do we recover the whole true motion field? Is that even possible? The answer is yes, because we humans can do it. It turns out that we are very good at giving a unique and stable solution to this inverse problem: we can always produce a sound inference of motion based on only a few parts of an object.

To tackle this problem we need a kind of interpolation, or 'fill-in'. This kind of inverse problem is classical, but it is somewhat tricky in the motion estimation setting because the field is not uniformly sampled, and the sampling process is highly content dependent. So no fixed 'ideal' kernel can be found, because it should vary from frame to frame. Another classical problem in motion estimation is the aperture problem; my method addresses it with an 'iron fist': fill in any blank area (where no motion information can be readily extracted) with ANY reliable motion within or near that area. A KNN-based method is used here. It is not a theoretically sound suggestion, but it is practically reasonable.
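
A minimal sketch of the KNN fill-in, with sklearn standing in for my own implementation (an assumption; the inputs are sparse flow vectors at known pixel positions):

```python
# Distance-weighted KNN interpolation: every empty pixel copies nearby
# reliable motion, which also papers over the aperture problem.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

def densify_flow(points, flows, height, width, k=5):
    """points: (n, 2) pixel coords with reliable motion; flows: (n, 2) u,v.
    Returns a dense (height, width, 2) motion field."""
    knn = KNeighborsRegressor(n_neighbors=k, weights="distance")
    knn.fit(points, flows)
    ys, xs = np.mgrid[0:height, 0:width]
    grid = np.stack([ys.ravel(), xs.ravel()], axis=1)
    return knn.predict(grid).reshape(height, width, 2)
```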

In the images shown above, the blue and green colors indicate the horizontal and vertical velocity respectively, and the brightness of the color indicates the speed. This algorithm's complexity is relatively low, so it can run in real time on my laptop at QVGA resolution.

Further work: this problem could even fit into a transductive online learning framework.

Wednesday, February 4, 2009

CUDA Tested on previous projects

Integrating CUDA into my previous projects is not very hard, and the ease of use is incomparable. What I did here was simply apply a 2D FFT to real-time webcam video.
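
What the demo does, sketched on the CPU with numpy (the CUDA version simply swaps a GPU FFT in for the fft2 call):

```python
# Grab webcam frames and display the log-magnitude of their 2D FFT.
import cv2
import numpy as np

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)
    spectrum = np.fft.fftshift(np.fft.fft2(gray))   # 2D FFT, DC centered
    mag = np.log1p(np.abs(spectrum))                # log magnitude for display
    cv2.imshow("fft", (mag / mag.max() * 255).astype(np.uint8))
    if cv2.waitKey(1) == 27:                        # Esc quits
        break
```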

Now I am planning to do more serious kernel programming. The only problem is finding an interesting project to do.

Lately I have tried PyCUDA on my notebook, and it is AWESOME! Everything is so easy with Python, and with CUDA, massively parallel computing power is near at hand. A perfect tool for everyone.

Wednesday, October 29, 2008

Time-domain Saliency Map in Video (idea)

We use multiscale images to analyze saliency maps (and to do most other jobs), mainly because they reflect the appearance of objects at different scale levels. So in video, we should be able to use multiscale analysis in the time domain (multiple time spans) to capture the motion of objects.
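
A minimal sketch of the idea: frame differences over multiple time spans, the temporal analogue of a spatial image pyramid (the particular spans are my guess).

```python
# Keep a short frame history and compute difference maps at several time
# spans: slow motions show up at large spans, fast ones at small spans.
from collections import deque
import numpy as np

SPANS = (1, 2, 4, 8)                      # 'scales' in the time domain
history = deque(maxlen=max(SPANS) + 1)

def temporal_saliency(frame):
    """frame: grayscale float array. Returns one difference map per span."""
    history.appendleft(frame)
    maps = {}
    for s in SPANS:
        if len(history) > s:
            maps[s] = np.abs(history[0] - history[s])
    return maps
```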

Salient features in an image are statistically distinguishable from the background (Rosenholtz, 1999; Torralba, 2003). So salient movements in a video should also be distinguishable from the background, in motion feature space.

Applications off the top of my head: distinguishing a ship from the sea, finding something in flowing water.

Tuesday, October 21, 2008

Multi-View HDR research (finished)

EC520 course project
Berkin and I are going to take a little journey into HDRI technology.
New ideas: content-aware HDRI
Experiment Setup:

Preliminary result:
Result using the 'Memorial' image sequence. Using our method, full calculation of the final HDR image takes only seconds in Matlab. The actual dynamic range of the resulting image is very high, so we have to apply a linear transformation to display it. In the final color image you can even see blue sky through the windows.
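
For reference, here is a minimal sketch of a standard HDR pipeline using OpenCV's Debevec method; this is not our course project's own calibration, and the file names and exposure times are placeholders.

```python
# Recover the camera response curve, merge exposures into a radiance map,
# then tone-map because the dynamic range exceeds what a monitor can show.
import cv2
import numpy as np

files = ["memorial_1.png", "memorial_2.png", "memorial_3.png"]   # hypothetical
times = np.array([1 / 30.0, 1 / 8.0, 1 / 2.0], dtype=np.float32) # exposures (s)
images = [cv2.imread(f) for f in files]

response = cv2.createCalibrateDebevec().process(images, times)
hdr = cv2.createMergeDebevec().process(images, times, response)

ldr = cv2.createTonemap(gamma=2.2).process(hdr)                  # display range
cv2.imwrite("result.png", np.clip(ldr * 255, 0, 255).astype(np.uint8))
```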
other results:
Analysis process of the image; the final image is shown in the last of the 6.
report:
B. Abanoz and M. Wang, "A review of high dynamic range imaging on static scenes," Tech. Rep. 2008-04, Boston University, Dept. of Electr. and Comp. Eng., Dec. 2008 (EC520 course project), [abstract] [PDF: 2,020KB].

Motion Analysis Using an Ultra-high Temporal Resolution Camera (Proposal)

The proposed system can detect fast-moving objects in heavily occluded environments (like forests), where most current motion detection techniques fail. I propose a novel motion estimation scheme that can settle these problems and enables parallel implementation.

Some ideas towards this proposal:
1. Current motion estimation is based on two consecutive images in a video stream, at about 30 fps. This is not how we humans detect motion. Our eyes' temporal resolution is about 50 fps, but it is during that period of exposure to the scene (1/50 s) that we perceive motion information, not after.
2. Current methods cannot tackle occlusion well unless they use fancy but complicated tricks.
3. One interesting paper: CVPR 2008, "Motion from Blur".
4. Such cameras are available on the market at a reasonable price: the Casio EX-FH20 and EX-F1, with an astonishing 1200 fps!
5. The computation can be carried out in parallel, so we can use a GPU to do it.

Preliminary Experiment Result:
Gone are complex feature extraction and classic optical flow methods; simple image shift-and-sum can get the job done! A sketch of the idea follows below.
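
A minimal sketch of shift-and-sum, under my stated assumption: for the true velocity, aligning and averaging a high-fps burst makes the moving object sharp while everything else blurs out (the candidate velocities and the variance score are illustrative choices).

```python
# For each candidate velocity, undo that motion across the burst and
# accumulate; the sharpest (highest-variance) accumulation wins.
import numpy as np

def best_velocity(frames, candidates):
    """frames: list of grayscale float arrays from a high-speed burst.
    candidates: iterable of (vx, vy) pixel-per-frame velocities."""
    best, best_score = None, -np.inf
    for vx, vy in candidates:
        acc = np.zeros_like(frames[0])
        for k, f in enumerate(frames):
            acc += np.roll(f, (-round(vy * k), -round(vx * k)), axis=(0, 1))
        acc /= len(frames)
        score = acc.var()          # aligned, sharp content has high variance
        if score > best_score:
            best, best_score = (vx, vy), score
    return best
```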

Multi-level Semantic Segmentation and Self-grouping in Scene Understanding (On-going)

The basic idea: for each pixel in the image, find its neighbors in feature space (color, MRF, SIFT...). The feature space is arranged in a way that reflects structural 'importance'. For ocean and grass, with color and Markov random field models for example, water (or grass) pixels will group together as neighbors because they share similar color (or pattern). Each group of pixels in different features will jointly unveil its underlying properties.
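
A minimal sketch of the grouping step, using k-means on simple per-pixel color plus position features (my simplification; the post also mentions MRF and SIFT features):

```python
# Group pixels by nearness in a joint color + position feature space.
import numpy as np
from sklearn.cluster import KMeans

def group_pixels(image, n_groups=4, pos_weight=0.3):
    """image: (h, w, 3) float RGB. Returns an (h, w) group-label map."""
    h, w, _ = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    feats = np.column_stack([
        image.reshape(-1, 3),                    # color
        pos_weight * ys.ravel()[:, None] / h,    # position, down-weighted
        pos_weight * xs.ravel()[:, None] / w,
    ])
    labels = KMeans(n_clusters=n_groups, n_init=10).fit_predict(feats)
    return labels.reshape(h, w)
```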

Nov 19 2008:

Sadly, I found out today in Antonio's course 6.870 that this idea has already been tested by D. Hoiem in "Geometric Context from a Single Image", ICCV 2005, http://www.cs.uiuc.edu/homes/dhoiem/projects/context/index.html. But on the other hand, their work shows that my idea works! It is exciting to see this method generate good segmentations, which is expected. But a more principled way to organize those 'over-segmented patches' is still needed, instead of the rather 'ad-hoc' approach in Hoiem's work.

Image-based mate matching (Proposal)

This project is about how to find a suitable male or female whom you find pretty, in a systematic manner, using online resources (for CV geeks only :)).
1. Download all the images from Flickr with a geo-tag and the following tags: woman, girl, single, home.... (done)
2. Select some images of girls you think are beautiful (positive samples) and not so beautiful (negative samples).
3. Use any face detection algorithm (I will use OpenCV for convenience) to detect all the faces in the database you just downloaded.
4. Extract SIFT, GIST, or other features from the faces you just got from the database. You can weight each feature's contribution according to its performance.
5. Train an SVM or another linear classifier with the 'positive' and 'negative' face samples.
6. Use the trained SVM to find all the positive faces in the database.
7. Now you have all the females/males who are single and whom you find pretty, and you know where their homes are. (A sketch of steps 3-6 follows below.)
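
A minimal sketch of steps 3-6, with assumptions on my part: OpenCV's Haar cascade for detection and raw resized pixels instead of the SIFT/GIST features mentioned above.

```python
# Detect faces, turn each into a crude feature vector, train a linear SVM,
# then keep database images containing at least one 'positive' face.
import cv2
import numpy as np
from sklearn.svm import LinearSVC

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def face_features(img):
    """Detect faces and return one flattened 32x32 grayscale patch per face."""
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    feats = []
    for (x, y, w, h) in cascade.detectMultiScale(gray, 1.3, 5):
        feats.append(cv2.resize(gray[y:y + h, x:x + w], (32, 32)).ravel())
    return feats

def train_and_filter(pos_imgs, neg_imgs, database_imgs):
    X, y = [], []
    for img in pos_imgs:
        for f in face_features(img):
            X.append(f); y.append(1)
    for img in neg_imgs:
        for f in face_features(img):
            X.append(f); y.append(0)
    clf = LinearSVC().fit(np.array(X), np.array(y))
    return [img for img in database_imgs
            if any(clf.predict([f])[0] == 1 for f in face_features(img))]
```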

Spatial image filtering

ICA+FFT=>Spatial image filtering
result:
In the image shown on the right, there are obviously two kinds of textons: plus and minus signs. This filter can take these images and return the ICs in the frequency domain, with which we can do spatial filtering.
Further results, pending...
Conclusion: the Fourier transform preserves the statistical independence of images well. Using this technique we can 'filter out' different shapes in images without ANY classification algorithm. This is done by Fourier analysis of these images and picking the independent frequency components.
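
One possible reading of the ICA+FFT pipeline, sketched under my own assumptions: patches sampled from the image serve as the observations, and FastICA separates the textons' frequency-domain signatures.

```python
# Run ICA on the magnitude spectra of random patches; each recovered
# component is a frequency signature usable as a spatial filter.
import numpy as np
from sklearn.decomposition import FastICA

def spectral_components(image, patch=16, n_components=2, n_samples=500):
    """image: 2D float array. Returns independent |FFT| components."""
    h, w = image.shape
    rng = np.random.default_rng(0)
    specs = []
    for _ in range(n_samples):
        y = rng.integers(0, h - patch)
        x = rng.integers(0, w - patch)
        p = image[y:y + patch, x:x + patch]
        specs.append(np.abs(np.fft.fft2(p)).ravel())   # magnitude spectrum
    ica = FastICA(n_components=n_components, random_state=0)
    ica.fit(np.array(specs))
    return ica.mixing_.T.reshape(n_components, patch, patch)
```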

SeamCarving

A simple implementation of Seam Carving, with a gradient cost function. I masked the letter 'I' in the IBM logo, and the algorithm removed it without affecting other parts of the image.
Reference: "Seam Carving for Content-Aware Image Resizing" by Shai Avidan and Ariel Shamir
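
For reference, here is a minimal sketch of the classic algorithm: gradient-magnitude energy, DP for the minimum connected vertical seam, then removal. Masking a region (like the 'I' above) just means forcing its energy very low so seams pass through it first.

```python
# Remove one minimum-energy connected vertical seam from a grayscale image.
import numpy as np

def remove_seam(img_gray):
    energy = np.abs(np.gradient(img_gray, axis=0)) + \
             np.abs(np.gradient(img_gray, axis=1))
    h, w = energy.shape
    dp = energy.copy()
    for i in range(1, h):
        left = np.r_[np.inf, dp[i - 1, :-1]]
        right = np.r_[dp[i - 1, 1:], np.inf]
        dp[i] += np.minimum(np.minimum(left, dp[i - 1]), right)
    # Backtrack the connected seam from the cheapest bottom pixel.
    seam = [int(dp[-1].argmin())]
    for i in range(h - 2, -1, -1):
        j = seam[-1]
        lo, hi = max(j - 1, 0), min(j + 2, w)
        seam.append(lo + int(dp[i, lo:hi].argmin()))
    seam.reverse()
    # Delete one pixel per row.
    return np.array([np.delete(img_gray[i], seam[i]) for i in range(h)])
```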

Trajectory-based event detection, Bachelor Thesis, with Prof. Zhang Rui:

Abnormal event detection based on trajectories: actions that have never or rarely been observed are abnormal events. The green lines shown in the pictures above are 'normal' events, previously seen many times by my surveillance system.

Trajectory extraction

This is a trajectory extraction program built on FilterTest.

It uses 1. robust motion estimation techniques, 2. feature-based tracking, and 3. particle filtering. Trajectories are then grouped with an HMM-based clustering approach.
Blue points represent interest points on background objects, and the pink trajectories represent the moving foreground objects. Note: the moving shadow on the wall has been rejected from all trajectories.
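
A minimal sketch of the feature-tracking part (2.), using OpenCV's KLT tracker; the robust motion estimation, particle filter, and shadow rejection are omitted, and the input file name is a placeholder.

```python
# Track good features frame to frame and accumulate one trajectory per
# feature; the trajectories are then ready for HMM clustering.
import cv2
import numpy as np

cap = cv2.VideoCapture("surveillance.avi")        # hypothetical input video
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                              qualityLevel=0.01, minDistance=7)
tracks = [[tuple(p.ravel())] for p in pts]

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
    new_tracks, new_pts = [], []
    for tr, p, st in zip(tracks, nxt, status.ravel()):
        if st:                                    # keep tracked points only
            tr.append(tuple(p.ravel()))
            new_tracks.append(tr)
            new_pts.append(p)
    tracks = new_tracks
    pts = np.array(new_pts, dtype=np.float32).reshape(-1, 1, 2)
    prev_gray = gray
```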

Binocular Vision

This is a multi-camera system used to test various kinds of CV techniques that address multi-viewpoint issues.

A simple real-time depth-map generator filter has been tested. This requires at least 2 cameras to work. Currently it yields low-resolution results due to computational concerns.
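
A minimal sketch of such a filter with OpenCV's block matcher, assuming two already-rectified cameras at device indices 0 and 1:

```python
# Real-time disparity map from a stereo pair via block matching.
import cv2

capL, capR = cv2.VideoCapture(0), cv2.VideoCapture(1)
stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
while True:
    okL, left = capL.read()
    okR, right = capR.read()
    if not (okL and okR):
        break
    grayL = cv2.cvtColor(left, cv2.COLOR_BGR2GRAY)
    grayR = cv2.cvtColor(right, cv2.COLOR_BGR2GRAY)
    disparity = stereo.compute(grayL, grayR)      # 16x fixed-point disparity
    cv2.imshow("depth", cv2.convertScaleAbs(disparity,
                                            alpha=255.0 / (64 * 16)))
    if cv2.waitKey(1) == 27:                      # Esc quits
        break
```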

Motion area detection using 'grass fire' model

My program finds objects/areas that need more attention. Attention is drawn by detecting saliency in color, motion, and/or texture. In this case, I use motion areas as the 'important areas' that my system should pay more attention to. Motion detection uses the Horn-Schunck optical flow algorithm, and the parts are connected using a 'grass fire' (connected-component) model.
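
A minimal sketch of the pipeline, with one substitution: OpenCV ships Farneback optical flow rather than Horn-Schunck, so the sketch uses it; the connected-component pass plays the role of the grass-fire labeling.

```python
# Threshold flow magnitude into a motion mask, then label connected moving
# regions and keep the large ones as 'attention' areas.
import cv2
import numpy as np

def motion_areas(prev_gray, gray, speed_thresh=2.0, min_area=100):
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    speed = np.linalg.norm(flow, axis=2)
    moving = (speed > speed_thresh).astype(np.uint8)
    n, labels, stats, _ = cv2.connectedComponentsWithStats(moving)
    return [stats[i, :4] for i in range(1, n)
            if stats[i, cv2.CC_STAT_AREA] >= min_area]  # (x, y, w, h) boxes
```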

Multi-3D Track can track object features

Immediate application: a 3D free-space version of MultiTouch. The images shown above are from my small finger-tracking application. The pink lines are trajectories linking the previous 20 finger/palm positions up to the current position; 'older' positions get darker as new positions come in, and the width of the lines represents the relative scale.

Fourier Analysis for finding different patterns in a real-time system

My tool can find different patterns in real time using Fourier analysis. The left image is from an AV camera, and my program finds the black dot in that video (top right).


This tool can be used to extract information from highly textured backgrounds. The two images above show letter extraction.

Previous Projects I

The image above is one example of drusen detection (applied to retina images). The oval white blobs are drusen, varying in size, shape, and location (sometimes they overlap). The pink boxes are detected blobs, and the size of each box indicates the scale of the blob.

Image retrieval A (a screen capture of my computer vision program): it finds the input object in the view of a PC camera in real time. For an IBM logo, about 10 fps.
My program finds input objects in a local database. It can find all the images that contain the same object as the input. The output is a list of similarities, and the most similar image is shown.

Vision-Based-Control Robot

Our 'Terrain-Walker', a vision-based-control robot. The control processing unit is my computer vision program. The robot transmits video to the base station and gets orders in return.

Computer Graphics

iXland, a first/third-person shooting game (Computer Graphics, Xbox 360 project). For more information about this project, see
http://mhot64.googlepages.com/
Course info: http://www.ece.unm.edu/course/ece412/final_proj.html