3d point reconstruction from 2d Images
up vote
11
down vote
favorite
I am trying understand basics of 3d point reconstruction from 2d stereo images. What I have understood so far can be summarized as below:
For 3d point (depth map) reconstruction, we need 2 images of the same object from 2 different view, given such image pair we also need Camera matrix (say P1, P2)
We find the corresponding points in the two images using methods like SIFT or SURF etc.
After getting corresponding key point, we find find the essential matrix (say K) using minimum 8 key points (used in 8-point algorithm)
Given we are at camera 1, calculate the parameters for camera 2 Using the essential matrix returns 4 possible camera parameters
Eventually we use corresponding points and both camera parameters for 3d point estimation using triangulation method.
After going through theory section, as my first experiment I tried to run the code available here,
Which worked as expected. With a few modification in the example.py
code I tried to run this example on all the consecutive image pairs and merge the 3-d point clouds for 3d reconstruction of object (dino
) as below:
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import numpy as np
import cv2
from camera import Camera
import structure
import processor
import features
def dino():
# Dino
img1 = cv2.imread('imgs/dinos/viff.003.ppm')
img2 = cv2.imread('imgs/dinos/viff.001.ppm')
pts1, pts2 = features.find_correspondence_points(img1, img2)
points1 = processor.cart2hom(pts1)
points2 = processor.cart2hom(pts2)
fig, ax = plt.subplots(1, 2)
ax[0].autoscale_view('tight')
ax[0].imshow(cv2.cvtColor(img1, cv2.COLOR_BGR2RGB))
ax[0].plot(points1[0], points1[1], 'r.')
ax[1].autoscale_view('tight')
ax[1].imshow(cv2.cvtColor(img2, cv2.COLOR_BGR2RGB))
ax[1].plot(points2[0], points2[1], 'r.')
fig.show()
height, width, ch = img1.shape
intrinsic = np.array([ # for dino
[2360, 0, width / 2],
[0, 2360, height / 2],
[0, 0, 1]])
return points1, points2, intrinsic
points3d = np.empty((0,0))
files = glob.glob("imgs/dinos/*.ppm")
len = len(files)
for item in range(len-1):
print(files[item], files[(item+1)%len])
#dino() function takes 2 images as input
#and outputs the keypoint point matches(corresponding points in two different views) along the camera intrinsic parameters.
points1, points2, intrinsic = dino(files[item], files[(item+1)%len])
#print(('Length', len(points1))
# Calculate essential matrix with 2d points.
# Result will be up to a scale
# First, normalize points
points1n = np.dot(np.linalg.inv(intrinsic), points1)
points2n = np.dot(np.linalg.inv(intrinsic), points2)
E = structure.compute_essential_normalized(points1n, points2n)
print('Computed essential matrix:', (-E / E[0][1]))
# Given we are at camera 1, calculate the parameters for camera 2
# Using the essential matrix returns 4 possible camera paramters
P1 = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]])
P2s = structure.compute_P_from_essential(E)
ind = -1
for i, P2 in enumerate(P2s):
# Find the correct camera parameters
d1 = structure.reconstruct_one_point(
points1n[:, 0], points2n[:, 0], P1, P2)
# Convert P2 from camera view to world view
P2_homogenous = np.linalg.inv(np.vstack([P2, [0, 0, 0, 1]]))
d2 = np.dot(P2_homogenous[:3, :4], d1)
if d1[2] > 0 and d2[2] > 0:
ind = i
P2 = np.linalg.inv(np.vstack([P2s[ind], [0, 0, 0, 1]]))[:3, :4]
#tripoints3d = structure.reconstruct_points(points1n, points2n, P1, P2)
tripoints3d = structure.linear_triangulation(points1n, points2n, P1, P2)
if not points3d.size:
points3d = tripoints3d
else:
points3d = np.concatenate((points3d, tripoints3d), 1)
fig = plt.figure()
fig.suptitle('3D reconstructed', fontsize=16)
ax = fig.gca(projection='3d')
ax.plot(points3d[0], points3d[1], points3d[2], 'b.')
ax.set_xlabel('x axis')
ax.set_ylabel('y axis')
ax.set_zlabel('z axis')
ax.view_init(elev=135, azim=90)
plt.show()
But I am getting very unexpected result. Please suggest me if above method is correct or how can i merge multiple 3d point clouds to construct a single 3-d structure.
python image-processing computer-vision 3d-reconstruction
|
show 4 more comments
up vote
11
down vote
favorite
I am trying understand basics of 3d point reconstruction from 2d stereo images. What I have understood so far can be summarized as below:
For 3d point (depth map) reconstruction, we need 2 images of the same object from 2 different view, given such image pair we also need Camera matrix (say P1, P2)
We find the corresponding points in the two images using methods like SIFT or SURF etc.
After getting corresponding key point, we find find the essential matrix (say K) using minimum 8 key points (used in 8-point algorithm)
Given we are at camera 1, calculate the parameters for camera 2 Using the essential matrix returns 4 possible camera parameters
Eventually we use corresponding points and both camera parameters for 3d point estimation using triangulation method.
After going through theory section, as my first experiment I tried to run the code available here,
Which worked as expected. With a few modification in the example.py
code I tried to run this example on all the consecutive image pairs and merge the 3-d point clouds for 3d reconstruction of object (dino
) as below:
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import numpy as np
import cv2
from camera import Camera
import structure
import processor
import features
def dino():
# Dino
img1 = cv2.imread('imgs/dinos/viff.003.ppm')
img2 = cv2.imread('imgs/dinos/viff.001.ppm')
pts1, pts2 = features.find_correspondence_points(img1, img2)
points1 = processor.cart2hom(pts1)
points2 = processor.cart2hom(pts2)
fig, ax = plt.subplots(1, 2)
ax[0].autoscale_view('tight')
ax[0].imshow(cv2.cvtColor(img1, cv2.COLOR_BGR2RGB))
ax[0].plot(points1[0], points1[1], 'r.')
ax[1].autoscale_view('tight')
ax[1].imshow(cv2.cvtColor(img2, cv2.COLOR_BGR2RGB))
ax[1].plot(points2[0], points2[1], 'r.')
fig.show()
height, width, ch = img1.shape
intrinsic = np.array([ # for dino
[2360, 0, width / 2],
[0, 2360, height / 2],
[0, 0, 1]])
return points1, points2, intrinsic
points3d = np.empty((0,0))
files = glob.glob("imgs/dinos/*.ppm")
len = len(files)
for item in range(len-1):
print(files[item], files[(item+1)%len])
#dino() function takes 2 images as input
#and outputs the keypoint point matches(corresponding points in two different views) along the camera intrinsic parameters.
points1, points2, intrinsic = dino(files[item], files[(item+1)%len])
#print(('Length', len(points1))
# Calculate essential matrix with 2d points.
# Result will be up to a scale
# First, normalize points
points1n = np.dot(np.linalg.inv(intrinsic), points1)
points2n = np.dot(np.linalg.inv(intrinsic), points2)
E = structure.compute_essential_normalized(points1n, points2n)
print('Computed essential matrix:', (-E / E[0][1]))
# Given we are at camera 1, calculate the parameters for camera 2
# Using the essential matrix returns 4 possible camera paramters
P1 = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]])
P2s = structure.compute_P_from_essential(E)
ind = -1
for i, P2 in enumerate(P2s):
# Find the correct camera parameters
d1 = structure.reconstruct_one_point(
points1n[:, 0], points2n[:, 0], P1, P2)
# Convert P2 from camera view to world view
P2_homogenous = np.linalg.inv(np.vstack([P2, [0, 0, 0, 1]]))
d2 = np.dot(P2_homogenous[:3, :4], d1)
if d1[2] > 0 and d2[2] > 0:
ind = i
P2 = np.linalg.inv(np.vstack([P2s[ind], [0, 0, 0, 1]]))[:3, :4]
#tripoints3d = structure.reconstruct_points(points1n, points2n, P1, P2)
tripoints3d = structure.linear_triangulation(points1n, points2n, P1, P2)
if not points3d.size:
points3d = tripoints3d
else:
points3d = np.concatenate((points3d, tripoints3d), 1)
fig = plt.figure()
fig.suptitle('3D reconstructed', fontsize=16)
ax = fig.gca(projection='3d')
ax.plot(points3d[0], points3d[1], points3d[2], 'b.')
ax.set_xlabel('x axis')
ax.set_ylabel('y axis')
ax.set_zlabel('z axis')
ax.view_init(elev=135, azim=90)
plt.show()
But I am getting very unexpected result. Please suggest me if above method is correct or how can i merge multiple 3d point clouds to construct a single 3-d structure.
python image-processing computer-vision 3d-reconstruction
3
If you proceed like this, the 3D points reconstructed from each pair will be in different coordinate frames, so simply concatenating them will not give anything meaningful. Let's say you want to build a panorama from a series of pictures taken by rotating the camera progressively. If you just stack the pictures on top of each other, you won't get a panorama. For that you would need to shift the images as they rotate. For the point cloud it is the same, you need to align the separate point clouds consistently with one another.
– AldurDisciple
Nov 15 at 20:05
Thanks @aldurdisciple, yes I learned your point a couple of days ago. That is why I updated my question to How to merge multiple point clouds of different views?
– flamelite
Nov 16 at 4:05
Your code doesn't include the the definition of thedino
function, and neither does the code you link to. Can you please add it in?
– tel
Nov 16 at 8:16
1
@AlexanderReynolds Could you please add a link to a good resource describing an actual bundle adjustment algorithm/implementation?
– tel
Nov 21 at 20:12
1
Not sure the best resource for 3D scenes, but for 2d/panoramas, Richard Szeliski's Image Alignment and Stitching: A Tutorial is a great resource which gives a good high level overview with really great references to dig in more. Hope it's helpful.
– Alexander Reynolds
Nov 21 at 20:15
|
show 4 more comments
up vote
11
down vote
favorite
up vote
11
down vote
favorite
I am trying understand basics of 3d point reconstruction from 2d stereo images. What I have understood so far can be summarized as below:
For 3d point (depth map) reconstruction, we need 2 images of the same object from 2 different view, given such image pair we also need Camera matrix (say P1, P2)
We find the corresponding points in the two images using methods like SIFT or SURF etc.
After getting corresponding key point, we find find the essential matrix (say K) using minimum 8 key points (used in 8-point algorithm)
Given we are at camera 1, calculate the parameters for camera 2 Using the essential matrix returns 4 possible camera parameters
Eventually we use corresponding points and both camera parameters for 3d point estimation using triangulation method.
After going through theory section, as my first experiment I tried to run the code available here,
Which worked as expected. With a few modification in the example.py
code I tried to run this example on all the consecutive image pairs and merge the 3-d point clouds for 3d reconstruction of object (dino
) as below:
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import numpy as np
import cv2
from camera import Camera
import structure
import processor
import features
def dino():
# Dino
img1 = cv2.imread('imgs/dinos/viff.003.ppm')
img2 = cv2.imread('imgs/dinos/viff.001.ppm')
pts1, pts2 = features.find_correspondence_points(img1, img2)
points1 = processor.cart2hom(pts1)
points2 = processor.cart2hom(pts2)
fig, ax = plt.subplots(1, 2)
ax[0].autoscale_view('tight')
ax[0].imshow(cv2.cvtColor(img1, cv2.COLOR_BGR2RGB))
ax[0].plot(points1[0], points1[1], 'r.')
ax[1].autoscale_view('tight')
ax[1].imshow(cv2.cvtColor(img2, cv2.COLOR_BGR2RGB))
ax[1].plot(points2[0], points2[1], 'r.')
fig.show()
height, width, ch = img1.shape
intrinsic = np.array([ # for dino
[2360, 0, width / 2],
[0, 2360, height / 2],
[0, 0, 1]])
return points1, points2, intrinsic
points3d = np.empty((0,0))
files = glob.glob("imgs/dinos/*.ppm")
len = len(files)
for item in range(len-1):
print(files[item], files[(item+1)%len])
#dino() function takes 2 images as input
#and outputs the keypoint point matches(corresponding points in two different views) along the camera intrinsic parameters.
points1, points2, intrinsic = dino(files[item], files[(item+1)%len])
#print(('Length', len(points1))
# Calculate essential matrix with 2d points.
# Result will be up to a scale
# First, normalize points
points1n = np.dot(np.linalg.inv(intrinsic), points1)
points2n = np.dot(np.linalg.inv(intrinsic), points2)
E = structure.compute_essential_normalized(points1n, points2n)
print('Computed essential matrix:', (-E / E[0][1]))
# Given we are at camera 1, calculate the parameters for camera 2
# Using the essential matrix returns 4 possible camera paramters
P1 = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]])
P2s = structure.compute_P_from_essential(E)
ind = -1
for i, P2 in enumerate(P2s):
# Find the correct camera parameters
d1 = structure.reconstruct_one_point(
points1n[:, 0], points2n[:, 0], P1, P2)
# Convert P2 from camera view to world view
P2_homogenous = np.linalg.inv(np.vstack([P2, [0, 0, 0, 1]]))
d2 = np.dot(P2_homogenous[:3, :4], d1)
if d1[2] > 0 and d2[2] > 0:
ind = i
P2 = np.linalg.inv(np.vstack([P2s[ind], [0, 0, 0, 1]]))[:3, :4]
#tripoints3d = structure.reconstruct_points(points1n, points2n, P1, P2)
tripoints3d = structure.linear_triangulation(points1n, points2n, P1, P2)
if not points3d.size:
points3d = tripoints3d
else:
points3d = np.concatenate((points3d, tripoints3d), 1)
fig = plt.figure()
fig.suptitle('3D reconstructed', fontsize=16)
ax = fig.gca(projection='3d')
ax.plot(points3d[0], points3d[1], points3d[2], 'b.')
ax.set_xlabel('x axis')
ax.set_ylabel('y axis')
ax.set_zlabel('z axis')
ax.view_init(elev=135, azim=90)
plt.show()
But I am getting very unexpected result. Please suggest me if above method is correct or how can i merge multiple 3d point clouds to construct a single 3-d structure.
python image-processing computer-vision 3d-reconstruction
I am trying understand basics of 3d point reconstruction from 2d stereo images. What I have understood so far can be summarized as below:
For 3d point (depth map) reconstruction, we need 2 images of the same object from 2 different view, given such image pair we also need Camera matrix (say P1, P2)
We find the corresponding points in the two images using methods like SIFT or SURF etc.
After getting corresponding key point, we find find the essential matrix (say K) using minimum 8 key points (used in 8-point algorithm)
Given we are at camera 1, calculate the parameters for camera 2 Using the essential matrix returns 4 possible camera parameters
Eventually we use corresponding points and both camera parameters for 3d point estimation using triangulation method.
After going through theory section, as my first experiment I tried to run the code available here,
Which worked as expected. With a few modification in the example.py
code I tried to run this example on all the consecutive image pairs and merge the 3-d point clouds for 3d reconstruction of object (dino
) as below:
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import numpy as np
import cv2
from camera import Camera
import structure
import processor
import features
def dino():
# Dino
img1 = cv2.imread('imgs/dinos/viff.003.ppm')
img2 = cv2.imread('imgs/dinos/viff.001.ppm')
pts1, pts2 = features.find_correspondence_points(img1, img2)
points1 = processor.cart2hom(pts1)
points2 = processor.cart2hom(pts2)
fig, ax = plt.subplots(1, 2)
ax[0].autoscale_view('tight')
ax[0].imshow(cv2.cvtColor(img1, cv2.COLOR_BGR2RGB))
ax[0].plot(points1[0], points1[1], 'r.')
ax[1].autoscale_view('tight')
ax[1].imshow(cv2.cvtColor(img2, cv2.COLOR_BGR2RGB))
ax[1].plot(points2[0], points2[1], 'r.')
fig.show()
height, width, ch = img1.shape
intrinsic = np.array([ # for dino
[2360, 0, width / 2],
[0, 2360, height / 2],
[0, 0, 1]])
return points1, points2, intrinsic
points3d = np.empty((0,0))
files = glob.glob("imgs/dinos/*.ppm")
len = len(files)
for item in range(len-1):
print(files[item], files[(item+1)%len])
#dino() function takes 2 images as input
#and outputs the keypoint point matches(corresponding points in two different views) along the camera intrinsic parameters.
points1, points2, intrinsic = dino(files[item], files[(item+1)%len])
#print(('Length', len(points1))
# Calculate essential matrix with 2d points.
# Result will be up to a scale
# First, normalize points
points1n = np.dot(np.linalg.inv(intrinsic), points1)
points2n = np.dot(np.linalg.inv(intrinsic), points2)
E = structure.compute_essential_normalized(points1n, points2n)
print('Computed essential matrix:', (-E / E[0][1]))
# Given we are at camera 1, calculate the parameters for camera 2
# Using the essential matrix returns 4 possible camera paramters
P1 = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]])
P2s = structure.compute_P_from_essential(E)
ind = -1
for i, P2 in enumerate(P2s):
# Find the correct camera parameters
d1 = structure.reconstruct_one_point(
points1n[:, 0], points2n[:, 0], P1, P2)
# Convert P2 from camera view to world view
P2_homogenous = np.linalg.inv(np.vstack([P2, [0, 0, 0, 1]]))
d2 = np.dot(P2_homogenous[:3, :4], d1)
if d1[2] > 0 and d2[2] > 0:
ind = i
P2 = np.linalg.inv(np.vstack([P2s[ind], [0, 0, 0, 1]]))[:3, :4]
#tripoints3d = structure.reconstruct_points(points1n, points2n, P1, P2)
tripoints3d = structure.linear_triangulation(points1n, points2n, P1, P2)
if not points3d.size:
points3d = tripoints3d
else:
points3d = np.concatenate((points3d, tripoints3d), 1)
fig = plt.figure()
fig.suptitle('3D reconstructed', fontsize=16)
ax = fig.gca(projection='3d')
ax.plot(points3d[0], points3d[1], points3d[2], 'b.')
ax.set_xlabel('x axis')
ax.set_ylabel('y axis')
ax.set_zlabel('z axis')
ax.view_init(elev=135, azim=90)
plt.show()
But I am getting very unexpected result. Please suggest me if above method is correct or how can i merge multiple 3d point clouds to construct a single 3-d structure.
python image-processing computer-vision 3d-reconstruction
python image-processing computer-vision 3d-reconstruction
edited Nov 21 at 19:42
Alexander Reynolds
8,90611639
8,90611639
asked Oct 26 at 13:38
flamelite
8011523
8011523
3
If you proceed like this, the 3D points reconstructed from each pair will be in different coordinate frames, so simply concatenating them will not give anything meaningful. Let's say you want to build a panorama from a series of pictures taken by rotating the camera progressively. If you just stack the pictures on top of each other, you won't get a panorama. For that you would need to shift the images as they rotate. For the point cloud it is the same, you need to align the separate point clouds consistently with one another.
– AldurDisciple
Nov 15 at 20:05
Thanks @aldurdisciple, yes I learned your point a couple of days ago. That is why I updated my question to How to merge multiple point clouds of different views?
– flamelite
Nov 16 at 4:05
Your code doesn't include the the definition of thedino
function, and neither does the code you link to. Can you please add it in?
– tel
Nov 16 at 8:16
1
@AlexanderReynolds Could you please add a link to a good resource describing an actual bundle adjustment algorithm/implementation?
– tel
Nov 21 at 20:12
1
Not sure the best resource for 3D scenes, but for 2d/panoramas, Richard Szeliski's Image Alignment and Stitching: A Tutorial is a great resource which gives a good high level overview with really great references to dig in more. Hope it's helpful.
– Alexander Reynolds
Nov 21 at 20:15
|
show 4 more comments
3
If you proceed like this, the 3D points reconstructed from each pair will be in different coordinate frames, so simply concatenating them will not give anything meaningful. Let's say you want to build a panorama from a series of pictures taken by rotating the camera progressively. If you just stack the pictures on top of each other, you won't get a panorama. For that you would need to shift the images as they rotate. For the point cloud it is the same, you need to align the separate point clouds consistently with one another.
– AldurDisciple
Nov 15 at 20:05
Thanks @aldurdisciple, yes I learned your point a couple of days ago. That is why I updated my question to How to merge multiple point clouds of different views?
– flamelite
Nov 16 at 4:05
Your code doesn't include the the definition of thedino
function, and neither does the code you link to. Can you please add it in?
– tel
Nov 16 at 8:16
1
@AlexanderReynolds Could you please add a link to a good resource describing an actual bundle adjustment algorithm/implementation?
– tel
Nov 21 at 20:12
1
Not sure the best resource for 3D scenes, but for 2d/panoramas, Richard Szeliski's Image Alignment and Stitching: A Tutorial is a great resource which gives a good high level overview with really great references to dig in more. Hope it's helpful.
– Alexander Reynolds
Nov 21 at 20:15
3
3
If you proceed like this, the 3D points reconstructed from each pair will be in different coordinate frames, so simply concatenating them will not give anything meaningful. Let's say you want to build a panorama from a series of pictures taken by rotating the camera progressively. If you just stack the pictures on top of each other, you won't get a panorama. For that you would need to shift the images as they rotate. For the point cloud it is the same, you need to align the separate point clouds consistently with one another.
– AldurDisciple
Nov 15 at 20:05
If you proceed like this, the 3D points reconstructed from each pair will be in different coordinate frames, so simply concatenating them will not give anything meaningful. Let's say you want to build a panorama from a series of pictures taken by rotating the camera progressively. If you just stack the pictures on top of each other, you won't get a panorama. For that you would need to shift the images as they rotate. For the point cloud it is the same, you need to align the separate point clouds consistently with one another.
– AldurDisciple
Nov 15 at 20:05
Thanks @aldurdisciple, yes I learned your point a couple of days ago. That is why I updated my question to How to merge multiple point clouds of different views?
– flamelite
Nov 16 at 4:05
Thanks @aldurdisciple, yes I learned your point a couple of days ago. That is why I updated my question to How to merge multiple point clouds of different views?
– flamelite
Nov 16 at 4:05
Your code doesn't include the the definition of the
dino
function, and neither does the code you link to. Can you please add it in?– tel
Nov 16 at 8:16
Your code doesn't include the the definition of the
dino
function, and neither does the code you link to. Can you please add it in?– tel
Nov 16 at 8:16
1
1
@AlexanderReynolds Could you please add a link to a good resource describing an actual bundle adjustment algorithm/implementation?
– tel
Nov 21 at 20:12
@AlexanderReynolds Could you please add a link to a good resource describing an actual bundle adjustment algorithm/implementation?
– tel
Nov 21 at 20:12
1
1
Not sure the best resource for 3D scenes, but for 2d/panoramas, Richard Szeliski's Image Alignment and Stitching: A Tutorial is a great resource which gives a good high level overview with really great references to dig in more. Hope it's helpful.
– Alexander Reynolds
Nov 21 at 20:15
Not sure the best resource for 3D scenes, but for 2d/panoramas, Richard Szeliski's Image Alignment and Stitching: A Tutorial is a great resource which gives a good high level overview with really great references to dig in more. Hope it's helpful.
– Alexander Reynolds
Nov 21 at 20:15
|
show 4 more comments
3 Answers
3
active
oldest
votes
up vote
0
down vote
Another possible path of understanding for you would be to look at an open source implementation of structure from motion or SLAM. Note that these systems can become quite complicated. However, OpenSfM is written in Python and I think it is easy to navigate and understand. I often use it as a reference for my own work.
Just to give you a little more information to get started (if you choose to go down this path). Structure from motion is an algorithm for taking a collection of 2D images and creating a 3D model (point cloud) from them where it also solves for the position of each camera relative to that point cloud (i.e. all the returned camera poses are in the world frame and so is the point cloud).
The steps of OpenSfM at a high level:
Read image exif for any prior information you can use (e.g. focal
length)Extract feature points (e.g. SIFT)
Match feature points
Turn these feature point matches into tracks (e.g. if you saw a feature point in image 1,2, and 3, then you can connect that into
a track instead of match(1,2), match(2,3), etc...)Incremental Reconstruction (note that there is also a global approach). This process will use the tracks to incrementally add
images to the reconstruction, triangulate new points, and refine the
poses/point positions using a process called Bundle Adjustment.
Hopefully that helps.
add a comment |
up vote
-1
down vote
The general idea is as follows.
In each iteration of your code, you compute the relative pose of the right camera with respect to the left. Then you triangulate the 2D points and concatenate the resulting 3D points in a big array. But the concatenated points are not in the same coordinate frame.
What you need to do instead is to accumulate the estimated relative poses in order to maintain an absolute pose estimate. Then you can triangulate the 2D points as before, but before concatenating the resulting points, you need to map them to the coordinate frame of the first camera.
Here is how to do this.
First, before the loop, initialize an accumulation matrix absolute_P1
:
points3d = np.empty((0,0))
files = glob.glob("imgs/dinos/*.ppm")
len = len(files)
absolute_P1 = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]])
for item in range(len-1):
# ...
Then, after the feature triangulation, map the 3D points to the coordinate frame of the first camera and update the accumulated pose:
# ...
P2 = np.linalg.inv(np.vstack([P2s[ind], [0, 0, 0, 1]]))
tripoints3d = structure.linear_triangulation(points1n, points2n, P1, P2[:3, :4])
abs_tripoints3d = np.matmul(absolute_P1, np.vstack([tripoints3d, np.ones(np.shape(tripoints3d)[1])]))
absolute_P1 = np.matmul(absolute_P1, np.linalg.inv(P2)) # P2 needs to be 4x4 here!
if not points3d.size:
points3d = abs_tripoints3d
else:
points3d = np.concatenate((points3d, abs_tripoints3d), 1)
# ...
This answer is not right. Here's a figure with what it produces for the first two sets of dino images. It just gets worse from there.
– tel
Nov 21 at 3:45
Thanks for the feedback @tel, I must have misunderstood the camera pose convention OP is using. I updated the second code snippet, it should work better now.
– AldurDisciple
Nov 21 at 6:22
add a comment |
up vote
-1
down vote
TL;DR
You may not be able to get the full 3D reconstruction you want by just combining all of the 2 image reconstructions. I tried to do this in many different ways, and none of them worked. Basically, the failures all seem to boil down to noise in the 2 image pose estimation algorithm, which frequently produces unreasonable results. Any attempt to track the absolute pose by simply combining all of the 2 image poses just propagates the noise throughout the reconstruction.
The code in the repo that the OP is working with is based on a textbook, Multiple View Geometry in Computer Vision. Chapter 19 cites a paper that discusses a successful 3D reconstruction of the dinosaur sequence, and their approach is somewhat more involved. In addition to 2 image reconstructions, they also use 3 image reconstructions, and (maybe most importantly) a fitting step at the end that helps to ensure that no single spurious result ruins the reconstruction.
code
...in progress
add a comment |
3 Answers
3
active
oldest
votes
3 Answers
3
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
0
down vote
Another possible path of understanding for you would be to look at an open source implementation of structure from motion or SLAM. Note that these systems can become quite complicated. However, OpenSfM is written in Python and I think it is easy to navigate and understand. I often use it as a reference for my own work.
Just to give you a little more information to get started (if you choose to go down this path). Structure from motion is an algorithm for taking a collection of 2D images and creating a 3D model (point cloud) from them where it also solves for the position of each camera relative to that point cloud (i.e. all the returned camera poses are in the world frame and so is the point cloud).
The steps of OpenSfM at a high level:
Read image exif for any prior information you can use (e.g. focal
length)Extract feature points (e.g. SIFT)
Match feature points
Turn these feature point matches into tracks (e.g. if you saw a feature point in image 1,2, and 3, then you can connect that into
a track instead of match(1,2), match(2,3), etc...)Incremental Reconstruction (note that there is also a global approach). This process will use the tracks to incrementally add
images to the reconstruction, triangulate new points, and refine the
poses/point positions using a process called Bundle Adjustment.
Hopefully that helps.
add a comment |
up vote
0
down vote
Another possible path of understanding for you would be to look at an open source implementation of structure from motion or SLAM. Note that these systems can become quite complicated. However, OpenSfM is written in Python and I think it is easy to navigate and understand. I often use it as a reference for my own work.
Just to give you a little more information to get started (if you choose to go down this path). Structure from motion is an algorithm for taking a collection of 2D images and creating a 3D model (point cloud) from them where it also solves for the position of each camera relative to that point cloud (i.e. all the returned camera poses are in the world frame and so is the point cloud).
The steps of OpenSfM at a high level:
Read image exif for any prior information you can use (e.g. focal
length)Extract feature points (e.g. SIFT)
Match feature points
Turn these feature point matches into tracks (e.g. if you saw a feature point in image 1,2, and 3, then you can connect that into
a track instead of match(1,2), match(2,3), etc...)Incremental Reconstruction (note that there is also a global approach). This process will use the tracks to incrementally add
images to the reconstruction, triangulate new points, and refine the
poses/point positions using a process called Bundle Adjustment.
Hopefully that helps.
add a comment |
up vote
0
down vote
up vote
0
down vote
Another possible path of understanding for you would be to look at an open source implementation of structure from motion or SLAM. Note that these systems can become quite complicated. However, OpenSfM is written in Python and I think it is easy to navigate and understand. I often use it as a reference for my own work.
Just to give you a little more information to get started (if you choose to go down this path). Structure from motion is an algorithm for taking a collection of 2D images and creating a 3D model (point cloud) from them where it also solves for the position of each camera relative to that point cloud (i.e. all the returned camera poses are in the world frame and so is the point cloud).
The steps of OpenSfM at a high level:
Read image exif for any prior information you can use (e.g. focal
length)Extract feature points (e.g. SIFT)
Match feature points
Turn these feature point matches into tracks (e.g. if you saw a feature point in image 1,2, and 3, then you can connect that into
a track instead of match(1,2), match(2,3), etc...)Incremental Reconstruction (note that there is also a global approach). This process will use the tracks to incrementally add
images to the reconstruction, triangulate new points, and refine the
poses/point positions using a process called Bundle Adjustment.
Hopefully that helps.
Another possible path of understanding for you would be to look at an open source implementation of structure from motion or SLAM. Note that these systems can become quite complicated. However, OpenSfM is written in Python and I think it is easy to navigate and understand. I often use it as a reference for my own work.
Just to give you a little more information to get started (if you choose to go down this path). Structure from motion is an algorithm for taking a collection of 2D images and creating a 3D model (point cloud) from them where it also solves for the position of each camera relative to that point cloud (i.e. all the returned camera poses are in the world frame and so is the point cloud).
The steps of OpenSfM at a high level:
Read image exif for any prior information you can use (e.g. focal
length)Extract feature points (e.g. SIFT)
Match feature points
Turn these feature point matches into tracks (e.g. if you saw a feature point in image 1,2, and 3, then you can connect that into
a track instead of match(1,2), match(2,3), etc...)Incremental Reconstruction (note that there is also a global approach). This process will use the tracks to incrementally add
images to the reconstruction, triangulate new points, and refine the
poses/point positions using a process called Bundle Adjustment.
Hopefully that helps.
answered Nov 21 at 19:35
Jomnipotent17
127319
127319
add a comment |
add a comment |
up vote
-1
down vote
The general idea is as follows.
In each iteration of your code, you compute the relative pose of the right camera with respect to the left. Then you triangulate the 2D points and concatenate the resulting 3D points in a big array. But the concatenated points are not in the same coordinate frame.
What you need to do instead is to accumulate the estimated relative poses in order to maintain an absolute pose estimate. Then you can triangulate the 2D points as before, but before concatenating the resulting points, you need to map them to the coordinate frame of the first camera.
Here is how to do this.
First, before the loop, initialize an accumulation matrix absolute_P1
:
points3d = np.empty((0,0))
files = glob.glob("imgs/dinos/*.ppm")
len = len(files)
absolute_P1 = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]])
for item in range(len-1):
# ...
Then, after the feature triangulation, map the 3D points to the coordinate frame of the first camera and update the accumulated pose:
# ...
P2 = np.linalg.inv(np.vstack([P2s[ind], [0, 0, 0, 1]]))
tripoints3d = structure.linear_triangulation(points1n, points2n, P1, P2[:3, :4])
abs_tripoints3d = np.matmul(absolute_P1, np.vstack([tripoints3d, np.ones(np.shape(tripoints3d)[1])]))
absolute_P1 = np.matmul(absolute_P1, np.linalg.inv(P2)) # P2 needs to be 4x4 here!
if not points3d.size:
points3d = abs_tripoints3d
else:
points3d = np.concatenate((points3d, abs_tripoints3d), 1)
# ...
This answer is not right. Here's a figure with what it produces for the first two sets of dino images. It just gets worse from there.
– tel
Nov 21 at 3:45
Thanks for the feedback @tel, I must have misunderstood the camera pose convention OP is using. I updated the second code snippet, it should work better now.
– AldurDisciple
Nov 21 at 6:22
add a comment |
up vote
-1
down vote
The general idea is as follows.
In each iteration of your code, you compute the relative pose of the right camera with respect to the left. Then you triangulate the 2D points and concatenate the resulting 3D points in a big array. But the concatenated points are not in the same coordinate frame.
What you need to do instead is to accumulate the estimated relative poses in order to maintain an absolute pose estimate. Then you can triangulate the 2D points as before, but before concatenating the resulting points, you need to map them to the coordinate frame of the first camera.
Here is how to do this.
First, before the loop, initialize an accumulation matrix absolute_P1
:
points3d = np.empty((0,0))
files = glob.glob("imgs/dinos/*.ppm")
len = len(files)
absolute_P1 = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]])
for item in range(len-1):
# ...
Then, after the feature triangulation, map the 3D points to the coordinate frame of the first camera and update the accumulated pose:
# ...
P2 = np.linalg.inv(np.vstack([P2s[ind], [0, 0, 0, 1]]))
tripoints3d = structure.linear_triangulation(points1n, points2n, P1, P2[:3, :4])
abs_tripoints3d = np.matmul(absolute_P1, np.vstack([tripoints3d, np.ones(np.shape(tripoints3d)[1])]))
absolute_P1 = np.matmul(absolute_P1, np.linalg.inv(P2)) # P2 needs to be 4x4 here!
if not points3d.size:
points3d = abs_tripoints3d
else:
points3d = np.concatenate((points3d, abs_tripoints3d), 1)
# ...
This answer is not right. Here's a figure with what it produces for the first two sets of dino images. It just gets worse from there.
– tel
Nov 21 at 3:45
Thanks for the feedback @tel, I must have misunderstood the camera pose convention OP is using. I updated the second code snippet, it should work better now.
– AldurDisciple
Nov 21 at 6:22
add a comment |
up vote
-1
down vote
up vote
-1
down vote
The general idea is as follows.
In each iteration of your code, you compute the relative pose of the right camera with respect to the left. Then you triangulate the 2D points and concatenate the resulting 3D points in a big array. But the concatenated points are not in the same coordinate frame.
What you need to do instead is to accumulate the estimated relative poses in order to maintain an absolute pose estimate. Then you can triangulate the 2D points as before, but before concatenating the resulting points, you need to map them to the coordinate frame of the first camera.
Here is how to do this.
First, before the loop, initialize an accumulation matrix absolute_P1
:
points3d = np.empty((0,0))
files = glob.glob("imgs/dinos/*.ppm")
len = len(files)
absolute_P1 = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]])
for item in range(len-1):
# ...
Then, after the feature triangulation, map the 3D points to the coordinate frame of the first camera and update the accumulated pose:
# ...
P2 = np.linalg.inv(np.vstack([P2s[ind], [0, 0, 0, 1]]))
tripoints3d = structure.linear_triangulation(points1n, points2n, P1, P2[:3, :4])
abs_tripoints3d = np.matmul(absolute_P1, np.vstack([tripoints3d, np.ones(np.shape(tripoints3d)[1])]))
absolute_P1 = np.matmul(absolute_P1, np.linalg.inv(P2)) # P2 needs to be 4x4 here!
if not points3d.size:
points3d = abs_tripoints3d
else:
points3d = np.concatenate((points3d, abs_tripoints3d), 1)
# ...
The general idea is as follows.
In each iteration of your code, you compute the relative pose of the right camera with respect to the left. Then you triangulate the 2D points and concatenate the resulting 3D points in a big array. But the concatenated points are not in the same coordinate frame.
What you need to do instead is to accumulate the estimated relative poses in order to maintain an absolute pose estimate. Then you can triangulate the 2D points as before, but before concatenating the resulting points, you need to map them to the coordinate frame of the first camera.
Here is how to do this.
First, before the loop, initialize an accumulation matrix absolute_P1
:
points3d = np.empty((0,0))
files = glob.glob("imgs/dinos/*.ppm")
len = len(files)
absolute_P1 = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]])
for item in range(len-1):
# ...
Then, after the feature triangulation, map the 3D points to the coordinate frame of the first camera and update the accumulated pose:
# ...
P2 = np.linalg.inv(np.vstack([P2s[ind], [0, 0, 0, 1]]))
tripoints3d = structure.linear_triangulation(points1n, points2n, P1, P2[:3, :4])
abs_tripoints3d = np.matmul(absolute_P1, np.vstack([tripoints3d, np.ones(np.shape(tripoints3d)[1])]))
absolute_P1 = np.matmul(absolute_P1, np.linalg.inv(P2)) # P2 needs to be 4x4 here!
if not points3d.size:
points3d = abs_tripoints3d
else:
points3d = np.concatenate((points3d, abs_tripoints3d), 1)
# ...
edited Nov 21 at 6:19
answered Nov 20 at 20:49
AldurDisciple
6,35521537
6,35521537
This answer is not right. Here's a figure with what it produces for the first two sets of dino images. It just gets worse from there.
– tel
Nov 21 at 3:45
Thanks for the feedback @tel, I must have misunderstood the camera pose convention OP is using. I updated the second code snippet, it should work better now.
– AldurDisciple
Nov 21 at 6:22
add a comment |
This answer is not right. Here's a figure with what it produces for the first two sets of dino images. It just gets worse from there.
– tel
Nov 21 at 3:45
Thanks for the feedback @tel, I must have misunderstood the camera pose convention OP is using. I updated the second code snippet, it should work better now.
– AldurDisciple
Nov 21 at 6:22
This answer is not right. Here's a figure with what it produces for the first two sets of dino images. It just gets worse from there.
– tel
Nov 21 at 3:45
This answer is not right. Here's a figure with what it produces for the first two sets of dino images. It just gets worse from there.
– tel
Nov 21 at 3:45
Thanks for the feedback @tel, I must have misunderstood the camera pose convention OP is using. I updated the second code snippet, it should work better now.
– AldurDisciple
Nov 21 at 6:22
Thanks for the feedback @tel, I must have misunderstood the camera pose convention OP is using. I updated the second code snippet, it should work better now.
– AldurDisciple
Nov 21 at 6:22
add a comment |
up vote
-1
down vote
TL;DR
You may not be able to get the full 3D reconstruction you want by just combining all of the 2 image reconstructions. I tried to do this in many different ways, and none of them worked. Basically, the failures all seem to boil down to noise in the 2 image pose estimation algorithm, which frequently produces unreasonable results. Any attempt to track the absolute pose by simply combining all of the 2 image poses just propagates the noise throughout the reconstruction.
The code in the repo that the OP is working with is based on a textbook, Multiple View Geometry in Computer Vision. Chapter 19 cites a paper that discusses a successful 3D reconstruction of the dinosaur sequence, and their approach is somewhat more involved. In addition to 2 image reconstructions, they also use 3 image reconstructions, and (maybe most importantly) a fitting step at the end that helps to ensure that no single spurious result ruins the reconstruction.
code
...in progress
add a comment |
up vote
-1
down vote
TL;DR
You may not be able to get the full 3D reconstruction you want by just combining all of the 2 image reconstructions. I tried to do this in many different ways, and none of them worked. Basically, the failures all seem to boil down to noise in the 2 image pose estimation algorithm, which frequently produces unreasonable results. Any attempt to track the absolute pose by simply combining all of the 2 image poses just propagates the noise throughout the reconstruction.
The code in the repo that the OP is working with is based on a textbook, Multiple View Geometry in Computer Vision. Chapter 19 cites a paper that discusses a successful 3D reconstruction of the dinosaur sequence, and their approach is somewhat more involved. In addition to 2 image reconstructions, they also use 3 image reconstructions, and (maybe most importantly) a fitting step at the end that helps to ensure that no single spurious result ruins the reconstruction.
code
...in progress
add a comment |
up vote
-1
down vote
up vote
-1
down vote
TL;DR
You may not be able to get the full 3D reconstruction you want by just combining all of the 2 image reconstructions. I tried to do this in many different ways, and none of them worked. Basically, the failures all seem to boil down to noise in the 2 image pose estimation algorithm, which frequently produces unreasonable results. Any attempt to track the absolute pose by simply combining all of the 2 image poses just propagates the noise throughout the reconstruction.
The code in the repo that the OP is working with is based on a textbook, Multiple View Geometry in Computer Vision. Chapter 19 cites a paper that discusses a successful 3D reconstruction of the dinosaur sequence, and their approach is somewhat more involved. In addition to 2 image reconstructions, they also use 3 image reconstructions, and (maybe most importantly) a fitting step at the end that helps to ensure that no single spurious result ruins the reconstruction.
code
...in progress
TL;DR
You may not be able to get the full 3D reconstruction you want by just combining all of the 2 image reconstructions. I tried to do this in many different ways, and none of them worked. Basically, the failures all seem to boil down to noise in the 2 image pose estimation algorithm, which frequently produces unreasonable results. Any attempt to track the absolute pose by simply combining all of the 2 image poses just propagates the noise throughout the reconstruction.
The code in the repo that the OP is working with is based on a textbook, Multiple View Geometry in Computer Vision. Chapter 19 cites a paper that discusses a successful 3D reconstruction of the dinosaur sequence, and their approach is somewhat more involved. In addition to 2 image reconstructions, they also use 3 image reconstructions, and (maybe most importantly) a fitting step at the end that helps to ensure that no single spurious result ruins the reconstruction.
code
...in progress
answered Nov 21 at 17:14
tel
4,33911429
4,33911429
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Some of your past answers have not been well-received, and you're in danger of being blocked from answering.
Please pay close attention to the following guidance:
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53010027%2f3d-point-reconstruction-from-2d-images%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
3
If you proceed like this, the 3D points reconstructed from each pair will be in different coordinate frames, so simply concatenating them will not give anything meaningful. Let's say you want to build a panorama from a series of pictures taken by rotating the camera progressively. If you just stack the pictures on top of each other, you won't get a panorama. For that you would need to shift the images as they rotate. For the point cloud it is the same, you need to align the separate point clouds consistently with one another.
– AldurDisciple
Nov 15 at 20:05
Thanks @aldurdisciple, yes I learned your point a couple of days ago. That is why I updated my question to How to merge multiple point clouds of different views?
– flamelite
Nov 16 at 4:05
Your code doesn't include the the definition of the
dino
function, and neither does the code you link to. Can you please add it in?– tel
Nov 16 at 8:16
1
@AlexanderReynolds Could you please add a link to a good resource describing an actual bundle adjustment algorithm/implementation?
– tel
Nov 21 at 20:12
1
Not sure the best resource for 3D scenes, but for 2d/panoramas, Richard Szeliski's Image Alignment and Stitching: A Tutorial is a great resource which gives a good high level overview with really great references to dig in more. Hope it's helpful.
– Alexander Reynolds
Nov 21 at 20:15