3d point reconstruction from 2d Images











up vote
11
down vote

favorite
1












I am trying understand basics of 3d point reconstruction from 2d stereo images. What I have understood so far can be summarized as below:



For 3d point (depth map) reconstruction, we need 2 images of the same object from 2 different view, given such image pair we also need Camera matrix (say P1, P2)




  • We find the corresponding points in the two images using methods like SIFT or SURF etc.


  • After getting corresponding key point, we find find the essential matrix (say K) using minimum 8 key points (used in 8-point algorithm)


  • Given we are at camera 1, calculate the parameters for camera 2 Using the essential matrix returns 4 possible camera parameters


  • Eventually we use corresponding points and both camera parameters for 3d point estimation using triangulation method.



After going through theory section, as my first experiment I tried to run the code available here,
Which worked as expected. With a few modification in the example.py code I tried to run this example on all the consecutive image pairs and merge the 3-d point clouds for 3d reconstruction of object (dino) as below:



import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import numpy as np
import cv2

from camera import Camera
import structure
import processor
import features

def dino():
# Dino
img1 = cv2.imread('imgs/dinos/viff.003.ppm')
img2 = cv2.imread('imgs/dinos/viff.001.ppm')
pts1, pts2 = features.find_correspondence_points(img1, img2)
points1 = processor.cart2hom(pts1)
points2 = processor.cart2hom(pts2)

fig, ax = plt.subplots(1, 2)
ax[0].autoscale_view('tight')
ax[0].imshow(cv2.cvtColor(img1, cv2.COLOR_BGR2RGB))
ax[0].plot(points1[0], points1[1], 'r.')
ax[1].autoscale_view('tight')
ax[1].imshow(cv2.cvtColor(img2, cv2.COLOR_BGR2RGB))
ax[1].plot(points2[0], points2[1], 'r.')
fig.show()

height, width, ch = img1.shape
intrinsic = np.array([ # for dino
[2360, 0, width / 2],
[0, 2360, height / 2],
[0, 0, 1]])

return points1, points2, intrinsic


points3d = np.empty((0,0))
files = glob.glob("imgs/dinos/*.ppm")
len = len(files)

for item in range(len-1):
print(files[item], files[(item+1)%len])
#dino() function takes 2 images as input
#and outputs the keypoint point matches(corresponding points in two different views) along the camera intrinsic parameters.
points1, points2, intrinsic = dino(files[item], files[(item+1)%len])
#print(('Length', len(points1))
# Calculate essential matrix with 2d points.
# Result will be up to a scale
# First, normalize points
points1n = np.dot(np.linalg.inv(intrinsic), points1)
points2n = np.dot(np.linalg.inv(intrinsic), points2)
E = structure.compute_essential_normalized(points1n, points2n)
print('Computed essential matrix:', (-E / E[0][1]))

# Given we are at camera 1, calculate the parameters for camera 2
# Using the essential matrix returns 4 possible camera paramters
P1 = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]])
P2s = structure.compute_P_from_essential(E)

ind = -1
for i, P2 in enumerate(P2s):
# Find the correct camera parameters
d1 = structure.reconstruct_one_point(
points1n[:, 0], points2n[:, 0], P1, P2)

# Convert P2 from camera view to world view
P2_homogenous = np.linalg.inv(np.vstack([P2, [0, 0, 0, 1]]))
d2 = np.dot(P2_homogenous[:3, :4], d1)

if d1[2] > 0 and d2[2] > 0:
ind = i

P2 = np.linalg.inv(np.vstack([P2s[ind], [0, 0, 0, 1]]))[:3, :4]
#tripoints3d = structure.reconstruct_points(points1n, points2n, P1, P2)
tripoints3d = structure.linear_triangulation(points1n, points2n, P1, P2)

if not points3d.size:
points3d = tripoints3d
else:
points3d = np.concatenate((points3d, tripoints3d), 1)


fig = plt.figure()
fig.suptitle('3D reconstructed', fontsize=16)
ax = fig.gca(projection='3d')
ax.plot(points3d[0], points3d[1], points3d[2], 'b.')
ax.set_xlabel('x axis')
ax.set_ylabel('y axis')
ax.set_zlabel('z axis')
ax.view_init(elev=135, azim=90)
plt.show()


But I am getting very unexpected result. Please suggest me if above method is correct or how can i merge multiple 3d point clouds to construct a single 3-d structure.










share|improve this question




















  • 3




    If you proceed like this, the 3D points reconstructed from each pair will be in different coordinate frames, so simply concatenating them will not give anything meaningful. Let's say you want to build a panorama from a series of pictures taken by rotating the camera progressively. If you just stack the pictures on top of each other, you won't get a panorama. For that you would need to shift the images as they rotate. For the point cloud it is the same, you need to align the separate point clouds consistently with one another.
    – AldurDisciple
    Nov 15 at 20:05












  • Thanks @aldurdisciple, yes I learned your point a couple of days ago. That is why I updated my question to How to merge multiple point clouds of different views?
    – flamelite
    Nov 16 at 4:05












  • Your code doesn't include the the definition of the dino function, and neither does the code you link to. Can you please add it in?
    – tel
    Nov 16 at 8:16






  • 1




    @AlexanderReynolds Could you please add a link to a good resource describing an actual bundle adjustment algorithm/implementation?
    – tel
    Nov 21 at 20:12






  • 1




    Not sure the best resource for 3D scenes, but for 2d/panoramas, Richard Szeliski's Image Alignment and Stitching: A Tutorial is a great resource which gives a good high level overview with really great references to dig in more. Hope it's helpful.
    – Alexander Reynolds
    Nov 21 at 20:15

















up vote
11
down vote

favorite
1












I am trying understand basics of 3d point reconstruction from 2d stereo images. What I have understood so far can be summarized as below:



For 3d point (depth map) reconstruction, we need 2 images of the same object from 2 different view, given such image pair we also need Camera matrix (say P1, P2)




  • We find the corresponding points in the two images using methods like SIFT or SURF etc.


  • After getting corresponding key point, we find find the essential matrix (say K) using minimum 8 key points (used in 8-point algorithm)


  • Given we are at camera 1, calculate the parameters for camera 2 Using the essential matrix returns 4 possible camera parameters


  • Eventually we use corresponding points and both camera parameters for 3d point estimation using triangulation method.



After going through theory section, as my first experiment I tried to run the code available here,
Which worked as expected. With a few modification in the example.py code I tried to run this example on all the consecutive image pairs and merge the 3-d point clouds for 3d reconstruction of object (dino) as below:



import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import numpy as np
import cv2

from camera import Camera
import structure
import processor
import features

def dino():
# Dino
img1 = cv2.imread('imgs/dinos/viff.003.ppm')
img2 = cv2.imread('imgs/dinos/viff.001.ppm')
pts1, pts2 = features.find_correspondence_points(img1, img2)
points1 = processor.cart2hom(pts1)
points2 = processor.cart2hom(pts2)

fig, ax = plt.subplots(1, 2)
ax[0].autoscale_view('tight')
ax[0].imshow(cv2.cvtColor(img1, cv2.COLOR_BGR2RGB))
ax[0].plot(points1[0], points1[1], 'r.')
ax[1].autoscale_view('tight')
ax[1].imshow(cv2.cvtColor(img2, cv2.COLOR_BGR2RGB))
ax[1].plot(points2[0], points2[1], 'r.')
fig.show()

height, width, ch = img1.shape
intrinsic = np.array([ # for dino
[2360, 0, width / 2],
[0, 2360, height / 2],
[0, 0, 1]])

return points1, points2, intrinsic


points3d = np.empty((0,0))
files = glob.glob("imgs/dinos/*.ppm")
len = len(files)

for item in range(len-1):
print(files[item], files[(item+1)%len])
#dino() function takes 2 images as input
#and outputs the keypoint point matches(corresponding points in two different views) along the camera intrinsic parameters.
points1, points2, intrinsic = dino(files[item], files[(item+1)%len])
#print(('Length', len(points1))
# Calculate essential matrix with 2d points.
# Result will be up to a scale
# First, normalize points
points1n = np.dot(np.linalg.inv(intrinsic), points1)
points2n = np.dot(np.linalg.inv(intrinsic), points2)
E = structure.compute_essential_normalized(points1n, points2n)
print('Computed essential matrix:', (-E / E[0][1]))

# Given we are at camera 1, calculate the parameters for camera 2
# Using the essential matrix returns 4 possible camera paramters
P1 = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]])
P2s = structure.compute_P_from_essential(E)

ind = -1
for i, P2 in enumerate(P2s):
# Find the correct camera parameters
d1 = structure.reconstruct_one_point(
points1n[:, 0], points2n[:, 0], P1, P2)

# Convert P2 from camera view to world view
P2_homogenous = np.linalg.inv(np.vstack([P2, [0, 0, 0, 1]]))
d2 = np.dot(P2_homogenous[:3, :4], d1)

if d1[2] > 0 and d2[2] > 0:
ind = i

P2 = np.linalg.inv(np.vstack([P2s[ind], [0, 0, 0, 1]]))[:3, :4]
#tripoints3d = structure.reconstruct_points(points1n, points2n, P1, P2)
tripoints3d = structure.linear_triangulation(points1n, points2n, P1, P2)

if not points3d.size:
points3d = tripoints3d
else:
points3d = np.concatenate((points3d, tripoints3d), 1)


fig = plt.figure()
fig.suptitle('3D reconstructed', fontsize=16)
ax = fig.gca(projection='3d')
ax.plot(points3d[0], points3d[1], points3d[2], 'b.')
ax.set_xlabel('x axis')
ax.set_ylabel('y axis')
ax.set_zlabel('z axis')
ax.view_init(elev=135, azim=90)
plt.show()


But I am getting very unexpected result. Please suggest me if above method is correct or how can i merge multiple 3d point clouds to construct a single 3-d structure.










share|improve this question




















  • 3




    If you proceed like this, the 3D points reconstructed from each pair will be in different coordinate frames, so simply concatenating them will not give anything meaningful. Let's say you want to build a panorama from a series of pictures taken by rotating the camera progressively. If you just stack the pictures on top of each other, you won't get a panorama. For that you would need to shift the images as they rotate. For the point cloud it is the same, you need to align the separate point clouds consistently with one another.
    – AldurDisciple
    Nov 15 at 20:05












  • Thanks @aldurdisciple, yes I learned your point a couple of days ago. That is why I updated my question to How to merge multiple point clouds of different views?
    – flamelite
    Nov 16 at 4:05












  • Your code doesn't include the the definition of the dino function, and neither does the code you link to. Can you please add it in?
    – tel
    Nov 16 at 8:16






  • 1




    @AlexanderReynolds Could you please add a link to a good resource describing an actual bundle adjustment algorithm/implementation?
    – tel
    Nov 21 at 20:12






  • 1




    Not sure the best resource for 3D scenes, but for 2d/panoramas, Richard Szeliski's Image Alignment and Stitching: A Tutorial is a great resource which gives a good high level overview with really great references to dig in more. Hope it's helpful.
    – Alexander Reynolds
    Nov 21 at 20:15















up vote
11
down vote

favorite
1









up vote
11
down vote

favorite
1






1





I am trying understand basics of 3d point reconstruction from 2d stereo images. What I have understood so far can be summarized as below:



For 3d point (depth map) reconstruction, we need 2 images of the same object from 2 different view, given such image pair we also need Camera matrix (say P1, P2)




  • We find the corresponding points in the two images using methods like SIFT or SURF etc.


  • After getting corresponding key point, we find find the essential matrix (say K) using minimum 8 key points (used in 8-point algorithm)


  • Given we are at camera 1, calculate the parameters for camera 2 Using the essential matrix returns 4 possible camera parameters


  • Eventually we use corresponding points and both camera parameters for 3d point estimation using triangulation method.



After going through theory section, as my first experiment I tried to run the code available here,
Which worked as expected. With a few modification in the example.py code I tried to run this example on all the consecutive image pairs and merge the 3-d point clouds for 3d reconstruction of object (dino) as below:



import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import numpy as np
import cv2

from camera import Camera
import structure
import processor
import features

def dino():
# Dino
img1 = cv2.imread('imgs/dinos/viff.003.ppm')
img2 = cv2.imread('imgs/dinos/viff.001.ppm')
pts1, pts2 = features.find_correspondence_points(img1, img2)
points1 = processor.cart2hom(pts1)
points2 = processor.cart2hom(pts2)

fig, ax = plt.subplots(1, 2)
ax[0].autoscale_view('tight')
ax[0].imshow(cv2.cvtColor(img1, cv2.COLOR_BGR2RGB))
ax[0].plot(points1[0], points1[1], 'r.')
ax[1].autoscale_view('tight')
ax[1].imshow(cv2.cvtColor(img2, cv2.COLOR_BGR2RGB))
ax[1].plot(points2[0], points2[1], 'r.')
fig.show()

height, width, ch = img1.shape
intrinsic = np.array([ # for dino
[2360, 0, width / 2],
[0, 2360, height / 2],
[0, 0, 1]])

return points1, points2, intrinsic


points3d = np.empty((0,0))
files = glob.glob("imgs/dinos/*.ppm")
len = len(files)

for item in range(len-1):
print(files[item], files[(item+1)%len])
#dino() function takes 2 images as input
#and outputs the keypoint point matches(corresponding points in two different views) along the camera intrinsic parameters.
points1, points2, intrinsic = dino(files[item], files[(item+1)%len])
#print(('Length', len(points1))
# Calculate essential matrix with 2d points.
# Result will be up to a scale
# First, normalize points
points1n = np.dot(np.linalg.inv(intrinsic), points1)
points2n = np.dot(np.linalg.inv(intrinsic), points2)
E = structure.compute_essential_normalized(points1n, points2n)
print('Computed essential matrix:', (-E / E[0][1]))

# Given we are at camera 1, calculate the parameters for camera 2
# Using the essential matrix returns 4 possible camera paramters
P1 = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]])
P2s = structure.compute_P_from_essential(E)

ind = -1
for i, P2 in enumerate(P2s):
# Find the correct camera parameters
d1 = structure.reconstruct_one_point(
points1n[:, 0], points2n[:, 0], P1, P2)

# Convert P2 from camera view to world view
P2_homogenous = np.linalg.inv(np.vstack([P2, [0, 0, 0, 1]]))
d2 = np.dot(P2_homogenous[:3, :4], d1)

if d1[2] > 0 and d2[2] > 0:
ind = i

P2 = np.linalg.inv(np.vstack([P2s[ind], [0, 0, 0, 1]]))[:3, :4]
#tripoints3d = structure.reconstruct_points(points1n, points2n, P1, P2)
tripoints3d = structure.linear_triangulation(points1n, points2n, P1, P2)

if not points3d.size:
points3d = tripoints3d
else:
points3d = np.concatenate((points3d, tripoints3d), 1)


fig = plt.figure()
fig.suptitle('3D reconstructed', fontsize=16)
ax = fig.gca(projection='3d')
ax.plot(points3d[0], points3d[1], points3d[2], 'b.')
ax.set_xlabel('x axis')
ax.set_ylabel('y axis')
ax.set_zlabel('z axis')
ax.view_init(elev=135, azim=90)
plt.show()


But I am getting very unexpected result. Please suggest me if above method is correct or how can i merge multiple 3d point clouds to construct a single 3-d structure.










share|improve this question















I am trying understand basics of 3d point reconstruction from 2d stereo images. What I have understood so far can be summarized as below:



For 3d point (depth map) reconstruction, we need 2 images of the same object from 2 different view, given such image pair we also need Camera matrix (say P1, P2)




  • We find the corresponding points in the two images using methods like SIFT or SURF etc.


  • After getting corresponding key point, we find find the essential matrix (say K) using minimum 8 key points (used in 8-point algorithm)


  • Given we are at camera 1, calculate the parameters for camera 2 Using the essential matrix returns 4 possible camera parameters


  • Eventually we use corresponding points and both camera parameters for 3d point estimation using triangulation method.



After going through theory section, as my first experiment I tried to run the code available here,
Which worked as expected. With a few modification in the example.py code I tried to run this example on all the consecutive image pairs and merge the 3-d point clouds for 3d reconstruction of object (dino) as below:



import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import numpy as np
import cv2

from camera import Camera
import structure
import processor
import features

def dino():
# Dino
img1 = cv2.imread('imgs/dinos/viff.003.ppm')
img2 = cv2.imread('imgs/dinos/viff.001.ppm')
pts1, pts2 = features.find_correspondence_points(img1, img2)
points1 = processor.cart2hom(pts1)
points2 = processor.cart2hom(pts2)

fig, ax = plt.subplots(1, 2)
ax[0].autoscale_view('tight')
ax[0].imshow(cv2.cvtColor(img1, cv2.COLOR_BGR2RGB))
ax[0].plot(points1[0], points1[1], 'r.')
ax[1].autoscale_view('tight')
ax[1].imshow(cv2.cvtColor(img2, cv2.COLOR_BGR2RGB))
ax[1].plot(points2[0], points2[1], 'r.')
fig.show()

height, width, ch = img1.shape
intrinsic = np.array([ # for dino
[2360, 0, width / 2],
[0, 2360, height / 2],
[0, 0, 1]])

return points1, points2, intrinsic


points3d = np.empty((0,0))
files = glob.glob("imgs/dinos/*.ppm")
len = len(files)

for item in range(len-1):
print(files[item], files[(item+1)%len])
#dino() function takes 2 images as input
#and outputs the keypoint point matches(corresponding points in two different views) along the camera intrinsic parameters.
points1, points2, intrinsic = dino(files[item], files[(item+1)%len])
#print(('Length', len(points1))
# Calculate essential matrix with 2d points.
# Result will be up to a scale
# First, normalize points
points1n = np.dot(np.linalg.inv(intrinsic), points1)
points2n = np.dot(np.linalg.inv(intrinsic), points2)
E = structure.compute_essential_normalized(points1n, points2n)
print('Computed essential matrix:', (-E / E[0][1]))

# Given we are at camera 1, calculate the parameters for camera 2
# Using the essential matrix returns 4 possible camera paramters
P1 = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]])
P2s = structure.compute_P_from_essential(E)

ind = -1
for i, P2 in enumerate(P2s):
# Find the correct camera parameters
d1 = structure.reconstruct_one_point(
points1n[:, 0], points2n[:, 0], P1, P2)

# Convert P2 from camera view to world view
P2_homogenous = np.linalg.inv(np.vstack([P2, [0, 0, 0, 1]]))
d2 = np.dot(P2_homogenous[:3, :4], d1)

if d1[2] > 0 and d2[2] > 0:
ind = i

P2 = np.linalg.inv(np.vstack([P2s[ind], [0, 0, 0, 1]]))[:3, :4]
#tripoints3d = structure.reconstruct_points(points1n, points2n, P1, P2)
tripoints3d = structure.linear_triangulation(points1n, points2n, P1, P2)

if not points3d.size:
points3d = tripoints3d
else:
points3d = np.concatenate((points3d, tripoints3d), 1)


fig = plt.figure()
fig.suptitle('3D reconstructed', fontsize=16)
ax = fig.gca(projection='3d')
ax.plot(points3d[0], points3d[1], points3d[2], 'b.')
ax.set_xlabel('x axis')
ax.set_ylabel('y axis')
ax.set_zlabel('z axis')
ax.view_init(elev=135, azim=90)
plt.show()


But I am getting very unexpected result. Please suggest me if above method is correct or how can i merge multiple 3d point clouds to construct a single 3-d structure.







python image-processing computer-vision 3d-reconstruction






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 21 at 19:42









Alexander Reynolds

8,90611639




8,90611639










asked Oct 26 at 13:38









flamelite

8011523




8011523








  • 3




    If you proceed like this, the 3D points reconstructed from each pair will be in different coordinate frames, so simply concatenating them will not give anything meaningful. Let's say you want to build a panorama from a series of pictures taken by rotating the camera progressively. If you just stack the pictures on top of each other, you won't get a panorama. For that you would need to shift the images as they rotate. For the point cloud it is the same, you need to align the separate point clouds consistently with one another.
    – AldurDisciple
    Nov 15 at 20:05












  • Thanks @aldurdisciple, yes I learned your point a couple of days ago. That is why I updated my question to How to merge multiple point clouds of different views?
    – flamelite
    Nov 16 at 4:05












  • Your code doesn't include the the definition of the dino function, and neither does the code you link to. Can you please add it in?
    – tel
    Nov 16 at 8:16






  • 1




    @AlexanderReynolds Could you please add a link to a good resource describing an actual bundle adjustment algorithm/implementation?
    – tel
    Nov 21 at 20:12






  • 1




    Not sure the best resource for 3D scenes, but for 2d/panoramas, Richard Szeliski's Image Alignment and Stitching: A Tutorial is a great resource which gives a good high level overview with really great references to dig in more. Hope it's helpful.
    – Alexander Reynolds
    Nov 21 at 20:15
















  • 3




    If you proceed like this, the 3D points reconstructed from each pair will be in different coordinate frames, so simply concatenating them will not give anything meaningful. Let's say you want to build a panorama from a series of pictures taken by rotating the camera progressively. If you just stack the pictures on top of each other, you won't get a panorama. For that you would need to shift the images as they rotate. For the point cloud it is the same, you need to align the separate point clouds consistently with one another.
    – AldurDisciple
    Nov 15 at 20:05












  • Thanks @aldurdisciple, yes I learned your point a couple of days ago. That is why I updated my question to How to merge multiple point clouds of different views?
    – flamelite
    Nov 16 at 4:05












  • Your code doesn't include the the definition of the dino function, and neither does the code you link to. Can you please add it in?
    – tel
    Nov 16 at 8:16






  • 1




    @AlexanderReynolds Could you please add a link to a good resource describing an actual bundle adjustment algorithm/implementation?
    – tel
    Nov 21 at 20:12






  • 1




    Not sure the best resource for 3D scenes, but for 2d/panoramas, Richard Szeliski's Image Alignment and Stitching: A Tutorial is a great resource which gives a good high level overview with really great references to dig in more. Hope it's helpful.
    – Alexander Reynolds
    Nov 21 at 20:15










3




3




If you proceed like this, the 3D points reconstructed from each pair will be in different coordinate frames, so simply concatenating them will not give anything meaningful. Let's say you want to build a panorama from a series of pictures taken by rotating the camera progressively. If you just stack the pictures on top of each other, you won't get a panorama. For that you would need to shift the images as they rotate. For the point cloud it is the same, you need to align the separate point clouds consistently with one another.
– AldurDisciple
Nov 15 at 20:05






If you proceed like this, the 3D points reconstructed from each pair will be in different coordinate frames, so simply concatenating them will not give anything meaningful. Let's say you want to build a panorama from a series of pictures taken by rotating the camera progressively. If you just stack the pictures on top of each other, you won't get a panorama. For that you would need to shift the images as they rotate. For the point cloud it is the same, you need to align the separate point clouds consistently with one another.
– AldurDisciple
Nov 15 at 20:05














Thanks @aldurdisciple, yes I learned your point a couple of days ago. That is why I updated my question to How to merge multiple point clouds of different views?
– flamelite
Nov 16 at 4:05






Thanks @aldurdisciple, yes I learned your point a couple of days ago. That is why I updated my question to How to merge multiple point clouds of different views?
– flamelite
Nov 16 at 4:05














Your code doesn't include the the definition of the dino function, and neither does the code you link to. Can you please add it in?
– tel
Nov 16 at 8:16




Your code doesn't include the the definition of the dino function, and neither does the code you link to. Can you please add it in?
– tel
Nov 16 at 8:16




1




1




@AlexanderReynolds Could you please add a link to a good resource describing an actual bundle adjustment algorithm/implementation?
– tel
Nov 21 at 20:12




@AlexanderReynolds Could you please add a link to a good resource describing an actual bundle adjustment algorithm/implementation?
– tel
Nov 21 at 20:12




1




1




Not sure the best resource for 3D scenes, but for 2d/panoramas, Richard Szeliski's Image Alignment and Stitching: A Tutorial is a great resource which gives a good high level overview with really great references to dig in more. Hope it's helpful.
– Alexander Reynolds
Nov 21 at 20:15






Not sure the best resource for 3D scenes, but for 2d/panoramas, Richard Szeliski's Image Alignment and Stitching: A Tutorial is a great resource which gives a good high level overview with really great references to dig in more. Hope it's helpful.
– Alexander Reynolds
Nov 21 at 20:15














3 Answers
3






active

oldest

votes

















up vote
0
down vote













Another possible path of understanding for you would be to look at an open source implementation of structure from motion or SLAM. Note that these systems can become quite complicated. However, OpenSfM is written in Python and I think it is easy to navigate and understand. I often use it as a reference for my own work.



Just to give you a little more information to get started (if you choose to go down this path). Structure from motion is an algorithm for taking a collection of 2D images and creating a 3D model (point cloud) from them where it also solves for the position of each camera relative to that point cloud (i.e. all the returned camera poses are in the world frame and so is the point cloud).



The steps of OpenSfM at a high level:




  1. Read image exif for any prior information you can use (e.g. focal
    length)


  2. Extract feature points (e.g. SIFT)


  3. Match feature points


  4. Turn these feature point matches into tracks (e.g. if you saw a feature point in image 1,2, and 3, then you can connect that into
    a track instead of match(1,2), match(2,3), etc...)


  5. Incremental Reconstruction (note that there is also a global approach). This process will use the tracks to incrementally add
    images to the reconstruction, triangulate new points, and refine the
    poses/point positions using a process called Bundle Adjustment.



Hopefully that helps.






share|improve this answer




























    up vote
    -1
    down vote













    The general idea is as follows.



    In each iteration of your code, you compute the relative pose of the right camera with respect to the left. Then you triangulate the 2D points and concatenate the resulting 3D points in a big array. But the concatenated points are not in the same coordinate frame.



    What you need to do instead is to accumulate the estimated relative poses in order to maintain an absolute pose estimate. Then you can triangulate the 2D points as before, but before concatenating the resulting points, you need to map them to the coordinate frame of the first camera.



    Here is how to do this.



    First, before the loop, initialize an accumulation matrix absolute_P1:



    points3d = np.empty((0,0))
    files = glob.glob("imgs/dinos/*.ppm")
    len = len(files)
    absolute_P1 = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]])

    for item in range(len-1):
    # ...


    Then, after the feature triangulation, map the 3D points to the coordinate frame of the first camera and update the accumulated pose:



    # ...
    P2 = np.linalg.inv(np.vstack([P2s[ind], [0, 0, 0, 1]]))
    tripoints3d = structure.linear_triangulation(points1n, points2n, P1, P2[:3, :4])

    abs_tripoints3d = np.matmul(absolute_P1, np.vstack([tripoints3d, np.ones(np.shape(tripoints3d)[1])]))
    absolute_P1 = np.matmul(absolute_P1, np.linalg.inv(P2)) # P2 needs to be 4x4 here!

    if not points3d.size:
    points3d = abs_tripoints3d
    else:
    points3d = np.concatenate((points3d, abs_tripoints3d), 1)

    # ...





    share|improve this answer























    • This answer is not right. Here's a figure with what it produces for the first two sets of dino images. It just gets worse from there.
      – tel
      Nov 21 at 3:45












    • Thanks for the feedback @tel, I must have misunderstood the camera pose convention OP is using. I updated the second code snippet, it should work better now.
      – AldurDisciple
      Nov 21 at 6:22


















    up vote
    -1
    down vote













    TL;DR



    You may not be able to get the full 3D reconstruction you want by just combining all of the 2 image reconstructions. I tried to do this in many different ways, and none of them worked. Basically, the failures all seem to boil down to noise in the 2 image pose estimation algorithm, which frequently produces unreasonable results. Any attempt to track the absolute pose by simply combining all of the 2 image poses just propagates the noise throughout the reconstruction.



    The code in the repo that the OP is working with is based on a textbook, Multiple View Geometry in Computer Vision. Chapter 19 cites a paper that discusses a successful 3D reconstruction of the dinosaur sequence, and their approach is somewhat more involved. In addition to 2 image reconstructions, they also use 3 image reconstructions, and (maybe most importantly) a fitting step at the end that helps to ensure that no single spurious result ruins the reconstruction.



    code



    ...in progress






    share|improve this answer





















      Your Answer






      StackExchange.ifUsing("editor", function () {
      StackExchange.using("externalEditor", function () {
      StackExchange.using("snippets", function () {
      StackExchange.snippets.init();
      });
      });
      }, "code-snippets");

      StackExchange.ready(function() {
      var channelOptions = {
      tags: "".split(" "),
      id: "1"
      };
      initTagRenderer("".split(" "), "".split(" "), channelOptions);

      StackExchange.using("externalEditor", function() {
      // Have to fire editor after snippets, if snippets enabled
      if (StackExchange.settings.snippets.snippetsEnabled) {
      StackExchange.using("snippets", function() {
      createEditor();
      });
      }
      else {
      createEditor();
      }
      });

      function createEditor() {
      StackExchange.prepareEditor({
      heartbeatType: 'answer',
      convertImagesToLinks: true,
      noModals: true,
      showLowRepImageUploadWarning: true,
      reputationToPostImages: 10,
      bindNavPrevention: true,
      postfix: "",
      imageUploader: {
      brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
      contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
      allowUrls: true
      },
      onDemand: true,
      discardSelector: ".discard-answer"
      ,immediatelyShowMarkdownHelp:true
      });


      }
      });














      draft saved

      draft discarded


















      StackExchange.ready(
      function () {
      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53010027%2f3d-point-reconstruction-from-2d-images%23new-answer', 'question_page');
      }
      );

      Post as a guest















      Required, but never shown

























      3 Answers
      3






      active

      oldest

      votes








      3 Answers
      3






      active

      oldest

      votes









      active

      oldest

      votes






      active

      oldest

      votes








      up vote
      0
      down vote













      Another possible path of understanding for you would be to look at an open source implementation of structure from motion or SLAM. Note that these systems can become quite complicated. However, OpenSfM is written in Python and I think it is easy to navigate and understand. I often use it as a reference for my own work.



      Just to give you a little more information to get started (if you choose to go down this path). Structure from motion is an algorithm for taking a collection of 2D images and creating a 3D model (point cloud) from them where it also solves for the position of each camera relative to that point cloud (i.e. all the returned camera poses are in the world frame and so is the point cloud).



      The steps of OpenSfM at a high level:




      1. Read image exif for any prior information you can use (e.g. focal
        length)


      2. Extract feature points (e.g. SIFT)


      3. Match feature points


      4. Turn these feature point matches into tracks (e.g. if you saw a feature point in image 1,2, and 3, then you can connect that into
        a track instead of match(1,2), match(2,3), etc...)


      5. Incremental Reconstruction (note that there is also a global approach). This process will use the tracks to incrementally add
        images to the reconstruction, triangulate new points, and refine the
        poses/point positions using a process called Bundle Adjustment.



      Hopefully that helps.






      share|improve this answer

























        up vote
        0
        down vote













        Another possible path of understanding for you would be to look at an open source implementation of structure from motion or SLAM. Note that these systems can become quite complicated. However, OpenSfM is written in Python and I think it is easy to navigate and understand. I often use it as a reference for my own work.



        Just to give you a little more information to get started (if you choose to go down this path). Structure from motion is an algorithm for taking a collection of 2D images and creating a 3D model (point cloud) from them where it also solves for the position of each camera relative to that point cloud (i.e. all the returned camera poses are in the world frame and so is the point cloud).



        The steps of OpenSfM at a high level:




        1. Read image exif for any prior information you can use (e.g. focal
          length)


        2. Extract feature points (e.g. SIFT)


        3. Match feature points


        4. Turn these feature point matches into tracks (e.g. if you saw a feature point in image 1,2, and 3, then you can connect that into
          a track instead of match(1,2), match(2,3), etc...)


        5. Incremental Reconstruction (note that there is also a global approach). This process will use the tracks to incrementally add
          images to the reconstruction, triangulate new points, and refine the
          poses/point positions using a process called Bundle Adjustment.



        Hopefully that helps.






        share|improve this answer























          up vote
          0
          down vote










          up vote
          0
          down vote









          Another possible path of understanding for you would be to look at an open source implementation of structure from motion or SLAM. Note that these systems can become quite complicated. However, OpenSfM is written in Python and I think it is easy to navigate and understand. I often use it as a reference for my own work.



          Just to give you a little more information to get started (if you choose to go down this path). Structure from motion is an algorithm for taking a collection of 2D images and creating a 3D model (point cloud) from them where it also solves for the position of each camera relative to that point cloud (i.e. all the returned camera poses are in the world frame and so is the point cloud).



          The steps of OpenSfM at a high level:




          1. Read image exif for any prior information you can use (e.g. focal
            length)


          2. Extract feature points (e.g. SIFT)


          3. Match feature points


          4. Turn these feature point matches into tracks (e.g. if you saw a feature point in image 1,2, and 3, then you can connect that into
            a track instead of match(1,2), match(2,3), etc...)


          5. Incremental Reconstruction (note that there is also a global approach). This process will use the tracks to incrementally add
            images to the reconstruction, triangulate new points, and refine the
            poses/point positions using a process called Bundle Adjustment.



          Hopefully that helps.






          share|improve this answer












          Another possible path of understanding for you would be to look at an open source implementation of structure from motion or SLAM. Note that these systems can become quite complicated. However, OpenSfM is written in Python and I think it is easy to navigate and understand. I often use it as a reference for my own work.



          Just to give you a little more information to get started (if you choose to go down this path). Structure from motion is an algorithm for taking a collection of 2D images and creating a 3D model (point cloud) from them where it also solves for the position of each camera relative to that point cloud (i.e. all the returned camera poses are in the world frame and so is the point cloud).



          The steps of OpenSfM at a high level:




          1. Read image exif for any prior information you can use (e.g. focal
            length)


          2. Extract feature points (e.g. SIFT)


          3. Match feature points


          4. Turn these feature point matches into tracks (e.g. if you saw a feature point in image 1,2, and 3, then you can connect that into
            a track instead of match(1,2), match(2,3), etc...)


          5. Incremental Reconstruction (note that there is also a global approach). This process will use the tracks to incrementally add
            images to the reconstruction, triangulate new points, and refine the
            poses/point positions using a process called Bundle Adjustment.



          Hopefully that helps.







          share|improve this answer












          share|improve this answer



          share|improve this answer










          answered Nov 21 at 19:35









          Jomnipotent17

          127319




          127319
























              up vote
              -1
              down vote













              The general idea is as follows.



              In each iteration of your code, you compute the relative pose of the right camera with respect to the left. Then you triangulate the 2D points and concatenate the resulting 3D points in a big array. But the concatenated points are not in the same coordinate frame.



              What you need to do instead is to accumulate the estimated relative poses in order to maintain an absolute pose estimate. Then you can triangulate the 2D points as before, but before concatenating the resulting points, you need to map them to the coordinate frame of the first camera.



              Here is how to do this.



              First, before the loop, initialize an accumulation matrix absolute_P1:



              points3d = np.empty((0,0))
              files = glob.glob("imgs/dinos/*.ppm")
              len = len(files)
              absolute_P1 = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]])

              for item in range(len-1):
              # ...


              Then, after the feature triangulation, map the 3D points to the coordinate frame of the first camera and update the accumulated pose:



              # ...
              P2 = np.linalg.inv(np.vstack([P2s[ind], [0, 0, 0, 1]]))
              tripoints3d = structure.linear_triangulation(points1n, points2n, P1, P2[:3, :4])

              abs_tripoints3d = np.matmul(absolute_P1, np.vstack([tripoints3d, np.ones(np.shape(tripoints3d)[1])]))
              absolute_P1 = np.matmul(absolute_P1, np.linalg.inv(P2)) # P2 needs to be 4x4 here!

              if not points3d.size:
              points3d = abs_tripoints3d
              else:
              points3d = np.concatenate((points3d, abs_tripoints3d), 1)

              # ...





              share|improve this answer























              • This answer is not right. Here's a figure with what it produces for the first two sets of dino images. It just gets worse from there.
                – tel
                Nov 21 at 3:45












              • Thanks for the feedback @tel, I must have misunderstood the camera pose convention OP is using. I updated the second code snippet, it should work better now.
                – AldurDisciple
                Nov 21 at 6:22















              up vote
              -1
              down vote













              The general idea is as follows.



              In each iteration of your code, you compute the relative pose of the right camera with respect to the left. Then you triangulate the 2D points and concatenate the resulting 3D points in a big array. But the concatenated points are not in the same coordinate frame.



              What you need to do instead is to accumulate the estimated relative poses in order to maintain an absolute pose estimate. Then you can triangulate the 2D points as before, but before concatenating the resulting points, you need to map them to the coordinate frame of the first camera.



              Here is how to do this.



              First, before the loop, initialize an accumulation matrix absolute_P1:



              points3d = np.empty((0,0))
              files = glob.glob("imgs/dinos/*.ppm")
              len = len(files)
              absolute_P1 = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]])

              for item in range(len-1):
              # ...


              Then, after the feature triangulation, map the 3D points to the coordinate frame of the first camera and update the accumulated pose:



              # ...
              P2 = np.linalg.inv(np.vstack([P2s[ind], [0, 0, 0, 1]]))
              tripoints3d = structure.linear_triangulation(points1n, points2n, P1, P2[:3, :4])

              abs_tripoints3d = np.matmul(absolute_P1, np.vstack([tripoints3d, np.ones(np.shape(tripoints3d)[1])]))
              absolute_P1 = np.matmul(absolute_P1, np.linalg.inv(P2)) # P2 needs to be 4x4 here!

              if not points3d.size:
              points3d = abs_tripoints3d
              else:
              points3d = np.concatenate((points3d, abs_tripoints3d), 1)

              # ...





              share|improve this answer























              • This answer is not right. Here's a figure with what it produces for the first two sets of dino images. It just gets worse from there.
                – tel
                Nov 21 at 3:45












              • Thanks for the feedback @tel, I must have misunderstood the camera pose convention OP is using. I updated the second code snippet, it should work better now.
                – AldurDisciple
                Nov 21 at 6:22













              up vote
              -1
              down vote










              up vote
              -1
              down vote









              The general idea is as follows.



              In each iteration of your code, you compute the relative pose of the right camera with respect to the left. Then you triangulate the 2D points and concatenate the resulting 3D points in a big array. But the concatenated points are not in the same coordinate frame.



              What you need to do instead is to accumulate the estimated relative poses in order to maintain an absolute pose estimate. Then you can triangulate the 2D points as before, but before concatenating the resulting points, you need to map them to the coordinate frame of the first camera.



              Here is how to do this.



              First, before the loop, initialize an accumulation matrix absolute_P1:



              points3d = np.empty((0,0))
              files = glob.glob("imgs/dinos/*.ppm")
              len = len(files)
              absolute_P1 = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]])

              for item in range(len-1):
              # ...


              Then, after the feature triangulation, map the 3D points to the coordinate frame of the first camera and update the accumulated pose:



              # ...
              P2 = np.linalg.inv(np.vstack([P2s[ind], [0, 0, 0, 1]]))
              tripoints3d = structure.linear_triangulation(points1n, points2n, P1, P2[:3, :4])

              abs_tripoints3d = np.matmul(absolute_P1, np.vstack([tripoints3d, np.ones(np.shape(tripoints3d)[1])]))
              absolute_P1 = np.matmul(absolute_P1, np.linalg.inv(P2)) # P2 needs to be 4x4 here!

              if not points3d.size:
              points3d = abs_tripoints3d
              else:
              points3d = np.concatenate((points3d, abs_tripoints3d), 1)

              # ...





              share|improve this answer














              The general idea is as follows.



              In each iteration of your code, you compute the relative pose of the right camera with respect to the left. Then you triangulate the 2D points and concatenate the resulting 3D points in a big array. But the concatenated points are not in the same coordinate frame.



              What you need to do instead is to accumulate the estimated relative poses in order to maintain an absolute pose estimate. Then you can triangulate the 2D points as before, but before concatenating the resulting points, you need to map them to the coordinate frame of the first camera.



              Here is how to do this.



              First, before the loop, initialize an accumulation matrix absolute_P1:



              points3d = np.empty((0,0))
              files = glob.glob("imgs/dinos/*.ppm")
              len = len(files)
              absolute_P1 = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]])

              for item in range(len-1):
              # ...


              Then, after the feature triangulation, map the 3D points to the coordinate frame of the first camera and update the accumulated pose:



              # ...
              P2 = np.linalg.inv(np.vstack([P2s[ind], [0, 0, 0, 1]]))
              tripoints3d = structure.linear_triangulation(points1n, points2n, P1, P2[:3, :4])

              abs_tripoints3d = np.matmul(absolute_P1, np.vstack([tripoints3d, np.ones(np.shape(tripoints3d)[1])]))
              absolute_P1 = np.matmul(absolute_P1, np.linalg.inv(P2)) # P2 needs to be 4x4 here!

              if not points3d.size:
              points3d = abs_tripoints3d
              else:
              points3d = np.concatenate((points3d, abs_tripoints3d), 1)

              # ...






              share|improve this answer














              share|improve this answer



              share|improve this answer








              edited Nov 21 at 6:19

























              answered Nov 20 at 20:49









              AldurDisciple

              6,35521537




              6,35521537












              • This answer is not right. Here's a figure with what it produces for the first two sets of dino images. It just gets worse from there.
                – tel
                Nov 21 at 3:45












              • Thanks for the feedback @tel, I must have misunderstood the camera pose convention OP is using. I updated the second code snippet, it should work better now.
                – AldurDisciple
                Nov 21 at 6:22


















              • This answer is not right. Here's a figure with what it produces for the first two sets of dino images. It just gets worse from there.
                – tel
                Nov 21 at 3:45












              • Thanks for the feedback @tel, I must have misunderstood the camera pose convention OP is using. I updated the second code snippet, it should work better now.
                – AldurDisciple
                Nov 21 at 6:22
















              This answer is not right. Here's a figure with what it produces for the first two sets of dino images. It just gets worse from there.
              – tel
              Nov 21 at 3:45






              This answer is not right. Here's a figure with what it produces for the first two sets of dino images. It just gets worse from there.
              – tel
              Nov 21 at 3:45














              Thanks for the feedback @tel, I must have misunderstood the camera pose convention OP is using. I updated the second code snippet, it should work better now.
              – AldurDisciple
              Nov 21 at 6:22




              Thanks for the feedback @tel, I must have misunderstood the camera pose convention OP is using. I updated the second code snippet, it should work better now.
              – AldurDisciple
              Nov 21 at 6:22










              up vote
              -1
              down vote













              TL;DR



              You may not be able to get the full 3D reconstruction you want by just combining all of the 2 image reconstructions. I tried to do this in many different ways, and none of them worked. Basically, the failures all seem to boil down to noise in the 2 image pose estimation algorithm, which frequently produces unreasonable results. Any attempt to track the absolute pose by simply combining all of the 2 image poses just propagates the noise throughout the reconstruction.



              The code in the repo that the OP is working with is based on a textbook, Multiple View Geometry in Computer Vision. Chapter 19 cites a paper that discusses a successful 3D reconstruction of the dinosaur sequence, and their approach is somewhat more involved. In addition to 2 image reconstructions, they also use 3 image reconstructions, and (maybe most importantly) a fitting step at the end that helps to ensure that no single spurious result ruins the reconstruction.



              code



              ...in progress






              share|improve this answer

























                up vote
                -1
                down vote













                TL;DR



                You may not be able to get the full 3D reconstruction you want by just combining all of the 2 image reconstructions. I tried to do this in many different ways, and none of them worked. Basically, the failures all seem to boil down to noise in the 2 image pose estimation algorithm, which frequently produces unreasonable results. Any attempt to track the absolute pose by simply combining all of the 2 image poses just propagates the noise throughout the reconstruction.



                The code in the repo that the OP is working with is based on a textbook, Multiple View Geometry in Computer Vision. Chapter 19 cites a paper that discusses a successful 3D reconstruction of the dinosaur sequence, and their approach is somewhat more involved. In addition to 2 image reconstructions, they also use 3 image reconstructions, and (maybe most importantly) a fitting step at the end that helps to ensure that no single spurious result ruins the reconstruction.



                code



                ...in progress






                share|improve this answer























                  up vote
                  -1
                  down vote










                  up vote
                  -1
                  down vote









                  TL;DR



                  You may not be able to get the full 3D reconstruction you want by just combining all of the 2 image reconstructions. I tried to do this in many different ways, and none of them worked. Basically, the failures all seem to boil down to noise in the 2 image pose estimation algorithm, which frequently produces unreasonable results. Any attempt to track the absolute pose by simply combining all of the 2 image poses just propagates the noise throughout the reconstruction.



                  The code in the repo that the OP is working with is based on a textbook, Multiple View Geometry in Computer Vision. Chapter 19 cites a paper that discusses a successful 3D reconstruction of the dinosaur sequence, and their approach is somewhat more involved. In addition to 2 image reconstructions, they also use 3 image reconstructions, and (maybe most importantly) a fitting step at the end that helps to ensure that no single spurious result ruins the reconstruction.



                  code



                  ...in progress






                  share|improve this answer












                  TL;DR



                  You may not be able to get the full 3D reconstruction you want by just combining all of the 2 image reconstructions. I tried to do this in many different ways, and none of them worked. Basically, the failures all seem to boil down to noise in the 2 image pose estimation algorithm, which frequently produces unreasonable results. Any attempt to track the absolute pose by simply combining all of the 2 image poses just propagates the noise throughout the reconstruction.



                  The code in the repo that the OP is working with is based on a textbook, Multiple View Geometry in Computer Vision. Chapter 19 cites a paper that discusses a successful 3D reconstruction of the dinosaur sequence, and their approach is somewhat more involved. In addition to 2 image reconstructions, they also use 3 image reconstructions, and (maybe most importantly) a fitting step at the end that helps to ensure that no single spurious result ruins the reconstruction.



                  code



                  ...in progress







                  share|improve this answer












                  share|improve this answer



                  share|improve this answer










                  answered Nov 21 at 17:14









                  tel

                  4,33911429




                  4,33911429






























                      draft saved

                      draft discarded




















































                      Thanks for contributing an answer to Stack Overflow!


                      • Please be sure to answer the question. Provide details and share your research!

                      But avoid



                      • Asking for help, clarification, or responding to other answers.

                      • Making statements based on opinion; back them up with references or personal experience.


                      To learn more, see our tips on writing great answers.





                      Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


                      Please pay close attention to the following guidance:


                      • Please be sure to answer the question. Provide details and share your research!

                      But avoid



                      • Asking for help, clarification, or responding to other answers.

                      • Making statements based on opinion; back them up with references or personal experience.


                      To learn more, see our tips on writing great answers.




                      draft saved


                      draft discarded














                      StackExchange.ready(
                      function () {
                      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53010027%2f3d-point-reconstruction-from-2d-images%23new-answer', 'question_page');
                      }
                      );

                      Post as a guest















                      Required, but never shown





















































                      Required, but never shown














                      Required, but never shown












                      Required, but never shown







                      Required, but never shown

































                      Required, but never shown














                      Required, but never shown












                      Required, but never shown







                      Required, but never shown







                      Popular posts from this blog

                      How to change which sound is reproduced for terminal bell?

                      Can I use Tabulator js library in my java Spring + Thymeleaf project?

                      Title Spacing in Bjornstrup Chapter, Removing Chapter Number From Contents