Dear author, I noticed that you state: "Furthermore, if certain pixels in the input frames are unwanted (e.g., sea, glass), you can mask them by setting the corresponding pixel values to 0 or 1, and then pass the masked images to the model."
However, when I masked out the walking person in this video, the model treated the black region as a black walking person rather than as content to ignore. How can we achieve masking similar to VGGSfM or COLMAP, i.e., ignoring the masked pixels during feature matching?
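For concreteness, here is a minimal sketch of the masking the quoted sentence describes, plus one possible workaround. It assumes each frame is a float array of shape (H, W, 3) with values in [0, 1] and that the mask is a boolean (H, W) array that is True on unwanted pixels. The function names `mask_frame` and `keep_unmasked` are hypothetical and not part of the model's code. Discarding per-pixel outputs under the mask is only a post-hoc filter; it does not stop the model from seeing the silhouette, so it is not equivalent to ignoring masked pixels during matching.

```python
import numpy as np


def mask_frame(frame, mask, fill=0.0):
    """Return a copy of `frame` with masked pixels set to `fill` (0 or 1).

    Hypothetical helper: frame is (H, W, 3) in [0, 1], mask is (H, W) bool.
    """
    frame = np.asarray(frame, dtype=np.float32)
    mask = np.asarray(mask, dtype=bool)
    if mask.shape != frame.shape[:2]:
        raise ValueError("mask must have the same height and width as frame")
    out = frame.copy()
    out[mask] = fill  # overwrite all channels of every masked pixel
    return out


def keep_unmasked(per_pixel_output, mask):
    """Select per-pixel outputs of shape (H, W, C) at unmasked pixels.

    Assumed workaround: returns an (N, C) array, where N is the number
    of pixels for which mask is False.
    """
    per_pixel_output = np.asarray(per_pixel_output)
    mask = np.asarray(mask, dtype=bool)
    return per_pixel_output[~mask]
```

With something like this, the masked frames would go to the model as before, and any per-pixel predictions that fall inside the mask would be dropped afterwards.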

