Conversation
jwmandeville
left a comment
There was a problem hiding this comment.
Looks good and I tested and it works on my system as well.
A few suggestions:
- Add a link to the dataset download page: https://www.cis.upenn.edu/~jshi/ped_html/
- Change the dataset paths and checkpoint paths to variables
- Change the paths to something generic like
path/to/dataset/PennFudanPedinstead of your PC's location
Thanks Jack! Just made the changes. |
xt_training/runner.py
Outdated
| y_tmp = [] | ||
| for y_i in y: | ||
| if isinstance(y_i, dict): | ||
| y_i = {k: v.to(device) for k, v in y_i.items()} | ||
| else: | ||
| y_i.to(device) | ||
| y_tmp.append(y_i) | ||
| y = y_tmp |
There was a problem hiding this comment.
Can we add a comment describing what is happening here and what the use case is?
xt_training/runner.py
Outdated
| try: | ||
| y_pred = model(x) | ||
| loss_batch = loss_fn(y_pred, y) | ||
| # The output of Object Detection is a tuple of (loss, y_pred), and when | ||
| # not in training mode (model.training==False), loss is an empty dictionary. | ||
| if isinstance(y_pred, tuple): | ||
| y_pred = y_pred[-1] | ||
| except ValueError: | ||
| # Object Detection training process requires (x, y) as input. | ||
| loss, y_pred = model(x, y) | ||
| # During training, y_pred is an empty list with len==0. | ||
| # Assign a length to it to make the PooledMean metrics work. | ||
| y_pred = [None] | ||
| # The Object Detection model already calculated the loss, | ||
| # so loss_fn should be declared to work with it. | ||
| loss_batch = loss_fn(y_pred=y_pred, y=loss) |
There was a problem hiding this comment.
I don't think we should rely on error handling to do non-erroneous control flow.
Also, in general, we shouldn't let the structure of a single object detection model impact the structure of our package so much. I think it would be better to modify the model so that it behaves normally and outputs only the predictions all the time and move the loss calculation to a separate loss function.
There was a problem hiding this comment.
Yep, I made some modification and discarded the error handling.
| elif isinstance(y, Iterable): | ||
| y = [y_i.to(device) for y_i in y] |
There was a problem hiding this comment.
I'm pretty sure this will error for the data you are expecting. If y_i is a dict, then it won't have a to method
There was a problem hiding this comment.
I think a better practice for object detection would be to put the to calls inside the forward method.
There was a problem hiding this comment.
Thanks for the suggestion Tim. I wrote an ODInterface based on the SKInterface, and moved the to calls inside the forward method.
…on the scikit learn wrappers.
…on the scikit learn wrappers.
|
@timesler The modifications I made:
|
timesler
left a comment
There was a problem hiding this comment.
In general, I still think this departs too much from our standard workflow in xt-training, because we are trying to bend over backwards to reproduce a specific tutorial, or to allow the use of the torchvision object detection structure.
In essence, object detection should have no problems conforming to the normal process:
- A dataset returns a tuple of x and y
- The model forward function accepts a single argument (x) and always returns a single variable (y_pred)
- Loss and metric functions accept y and y_pred and return scalar values
The way it's done in torchvision is different from the torchvision classification models but there's not good reason for that as far I know. I think we should build our example using a modified version of the model that doesn't do anything with loss inside the model itself.
| self.value_sum += self.latest_value.detach() * self.latest_num_samples | ||
| tmp = self.value_sum + self.latest_value.detach() * self.latest_num_samples | ||
| # self.value_sum += self.latest_value.detach() * self.latest_num_samples | ||
| self.value_sum = tmp |
There was a problem hiding this comment.
Is there any reason for this? I think it's equivalent
| """ | ||
| cls: The class that is goint to calculate the average precision. | ||
| iouthreshold: The IOU threshold to determine whether a detection is true/false positive. | ||
| method: The interpolation method to calculate the average precision. Can choose between | ||
| every point interpolation or eleven point interpolation. | ||
| """ |
There was a problem hiding this comment.
This doesn't follow our docstring style. I recommend installing the "Python Docstring Generator" extension in VSCode and configuring it to the correct style. That will make it really easy to add new docstrings with the correct format
| return newBoundingBox | ||
|
|
||
|
|
||
| class BoundingBoxes: |
There was a problem hiding this comment.
If this is intended to be imported by users, we should add a docstring. If not, it should probably be named _BoundingBoxes to indicate it's private
This PR consists of 3 parts:
yused in object detection.Change type:
Checklist: