Yes, you have to report all of your data. You also need additional expertise in experimental design, statistics, the scientific method and Quality Control/Quality Assurance.
Say I have a hypothesis and then did the experiment 10 times and collected 10 data points.
The way this is described, the 'experiment' is your one independent data point. The repeat executions of that experiment do not qualify as additional unique data points because they should be highly correlated... but they aren't...
Five of the data points agree well with my hypothesis, but the other five are outliers.
(Sigh.) By definition, half of your data cannot be outliers. An observation doesn't become an outlier because it doesn't support your hypothesis.
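To make the point concrete, here is a minimal sketch using one common convention, Tukey's 1.5 x IQR fences. The rule choice, the function name `tukey_outliers`, and both data sets are my own illustrative assumptions, not your measurements. The rule looks only at the spread of the values and knows nothing about any hypothesis.

```python
from statistics import quantiles

def tukey_outliers(values, k=1.5):
    """Return the values outside Tukey's fences (Q1 - k*IQR, Q3 + k*IQR)."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical data: five runs near 1, five runs near 3 (a 50/50 split).
split_runs = [1.0, 1.1, 0.9, 1.05, 0.95, 3.0, 3.2, 2.9, 3.1, 3.05]

# Hypothetical data: nine runs near 1 and a single stray value.
one_stray = [1.0, 1.1, 0.9, 1.05, 0.95, 1.02, 0.98, 1.03, 0.97, 3.0]

split_flagged = tukey_outliers(split_runs)
stray_flagged = tukey_outliers(one_stray)

print(split_flagged)  # nothing is flagged: half the data defines the distribution
print(stray_flagged)  # only the single stray value is flagged
```

With a 50/50 split the rule flags nothing, because the 'disagreeing' half is part of what the distribution looks like. Other outlier tests exist and may behave differently in detail, but none of them takes your hypothesis as an input.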
I strongly believe in the validity of my hypothesis (which every experimentalist does I guess),
Stop this talk right now. This is wrong on every level. As an experimentalist, you definitely do not speak for me. The data does the speaking. Imagine this was a forensics experiment as part of a court case, you were the prosecution's witness, and you said this under oath in court. Imagine that for a moment.
but I can't find an explanation for the outliers, or the possible explanations are too many.
Which is it? The hypothesis you do advance has to be testable and falsifiable. Remember that your different experiments are the data points, and NOT the unique repetitions of the same experiment. You clearly have enough explanations. If you have too many explanations, reformulate the question, over and over and over.
Can I still publish with five good data points?
Of course you can submit all the data. It sounds to me like you have bigger problems.
How should I deal with the other five?
Report the single experiment, with all of its repetitions, as 'not supporting our hypothesis'. Formulate a testable, falsifiable, defensible, reproducible hypothesis, and discuss how it should be tested as 'future work'.
Get some help with experimental design, statistics, and quality control.
I assume it's unethical to not report them, right?
It is unethical, yes, but it goes well beyond unethical. The phrase I've heard at work is that you 'know enough to be dangerous'. The path you are dancing around is a mix of fraud and negligence and needs to be addressed today.
Update: Sorry I didn't make my situation clear. We did use a lot of different techniques to test our hypothesis.
Good. These are your 'unique data points'. As written, you only have 'one outlier', not 5. Behold the magic of statistics!
The 50% outlier is just one technique we used. But the other techniques we used all converge very well and support the hypothesis.
Great! Time to submit, pending the new hypothesis for explaining the new/repeat observations.
The 50% outlier for that one technique probably only accounts for 10% of the total data points of all the techniques we used.
Apparently there is a complete lack of uncertainty analysis, statistics, quality control etc. The technical term is 'physics envy' :)
That's what bothers me and my question is
- Can I only report data points for the other techniques we used?
No, but you can change the 'resolution' of reporting from unique executions to unique approaches. Changing the 'resolution' of your aggregate data is the best, most common, most defensible, most reproducible way (that I know of) to fundamentally alter (shall we say 'improve') the data you have available.
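As a rough sketch of what changing the 'resolution' could look like: collapse the repeat executions into one summary row per technique, and report every row. The technique names and numbers below are invented purely for illustration, and `summarize` is a hypothetical helper, not a prescribed method.

```python
from statistics import mean, stdev

# Hypothetical run-level results, invented purely for illustration.
runs = {
    "technique_A": [1.02, 0.98, 1.01, 0.99],
    "technique_B": [1.00, 1.03, 0.97],
    "technique_C": [1.0, 1.1, 0.9, 1.05, 0.95, 3.0, 3.2, 2.9, 3.1, 3.05],
}

def summarize(runs_by_technique):
    """Collapse repeat executions into one summary row per technique."""
    summary = {}
    for name, values in runs_by_technique.items():
        summary[name] = {
            "n": len(values),        # number of repeat executions
            "mean": mean(values),    # central value for this technique
            "stdev": stdev(values),  # sample spread across the repeats
        }
    return summary

summary = summarize(runs)
for name, row in summary.items():
    print(f"{name}: n={row['n']} mean={row['mean']:.3f} stdev={row['stdev']:.3f}")
```

Every technique keeps its row, including the inconvenient one, and its large spread is reported alongside the others instead of being dropped.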
- If I have to report data for the technique that has a lot of outliers, do I have to find a reasonable explanation for the outliers?
You cannot have 'a lot of outliers'. That is contrary to the definition of an outlier. You can have 'too many'.
You need something that is testable and falsifiable at a minimum. My personal suggestion is not to 'guess the correct answer' but to advance multiple, multiple hypotheses. If you advance multiple hypotheses, the odds are better that you will list the 'correct' one. You want a path forward, and this is the opportunity to lay that path. With multiple hypotheses advanced, you have room for natural selection to run its course and let the data select the most robust hypothesis.