You can try to make a version of Floyd-Warshall that is faster on sparse matrices.
First, let us recall what this algorithm does:
Let $M$ be a matrix of distances. At the beginning of the algorithm $M_{i,j}$ is the weight of the edge $i \rightarrow j$. If this edge does not exist then $M_{i,j}=\infty$.
The algorithm has $V$ steps. In step $k$ of the algorithm, for every pair of nodes $i,j$ we set
$$M_{i,j} \leftarrow \min\{M_{i,j}, M_{i,k}+M_{k,j}\}.$$
Clearly if $M_{i,k}=\infty$ or $M_{k,j}=\infty$ then no update needs to be performed. Thus, in the first steps of the algorithm, we only need to perform roughly $deg_{in}(k) \cdot deg_{out}(k)$ comparisons where $deg_{in}(k)$ and $deg_{out}(k)$ denote the number of incoming and outgoing edges of the $k$-th node respectively. As the algorithm progresses, more and more entries of the matrix $M$ are filled. Hence, the last steps might take much longer.
Note that we need an efficient way to iterate only over non-infinite cells in the $k$-th row and column of the matrix. This can be done by keeping a set of incoming and outgoing edges for every node.
It appears that the first steps of the algorithm can greatly benefit from sparsity. For example, a randomly constructed graph with $E=O(V)$ suggests that the first iteration ($k=0$) is only $O(1)$ steps. If the graph is split into many connected components then the matrix $M$ would remain relatively sparse throughout the algorithm and the total runtime could be as low as $O(V)$.
On the other hand, if the graph contains just one connected component, then the last iteration $k=|V|$ is expected to take $O(V^2)$ steps. In this case the total run-time could be $O(V^3)$. As large as the non-sparse version.