1
\$\begingroup\$

I've build simple application that allows me to generate all numbers that have desired length and start with prefix.
For example if I specify 12 as prefix and set desired length to 4 it will generate numbers from 1200 to 1299. I'm storing all generated numbers in List<int> then using shuffe method from Jon's answer I'm saving them in pseudo-random order to files, 200000 records per file.

This is my code:

public partial class Form1 : Form
{
    private List<int> _prefixes;
    private List<int> _results;
    private readonly Object _resultsLock = new Object();   

    public Form1()
    {
        InitializeComponent();
    } 

    private void button1_Click(object sender, EventArgs e)
    {
        _prefixes  = new List<int>();
        _results = new List<int>();
        int i = 0;

        foreach (string line in AllPrefixes.Lines.Where(l=>!string.IsNullOrWhiteSpace(l) && Int32.TryParse(l, out i)))
        {
            _prefixes.Add(i);
        }

        Stopwatch stopwatch = new Stopwatch();
        stopwatch.Start();

        _prefixes.AsParallel().ForAll(item =>
        {
            string prefix = item.ToString();
            //here I'm determining how many numbers i must generate?
            //can't his be done simpler?
            var count = Convert.ToInt32("1".PadRight(9 - prefix.Count() + 1, '0'));

            for (int j = 0; j < count; j++)
            {
                var res = prefix + j.ToString().PadLeft(9 - prefix.Length, '0');
                lock (_resultsLock)
                {
                    _results.Add(Convert.ToInt32(res));
                }
            }
        });

        stopwatch.Stop();
        Debug.WriteLine("Time elapsed (s): {0}", stopwatch.Elapsed.TotalSeconds);
    }

    //Can't this be done simpler and also parallel?
    private void SaveClick(object sender, EventArgs e)
    {
        const string dir = @"C:\TESTS";
        int fileCount = 1;
        var file = Path.Combine(dir, string.Format("{0}.csv", fileCount));
        var sw = new StreamWriter(file, false);

        int i = 0;

        Stopwatch stopwatch = new Stopwatch();
        stopwatch.Start();

        var rnd = new Random();
        foreach (int res in _results.Shuffle(rnd))
        {
            sw.WriteLine(res);
            i++;
            if (i % 200000 != 0) continue;
            fileCount++;
            sw.Close();
            file = Path.Combine(dir, string.Format("{0}.csv", fileCount));
            sw = new StreamWriter(file, false);
        }
        sw.Close();

        stopwatch.Stop();
        Debug.WriteLine("Time elapsed (s): {0}", stopwatch.Elapsed.TotalSeconds);
    }
}

For 200 prefixes all 9 digits numbers are generated on my PC in about 80-90 seconds, when I add some more prefixes I get OutOfMemory exception.

Saving to files takes about 6-8 minutes, probably because I store 200000 results per file and I have more than 100 milion of generated results.

I'd like to optimize this as much as possible, time is priority, but memory usage is most important.

All suggestions are welcome!


First fix (thanks to RobH) - remove AsParallel

private void button1_Click(object sender, EventArgs e)
{
    _prefixes  = new List<int>();
    _results = new List<int>();
    int i = 0;

    foreach (string line in AllPrefixes.Lines.Where(l=>!string.IsNullOrWhiteSpace(l) && Int32.TryParse(l, out i)))
    {
        _prefixes.Add(i);
    }

    Stopwatch stopwatch = new Stopwatch();
    stopwatch.Start();

    foreach (var p in _prefixes)
    {
        string prefix = p.ToString();
        var count = Convert.ToInt32("1".PadRight(9 - prefix.Count() + 1, '0'));
        for (int j = 0; j < count; j++)
        {
            var res = prefix + j.ToString().PadLeft(9 - prefix.Length, '0');
            _simpleResults.Add(Convert.ToInt32(res));
        }
    }

    stopwatch.Stop();
    Debug.WriteLine("Time elapsed (s): {0}", stopwatch.Elapsed.TotalSeconds);
}

This decreased time by 40 seconds (almost 50%), from 80-90, to about 40-50 seconds. Thanks!

\$\endgroup\$
5
  • \$\begingroup\$ Parallel operations with a lock in a tight loop are going to be really really slow. I bet it's faster if you remove the AsParallel and get rid of the locking. \$\endgroup\$
    – RobH
    Commented Oct 13, 2015 at 10:11
  • \$\begingroup\$ @RobH I'll try that right away. \$\endgroup\$
    – Misiu
    Commented Oct 13, 2015 at 10:12
  • \$\begingroup\$ @RobH You were right, now it generates same result set in about 40-45 seconds. I'll correct that in my question. What about rest of code? Maybe You could suggest more improvements? :) \$\endgroup\$
    – Misiu
    Commented Oct 13, 2015 at 10:17
  • \$\begingroup\$ @Heslacher It is a textbox in which I can paste all prefixes as new lines, for example 123, 1254, 22456. So they can be different length. \$\endgroup\$
    – Misiu
    Commented Oct 13, 2015 at 10:58
  • \$\begingroup\$ I have rolled back the last edit. Please see what you may and may not do after receiving answers. \$\endgroup\$
    – Heslacher
    Commented Oct 13, 2015 at 15:01

2 Answers 2

2
\$\begingroup\$
//here I'm determining how many numbers i must generate?
//can't his be done simpler?
var count = Convert.ToInt32("1".PadRight(9 - prefix.Count() + 1, '0'));

Yes, I think it can be simpler. You basically want to generate from the minimum value to the maximum value while padding zeros to make sure everything is the same length as enteredLength - prefixLength.

I think you would be better off doing something like this:

List<string> generatedItems = new List<string>();
int lengthToGenerate = enteredLength - prefix.Length;

for (int i = int.Parse(prefix + new string('0', lengthToGenerate)); i <= int.Parse(prefix + new string('9', lengthToGenerate)); i++)
{
    generatedItems.Add(i.ToString());
}

Let's take a closer look at this. You'll notice that I save the generated items as a collection of strings. We can do this because they need to be written to a file anyway so it doesn't matter if it's a string or an integer.

int i = int.Parse(prefix + new string('0', lengthToGenerate)); i <= int.Parse(prefix + new string('9', lengthToGenerate))

This line of code allows us to loop from 0 to 9 or 99 or 999 whatever length we need. In effect this means we can iterate every value inbetween.

Afterwards we create our entire new string by simply calling .ToString() on the generated number.

The difference with your code is that it now skips every Convert.ToInt32() call and a whole lot of padding.


Since it's going to be a bunch of unique values anyway, you might as well use a HashSet<T> instead of a List<T>.


You're constantly locking the result set. It's probably faster to just omit the parallellism altogether since we're not doing that much work anyway but if you really want to have it, I would suggest looking into splitting the to-generate dataset beforehand, computing the data and storing it in separate collections and at the end combining those collections. Though that might be slightly overdoing things.


Automatically dispose of your StreamWriter by using a using block:

foreach(int res in _results.Shuffle(rnd))
{
    using(var sw = new StreamWriter(file, false)
    {
        // Get funky
    }
}

\$\endgroup\$
7
  • \$\begingroup\$ Thank You for all those suggestions. as first I've tried using List<string> but I got OutOfMemory exceptions a lot. I'll try to apply all improvements suggested by You and I'll get back to You. P.S. You suggested using Take to get items I must save it single file, How to use it to save all items? How should I use it in loop? \$\endgroup\$
    – Misiu
    Commented Oct 13, 2015 at 10:48
  • \$\begingroup\$ Mhhm yeah, I guess that a string will take more memory than an integer. I might consider writing the data to a file sooner so you can let the garbage collector collect those unnecessary strings but that would require a change in the way your application works. The Take() suggestion wasn't a good one -- I removed it. \$\endgroup\$ Commented Oct 13, 2015 at 10:53
  • \$\begingroup\$ I don't mind changing my application. It must generate all numbers from given prefixes, shuffle them and save to files, 200 000 records per file. I know how to use StreamWriter in using but I must create new file for each group (200 000 records) \$\endgroup\$
    – Misiu
    Commented Oct 13, 2015 at 10:56
  • \$\begingroup\$ You can just put the file reassignment above the using(StreamWriter) statement and it will use the new file for each new writer. \$\endgroup\$ Commented Oct 13, 2015 at 10:59
  • \$\begingroup\$ I've edited my question with fix based on Your suggestion. Thank You for that! Only writing results to files are left. \$\endgroup\$
    – Misiu
    Commented Oct 13, 2015 at 11:20
2
\$\begingroup\$

In addition to what has been said already (and if I understood the problem correctly), I'd say that you are complicating things too much. Let's think about it (on a single prefix for now):

  1. We have a prefix indicated by a string (that has a numeric format alright, but it is a string)
  2. We have a max length, let's call it maxLength, of the numbers you can consider.
  3. The allowed length of the prefix, let's call it prefixLength, should be <= maxLength
  4. We need all the numbers that start with the given prefix and have the given maxLength

Given the premises, we could do the following (in pseudo code, similar to C#):

List<int> GetAllNumbers(string prefix, int maxLength)
{
    int minNumber = int.Parse(prefix);
    int maxNumber = minNumber + 1;
    int lengthDifference = maxLength - prefix.Length;
    List<int> result = new List<int>();

    for(int i = 0; i < lengthDifference; i++)
    {
        minNumber *= 10;
        maxNumber *= 10;
    }

    for(int i = minNumber; i < maxNumber; i++)
    {
        result.Add(i);
    }

    return result;
}

Let's take an example. If we have that prefix = "15" and maxLength = 5 we'd need the numbers from 15000 to 15999. We just parse prefix to int, multiply it by 10 for 3 times (length of prefix is 2 and maxLength is 5) and return the numbers from 15000 to 15999 (maxNumber is set to 16000 which is the first value that is not allowed).

In addition, if you just need an IEnumerable<int> the previous method would be something like the following:

IEnumerable<int> GetAllNumbers(string prefix, int maxLength)
{
    int minNumber = int.Parse(prefix);
    int maxNumber = minNumber + 1;
    int lengthDifference = maxLength - prefix.Length;

    for(int i = 0; i < lengthDifference; i++)
    {
        minNumber *= 10;
        maxNumber *= 10;
    }

    for(int i = minNumber; i < maxNumber; i++)
    {
        yield return i;
    }
}

The methods are not tested, they are just an idea of how to approach the problem for a single prefix.

In the case of multiple prefixes we could also execute the method in a different thread for each prefix.

Also, I'd suggest you transform private List<int> _prefixes; into private IEnumerable<int> _prefixes; or, better yet, into private IEnumerable<string> _prefixes; (see point 1 and this page for the reason why). In the last case we could transform:

    _prefixes  = new List<int>();

    // other code here

    foreach (string line in AllPrefixes.Lines.Where(l=>!string.IsNullOrWhiteSpace(l) && Int32.TryParse(l, out i)))
    {
        _prefixes.Add(i);
    }

into something like:

    _prefixes  = AllPrefixes.Lines.Where(l=>!string.IsNullOrWhiteSpace(l) && Int32.TryParse(l, out i));

After that we just need to do something like:

foreach (var prefix in _prefixes)
{
    _results.AddRange(GetAllNumbers(prefix, maxLength));
}

and then proceed as you are already doing (randomize the _results list and write it in a file, or whatever the case may be).

Let me know if something is unclear.

\$\endgroup\$
0

Not the answer you're looking for? Browse other questions tagged or ask your own question.