
I am writing a script that reads image files from several directories and then uses each file's ID (the file name without its extension) to look up a matching line in a CSV file. Here is the code:

import os
import glob

searchfile = open("file.csv", "r")
train_file = open('train.csv','w')



listOfFiles = os.listdir("train")
for l in listOfFiles:
    dirList = glob.glob(('/train/%s/*.jpg') % (l))
    for d in dirList:
        id = d.split("/")
        id = id[-1].split(".")
        print id[0] # ID
        for line in searchfile:
            if id[0] in line: # search in csv file
                value= line.split(",") 
                value= value[1]+" "+ value[2] + "\n"
                train_file.write(id[0]+","+value) # write description
                break
searchfile.close()
train_file.close()

However, only a couple of the IDs are found in the CSV file. Can someone point out my mistake? (Please see the comments for a description.)

EDITED

A sample line from the text file:

192397335,carrello porta utensili 18x27 eh l 411 x p 572 x h 872 6 cassetti,,691.74,192397335.jpg
  • is id meant to be the filename, without extension? Commented Apr 6, 2017 at 8:52
  • Yeah. I extract it here: id = d.split("/") followed by id = id[-1].split(".")
    – cpwah
    Commented Apr 6, 2017 at 8:54
  • You can now see a sample line of the text file in the question @asongtoruin
    – cpwah
    Commented Apr 6, 2017 at 8:59

2 Answers

1

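Your issue is that when you do for line in searchfile: you are iterating over the file object itself, which behaves like an iterator: it remembers its position and is never rewound. The file doesn't reset for every id. For example, if the first id is found on line 50, the search for the next id starts at line 51, and once the end of the file has been reached every later search finds nothing.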
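The behaviour can be reproduced without any files on disk. The sketch below uses io.StringIO as a stand-in for the opened CSV; the three sample lines and the helper name find_first are made up for illustration and are not part of your script:

```python
import io


def find_first(lines, needle):
    """Return the first line containing needle, or None if no line matches."""
    for line in lines:
        if needle in line:
            return line
    return None


# In-memory stand-in for open("file.csv"); the contents are invented.
searchfile = io.StringIO(u"100,a\n200,b\n300,c\n")

second = find_first(searchfile, "200")  # consumes lines 1 and 2
first = find_first(searchfile, "100")   # scanning resumes at line 3, so no match

# Reading into a list first means every search starts from the top.
searchlines = io.StringIO(u"100,a\n200,b\n300,c\n").readlines()
second_again = find_first(searchlines, "200")
first_again = find_first(searchlines, "100")
```

Here first ends up as None even though "100" is in the data, because the first search already moved past that line; the list-based searches both succeed.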

Instead, you can read your file to a list and loop over the list instead:

import os
import glob

# Read the CSV once; a list can be scanned from the top for every ID.
with open("file.csv", "r") as s:
    search_file = s.readlines()

train_file = open('train.csv', 'w')

list_of_files = os.listdir("train")
for l in list_of_files:
    dir_list = glob.glob(('/train/%s/*.jpg') % (l))
    for d in dir_list:
        # (name, extension) of the file, without the directory part
        fname = os.path.splitext(os.path.basename(d))
        print(fname[0])  # ID
        for line in search_file:
            if fname[0] in line:  # search in csv file
                value = line.split(",")
                value = value[1] + " " + value[2] + "\n"
                train_file.write(fname[0] + "," + value)  # write description
                break

train_file.close()

I made a couple of other changes too. Firstly, you shouldn't use the name id, because it shadows Python's built-in id() function; I picked fname instead to indicate the file name. Secondly, I changed your camelCase names to lowercase with underscores, as is the convention in Python. Finally, getting the file name and extension is neat and fairly consistent through a combination of os.path.basename and os.path.splitext.
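As a small illustration of that last point, here is a sketch of the extraction on its own. The helper name file_id and the directory some_dir are made up for the example; the file name comes from the sample line in the question:

```python
import os


def file_id(path):
    """Return the file name without its directory or its final extension."""
    return os.path.splitext(os.path.basename(path))[0]


# Hypothetical paths, used only to show the behaviour.
example = file_id("/train/some_dir/192397335.jpg")
dotted = file_id("/train/some_dir/archive.v2.jpg")

# The manual split keeps only the text before the first dot.
manual = "/train/some_dir/archive.v2.jpg".split("/")[-1].split(".")[0]
```

The difference only shows up for names that contain more than one dot: splitext removes just the last extension, while splitting on "." and taking the first part drops everything after the first dot.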

  • Actually you want os.path.splitext(os.path.basename(d)). You may also simplify the search using glob.glob('/train/*/*.jpg') Commented Apr 6, 2017 at 9:12
  • yeah. Thank you. I was optimizing the code based on the recommendation. @asongtoruin
    – cpwah
    Commented Apr 6, 2017 at 10:20
  • I believe I asked a pretty valid question, yet I still haven't received an upvote. I have never understood that part of Stack Overflow @asongtoruin
    – cpwah
    Commented Apr 6, 2017 at 10:31
1

You need to scan the lines of searchfile for each id found, but because you open the file once, outside the loop, each line is read only once over the whole run.

You should either load the whole file into a list and iterate over that list inside the loop or, if searchfile is really large and would hardly fit in memory, reopen the file inside the loop:

List version:

import os
import glob

with open("file.csv", "r") as searchfile:
    searchlines = searchfile.readlines()

train_file = open('train.csv','w')

listOfFiles = os.listdir("train")
for l in listOfFiles:
    dirList = glob.glob(('/train/%s/*.jpg') % (l))
    for d in dirList:
        id = d.split("/")
        id = id[-1].split(".")
        print id[0] # ID
        for line in searchlines:   # now a list so start at the beginning on each pass
            if id[0] in line: # search in csv file
                value= line.split(",") 
                value= value[1]+" "+ value[2] + "\n"
                train_file.write(id[0]+","+value) # write description
                break
train_file.close()

Re-open version:

import os
import glob

train_file = open('train.csv','w')

listOfFiles = os.listdir("train")
for l in listOfFiles:
    dirList = glob.glob(('/train/%s/*.jpg') % (l))
    for d in dirList:
        id = d.split("/")
        id = id[-1].split(".")
        print id[0] # ID
        searchfile = open("file.csv", "r")
        for line in searchfile:
            if id[0] in line: # search in csv file
                value= line.split(",") 
                value= value[1]+" "+ value[2] + "\n"
                train_file.write(id[0]+","+value) # write description
                break
        searchfile.close()
train_file.close()
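A variant of the re-open approach, not used in the code above, is to keep the file open and rewind it with seek(0) before each search. This is only a minimal sketch: it uses a temporary file with invented rows in place of file.csv, and lookup is a hypothetical helper name:

```python
import os
import tempfile


def lookup(searchfile, file_id):
    """Rewind searchfile, then return the first line containing file_id, or None."""
    searchfile.seek(0)  # start from the first line on every call
    for line in searchfile:
        if file_id in line:
            return line
    return None


# Temporary stand-in for file.csv; the rows are invented for the example.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("100,first,x\n200,second,y\n300,third,z\n")
    csv_path = tmp.name

with open(csv_path, "r") as searchfile:
    # Searching out of order works because each call rewinds the file.
    results = [lookup(searchfile, file_id) for file_id in ("300", "100", "999")]

os.remove(csv_path)
```

Like re-opening, this rereads the file from disk for every id, so it trades speed for low memory use.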
