Why does csv.reader with TextIOWrapper include new line characters?

I have two functions, one downloads individual csv files and the other downloads a zip with multiple csv files.

The download_and_process_csv function works correctly and seems to replace new line characters with a space.

'Chicken, water, cornmeal, salt, dextrose, sugar, sodium phosphate, sodium erythorbate, sodium nitrite. Produced in a facility where allergens are present such as eggs, milk, soy, wheat, mustard, gluten, oats, dairy.'

The download_and_process_zip function seems to include new line characters for some reason (\n\n). I've tried passing newline='' to io.TextIOWrapper; however, that just replaces them with \r\n.

'Chicken, water, cornmeal, salt, dextrose, sugar, sodium phosphate, sodium erythorbate, sodium nitrite. \n\nProduced in a facility where allergens are present such as eggs, milk, soy, wheat, mustard, gluten, oats, dairy.'

Is there a way to modify download_and_process_zip so that new line characters are excluded/replaced or do I have to iterate over all the rows and manually replace the characters?

@request_exceptions
def download_and_process_csv(client, url, model_class):
    with closing(client.get(url, stream=True)) as response:
        response.raise_for_status()
        response.encoding = 'utf-8'
        reader = csv.reader(response.iter_lines(decode_unicode=True))
        process_copy_from_csv(model_class, reader)


@request_exceptions
def download_and_process_zip(client, url):
    with closing(client.get(url, stream=True)) as response:
        response.raise_for_status()

        with io.BytesIO(response.content) as buffer:
            with zipfile.ZipFile(buffer, 'r') as z:
                for filename in z.namelist():
                    base_filename, file_extension = os.path.splitext(filename)
                    model_class = apps.get_model(base_filename)
                    if file_extension == '.csv':
                        with z.open(filename) as csv_file:
                            reader = csv.reader(io.TextIOWrapper(
                                csv_file,
                                encoding='utf-8',
                                # newline='',
                            ))
                            process_copy_from_csv(model_class, reader)

1. When the built-in open() is called with only a file path, the defaults are used for the other parameters: the file is opened in text mode for reading (mode='r') and the call returns an instance of io.TextIOWrapper. ZipFile.open() is different. It always returns a binary file-like object and accepts no encoding or newline arguments, so wrapping csv_file in io.TextIOWrapper is required here.

2. The docs explicitly say that the newline argument should be '', but your code uses the default, newline=None. With that default, universal newlines mode translates \r\n to \n before csv.reader sees the data.

Python csv.reader() [python.org]

If csvfile is a file object, it should be opened with newline=''.

The fix for this would be:

#•••Rest of code•••
with z.open(filename) as csv_file:
    reader = csv.reader(io.TextIOWrapper(csv_file, encoding='utf-8', newline='')) #[1]
    #•••Rest of code•••

[1] newline='' is passed to the io.TextIOWrapper, so csv.reader receives line endings exactly as they are stored in the file. Newlines embedded in quoted fields are kept as data (here \r\n), which is what you saw when you tried it.
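A minimal sketch of how the newline argument changes what csv.reader sees when it reads a zip member. The sample data, the member name and the helper names (build_zip, read_rows, collapse_whitespace) are made up for illustration and are not part of the question's code. The zip is built in memory so the example runs without a download.

```python
import csv
import io
import zipfile

# Hypothetical sample data: the second field of the data row contains "\r\n\r\n".
RAW_CSV = b'id,ingredients\r\n1,"Chicken, water. \r\n\r\nProduced in a facility."\r\n'


def build_zip(name, payload):
    """Return the bytes of an in-memory zip holding a single member."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as z:
        z.writestr(name, payload)
    return buffer.getvalue()


def read_rows(zip_bytes, name, newline):
    """Read one CSV member; `newline` is passed through to io.TextIOWrapper."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as z:
        with z.open(name) as binary_file:  # binary stream, so it must be wrapped
            text_file = io.TextIOWrapper(binary_file, encoding="utf-8", newline=newline)
            return list(csv.reader(text_file))


def collapse_whitespace(rows):
    """Yield rows with every run of whitespace in a field replaced by one space."""
    for row in rows:
        yield [" ".join(field.split()) for field in row]


zip_bytes = build_zip("sample.csv", RAW_CSV)
translated = read_rows(zip_bytes, "sample.csv", newline=None)  # universal newlines
untouched = read_rows(zip_bytes, "sample.csv", newline="")     # endings as stored
cleaned = list(collapse_whitespace(untouched))

print(repr(translated[1][1]))  # embedded "\r\n" pairs arrive as "\n"
print(repr(untouched[1][1]))   # embedded "\r\n" pairs are preserved
print(repr(cleaned[1][1]))     # embedded newlines collapsed to a single space
```

If the embedded newlines are not wanted in the stored data, a wrapper such as collapse_whitespace(reader) could be passed to process_copy_from_csv in place of reader, assuming that function only iterates over the rows it is given.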

I've played around with a mock server which serves this CSV file:

"foo
bar"

The CSV has a single field, "foo\nbar", in a single row. I call a newline in the data an embedded newline.

When I use the iter_content method on the Response object:

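import csv

import requests

print("Getting CSV")
resp = requests.get("http://localhost:8999/csv")
# iter_content hands the decoded body to the reader with its newlines intact
x = resp.iter_content(decode_unicode=True)

reader = csv.reader(x)
for row in reader:
    print(row)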

I get the correct output, a single row prints out with a single field of data:

Getting CSV
['foo\nbar']

If I change iter_content to iter_lines, I get the wrong output:

Getting CSV
['foobar']

I suspect, based on the name, that iter_lines splits the response at every newline-like character sequence and hands each line to the csv reader without its newline, so the embedded newline is effectively removed. As for your result, where the newline appeared to be replaced with a space: there's no replacement going on, just deletion. Your field already has a space before the \n\n ('sodium nitrite. \n\nProduced'), and that space is what remains once the newlines are dropped.
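The same effect can be reproduced without a server. In this sketch str.splitlines() is only a stand-in for iter_lines() (an assumption about how it behaves, chosen because both yield lines without their terminators), and io.StringIO plays the role of a text file opened with newline=''.

```python
import csv
import io

# Same payload as the mock server above: one row with one quoted field.
CSV_TEXT = '"foo\nbar"\n'

# Stand-in for iter_lines(): each line reaches the reader without its newline,
# so the reader joins the two halves of the quoted field with nothing between.
rows_from_lines = list(csv.reader(CSV_TEXT.splitlines()))

# A file-like object created with newline='' keeps the terminators, so the
# reader can store the embedded newline as part of the field.
rows_from_stream = list(csv.reader(io.StringIO(CSV_TEXT, newline="")))

print(rows_from_lines)   # [['foobar']]
print(rows_from_stream)  # [['foo\nbar']]
```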

This popular SO question, Use python requests to download CSV, asks the general question about downloading a CSV with the requests module, but every answer seems tailored to the fact that the CSV in question doesn't contain embedded newlines, and so a lot of the answers use iter_lines. I don't know when iter_content() was added to requests, but no answer mentions it.
