1

I have a lot of data in a text file and I'm trying to figure out the fastest way to convert what are individual lines in the text file to a single cell entry.

Each Person has a number of entries that they may have entered, and I want each Person to have all the information entered in a single cell. So instead of having Bob's Name, Address, and Phone in three separate cells, I am trying to have Bob have a single cell, and each of those three entries should be on a new line WITHIN the cell.

So instead of this:
    -----------------------
    |Name: Bob             |
     -----------------------
    |Address: 123 Main St. |
    -----------------------
    |Phone Number: 555-5555|
    ------------------------
    |                      |
    -----------------------
    |Name: Jerry           |
     -----------------------
    |Address: 355 Main St. |
    ------------------------



I want this:
    -----------------------
    |Name: Bob             |
    |Address: 123 Main St. |
    |Phone Number: 555-5555|
    ------------------------
    |Name: Jerry           |
    |Address: 355 Main St. |
    ------------------------

Here is a screen shot if it makes things clearer:

The left side is what is getting imported, the right is what I want

I would love if there was an automated way to do this (a macro or anything else) since I have to do this so often.

greyspace
  • 113
  • 4
  • Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. – Community Jan 29 '23 at 20:53
  • This is most likely a few lines of e.g. Python code, to merge those fields. Take a few lines from your file, replace names and addresses with "fictive" ones. and paste them into you posting as an example. Oh, and mark that as "code" e.g. by selecting it and pressing `CTRL+K` – Hannu Jan 29 '23 at 21:20
  • @hannu I have updated the post. One of the things that makes it difficult is that there are variable numbers of lines which I tried to demonstrate (Bob and Jerry don't have the same number of entries). The current separator in the text file is just an empty line which imports as an empty cell (also demonstrated in the code field above) – greyspace Jan 29 '23 at 21:39
  • Please show what you have tried to solve the problem, even if not successful. If every "group" starts with the `Name:` line, then this can be done easily in Power Query, available in Windows Excel 2010+ and Excel 365. – Ron Rosenfeld Jan 30 '23 at 01:10
  • @RonRosenfeld Every group does not start with the same entry. Some are missing entries and they're never in the same order. I have mostly tried converting the original text file into a format that Excel might understand the formatting of. I've tried replacing ^p in word (which finds where new lines are happening) with a character Excel might use to create a new line within the cell. I've tried /n, ^p, and CHAR(10) but excel imports these as text. I've tried these using Text Import Wizard as well as through the insert data function under the Data Tab – greyspace Jan 30 '23 at 01:36
  • I can get it to work taking a ton of steps but I hope to automate this somehow. If I replace all double line breaks with =, then all line breaks with the @ character, then all = with line breaks again, I can import the file and then find and replace the @ character with control-J which is the new line character. But that takes forever and is mostly done in Microsoft Word instead of Excel – greyspace Jan 30 '23 at 01:53
  • 2
    Will there **always** be a blank line between data sets? What order do you want the results? Seems doable in PQ – Ron Rosenfeld Jan 30 '23 at 02:14
  • @RonRosenfeld Yes! That is actually the only true common thing among all entries. All entries, no matter what the text says, begin with an empty line and end with an empty line. It makes me think it should be easy but my lack of familiarity with Excel is making me struggle hard – greyspace Jan 30 '23 at 02:17
  • And what about the output order -- same as input or some defined order? If a defined order, what are the possible entries and what should that order be?? – Ron Rosenfeld Jan 30 '23 at 02:20
  • @RonRosenfeld Sorry I missed that part of the question. I want them in the order they were entered. So the blank marks a new cell, then the first line after the blank is the first line, the second after it the second, etc, until the next blank. – greyspace Jan 30 '23 at 02:27

2 Answers2

2

Power Query Solution: To use Power Query

  • Select some cell in your Data Table
  • Data => Get&Transform => from Table/Range or from within sheet
  • When the PQ Editor opens: Home => Advanced Editor
  • Make note of the Table Name in Line 2
  • Paste the M Code below in place of what you see
  • Change the Table name in line 2 back to what was generated originally.
  • Read the comments and explore the Applied Steps to understand the algorithm

M Code

let

//change next line to reflect actual data source
    Source = Excel.CurrentWorkbook(){[Name="Table8"]}[Content],

//set data type to text
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),

//add blank row at bottom of table
    #"Add blank row" = Table.FromRecords(Table.ToRecords(Source) & {[Column1=null]} ),

//create a column for grouping of the data sets
    #"Added Index" = Table.AddIndexColumn(#"Add blank row", "Index", 0, 1, Int64.Type),
    #"Added Custom" = Table.AddColumn(#"Added Index", "Grouper", each if [Column1]= null  then [Index] else null),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Index"}),
    #"Filled Up" = Table.FillUp(#"Removed Columns",{"Grouper"}),

//Group on the grouper column
//then combine the data
    #"Grouped Rows" = Table.Group(#"Filled Up", {"Grouper"}, {
        {"Data", each "#(lf)" & Text.Combine([Column1], "#(lf)"), type text}
        }),

    #"Remove Grouper Column" = Table.RemoveColumns(#"Grouped Rows",{"Grouper"})
in
    #"Remove Grouper Column"

enter image description here

Ron Rosenfeld
  • 8,198
  • 3
  • 13
  • 17
  • This is nearly perfect! In your example, and in my own results, all the results have a first line that is blank: Is there a way to remove that line? – greyspace Jan 30 '23 at 03:37
  • @greyspace That is what you showed in your screenshot. If you don't want it, just remove the initial `"#(lf)" &` in the `Table.Group` aggregation list. . – Ron Rosenfeld Jan 30 '23 at 03:50
  • I didn't realize I had done that. Thank you so much this is perfect! – greyspace Jan 30 '23 at 03:52
0

With Python3 installed...

Test data:

H:\> type x.txt
Name: Bob
Address: 123 Main St.
Phone number: 555-5555

Name: Jerry
Address: 355 Main St.

The script required:

H:\> type x.py
#/usr/bin/env python

import sys

person=[]
for line in sys.stdin:
  line=line.strip(' \r\n') # remove spaces, CR and LF
  if line!='':
    # not empty, append to grouped data about person
    person.append(line)
  else:
    # empty line; print read data, get ready for next person...
    person='\n'.join(person) # join read lines with LF between
    print(f'"{person}"\n') # output one person data
    person=[] # get ready for next person

And the test run...

H:\> python x.py <x.txt
"Name: Bob
Address: 123 Main St.
Phone number: 555-5555"

"Name: Jerry
Address: 355 Main St."

To actually do the conversion into a file that is importable into Excel or LibreOffice:

H:\> python x.py <x.txt >out.txt
Hannu
  • 8,740
  • 3
  • 21
  • 39